Blogs

Temporal + InsightFinder: LLM Observability for Agentic Workflows

Theresa Potratz

  • 4 Jun 2026
  • 5 min read

TL;DR: Temporal tracks if your AI workflows run, but it can’t tell you if your LLM outputs are actually good. By pairing them via OpenTelemetry, you automatically capture prompts, responses, and token usage to stop quiet model drift without changing your core workflow logic.

Temporal tells you what your AI agents did. InsightFinder tells you whether it was any good.

Most teams running LLM-based workflows on Temporal have solid visibility into execution — retries, failures, activity history. What they’re missing is a layer that looks at the actual LLM outputs: whether the model reasoned correctly, stayed on task, or started drifting in ways that don’t trigger any workflow error. That’s a different problem, and it requires different tooling. InsightFinder plugs into your existing Temporal setup via OpenTelemetry and gives you prompt/response monitoring, token tracking, and quality analysis without requiring you to rearchitect anything.

Feature What Temporal Covers What InsightFinder Covers
Focus Execution Reliability AI & Model Reliability
Core Metrics Timeouts, retries, activity completion Prompts, responses, token drift, quality

 

How the integration works

The connection happens through OpenTelemetry. Temporal’s SDK includes a TracingInterceptor that automatically creates spans for workflows and activities, maintains parent-child relationships across Temporal boundaries, and attaches standard attributes like workflow_id, run_id, and activity_type. You wire that interceptor to an OTLP exporter pointed at InsightFinder’s trace server, and your existing workflow execution traces start carrying LLM observability data.

The data flow looks like this:

How Temporal integrates with InsightFinder

 

There’s no separate instrumentation pipeline to maintain. The same trace that tells you an activity completed in 340ms also carries the prompt that went in and the response that came out.

Setting it up

Install four packages: temporalio, opentelemetry-api, opentelemetry-sdk, and opentelemetry-exporter-otlp-proto-grpc.

  1. Configure the Tracer Provider 

Configure the tracer provider. Your InsightFinder credentials go in as gRPC metadata headers on the OTLP exporter:

python

def setup_opentelemetry():

    resource = Resource.create({

        “service.name”: “agent-service-name”,

        “service.version”: “1.0.0”,

    })

 

    headers = {

        “ifuser”: “<INSIGHTFINDER_USER>”,

        “iflicensekey”: “<INSIGHTFINDER_LICENSE_KEY>”,

        “ifproject”: “<INSIGHTFINDER_PROJECT_NAME>”,

        “ifsystem”: “<INSIGHTFINDER_SYSTEM_NAME>”,

    }

 

    otlp_exporter = OTLPSpanExporter(

        endpoint=os.getenv(“OTEL_EXPORTER_OTLP_ENDPOINT”, “https://<customer>-trace.insightfinder.com”),

        headers=headers,

        insecure=False

    )

 

    provider = TracerProvider(resource=resource)

    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

    trace.set_tracer_provider(provider)

 

    return trace.get_tracer(__name__)

  1. Connect the Temporal Client

Initialize the Temporal client with the interceptor. This is what activates distributed tracing across your workflows:

python

async def main():

    tracer = setup_opentelemetry()

 

    client = await Client.connect(

        os.getenv(“TEMPORAL_ADDRESS”, “localhost:7233”),

        interceptors=[TracingInterceptor(tracer)]

    )

 

    result = await client.execute_workflow(

        YourWorkflow.run,

        “your-input”,

        id=f”workflow-{uuid.uuid4()}”,

        task_queue=”your-task-queue”,

    )

  1. Inject LLM Attributes into Activities

Annotate your LLM activities. Inside any activity that calls an LLM, grab the current span and set attributes before and after the call. chat.prompt and chat.response are what InsightFinder uses to extract prompt/response pairs — everything else adds context:

python

@activity.defn

async def your_llm_activity(task: str) -> str:

    current_span = trace.get_current_span()

    user_prompt = f”Your prompt text here: {task}”

 

    if current_span:

        current_span.set_attribute(“chat.prompt”, user_prompt)

        current_span.set_attribute(“x-session-id”, activity.info().workflow_id)

 

    response = await your_llm_client.chat.completions.create(

        model=”gpt-4o”,

        messages=[{“role”: “user”, “content”: user_prompt}]

    )

 

    llm_response = response.choices[0].message.content

 

    if current_span:

        current_span.set_attribute(“chat.response”, llm_response)

        current_span.set_attribute(“chat.model”, response.model)

        current_span.set_attribute(“chat.prompt_tokens”, response.usage.prompt_tokens)

        current_span.set_attribute(“chat.completion_tokens”, response.usage.completion_tokens)

        current_span.set_attribute(“chat.total_tokens”, response.usage.total_tokens)

 

    return llm_response

Token usage attributes give you cost tracking. x-username and x-session-id are optional but worth adding if you want to correlate LLM behavior back to specific users or workflow runs.

What you’re looking at once it’s running

InsightFinder analyzes the prompt/response pairs flowing through your traces. In practice, that means you can track response quality trends over time, catch model drift before it surfaces as a downstream bug, and tie specific LLM behavior to specific workflow executions using the workflow_id Temporal carries through the span hierarchy.

A single annotated activity span ends up looking like this:

workflow-execution

  └── StartWorkflow:YourMainWorkflow

      └── RunActivity:your_llm_activity

            chat.prompt: “Analyze this task…”

            chat.response: “Based on analysis…”

            chat.prompt_tokens: 45

            chat.completion_tokens: 120

That’s a complete audit trail — not just that the activity ran, but what it said.

If you’re already on Temporal, the lift here is smaller than it probably looks. The interceptor handles span creation and propagation. Your changes are localized to the activities where LLM calls happen. Start a free InsightFinder trial to get your first agent connected and start seeing what your models are actually doing inside your workflows.

Contents

See how InsightFinder helps your team deliver reliable services across every layer of the stack

Take InsightFinder AI for a no-obligation test drive. We’ll provide you with a detailed report on your outages to uncover what could have been prevented.