Observability¶
tuvl ships a two-pillar observability stack: structured JSON logging (structlog) and
distributed tracing (OpenTelemetry). Both pillars are correlated — every log line
automatically carries the current trace_id and span_id so you can jump from a log
event straight to the trace in Jaeger or Grafana Tempo.
Production only
Span export and HTTP-level tracing are disabled in tuvl dev mode. Use tuvl run
to activate the full telemetry pipeline.
Structured Logging¶
tuvl uses structlog 25.5.0 for all internal logging. In production every log line is a single JSON object written to stdout; in development a human-friendly coloured renderer is used instead.
Log format¶
Production (TUVL_ENV ≠ development):
{
  "event": "Agent LLM response",
  "level": "info",
  "timestamp": "2025-08-01T12:00:00.123456Z",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "step_id": "classify",
  "model": "ollama/llama3",
  "input_tokens": 312,
  "output_tokens": 47
}
Development (TUVL_ENV=development): human-readable coloured output via structlog's
ConsoleRenderer.
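Because each production log line is a single JSON object, the lines belonging to one trace can be picked out with a few lines of standard-library Python. The following is a minimal sketch, not part of tuvl: the helper name `events_for_trace` and the sample lines are illustrative assumptions.

```python
import json
from typing import Iterable, Iterator


def events_for_trace(lines: Iterable[str], trace_id: str) -> Iterator[dict]:
    """Yield parsed log events whose trace_id matches, skipping non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray non-JSON line from another process
        if isinstance(event, dict) and event.get("trace_id") == trace_id:
            yield event


# Illustrative input: two JSON log lines and one line that is not JSON.
sample = [
    '{"event": "Agent LLM response", "level": "info", '
    '"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "step_id": "classify"}',
    '{"event": "other request", "level": "info", '
    '"trace_id": "00000000000000000000000000000001", "step_id": "route"}',
    "not json",
]
matches = list(events_for_trace(sample, "4bf92f3577b34da6a3ce929d0e0e4736"))
print([event["step_id"] for event in matches])
```

In practice the `lines` argument could be `sys.stdin` or an open log file, since both iterate line by line.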
Controlling the renderer¶
| Variable | Default | Description |
|---|---|---|
| TUVL_ENV | "" | Set to development for the coloured console renderer |
OTel correlation¶
The inject_otel_context structlog processor injects trace_id and span_id from
the active OpenTelemetry span into every log record. Any log line emitted inside a
workflow.execute or node.<kind> span automatically carries both identifiers — no
extra instrumentation needed.
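To illustrate the idea, here is a minimal, standard-library-only sketch of a structlog-style processor. It is not tuvl's `inject_otel_context`: the `SpanContextStandIn` class and the `_current_span` context variable are hypothetical stand-ins for the span context that a real processor would presumably obtain from `opentelemetry.trace.get_current_span().get_span_context()`.

```python
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SpanContextStandIn:
    """Hypothetical stand-in for an OpenTelemetry SpanContext (integer ids)."""

    trace_id: int
    span_id: int


# Stand-in for "the active span"; OpenTelemetry tracks this itself.
_current_span: ContextVar[Optional[SpanContextStandIn]] = ContextVar(
    "current_span", default=None
)


def inject_trace_context(logger: Any, method_name: str, event_dict: dict) -> dict:
    """structlog-style processor: add hex trace_id/span_id when a span is active."""
    ctx = _current_span.get()
    if ctx is not None and ctx.trace_id != 0 and ctx.span_id != 0:
        # W3C/OTel convention: 32 hex chars for trace ids, 16 for span ids.
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict


# Simulate logging inside a span, then outside of any span.
token = _current_span.set(
    SpanContextStandIn(
        trace_id=0x4BF92F3577B34DA6A3CE929D0E0E4736,
        span_id=0x00F067AA0BA902B7,
    )
)
inside = inject_trace_context(None, "info", {"event": "Agent LLM response"})
_current_span.reset(token)
outside = inject_trace_context(None, "info", {"event": "no active span"})
```

The `(logger, method_name, event_dict)` signature is the one structlog expects of processors, so a function of this shape can sit in a processor chain ahead of the JSON renderer.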
Standard library bridge¶
Python's logging module is bridged via structlog.stdlib.ProcessorFormatter so that
third-party libraries that use logging.getLogger(...) also produce correlated JSON
log lines.
Emitting structured logs from custom nodes¶
import structlog

log = structlog.get_logger(__name__)

class MyRunner:
    async def run(self, context):
        log.info("processing request", step_id=self.cfg["id"], items=len(context["items"]))
        ...
Distributed Tracing (OpenTelemetry)¶
tuvl emits OTel spans for every workflow execution and every node step. Spans are exported over gRPC (OTLP) to any compatible collector.
Span hierarchy¶
Each workflow invocation produces a parent span containing one child span per node:
workflow.execute (parent)
├── node.agent (child — agent step)
├── node.functional (child — functional step)
├── node.router (child — router step)
└── node.HumanInTheLoop (child — HITL step)
Valid node kinds: functional, agent, api_call, mcp, router, model-op,
response, HumanInTheLoop.
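The naming scheme can be summarised in a short sketch. The function `span_names` is a hypothetical helper, not a tuvl API; it only shows how the parent span name and the `node.<kind>` child names relate to the list of valid kinds.

```python
VALID_NODE_KINDS = {
    "functional", "agent", "api_call", "mcp",
    "router", "model-op", "response", "HumanInTheLoop",
}


def span_names(step_kinds):
    """Return the parent span name followed by one child span name per step."""
    names = ["workflow.execute"]
    for kind in step_kinds:
        if kind not in VALID_NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind!r}")
        names.append(f"node.{kind}")
    return names


print(span_names(["agent", "functional", "router", "HumanInTheLoop"]))
```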
Span attributes¶
Parent span (workflow.execute):
| Attribute | Value |
|---|---|
| tuvl.workflow.name | metadata.name from the workflow YAML |
Child spans (node.<kind>):
| Attribute | Value |
|---|---|
| tuvl.node.id | Step id field |
| tuvl.node.kind | Step kind |
| tuvl.step.signal | Routing signal emitted by the step |
| tuvl.step.duration_ms | Wall-clock duration in milliseconds |
| tuvl.context.snapshot | JSON-serialised workflow context (secure fields masked) |
Secure field values appear as "*****" in the context snapshot. The set of secure
fields is populated from every model field with secure: true in its
ModelDefinition YAML. See Data Masking for details.
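As a rough illustration of how the child-span attributes fit together, the sketch below assembles them for one finished step. The helper `build_node_attributes`, the step dict shape, and the signal value `"ok"` are assumptions made for the example; the snapshot is assumed to have been masked already.

```python
import json
import time


def build_node_attributes(step: dict, signal: str, started: float, snapshot: dict) -> dict:
    """Assemble the child-span attributes for one finished step."""
    duration_ms = (time.perf_counter() - started) * 1000.0
    return {
        "tuvl.node.id": step["id"],
        "tuvl.node.kind": step["kind"],
        "tuvl.step.signal": signal,
        "tuvl.step.duration_ms": duration_ms,
        # OTel attribute values must be primitives (or sequences of them),
        # so the nested context is stored as a JSON string.
        "tuvl.context.snapshot": json.dumps(snapshot, default=str),
    }


started = time.perf_counter()
attrs = build_node_attributes(
    {"id": "classify", "kind": "agent"},  # hypothetical step definition
    "ok",                                  # hypothetical routing signal
    started,
    {"ticket": {"subject": "refund", "password": "*****"}},
)
print(sorted(attrs))
```

Serialising the snapshot to a string is a design choice forced by the OpenTelemetry attribute model, which does not accept nested mappings as attribute values.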
HTTP / W3C traceparent¶
FastAPI is instrumented with FastAPIInstrumentor (production mode only). Incoming
requests that carry a traceparent header (W3C Trace Context) are automatically
linked as children of the upstream trace — enabling end-to-end context propagation
from your gateway or frontend to the workflow engine.
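For reference, a `traceparent` header has four dash-separated hex fields: version, trace id, parent span id, and flags. The sketch below parses that layout with the standard library only. It is an illustration of the header format, not the code FastAPIInstrumentor runs, and it deliberately accepts only the four-field version `00` layout.

```python
import re
from typing import NamedTuple, Optional

# version(2) - trace-id(32) - parent-id(16) - flags(2), all lowercase hex
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":
        return None  # this sketch does not handle future versions
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # all-zero ids are invalid per the W3C spec
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(parsed)
```

A gateway or frontend that sends this header on its request lets the workflow spans join the same trace, keyed by the shared trace id.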
LiteLLM GenAI telemetry¶
tuvl registers LiteLLM's built-in OpenTelemetry callback at startup. The callback
emits gen_ai.* semantic-convention spans for every LLM call, giving you per-model
latency, token usage, and error rates in the same trace as the workflow spans.
Configuration¶
tuvl resolves telemetry config in this order (first wins):
- Environment variables: always take precedence
- .tuvl/telemetry.yaml: written by the Dev UI Settings → Telemetry panel
- Compiled-in defaults: enabled=true, endpoint localhost:4317, service tuvl
Config file¶
The file lives at <project>/.tuvl/telemetry.yaml and uses the standard tuvl
kind/version/metadata/spec envelope:
kind: TelemetryConfig
version: v1
metadata:
  name: default
spec:
  # Disable to suppress span export while keeping production mode active.
  enabled: true
  # gRPC endpoint of your OTLP collector.
  # Common values:
  #   Jaeger all-in-one:        http://localhost:4317
  #   Grafana Tempo:            http://localhost:4317
  #   OpenTelemetry Collector:  http://localhost:4317
  otlp_endpoint: http://localhost:4317
  # Attached to every span as service.name.
  service_name: tuvl
tuvl init --sample writes this file automatically.
Environment variables¶
Environment variables override the config file at runtime — no restart needed for temporary changes:
| Variable | Default | Description |
|---|---|---|
| TUVL_ENV | "" | development enables console log renderer; any other value uses JSON |
| TUVL_TELEMETRY_ENABLED | true | Set to false to disable span export |
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | Standard OTel env var; takes precedence over TUVL_OTLP_ENDPOINT |
| TUVL_OTLP_ENDPOINT | http://localhost:4317 | gRPC OTLP collector endpoint (fallback) |
| TUVL_SERVICE_NAME | tuvl | service.name resource attribute on every span |
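The precedence rules described in this section (environment variables, then the config file, then compiled-in defaults) can be sketched as follows. This is an illustrative reconstruction, not tuvl's resolver: the function names are hypothetical, `file_spec` is assumed to be the already-parsed `spec` mapping from .tuvl/telemetry.yaml, and the set of strings treated as "false" is an assumption.

```python
from typing import Mapping, Optional

DEFAULTS = {
    "enabled": True,
    "otlp_endpoint": "http://localhost:4317",
    "service_name": "tuvl",
}


def _parse_bool(raw: str) -> bool:
    # Assumption: only these spellings disable; tuvl's real parsing may differ.
    return raw.strip().lower() not in {"false", "0", "no", "off"}


def resolve_telemetry_config(
    env: Mapping[str, str], file_spec: Optional[Mapping] = None
) -> dict:
    """Merge defaults < config file < environment variables."""
    config = dict(DEFAULTS)
    # Config file values override defaults (unknown keys are ignored here).
    config.update({k: v for k, v in (file_spec or {}).items() if k in DEFAULTS})
    # Environment variables override everything else.
    if "TUVL_TELEMETRY_ENABLED" in env:
        config["enabled"] = _parse_bool(env["TUVL_TELEMETRY_ENABLED"])
    # The standard OTel variable wins over the tuvl-specific fallback.
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or env.get("TUVL_OTLP_ENDPOINT")
    if endpoint:
        config["otlp_endpoint"] = endpoint
    if env.get("TUVL_SERVICE_NAME"):
        config["service_name"] = env["TUVL_SERVICE_NAME"]
    return config


resolved = resolve_telemetry_config(
    {"TUVL_OTLP_ENDPOINT": "http://tempo:4317", "TUVL_TELEMETRY_ENABLED": "false"},
    {"service_name": "orders", "otlp_endpoint": "http://collector:4317"},
)
print(resolved)
```

Passing `os.environ` as `env` would apply the same logic to the real process environment.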
Dev UI¶
The .tuvl/telemetry.yaml file can also be edited from the Dev UI without touching
the file directly:
Settings → Observability → Telemetry
The panel shows a live YAML preview and an Advanced editor. Changes are saved
immediately but take effect only after restarting the engine with tuvl run.
Collector Setup¶
Jaeger (local development)¶
The quickest way to visualise traces locally:
# Ports: 4317 = OTLP gRPC, 16686 = Jaeger UI
docker run -d --name jaeger \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
Then make sure otlp_endpoint in .tuvl/telemetry.yaml is set to http://localhost:4317 (the default).
Open http://localhost:16686 to browse traces.
Grafana Tempo¶
services:
  tempo:
    image: grafana/tempo:latest
    ports:
      - "4317:4317" # OTLP gRPC receiver
Point otlp_endpoint at http://tempo:4317 inside Docker, or
http://localhost:4317 from the host.
OpenTelemetry Collector¶
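For production deployments that fan out to multiple backends, run a Collector between
tuvl and the backends. Recent Collector releases no longer include the dedicated jaeger
exporter, so this example forwards traces to Jaeger over OTLP (assuming Jaeger's OTLP
gRPC port is reachable as jaeger:4317). The service.pipelines section connects the
receiver to the exporters:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]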
Data Masking¶
tuvl's masking layer runs before any context data is attached to a span. Secure fields
are identified by the secure: true flag on model fields defined in your project's
ModelDefinition YAMLs. At startup tuvl collects
every field name marked secure: true into the SECURE_FIELDS set.
Masked values appear as "*****" in the tuvl.context.snapshot span attribute. The
mask is applied recursively through nested dicts and lists.
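The recursive masking described above can be sketched in a few lines. This is a minimal illustration, not tuvl's masking layer: the function name `mask_secure` and the example field names are hypothetical, and the real SECURE_FIELDS set is populated from ModelDefinition YAMLs rather than hard-coded.

```python
from typing import Any

SECURE_FIELDS = {"password", "api_key"}  # hypothetical field names for the example
MASK = "*****"


def mask_secure(value: Any, secure_fields=SECURE_FIELDS) -> Any:
    """Return a copy of value with every secure field replaced by the mask."""
    if isinstance(value, dict):
        return {
            key: MASK if key in secure_fields else mask_secure(item, secure_fields)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [mask_secure(item, secure_fields) for item in value]
    return value  # scalars pass through unchanged


context = {
    "user": {"name": "ada", "password": "hunter2"},
    "calls": [{"api_key": "sk-123", "status": 200}],
}
masked = mask_secure(context)
print(masked)
```

The sketch builds a new structure instead of mutating the input, so the workflow context itself keeps its real values while only the snapshot attached to the span is masked.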
To add a project-specific field to the secure set at runtime:
Disabling Telemetry¶
Set enabled: false in the config file, or set the environment variable TUVL_TELEMETRY_ENABLED=false.
The engine logs OTel: telemetry disabled at startup and the TracerProvider is not
configured. All spans are no-ops (NonRecordingSpan). Structured logging continues to
work normally.