KeiSeiKit-1.0/_blocks/obs-traces.md
Parfii-bot 0be354a920 KeiSeiKit-public — clean state
Single-commit clean baseline after security scrub of niche-tells,
project codenames, internal jargon, and contributor-email leaks.

Contents:
- 100 Rust crates (_primitives/_rust/)
- 37 agent manifests (_manifests/) + generated specs (_generated/)
- 67 user-invocable skills (skills/)
- 33 hooks (hooks/)
- Composition blocks (_blocks/)
- Documentation (docs/, README.md)
- TS adapter packages (_ts_packages/)
- Assembler (_assembler/)
- Roles (_roles/)
- Templates (_templates/)
- Forgejo CI (.forgejo/)

Author: Denis Parfionovich <info@greendragon.info>

License: see LICENSE.
2026-05-01 12:09:03 +08:00

3.5 KiB

OBSERVABILITY — Distributed traces (OpenTelemetry + W3C traceparent)

A trace is a tree of spans across services, stitched by trace_id. Without traces, a p99-latency investigation in a microservice topology is a guessing game. OpenTelemetry is the vendor-neutral standard; pick a backend later.

Core data model (OTel spec 1.37+):

Field Meaning
trace_id 16-byte hex (32 chars) — identifies the whole trace
span_id 8-byte hex (16 chars) — identifies one operation inside the trace
parent_span_id span_id of the caller (empty for root)
name Short operation name (GET /users/:id, db.query)
kind server / client / producer / consumer / internal
attributes Key-value metadata (http.method, db.system, net.peer.name)
status OK / ERROR + optional message
events Timestamped points inside the span (exceptions, annotations)
start_time / end_time nanosecond epoch

W3C Trace Context propagation (mandatory for cross-service traces):

  • Header: traceparent: 00-<trace_id>-<span_id>-<flags> [VERIFIED: www.w3.org/TR/trace-context/]
  • Example: traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
  • Optional tracestate: <vendor>=<value>,... for vendor-specific data
  • Every service MUST propagate both headers unchanged on outbound requests; extract on inbound to continue the trace.

Sampling strategies (traces are expensive at volume):

  • Head-based (decide at root): ParentBased(TraceIdRatioBased(p)) with p=0.01-0.10 typical.
  • Tail-based (decide after span completes): OTel Collector tail_sampling processor — keep ALL errors + slow traces + sample p=0.01 rest [VERIFIED: github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor].
  • Hybrid preferred: head-sample 100% in dev, tail-sample in prod.

Transport (OTLP — the OTel wire protocol):

  • OTLP/gRPC on port 4317 (default for app → collector, binary, efficient)
  • OTLP/HTTP on port 4318 (JSON / protobuf over HTTP, browser-friendly, firewall-friendly) [VERIFIED: opentelemetry.io/docs/specs/otlp/]
  • Collector is the choke point: apps ship OTLP → collector → backend (Jaeger, Tempo, Honeycomb, Datadog, Grafana Cloud).

Backends (pick by retention budget & query needs):

  • Jaeger — self-host, in-memory or Cassandra/Elasticsearch storage [VERIFIED: jaegertracing.io]
  • Tempo (Grafana) — self-host, object-storage backend, cheapest at scale, trace-id-only lookup [VERIFIED: grafana.com/docs/tempo/]
  • Vendor — Honeycomb / Datadog / Lightstep / Grafana Cloud (pay per GB, no ops)

Language bindings:

  • Rust: opentelemetry + opentelemetry-otlp + tracing-opentelemetry [VERIFIED: docs.rs/opentelemetry]
  • Go: go.opentelemetry.io/otel + auto-instrumentation for net/http, database/sql [VERIFIED: opentelemetry.io/docs/languages/go/]
  • Python: opentelemetry-sdk + opentelemetry-instrumentation-<lib> auto-loaders
  • Node/TS: @opentelemetry/sdk-node + @opentelemetry/auto-instrumentations-node

Log correlation: every log entry MUST include trace_id + span_id fields (see obs-structured-logs). One click in Grafana / Tempo from trace → logs.

Forbidden: rolling your own header format instead of W3C traceparent (breaks every off-the-shelf collector); sampling 100% in prod on >1k RPS service (cost + backend OOM); omitting kind on spans (breaks service-graph view); propagating tracestate across trust boundaries without validation (can be used for tracking).