KeiSeiKit-1.0/skills/observability-setup/SKILL.md
Parfii-bot 0be354a920 KeiSeiKit-public — clean state
Single-commit clean baseline after security scrub of niche-tells,
project codenames, internal jargon, and contributor-email leaks.

Contents:
- 100 Rust crates (_primitives/_rust/)
- 37 agent manifests (_manifests/) + generated specs (_generated/)
- 67 user-invocable skills (skills/)
- 33 hooks (hooks/)
- Composition blocks (_blocks/)
- Documentation (docs/, README.md)
- TS adapter packages (_ts_packages/)
- Assembler (_assembler/)
- Roles (_roles/)
- Templates (_templates/)
- Forgejo CI (.forgejo/)

Author: Denis Parfionovich <info@greendragon.info>

License: see LICENSE.
2026-05-01 12:09:03 +08:00


name: observability-setup
description: Hub-and-spoke pipeline for installing the logs + metrics + traces triad on an existing service. Decomposes into 5 phases (scale/stack intake, code-side instrumentation, scrape+ship wiring, dashboard import, alert rules). Pure-click except for env-specific values (endpoints, tokens). Reuses `_blocks/obs-structured-logs.md`, `_blocks/obs-metrics.md`, `_blocks/obs-traces.md`, `_primitives/metrics-scrape.sh`, `_primitives/log-ship.sh`.
argument-hint: <service-or-repo-name>

Observability-Setup — 5-Phase Pipeline (index)

See _blocks/pipeline-5phase-template.md for the 5-phase wizard contract and _blocks/rule-pure-click-contract.md for the AskUserQuestion rule. Skill-specific phase tables are inline below.

You are installing observability on an existing service or repo. The user tells you which service. You walk five phases, each with an AskUserQuestion click-batch. Every durable decision lands in a named file inside the target repo (observability.md, prometheus.yml, otel-collector.yaml, Grafana dashboard JSON, Alertmanager rules).

This SKILL.md is the INDEX. Each phase lives in its own file and runs in order. Never skip a phase — skipping Phase 4 gives you metrics with no dashboards; skipping Phase 5 gives you dashboards nobody watches.


Pipeline overview (5 phases + final report)

| Phase | File | Purpose | AskUserQuestion |
| --- | --- | --- | --- |
| 1 | phase-1-intake.md | Scale / stack / log target click-batch | 1× (3 questions) |
| 2 | phase-2-instrument.md | Code-side SDK + config diff | 1× |
| 3 | phase-3-scrape-ship.md | Metrics scrape + log forward wiring | 1× |
| 4 | phase-4-dashboards.md | RED + USE + per-service dashboards | 1× |
| 5 | phase-5-alerts.md | Error rate / p99 latency / saturation | 1× |

Minimum AskUserQuestion count: 5 (one call per phase). Phase 1 bundles its three related questions into a single AskUserQuestion call, with multiSelect set per question, per native protocol.
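
The "run in order, never skip" rule can be checked mechanically. The following is a minimal sketch, not part of the kit: the phase numbers and file names come from the table above, while the `next_phase` helper and the idea of tracking completed phases as a list are assumptions made for illustration.

```python
from typing import List, Optional

# Phase numbers and file names as listed in the pipeline overview.
PHASES = [
    (1, "phase-1-intake.md"),
    (2, "phase-2-instrument.md"),
    (3, "phase-3-scrape-ship.md"),
    (4, "phase-4-dashboards.md"),
    (5, "phase-5-alerts.md"),
]


def next_phase(completed: List[int]) -> Optional[int]:
    """Return the next phase to run, or None once all five are done.

    Hypothetical helper: raises ValueError when `completed` is not a
    gap-free prefix of 1..5, which is how a skipped or out-of-order
    phase shows up.
    """
    expected = [number for number, _ in PHASES][: len(completed)]
    if completed != expected:
        raise ValueError(f"phases must run in order without gaps, got {completed}")
    if len(completed) == len(PHASES):
        return None
    return PHASES[len(completed)][0]
```

With this check, a run that recorded `[1, 2, 3, 5]` is rejected before the final report is emitted, because Phase 4 is missing.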


Variables the pipeline produces

| Name | Set in | Meaning |
| --- | --- | --- |
| SERVICE | argument | Service/repo name the user invokes the skill with |
| SCALE | Phase 1 | single-host / small-cluster / prod |
| STACK | Phase 1 | prom-grafana / otel-vendor / better-stack / custom |
| LOG_TARGET | Phase 1 | stdout-only / file / ship-loki / ship-datadog / ship-http |
| LANGUAGES | Phase 2 | Subset of {rust, go, python, node, swift}: the SDKs to wire |
| SCRAPE_CFG | Phase 3 | prometheus.yml / otel-collector.yaml path |
| SHIP_CMD | Phase 3 | log-ship.sh invocation for the service |
| DASHBOARDS | Phase 4 | List of imported / generated dashboard slugs |
| ALERTS | Phase 5 | List of alert rule names |
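
One way to keep these variables honest is a small record type that rejects values outside the enumerations above. This is a sketch under assumptions: the allowed values are copied from the table, but the `PipelineState` class and its field names are hypothetical and do not appear in the kit.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed values copied from the variables table.
SCALES = {"single-host", "small-cluster", "prod"}
STACKS = {"prom-grafana", "otel-vendor", "better-stack", "custom"}
LOG_TARGETS = {"stdout-only", "file", "ship-loki", "ship-datadog", "ship-http"}
LANGUAGES = {"rust", "go", "python", "node", "swift"}


def _check(name: str, value: str, allowed: set) -> None:
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


@dataclass
class PipelineState:
    """Hypothetical container for the values each phase produces."""

    service: str                                          # SERVICE (argument)
    scale: str                                            # SCALE (Phase 1)
    stack: str                                            # STACK (Phase 1)
    log_target: str                                       # LOG_TARGET (Phase 1)
    languages: List[str] = field(default_factory=list)    # LANGUAGES (Phase 2)
    scrape_cfg: str = ""                                  # SCRAPE_CFG (Phase 3)
    ship_cmd: str = ""                                    # SHIP_CMD (Phase 3)
    dashboards: List[str] = field(default_factory=list)   # DASHBOARDS (Phase 4)
    alerts: List[str] = field(default_factory=list)       # ALERTS (Phase 5)

    def __post_init__(self) -> None:
        if not self.service:
            raise ValueError("SERVICE must be a non-empty name")
        _check("SCALE", self.scale, SCALES)
        _check("STACK", self.stack, STACKS)
        _check("LOG_TARGET", self.log_target, LOG_TARGETS)
        unknown = set(self.languages) - LANGUAGES
        if unknown:
            raise ValueError(f"LANGUAGES contains unsupported entries: {sorted(unknown)}")
```

Validation here runs only at construction time; fields filled in by later phases would need to be re-checked if they are assigned afterwards.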

Final report (emit after Phase 5)

=== OBSERVABILITY-SETUP REPORT ===
Service:       <SERVICE>
Scale:         <SCALE>   Stack: <STACK>   Logs: <LOG_TARGET>
Instrumented:  <LANGUAGES>
Scrape cfg:    <SCRAPE_CFG>
Ship cmd:      <SHIP_CMD>
Dashboards:    <DASHBOARDS>
Alerts:        <ALERTS>
Next action:   commit + deploy + watch first 30 min of traffic
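
Filling in the report is plain string substitution. The sketch below assumes the pipeline variables are available as a dictionary keyed by the names in the table. The `render_report` function is hypothetical, and list-valued entries are joined with commas as an illustrative choice, not a documented format.

```python
# Template text mirrors the final report layout, with one slot per variable.
REPORT_TEMPLATE = """\
=== OBSERVABILITY-SETUP REPORT ===
Service:       {SERVICE}
Scale:         {SCALE}   Stack: {STACK}   Logs: {LOG_TARGET}
Instrumented:  {LANGUAGES}
Scrape cfg:    {SCRAPE_CFG}
Ship cmd:      {SHIP_CMD}
Dashboards:    {DASHBOARDS}
Alerts:        {ALERTS}
Next action:   commit + deploy + watch first 30 min of traffic"""

REQUIRED = [
    "SERVICE", "SCALE", "STACK", "LOG_TARGET", "LANGUAGES",
    "SCRAPE_CFG", "SHIP_CMD", "DASHBOARDS", "ALERTS",
]


def render_report(values: dict) -> str:
    """Render the final report; refuse to emit it with a variable missing."""
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise KeyError(f"report is missing pipeline variables: {missing}")
    # Lists (LANGUAGES, DASHBOARDS, ALERTS) are flattened to comma-separated text.
    flat = {
        name: ", ".join(value) if isinstance(value, (list, tuple)) else str(value)
        for name, value in values.items()
    }
    return REPORT_TEMPLATE.format(**flat)
```

Raising on a missing variable means a skipped phase surfaces as an error at report time instead of producing a report with blank lines.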

Rules (apply throughout)

  • Pure-click contract. The only values that must be typed are endpoint URLs, API keys (via env, never a prompt), and the service name (intake argument).
  • NO HALLUCINATION (RULE 0.4). Never invent Grafana dashboard IDs. If the user wants a dashboard, either generate the JSON from the _blocks/obs-metrics.md naming conventions or link to the official exporter README. Dashboard IDs from grafana.com/grafana/dashboards/ MUST be verified via WebFetch in-session.
  • Reuse over rewrite. Phase 2 always cites _blocks/obs-structured-logs.md, _blocks/obs-metrics.md, _blocks/obs-traces.md. Phase 3 invokes _primitives/metrics-scrape.sh and _primitives/log-ship.sh — do not re-implement their logic inline.
  • Secrets via env (RULE 0.8). API keys for Datadog, Better Stack, Grafana Cloud, etc. ALWAYS read from env (LOG_SHIP_DD_API_KEY, GF_API_KEY). Never write a token into any generated file.
  • Constructor Pattern. Each phase file < 100 LOC. This index < 120 LOC.
  • Surgical Changes. Only write to the target service repo's observability.md, config/prometheus.yml, config/otel-collector.yaml, dashboards/*.json, alerts/*.yaml. Do NOT touch application source beyond the minimum init-call required by Phase 2.
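
The "Secrets via env" and "Surgical Changes" rules lend themselves to simple guards. The sketch below is illustrative only: the path patterns and environment variable names are taken from the rules above, while `is_allowed_write` and `leaked_secrets` are hypothetical helpers, and the allowlist deliberately leaves out the minimal Phase 2 init-call in application source, which would need separate review.

```python
import os
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Files the skill may write, relative to the target repo root.
ALLOWED_WRITES = [
    "observability.md",
    "config/prometheus.yml",
    "config/otel-collector.yaml",
    "dashboards/*.json",
    "alerts/*.yaml",
]

# Environment variables named in the secrets rule.
SECRET_ENV_VARS = ["LOG_SHIP_DD_API_KEY", "GF_API_KEY"]


def is_allowed_write(rel_path: str) -> bool:
    """True if rel_path matches an allowlisted pattern, segment by segment.

    Comparing per path segment keeps `*` from matching across `/`, so
    nested or parent-relative paths are rejected.
    """
    parts = PurePosixPath(rel_path).parts
    for pattern in ALLOWED_WRITES:
        pattern_parts = pattern.split("/")
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


def leaked_secrets(generated_text: str, env=os.environ) -> list:
    """Return names of secret env vars whose values appear in generated text."""
    return [
        name
        for name in SECRET_ENV_VARS
        if env.get(name) and env[name] in generated_text
    ]
```

Running both checks just before each file write gives an early failure if a token value or an out-of-scope path slips into generated output.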

References

  • phase-1-intake.md · phase-2-instrument.md · phase-3-scrape-ship.md · phase-4-dashboards.md · phase-5-alerts.md
  • _blocks/obs-structured-logs.md — JSON-lines field taxonomy (Phase 2 + Phase 3)
  • _blocks/obs-metrics.md — RED / USE signal families + naming (Phase 4 + Phase 5)
  • _blocks/obs-traces.md — W3C traceparent + OTLP transport (Phase 2 + Phase 3)
  • _primitives/metrics-scrape.sh — Prometheus /metrics pretty-print + alert-check
  • _primitives/log-ship.sh — stdin → stdout+forward (Loki / Datadog / custom HTTP)
  • Prometheus docs [VERIFIED: prometheus.io/docs/]
  • OpenTelemetry docs [VERIFIED: opentelemetry.io/docs/]
  • Grafana dashboards catalog [VERIFY: grafana.com/grafana/dashboards/]
  • Better Stack docs [VERIFY: betterstack.com/docs/]