5.1 KiB
5.1 KiB
| name | description | argument-hint |
|---|---|---|
| observability-setup | Hub-and-spoke pipeline for installing the logs + metrics + traces triad on an existing service. Decomposes into 5 phases — scale/stack intake, code-side instrumentation, scrape+ship wiring, dashboard import, alert rules. Pure-click except for env-specific values (endpoints, tokens). Reuses `_blocks/obs-structured-logs.md`, `_blocks/obs-metrics.md`, `_blocks/obs-traces.md`, `_primitives/metrics-scrape.sh`, `_primitives/log-ship.sh`. | <service-or-repo-name> |
Observability-Setup — 5-Phase Pipeline (index)
You are installing observability on an existing service or repo. The user tells
you which service. You walk five phases, each with an AskUserQuestion
click-batch. Every durable decision lands in a named file inside the target
repo (observability.md, prometheus.yml, otel-collector.yaml, Grafana
dashboard JSON, Alertmanager rules).
This SKILL.md is the INDEX. Each phase lives in its own file and runs in
order. Never skip a phase — skipping Phase 4 gives you metrics with no
dashboards; skipping Phase 5 gives you dashboards nobody watches.
Pipeline overview (5 phases + final report)
| Phase | File | Purpose | AskUserQuestion |
|---|---|---|---|
| 1 | phase-1-intake.md | Scale / stack / log target click-batch | 1× (3 questions) |
| 2 | phase-2-instrument.md | Code-side SDK + config diff | 1× |
| 3 | phase-3-scrape-ship.md | Metrics scrape + log forward wiring | 1× |
| 4 | phase-4-dashboards.md | RED + USE + per-service dashboards | 1× |
| 5 | phase-5-alerts.md | Error rate / p99 latency / saturation | 1× |
Minimum AskUserQuestion count: 5. (Phase 1 bundles three related questions
into one AskUserQuestion call with multiSelect per question, per native
protocol.)
Variables the pipeline produces
| Name | Set in | Meaning |
|---|---|---|
SERVICE |
argument | Service/repo name the user invokes the skill with |
SCALE |
Phase 1 | single-host / small-cluster / prod |
STACK |
Phase 1 | prom-grafana / otel-vendor / better-stack / custom |
LOG_TARGET |
Phase 1 | stdout-only / file / ship-loki / ship-datadog / ship-http |
LANGUAGES |
Phase 2 | Subset of {rust, go, python, node, swift} — SDKs to wire |
SCRAPE_CFG |
Phase 3 | prometheus.yml / otel-collector.yaml path |
SHIP_CMD |
Phase 3 | log-ship.sh invocation for the service |
DASHBOARDS |
Phase 4 | List of imported / generated dashboard slugs |
ALERTS |
Phase 5 | List of alert rule names |
Final report (emit after Phase 5)
=== OBSERVABILITY-SETUP REPORT ===
Service: <SERVICE>
Scale: <SCALE> Stack: <STACK> Logs: <LOG_TARGET>
Instrumented: <LANGUAGES>
Scrape cfg: <SCRAPE_CFG>
Ship cmd: <SHIP_CMD>
Dashboards: <DASHBOARDS>
Alerts: <ALERTS>
Next action: commit + deploy + watch first 30 min of traffic
Rules (apply throughout)
- Pure-click contract. Only values that must be typed are endpoint URLs, API keys (via env, never a prompt), and the service name (intake argument).
- NO HALLUCINATION (RULE 0.4). Never invent Grafana dashboard IDs. If the
user wants a dashboard, either generate the JSON from
_blocks/obs-metrics.mdnaming conventions or link to the official exporter README. Dashboard IDs fromgrafana.com/dashboards/MUST be verified via WebFetch in-session. - Reuse over rewrite. Phase 2 always cites
_blocks/obs-structured-logs.md,_blocks/obs-metrics.md,_blocks/obs-traces.md. Phase 3 invokes_primitives/metrics-scrape.shand_primitives/log-ship.sh— do not re-implement their logic inline. - Secrets via env (RULE 0.8). API keys for Datadog, Better Stack, Grafana
Cloud, etc. ALWAYS read from env (
LOG_SHIP_DD_API_KEY,GF_API_KEY). Never write a token into any generated file. - Constructor Pattern. Each phase file < 100 LOC. This index < 120 LOC.
- Surgical Changes. Only write to the target service repo's
observability.md,config/prometheus.yml,config/otel-collector.yaml,dashboards/*.json,alerts/*.yaml. Do NOT touch application source beyond the minimum init-call required by Phase 2.
References
- phase-1-intake.md · phase-2-instrument.md · phase-3-scrape-ship.md · phase-4-dashboards.md · phase-5-alerts.md
_blocks/obs-structured-logs.md— JSON-lines field taxonomy (Phase 2 + Phase 3)_blocks/obs-metrics.md— RED / USE signal families + naming (Phase 4 + Phase 5)_blocks/obs-traces.md— W3C traceparent + OTLP transport (Phase 2 + Phase 3)_primitives/metrics-scrape.sh— Prometheus/metricspretty-print + alert-check_primitives/log-ship.sh— stdin → stdout+forward (Loki / Datadog / custom HTTP)- Prometheus docs [VERIFIED: prometheus.io/docs/]
- OpenTelemetry docs [VERIFIED: opentelemetry.io/docs/]
- Grafana dashboards catalog [VERIFY: grafana.com/grafana/dashboards/]
- Better Stack docs [VERIFY: betterstack.com/docs/]