KeiSeiKit-1.0/_primitives
Parfii-bot cb59b77ed2 feat(kei-tts + kei-stt): TTS/STT abstractions with 4+3 backends
Two parallel atomars in the kei-buddy phase-1 plan. Mirror each other's
architecture: trait + feature-gated backend modules + env-driven dispatch
+ wiremock tests for HTTP backends + subprocess-error test for local.

## kei-tts (text-to-speech)
LOC: 959 across 15 files (largest src/lib.rs 121).
Trait `TtsBackend` + 4 backends behind feature flags:
  * elevenlabs — POST api.elevenlabs.io/v1/text-to-speech/{voice}/stream
  * openai     — POST api.openai.com/v1/audio/speech (tts-1, tts-1-hd)
  * google     — POST texttospeech.googleapis.com/v1/text:synthesize
                 (Wavenet voices, base64 audioContent)
  * piper      — local subprocess to piper-tts binary, raw PCM out
Default features: ["piper"]. all-backends feature gates the rest.
`from_env()` reads KEI_TTS_BACKEND (default piper). Returns Box<dyn TtsBackend>.
Tests: 9 passed (env routing + 3 wiremock backends + piper subprocess error).

## kei-stt (speech-to-text)
LOC: 935 across 13 files (largest whisper_local.rs 181).
Trait `SttBackend` + 3 backends:
  * whisper-local  — subprocess to `whisper` CLI / faster-whisper,
                     reads JSON output, parses segments
  * deepgram       — POST api.deepgram.com/v1/listen (Token auth header,
                     raw audio body, parses words → Segments)
  * openai-whisper — POST api.openai.com/v1/audio/transcriptions
                     (multipart file + model=whisper-1 +
                      response_format=verbose_json)
Default features: ["whisper-local"]. all-backends gates the rest.
`from_env()` reads KEI_STT_BACKEND (default whisper-local).
Tests: 10 passed + 1 doc-test (env routing + 5 wiremock + 2 JSON parsers
+ 1 subprocess error + 1 auth-header check).

## Common architecture decisions
  * `with_base_url(url)` constructor on each HTTP backend for wiremock
    testability — same pattern as kei-llm-router and kei-notify-telegram.
  * `tempfile` crate added to kei-stt for whisper-local audio scratch.
  * `base64 = { version = "0.22", optional = true }` in kei-tts for
    Google's base64-encoded audioContent.

## Verify-before-commit (RULE 0.13 §)
  * cargo check -p kei-tts (default + all-backends): PASS
  * cargo check -p kei-stt (default + all-backends): PASS
  * cargo test -p kei-tts --features all-backends --lib: 9/0
  * cargo test -p kei-stt --features all-backends --lib: 10/0
  * cargo check --workspace: PASS

STATUS-TRUTH from both agents: shipped=functional, stubs=0,
behaviour-verified=yes.

## Follow-up (deferred, non-blocking)
  * Real backend verification needs API keys for ElevenLabs / OpenAI /
    Google / Deepgram and piper-tts binary + .onnx model on PATH.
  * whisper-local language_detected always None — whisper CLI JSON
    schema differs across versions, parse heuristic to be added.
  * faster-whisper has different JSON schema from openai-whisper;
    current parser covers openai-whisper convention only.
2026-05-12 13:47:35 +08:00
..
_rust feat(kei-tts + kei-stt): TTS/STT abstractions with 4+3 backends 2026-05-12 13:47:35 +08:00
templates feat(live-graph): WebSocket activity stream — orchestrator-centric live view 2026-05-02 13:30:24 +08:00
design-scrape.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
figma-tokens.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
frontend-inspect.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
harden-base.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
kei-ci-lint.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
kei-docs-scaffold.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
kei-doctor.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
kei-sleep-queue.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
kei-sleep-setup.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
kei-sleep-sync.sh feat(sleep-sync): mirror time-metrics + ledger snapshots, surface in Phase B report 2026-05-02 04:02:28 +08:00
live-preview.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
log-ship.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
MANIFEST.toml KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
metrics-scrape.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
provision-hetzner.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
provision-vultr.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
README.md KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
screenshot-decode.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
tomd.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00

_primitives — first-class building blocks

_primitives/ holds standalone utilities that agents, hooks, and skills (including /compose-solution) depend on. Unlike _blocks/ (behavioral markdown) or _manifests/ (agent TOML), primitives are executable shell programs installed at $HOME/.claude/agents/_primitives/ by install.sh.

Current primitives

Primitive Purpose Invocation
tomd.sh Universal non-native-format → markdown converter (PDF, DOCX, XLSX, PPTX, CSV, images, code). ~/.claude/agents/_primitives/tomd.sh <file>

tomd.sh is a first-class primitive. Universal non-native-format → markdown converter with configurable cache directory (KEISEI_TOMD_CACHE) and KeiSeiKit-style error tags ([tomd]).

Hook integration

hooks/tomd-preread.sh is a PreToolUse(Read) hook that auto-redirects Claude to the converted markdown when a Read targets .docx / .doc / .xlsx / .pptx / .csv. Cached under $KEISEI_TOMD_CACHE (default /tmp/keisei-tomd-cache).

/compose-solution discovery

Phase 3 prior-art sweep greps _primitives/ alongside _blocks/, _manifests/, skills/, _bridges/, hooks/. If a user task involves file-format parsing, the meta-composer surfaces tomd automatically — reuse over rewrite (RULE "No Patching").