
Parfii-bot cb59b77ed2 feat(kei-tts + kei-stt): TTS/STT abstractions with 4+3 backends	2026-05-12 13:47:35 +08:00

Two parallel atomars in the kei-buddy phase-1 plan. Mirror each other's architecture: trait + feature-gated backend modules + env-driven dispatch + wiremock tests for HTTP backends + subprocess-error test for local.

## kei-tts (text-to-speech)

LOC: 959 across 15 files (largest src/lib.rs 121). Trait `TtsBackend` + 4 backends behind feature flags:

* elevenlabs: POST api.elevenlabs.io/v1/text-to-speech/{voice}/stream
* openai: POST api.openai.com/v1/audio/speech (tts-1, tts-1-hd)
* google: POST texttospeech.googleapis.com/v1/text:synthesize (Wavenet voices, base64 audioContent)
* piper: local subprocess to piper-tts binary, raw PCM out

Default features: ["piper"]. all-backends feature gates the rest. `from_env()` reads KEI_TTS_BACKEND (default piper). Returns Box<dyn TtsBackend>.

Tests: 9 passed (env routing + 3 wiremock backends + piper subprocess error).

## kei-stt (speech-to-text)

LOC: 935 across 13 files (largest whisper_local.rs 181). Trait `SttBackend` + 3 backends:

* whisper-local: subprocess to `whisper` CLI / faster-whisper, reads JSON output, parses segments
* deepgram: POST api.deepgram.com/v1/listen (Token auth header, raw audio body, parses words → Segments)
* openai-whisper: POST api.openai.com/v1/audio/transcriptions (multipart file + model=whisper-1 + response_format=verbose_json)

Default features: ["whisper-local"]. all-backends gates the rest. `from_env()` reads KEI_STT_BACKEND (default whisper-local).

Tests: 10 passed + 1 doc-test (env routing + 5 wiremock + 2 JSON parsers + 1 subprocess error + 1 auth-header check).

## Common architecture decisions

* `with_base_url(url)` constructor on each HTTP backend for wiremock testability, the same pattern as kei-llm-router and kei-notify-telegram.
* `tempfile` crate added to kei-stt for whisper-local audio scratch.
* `base64 = { version = "0.22", optional = true }` in kei-tts for Google's base64-encoded audioContent.

## Verify-before-commit (RULE 0.13 §)

* cargo check -p kei-tts (default + all-backends): PASS
* cargo check -p kei-stt (default + all-backends): PASS
* cargo test -p kei-tts --features all-backends --lib: 9/0
* cargo test -p kei-stt --features all-backends --lib: 10/0
* cargo check --workspace: PASS

STATUS-TRUTH from both agents: shipped=functional, stubs=0, behaviour-verified=yes.

## Follow-up (deferred, non-blocking)

* Real backend verification needs API keys for ElevenLabs / OpenAI / Google / Deepgram and piper-tts binary + .onnx model on PATH.
* whisper-local language_detected always None: whisper CLI JSON schema differs across versions, parse heuristic to be added.
* faster-whisper has different JSON schema from openai-whisper; current parser covers openai-whisper convention only.
src	feat(kei-tts + kei-stt): TTS/STT abstractions with 4+3 backends	2026-05-12 13:47:35 +08:00
Cargo.toml	feat(kei-tts + kei-stt): TTS/STT abstractions with 4+3 backends	2026-05-12 13:47:35 +08:00
README.md	feat(kei-tts + kei-stt): TTS/STT abstractions with 4+3 backends	2026-05-12 13:47:35 +08:00

README.md

kei-tts

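Text-to-speech abstraction crate with four backends, selected at runtime via `KEI_TTS_BACKEND`. The default backend is `piper`, which runs locally, is free, and adds no network latency (synthesis itself still takes roughly 50–200 ms; see the backend matrix).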

Backend matrix

Backend	Feature flag	Cost (USD)	Latency	Quality	Language coverage
`piper`	`piper`	Free	~50–200 ms	Good	20+ language packs
`elevenlabs`	`elevenlabs`	~$0.30 per 1k characters	300–600 ms	Excellent	30+ languages
`openai`	`openai`	~$0.015 per 1k characters	200–500 ms	Very good	50+ languages
`google`	`google`	~$4 per 1M characters	200–400 ms	Very good	40+ languages
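
The cost column mixes per-1k and per-1M rates, so a quick normalisation makes the backends easier to compare. The sketch below hard-codes the approximate figures from the matrix; the function names `usd_per_1k_chars` and `estimate_cost_usd` are hypothetical and not part of kei-tts, and the rates are rough README figures rather than live pricing.

```rust
/// Approximate rates from the backend matrix, normalised to
/// US dollars per 1,000 characters. Rough figures, not live pricing.
fn usd_per_1k_chars(backend: &str) -> Option<f64> {
    match backend {
        "piper" => Some(0.0),       // local, free
        "elevenlabs" => Some(0.30), // ~$0.30 per 1k characters
        "openai" => Some(0.015),    // ~$0.015 per 1k characters
        "google" => Some(0.004),    // ~$4 per 1M characters
        _ => None,                  // unknown backend name
    }
}

/// Estimated cost in USD of synthesising `chars` characters with `backend`.
/// Returns `None` for a backend name that is not in the matrix.
fn estimate_cost_usd(backend: &str, chars: u64) -> Option<f64> {
    usd_per_1k_chars(backend).map(|rate| rate * chars as f64 / 1000.0)
}
```

With these figures, one million characters comes to roughly $300 on `elevenlabs`, $15 on `openai`, and $4 on `google`.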

Environment variables

Variable	Backend	Required	Description
`KEI_TTS_BACKEND`	all	No	`piper` (default) / `elevenlabs` / `openai` / `google`
`ELEVENLABS_API_KEY`	elevenlabs	Yes	ElevenLabs API key
`OPENAI_API_KEY`	openai	Yes	OpenAI API key
`KEI_TTS_OPENAI_MODEL`	openai	No	`tts-1` (default) or `tts-1-hd`
`GOOGLE_TTS_API_KEY`	google	Yes	Google Cloud API key
`KEI_TTS_PIPER_MODEL`	piper	Yes	Path to `.onnx` piper model file
`KEI_TTS_PIPER_BINARY`	piper	No	Path to the `piper-tts` binary (default: looked up on `PATH`)
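
To illustrate how the `KEI_TTS_BACKEND` selection might work, here is a minimal sketch of env-driven dispatch. It is not the crate's actual `from_env()`: the `BackendKind` enum and both functions are hypothetical, and treating an empty value as "unset" and matching names case-sensitively are assumptions.

```rust
/// The four backend names accepted by `KEI_TTS_BACKEND`, per the table above.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    Piper,
    ElevenLabs,
    OpenAi,
    Google,
}

/// Map the raw variable value to a backend kind.
/// `None` (variable unset) falls back to piper, the documented default.
/// Assumption: an empty or whitespace-only value is treated like unset.
fn parse_backend(raw: Option<&str>) -> Result<BackendKind, String> {
    match raw.map(str::trim) {
        None | Some("") | Some("piper") => Ok(BackendKind::Piper),
        Some("elevenlabs") => Ok(BackendKind::ElevenLabs),
        Some("openai") => Ok(BackendKind::OpenAi),
        Some("google") => Ok(BackendKind::Google),
        Some(other) => Err(format!("unknown KEI_TTS_BACKEND value: {other}")),
    }
}

/// Read the process environment and delegate to `parse_backend`.
fn backend_from_env() -> Result<BackendKind, String> {
    let raw = std::env::var("KEI_TTS_BACKEND").ok();
    parse_backend(raw.as_deref())
}
```

Keeping the parsing in a pure function that takes `Option<&str>` lets the routing be tested without mutating the process environment.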

Usage

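[dependencies]
kei-tts = { path = "../kei-tts", features = ["piper"] }
# Needed for the #[tokio::main] entry point used below:
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }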

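#[tokio::main]
async fn main() -> Result<(), kei_tts::TtsError> {
    // Pick the backend named by KEI_TTS_BACKEND (piper when unset).
    let backend = kei_tts::from_env()?;
    let req = kei_tts::TtsRequest::new("Hello, world!");
    let resp = backend.synth(&req).await?;
    // The audio format depends on the backend, so use a neutral file name,
    // and report a failed write instead of silently discarding it.
    if let Err(err) = std::fs::write("out.audio", &resp.audio_bytes) {
        eprintln!("could not write out.audio: {err}");
    }
    println!("synthesised {} bytes via {}", resp.audio_bytes.len(), backend.name());
    Ok(())
}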
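
The commit message notes that the piper backend emits raw PCM, which most players will not open directly. If the bytes returned by `synth` really are headerless PCM, a small wrapper like the one below can make them playable. This is a sketch under stated assumptions: `pcm16_to_wav` is a hypothetical helper, not part of kei-tts, and the sample format (16-bit signed little-endian) as well as the sample rate and channel count passed in must match the piper voice model in use.

```rust
/// Wrap raw little-endian 16-bit PCM samples in a minimal 44-byte WAV header.
/// Assumption: `pcm` is shorter than 4 GiB, the limit of the 32-bit size fields.
fn pcm16_to_wav(pcm: &[u8], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8; // bytes per sample frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = pcm.len() as u32;

    let mut wav = Vec::with_capacity(44 + pcm.len());
    // RIFF container header.
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    wav.extend_from_slice(b"WAVE");
    // "fmt " chunk describing uncompressed PCM.
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk payload size
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    wav.extend_from_slice(&channels.to_le_bytes());
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" chunk holding the samples themselves.
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    wav.extend_from_slice(pcm);
    wav
}
```

For example, `std::fs::write("out.wav", pcm16_to_wav(&resp.audio_bytes, 22_050, 1))` would store mono audio at 22,050 Hz; both values are placeholders to be replaced with the model's real parameters.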

Compile-time features

# All backends:
kei-tts = { path = "../kei-tts", features = ["all-backends"] }
# Cloud only, no piper:
kei-tts = { path = "../kei-tts", features = ["elevenlabs", "openai", "google"], default-features = false }
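
Because backends are gated at compile time, a backend requested at runtime may be missing from the build. The sketch below shows one way such a check could be expressed with the standard `cfg!` macro. The helper names are hypothetical, and whether kei-tts uses `cfg!` or `#[cfg]` attributes on modules internally is not stated in the source.

```rust
/// Names of the backends compiled into this build, in a fixed order.
/// Each `cfg!` call evaluates to `true` only when the matching Cargo feature
/// was enabled at compile time.
fn compiled_backends() -> Vec<&'static str> {
    let mut names = Vec::new();
    if cfg!(feature = "piper") {
        names.push("piper");
    }
    if cfg!(feature = "elevenlabs") {
        names.push("elevenlabs");
    }
    if cfg!(feature = "openai") {
        names.push("openai");
    }
    if cfg!(feature = "google") {
        names.push("google");
    }
    names
}

/// Hypothetical guard: reject a backend name that this build does not include.
fn ensure_compiled(name: &str) -> Result<(), String> {
    if compiled_backends().iter().any(|b| *b == name) {
        Ok(())
    } else {
        Err(format!("backend `{name}` is not compiled into this build"))
    }
}
```

Under the cloud-only configuration above, `ensure_compiled("piper")` would return an error, which is clearer than failing later when the `piper-tts` subprocess is spawned.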
