Two parallel atomars from the kei-buddy phase-1 plan. They mirror each
other's architecture: a backend trait, feature-gated backend modules,
env-driven dispatch, wiremock tests for the HTTP backends, and a
subprocess-error test for the local backend.
## kei-tts (text-to-speech)
LOC: 959 across 15 files (largest src/lib.rs 121).
Trait `TtsBackend` + 4 backends behind feature flags:
* elevenlabs — POST api.elevenlabs.io/v1/text-to-speech/{voice}/stream
* openai — POST api.openai.com/v1/audio/speech (tts-1, tts-1-hd)
* google — POST texttospeech.googleapis.com/v1/text:synthesize
(Wavenet voices, base64 audioContent)
* piper — local subprocess to piper-tts binary, raw PCM out
Default features: ["piper"]. all-backends feature gates the rest.
`from_env()` reads KEI_TTS_BACKEND (default piper). Returns Box<dyn TtsBackend>.
Tests: 9 passed (env routing + 3 wiremock backends + piper subprocess error).
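
The env-driven dispatch described above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the crate's actual code: the `Piper` and `OpenAi` structs, `DispatchError`, and `select_backend` are hypothetical names, and the real trait presumably also carries a synthesis method and feature gates that are omitted here.

```rust
/// Simplified stand-in for the crate's `TtsBackend` trait (assumption:
/// the real trait has more methods than `name`).
pub trait TtsBackend {
    fn name(&self) -> &'static str;
}

// Hypothetical backend types; in the real crate each would sit behind
// its own `#[cfg(feature = "...")]` gate.
struct Piper;
struct OpenAi;

impl TtsBackend for Piper {
    fn name(&self) -> &'static str {
        "piper"
    }
}

impl TtsBackend for OpenAi {
    fn name(&self) -> &'static str {
        "openai"
    }
}

/// Hypothetical error type for an unrecognised backend name.
#[derive(Debug, PartialEq)]
pub enum DispatchError {
    UnknownBackend(String),
}

/// Pure selection logic: takes the raw env value so it can be tested
/// without mutating process-wide environment state.
pub fn select_backend(raw: Option<&str>) -> Result<Box<dyn TtsBackend>, DispatchError> {
    // An unset or blank value falls back to the default backend.
    let name = raw.map(str::trim).filter(|s| !s.is_empty()).unwrap_or("piper");
    match name {
        "piper" => Ok(Box::new(Piper)),
        "openai" => Ok(Box::new(OpenAi)),
        other => Err(DispatchError::UnknownBackend(other.to_string())),
    }
}

/// Thin wrapper that reads `KEI_TTS_BACKEND` and delegates to the pure function.
pub fn from_env() -> Result<Box<dyn TtsBackend>, DispatchError> {
    let raw = std::env::var("KEI_TTS_BACKEND").ok();
    select_backend(raw.as_deref())
}
```

Splitting the env read from the selection logic is one way to keep env-routing tests free of shared mutable state; whether the crate does this is not stated in the commit.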
## kei-stt (speech-to-text)
LOC: 935 across 13 files (largest whisper_local.rs 181).
Trait `SttBackend` + 3 backends:
* whisper-local — subprocess to `whisper` CLI / faster-whisper,
reads JSON output, parses segments
* deepgram — POST api.deepgram.com/v1/listen (Token auth header,
raw audio body, parses words → Segments)
* openai-whisper — POST api.openai.com/v1/audio/transcriptions
(multipart file + model=whisper-1 +
response_format=verbose_json)
Default features: ["whisper-local"]. all-backends gates the rest.
`from_env()` reads KEI_STT_BACKEND (default whisper-local).
Tests: 10 passed + 1 doc-test (env routing + 5 wiremock + 2 JSON parsers
+ 1 subprocess error + 1 auth-header check).
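
As an illustration of the "words → Segments" step, the sketch below converts word timings given in seconds into millisecond segments and skips incomplete entries. It uses plain structs instead of `serde_json`; the `Word` type and the `words_to_segments` and `secs_to_ms` helpers are hypothetical, and `Segment` is only a stand-in for the crate's type.

```rust
/// Simplified stand-in for the crate's `Segment` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Hypothetical view of one entry in a `words` array. Every field is
/// optional because a JSON entry may omit any of them.
pub struct Word<'a> {
    pub word: Option<&'a str>,
    pub start: Option<f64>,
    pub end: Option<f64>,
}

/// Convert seconds to whole milliseconds with a truncating cast.
fn secs_to_ms(secs: f64) -> u64 {
    (secs * 1000.0) as u64
}

/// Build segments from words, dropping any entry with a missing field.
pub fn words_to_segments(words: &[Word<'_>]) -> Vec<Segment> {
    words
        .iter()
        .filter_map(|w| {
            // `?` returns `None` from the closure, which `filter_map` skips.
            Some(Segment {
                start_ms: secs_to_ms(w.start?),
                end_ms: secs_to_ms(w.end?),
                text: w.word?.to_string(),
            })
        })
        .collect()
}
```

The truncating cast can land one millisecond low for timings that are not exactly representable in binary floating point; calling `.round()` before the cast is a possible alternative if that matters to callers.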
## Common architecture decisions
* `with_base_url(url)` constructor on each HTTP backend for wiremock
testability — same pattern as kei-llm-router and kei-notify-telegram.
* `tempfile` crate added to kei-stt for whisper-local audio scratch.
* `base64 = { version = "0.22", optional = true }` in kei-tts for
Google's base64-encoded audioContent.
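
The `with_base_url` pattern can be sketched in isolation like this. The `HttpBackend` type, its `endpoint` and `auth_header` helpers, and the placeholder `DEFAULT_BASE_URL` are assumptions made for illustration; the real backends also hold an HTTP client, which is left out so the sketch needs only the standard library.

```rust
/// Placeholder production URL (assumption, not a real service).
const DEFAULT_BASE_URL: &str = "https://api.example.com";

/// Hypothetical HTTP backend that keeps its base URL configurable.
pub struct HttpBackend {
    base_url: String,
    api_key: String,
}

impl HttpBackend {
    /// Production constructor: always targets the default host.
    pub fn new(api_key: impl Into<String>) -> Self {
        Self::with_base_url(DEFAULT_BASE_URL, api_key)
    }

    /// Test-friendly constructor: a mock server URL can be injected here.
    pub fn with_base_url(base_url: impl Into<String>, api_key: impl Into<String>) -> Self {
        Self {
            // Drop trailing slashes so joined paths never contain `//`.
            base_url: base_url.into().trim_end_matches('/').to_string(),
            api_key: api_key.into(),
        }
    }

    /// Join the base URL with an API path.
    pub fn endpoint(&self, path: &str) -> String {
        format!("{}/{}", self.base_url, path.trim_start_matches('/'))
    }

    /// Value for an `Authorization` header in the `Token <key>` scheme.
    pub fn auth_header(&self) -> String {
        format!("Token {}", self.api_key)
    }
}
```

In a wiremock test, the mock server's URI would be passed to `with_base_url`, so the request path and headers can be asserted without any network access.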
## Verify-before-commit (RULE 0.13 §)
* cargo check -p kei-tts (default + all-backends): PASS
* cargo check -p kei-stt (default + all-backends): PASS
* cargo test -p kei-tts --features all-backends --lib: 9 passed, 0 failed
* cargo test -p kei-stt --features all-backends --lib: 10 passed, 0 failed
* cargo check --workspace: PASS
STATUS-TRUTH from both agents: shipped=functional, stubs=0,
behaviour-verified=yes.
## Follow-up (deferred, non-blocking)
* Real backend verification needs API keys for ElevenLabs / OpenAI /
Google / Deepgram, plus the piper-tts binary on PATH and an .onnx model.
* whisper-local: language_detected is always None because the whisper CLI
JSON schema differs across versions; a parse heuristic is still to be added.
* faster-whisper has different JSON schema from openai-whisper;
current parser covers openai-whisper convention only.
115 lines
3.5 KiB
Rust
// SPDX-License-Identifier: Apache-2.0
// Copyright 2026 <author org>
//! Deepgram STT backend: calls `api.deepgram.com/v1/listen`.
//!
//! Endpoint: `POST /v1/listen?language={lang}&punctuate=true`
//! Auth: `Authorization: Token {DEEPGRAM_API_KEY}` header.
//! Body: raw audio bytes with the request MIME type.
//!
//! Response shape:
//! ```json
//! {"results":{"channels":[{"alternatives":[{
//!   "transcript":"...",
//!   "words":[{"word":"...","start":0.1,"end":0.4}]
//! }]}]}}
//! ```
//!
//! Constructor surface:
//! * [`DeepgramBackend::from_env`] - reads `DEEPGRAM_API_KEY`.
//! * [`DeepgramBackend::with_base_url`] - explicit URL + key (tests).

#![cfg(feature = "deepgram")]

use crate::error::SttError;
use crate::request::SttRequest;
use crate::response::{Segment, SttResponse};
use crate::trait_def::SttBackend;

const DEFAULT_BASE_URL: &str = "https://api.deepgram.com";

/// Deepgram-backed implementation of [`SttBackend`].
pub struct DeepgramBackend {
    api_key: String,
    client: reqwest::Client,
    /// Stored without a trailing slash so paths can be appended directly.
    base_url: String,
}

impl DeepgramBackend {
    /// Build from explicit base URL and API key (used in wiremock tests).
    pub fn with_base_url(
        base_url: impl Into<String>,
        api_key: impl Into<String>,
    ) -> Self {
        Self {
            api_key: api_key.into(),
            client: reqwest::Client::new(),
            base_url: base_url.into().trim_end_matches('/').to_string(),
        }
    }

    /// Build from `DEEPGRAM_API_KEY` env var.
    pub fn from_env() -> Result<Self, SttError> {
        let key = std::env::var("DEEPGRAM_API_KEY")
            .map_err(|_| SttError::MissingEnv("DEEPGRAM_API_KEY".into()))?;
        Ok(Self::with_base_url(DEFAULT_BASE_URL, key))
    }
}

#[async_trait::async_trait]
impl SttBackend for DeepgramBackend {
    fn name(&self) -> &'static str {
        "deepgram"
    }

    async fn transcribe(&self, req: &SttRequest) -> Result<SttResponse, SttError> {
        // `lang` is appended unescaped, so it is expected to be a plain
        // language tag such as `en` or `en-US`.
        let mut url = format!("{}/v1/listen?punctuate=true", self.base_url);
        if let Some(lang) = &req.language {
            url.push_str(&format!("&language={lang}"));
        }

        let resp = self
            .client
            .post(&url)
            .header("Authorization", format!("Token {}", self.api_key))
            .header("Content-Type", &req.mime_type)
            .body(req.audio_bytes.clone())
            .send()
            .await?;

        // Surface non-2xx responses with the status code and any body text.
        if !resp.status().is_success() {
            let status = resp.status().as_u16();
            let text = resp.text().await.unwrap_or_default();
            return Err(SttError::Http(format!("http {status}: {text}")));
        }

        let body: serde_json::Value = resp
            .json()
            .await
            .map_err(|e| SttError::InvalidResponse(e.to_string()))?;

        parse_deepgram_response(&body)
    }
}

/// Extract the transcript and per-word segments from the first alternative
/// of the first channel.
fn parse_deepgram_response(body: &serde_json::Value) -> Result<SttResponse, SttError> {
    let alt = body
        .pointer("/results/channels/0/alternatives/0")
        .ok_or_else(|| SttError::InvalidResponse("missing alternatives".into()))?;

    let text = alt["transcript"]
        .as_str()
        .unwrap_or_default()
        .to_string();

    // A missing or non-array `words` field yields no segments; entries that
    // lack `word`, `start`, or `end` are skipped.
    let segments = alt["words"]
        .as_array()
        .into_iter()
        .flatten()
        .filter_map(|w| {
            // Deepgram reports times in seconds; convert to whole milliseconds.
            let start_ms = (w["start"].as_f64()? * 1000.0) as u64;
            let end_ms = (w["end"].as_f64()? * 1000.0) as u64;
            let word = w["word"].as_str()?.to_string();
            Some(Segment { start_ms, end_ms, text: word })
        })
        .collect();

    Ok(SttResponse { text, segments, language_detected: None })
}

#[cfg(test)]
#[path = "deepgram_test.rs"]
mod tests;