Two parallel atomars in the kei-buddy phase-1 plan. They mirror each
other's architecture: a trait, feature-gated backend modules, env-driven
dispatch, wiremock tests for the HTTP backends, and a subprocess-error
test for the local backend.
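The trait-plus-env-dispatch shape described above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the crate's actual code: the `name()` method, the `select_backend` helper, the `Piper` / `OpenAi` unit structs, and the `Result<_, String>` return type are all assumptions made for the example. Only `TtsBackend`, `from_env()`, `KEI_TTS_BACKEND`, and the `piper` default come from the description.

```rust
use std::env;

/// Hypothetical, simplified stand-in for the crate's `TtsBackend` trait.
trait TtsBackend {
    fn name(&self) -> &'static str;
}

struct Piper;
struct OpenAi;

impl TtsBackend for Piper {
    fn name(&self) -> &'static str {
        "piper"
    }
}

impl TtsBackend for OpenAi {
    fn name(&self) -> &'static str {
        "openai"
    }
}

/// Pure selection step (hypothetical helper): maps an optional backend
/// name to a boxed backend. Keeping it separate from the env read lets
/// tests cover routing without mutating process-wide state.
fn select_backend(name: Option<&str>) -> Result<Box<dyn TtsBackend>, String> {
    match name.unwrap_or("piper") {
        "piper" => Ok(Box::new(Piper)),
        // In a feature-gated crate this arm would presumably sit behind
        // something like `#[cfg(feature = "openai")]`.
        "openai" => Ok(Box::new(OpenAi)),
        other => Err(format!("unknown TTS backend: {other}")),
    }
}

/// Reads KEI_TTS_BACKEND and falls back to the default when it is unset.
fn from_env() -> Result<Box<dyn TtsBackend>, String> {
    let value = env::var("KEI_TTS_BACKEND").ok();
    select_backend(value.as_deref())
}
```

Splitting the env read from the selection logic is one possible design; the real `from_env()` may well do both in one function.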
## kei-tts (text-to-speech)
LOC: 959 across 15 files (largest src/lib.rs 121).
Trait `TtsBackend` + 4 backends behind feature flags:
* elevenlabs — POST api.elevenlabs.io/v1/text-to-speech/{voice}/stream
* openai — POST api.openai.com/v1/audio/speech (tts-1, tts-1-hd)
* google — POST texttospeech.googleapis.com/v1/text:synthesize
(Wavenet voices, base64 audioContent)
* piper — local subprocess to piper-tts binary, raw PCM out
Default features: ["piper"]. The `all-backends` feature enables the rest.
`from_env()` reads KEI_TTS_BACKEND (default piper). Returns Box<dyn TtsBackend>.
Tests: 9 passed (env routing + 3 wiremock backends + piper subprocess error).
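For the local backend, the subprocess-error path can be illustrated with the standard library alone. The sketch below is an assumption about how such a wrapper might look: `synthesize_with`, the `TtsError` variants, and passing the text on stdin are all hypothetical, and the binary name and arguments are left to the caller so that no piper flags are invented here.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical error type for this sketch only.
#[derive(Debug)]
enum TtsError {
    /// The binary could not be started (for example, not on PATH).
    Spawn(io::Error),
    /// Talking to the running child failed.
    Io(io::Error),
    /// The child ran but exited unsuccessfully.
    Failed { status: Option<i32>, stderr: String },
}

/// Runs `binary` with `args`, feeds `text` on stdin, and returns stdout bytes.
fn synthesize_with(binary: &str, args: &[&str], text: &str) -> Result<Vec<u8>, TtsError> {
    let mut child = Command::new(binary)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(TtsError::Spawn)?;

    // Write the text, then drop stdin so the child sees EOF.
    // Fine for short inputs; very large inputs would need concurrent
    // reading of stdout to avoid a pipe deadlock.
    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(text.as_bytes()).map_err(TtsError::Io)?;
    }

    let output = child.wait_with_output().map_err(TtsError::Io)?;
    if !output.status.success() {
        return Err(TtsError::Failed {
            status: output.status.code(),
            stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        });
    }
    Ok(output.stdout)
}
```

A subprocess-error test in this style only needs a binary name that does not exist; spawning it should surface as the `Spawn` variant without any real TTS engine installed.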
## kei-stt (speech-to-text)
LOC: 935 across 13 files (largest whisper_local.rs 181).
Trait `SttBackend` + 3 backends:
* whisper-local — subprocess to `whisper` CLI / faster-whisper,
reads JSON output, parses segments
* deepgram — POST api.deepgram.com/v1/listen (Token auth header,
raw audio body, parses words → Segments)
* openai-whisper — POST api.openai.com/v1/audio/transcriptions
(multipart file + model=whisper-1 +
response_format=verbose_json)
Default features: ["whisper-local"]. The `all-backends` feature enables the rest.
`from_env()` reads KEI_STT_BACKEND (default whisper-local).
Tests: 10 passed + 1 doc-test (env routing + 5 wiremock + 2 JSON parsers
+ 1 subprocess error + 1 auth-header check).
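The segment parsing mentioned above has to turn the fractional-second timestamps of a verbose_json-style response into the millisecond fields (`start_ms`, `end_ms`) that the tests assert on. A minimal sketch of that conversion, assuming `u64` millisecond fields and a hypothetical `secs_to_ms` helper (the crate's real types and helper names may differ):

```rust
/// Hypothetical segment shape; field names follow the test assertions,
/// the `u64` type is an assumption.
#[derive(Debug, PartialEq)]
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Converts seconds to whole milliseconds.
/// Rounds instead of truncating so that float noise just below a whole
/// millisecond does not lose a unit; negative inputs clamp to zero.
fn secs_to_ms(secs: f64) -> u64 {
    (secs.max(0.0) * 1000.0).round() as u64
}

/// Builds a segment from second-based timestamps.
fn segment_from_secs(start: f64, end: f64, text: &str) -> Segment {
    Segment {
        start_ms: secs_to_ms(start),
        end_ms: secs_to_ms(end),
        text: text.to_string(),
    }
}
```

With this helper, a segment spanning 0.5 s to 1.2 s maps to 500 ms and 1200 ms, matching the values the wiremock test expects.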
## Common architecture decisions
* `with_base_url(url)` constructor on each HTTP backend for wiremock
testability — same pattern as kei-llm-router and kei-notify-telegram.
* `tempfile` crate added to kei-stt for whisper-local audio scratch.
* `base64 = { version = "0.22", optional = true }` in kei-tts for
Google's base64-encoded audioContent.
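The `with_base_url` pattern from the first bullet can be sketched without any HTTP client. Everything below is illustrative: the struct fields, the `new` constructor, the `DEFAULT_BASE_URL` constant, and the two helper methods are assumptions. Only the `with_base_url(url, key)` call shape, the endpoint path, and Bearer auth are taken from the description and the test file.

```rust
/// Assumed production base URL for this sketch.
const DEFAULT_BASE_URL: &str = "https://api.openai.com";

/// Hypothetical minimal backend holding just what URL building needs.
struct OpenAiWhisperBackend {
    base_url: String,
    api_key: String,
}

impl OpenAiWhisperBackend {
    /// Production constructor (hypothetical): uses the default base URL.
    fn new(api_key: impl Into<String>) -> Self {
        Self::with_base_url(DEFAULT_BASE_URL, api_key)
    }

    /// Test-friendly constructor: lets a mock server URI replace the
    /// real host. A trailing slash is trimmed so joins stay predictable.
    fn with_base_url(base_url: impl Into<String>, api_key: impl Into<String>) -> Self {
        let base_url = base_url.into().trim_end_matches('/').to_string();
        Self {
            base_url,
            api_key: api_key.into(),
        }
    }

    /// Full endpoint URL for transcription requests.
    fn transcriptions_url(&self) -> String {
        format!("{}/v1/audio/transcriptions", self.base_url)
    }

    /// Value for the `Authorization` header.
    fn auth_header(&self) -> String {
        format!("Bearer {}", self.api_key)
    }
}
```

Because the path is appended to whatever base URL is supplied, a test can pass a mock server's URI and match on `/v1/audio/transcriptions` exactly as production traffic would hit it.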
## Verify-before-commit (RULE 0.13 §)
* cargo check -p kei-tts (default + all-backends): PASS
* cargo check -p kei-stt (default + all-backends): PASS
* cargo test -p kei-tts --features all-backends --lib: 9 passed, 0 failed
* cargo test -p kei-stt --features all-backends --lib: 10 passed, 0 failed
* cargo check --workspace: PASS
STATUS-TRUTH from both agents: shipped=functional, stubs=0,
behaviour-verified=yes.
## Follow-up (deferred, non-blocking)
* Verification against real backends needs API keys for ElevenLabs /
OpenAI / Google / Deepgram, plus a piper-tts binary on PATH and its
.onnx model.
* whisper-local always returns language_detected = None: the whisper
CLI JSON schema differs across versions, so a parse heuristic still
has to be added.
* faster-whisper has a different JSON schema from openai-whisper;
the current parser covers the openai-whisper convention only.
71 lines, 2.3 KiB, Rust
// SPDX-License-Identifier: Apache-2.0
// Copyright 2026 <author org>
//! Wiremock tests for `OpenAiWhisperBackend`.
//!
//! Verifies Bearer auth, multipart body, and verbose_json segment parsing.

#![cfg(all(test, feature = "openai-whisper"))]

use wiremock::matchers::{header, method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

use crate::openai_whisper::OpenAiWhisperBackend;
use crate::request::SttRequest;
use crate::trait_def::SttBackend;

fn verbose_json_body() -> serde_json::Value {
    serde_json::json!({
        "text": "hello openai",
        "language": "english",
        "segments": [
            {"start": 0.0, "end": 0.5, "text": "hello"},
            {"start": 0.5, "end": 1.2, "text": "openai"}
        ]
    })
}

#[tokio::test]
async fn openai_whisper_parses_segments() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/v1/audio/transcriptions"))
        .and(header("authorization", "Bearer test-key"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_json(verbose_json_body()),
        )
        .mount(&server)
        .await;

    let backend = OpenAiWhisperBackend::with_base_url(server.uri(), "test-key");
    let req = SttRequest {
        audio_bytes: b"fake_audio".to_vec(),
        mime_type: "audio/wav".to_string(),
        language: None,
    };
    let resp = backend.transcribe(&req).await.expect("transcribe should succeed");
    assert_eq!(resp.text, "hello openai");
    assert_eq!(resp.segments.len(), 2);
    assert_eq!(resp.segments[0].start_ms, 0);
    assert_eq!(resp.segments[0].end_ms, 500);
    assert_eq!(resp.segments[1].start_ms, 500);
    assert_eq!(resp.segments[1].end_ms, 1200);
    assert_eq!(resp.language_detected.as_deref(), Some("english"));
}

#[tokio::test]
async fn openai_whisper_http_error() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/v1/audio/transcriptions"))
        .respond_with(ResponseTemplate::new(429).set_body_string("Rate limited"))
        .mount(&server)
        .await;

    let backend = OpenAiWhisperBackend::with_base_url(server.uri(), "test-key");
    let req = SttRequest::new_wav(b"audio".to_vec());
    let err = backend.transcribe(&req).await.expect_err("should fail on 429");
    assert!(matches!(err, crate::SttError::Http(_)));
}