KeiSeiKit-1.0/_primitives/_rust/kei-buddy/src/machine_helpers.rs
Parfii-bot 26dc8c85f7 feat(kei-buddy): AskLanguage i18n + real proposeTopicSources + voice handling
Three follow-up atomics on top of the contacts/topics/sync wave.

## 1. AskLanguage state + ru/en localisation (default en)

New state `AskLanguage` inserted between `Intro` and `AskName`. Intro now
sends a bilingual greeting + language picker. AskLanguage parses
en/english/1/ru/русский/2/etc → persona_patch{"language":"<code>"} →
transitions to AskName with that language's prompt.
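
The language-picker parsing described above can be sketched roughly as follows. `parse_language` is a hypothetical name, only the tokens named in this message are handled (the "etc" is left unfilled), and re-prompting on `None` is an assumption suggested by the `ask_language_invalid` test.

```rust
/// Hypothetical helper: map a free-form reply to a language code.
/// Only the tokens listed in the commit message are recognised.
pub fn parse_language(input: &str) -> Option<&'static str> {
    // `to_lowercase` is Unicode-aware, so "Русский" matches "русский".
    match input.trim().to_lowercase().as_str() {
        "en" | "english" | "1" => Some("en"),
        "ru" | "русский" | "2" => Some("ru"),
        // Assumed: the caller re-prompts and stays in AskLanguage.
        _ => None,
    }
}
```

On `Some(code)` the state machine would emit `persona_patch{"language": code}` and move on to AskName.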

All later prompts (AskName / AskTone / AskInterests / AskHobbies /
TopicSpecifics / TopicNowLater / TopicResearch / AskSchedule / Ready)
read persona.language via Lang::from_persona and dispatch through
Strings::* helpers — two language tables, no fallthrough.
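
A minimal sketch of the two-table dispatch, assuming a two-variant `Lang` enum. `from_code` is a stand-in for `Lang::from_persona` that takes the already-extracted `language` value rather than the persona JSON; the prompt texts are placeholders, and the English fallback for unknown codes is an assumption based on "default en".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    En,
    Ru,
}

impl Lang {
    /// Stand-in for `Lang::from_persona` (hypothetical signature).
    /// Missing or unknown codes fall back to English.
    pub fn from_code(code: Option<&str>) -> Lang {
        match code {
            Some("ru") => Lang::Ru,
            _ => Lang::En,
        }
    }

    /// One prompt, two tables. The exhaustive `match` has no wildcard arm,
    /// so adding a variant fails to compile until it gets a translation.
    pub fn ask_name(self) -> &'static str {
        match self {
            Lang::En => "What should I call you?", // placeholder text
            Lang::Ru => "Как мне к тебе обращаться?", // placeholder text
        }
    }
}
```

Keeping the `match` exhaustive is one way to get the "no fallthrough" property at compile time rather than by convention.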

Back-compat migration: existing chats without a `language` key (such as
the user currently in topic_now_later) get an implicit "ru" patch on their
next turn, so their Russian onboarding continues without regression.
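
The migration guard might look something like the sketch below. The function name, the trimmed-down state enum, and the exemption for `Intro` / `AskLanguage` (chats that have not reached the picker yet) are all assumptions, not taken from machine.rs.

```rust
/// Trimmed-down stand-in for the real `OnboardState` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnboardState {
    Intro,
    AskLanguage,
    AskName,
    TopicNowLater,
}

/// Hypothetical guard: returns the language code to patch in, if any.
pub fn migration_language(state: OnboardState, language: Option<&str>) -> Option<&'static str> {
    match (state, language) {
        // New chats choose a language explicitly; leave them alone.
        (OnboardState::Intro | OnboardState::AskLanguage, _) => None,
        // A language is already stored: nothing to migrate.
        (_, Some(_)) => None,
        // Mid-onboarding chat that predates AskLanguage: keep it Russian.
        (_, None) => Some("ru"),
    }
}
```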

New files: strings.rs (164 LOC), machine_lang.rs (145 LOC).
Modified: state.rs (+AskLanguage variant), machine.rs (Intro→AskLanguage,
AskLanguage arm, migration guard), machine_helpers.rs, machine_tests.rs.

5 new tests (intro_to_ask_language, ask_language_en, ask_language_ru,
ask_language_invalid, migration_sets_ru_when_language_missing).

## 2. Real proposeTopicSources — removed TODO(phase2) stub

machine_lang.rs::step_topic_research now calls
extractor.extract(prompt, topic_title) with a {name, url, why} schema.
Parses JSON, formats numbered source list, transitions to TopicSources.

Failure paths (LLM error, empty array): graceful fallback prompt asking
the user to suggest their own sources. The state still transitions to
TopicSources, so the flow doesn't deadlock.
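
As a rough illustration of the success and fallback paths, the sketch below renders either a numbered list or a fallback prompt from an extraction result. The `Source` struct mirrors the {name, url, why} schema; `sources_prompt`, the `Result<_, String>` error type, and the fallback wording are hypothetical.

```rust
/// Mirrors the {name, url, why} schema from the commit message.
pub struct Source {
    pub name: String,
    pub url: String,
    pub why: String,
}

/// Hypothetical helper: always returns a prompt, so the caller can
/// transition to TopicSources on every path.
pub fn sources_prompt(extracted: Result<Vec<Source>, String>) -> String {
    match extracted {
        Ok(sources) if !sources.is_empty() => sources
            .iter()
            .enumerate()
            .map(|(i, s)| format!("{}. {} ({}): {}", i + 1, s.name, s.url, s.why))
            .collect::<Vec<_>>()
            .join("\n"),
        // LLM error or empty array: ask the user instead of stalling.
        _ => "I couldn't find sources for this topic. Which ones would you suggest?".to_owned(),
    }
}
```

Because both arms produce text, the next state does not depend on whether extraction succeeded.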

3 new tests in machine_tests_topic_research.rs:
topic_research_yes_proposes_sources,
topic_research_yes_empty_sources_still_advances,
topic_research_no_skips_topic_sources.

## 3. Voice-message handling (Telegram voice/audio → STT → text pipeline)

kei-telegram-webhook: added Voice/Audio sub-structs on Message and
WebhookEvent::Voice variant. classify() detects message.voice OR
message.audio. 2 new tests in event.rs.
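
A simplified sketch of the voice-or-audio classification, using plain structs instead of the crate's serde types. Only `WebhookEvent::Voice` is named in this message; the `Text` and `Ignored` variants, the field layout, and the precedence of voice over audio over text are assumptions.

```rust
pub struct Voice {
    pub file_id: String,
    pub mime_type: Option<String>,
}

pub struct Audio {
    pub file_id: String,
    pub mime_type: Option<String>,
}

#[derive(Default)]
pub struct Message {
    pub text: Option<String>,
    pub voice: Option<Voice>,
    pub audio: Option<Audio>,
}

#[derive(Debug, PartialEq)]
pub enum WebhookEvent {
    Text(String), // hypothetical variant
    Voice { file_id: String, mime_type: Option<String> },
    Ignored, // hypothetical variant
}

/// Voice notes and audio files both map to the same Voice event.
pub fn classify(msg: Message) -> WebhookEvent {
    if let Some(v) = msg.voice {
        return WebhookEvent::Voice { file_id: v.file_id, mime_type: v.mime_type };
    }
    if let Some(a) = msg.audio {
        return WebhookEvent::Voice { file_id: a.file_id, mime_type: a.mime_type };
    }
    match msg.text {
        Some(t) => WebhookEvent::Text(t),
        None => WebhookEvent::Ignored,
    }
}
```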

kei-buddy/src/voice.rs (178 LOC):
VoiceHandler { bot_token, stt: Arc<dyn SttBackend>, http }
transcribe_file(file_id, mime_type) does:
  1. GET https://api.telegram.org/bot{token}/getFile?file_id=...
  2. GET https://api.telegram.org/file/bot{token}/{file_path}
  3. SttRequest { audio_bytes, mime_type, language: None } → backend.transcribe
  4. Returns transcript text.
2 wiremock tests (download chain + 500 error mapping).
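
The two Telegram URLs in steps 1 and 2 can be built as in the sketch below. The function names are hypothetical, and the code assumes `file_id` and `file_path` need no extra percent-encoding.

```rust
const TELEGRAM_API: &str = "https://api.telegram.org";

/// Step 1: URL for `getFile`, whose response carries `file_path`.
pub fn get_file_url(token: &str, file_id: &str) -> String {
    format!("{}/bot{}/getFile?file_id={}", TELEGRAM_API, token, file_id)
}

/// Step 2: URL for downloading the raw audio bytes.
pub fn download_url(token: &str, file_path: &str) -> String {
    format!("{}/file/bot{}/{}", TELEGRAM_API, token, file_path)
}
```

Keeping URL construction in pure functions makes it easy to point the same code at a mock server by swapping the base constant for a parameter.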

serve.rs adds voice: Option<Arc<VoiceHandler>> to BuddyContext;
on_event Voice arm: whitelist check → transcribe → handle_text (same
pipeline as if user typed). Voice unavailable: warn + ignore.
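
The control flow of the Voice arm could be sketched as below. The `Transcribe` trait, the synchronous signature, and dropping the message on a transcription error are assumptions; the real handler is presumably async and goes through `VoiceHandler`.

```rust
/// Hypothetical stand-in for `VoiceHandler::transcribe_file`.
pub trait Transcribe {
    fn transcribe(&self, file_id: &str) -> Result<String, String>;
}

/// Returns the text to feed into the normal text pipeline, or `None`
/// when the voice message is dropped.
pub fn voice_to_text(
    voice: Option<&dyn Transcribe>,
    whitelisted: bool,
    file_id: &str,
) -> Option<String> {
    if !whitelisted {
        return None;
    }
    let handler = match voice {
        Some(h) => h,
        None => {
            // Voice unavailable: warn and ignore.
            eprintln!("warn: voice handler not configured, ignoring voice message");
            return None;
        }
    };
    match handler.transcribe(file_id) {
        Ok(text) => Some(text),
        Err(e) => {
            // Assumed behaviour: log and drop on STT failure.
            eprintln!("warn: transcription failed: {e}");
            None
        }
    }
}
```

A `Some(text)` result would then go through `handle_text`, exactly as if the user had typed it.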

serve_runner.rs builds VoiceHandler from KEI_BUDDY_STT_BACKEND env.

kei-stt added as optional dep gated by serve feature. Default backend
whisper-local (no extra build deps).
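
Backend selection from the environment might be resolved as in this sketch. Both function names are hypothetical, and treating an empty or whitespace-only value as unset is an assumption.

```rust
pub const DEFAULT_STT_BACKEND: &str = "whisper-local";

/// Pure part: pick the backend name from an optional env value.
pub fn resolve_backend(env_value: Option<&str>) -> &str {
    match env_value.map(str::trim) {
        Some(v) if !v.is_empty() => v,
        // Unset or blank: fall back to the default backend.
        _ => DEFAULT_STT_BACKEND,
    }
}

/// Impure wrapper that reads KEI_BUDDY_STT_BACKEND.
pub fn backend_from_env() -> String {
    let raw = std::env::var("KEI_BUDDY_STT_BACKEND").ok();
    resolve_backend(raw.as_deref()).to_owned()
}
```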

TTS reply path deferred (next atomic).

## Verify

  * cargo check --workspace: PASS
  * cargo test -p kei-buddy --lib: 55 passed / 0 failed (was 41 → 50 → 53 → 55)
  * cargo test -p kei-telegram-webhook --lib: 7 passed (was 5, +2 voice)
  * cargo build -p kei-buddy --release: PASS (23.7s)

NOT deployed yet. Three things to check at rollout:
  * new migrations: none (DB unchanged)
  * new env: KEI_BUDDY_STT_BACKEND (optional)
  * install faster-whisper / piper-tts on the server for STT
    (without it, a Voice event is just warn-logged and ignored)
2026-05-12 17:49:06 +08:00

169 lines
6.6 KiB
Rust

// SPDX-License-Identifier: Apache-2.0
//! Pure helper functions for `machine::handle_step`.
//!
//! Constructor Pattern split: helpers extracted so `machine.rs` stays
//! within its 260-LOC exception budget.
//! Language-aware helpers live in `machine_lang.rs`.
use serde_json::{json, Value};
use crate::state::OnboardState;
use crate::transition::StepOutput;
// ─── string helpers ───────────────────────────────────────────────────────────
pub(crate) fn format_list(items: &[String]) -> String {
    if items.is_empty() {
        return "_не указано_".to_owned();
    }
    items.iter().map(|i| format!("`{i}`")).collect::<Vec<_>>().join(", ")
}

pub(crate) fn str_list(v: &Value) -> Vec<String> {
    v.as_array()
        .map(|a| a.iter().filter_map(|x| x.as_str().map(str::to_owned)).collect())
        .unwrap_or_default()
}

pub(crate) fn extract_string(v: &Value, key: &str) -> String {
    v[key].as_str().unwrap_or("").to_owned()
}

// ─── topic-state helpers ──────────────────────────────────────────────────────
pub(crate) fn topic_queue(persona: &Value) -> Vec<Value> {
    persona["__topic_state"]["queue"]
        .as_array()
        .cloned()
        .unwrap_or_default()
}

pub(crate) fn topic_index(persona: &Value) -> usize {
    persona["__topic_state"]["index"].as_u64().unwrap_or(0) as usize
}

pub(crate) fn build_topic_state(queue: &[Value], index: usize, extra: Value) -> Value {
    let mut obj = extra.as_object().cloned().unwrap_or_default();
    obj.insert("queue".to_owned(), json!(queue));
    obj.insert("index".to_owned(), json!(index));
    json!({ "__topic_state": obj })
}

// ─── source-selection parser ──────────────────────────────────────────────────
pub(crate) fn parse_source_selection(text: &str, total: usize) -> Vec<usize> {
    let lower = text.trim().to_lowercase();
    if ["all", "все", "yes", "да"].contains(&lower.as_str()) {
        return (0..total).collect();
    }
    if ["none", "нет", "skip", "пропусти"].contains(&lower.as_str()) {
        return vec![];
    }
    // Numbered picks are 1-based, separated by commas, semicolons, or whitespace.
    let mut seen = std::collections::HashSet::new();
    let mut out = vec![];
    for part in text.split(|c: char| c == ',' || c == ';' || c.is_whitespace()) {
        // `checked_sub` rejects "0" instead of silently mapping it to the first source.
        if let Some(idx) = part.trim().parse::<usize>().ok().and_then(|n| n.checked_sub(1)) {
            if idx < total && seen.insert(idx) {
                out.push(idx);
            }
        }
    }
    out
}

// ─── schedule helpers ─────────────────────────────────────────────────────────
pub(crate) fn clamp_hour(v: &Value) -> Option<u8> {
    match v {
        Value::Number(n) => n.as_u64().filter(|&h| h <= 23).map(|h| h as u8),
        Value::String(s) => s.parse::<u64>().ok().filter(|&h| h <= 23).map(|h| h as u8),
        _ => None,
    }
}

pub(crate) fn describe_schedule(morning: Option<u8>, evening: Option<u8>, tz: &str) -> String {
    if morning.is_none() && evening.is_none() {
        return "_без расписания_".to_owned();
    }
    let mut parts = vec![];
    if let Some(h) = morning {
        parts.push(format!("утро {h}:00"));
    }
    if let Some(h) = evening {
        parts.push(format!("вечер {h}:00"));
    }
    format!("{} ({tz})", parts.join(", "))
}

// ─── topic finisher ───────────────────────────────────────────────────────────
/// Save the completed topic record and advance to next topic or ask_schedule.
pub(crate) fn finish_topic(
    persona: &Value,
    name: &str,
    _is_interest: bool,
    specifics: &[String],
    defer: bool,
    research: bool,
    proposed: &[Value],
    picked: &[usize],
) -> StepOutput {
    let status = if defer { "отложено" } else { "обсудим" };
    let src_line = build_source_line(research, picked, proposed);
    let summary = format!("✓ *{name}* — {status}; {src_line}.");
    let queue = topic_queue(persona);
    let index = topic_index(persona);
    let mut done: Vec<Value> = persona["topics_done"].as_array().cloned().unwrap_or_default();
    done.push(json!({
        "name": name, "specifics": specifics, "defer": defer,
        "research": research, "picked": picked
    }));
    if queue.is_empty() {
        return ask_schedule_finish(&json!({ "topics_done": done }), &summary);
    }
    let next_topic = &queue[0];
    let next_name = next_topic["name"].as_str().unwrap_or("?");
    let ts = build_topic_state(&queue[1..], index + 1, json!({}));
    let mut patch = ts;
    patch["topics_done"] = json!(done);
    patch["current_topic"] = next_topic.clone();
    StepOutput {
        next_state: OnboardState::TopicSpecifics,
        response_text: format!(
            "{summary}\n\nСледующая тема: *{next_name}*.\n\nЧто именно в ней тебе интересно?"
        ),
        persona_patch: patch,
    }
}

/// Internal schedule prompt used by `finish_topic`; always Russian (back-compat).
/// The per-turn language-aware variant is in `machine_lang::ask_schedule_lang`.
fn ask_schedule_finish(extra_patch: &Value, prefix: &str) -> StepOutput {
    StepOutput {
        next_state: OnboardState::AskSchedule,
        response_text: format!(
            "{prefix}\n\nТемы разобрали. ⏰ Когда удобно получать дайджесты? Напиши свободно — \
             например, \"утром часов в 8, вечером в 10, я в Бали\" или \"вечером в 9\". \
             Если не нужно — напиши \"нет\"."
        ),
        persona_patch: extra_patch.clone(),
    }
}

fn build_source_line(research: bool, picked: &[usize], proposed: &[Value]) -> String {
    if research && !picked.is_empty() {
        let handles: Vec<_> = picked.iter()
            .filter_map(|&i| proposed.get(i))
            .filter_map(|s| s["handle_or_url"].as_str())
            .map(|h| format!("`{h}`"))
            .collect();
        format!("Источники: {}", handles.join(", "))
    } else if research {
        "Источники: _не выбрано_".to_owned()
    } else {
        "_без мониторинга_".to_owned()
    }
}