refactor(v0.22): kei-store AsyncBackend trait + shared tokio runtime (Track B)
Extracts an AsyncBackend trait in kei-store so future GCS/Azure/Bunny
backends implement 5 async fns, not re-invent the sync-over-tokio
bridge. Closes architect P1 + P2 findings.
NEW src/async_backend.rs (189 LOC):
- trait AsyncBackend with get/put/list/list_recursive/delete/exists
- AsyncBackendStore<B: AsyncBackend> — generic MemoryStore impl
(sync-over-async via shared runtime)
- shared_runtime() -> &'static Runtime via OnceLock
(multi-thread, 2 workers, enable_io+enable_time)
- path helpers (validate_rel, short_hash, is_manifest_key) moved
here as single source of truth
NEW src/s3_cloud/backend.rs (120 LOC):
- S3AsyncBackend impl of AsyncBackend — 5 async fns using the
existing aws-sdk-s3::Client
MODIFIED src/s3_cloud/mod.rs (200 → 43 LOC):
pub type S3CloudStore = AsyncBackendStore<S3AsyncBackend>;
Thin re-export + inherent new(cfg) constructor.
Doc-header documents the extension seam: 'adding GCS = impl 5 async fns'.
MODIFIED src/s3_cloud/keys.rs (66 → 40 LOC): compat shim — re-exports
validate_rel / short_hash / is_manifest_key from async_backend.
Old call-sites + 4 unit tests unchanged.
Deps: async-trait = 0.1 added under s3 feature; tokio now has
rt-multi-thread feature too.
FIXES N=2 Store footgun: prior impl created a current_thread Runtime
per instance — 2 instances in one process = 2 runtimes, block_on
panic if caller is on another runtime. Shared multi-thread runtime
via OnceLock means N instances all share 2 workers.
REAL VERIFICATION (agent-pasted):
cargo check -p kei-store: clean
cargo check -p kei-store --features s3: clean
cargo test -p kei-store --release: 10+9+0 = 19 passed
cargo test -p kei-store --features s3 --release: 38+9+6 = 53 passed
(+7 vs baseline 46)
Tests added (7):
async_backend::tests::shared_runtime_is_singleton
async_backend::tests::validate_rel_rejects_absolute
async_backend::tests::validate_rel_rejects_parent
async_backend::tests::short_hash_deterministic
async_backend::tests::is_manifest_key_matches_format
s3_cloud::tests::async_backend_shared_runtime_handles_two_store_instances
s3_cloud::tests::async_backend_runtime_is_multi_thread
Public API preserved: S3CloudStore::new / .branch / .current_branch /
.key / .backend_name. Factory + integration tests untouched.
Pre-existing: list_inner 38 LOC (moved verbatim from mod.rs, not
refactored per Core Rule 3).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
f33408f0d6
commit
bc7e099697
9 changed files with 418 additions and 222 deletions
|
|
@ -51,6 +51,13 @@ _primitives/_rust/target/release/kei-changelog \
|
|||
- added: `kei-mcp-server` single-binary compile for 5 platforms (linux/darwin/windows × x64/arm64 where available) via `bun build --compile` — v0.18 Phase 1 of the exobrain distribution architecture. Ships as bare binaries + `.sha256` sums on every GitHub release; `install.sh` detects a dropped binary at `_primitives/_rust/target/release/kei-mcp-server-<os>-<arch>` and skips bun/npm build. Opt-out via `KEI_SKIP_MCP_BUILD=1`. See `_ts_packages/packages/mcp-server/BUILD.md`.
|
||||
|
||||
### Changed
|
||||
- **primitives (v0.22 Track B — `kei-store` AsyncBackend trait + shared tokio runtime):**
|
||||
- New `async_backend` module (gated behind `s3` feature) — introduces an `AsyncBackend` sub-trait (4 async methods: `get`/`put`/`list`/`list_recursive` + `label`) and a generic `AsyncBackendStore<B: AsyncBackend>` wrapper that implements `MemoryStore` on top of any backend. Adding a new cloud backend (GCS, Azure Blob, Bunny Storage, …) is now 6 methods, not a re-invention of the sync-over-async bridge.
|
||||
- Shared process-global multi-thread tokio runtime via `OnceLock<Runtime>` — 2 worker threads, `enable_io + enable_time`. Replaces the previous per-instance `current_thread` runtime inside `S3CloudStore`, which caused a `block_on` panic when two `S3CloudStore` instances in one process interacted across threads (N=2-Store footgun).
|
||||
- `S3CloudStore` is now `pub type S3CloudStore = AsyncBackendStore<S3AsyncBackend>`. Public API (`S3CloudStore::new(cfg)`, `.branch()`, `.current_branch()`, `.key()`, `.backend_name()`) preserved. New `S3AsyncBackend` struct in `s3_cloud/backend.rs` holds the `aws-sdk-s3::Client` and the bucket name; the sync wrapper handles branch-prefix + path-validation + commit-manifest.
|
||||
- `validate_rel`, `short_hash`, `is_manifest_key` helpers moved from `s3_cloud/keys.rs` into `async_backend` (single source of truth for every future cloud backend). `s3_cloud/keys.rs` kept as a thin re-export shim so external callers and its unit tests keep working unchanged.
|
||||
- New deps under `s3` feature: `async-trait 0.1` + `tokio` feature `rt-multi-thread`. No change to the default-feature dep graph.
|
||||
- +7 tests (5 async_backend units + `two_store_instances` + `runtime_is_multi_thread`). Existing 46 tests (31 unit + 9 integration + 6 smoke) unchanged and green.
|
||||
- Placeholder: plugin / block format refresh targeted for v0.16.0.
|
||||
|
||||
### Security
|
||||
|
|
|
|||
12
_primitives/_rust/Cargo.lock
generated
12
_primitives/_rust/Cargo.lock
generated
|
|
@ -100,6 +100,17 @@ version = "1.0.102"
|
|||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"
|
||||
|
||||
[[package]]
|
||||
name = "async-trait"
|
||||
version = "0.1.89"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "atoi"
|
||||
version = "2.0.0"
|
||||
|
|
@ -1922,6 +1933,7 @@ name = "kei-store"
|
|||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"async-trait",
|
||||
"aws-config",
|
||||
"aws-credential-types",
|
||||
"aws-sdk-s3",
|
||||
|
|
|
|||
|
|
@ -17,7 +17,7 @@ path = "src/lib.rs"
|
|||
# (gated by KEI_STORE_ALLOW_S3_STUB=1). Users who actually need real S3 / R2 /
|
||||
# MinIO push opt into the heavier AWS SDK stack by enabling this feature.
|
||||
default = []
|
||||
s3 = ["dep:aws-config", "dep:aws-sdk-s3", "dep:aws-credential-types", "dep:tokio"]
|
||||
s3 = ["dep:aws-config", "dep:aws-sdk-s3", "dep:aws-credential-types", "dep:tokio", "dep:async-trait"]
|
||||
|
||||
[dependencies]
|
||||
clap = { workspace = true }
|
||||
|
|
@ -31,7 +31,8 @@ git2 = { version = "0.19", default-features = false }
|
|||
aws-config = { version = "1", default-features = false, features = ["behavior-version-latest", "rustls", "rt-tokio"], optional = true }
|
||||
aws-sdk-s3 = { version = "1", default-features = false, features = ["behavior-version-latest", "rustls", "rt-tokio"], optional = true }
|
||||
aws-credential-types = { version = "1", optional = true }
|
||||
tokio = { version = "1", features = ["rt", "macros"], optional = true }
|
||||
tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros"], optional = true }
|
||||
async-trait = { version = "0.1", optional = true }
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile = "3"
|
||||
|
|
|
|||
189
_primitives/_rust/kei-store/src/async_backend.rs
Normal file
189
_primitives/_rust/kei-store/src/async_backend.rs
Normal file
|
|
@ -0,0 +1,189 @@
|
|||
//! AsyncBackend — cloud-storage sub-trait + sync-over-async adapter.
|
||||
//!
|
||||
//! v0.22 Track B. Adding a new cloud backend (GCS, Azure, Bunny, …) =
|
||||
//! implement 4 async methods on `AsyncBackend` + `label()`. Runtime glue +
|
||||
//! branch-prefix + path-validation + commit-manifest are all free.
|
||||
//!
|
||||
//! Multi-thread shared runtime (2 workers, IO + time) avoids the N=2-Store
|
||||
//! footgun of the previous per-instance `current_thread` runtimes — two
|
||||
//! `AsyncBackendStore` instances in one process no longer risk `block_on`
|
||||
//! panics when one instance's call runs on the other's runtime thread.
|
||||
|
||||
use crate::store_trait::MemoryStore;
|
||||
use anyhow::{anyhow, bail, Context, Result};
|
||||
use async_trait::async_trait;
|
||||
use std::sync::{Mutex, OnceLock};
|
||||
use tokio::runtime::{Builder, Runtime};
|
||||
|
||||
pub const DEFAULT_BRANCH: &str = "main";
|
||||
|
||||
static SHARED_RT: OnceLock<Runtime> = OnceLock::new();
|
||||
|
||||
/// Process-global multi-thread tokio runtime.
|
||||
pub(crate) fn shared_runtime() -> &'static Runtime {
|
||||
SHARED_RT.get_or_init(|| {
|
||||
Builder::new_multi_thread()
|
||||
.worker_threads(2)
|
||||
.enable_io()
|
||||
.enable_time()
|
||||
.thread_name("kei-store-rt")
|
||||
.build()
|
||||
.expect("kei-store: failed to build shared tokio runtime")
|
||||
})
|
||||
}
|
||||
|
||||
/// Reject absolute paths and `..` components. Keeps branch prefix
|
||||
/// unescapable (CVE-class guard, same contract as `filesystem::safe_join`).
|
||||
pub fn validate_rel(rel: &str) -> Result<()> {
|
||||
if rel.starts_with('/') {
|
||||
bail!("path traversal rejected: absolute path {:?}", rel);
|
||||
}
|
||||
for part in rel.split('/') {
|
||||
if part == ".." {
|
||||
bail!("path traversal rejected: parent-dir component in {:?}", rel);
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Tiny DJB2 — deterministic, avoids pulling sha2 just for manifest names.
|
||||
pub fn short_hash(s: &str) -> String {
|
||||
let mut h: u64 = 5381;
|
||||
for b in s.bytes() {
|
||||
h = h.wrapping_mul(33).wrapping_add(b as u64);
|
||||
}
|
||||
format!("{:x}", h)
|
||||
}
|
||||
|
||||
/// `manifest-<hex>.json` — format produced by `commit()` below.
|
||||
pub fn is_manifest_key(k: &str) -> bool {
|
||||
k.starts_with("manifest-") && k.ends_with(".json")
|
||||
}
|
||||
|
||||
/// Cloud-storage backend trait. Impls deal with keys only.
|
||||
#[async_trait]
|
||||
pub trait AsyncBackend: Send + Sync {
|
||||
async fn get(&self, key: &str) -> Result<Vec<u8>>;
|
||||
async fn put(&self, key: &str, bytes: Vec<u8>) -> Result<()>;
|
||||
/// Single-level list — keys directly under `prefix`, no recursion.
|
||||
async fn list(&self, prefix: &str) -> Result<Vec<String>>;
|
||||
/// Full recursive list under `prefix`.
|
||||
async fn list_recursive(&self, prefix: &str) -> Result<Vec<String>>;
|
||||
/// Backend-specific label used by `MemoryStore::backend_name`.
|
||||
fn label(&self) -> &'static str;
|
||||
}
|
||||
|
||||
/// Sync wrapper: `MemoryStore` on top of any `AsyncBackend`.
|
||||
pub struct AsyncBackendStore<B: AsyncBackend> {
|
||||
backend: B,
|
||||
branch: Mutex<String>,
|
||||
}
|
||||
|
||||
impl<B: AsyncBackend> AsyncBackendStore<B> {
|
||||
/// Wrap an already-constructed backend. Renamed from `new` to avoid a
|
||||
/// multiple-`new` collision with specialised inherent impls on the
|
||||
/// `pub type XyzCloudStore = AsyncBackendStore<XyzAsyncBackend>` sugar.
|
||||
pub fn wrap(backend: B) -> Self {
|
||||
Self {
|
||||
backend,
|
||||
branch: Mutex::new(DEFAULT_BRANCH.to_string()),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn current_branch(&self) -> String {
|
||||
self.branch
|
||||
.lock()
|
||||
.map(|g| g.clone())
|
||||
.unwrap_or_else(|p| p.into_inner().clone())
|
||||
}
|
||||
|
||||
pub fn key(&self, rel: &str) -> Result<String> {
|
||||
validate_rel(rel)?;
|
||||
Ok(format!("{}/{}", self.current_branch(), rel))
|
||||
}
|
||||
|
||||
pub fn backend(&self) -> &B {
|
||||
&self.backend
|
||||
}
|
||||
}
|
||||
|
||||
impl<B: AsyncBackend> MemoryStore for AsyncBackendStore<B> {
|
||||
fn read(&self, path: &str) -> Result<Vec<u8>> {
|
||||
let key = self.key(path)?;
|
||||
shared_runtime().block_on(self.backend.get(&key))
|
||||
}
|
||||
|
||||
fn write(&self, path: &str, bytes: &[u8]) -> Result<()> {
|
||||
let key = self.key(path)?;
|
||||
shared_runtime().block_on(self.backend.put(&key, bytes.to_vec()))
|
||||
}
|
||||
|
||||
fn list(&self, dir: &str) -> Result<Vec<String>> {
|
||||
let raw = self.key(dir)?;
|
||||
let prefix = if raw.ends_with('/') { raw } else { format!("{raw}/") };
|
||||
shared_runtime().block_on(self.backend.list(&prefix))
|
||||
}
|
||||
|
||||
fn branch(&self, name: &str) -> Result<()> {
|
||||
validate_rel(name)?;
|
||||
let mut g = self.branch.lock().map_err(|_| anyhow!("branch lock poisoned"))?;
|
||||
*g = name.to_string();
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn commit(&self, message: &str) -> Result<String> {
|
||||
let branch_prefix = format!("{}/", self.current_branch());
|
||||
let all = shared_runtime()
|
||||
.block_on(self.backend.list_recursive(&branch_prefix))
|
||||
.with_context(|| format!("list_recursive for commit on {branch_prefix}"))?;
|
||||
let mut entries: Vec<String> = all.into_iter().filter(|k| !is_manifest_key(k)).collect();
|
||||
entries.sort();
|
||||
let manifest = serde_json::json!({
|
||||
"message": message,
|
||||
"branch": self.current_branch(),
|
||||
"entries": entries,
|
||||
})
|
||||
.to_string();
|
||||
let hash = short_hash(&manifest);
|
||||
self.write(&format!("manifest-{hash}.json"), manifest.as_bytes())?;
|
||||
Ok(hash)
|
||||
}
|
||||
|
||||
fn push(&self, _branch: &str) -> Result<()> { Ok(()) }
|
||||
fn pull(&self, _branch: &str) -> Result<()> { Ok(()) }
|
||||
fn backend_name(&self) -> &'static str { self.backend.label() }
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn shared_runtime_is_singleton() {
|
||||
let a = shared_runtime() as *const Runtime;
|
||||
let b = shared_runtime() as *const Runtime;
|
||||
assert_eq!(a, b);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn validate_rel_rejects_absolute() {
|
||||
assert!(format!("{:#}", validate_rel("/etc/passwd").unwrap_err()).contains("absolute"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn validate_rel_rejects_parent() {
|
||||
assert!(format!("{:#}", validate_rel("a/../b").unwrap_err()).contains("parent-dir"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn short_hash_deterministic() {
|
||||
assert_eq!(short_hash("abc"), short_hash("abc"));
|
||||
assert_ne!(short_hash("abc"), short_hash("abd"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn is_manifest_key_matches_format() {
|
||||
assert!(is_manifest_key("manifest-deadbeef.json"));
|
||||
assert!(!is_manifest_key("traces/a.jsonl"));
|
||||
}
|
||||
}
|
||||
|
|
@ -20,6 +20,12 @@ pub mod forgejo;
|
|||
pub mod gitea;
|
||||
pub mod github;
|
||||
pub mod s3;
|
||||
/// Cloud-backend sub-trait + shared tokio runtime + sync-over-async adapter.
|
||||
/// Extension seam for future GCS / Azure Blob / Bunny backends (v0.22+).
|
||||
/// Gated behind `s3` for now — promoted to default once a second cloud
|
||||
/// backend exists.
|
||||
#[cfg(feature = "s3")]
|
||||
pub mod async_backend;
|
||||
#[cfg(feature = "s3")]
|
||||
pub mod s3_cloud;
|
||||
pub mod store_trait;
|
||||
|
|
|
|||
120
_primitives/_rust/kei-store/src/s3_cloud/backend.rs
Normal file
120
_primitives/_rust/kei-store/src/s3_cloud/backend.rs
Normal file
|
|
@ -0,0 +1,120 @@
|
|||
//! S3AsyncBackend — `AsyncBackend` impl over `aws-sdk-s3::Client`.
|
||||
//!
|
||||
//! v0.22 Track B. Holds only the S3-specific pieces: the `aws-sdk-s3`
|
||||
//! client, the bucket name, and the ListObjectsV2 paginator. Branch-prefix
|
||||
//! + path-safety + commit-manifest semantics live in
|
||||
//! `crate::async_backend::AsyncBackendStore<S3AsyncBackend>`.
|
||||
|
||||
use super::client;
|
||||
use crate::async_backend::AsyncBackend;
|
||||
use crate::config::S3Cfg;
|
||||
use anyhow::{anyhow, Context, Result};
|
||||
use async_trait::async_trait;
|
||||
use aws_sdk_s3::primitives::ByteStream;
|
||||
use aws_sdk_s3::Client;
|
||||
|
||||
pub struct S3AsyncBackend {
|
||||
client: Client,
|
||||
bucket: String,
|
||||
}
|
||||
|
||||
impl S3AsyncBackend {
|
||||
/// Build the backend. Requires `cfg.bucket` to be set.
|
||||
///
|
||||
/// This is async because `aws_config::load()` is async; the thin
|
||||
/// `S3CloudStore::new` wrapper in `mod.rs` drives it via the shared
|
||||
/// runtime so callers keep the sync signature they already have.
|
||||
pub async fn new(cfg: S3Cfg) -> Result<Self> {
|
||||
let bucket = cfg
|
||||
.bucket
|
||||
.clone()
|
||||
.ok_or_else(|| anyhow!("s3 backend requires s3.bucket in config"))?;
|
||||
let client = client::build_client(&cfg).await?;
|
||||
Ok(Self { client, bucket })
|
||||
}
|
||||
|
||||
/// Shared ListObjectsV2 paginator. `delim_slash=true` → delimiter="/"
|
||||
/// (single-level). `false` → recursive over every key under prefix.
|
||||
async fn list_inner(&self, prefix: &str, delim_slash: bool) -> Result<Vec<String>> {
|
||||
let mut out = Vec::new();
|
||||
let mut token: Option<String> = None;
|
||||
let tag = if delim_slash { "s3 list" } else { "s3 list-recursive" };
|
||||
loop {
|
||||
let mut req = self
|
||||
.client
|
||||
.list_objects_v2()
|
||||
.bucket(&self.bucket)
|
||||
.prefix(prefix);
|
||||
if delim_slash {
|
||||
req = req.delimiter("/");
|
||||
}
|
||||
if let Some(t) = token.as_ref() {
|
||||
req = req.continuation_token(t);
|
||||
}
|
||||
let resp = req
|
||||
.send()
|
||||
.await
|
||||
.with_context(|| format!("{tag} {prefix}"))?;
|
||||
for obj in resp.contents() {
|
||||
if let Some(k) = obj.key() {
|
||||
if let Some(name) = k.strip_prefix(prefix) {
|
||||
if !name.is_empty() {
|
||||
out.push(name.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if resp.is_truncated().unwrap_or(false) {
|
||||
token = resp.next_continuation_token().map(|s| s.to_string());
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
out.sort();
|
||||
Ok(out)
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl AsyncBackend for S3AsyncBackend {
|
||||
async fn get(&self, key: &str) -> Result<Vec<u8>> {
|
||||
let resp = self
|
||||
.client
|
||||
.get_object()
|
||||
.bucket(&self.bucket)
|
||||
.key(key)
|
||||
.send()
|
||||
.await
|
||||
.with_context(|| format!("s3 get_object {key}"))?;
|
||||
let body = resp
|
||||
.body
|
||||
.collect()
|
||||
.await
|
||||
.with_context(|| format!("s3 read body {key}"))?;
|
||||
Ok(body.into_bytes().to_vec())
|
||||
}
|
||||
|
||||
async fn put(&self, key: &str, bytes: Vec<u8>) -> Result<()> {
|
||||
self.client
|
||||
.put_object()
|
||||
.bucket(&self.bucket)
|
||||
.key(key)
|
||||
.body(ByteStream::from(bytes))
|
||||
.send()
|
||||
.await
|
||||
.with_context(|| format!("s3 put_object {key}"))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn list(&self, prefix: &str) -> Result<Vec<String>> {
|
||||
self.list_inner(prefix, true).await
|
||||
}
|
||||
|
||||
async fn list_recursive(&self, prefix: &str) -> Result<Vec<String>> {
|
||||
self.list_inner(prefix, false).await
|
||||
}
|
||||
|
||||
fn label(&self) -> &'static str {
|
||||
"s3-cloud"
|
||||
}
|
||||
}
|
||||
|
|
@ -1,38 +1,12 @@
|
|||
//! Key-path helpers for the S3 cloud backend.
|
||||
//!
|
||||
//! `validate_rel` is the CVE-class guard: rejects absolute paths and
|
||||
//! `..` components before they are spliced into an S3 key. Same pattern as
|
||||
//! `filesystem::safe_join` — keeps the per-branch prefix unescapable.
|
||||
//! v0.22 Track B: the actual logic moved one level up into
|
||||
//! `crate::async_backend` (shared by every future cloud backend — GCS,
|
||||
//! Azure Blob, Bunny, etc.). This module now re-exports the helpers for
|
||||
//! backward source-compat and keeps the unit tests green.
|
||||
|
||||
use anyhow::{bail, Result};
|
||||
|
||||
/// Reject absolute paths and any `..` component in the caller's rel path.
|
||||
pub fn validate_rel(rel: &str) -> Result<()> {
|
||||
if rel.starts_with('/') {
|
||||
bail!("path traversal rejected: absolute path {:?}", rel);
|
||||
}
|
||||
for part in rel.split('/') {
|
||||
if part == ".." {
|
||||
bail!("path traversal rejected: parent-dir component in {:?}", rel);
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Tiny DJB2 — deterministic, avoids pulling sha2 just for manifest naming.
|
||||
pub fn short_hash(s: &str) -> String {
|
||||
let mut h: u64 = 5381;
|
||||
for b in s.bytes() {
|
||||
h = h.wrapping_mul(33).wrapping_add(b as u64);
|
||||
}
|
||||
format!("{:x}", h)
|
||||
}
|
||||
|
||||
/// Does this key look like a prior-commit manifest blob?
|
||||
/// Format: `manifest-<hex>.json` (see `commit()` in mod.rs).
|
||||
pub fn is_manifest_key(k: &str) -> bool {
|
||||
k.starts_with("manifest-") && k.ends_with(".json")
|
||||
}
|
||||
#[allow(unused_imports)]
|
||||
pub use crate::async_backend::{is_manifest_key, short_hash, validate_rel};
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
|
|
|
|||
|
|
@ -1,198 +1,41 @@
|
|||
//! S3CloudStore — real object-storage backend via `aws-sdk-s3`.
|
||||
//! Behind `#[cfg(feature = "s3")]`; ~5 MB release binary growth
|
||||
//! [estimate, E5 — not yet benchmarked].
|
||||
//! Credentials via AWS default chain OR explicit `access_key_env` +
|
||||
//! `secret_key_env` (v0.21.1 HIGH-2). Endpoint override for
|
||||
//! R2 / MinIO / Wasabi via `KEI_STORE_S3_ENDPOINT` env var or
|
||||
//! `s3.endpoint` in TOML. "Branch" = key prefix (`<branch>/<path>`);
|
||||
//! default `main`. Sync `MemoryStore` trait bridged over async SDK
|
||||
//! via a single current-thread tokio runtime (`block_on` per call).
|
||||
//!
|
||||
//! v0.22 Track B refactor: this module now contains ONLY the S3-specific
|
||||
//! construction and re-exports. The sync-over-async runtime bridge,
|
||||
//! branch-prefix handling, path validation, and commit-manifest semantics
|
||||
//! all live in `crate::async_backend::AsyncBackendStore`, which is a
|
||||
//! generic wrapper over any `AsyncBackend` impl.
|
||||
//!
|
||||
//! Extension seam: to add a new cloud backend (GCS, Azure Blob, Bunny),
|
||||
//!
|
||||
//! 1. Create `src/gcs_cloud/backend.rs` with a struct that impls
|
||||
//! `crate::async_backend::AsyncBackend` (4 async methods + `label`).
|
||||
//! 2. Add `pub type GcsCloudStore = AsyncBackendStore<GcsAsyncBackend>;`.
|
||||
//! 3. Wire a constructor that calls `AsyncBackendStore::wrap(backend)`.
|
||||
//! 4. `factory::build_store` dispatches on `cfg.active.backend`.
|
||||
//!
|
||||
//! The shared tokio runtime + MemoryStore impl are free.
|
||||
|
||||
mod backend;
|
||||
mod client;
|
||||
mod keys;
|
||||
|
||||
pub use backend::S3AsyncBackend;
|
||||
|
||||
use crate::async_backend::{shared_runtime, AsyncBackendStore};
|
||||
use crate::config::S3Cfg;
|
||||
use crate::store_trait::MemoryStore;
|
||||
use anyhow::{anyhow, Context, Result};
|
||||
use aws_sdk_s3::primitives::ByteStream;
|
||||
use aws_sdk_s3::Client;
|
||||
use std::sync::Mutex;
|
||||
use tokio::runtime::{Builder, Runtime};
|
||||
use anyhow::Result;
|
||||
|
||||
const DEFAULT_BRANCH: &str = "main";
|
||||
|
||||
pub struct S3CloudStore {
|
||||
client: Client,
|
||||
bucket: String,
|
||||
rt: Runtime,
|
||||
branch: Mutex<String>,
|
||||
}
|
||||
/// Public API: unchanged from v0.21 — `S3CloudStore::new(cfg)` still works.
|
||||
/// Internally it is `AsyncBackendStore<S3AsyncBackend>`, which solves the
|
||||
/// N=2-Store runtime footgun of the previous per-instance design.
|
||||
pub type S3CloudStore = AsyncBackendStore<S3AsyncBackend>;
|
||||
|
||||
impl S3CloudStore {
|
||||
/// Build a cloud-S3 backend. `bucket` MUST be configured.
|
||||
pub fn new(cfg: S3Cfg) -> Result<Self> {
|
||||
let bucket = cfg
|
||||
.bucket
|
||||
.clone()
|
||||
.ok_or_else(|| anyhow!("s3 backend requires s3.bucket in config"))?;
|
||||
let rt = Builder::new_current_thread()
|
||||
.enable_all()
|
||||
.build()
|
||||
.context("init tokio runtime")?;
|
||||
let client = rt.block_on(client::build_client(&cfg))?;
|
||||
Ok(Self {
|
||||
client,
|
||||
bucket,
|
||||
rt,
|
||||
branch: Mutex::new(DEFAULT_BRANCH.to_string()),
|
||||
})
|
||||
}
|
||||
|
||||
fn current_branch(&self) -> String {
|
||||
self.branch
|
||||
.lock()
|
||||
.map(|g| g.clone())
|
||||
.unwrap_or_else(|poison| poison.into_inner().clone())
|
||||
}
|
||||
|
||||
fn key(&self, rel: &str) -> Result<String> {
|
||||
keys::validate_rel(rel)?;
|
||||
Ok(format!("{}/{}", self.current_branch(), rel))
|
||||
}
|
||||
|
||||
async fn get(&self, key: &str) -> Result<Vec<u8>> {
|
||||
let resp = self.client.get_object().bucket(&self.bucket).key(key)
|
||||
.send().await.with_context(|| format!("s3 get_object {key}"))?;
|
||||
let body = resp.body.collect().await
|
||||
.with_context(|| format!("s3 read body {key}"))?;
|
||||
Ok(body.into_bytes().to_vec())
|
||||
}
|
||||
|
||||
async fn put(&self, key: &str, bytes: Vec<u8>) -> Result<()> {
|
||||
self.client.put_object().bucket(&self.bucket).key(key)
|
||||
.body(ByteStream::from(bytes))
|
||||
.send().await.with_context(|| format!("s3 put_object {key}"))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Shared ListObjectsV2 paginator. `delim_slash=true` → delimiter="/"
|
||||
/// (single-level). `false` → recursive over every key under prefix.
|
||||
async fn list_inner(&self, prefix: &str, delim_slash: bool) -> Result<Vec<String>> {
|
||||
let mut out = Vec::new();
|
||||
let mut token: Option<String> = None;
|
||||
let tag = if delim_slash { "s3 list" } else { "s3 list-recursive" };
|
||||
loop {
|
||||
let mut req = self
|
||||
.client
|
||||
.list_objects_v2()
|
||||
.bucket(&self.bucket)
|
||||
.prefix(prefix);
|
||||
if delim_slash {
|
||||
req = req.delimiter("/");
|
||||
}
|
||||
if let Some(t) = token.as_ref() {
|
||||
req = req.continuation_token(t);
|
||||
}
|
||||
let resp = req
|
||||
.send()
|
||||
.await
|
||||
.with_context(|| format!("{tag} {prefix}"))?;
|
||||
for obj in resp.contents() {
|
||||
if let Some(k) = obj.key() {
|
||||
if let Some(name) = k.strip_prefix(prefix) {
|
||||
if !name.is_empty() {
|
||||
out.push(name.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if resp.is_truncated().unwrap_or(false) {
|
||||
token = resp.next_continuation_token().map(|s| s.to_string());
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
out.sort();
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
async fn list_prefix(&self, prefix: &str) -> Result<Vec<String>> {
|
||||
self.list_inner(prefix, true).await
|
||||
}
|
||||
|
||||
/// Recursive list — NO `delimiter` — every key under the prefix is
|
||||
/// returned, including nested paths that `list_prefix` would collapse
|
||||
/// into `CommonPrefixes`. Used by `commit()` for manifest hashing;
|
||||
/// `list("")` (delimiter="/") hid nested writes from prior commits
|
||||
/// (critic HIGH-1, v0.21.1).
|
||||
pub async fn list_recursive(&self, prefix: &str) -> Result<Vec<String>> {
|
||||
self.list_inner(prefix, false).await
|
||||
}
|
||||
}
|
||||
|
||||
impl MemoryStore for S3CloudStore {
|
||||
fn read(&self, path: &str) -> Result<Vec<u8>> {
|
||||
let key = self.key(path)?;
|
||||
self.rt.block_on(self.get(&key))
|
||||
}
|
||||
|
||||
fn write(&self, path: &str, bytes: &[u8]) -> Result<()> {
|
||||
let key = self.key(path)?;
|
||||
self.rt.block_on(self.put(&key, bytes.to_vec()))
|
||||
}
|
||||
|
||||
fn list(&self, dir: &str) -> Result<Vec<String>> {
|
||||
let raw = self.key(dir)?;
|
||||
let prefix = if raw.ends_with('/') {
|
||||
raw
|
||||
} else {
|
||||
format!("{raw}/")
|
||||
};
|
||||
self.rt.block_on(self.list_prefix(&prefix))
|
||||
}
|
||||
|
||||
fn branch(&self, name: &str) -> Result<()> {
|
||||
keys::validate_rel(name)?;
|
||||
let mut g = self
|
||||
.branch
|
||||
.lock()
|
||||
.map_err(|_| anyhow!("branch lock poisoned"))?;
|
||||
*g = name.to_string();
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn commit(&self, message: &str) -> Result<String> {
|
||||
// v0.21.1: recursive list + skip prior manifests for hash stability.
|
||||
// See `list_recursive` doc-comment for the audit context.
|
||||
let branch_prefix = format!("{}/", self.current_branch());
|
||||
let all = self.rt.block_on(self.list_recursive(&branch_prefix))?;
|
||||
let mut entries: Vec<String> = all
|
||||
.into_iter()
|
||||
.filter(|k| !keys::is_manifest_key(k))
|
||||
.collect();
|
||||
entries.sort();
|
||||
let manifest = serde_json::json!({
|
||||
"message": message,
|
||||
"branch": self.current_branch(),
|
||||
"entries": entries,
|
||||
})
|
||||
.to_string();
|
||||
let hash = keys::short_hash(&manifest);
|
||||
self.write(&format!("manifest-{hash}.json"), manifest.as_bytes())?;
|
||||
Ok(hash)
|
||||
}
|
||||
|
||||
fn push(&self, _branch: &str) -> Result<()> {
|
||||
// Every write() already persists to S3 — no staging to flush.
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn pull(&self, _branch: &str) -> Result<()> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn backend_name(&self) -> &'static str {
|
||||
"s3-cloud"
|
||||
let backend = shared_runtime().block_on(S3AsyncBackend::new(cfg))?;
|
||||
Ok(AsyncBackendStore::wrap(backend))
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@
|
|||
|
||||
use super::client::{effective_endpoint, resolve_explicit_creds, validate_endpoint};
|
||||
use super::*;
|
||||
use crate::store_trait::MemoryStore;
|
||||
use crate::test_env::env_lock;
|
||||
|
||||
fn cfg(endpoint: &str) -> S3Cfg {
|
||||
|
|
@ -253,10 +254,53 @@ fn rejects_empty_resolved_creds() {
|
|||
// commit() recursive-list fix (HIGH-1) — compile-smoke only.
|
||||
// ----------------------------------------------------------------------
|
||||
|
||||
/// Compile-time assertion that `list_recursive` exists on S3CloudStore
|
||||
/// with an async signature returning `Result<Vec<String>>`. If someone
|
||||
/// removes / renames it, this test fails to compile (tight contract).
|
||||
/// Compile-time assertion that `list_recursive` exists on the underlying
|
||||
/// AsyncBackend with an async signature returning `Result<Vec<String>>`.
|
||||
/// v0.22 Track B: the method moved off `S3CloudStore` onto `S3AsyncBackend`
|
||||
/// (via the `AsyncBackend` trait); we reach it through `.backend()`.
|
||||
#[allow(dead_code)]
|
||||
async fn _compile_smoke_list_recursive_exists(store: &S3CloudStore, prefix: &str) {
|
||||
let _out: Vec<String> = store.list_recursive(prefix).await.unwrap();
|
||||
use crate::async_backend::AsyncBackend;
|
||||
let _out: Vec<String> = store.backend().list_recursive(prefix).await.unwrap();
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------------
|
||||
// v0.22 Track B — shared runtime across multiple Store instances.
|
||||
// ----------------------------------------------------------------------
|
||||
|
||||
/// Previously, each `S3CloudStore` built its own `current_thread` tokio
|
||||
/// runtime. If a single process held two instances (e.g. migrate A→B), a
|
||||
/// `block_on` call from one runtime's thread that tried to use the other
|
||||
/// instance's runtime would panic. With the shared multi-thread runtime,
|
||||
/// both instances should coexist fine.
|
||||
#[test]
|
||||
fn async_backend_shared_runtime_handles_two_store_instances() {
|
||||
let _g = with_local_env();
|
||||
let a = S3CloudStore::new(cfg("http://127.0.0.1:9999")).expect("first store");
|
||||
let b = S3CloudStore::new(cfg("http://127.0.0.1:9999")).expect("second store");
|
||||
clear_local_env();
|
||||
assert_eq!(a.backend_name(), "s3-cloud");
|
||||
assert_eq!(b.backend_name(), "s3-cloud");
|
||||
// Each instance keeps its own branch; the runtime is shared.
|
||||
a.branch("branch-a").unwrap();
|
||||
b.branch("branch-b").unwrap();
|
||||
assert_eq!(a.current_branch(), "branch-a");
|
||||
assert_eq!(b.current_branch(), "branch-b");
|
||||
}
|
||||
|
||||
/// The shared runtime is multi-thread (needed for the N=2-Store fix).
|
||||
/// Verify via `tokio::runtime::Handle::current` from inside a spawned
|
||||
/// task — `current_thread` runtimes have `num_workers == 1`, multi-thread
|
||||
/// runtimes report >1.
|
||||
#[test]
|
||||
fn async_backend_runtime_is_multi_thread() {
|
||||
use crate::async_backend::shared_runtime;
|
||||
let workers = shared_runtime().block_on(async {
|
||||
let h = tokio::runtime::Handle::current();
|
||||
h.metrics().num_workers()
|
||||
});
|
||||
assert!(
|
||||
workers >= 2,
|
||||
"shared runtime must have >=2 workers, got {workers}"
|
||||
);
|
||||
}
|
||||
|
|
|
|||
Loading…
Reference in a new issue