Single-commit clean baseline after security scrub of niche-tells, project codenames, internal jargon, and contributor-email leaks. Contents: - 100 Rust crates (_primitives/_rust/) - 37 agent manifests (_manifests/) + generated specs (_generated/) - 67 user-invocable skills (skills/) - 33 hooks (hooks/) - Composition blocks (_blocks/) - Documentation (docs/, README.md) - TS adapter packages (_ts_packages/) - Assembler (_assembler/) - Roles (_roles/) - Templates (_templates/) - Forgejo CI (.forgejo/) Author: Denis Parfionovich <info@greendragon.info> License: see LICENSE.
# kei-gdrive-import — Wave 46 Plan

> Restored from chat ed8fb26e 2026-04-26T08:00:11Z (Wave 1 research synthesis).
> Branch: `feat/kei-gdrive-import` (created from main @ a5625e08).
> RULE 0.5 plan-mode artefact, 2 ledger anchor, 3 orchestrator-owned branch.

## Goal

One-shot wizard `kei-drive-import` that takes a Google Drive root, classifies every subfolder, and converts each detected project into a fresh repo on the local Forgejo dev-hub (`127.0.0.1:3001` per Wave 45).

## Wave 1 research verdicts (4/4 done, frozen)

| Stream | Verdict | Decision |
|---|---|---|
| GDrive sync tools | rclone primary | `brew install rclone` (MIT, arm64). **CRITICAL: NOT Drive Desktop** — corrupts `.git/` via `desktop.ini` injection |
| Existing GDrive→git scripts | None viable | Build ourselves, ~200 LOC core |
| Forgejo API | Raw curl | `POST /api/v1/user/repos {auto_init:false}`, catch 409 conflict |
| Project detection | 8-marker scoring | Cargo.toml / package.json / pyproject.toml / go.mod / pom.xml / build.gradle / Gemfile / composer.json (weight 10), threshold ≥ 8 |

## Architecture (hybrid: Rust detection + shell orchestration)

### Component 1 — `_primitives/_rust/kei-gdrive-import` (Rust, Constructor Pattern)

```
src/
├── cli.rs        clap subcommands
├── classify.rs   single-folder verdict {PROJECT, AMBIGUOUS, NOT-A-PROJECT}
├── scan.rs       walk-tree → JSON array of classifications
├── scoring.rs    8-marker weighted scorer (table-driven, easy to extend)
├── lib.rs        re-exports
└── main.rs       binary entry
tests/
├── classify_fixtures.rs
├── scan_smoke.rs
└── fixtures/
    ├── rust-project/Cargo.toml
    ├── node-project/package.json
    ├── photos-folder/IMG_0001.jpg
    └── mixed/{README.md, src/, .git/}
```

**CLI surface:**
```bash
kei-gdrive-import classify <path>                     # → JSON {verdict, score, primary_lang, markers: [...]}
kei-gdrive-import scan-tree <root>                    # → JSON array of all folders + classifications
kei-gdrive-import scan-tree --remote drive:Projects/  # → uses `rclone lsf` if path starts with remote:
```

**JSON schema (output of classify):**
```json
{
  "path": "drive:Projects/MyApp",
  "verdict": "PROJECT",
  "score": 18,
  "primary_lang": "rust",
  "markers": [
    {"file": "Cargo.toml", "weight": 10, "kind": "build_manifest"},
    {"file": "src/main.rs", "weight": 5, "kind": "source_file"},
    {"file": "README.md", "weight": 3, "kind": "doc"}
  ]
}
```
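
The scoring rule behind this output can be illustrated in shell. This is a minimal sketch, not the Rust implementation in `scoring.rs`: the weights (10 / 5 / 3) and the verdict bands (≥ 8 PROJECT, 5-7 AMBIGUOUS) come from this plan, while the function names `score_folder` and `verdict_for_score` and the rule "any file under `src/` counts once as a source file" are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical shell illustration of the marker scorer (the real one is Rust).

# Sum the weights of the markers found in one folder.
score_folder() {
  local dir="$1" score=0 marker
  # Build manifests: weight 10 each (the 8 markers from the Wave 1 table).
  for marker in Cargo.toml package.json pyproject.toml go.mod \
                pom.xml build.gradle Gemfile composer.json; do
    if [ -f "$dir/$marker" ]; then score=$((score + 10)); fi
  done
  # ASSUMPTION: any regular file under src/ counts once as a source file (5).
  if [ -n "$(find "$dir/src" -type f 2>/dev/null | head -n 1)" ]; then
    score=$((score + 5))
  fi
  # Documentation marker: weight 3, as in the JSON example above.
  if [ -f "$dir/README.md" ]; then score=$((score + 3)); fi
  echo "$score"
}

# Map a numeric score to the three verdicts used by the wizard.
verdict_for_score() {
  local score="$1"
  if [ "$score" -ge 8 ]; then
    echo PROJECT
  elif [ "$score" -ge 5 ]; then
    echo AMBIGUOUS
  else
    echo NOT-A-PROJECT
  fi
}
```

A folder holding `Cargo.toml`, `src/main.rs`, and `README.md` scores 10 + 5 + 3 = 18 under these rules, which matches the example payload.
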

### Component 2 — `install/lib-dev-hub-gdrive-import.sh` (idempotent installer)

- `brew install rclone jq` (skip if present)
- compile `kei-gdrive-import` (cargo build --release, copy to `${KIT}/bin/`)
- generate wizard wrapper at `${KIT}/dev-hub/drive-import-wizard.sh`
- **NO launchd plist** — interactive one-shot, not a daemon
- post-install hint: "run `kei-drive-import` to start"
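
The "skip if present" step could look roughly like the following. A sketch only: `missing_tools` and `install_missing` are hypothetical names, and the real installer may structure this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every named tool that is not yet on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Install only what is missing, so a re-run is a no-op.
install_missing() {
  local missing
  missing="$(missing_tools rclone jq)"
  if [ -z "$missing" ]; then
    echo "rclone and jq already present, skipping brew"
    return 0
  fi
  # Word splitting is intended: one brew argument per missing tool.
  # shellcheck disable=SC2086
  brew install $missing
}
```

Checking with `command -v` first keeps the script idempotent without relying on how `brew install` behaves for already-installed formulae.
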

### Component 3 — `dev-hub/drive-import-wizard.sh` (bash, interactive)

```
$ kei-drive-import

┌─ Step 1: rclone config (one-time OAuth) ─────┐
│ Detected remotes: drive:                     │
│ Missing remote? → run `rclone config` first  │
└──────────────────────────────────────────────┘

┌─ Step 2: scan ──────────────────────────────┐
│ root = drive:Projects/                       │
│ Found 47 folders                             │
│ Classifying via kei-gdrive-import...         │
│   31 PROJECT (score ≥ 8)                     │
│    8 AMBIGUOUS (score 5-7) — review needed   │
│    8 NOT-A-PROJECT (skipped)                 │
└──────────────────────────────────────────────┘

┌─ Step 3: select ────────────────────────────┐
│ [✓] all 31 projects                          │
│ [ ] 8 ambiguous (review each via fzf)        │
│ Forgejo: http://127.0.0.1:3001               │
│ Owner: ${USER}                               │
│ Default branch: main                         │
└──────────────────────────────────────────────┘

┌─ Step 4: migrate (per project) ─────────────┐
│ → rclone copy drive:Projects/X /tmp/staging/X│
│ → write .gitignore (lang-aware)              │
│ → git init && git add . && git commit        │
│ → curl POST /api/v1/user/repos { name:X }    │
│ → git remote add origin http://.../${USER}/X │
│ → git push -u origin main                    │
│ → log result to ledger                       │
└──────────────────────────────────────────────┘
```
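
The repo-creation call in Step 4, including the "catch 409 conflict" decision from Wave 1, might be sketched as below. `FORGEJO_URL`, `FORGEJO_TOKEN`, and both function names are hypothetical; the endpoint and `auto_init:false` come from this plan, and token-header auth is assumed rather than confirmed for this setup.

```shell
#!/usr/bin/env bash
# Map the HTTP status of POST /api/v1/user/repos to a wizard action.
action_for_create_status() {
  case "$1" in
    201) echo created ;;      # repo created
    409) echo exists ;;       # name already taken: catch it, do not abort
    000) echo unreachable ;;  # curl prints 000 when no HTTP response arrived
    *)   echo failed ;;
  esac
}

# ASSUMPTION: FORGEJO_URL and FORGEJO_TOKEN are set by the wizard.
create_repo() {
  local name="$1" payload status
  # jq builds the JSON body so odd characters in the name stay escaped.
  payload="$(jq -cn --arg name "$name" \
    '{name: $name, auto_init: false, default_branch: "main"}')"
  status="$(curl -sS -o /dev/null -w '%{http_code}' \
    -X POST "${FORGEJO_URL}/api/v1/user/repos" \
    -H "Authorization: token ${FORGEJO_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$payload")" || status=000
  action_for_create_status "$status"
}
```

Keeping the status-to-action mapping in its own function lets the integration test exercise it without a live Forgejo instance.
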

### Component 4 — Tests

- `tests/gdrive_import_integration.sh` — fake `rclone` via PATH override, fake Forgejo via netcat listener
- Rust unit tests cover scoring + classification fixtures
- Smoke test asserts wizard skips folders containing `.git/` already (don't re-import live repos)

## Ledger row (2)

```
agent_id   = wave46-gdrive-import-orchestrator
branch     = feat/kei-gdrive-import
parent     = main @ a5625e08
spec_sha   = (this file)
status     = running
started_ts = 2026-04-26T...
```

## Wave 2 research — DONE 2026-04-26 (3/3 streams)

### R1 — rclone edge-cases (E1 except where noted)
- Per-file cap: 5 TB (Drive hard-limit). 750 GiB/day = upload only, irrelevant for read.
- Rate: ~12k queries/min personal, rclone backs off natively. Practical throughput ≈2 files/sec.
- **Gdocs**: `--drive-skip-gdocs` makes them invisible. Pre-flight `lsf` enumeration MUST surface count to user. Opt-in to `--drive-export-formats=md,docx,xlsx` (md unverified for current API [E5]).
- OS-junk (`.DS_Store`/`Thumbs.db`/`desktop.ini`) NOT filtered by default — explicit `--exclude` needed.
- `rclone copy` idempotent on re-run (size+mtime, `--checksum` stronger).
- Shortcuts: dereferenced by default → infinite loop risk → `--drive-skip-shortcuts` mandatory.
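
One way to surface the Gdocs count in the pre-flight step, sketched under an assumption: rclone's Drive backend lists Google Docs with size `-1`, so counting `-1` entries in `lsf` output approximates the number of files that `--drive-skip-gdocs` would hide. `count_gdocs` and `preflight_gdocs` are hypothetical names.

```shell
#!/usr/bin/env bash
# Count entries whose size field is -1 in "size;path" lines on stdin.
# ASSUMPTION: size -1 marks a Google Doc in rclone listings.
count_gdocs() {
  awk -F';' '$1 == -1 { n++ } END { print n + 0 }'
}

# Enumerate a Drive folder WITHOUT --drive-skip-gdocs and report the count.
preflight_gdocs() {
  local src="$1"
  rclone lsf -R --files-only --format "sp" "drive:$src" | count_gdocs
}
```

The wizard could then print something like "N Google Docs will be skipped" before the copy starts, so the omission is never silent.
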

**Recommended flag block (frozen):**
```bash
rclone copy "drive:$SRC" "$DST" \
  --drive-skip-gdocs \
  --drive-skip-shortcuts \
  --drive-skip-dangling-shortcuts \
  --drive-acknowledge-abuse \
  --exclude "**/.DS_Store" --exclude "**/._*" \
  --exclude "**/Thumbs.db" --exclude "**/desktop.ini" \
  --exclude "**/.Spotlight-V100/**" --exclude "**/.Trashes/**" --exclude "**/.fseventsd/**" \
  --transfers 4 --checkers 8 --tpslimit 10 \
  --retries 5 --low-level-retries 10 \
  --checksum --create-empty-src-dirs \
  --stats 5s --log-file "$DST/.rclone-import.log"
```

### R2 — auth UX + secrets (RULE 0.8 reconciled)
- Auth mode: **interactive browser OAuth** via `rclone config` (autoconfig=Y, localhost:53682). Headless + service-account rejected for single-user macOS.
- Scope: `drive.readonly` (minimum for list+download). [E1 developers.google.com]
- **Token CANNOT live in `.env`** — rclone rewrites it on every auto-refresh.
- 2-tier secrets layout:
  - Real token: `~/.config/rclone/rclone.conf` chmod 600 (XDG default, treat like `~/.ssh/`)
  - `~/.claude/secrets/.env`:
    ```
    RCLONE_CONFIG=${HOME}/.config/rclone/rclone.conf
    KEI_DRIVE_REMOTE=gdrive
    ```
- Detection commands (exit codes undocumented — parse stderr):
  - Missing remote: `rclone --config "$RCLONE_CONFIG" listremotes | grep -q '^gdrive:$'`
  - Expired token: `rclone about gdrive: 2>&1 | grep -qiE 'oauth2|401|token'`
- Wizard MUST pass `--config "$RCLONE_CONFIG"` explicitly (belt-and-suspenders to env var).
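
The two detection commands can be wrapped so the wizard gets a single state to branch on. A sketch, assuming the stderr patterns listed above are sufficient; `has_remote`, `looks_like_auth_failure`, and `remote_state` are hypothetical names.

```shell
#!/usr/bin/env bash
# Succeeds when "<name>:" appears as a whole line on stdin, which is the
# format printed by `rclone listremotes`.
has_remote() {
  grep -qxF -- "$1:"
}

# Succeeds when rclone output on stdin looks like an auth failure.
looks_like_auth_failure() {
  grep -qiE 'oauth2|401|token'
}

# Print one of: missing, expired, ok.
# ASSUMPTION: RCLONE_CONFIG and KEI_DRIVE_REMOTE come from the .env above.
remote_state() {
  local remote="${KEI_DRIVE_REMOTE:-gdrive}"
  if ! rclone --config "$RCLONE_CONFIG" listremotes | has_remote "$remote"; then
    echo missing
  elif rclone --config "$RCLONE_CONFIG" about "${remote}:" 2>&1 \
      | looks_like_auth_failure; then
    echo expired
  else
    echo ok
  fi
}
```

Matching the remote name as a fixed whole line avoids treating `mygdrive:` as `gdrive:`.
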

### R3 — license/safety (5-step pre-push checklist)
- **Tool pick**: `gitleaks v8.30.1` MIT (`brew install gitleaks`). Static `gitleaks dir <path>` mode (no git history needed). Default ruleset covers AWS / GCP / GitHub PAT / Stripe / PEM private keys / generic API keys.
- **gitignore source**: github/gitignore CC0-1.0, SHA-pinned to `576334520435382d6522f349b9d270eda1e79a25` (last commit 2026-04-24).
- **marker→template map** (hardcode, do NOT name-guess):

| Marker | Template URL filename |
|---|---|
| Cargo.toml | Rust.gitignore |
| package.json | Node.gitignore |
| pyproject.toml | Python.gitignore |
| go.mod | Go.gitignore |
| pom.xml | Maven.gitignore |
| build.gradle | Gradle.gitignore |
| Gemfile | Ruby.gitignore |
| composer.json | Composer.gitignore |
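
The hardcoded map translates directly into a `case` statement. In this sketch the function names are hypothetical, and the `raw.githubusercontent.com` URL shape is an assumption; the plan only fixes the repository and the pinned SHA.

```shell
#!/usr/bin/env bash
# Pinned commit of github/gitignore, as recorded in this plan.
GITIGNORE_SHA=576334520435382d6522f349b9d270eda1e79a25

# Marker file -> template filename. Unknown markers fail instead of guessing.
template_for_marker() {
  case "$1" in
    Cargo.toml)     echo Rust.gitignore ;;
    package.json)   echo Node.gitignore ;;
    pyproject.toml) echo Python.gitignore ;;
    go.mod)         echo Go.gitignore ;;
    pom.xml)        echo Maven.gitignore ;;
    build.gradle)   echo Gradle.gitignore ;;
    Gemfile)        echo Ruby.gitignore ;;
    composer.json)  echo Composer.gitignore ;;
    *)              return 1 ;;
  esac
}

# ASSUMPTION: templates are fetched via the raw.githubusercontent.com host.
gitignore_url() {
  local template
  template="$(template_for_marker "$1")" || return 1
  echo "https://raw.githubusercontent.com/github/gitignore/${GITIGNORE_SHA}/${template}"
}
```

Returning non-zero for an unknown marker enforces the "do NOT name-guess" rule: the caller has to handle the miss explicitly.
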

**5-step ordered pre-push checklist (wizard MUST run in order):**
1. Existing repo detect: `rclone lsf --dirs-only --include ".git/" <src>` + HEAD-file fallback (Drive may store `.git` opaque). Found → SKIP + warn.
2. Size + extension histogram: `du -sh` + bytes-per-extension. If `.pdf >50%` OR `{.mp4,.mov,.mkv,.iso,.zip} >30%` → prompt user (third-party content risk).
3. Secret scan: `gitleaks dir --no-banner --redact <src>`. Non-zero → BLOCK until resolved or explicit bypass.
4. Apply language `.gitignore` BEFORE first `git add` (fetch from SHA-pinned URL above).
5. Final remote check: assert URL matches `127.0.0.1:3001` allowlist; reject `github.com` per .
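
Steps 2 and 5 are the two checks with arithmetic or parsing in them, so a sketch may help. Both function sets are hypothetical: `needs_content_review` reads `size;path` lines (the shape `rclone lsf --format "sp"` prints) and applies the 50% / 30% byte-share thresholds, while `remote_allowed` compares the URL's host and port exactly instead of substring-matching.

```shell
#!/usr/bin/env bash
# Step 2: exit 0 when .pdf exceeds 50% of bytes or media/archives exceed 30%.
needs_content_review() {
  awk -F';' '
    {
      size = ($1 > 0) ? $1 + 0 : 0        # negative sizes count as 0 bytes
      total += size
      n = split($NF, parts, ".")
      ext = (n > 1) ? tolower(parts[n]) : ""
      if (ext == "pdf") pdf += size
      if (ext ~ /^(mp4|mov|mkv|iso|zip)$/) media += size
    }
    END {
      flagged = (total > 0 && (pdf / total > 0.5 || media / total > 0.3))
      exit (flagged ? 0 : 1)
    }
  '
}

# Step 5: extract host[:port], ignoring scheme, userinfo, and path.
remote_authority() {
  local rest="${1#*://}"   # drop the scheme if there is one
  rest="${rest%%/*}"       # keep the authority part only
  echo "${rest##*@}"       # drop userinfo such as user:token@
}

# Allowlist check: only the local dev-hub passes.
remote_allowed() {
  [ "$(remote_authority "$1")" = "127.0.0.1:3001" ]
}
```

Cutting the path off before stripping userinfo matters: a URL such as `http://github.com/a@127.0.0.1:3001/x` contains the allowlisted string but still resolves to `github.com`, and is rejected here.
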

### Cross-cutting — prompt-injection notes
Both R2 + R3 caught fake `<system-reminder>` blocks appended to rclone.org and github docs pages via WebFetch. Pattern: trailing fake "MCP Server Instructions" telling agent to load computer-use tools. Both agents correctly ignored. Wizard implementation does NOT execute LLM-fetched content; this is research-tooling concern only.

## Wave 3 implementation (4 streams parallel, 3)

| I# | Worktree | Files | Agent prompt clause |
|---|---|---|---|
| I1 | `agent-gdrive-rust` | `_primitives/_rust/kei-gdrive-import/**` | "MUST NOT invoke git/cargo build (cargo check ok). Write files only." |
| I2 | `agent-gdrive-installer` | `install/lib-dev-hub-gdrive-import.sh` | same |
| I3 | `agent-gdrive-wizard` | `dev-hub/drive-import-wizard.sh` (template), `_templates/` | same |
| I4 | `agent-gdrive-tests` | `tests/gdrive_import_integration.sh` + fixtures | same |

## Wave 4 — merge ceremony

Per 2: AskUserQuestion per branch [merge --no-ff / squash / reject / defer]. Orchestrator commits with `feat(wave46):` prefix.

## Out of scope (deferred)

- Reverse direction (Forgejo → Drive backup) — separate primitive `kei-gdrive-export`
- GitHub mirror — covered by existing `tools/sync-public.sh`
- Bidirectional sync — explicit non-goal, this is one-shot import
- Web UI — terminal-only

## Risks (Wave 1)

1. `rclone config` is interactive on first run — wizard must detect and pause for user
2. Forgejo not running → `curl` fails fast, wizard aborts with clear message
3. A Drive folder named `Projects` can be confused with the nested KeiSeiKit `Projects/` directory — wizard uses absolute paths throughout
4. Network drop mid-batch — per-project retries, ledger row per project for restart
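
Risk 4 implies a resumable loop. The sketch below assumes a plain-text ledger with one `name<TAB>status` line per project and a `migrate_project` function supplied by the wizard; neither the ledger format nor any of these names is specified in this plan.

```shell
#!/usr/bin/env bash
# True when the ledger already holds a "done" row for this project.
ledger_done() {
  local ledger="$1" name="$2"
  [ -f "$ledger" ] && grep -qxF -- "$(printf '%s\tdone' "$name")" "$ledger"
}

# Append one row: ledger path, project name, status.
ledger_record() {
  printf '%s\t%s\n' "$2" "$3" >> "$1"
}

# Run a command up to N times, stopping at the first success.
with_retries() {
  local attempts="$1" i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
  done
  return 1
}

# Import every named project, skipping those already marked done.
# ASSUMPTION: migrate_project <name> is defined elsewhere by the wizard.
import_all() {
  local ledger="$1"
  shift
  local name
  for name in "$@"; do
    if ledger_done "$ledger" "$name"; then
      continue
    fi
    if with_retries 3 migrate_project "$name"; then
      ledger_record "$ledger" "$name" done
    else
      ledger_record "$ledger" "$name" failed
    fi
  done
}
```

After a network drop, re-running the wizard with the same ledger retries only the projects without a `done` row.
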