KeiSeiKit-1.0/tasks/kei-gdrive-import/PLAN.md
Author: Denis Parfionovich <info@greendragon.info>

License: see LICENSE.
2026-05-01 12:09:03 +08:00


kei-gdrive-import — Wave 46 Plan

Restored from chat ed8fb26e 2026-04-26T08:00:11Z (Wave 1 research synthesis). Branch: feat/kei-gdrive-import (created from main @ a5625e08). RULE 0.5 plan-mode artefact, 2 ledger anchor, 3 orchestrator-owned branch.

Goal

One-shot wizard kei-drive-import that takes a Google Drive root, classifies every subfolder, and converts each detected project into a fresh repo on the local Forgejo dev-hub (127.0.0.1:3001 per Wave 45).

Wave 1 research verdicts (4/4 done, frozen)

| Stream | Verdict | Decision |
|---|---|---|
| GDrive sync tools | rclone primary | brew install rclone (MIT, arm64). CRITICAL: NOT Drive Desktop, which corrupts .git/ via desktop.ini injection |
| Existing GDrive→git scripts | None viable | Build ourselves, ~200 LOC core |
| Forgejo API | Raw curl | POST /api/v1/user/repos {auto_init:false}, catch 409 conflict |
| Project detection | 8-marker scoring | Cargo.toml / package.json / pyproject.toml / go.mod / pom.xml / build.gradle / Gemfile / composer.json (weight 10), threshold ≥ 8 |

Architecture (hybrid: Rust detection + shell orchestration)

Component 1 — _primitives/_rust/kei-gdrive-import (Rust, Constructor Pattern)

src/
├── cli.rs        clap subcommands
├── classify.rs   single-folder verdict {PROJECT, AMBIGUOUS, NOT-A-PROJECT}
├── scan.rs       walk-tree → JSON array of classifications
├── scoring.rs    8-marker weighted scorer (table-driven, easy to extend)
├── lib.rs        re-exports
└── main.rs       binary entry
tests/
├── classify_fixtures.rs
├── scan_smoke.rs
└── fixtures/
    ├── rust-project/Cargo.toml
    ├── node-project/package.json
    ├── photos-folder/IMG_0001.jpg
    └── mixed/{README.md, src/, .git/}

CLI surface:

kei-gdrive-import classify <path>         # → JSON {verdict, score, primary_lang, markers: [...]}
kei-gdrive-import scan-tree <root>        # → JSON array of all folders + classifications
kei-gdrive-import scan-tree --remote drive:Projects/  # → uses `rclone lsf` if path starts with remote:
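
The "path starts with remote:" rule can be sketched as a small pure function. This is a minimal illustration, not the crate's actual code: `split_remote` is a hypothetical name, and the rule "non-empty prefix without a slash before the first colon" is an assumption that fits the macOS target (it does not handle Windows drive letters or rclone's on-the-fly `:backend:` syntax).

```rust
/// Split "remote:path" into (remote, path).
/// Returns None for plain local paths such as "/tmp/x" or "./a:b".
/// Assumption: a remote name is non-empty and contains no '/'.
fn split_remote(path: &str) -> Option<(&str, &str)> {
    let (name, rest) = path.split_once(':')?;
    if name.is_empty() || name.contains('/') {
        return None;
    }
    Some((name, rest))
}

fn main() {
    for candidate in ["drive:Projects/", "/tmp/staging/X", "./odd:name"] {
        match split_remote(candidate) {
            // A remote path would be enumerated with `rclone lsf`.
            Some((remote, rest)) => println!("{candidate}: remote={remote} path={rest}"),
            // Anything else is walked on the local filesystem.
            None => println!("{candidate}: local path"),
        }
    }
}
```

Keeping the check separate from the tree walker makes it easy to cover in the Rust unit tests without rclone installed.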

JSON schema (output of classify):

{
  "path": "drive:Projects/MyApp",
  "verdict": "PROJECT",
  "score": 18,
  "primary_lang": "rust",
  "markers": [
    {"file": "Cargo.toml", "weight": 10, "kind": "build_manifest"},
    {"file": "src/main.rs", "weight": 5, "kind": "source_file"},
    {"file": "README.md", "weight": 3, "kind": "doc"}
  ]
}
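
A table-driven scorer consistent with the numbers above could look like the sketch below. The eight build manifests at weight 10 and the PROJECT threshold (≥ 8) come from the Wave 1 table; the weights 5 and 3 are copied from the JSON example; the AMBIGUOUS band (5-7) is taken from the wizard mock-up. The type and function names (`Marker`, `Verdict`, `matched`, `score`, `verdict`) are hypothetical.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Project,
    Ambiguous,
    NotAProject,
}

struct Marker {
    file: &'static str,
    weight: u32,
    kind: &'static str,
}

// Extending detection means adding a row here, nothing else.
const MARKERS: &[Marker] = &[
    Marker { file: "Cargo.toml", weight: 10, kind: "build_manifest" },
    Marker { file: "package.json", weight: 10, kind: "build_manifest" },
    Marker { file: "pyproject.toml", weight: 10, kind: "build_manifest" },
    Marker { file: "go.mod", weight: 10, kind: "build_manifest" },
    Marker { file: "pom.xml", weight: 10, kind: "build_manifest" },
    Marker { file: "build.gradle", weight: 10, kind: "build_manifest" },
    Marker { file: "Gemfile", weight: 10, kind: "build_manifest" },
    Marker { file: "composer.json", weight: 10, kind: "build_manifest" },
    // Lighter markers; weights assumed from the JSON example.
    Marker { file: "src/main.rs", weight: 5, kind: "source_file" },
    Marker { file: "README.md", weight: 3, kind: "doc" },
];

/// Markers from the table that appear in `files` (paths relative to the folder).
fn matched(files: &[&str]) -> Vec<&'static Marker> {
    MARKERS.iter().filter(|m| files.contains(&m.file)).collect()
}

/// Sum of the weights of all matched markers.
fn score(files: &[&str]) -> u32 {
    matched(files).iter().map(|m| m.weight).sum()
}

/// Map a score to a verdict: >= 8 PROJECT, 5-7 AMBIGUOUS, otherwise NOT-A-PROJECT.
fn verdict(score: u32) -> Verdict {
    match score {
        s if s >= 8 => Verdict::Project,
        5..=7 => Verdict::Ambiguous,
        _ => Verdict::NotAProject,
    }
}

fn main() {
    let files = ["Cargo.toml", "src/main.rs", "README.md"];
    for m in matched(&files) {
        println!("{} weight={} kind={}", m.file, m.weight, m.kind);
    }
    let s = score(&files);
    println!("score={} verdict={:?}", s, verdict(s));
}
```

With the three files from the JSON example the score is 10 + 5 + 3 = 18, which matches the `"score": 18` shown there.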

Component 2 — install/lib-dev-hub-gdrive-import.sh (idempotent installer)

  • brew install rclone jq (skip if present)
  • compile kei-gdrive-import (cargo build --release, copy to ${KIT}/bin/)
  • generate wizard wrapper at ${KIT}/dev-hub/drive-import-wizard.sh
  • NO launchd plist — interactive one-shot, not a daemon
  • post-install hint: "run kei-drive-import to start"

Component 3 — dev-hub/drive-import-wizard.sh (bash, interactive)

$ kei-drive-import

  ┌─ Step 1: rclone config (one-time OAuth) ─────┐
  │ Detected remotes: drive:                     │
  │ Missing remote? → run `rclone config` first  │
  └──────────────────────────────────────────────┘

  ┌─ Step 2: scan ──────────────────────────────┐
  │ root = drive:Projects/                      │
  │ Found 47 folders                            │
  │ Classifying via kei-gdrive-import...        │
  │   31 PROJECT   (score ≥ 8)                  │
  │    8 AMBIGUOUS (score 5-7) — review needed  │
  │    8 NOT-A-PROJECT (skipped)                │
  └──────────────────────────────────────────────┘

  ┌─ Step 3: select ────────────────────────────┐
  │ [✓] all 31 projects                         │
  │ [ ] 8 ambiguous (review each via fzf)       │
  │ Forgejo: http://127.0.0.1:3001              │
  │ Owner: ${USER}                              │
  │ Default branch: main                        │
  └──────────────────────────────────────────────┘

  ┌─ Step 4: migrate (per project) ─────────────┐
  │ → rclone copy drive:Projects/X /tmp/staging/X
  │ → write .gitignore (lang-aware)             │
  │ → git init && git add . && git commit       │
  │ → curl POST /api/v1/user/repos { name:X }   │
  │ → git remote add origin http://.../${USER}/X│
  │ → git push -u origin main                   │
  │ → log result to ledger                      │
  └──────────────────────────────────────────────┘
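
The "create repo, catch 409" part of Step 4 can be kept as pure logic around whatever HTTP client is used (the plan uses curl). The sketch below builds the request body and interprets the status code. It assumes the endpoint answers 201 on success and 409 when the repository already exists; all names (`CreateOutcome`, `create_repo_body`, `interpret_create_status`) are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum CreateOutcome {
    Created,
    AlreadyExists,
    Failed(u16),
}

/// Minimal JSON string escaping so folder names with quotes or
/// backslashes cannot break the request body.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Body for POST /api/v1/user/repos; auto_init stays false because the
/// wizard pushes its own initial commit.
fn create_repo_body(name: &str) -> String {
    format!("{{\"name\":\"{}\",\"auto_init\":false}}", json_escape(name))
}

/// Assumed mapping: 201 = created, 409 = name already taken, anything else = failure.
fn interpret_create_status(status: u16) -> CreateOutcome {
    match status {
        201 => CreateOutcome::Created,
        409 => CreateOutcome::AlreadyExists,
        other => CreateOutcome::Failed(other),
    }
}

fn main() {
    println!("{}", create_repo_body("MyApp"));
    println!("{:?}", interpret_create_status(409));
}
```

Treating 409 as its own outcome lets a re-run of the wizard continue with an existing repo instead of aborting the batch.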

Component 4 — Tests

  • tests/gdrive_import_integration.sh — fake rclone via PATH override, fake Forgejo via netcat listener
  • Rust unit tests cover scoring + classification fixtures
  • Smoke test asserts wizard skips folders containing .git/ already (don't re-import live repos)

Ledger row (2)

agent_id     = wave46-gdrive-import-orchestrator
branch       = feat/kei-gdrive-import
parent       = main @ a5625e08
spec_sha     = (this file)
status       = running
started_ts   = 2026-04-26T...

Wave 2 research — DONE 2026-04-26 (3/3 streams)

R1 — rclone edge-cases (E1 except where noted)

  • Per-file cap: 5 TB (Drive hard-limit). 750 GiB/day = upload only, irrelevant for read.
  • Rate: ~12k queries per minute (personal account); rclone backs off natively. Practical throughput ≈2 files/sec.
  • Gdocs: --drive-skip-gdocs makes them invisible. Pre-flight lsf enumeration MUST surface count to user. Opt-in to --drive-export-formats=md,docx,xlsx (md unverified for current API [E5]).
  • OS-junk (.DS_Store/Thumbs.db/desktop.ini) NOT filtered by default — explicit --exclude needed.
  • rclone copy idempotent on re-run (size+mtime, --checksum stronger).
  • Shortcuts: dereferenced by default → infinite loop risk → --drive-skip-shortcuts mandatory.

Recommended flag block (frozen):

rclone copy "drive:$SRC" "$DST" \
    --drive-skip-gdocs \
    --drive-skip-shortcuts \
    --drive-skip-dangling-shortcuts \
    --drive-acknowledge-abuse \
    --exclude "**/.DS_Store" --exclude "**/._*" \
    --exclude "**/Thumbs.db" --exclude "**/desktop.ini" \
    --exclude "**/.Spotlight-V100/**" --exclude "**/.Trashes/**" --exclude "**/.fseventsd/**" \
    --transfers 4 --checkers 8 --tpslimit 10 \
    --retries 5 --low-level-retries 10 \
    --checksum --create-empty-src-dirs \
    --stats 5s --log-file "$DST/.rclone-import.log"
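
If the frozen flag block is ever assembled from Rust rather than bash, one option is to keep it as data and build the argument vector in one place. The sketch below only constructs the arguments and prints the command; it does not run rclone. The function name `rclone_copy_args` and the constant `EXCLUDES` are hypothetical; the flags themselves are copied from the block above.

```rust
use std::process::Command;

// OS-junk patterns from the frozen flag block.
const EXCLUDES: [&str; 7] = [
    "**/.DS_Store",
    "**/._*",
    "**/Thumbs.db",
    "**/desktop.ini",
    "**/.Spotlight-V100/**",
    "**/.Trashes/**",
    "**/.fseventsd/**",
];

/// Arguments for `rclone copy "drive:$SRC" "$DST"` with the frozen flags.
fn rclone_copy_args(src: &str, dst: &str) -> Vec<String> {
    let mut args: Vec<String> = vec!["copy".into(), format!("drive:{src}"), dst.into()];
    for flag in [
        "--drive-skip-gdocs",
        "--drive-skip-shortcuts",
        "--drive-skip-dangling-shortcuts",
        "--drive-acknowledge-abuse",
    ] {
        args.push(flag.into());
    }
    for pattern in EXCLUDES {
        args.push("--exclude".into());
        args.push(pattern.into());
    }
    for pair in [
        ["--transfers", "4"],
        ["--checkers", "8"],
        ["--tpslimit", "10"],
        ["--retries", "5"],
        ["--low-level-retries", "10"],
        ["--stats", "5s"],
    ] {
        args.extend(pair.iter().map(|s| s.to_string()));
    }
    args.push("--checksum".into());
    args.push("--create-empty-src-dirs".into());
    args.push("--log-file".into());
    args.push(format!("{dst}/.rclone-import.log"));
    args
}

fn main() {
    let args = rclone_copy_args("Projects/X", "/tmp/staging/X");
    let mut cmd = Command::new("rclone");
    cmd.args(&args);
    // Printed for inspection only; nothing is executed here.
    println!("{:?}", cmd);
}
```

Passing each value as its own argument avoids shell quoting problems with folder names that contain spaces.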

R2 — auth UX + secrets (RULE 0.8 reconciled)

  • Auth mode: interactive browser OAuth via rclone config (autoconfig=Y, localhost:53682). Headless + service-account rejected for single-user macOS.
  • Scope: drive.readonly (minimum for list+download). [E1 developers.google.com]
  • Token CANNOT live in .env — rclone rewrites it on every auto-refresh.
  • 2-tier secrets layout:
    • Real token: ~/.config/rclone/rclone.conf chmod 600 (XDG default, treat like ~/.ssh/)
    • ~/.claude/secrets/.env:
      RCLONE_CONFIG=${HOME}/.config/rclone/rclone.conf
      KEI_DRIVE_REMOTE=gdrive
      
  • Detection commands (exit codes undocumented — parse stderr):
    • Missing remote: rclone --config "$RCLONE_CONFIG" listremotes | grep -q '^gdrive:$'
    • Expired token: rclone --config "$RCLONE_CONFIG" about gdrive: 2>&1 | grep -qiE 'oauth2|401|token'
  • Wizard MUST pass --config "$RCLONE_CONFIG" explicitly (belt-and-suspenders to env var).
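
The two detection checks translate to simple string tests on captured rclone output. The sketch below mirrors the grep patterns above and leaves out process spawning; `RemoteState`, `has_remote`, `looks_like_auth_failure` and `remote_state` are hypothetical names, and the sample strings in `main` are made up for illustration, not real rclone output.

```rust
#[derive(Debug, PartialEq)]
enum RemoteState {
    Missing,
    TokenExpired,
    Ready,
}

/// Equivalent of `listremotes | grep -q '^gdrive:$'`: an exact line match,
/// so "mygdrive:" does not count as "gdrive:".
fn has_remote(listremotes_stdout: &str, remote: &str) -> bool {
    let wanted = format!("{remote}:");
    listremotes_stdout.lines().any(|line| line.trim() == wanted)
}

/// Equivalent of `grep -qiE 'oauth2|401|token'` on combined stdout+stderr.
fn looks_like_auth_failure(about_output: &str) -> bool {
    let lower = about_output.to_lowercase();
    ["oauth2", "401", "token"].iter().any(|needle| lower.contains(*needle))
}

/// Combine both checks into the state the wizard acts on.
fn remote_state(listremotes_stdout: &str, about_output: &str, remote: &str) -> RemoteState {
    if !has_remote(listremotes_stdout, remote) {
        RemoteState::Missing
    } else if looks_like_auth_failure(about_output) {
        RemoteState::TokenExpired
    } else {
        RemoteState::Ready
    }
}

fn main() {
    // Illustrative inputs only.
    let remotes = "gdrive:\nbackup:\n";
    println!("{:?}", remote_state(remotes, "Total: 15 GiB", "gdrive"));
    println!("{:?}", remote_state(remotes, "OAuth2 refresh failed (401)", "gdrive"));
    println!("{:?}", remote_state("backup:\n", "", "gdrive"));
}
```

Because the match is substring-based, it can produce false positives on unrelated messages that mention "token"; the wizard should show the raw output alongside the verdict.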

R3 — license/safety (5-step pre-push checklist)

  • Tool pick: gitleaks v8.30.1 MIT (brew install gitleaks). Static gitleaks dir <path> mode (no git history needed). Default ruleset covers AWS / GCP / GitHub PAT / Stripe / PEM private keys / generic API keys.
  • gitignore source: github/gitignore CC0-1.0, SHA-pinned to 576334520435382d6522f349b9d270eda1e79a25 (last commit 2026-04-24).
  • marker→template map (hardcode, do NOT name-guess):
    | Marker | Template URL filename |
    |---|---|
    | Cargo.toml | Rust.gitignore |
    | package.json | Node.gitignore |
    | pyproject.toml | Python.gitignore |
    | go.mod | Go.gitignore |
    | pom.xml | Maven.gitignore |
    | build.gradle | Gradle.gitignore |
    | Gemfile | Ruby.gitignore |
    | composer.json | Composer.gitignore |
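
A hardcoded lookup for the map above might be written as follows. The marker and template names and the pinned SHA come from this plan; the raw.githubusercontent.com URL layout is an assumption about how the SHA-pinned URL is formed, and `template_for` / `template_url` are hypothetical names.

```rust
// Pinned commit of github/gitignore, as recorded in this plan.
const GITIGNORE_SHA: &str = "576334520435382d6522f349b9d270eda1e79a25";

/// Explicit marker -> template mapping; unknown markers yield None
/// instead of a guessed file name.
fn template_for(marker: &str) -> Option<&'static str> {
    match marker {
        "Cargo.toml" => Some("Rust.gitignore"),
        "package.json" => Some("Node.gitignore"),
        "pyproject.toml" => Some("Python.gitignore"),
        "go.mod" => Some("Go.gitignore"),
        "pom.xml" => Some("Maven.gitignore"),
        "build.gradle" => Some("Gradle.gitignore"),
        "Gemfile" => Some("Ruby.gitignore"),
        "composer.json" => Some("Composer.gitignore"),
        _ => None,
    }
}

/// Assumed URL layout: raw file view of the pinned commit.
fn template_url(marker: &str) -> Option<String> {
    template_for(marker).map(|template| {
        format!(
            "https://raw.githubusercontent.com/github/gitignore/{}/{}",
            GITIGNORE_SHA, template
        )
    })
}

fn main() {
    for marker in ["Cargo.toml", "Makefile"] {
        match template_url(marker) {
            Some(url) => println!("{marker} -> {url}"),
            None => println!("{marker} -> no template (not name-guessed)"),
        }
    }
}
```

Returning `None` for anything outside the table keeps the "do NOT name-guess" rule enforceable in a unit test.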

5-step ordered pre-push checklist (wizard MUST run in order):

  1. Existing repo detect: rclone lsf --dirs-only --include ".git/" <src> + HEAD-file fallback (Drive may store .git opaque). Found → SKIP + warn.
  2. Size + extension histogram: du -sh + bytes-per-extension. If .pdf >50% OR {.mp4,.mov,.mkv,.iso,.zip} >30% → prompt user (third-party content risk).
  3. Secret scan: gitleaks dir --no-banner --redact <src>. Non-zero → BLOCK until resolved or explicit bypass.
  4. Apply language .gitignore BEFORE first git add (fetch from SHA-pinned URL above).
  5. Final remote check: assert URL matches 127.0.0.1:3001 allowlist; reject github.com.
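
Steps 2 and 5 of the checklist are threshold and string checks that can be expressed without any I/O. The sketch below assumes the percentages are shares of total bytes (the plan says "bytes-per-extension") and that the allowlist is an exact host:port match; `needs_content_review` and `remote_allowed` are hypothetical names.

```rust
use std::collections::HashMap;

/// Step 2: prompt the user if .pdf exceeds 50% of bytes, or the
/// media/archive group {.mp4,.mov,.mkv,.iso,.zip} exceeds 30%.
/// Keys are lowercase extensions without the dot.
fn needs_content_review(bytes_by_ext: &HashMap<String, u64>) -> bool {
    let total: u64 = bytes_by_ext.values().sum();
    if total == 0 {
        return false;
    }
    let share = |exts: &[&str]| -> f64 {
        let sum: u64 = exts.iter().filter_map(|e| bytes_by_ext.get(*e)).sum();
        sum as f64 / total as f64
    };
    share(&["pdf"]) > 0.5 || share(&["mp4", "mov", "mkv", "iso", "zip"]) > 0.3
}

/// Step 5: accept only remotes whose host:port is exactly 127.0.0.1:3001.
fn remote_allowed(url: &str) -> bool {
    let rest = match url
        .strip_prefix("http://")
        .or_else(|| url.strip_prefix("https://"))
    {
        Some(r) => r,
        None => return false,
    };
    // Authority is everything up to the first '/'; drop any userinfo part
    // so "http://127.0.0.1:3001@github.com/..." is not mistaken for local.
    let authority = rest.split('/').next().unwrap_or("");
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    host_port == "127.0.0.1:3001"
}

fn main() {
    let mut hist = HashMap::new();
    hist.insert("rs".to_string(), 40_000u64);
    hist.insert("mp4".to_string(), 60_000u64);
    println!("review needed: {}", needs_content_review(&hist));
    println!("{}", remote_allowed("http://127.0.0.1:3001/user/X"));
    println!("{}", remote_allowed("https://github.com/user/X"));
}
```

Comparing the parsed authority rather than searching for a substring avoids accepting look-alike hosts such as `127.0.0.1:3001.example.com`.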

Cross-cutting — prompt-injection notes

Both R2 and R3 caught fake <system-reminder> blocks appended to rclone.org and GitHub docs pages fetched via WebFetch. Pattern: a trailing fake "MCP Server Instructions" section telling the agent to load computer-use tools. Both agents correctly ignored it. The wizard implementation does NOT execute LLM-fetched content; this is a research-tooling concern only.

Wave 3 implementation (4 streams parallel, 3)

| I# | Worktree | Files | Agent prompt clause |
|---|---|---|---|
| I1 | agent-gdrive-rust | _primitives/_rust/kei-gdrive-import/** | "MUST NOT invoke git/cargo build (cargo check ok). Write files only." |
| I2 | agent-gdrive-installer | install/lib-dev-hub-gdrive-import.sh | same |
| I3 | agent-gdrive-wizard | dev-hub/drive-import-wizard.sh (template), _templates/ | same |
| I4 | agent-gdrive-tests | tests/gdrive_import_integration.sh + fixtures | same |

Wave 4 — merge ceremony

Per 2: AskUserQuestion per branch [merge --no-ff / squash / reject / defer]. Orchestrator commits with feat(wave46): prefix.

Out of scope (deferred)

  • Reverse direction (Forgejo → Drive backup) — separate primitive kei-gdrive-export
  • GitHub mirror — covered by existing tools/sync-public.sh
  • Bidirectional sync — explicit non-goal, this is one-shot import
  • Web UI — terminal-only

Risks (Wave 1)

  1. rclone config is interactive on first run — wizard must detect and pause for user
  2. Forgejo not running → curl fails fast, wizard aborts with clear message
  3. A Drive folder named Projects can be confused with the nested KeiSeiKit Projects/ directory; the wizard uses absolute paths throughout
  4. Network drop mid-batch — per-project retries, ledger row per project for restart
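
Risk 4 (per-project retries plus a ledger row per project for restart) can be illustrated with two small helpers. This is a sketch under assumptions: the ledger is modelled as a name → status map, the status value "done" is hypothetical (the ledger row above only shows "running"), and backoff delays are omitted. `run_with_retries` and `pending` are hypothetical names.

```rust
use std::collections::HashMap;

/// Call `op` up to `max_attempts` times, passing the 1-based attempt number.
/// Returns the first success, or the last error once attempts are exhausted.
fn run_with_retries<T, E>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}

/// Projects that still need work after a restart: everything whose ledger
/// status is not "done" (assumed status value), including unseen projects.
fn pending<'a>(projects: &[&'a str], ledger: &HashMap<String, String>) -> Vec<&'a str> {
    projects
        .iter()
        .copied()
        .filter(|p| ledger.get(*p).map(String::as_str) != Some("done"))
        .collect()
}

fn main() {
    let mut ledger = HashMap::new();
    ledger.insert("alpha".to_string(), "done".to_string());
    ledger.insert("beta".to_string(), "running".to_string());

    for project in pending(&["alpha", "beta", "gamma"], &ledger) {
        // Simulated flaky step: fails on the first attempt, then succeeds.
        let result: Result<u32, String> = run_with_retries(3, |attempt| {
            if attempt < 2 { Err("network drop".to_string()) } else { Ok(attempt) }
        });
        println!("{project}: {:?}", result);
    }
}
```

Writing the ledger row after each project, rather than once per batch, is what makes the `pending` filter useful after an interrupted run.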