Generic Constructor-Pattern agent kit for Claude Code. Zero personal data, fully English, MIT-licensed.
Contents:
- 34 reusable blocks (baseline, rules, stack/deploy/domain/api/scraper)
- 14 cross-project agent manifests (code/ml/infra/researcher/critic/...)
- 6 portable skills (/new-agent, /research, /test-gen, /debug-deep, /pr-review, /refactor)
- Rust assembler (single binary, ~500 KB)
- 3 hooks (auto-reassemble, pre-commit validate, no-hand-edit)
- install.sh (idempotent, cargo-builds on first run)
- MIT LICENSE
All 6 sanity greps pass: 0 Russian text, 0 specific project names, 0 incident numbers, 0 user paths, 0 hardcoded IPs, 0 API keys.
cargo check + assemble --validate: both pass on 14 manifests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DOMAIN — Scraper unified output invariant
All scrapers emit UnifiedProfile / UnifiedContent via normalize(). Provider-specific fields belong in raw_data and nowhere else.
Schema (minimum fields):
UnifiedProfile {
platform: 'youtube' | 'linkedin' | 'instagram' | 'facebook' | 'xing' | 'telegram' | 'github' | 'twitter',
external_id: string, // platform-native stable ID (PRIMARY dedup key)
name, username, avatar_url, bio, url,
followers_count, following_count, posts_count,
email, phone, website, location,
company, job_title, industry, // LinkedIn / XING
consent: { lawful_basis, source, timestamp }, // GDPR — mandatory
raw_data: Record<string, unknown>, // untouched provider response
}
BaseScraper pattern (all new scrapers inherit):
- 1 scraper = 1 file = 1 platform (Constructor Pattern).
- fetch() → raw provider response; normalize() → UnifiedProfile | UnifiedContent.
- Normalizers live in src/normalizers/<platform>.(ts|py|rs), one cube per platform.
- Never let provider-specific fields leak into DB queries, business logic, or UI. Business code reads ONLY UnifiedProfile keys.
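The pattern above can be sketched roughly as follows. This is a minimal illustration, not the kit's actual base class: the `scrape()` entry point, the `GithubScraper` class, the stubbed response fields, and the use of `null` for missing values are all assumptions, and only a subset of the UnifiedProfile fields is shown.

```typescript
// Hypothetical sketch: a trimmed-down UnifiedProfile, not the full schema.
type Platform =
  | "youtube" | "linkedin" | "instagram" | "facebook"
  | "xing" | "telegram" | "github" | "twitter";

interface Consent {
  lawful_basis: "legitimate_interest" | "consent" | "public_data";
  source: string;
  timestamp: string;
}

interface UnifiedProfile {
  platform: Platform;
  external_id: string;
  username: string | null;
  followers_count: number | null;
  consent: Consent;
  raw_data: Record<string, unknown>;
}

abstract class BaseScraper {
  abstract readonly platform: Platform;

  // Returns the provider response untouched.
  protected abstract fetch(target: string): Promise<Record<string, unknown>>;

  // Maps the provider response onto the unified shape.
  abstract normalize(raw: Record<string, unknown>): UnifiedProfile;

  // Assumed entry point: callers only ever receive normalized output.
  async scrape(target: string): Promise<UnifiedProfile> {
    return this.normalize(await this.fetch(target));
  }
}

// One scraper, one file, one platform. The response fields are illustrative.
class GithubScraper extends BaseScraper {
  readonly platform = "github" as const;

  protected async fetch(target: string): Promise<Record<string, unknown>> {
    // Stub standing in for a real HTTP call.
    return { id: 42, login: target, followers: 7, site_admin: false };
  }

  normalize(raw: Record<string, unknown>): UnifiedProfile {
    const id = raw["id"];
    const login = raw["login"];
    const followers = raw["followers"];
    if (typeof id !== "number" && typeof id !== "string") {
      throw new Error("provider response has no stable id");
    }
    return {
      platform: this.platform,
      external_id: String(id),
      username: typeof login === "string" ? login : null,
      followers_count: typeof followers === "number" ? followers : null,
      consent: {
        lawful_basis: "public_data",
        source: "github-scraper",
        timestamp: new Date().toISOString(),
      },
      raw_data: raw, // provider-only fields such as site_admin stay here
    };
  }
}
```

Because `fetch()` is protected and `scrape()` always routes through `normalize()`, business code in this sketch has no way to obtain a raw provider response except through `raw_data`.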
Deduplication:
- Primary key: (platform, external_id), the platform-native stable ID.
- Secondary merge: normalized name + location + company, only when external_id is missing.
- Never dedup by email only: email collisions (shared inboxes, typos, generic info@) merge distinct people into one profile.
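One way to express these dedup rules is a key function that never looks at email. The sketch below is illustrative only: `ProfileLike`, `dedupKey`, and `dedupe` are hypothetical names, and three choices are assumptions not stated above (the fallback key is scoped per platform, it requires all three fields, and the first record seen wins).

```typescript
// Hypothetical minimal shape; only the fields relevant to dedup are listed.
interface ProfileLike {
  platform: string;
  external_id?: string;
  name?: string;
  location?: string;
  company?: string;
  email?: string; // present on records, deliberately ignored by dedupKey
}

// Lowercase, trim, and collapse whitespace so "Ada  Lovelace" equals "ada lovelace".
const norm = (s: string | undefined): string =>
  (s ?? "").trim().toLowerCase().replace(/\s+/g, " ");

function dedupKey(p: ProfileLike): string | null {
  // Primary key: (platform, external_id).
  if (p.external_id) return `id:${p.platform}:${p.external_id}`;
  // Secondary key: normalized name + location + company.
  const parts = [norm(p.name), norm(p.location), norm(p.company)];
  // Assumption: a partial match is too weak, so all three must be present.
  if (parts.some((x) => x === "")) return null;
  return `fuzzy:${p.platform}:${parts.join("|")}`;
}

function dedupe<T extends ProfileLike>(profiles: T[]): T[] {
  const seen = new Map<string, T>();
  const unkeyed: T[] = []; // records with no usable key are never merged
  for (const p of profiles) {
    const key = dedupKey(p);
    if (key === null) {
      unkeyed.push(p);
    } else if (!seen.has(key)) {
      seen.set(key, p); // assumption: first record seen wins
    }
  }
  return Array.from(seen.values()).concat(unkeyed);
}
```

Two records that share only an email address produce different keys (or no key at all), so they stay separate, which is the behaviour the last rule asks for.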
Consent flag (GDPR):
- Every profile records a lawful-basis value (legitimate_interest / consent / public_data).
- Source (which scraper + when) is logged per record.
- Right-to-erasure endpoint deletes by (platform, external_id) across all tables.
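The consent rules can be enforced at the persistence boundary. The following is a rough sketch under stated assumptions: the in-memory `Tables` map stands in for a real database, and `assertConsent`, `save`, and `erase` are hypothetical helper names rather than part of the kit.

```typescript
type LawfulBasis = "legitimate_interest" | "consent" | "public_data";

interface Consent {
  lawful_basis: LawfulBasis;
  source: string;    // which scraper produced the record
  timestamp: string; // when it was scraped
}

interface StoredRecord {
  platform: string;
  external_id: string;
  consent?: Partial<Consent>; // optional here so the guard has something to reject
}

// Hypothetical in-memory stand-in for database tables.
type Tables = Record<string, StoredRecord[]>;

const LAWFUL_BASES: readonly string[] = ["legitimate_interest", "consent", "public_data"];

// Throws unless all three consent fields are populated with valid values.
function assertConsent(r: StoredRecord): void {
  const c = r.consent;
  const basisOk = !!c && !!c.lawful_basis && LAWFUL_BASES.indexOf(c.lawful_basis) >= 0;
  if (!c || !basisOk || !c.source || !c.timestamp) {
    throw new Error(`refusing to persist ${r.platform}/${r.external_id}: consent incomplete`);
  }
}

function save(tables: Tables, table: string, r: StoredRecord): void {
  assertConsent(r); // no record is stored without a populated consent field
  const rows = tables[table] ?? [];
  rows.push(r);
  tables[table] = rows;
}

// Right to erasure: delete by (platform, external_id) in every table.
// Returns the number of rows removed.
function erase(tables: Tables, platform: string, externalId: string): number {
  let removed = 0;
  for (const name of Object.keys(tables)) {
    const rows = tables[name] ?? [];
    const kept = rows.filter(
      (r) => !(r.platform === platform && r.external_id === externalId),
    );
    removed += rows.length - kept.length;
    tables[name] = kept;
  }
  return removed;
}
```

Keying erasure on the same (platform, external_id) pair used for deduplication means a single request can be traced through every table without relying on email or name matching.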
Forbidden:
- Writing a scraper that skips normalize().
- Passing raw provider dicts into business logic, DB queries, or UI components (breaks Single Source of Truth).
- Deduplicating by email alone.
- Persisting a profile without the consent field populated.
- Putting platform-specific schema into src/models/ top-level types (it belongs in raw_data or a provider-scoped module).
- Mixing two platforms in one scraper file (Constructor Pattern: split per platform).