Generic Constructor-Pattern agent kit for Claude Code. Zero personal data, fully English, MIT-licensed. Contents: - 34 reusable blocks (baseline, rules, stack/deploy/domain/api/scraper) - 14 cross-project agent manifests (code/ml/infra/researcher/critic/...) - 6 portable skills (/new-agent, /research, /test-gen, /debug-deep, /pr-review, /refactor) - Rust assembler (single binary, ~500 KB) - 3 hooks (auto-reassemble, pre-commit validate, no-hand-edit) - install.sh (idempotent, cargo-builds on first run) - MIT LICENSE All 6 sanity greps pass: 0 Russian text, 0 specific project names, 0 incident numbers, 0 user paths, 0 hardcoded IPs, 0 API keys. cargo check + assemble --validate: both pass on 14 manifests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 lines
1.7 KiB
Markdown
21 lines
1.7 KiB
Markdown
# DOMAIN — Scrapers Tier 1 (free APIs + open-source)
|
|
|
|
**Default to Tier 1. Paid tier only after Tier 1 is proven insufficient** (e.g. GitHub GraphQL FREE covers most dev-profile needs before anything paid).
|
|
|
|
**Tier 1 providers (FREE, with quota ceilings):**
|
|
- **YouTube Data API v3** — 10K units/day, search=100 units (≈100 searches/day), video details=1 unit. Cache aggressively, reuse IDs.
|
|
- **Telegram Telethon** (Python, MTProto) — user-account session, `get_participants` capped 200/call, FLOOD_WAIT adaptive. Pyrogram = alt.
|
|
- **GitHub GraphQL API v4** — 5K requests/hour authenticated; unauthenticated = 60/hr only.
|
|
- **Twitter twscrape** — unofficial, account-pool based, shadowban risk per account. Rotate accounts; never use main.
|
|
|
|
**GDPR — consent-first pipeline:**
|
|
- Discover → normalize → dedup → enrich → save, with explicit consent flag per profile.
|
|
- Scraped profile = personal data under GDPR; `lawful basis` recorded per source.
|
|
- Right-to-erasure: delete by (platform, external_id) must work.
|
|
|
|
**Rate & quota hygiene:**
|
|
- Persist quota counters per provider per day to `memory/{project}.md` or DB.
|
|
- Exponential backoff on 429/rate-limit; never hammer.
|
|
- Telethon/twscrape sessions stored in `secrets/` (see `domain-has-secrets`).
|
|
|
|
**Forbidden:** scraping Telegram with a user account without the user's explicit consent (account ban + ToS); hammering YouTube API quota without caching (10K units burns in minutes); unauthenticated GitHub calls (60/hr = instant lockout on any real job); committing Telethon `.session` files; using your personal Twitter account as the twscrape pool seed; scraping profiles without recording consent/lawful-basis flag.
|