KeiSeiKit-1.0/_blocks/scraper-paid-tier.md
Parfii-bot 0be354a920 KeiSeiKit-public — clean state
Single-commit clean baseline after security scrub of niche-tells,
project codenames, internal jargon, and contributor-email leaks.

Contents:
- 100 Rust crates (_primitives/_rust/)
- 37 agent manifests (_manifests/) + generated specs (_generated/)
- 67 user-invocable skills (skills/)
- 33 hooks (hooks/)
- Composition blocks (_blocks/)
- Documentation (docs/, README.md)
- TS adapter packages (_ts_packages/)
- Assembler (_assembler/)
- Roles (_roles/)
- Templates (_templates/)
- Forgejo CI (.forgejo/)

Author: Denis Parfionovich <info@greendragon.info>

License: see LICENSE.
2026-05-01 12:09:03 +08:00

31 lines
2.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# DOMAIN — Scrapers Tier 3 (Apify / Bright Data paid)
**MANDATORY handoff to `kei-cost-guardian` before ANY paid scraping run.** Tier 3 = fallback, not default. Prove Tier 1 insufficient first.
**Known rates (verify on provider pricing page before launch — rates change):**
- **Apify YouTube** `apidojo/youtube-scraper` — $0.50/1K (free API v3 preferred when quota allows)
- **Apify LinkedIn** `harvestapi/linkedin-profile-scraper` — $4/1K (no email) / $10/1K (with email) — **HIGH LEGAL RISK** (BGH Germany Nov 2024: scraping = 100 EUR GDPR compensation per user)
- **Apify Instagram** `apify/instagram-scraper` — $2.30-2.60/1K · `apidojo/instagram-scraper` — $0.50/1K (cheaper, residential proxies mandatory)
- **Apify Facebook** `apify/facebook-posts-scraper` — $5-8/1K · Bright Data ~$1/1K (10x cheaper at scale)
- **Apify TikTok** — `[VERIFY: https://apify.com/store?search=tiktok]` (report lacks current rate)
- **Apify Telegram** — $1-3/1K, **DON'T USE** — Telethon (Tier 1, FREE) gives 100% functionality
- **Bright Data residential proxies** — ~$7-8/GB (Apify residential add-on same tier)
**Pre-run checklist (hand off to `kei-cost-guardian`):**
1. Dashboard balance — state current Apify credits / Bright Data balance.
2. Pricing page fetched LIVE (WebFetch) — quote rate + timestamp.
3. Running actors — Apify dashboard: show what's already billing.
4. Cost estimate — `N_items × $rate/1K + proxy_GB × $8`. Echo dollars to user BEFORE launch.
5. **1-2 item smoke run first** — verify shape + per-item cost; only then scale.
6. Monitor first 2 min stdout — kill on anomaly, don't let a broken actor burn the run.
7. Log actuals to `memory/{project}.md` after.
**GDPR residential-proxy ban (EU targets):**
- Residential proxies for GDPR-protected data (EU individual profiles) → **DPO sign-off required**.
- Default to datacenter proxies unless the actor mandates residential (Instagram, Facebook).
- XING = DACH = strictest GDPR jurisdiction — prefer XingZap (5 EUR/query, GDPR-compliant) over raw Apify actors.
**Cost tiers (inherit from `domain-paid-apis`):**
- < $5 AUTO · $5-$20 WARN · > $20 STOP + explicit user "yes, launch $N.NN" echo.
**Forbidden:** launching LinkedIn paid scrape without legal-review sign-off in `DECISIONS.md`; cookie-based LinkedIn actors with user's main account (`curious_coder/*` bans accounts); residential proxies on EU individual profiles without DPO approval; batch >100 items without `kei-cost-guardian` estimate; skipping 1-2 item smoke run (failed actor config × N items = N billings); running paid scraper when Tier 1 (YouTube API v3, Telethon, GitHub GraphQL) covers the data; hardcoding Apify tokens in source (use `secrets/*.env`).