KeiSeiKit-1.0/_blocks/api-apify.md
Parfii-bot 0398c9ca05 refactor(blocks): update kit-agent handoff refs to kei- prefix in 5 blocks
Caught in Phase-2 double-audit pass AFTER commits 1-5 were already
pushed: top-level _blocks/*.md contains prose handoff references to
"cost-guardian" that get composed into generated agent .md files.
These were missed by the skills/manifests sweep because blocks weren't
in the original task spec list (only fixture _blocks/ were mentioned,
and those are separate).

Impact if left unfixed: any project-specialist created via /new-agent
with Q3=Yes (paid APIs) or Q7!=None (scrapers) would compose these
blocks and emit a generated .md referencing the stale `cost-guardian`
handoff target — a dangling reference after the kei-* rename.

Files touched (10 references, all to `cost-guardian`):
- _blocks/api-apify.md          (1)
- _blocks/api-elevenlabs.md     (2)
- _blocks/api-fal-ai.md         (2)
- _blocks/domain-paid-apis.md   (2)
- _blocks/scraper-paid-tier.md  (3)

Verify: cargo test -> 17/17 still green (fixture _blocks/ isolated
from top-level _blocks/, so no snapshot drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:12:49 +08:00

# API — Apify (web scraping platform)
**Live pricing:** WebFetch https://apify.com/pricing before any run expected to cost more than $5. Every price below is a sample, not a quote; re-verify it on the live page.

**Platform plans (sample; re-verify on the live pricing page):**

| Plan | $/mo | Usage credit/mo | $/CU | Max RAM | Data retention |
|------|-----:|----------------:|-----:|--------:|---------------:|
| Free | $0 | $5 | $0.30 | 4-8 GB | 7d |
| Starter | $49 | $49 | $0.30 | 32 GB | 14d |
| Scale | $199 | $199 | $0.25 | 128 GB | 21d |
| Business | $999 | $999 | $0.20 | 256+ GB | 31d |

**CU (Compute Unit) formula:** `CU = Memory(GB) × Duration(hours)`. Browser scraper ≈ 300 pages/CU; HTTP scraper ≈ 3000 pages/CU. Most actors 0.1-5 CU/run.
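The CU formula turns directly into a quick compute-cost check. A minimal sketch, assuming the sample $/CU rates from the plan table and the rough pages/CU throughput quoted above; it covers platform compute only (not per-result actor fees or proxy traffic), and the constant and function names are hypothetical.

```python
"""Rough Apify compute-cost estimate from the CU formula (sample rates only)."""

# Sample $/CU per plan, copied from the plan table; re-verify on
# https://apify.com/pricing before trusting any number.
SAMPLE_USD_PER_CU = {"Free": 0.30, "Starter": 0.30, "Scale": 0.25, "Business": 0.20}

# Rough throughput quoted in the text: pages scraped per compute unit.
PAGES_PER_CU = {"browser": 300, "http": 3000}


def compute_units(memory_gb: float, duration_hours: float) -> float:
    """CU = Memory(GB) x Duration(hours)."""
    return memory_gb * duration_hours


def cu_for_pages(pages: int, scraper: str) -> float:
    """Approximate CUs needed to scrape `pages` with a "browser" or "http" scraper."""
    return pages / PAGES_PER_CU[scraper]


def cu_cost_usd(cu: float, plan: str) -> float:
    """Platform compute cost in USD, rounded to cents."""
    return round(cu * SAMPLE_USD_PER_CU[plan], 2)


if __name__ == "__main__":
    # 4 GB for 30 minutes = 2 CU; on Starter that is 2 x $0.30 = $0.60.
    run_cu = compute_units(4, 0.5)
    print(run_cu, cu_cost_usd(run_cu, "Starter"))

    # 15,000 pages through a browser scraper = 50 CU; on Scale, 50 x $0.25 = $12.50.
    crawl_cu = cu_for_pages(15_000, "browser")
    print(crawl_cu, cu_cost_usd(crawl_cu, "Scale"))
```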

**Per-actor rates (sample; re-check the pricing page before any batch):**

| Platform | Best actor | $/1K results | Risk | Free alternative |
|----------|-----------|-------------:|------|------------------|
| YouTube | `apidojo/youtube-scraper` | $0.50 | LOW | **YouTube Data API v3 (FREE, 10K units/day)** |
| LinkedIn | `harvestapi/linkedin-profile-scraper` | $4 (no email) / $10 (email) | **HIGH** | linkedin_scraper (Python) |
| Instagram | `apify/instagram-scraper` (official) | $2.30-2.60 | VERY HIGH | Instaloader |
| Instagram | `apidojo/instagram-scraper` (3rd party) | $0.50 | VERY HIGH | Instaloader |
| Facebook | `apify/facebook-posts-scraper` | $5-8 | VERY HIGH | facebook-scraper |
| Telegram | via Apify | $1-3 | LOW | **Telethon/Pyrogram (FREE, MTProto)** |

Prefer the free path when one exists: Telethon/Pyrogram (Telegram) and the YouTube Data API v3 cost nothing beyond their own rate limits (YouTube: 10K quota units/day) and need no scraping actor.

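Per-result pricing makes the pre-batch estimate a one-liner. The sketch below assumes the sample $/1K rates from the table (all names are hypothetical): it routes YouTube and Telegram to their free paths, flags any batch above the $5 live-pricing threshold, and always produces a per-item estimate to hand to `kei-cost-guardian`; the hand-off itself is out of scope.

```python
"""Hypothetical pre-batch planner built on the sample per-actor rates."""
from typing import Optional

# Sample $/1K results from the per-actor table; re-check before any batch.
SAMPLE_USD_PER_1K = {
    "apidojo/youtube-scraper": 0.50,
    "harvestapi/linkedin-profile-scraper": 4.00,  # no email; $10/1K with email
    "apify/instagram-scraper": 2.60,              # upper end of $2.30-2.60
    "apidojo/instagram-scraper": 0.50,
}

# Platforms where a free, non-Apify path exists.
FREE_PATHS = {
    "youtube": "YouTube Data API v3",
    "telegram": "Telethon/Pyrogram (MTProto)",
}

LIVE_PRICING_THRESHOLD_USD = 5.0  # re-check live pricing before any run above this


def estimate_batch_usd(actor: str, items: int, usd_per_1k: Optional[float] = None) -> float:
    """Batch cost = items / 1000 x $/1K; pass usd_per_1k to override the sample rate."""
    rate = SAMPLE_USD_PER_1K[actor] if usd_per_1k is None else usd_per_1k
    return round(items / 1000 * rate, 2)


def plan_batch(platform: str, actor: str, items: int) -> list[str]:
    """Return the steps to take before launching the batch, in order."""
    if platform in FREE_PATHS:
        return [f"skip Apify, use {FREE_PATHS[platform]}"]
    total = estimate_batch_usd(actor, items)
    per_item = total / items if items else 0.0
    steps = [f"send estimate to kei-cost-guardian: ${per_item:.4f}/item, ${total:.2f} total"]
    if total > LIVE_PRICING_THRESHOLD_USD:
        steps.insert(0, "WebFetch https://apify.com/pricing and re-verify the rate")
    return steps


if __name__ == "__main__":
    for step in plan_batch("linkedin", "harvestapi/linkedin-profile-scraper", 10_000):
        print(step)
```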
**Proxies:**
- Datacenter — included in plan; $0.6-1.0/IP overage. Blocked by IG/FB on first hit.
- Residential — **$7-8/GB**. Required for Instagram/Facebook. **GDPR risk** for EU targets (BGH Germany Nov 2024: €100/user scraping compensation).
- SERP — $2.50/1K.
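Residential traffic is billed per GB, so proxy spend depends on page weight, which the text does not give. A minimal sketch, assuming a caller-supplied average page size and decimal GB billing (1 GB = 1000 MB); the function names and both assumptions are hypothetical, and the rates are the samples listed above.

```python
"""Hypothetical proxy-cost estimate using the sample proxy rates above."""

RESIDENTIAL_USD_PER_GB = (7.0, 8.0)  # sample range: $7-8/GB
SERP_USD_PER_1K = 2.50               # sample: $2.50 per 1K SERP requests
MB_PER_GB = 1000                     # assumption: decimal GB billing


def residential_usd(pages: int, avg_page_mb: float) -> tuple[float, float]:
    """Low/high residential proxy cost for `pages` pages of `avg_page_mb` each."""
    gb = pages * avg_page_mb / MB_PER_GB
    low, high = RESIDENTIAL_USD_PER_GB
    return round(gb * low, 2), round(gb * high, 2)


def serp_usd(queries: int) -> float:
    """SERP proxy cost for `queries` search requests."""
    return round(queries / 1000 * SERP_USD_PER_1K, 2)


if __name__ == "__main__":
    # 10,000 pages at an assumed 2 MB each = 20 GB -> $140-160 on residential.
    print(residential_usd(10_000, 2.0))
    # 4,000 SERP requests -> $10.00.
    print(serp_usd(4_000))
```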
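
**Webhooks:** subscribe to `ACTOR.RUN.SUCCEEDED` / `.FAILED` (add `.TIMED_OUT` / `.ABORTED` so no terminal state goes unnoticed) and Apify POSTs to your endpoint. The default payload carries the run ID (`eventData.actorRunId`) and the run object (`resource`, including `defaultDatasetId`). Use webhooks for pipelines; poll only for manual one-offs.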
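A pipeline can attach the webhook when it starts the run and read the IDs back out of the callback. A sketch, assuming Apify API v2's run-actor endpoint (`POST /v2/acts/{actorId}/runs`) with its `memory` (MB) and base64-encoded `webhooks` query parameters, plus the default webhook payload shape; check both against the current API reference. Function names are hypothetical, and the HTTP calls themselves are left out.

```python
"""Sketch: ad-hoc webhook on run start, plus parsing of the default callback payload."""
import base64
import json
import urllib.parse

API_BASE = "https://api.apify.com/v2"
TERMINAL_EVENTS = [
    "ACTOR.RUN.SUCCEEDED",
    "ACTOR.RUN.FAILED",
    "ACTOR.RUN.TIMED_OUT",
    "ACTOR.RUN.ABORTED",
]


def run_actor_url(actor: str, token: str, hook_url: str, memory_mb: int = 1024) -> str:
    """URL to POST the actor input to; Apify calls `hook_url` on any terminal event."""
    hooks = [{"eventTypes": TERMINAL_EVENTS, "requestUrl": hook_url}]
    encoded = base64.b64encode(json.dumps(hooks).encode()).decode()
    query = urllib.parse.urlencode({"token": token, "memory": memory_mb, "webhooks": encoded})
    # Store actors are addressed as "username~actor-name" in API paths.
    return f"{API_BASE}/acts/{actor.replace('/', '~')}/runs?{query}"


def parse_callback(body: bytes) -> dict:
    """Pull event type, run ID, dataset ID and status out of the default payload."""
    payload = json.loads(body)
    resource = payload.get("resource") or {}
    return {
        "event": payload["eventType"],
        "run_id": (payload.get("eventData") or {}).get("actorRunId") or resource.get("id"),
        "dataset_id": resource.get("defaultDatasetId"),
        "status": resource.get("status"),
    }


def dataset_items_url(dataset_id: str, token: str) -> str:
    """Where to GET the scraped items once the run has succeeded."""
    query = urllib.parse.urlencode({"format": "json", "token": token})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```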

**Input schema validation:** actors declare their input as a JSON schema (`input_schema.json`). Validate inputs client-side before the POST: a run started with bad input still burns CU during the startup phase.
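Client-side validation does not need full JSON Schema machinery to catch the common mistakes. A stdlib-only sketch, assuming the actor's input schema is already loaded as a dict; it checks only `required`, per-property `type`, and numeric `minimum`/`maximum`, so use a full validator such as the `jsonschema` package when the schema relies on more. The example schema and its field names are hypothetical.

```python
"""Minimal client-side check of an actor input against its input schema (subset only)."""

# JSON Schema type name -> accepted Python types.
JSON_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def validate_input(schema: dict, run_input: dict) -> list[str]:
    """Return human-readable problems; an empty list means the subset checks passed."""
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in run_input]
    properties = schema.get("properties", {})
    for name, value in run_input.items():
        spec = properties.get(name)
        if spec is None:
            continue  # unknown fields are not checked by this sketch
        expected = spec.get("type")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if expected in JSON_TYPES:
            ok = isinstance(value, JSON_TYPES[expected])
            if expected in ("integer", "number") and isinstance(value, bool):
                ok = False  # bool subclasses int in Python; reject True/False here
            if not ok:
                errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
                continue
        if is_number and "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: {value} is below minimum {spec['minimum']}")
        if is_number and "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: {value} is above maximum {spec['maximum']}")
    return errors


# Hypothetical, trimmed-down schema for illustration only.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "usernames": {"type": "array"},
        "resultsLimit": {"type": "integer", "minimum": 1, "maximum": 1000},
    },
    "required": ["usernames"],
}

if __name__ == "__main__":
    # Refuse to start the run (and burn CU) while this list is non-empty.
    print(validate_input(EXAMPLE_SCHEMA, {"resultsLimit": 0}))
```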

**Legal landscape:** hiQ v. LinkedIn (9th Cir. 2022): scraping public data is not "unauthorized access" under the CFAA. Meta v. Bright Data (2024): Meta lost on logged-off scraping of public data. **BGH Germany Nov 2024: GDPR Art. 82 → €100 per scraped user.** All 6 major platforms' ToS prohibit scraping; a breach is a contract matter, not a criminal one.

**LinkedIn HIGH RISK:** no-cookie `harvestapi` actors are the safer option ($4-10/1K). Cookie-based actors (`curious_coder`) risk an account ban plus ToS exposure. Cap deep profile scrapes at 500/day. **Always get legal review before EU LinkedIn runs.**

**Forbidden:** LinkedIn batches without legal sign-off (GDPR + ToS); residential proxies against EU targets without a documented consent basis; batch runs without first sending a per-item cost estimate to `kei-cost-guardian`; using your main personal account for any cookie-based actor (e.g. the `curious_coder` actors); launching an actor before validating its input against `input_schema.json`; paying Apify for Telegram when Telethon is free.
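The forbidden list reads naturally as a pre-flight gate. A sketch under stated assumptions: `PlannedRun` and every field on it are hypothetical, a "batch" is taken to mean more than one item, and every LinkedIn profile is counted toward the 500/day deep-scrape cap. Any non-empty result means do not launch.

```python
"""Hypothetical pre-flight gate encoding the Forbidden rules for Apify runs."""
from dataclasses import dataclass

LINKEDIN_DEEP_DAILY_CAP = 500  # "Cap deep profile scrapes at 500/day"


@dataclass
class PlannedRun:
    platform: str                          # "linkedin", "instagram", "telegram", ...
    actor: str
    items: int
    targets_eu: bool = False
    proxy: str = "datacenter"              # "datacenter" | "residential" | "serp"
    cookie_based: bool = False
    main_personal_account: bool = False
    legal_signoff: bool = False
    consent_basis_documented: bool = False
    cost_estimate_sent: bool = False       # per-item estimate sent to kei-cost-guardian
    input_validated: bool = False          # checked against the actor's input_schema.json
    linkedin_profiles_today: int = 0       # assumption: every profile counts as "deep"


def preflight(run: PlannedRun) -> list[str]:
    """Return the rules this run would break; an empty list means clear to launch."""
    blockers = []
    is_batch = run.items > 1
    if run.platform == "linkedin":
        if (is_batch or run.targets_eu) and not run.legal_signoff:
            blockers.append("LinkedIn run without legal sign-off")
        if run.linkedin_profiles_today + run.items > LINKEDIN_DEEP_DAILY_CAP:
            blockers.append("LinkedIn daily cap of 500 deep profiles exceeded")
    if run.proxy == "residential" and run.targets_eu and not run.consent_basis_documented:
        blockers.append("residential proxies on EU targets without documented consent basis")
    if is_batch and not run.cost_estimate_sent:
        blockers.append("batch without per-item cost estimate to kei-cost-guardian")
    if run.cookie_based and run.main_personal_account:
        blockers.append("cookie-based actor on main personal account")
    if not run.input_validated:
        blockers.append("input not validated against input_schema.json")
    if run.platform == "telegram":
        blockers.append("Telegram via Apify: use Telethon (free) instead")
    return blockers
```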