Parfii-bot 0398c9ca05 refactor(blocks): update kit-agent handoff refs to kei- prefix in 5 blocks

Caught in Phase-2 double-audit pass AFTER commits 1-5 were already
pushed: top-level _blocks/*.md contains prose handoff references to
"cost-guardian" that get composed into generated agent .md files.
These were missed by the skills/manifests sweep because blocks weren't
in the original task spec list (only fixture _blocks/ were mentioned,
and those are separate).

Impact if left unfixed: any project-specialist created via /new-agent
with Q3=Yes (paid APIs) or Q7!=None (scrapers) would compose these
blocks and emit a generated .md referencing the stale `cost-guardian`
handoff target — a dangling reference after the kei-* rename.

Files touched (10 references, all to `cost-guardian`):
- _blocks/api-apify.md          (1)
- _blocks/api-elevenlabs.md     (2)
- _blocks/api-fal-ai.md         (2)
- _blocks/domain-paid-apis.md   (2)
- _blocks/scraper-paid-tier.md  (3)

Verify: cargo test -> 17/17 still green (fixture _blocks/ isolated
from top-level _blocks/, so no snapshot drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-21 14:12:49 +08:00

3 KiB

Raw Blame History

API — Apify (web scraping platform)

Live pricing: WebFetch https://apify.com/pricing before any run >$5. Treat the table below as a starting sketch and always re-verify on the live pricing page.

Platform plans (sample — re-verify on live pricing page):

Plan	$/mo	Credits	CU cost	Max RAM	Retention
Free	$0	$5	$0.30	4-8 GB	7d
Starter	$49	$49	$0.30	32 GB	14d
Scale	$199	$199	$0.25	128 GB	21d
Business	$999	$999	$0.20	256+ GB	31d

CU (Compute Unit) formula: CU = Memory(GB) × Duration(hours). Browser scraper ≈ 300 pages/CU; HTTP scraper ≈ 3000 pages/CU. Most actors 0.1-5 CU/run.

Per-actor rates (sample — re-check pricing page before any batch):

Platform	Best actor	$/1K	Risk	Free alternative
YouTube	`apidojo/youtube-scraper`	$0.50	LOW	YouTube Data API v3 (FREE, 10K units/day)
LinkedIn	`harvestapi/linkedin-profile-scraper`	$4 (no email) / $10 (email)	HIGH	linkedin_scraper (Python)
Instagram	`apify/instagram-scraper` (official)	$2.30-2.60	VERY HIGH	Instaloader
Instagram	`apidojo/instagram-scraper` (3rd party)	$0.50	VERY HIGH	—
Facebook	`apify/facebook-posts-scraper`	$5-8	VERY HIGH	facebook-scraper
Telegram	via Apify	$1-3	LOW	Telethon/Pyrogram (FREE, MTProto)

Prefer free path when available — Telethon (Telegram) and YouTube Data API v3 are 100% FREE and fully featured.

Proxies:

Datacenter — included in plan; $0.6-1.0/IP overage. Blocked by IG/FB on first hit.
Residential — $7-8/GB. Required for Instagram/Facebook. GDPR risk for EU targets (BGH Germany Nov 2024: €100/user scraping compensation).
SERP — $2.50/1K.

Webhooks: POST on ACTOR.RUN.SUCCEEDED / .FAILED → your endpoint receives runId, datasetId. Use for pipelines; poll only for manual one-offs.

Input schema validation: every actor has a JSON schema (input_schema.json). Validate inputs client-side before POST — failed inputs still eat CU in the startup phase.

Legal landscape: hiQ v. LinkedIn (2022) CFAA ≠ public data; Meta v. Bright Data (2024) Meta lost; BGH Germany Nov 2024: GDPR Art. 82 → €100 per scraped user. All 6 major platforms' ToS prohibit scraping (contractual, not criminal).

LinkedIn HIGH RISK: harvestapi no-cookie actors are safer ($4-10/1K). Cookie-based (curious_coder) = ban + ToS exposure. Max 500 profiles/day deep. Always legal review before EU LinkedIn runs.

Forbidden: LinkedIn batch without legal sign-off (GDPR + ToS); residential proxies against EU targets without documented consent basis; batch runs without per-item cost estimate to kei-cost-guardian; using main personal account for any cookie-based actor (curious_coder line); launching an actor before validating input against its input_schema.json; paying Apify for Telegram when Telethon is free.

3 KiB Raw Blame History Unescape Escape

API — Apify (web scraping platform)

3 KiB

Raw Blame History