feat(blocks): 4 API design blocks — rest/openapi/graphql/versioning-pagination

Parfii-bot 2026-04-21 20:54:53 +08:00
parent 48d4dd0733
commit e3c20b2b01
4 changed files with 153 additions and 0 deletions

_blocks/api-graphql.md Normal file

@ -0,0 +1,33 @@
# API — GraphQL (schema-first, DataLoader, subscriptions, persisted queries)
Single-endpoint, client-driven query language. Pairs with `auth-sessions.md` / `auth-authorization.md` (identity + field-level authz) and `api-versioning-pagination-ratelimit.md` (Relay cursors + cost-based rate limits).
## When to include
- Clients need to shape each response themselves (mobile bandwidth, SPA over-fetch, UI-driven demand).
- Graph-shaped domain (social, sharing, org charts, document tree) where REST nesting explodes.
- Multiple teams own different resolvers behind one gateway (federation / subgraphs).
## What it declares
- **Schema-first, not code-first:** `schema.graphql` is the SSoT, committed to the repo. Resolver types are generated from it (TS `graphql-codegen`, Go `gqlgen`) and resolvers must implement them. Rust `async-graphql` is code-first (the schema is derived from Rust types), so export its SDL and commit that file as the reviewed artefact. Schema-first beats code-first for reviewability, federation, and client codegen.
- **SDL only, no custom DSL:** use standard GraphQL SDL — `type`, `input`, `interface`, `union`, `enum`, `scalar`, directives. Custom scalars (`DateTime`, `UUID`, `JSON`) declared once; keep the list short.
- **Resolver structure (Apollo / urql / Relay agnostic):** one resolver per field; resolvers return values OR a loader handle, never hit the DB directly in a loop — that's the N+1 trap.
- **DataLoader for every 1-to-many or many-to-many field:** Facebook's `dataloader` pattern (batch + per-request cache). Without it, a query `users { posts { comments { author { name } } } }` issues one query per parent row at every level, which grows as O(N³); with it, four batched queries, one per level. Implementations: `dataloader` (JS, reference), `async-graphql` built-in (Rust), `graph-gophers/dataloader` (Go), `aiodataloader` (Python).
- **Pagination: Relay cursor spec:** `type FooConnection { edges: [FooEdge!]! pageInfo: PageInfo! totalCount: Int } type FooEdge { node: Foo! cursor: String! } type PageInfo { hasNextPage: Boolean! hasPreviousPage: Boolean! startCursor: String endCursor: String }`. `totalCount` is a common extension, not part of the Relay spec. See `api-versioning-pagination-ratelimit.md`.
- **Errors:** don't throw — return the GraphQL error envelope. Expected errors (not-found, unauthorized, validation) go in `errors[]` with `extensions.code` taxonomy (`NOT_FOUND`, `FORBIDDEN`, `BAD_USER_INPUT`, `RATE_LIMITED`). Unexpected errors → generic `INTERNAL_SERVER_ERROR`, server-side logged with correlation id.
- **Subscriptions — pick transport explicitly:** **graphql-ws** (RFC-like WebSocket sub-protocol, Apollo-server + urql default; replaces the deprecated `subscriptions-transport-ws`) OR **graphql-sse** (HTTP Server-Sent Events, no WS infra). WebSocket needs auth on the `connection_init` message (token in payload), a reconnect strategy, and a resumable cursor. SSE is simpler where you don't need client→server push.
- **Persisted queries (APQ / PQ):** hash the query at build time, send only the hash at runtime. Cuts bandwidth and enables CDN caching of `GET /graphql?hash=...`. Apollo Automatic Persisted Queries, Relay persisted queries, and the Hasura allow-list all implement variants of this. APQ on its own registers any query a client sends, so it does not stop query-bombing; in production, allow-list the build-time hashes and reject unknown queries.
- **Depth + cost limiting:** every query runs through a cost analyser (e.g. `graphql-cost-analysis`, `graphql-armor`) and rejects when depth > N (typically 10) or cost > budget. Without this, a 20-line query can DoS the DB.
- **Introspection:** ON in dev and staging (the whole tooling assumes it). OFF on the public-facing prod endpoint unless you operate a public API — combine with persisted-query allow-list.
- **Field-level authz:** directive-based (`@auth(role: ADMIN)`) OR middleware in the resolver. Either way — check permission INSIDE the resolver, NOT only at the HTTP layer; a single GraphQL POST hits dozens of resolvers.
- **Libraries:** **TS server**: GraphQL Yoga, Apollo Server 4, Mercurius (Fastify). **TS client**: Apollo Client, urql, Relay. **Rust**: async-graphql (code-first via derive macros; export the SDL for review). **Go**: gqlgen. **Python**: Strawberry, Ariadne. **Federation**: Apollo Federation 2 (`@key`, `@extends`, `@external`), Cosmo, Hive — only if you truly have multiple subgraphs.
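
The batch + per-request cache behaviour that the DataLoader bullet relies on can be sketched in a few lines of `asyncio`. This is a minimal illustration, not the API of `aiodataloader` or any other library: `TinyLoader` and `batch_authors` are hypothetical names, and the batch function is assumed to return values in the same order as the keys it receives.

```python
import asyncio


class TinyLoader:
    """Hypothetical minimal loader: batches keys requested in the same loop tick."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async fn: list of keys -> list of values, same order
        self._cache = {}           # key -> Future; create one loader per request
        self._queue = []           # keys waiting for the next batch

    def load(self, key):
        if key in self._cache:     # per-request cache: repeated keys share one Future
            return self._cache[key]
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self._cache[key] = fut
        self._queue.append(key)
        if len(self._queue) == 1:  # first key of this tick schedules a single dispatch
            loop.call_soon(lambda: loop.create_task(self._dispatch()))
        return fut

    async def _dispatch(self):
        keys, self._queue = self._queue, []
        try:
            values = await self._batch_fn(keys)
            for key, value in zip(keys, values):
                self._cache[key].set_result(value)
        except Exception as exc:   # propagate a batch failure to every waiter
            for key in keys:
                if not self._cache[key].done():
                    self._cache[key].set_exception(exc)


calls = []  # records each batch so the demo can show how many "queries" ran


async def batch_authors(ids):
    # Stand-in for one `SELECT ... WHERE id IN (...)` query.
    calls.append(list(ids))
    return [f"author-{i}" for i in ids]


async def main():
    loader = TinyLoader(batch_authors)
    # Three field resolvers ask for authors; two of them want the same one.
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))


names = asyncio.run(main())
```

Three `load` calls collapse into one call to `batch_authors` with two distinct keys. A production loader also needs a maximum batch size and cache invalidation after mutations, which this sketch leaves out.
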
## References
- GraphQL spec (https://spec.graphql.org/October2021/) [E1 — normative; October 2021 edition cited, check https://spec.graphql.org/ for later editions].
- GraphQL over HTTP + GraphQL over WebSocket (graphql-ws) + graphql-sse [E1 — working group specs].
- Relay Cursor Connections (https://relay.dev/graphql/connections.htm) [E1].
- DataLoader — Facebook OSS (https://github.com/graphql/dataloader) [E2].
- Apollo Federation v2 docs, Hasura docs, gqlgen docs, async-graphql docs [E2 — production-deployed].
- Evidence grade [E2] — GitHub v4 API, Shopify Admin, Facebook, Netflix all production GraphQL.


@ -0,0 +1,39 @@
# API — OpenAPI-First (3.1 as single source of truth)
Machine-readable contract that drives server stubs, client SDKs, docs, mocks, and contract tests from ONE file. Pairs with `api-rest-conventions.md` (the HTTP rules the spec encodes) and `api-versioning-pagination-ratelimit.md` (versioning + pagination schemas).
## When to include
- Any REST API with ≥2 consumers (web + mobile, public + partner, multiple internal services).
- API that must publish SDKs in >1 language — spec-driven codegen beats hand-written clients per language.
- Regulated API (finance / health) where the contract must be reviewable and diff-able as a single artefact.
## What it declares
- **OpenAPI 3.1.0** — the 2021+ version whose Schema Object is a superset of JSON Schema 2020-12. Use 3.1 unless a specific tool pins you to 3.0.x; 2.0 (Swagger) is legacy and lacks `oneOf` / `anyOf` / `nullable`.
- **Single file, single source of truth:** `openapi.yaml` (or `.json`) committed at repo root or under `api/`. ALL of the following are GENERATED, never hand-written:
  - Server routing stubs / request validators (codegen for your stack).
  - Typed client SDKs (TS, Swift, Kotlin, Python, Rust, Go).
  - Human docs site (Swagger UI / Redoc / Scalar / Stoplight Elements).
  - Mock server (Prism, mswjs, Stoplight) for consumer tests before the backend exists.
  - Contract tests (Schemathesis, Dredd, Pact broker feed).
- **Structure:** `info`, `servers` (per environment — prod, staging, sandbox), `paths` (one entry per resource/action pair), `components.schemas` (reusable types), `components.securitySchemes` (bearer / OAuth2 / API-key), `components.parameters` (shared query params like `page`, `cursor`, `limit`), `components.responses` (problem+json 400 / 401 / 403 / 404 / 409 / 422 / 429 / 500 reused by `$ref`), `tags` (grouping for docs).
- **Schemas ARE types:** every `$ref` resolves to `#/components/schemas/*`; no anonymous objects inline inside responses. This keeps the codegen output readable and reusable.
- **Error model is shared:** define `Problem` schema once (RFC 9457 shape) and `$ref` it from every 4xx/5xx response. Keeps the error contract identical across 120 endpoints.
- **Examples are typed:** every operation has ≥1 request example + ≥1 response example. Examples flow into Redoc docs, mock server responses, and SDK fixtures. Invalid examples break CI — treat them as test data.
- **Tooling pick — ONE per job:**
  - Lint: **Spectral** (`.spectral.yaml` with a ruleset — Google/Microsoft API guidelines ship starter rulesets).
  - Diff / breaking-change gate: **oasdiff** or **openapi-diff** in CI — the PR fails on a breaking change unless it carries a `breaking: approved` label.
  - Codegen: **openapi-generator** (multi-language, mature; prefer `*-axios`, `*-nullable` templates for TS); **orval** for TS + React Query / SWR first-class; **oapi-codegen** for Go; **progenitor** for Rust.
  - Docs: **Redoc** (read-only, pretty), **Swagger UI** (interactive), **Scalar** (modern, fast), **Stoplight Elements** (embeddable React component). Pick one and document the decision in the repo.
- **Governance:** `openapi.yaml` change = PR review like code. No drift between spec and server: CI runs the generated server stubs AND contract tests against the running app.
- **[UNVERIFIED] claims — forbidden:** never quote an OpenAPI feature without checking the 3.1 spec. `discriminator`, `oneOf`, `nullable` (removed — use `type: [string, "null"]`) are easy to get wrong; cite spec link on debate.
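
The "Schemas ARE types" rule above is mechanical enough to check in CI. The sketch below walks an already-parsed spec (a plain `dict`) and reports every response whose schema is declared inline instead of by `$ref`. In practice this would be a Spectral rule; the function name `find_inline_response_schemas` and the sample spec are hypothetical and only illustrate the shape of the check.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def find_inline_response_schemas(spec: dict) -> list:
    """Return 'METHOD path status media-type' for each response schema that is not a $ref."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as `parameters` or `summary`
            for status, response in operation.get("responses", {}).items():
                if "$ref" in response:
                    continue  # whole response reused from components.responses
                for media_type, media in response.get("content", {}).items():
                    if "$ref" not in media.get("schema", {}):
                        problems.append(f"{method.upper()} {path} {status} {media_type}")
    return problems


# Hypothetical fragment of an openapi.yaml, already loaded into a dict.
sample_spec = {
    "openapi": "3.1.0",
    "paths": {
        "/invoices/{id}": {
            "get": {
                "responses": {
                    "200": {"content": {"application/json": {
                        "schema": {"$ref": "#/components/schemas/Invoice"}}}},
                    "404": {"$ref": "#/components/responses/NotFound"},
                }
            }
        },
        "/invoices": {
            "post": {
                "responses": {
                    # Violation: anonymous object declared inline.
                    "201": {"content": {"application/json": {
                        "schema": {"type": "object", "properties": {"id": {"type": "string"}}}}}},
                }
            }
        },
    },
}

violations = find_inline_response_schemas(sample_spec)
```

Only the `POST /invoices` response is flagged. The same walk extends to request bodies and parameters if the rule should cover them too.
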
## References
- OpenAPI 3.1.0 spec (https://spec.openapis.org/oas/v3.1.0) [E1 — normative].
- JSON Schema 2020-12 (https://json-schema.org/specification.html) [E1].
- RFC 9457 Problem Details + `api-rest-conventions.md` for the HTTP semantics the spec encodes.
- Swagger UI / Redoc / Scalar / Stoplight Elements — all actively maintained as of 2026 [E2].
- openapi-generator (https://openapi-generator.tech/), orval (https://orval.dev/), oapi-codegen (https://github.com/oapi-codegen/oapi-codegen) [E2 — production-deployed].
- Evidence grade [E2] — pattern is the Stripe / GitHub / Twilio / Shopify default.


@ -0,0 +1,28 @@
# API — REST Conventions (verbs, status codes, resources, idempotency, ETag)
HTTP-level contract for resource-oriented APIs. Pairs with `api-openapi-first.md` (spec as SSoT), `api-versioning-pagination-ratelimit.md` (list + version policy), and `auth-oauth2-oidc.md` / `auth-sessions.md` (principal + scopes).
## When to include
- Public or partner JSON-over-HTTP API where clients are heterogeneous (mobile, SPA, third-party integrations, curl).
- Internal service boundary that you want reviewable by humans without generated tooling.
- Any API that must degrade gracefully through an HTTP cache / proxy / API gateway.
## What it declares
- **Resource naming:** plural nouns, lowercase, kebab-case (`/invoices`, `/invoice-items/{id}`), no verbs in path. Nested resources ≤2 levels deep (`/invoices/{id}/items`); beyond that flatten with query filters. One canonical URL per resource — never two paths for the same entity.
- **Verbs (RFC 9110):** `GET` safe + idempotent, `HEAD` metadata only, `PUT` full replace + idempotent, `PATCH` partial (JSON Merge Patch RFC 7396 OR JSON Patch RFC 6902, pick one per API), `POST` create / non-idempotent action, `DELETE` idempotent. Non-CRUD actions → `POST /resource/{id}:action` (Google AIP-136) or a child resource — never `GET /do-thing`.
- **Status codes — pick from this set, no creativity:** `200 OK`, `201 Created` (+ `Location` header), `202 Accepted` (async), `204 No Content`, `301/308` (moved), `400 Bad Request` (validation), `401 Unauthorized` (no/invalid credential), `403 Forbidden` (authenticated but not allowed), `404 Not Found`, `409 Conflict` (optimistic-lock / duplicate), `410 Gone`, `412 Precondition Failed` (If-Match mismatch), `415 Unsupported Media Type`, `422 Unprocessable Content` (semantic validation; named "Unprocessable Entity" before RFC 9110), `429 Too Many Requests`, `500 Internal Server Error`, `502/503/504` (upstream). `418` is a joke, not a status.
- **Error body: RFC 9457 Problem Details:** `{ "type": "https://api.example.com/errors/invoice-not-found", "title": "...", "status": 404, "detail": "...", "instance": "/invoices/42", "errors": [{"field":"amount","code":"negative"}] }`. Content-Type `application/problem+json`. Stable `type` URI = machine key; `title` = human; `detail` = this instance.
- **Idempotency-Key header (Stripe / IETF draft-ietf-httpapi-idempotency-key-header):** required on `POST` that creates/charges. Server stores `(key, route, response)` for ≥24 h and replays on retry. Different body with same key → `422`. Missing key on mutating `POST` → `400` for strict APIs, accept + warn for lenient.
- **Conditional requests (RFC 9110 §13):** `ETag` on every resource representation (strong `"abc123"` by default; weak `W/"abc123"` only when variants are semantically equivalent but not byte-identical). Clients send `If-Match: "abc123"` on `PUT` / `PATCH` / `DELETE`; the server replies `412` on mismatch. `If-None-Match` + `304 Not Modified` on `GET` for cache revalidation. `Last-Modified` as a weaker fallback only.
- **Content negotiation:** `Accept`, `Accept-Language`, `Accept-Encoding` honoured. Default `application/json; charset=utf-8`. Version media types (`application/vnd.example.v2+json`) ONLY if you commit to header-based versioning — see `api-versioning-pagination-ratelimit.md`.
- **HATEOAS / hypermedia:** OPTIONAL. Include a `_links` / `links` object per resource when the API is explicitly browsable (HAL, JSON:API, Siren) — it's not required for typed SDKs. Document the choice in `openapi.yaml` and stay consistent.
- **Safe-by-default surface:** `GET` never mutates. `DELETE` is idempotent — repeated calls return `204` even if the row is already gone. `PUT` requires the FULL representation; partial field on `PUT` = `400`.
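
The Idempotency-Key rules above (store, replay, `422` on a changed body, `400` on a missing key) can be sketched as a small in-memory store. This is an illustration under assumptions: `IdempotencyStore`, `handle`, and `create_charge` are hypothetical names, the strict-API variant is shown, and a real deployment would use a shared store and also handle a retry that arrives while the first request is still in flight.

```python
import hashlib
import json
import time


class IdempotencyStore:
    """Hypothetical in-memory store keyed by (Idempotency-Key, route)."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (key, route) -> (expires_at, body_hash, response)

    def handle(self, key, route, body, create):
        if not key:
            # Strict API: a mutating POST without a key is rejected.
            return 400, {"title": "Idempotency-Key header is required"}
        body_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        now = self._clock()
        entry = self._entries.get((key, route))
        if entry and entry[0] > now:
            _, stored_hash, stored_response = entry
            if stored_hash != body_hash:
                return 422, {"title": "Idempotency-Key reused with a different body"}
            return stored_response  # replay: the side effect is not repeated
        response = create(body)
        self._entries[(key, route)] = (now + self._ttl, body_hash, response)
        return response


charges = []  # stands in for the side effect (e.g. charging a card)


def create_charge(body):
    charges.append(body)
    return 201, {"id": len(charges), "amount": body["amount"]}


store = IdempotencyStore()
first = store.handle("k-1", "POST /charges", {"amount": 500}, create_charge)
retry = store.handle("k-1", "POST /charges", {"amount": 500}, create_charge)
clash = store.handle("k-1", "POST /charges", {"amount": 900}, create_charge)
missing = store.handle(None, "POST /charges", {"amount": 500}, create_charge)
```

The retry returns the stored `201` without creating a second charge, which is the property that lets clients retry safely after a timeout.
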
## References
- RFC 9110 (HTTP Semantics), RFC 9111 (HTTP Caching), RFC 9457 (Problem Details, 2023), RFC 7396 / 6902 (Merge Patch / JSON Patch), RFC 5988 + 8288 (Web Linking) [E1 — IETF standards-track].
- Google AIP (https://google.aip.dev/) and Microsoft REST API Guidelines (https://github.com/microsoft/api-guidelines) — production-grade conventions [E2].
- `api-openapi-first.md` — encode this block as the machine-readable SSoT; `api-versioning-pagination-ratelimit.md` — list, cursor, and version policy.
- Evidence grade [E2] — every rule here is deployed across Stripe, GitHub, Google, Microsoft production APIs.


@ -0,0 +1,53 @@
# API — Versioning, Pagination, Rate Limiting
Three cross-cutting concerns that every production API hits within the first month. Pairs with `api-rest-conventions.md` (HTTP semantics), `api-openapi-first.md` (where the policy is encoded), and `api-graphql.md` (Relay cursors + cost-based limits).
## When to include
- Any API expected to outlive one client release — versioning decided BEFORE launch, not during the first breaking change.
- Any endpoint returning a collection — pagination decided BEFORE the dataset grows past 10k rows.
- Any API on the public internet or behind a partner quota — rate limits decided BEFORE the first abusive client.
## What it declares
### Versioning — pick one strategy, document it
| Strategy | Example | Pros | Cons | Use when |
|---|---|---|---|---|
| **URL path** | `/v1/invoices` → `/v2/invoices` | Most visible, curl-friendly, easy CDN routing | Pollutes every path; "v2" is vague | Public API, coarse versions, infrequent bumps (e.g. the `/v1` prefix on Stripe URLs). |
| **Header (media type)** | `Accept: application/vnd.example.v2+json` | Clean URLs, content negotiation native | Invisible in logs/curl; needs client support | Internal APIs with typed SDKs; GitHub REST v3 (`application/vnd.github.v3+json`). |
| **Date-based** | `Stripe-Version: 2025-11-01` | Fine-grained, every breaking change pinnable | Complex rollout matrix; server must keep N-1 versions live | Pay-for-stability APIs (Stripe, Shopify); regulated domains. |
| **GraphQL evolution** | Never break the schema; mark fields `@deprecated(reason: "use X")` and remove after telemetry shows 0 usage | No versions to maintain | Schema grows forever; deprecation discipline required | Any GraphQL API — see `api-graphql.md`. |
| **No versioning (additive-only)** | Promise: additions never break clients; removals need a new endpoint | Simplest | Only works with disciplined teams + strong typing | Small internal APIs with ≤3 consumers. |
Rules that apply to ALL strategies: (a) deprecate with `Deprecation` + `Sunset` headers (RFC 8594, RFC 9745) + 6-month minimum runway, (b) publish a changelog, (c) run the old + new in parallel until telemetry shows the old is unused.
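
Rule (a) can be sketched as a helper that builds both headers and enforces the runway. Per RFC 9745 the `Deprecation` value is a structured-field date (`@` followed by Unix seconds), while `Sunset` (RFC 8594) is an HTTP-date. The function name `deprecation_headers` is hypothetical, and "6 months" is approximated here as 182 days, which is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MIN_RUNWAY = timedelta(days=182)  # assumption: "6 months" approximated as 182 days


def deprecation_headers(deprecated_at: datetime, sunset_at: datetime) -> dict:
    """Build Deprecation + Sunset response headers; both datetimes must be timezone-aware."""
    if sunset_at - deprecated_at < MIN_RUNWAY:
        raise ValueError("sunset must be at least 6 months after deprecation")
    return {
        # RFC 9745: structured-field Date, i.e. '@' + seconds since the Unix epoch.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # RFC 8594: HTTP-date, always expressed in GMT.
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
    }


headers = deprecation_headers(
    deprecated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
    sunset_at=datetime(2026, 7, 2, tzinfo=timezone.utc),
)
```

Attaching these headers from one shared middleware keeps the dates identical across every deprecated route and makes the runway check hard to bypass.
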
### Pagination — three patterns, one rule
- **Offset / page (LIMIT N OFFSET M):** `?page=3&limit=50`. OK for admin UIs over small tables. BROKEN for real data — rows drift during paging, `OFFSET 10000` scans 10k rows on every call. Returns `X-Total-Count` or a `meta.total` field; clients assume random access.
- **Cursor (opaque token, keyset/seek):** `?cursor=eyJpZCI6MTIzfQ&limit=50`. Cursor = base64 of `(id, created_at, …)` — opaque to client, ordered by the server's index. Handles drift, O(log n) lookups. Response envelope: `{ data: [...], meta: { next_cursor, prev_cursor, has_more } }`. REQUIRED for any list that can exceed 1k rows or where concurrent writes happen.
- **Relay (GraphQL spec):** `first: 50, after: "cursor"` + `Connection { edges, pageInfo { endCursor, hasNextPage } }`. Standardised cursor pattern for GraphQL — see `api-graphql.md`.
Rule: **default cursor, offer offset only when the UI genuinely needs page numbers**. Never return >1000 items per page; clamp `limit` server-side.
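
The cursor pattern and the server-side clamp can be sketched as follows. The cursor is base64url-encoded JSON of the last row's `(created_at, id)`, and the list comprehension stands in for an indexed keyset query. `encode_cursor`, `decode_cursor`, `list_page`, and the row shape are hypothetical; the envelope follows the one described above, minus `prev_cursor` for brevity.

```python
import base64
import json

MAX_LIMIT = 1000  # hard ceiling from the rule above


def encode_cursor(created_at, row_id):
    raw = json.dumps({"created_at": created_at, "id": row_id}).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor):
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped base64 padding
    data = json.loads(base64.urlsafe_b64decode(padded))
    return data["created_at"], data["id"]


def list_page(rows, limit=50, cursor=None):
    """`rows` must already be sorted by (created_at, id), like the backing index."""
    limit = max(1, min(limit, MAX_LIMIT))  # clamp whatever the client sent
    after = decode_cursor(cursor) if cursor else None
    # SQL equivalent: WHERE (created_at, id) > (:created_at, :id)
    #                 ORDER BY created_at, id LIMIT :limit + 1
    matching = [r for r in rows if after is None or (r["created_at"], r["id"]) > after]
    page, has_more = matching[:limit], len(matching) > limit
    next_cursor = encode_cursor(page[-1]["created_at"], page[-1]["id"]) if has_more else None
    return {"data": page, "meta": {"next_cursor": next_cursor, "has_more": has_more}}


rows = [{"id": i, "created_at": f"2026-01-{i:02d}"} for i in range(1, 6)]
first = list_page(rows, limit=2)
second = list_page(rows, limit=2, cursor=first["meta"]["next_cursor"])
third = list_page(rows, limit=2, cursor=second["meta"]["next_cursor"])
```

Because the cursor encodes a position in the sort order and not a row count, rows inserted or deleted between calls do not shift later pages. The tie-breaker `id` keeps the ordering total when two rows share a `created_at`.
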
### Rate limiting — headers + strategy
- **Token bucket or sliding-window**, per authenticated principal (user / API key / IP). Redis-backed, atomic via Lua. Policy tiers: anon < authenticated < partner < internal.
- **Response headers — IETF `RateLimit` (draft-ietf-httpapi-ratelimit-headers, shipped in Cloudflare / GitHub as of 2024):**
  - `RateLimit-Limit: 1000` — quota in the current window.
  - `RateLimit-Remaining: 947` — left in the current window.
  - `RateLimit-Reset: 47` — seconds until reset.
  - Also emit legacy `X-RateLimit-*` for GitHub/Stripe parity during migration.
- **On block: `429 Too Many Requests` + `Retry-After: <seconds>`** (RFC 9110 §10.2.3) + Problem+json body describing the limit that was hit. Always include `Retry-After`; idempotent clients retry cleanly.
- **Cost-based for GraphQL:** each field has a cost (e.g. `user: 1, user.posts: 5 per item, search: 50`); query total checked against per-principal budget. See `api-graphql.md`.
- **Fail-open on metering outage** is a bug, not a feature — fail-closed with a clear error code (`RATE_LIMITER_UNAVAILABLE`) so clients can alert. Silent "no limit" costs more than a short outage.
- **Defence-in-depth:** per-IP (anti-bot), per-principal (anti-abuse), per-endpoint (protect expensive routes), global (protect the cluster). Document all four layers in the repo — hidden layers surprise on-call.
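
A single-process token bucket shows how the headers and the `429` path fit together. This is a sketch only: the text above calls for a Redis-backed, atomic implementation, and `TokenBucket`, `check`, and the injectable `clock` are hypothetical. Treating `RateLimit-Reset` as "seconds until the bucket is full again" is an assumption, since a token bucket has no fixed window.

```python
import math
import time


class TokenBucket:
    """Hypothetical in-memory bucket for one principal (not safe across processes)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def check(self):
        """Spend one token if available; return (status, headers)."""
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill)
        self._updated = now
        allowed = self._tokens >= 1
        if allowed:
            self._tokens -= 1
        headers = {
            "RateLimit-Limit": str(self.capacity),
            "RateLimit-Remaining": str(int(self._tokens)),
            # Assumption: "reset" means seconds until the bucket is full again.
            "RateLimit-Reset": str(math.ceil((self.capacity - self._tokens) / self.refill)),
        }
        if allowed:
            return 200, headers
        # Blocked: tell the client when one token will be available.
        headers["Retry-After"] = str(math.ceil((1 - self._tokens) / self.refill))
        return 429, headers


now = [0.0]  # fake clock so the demo is deterministic
bucket = TokenBucket(capacity=2, refill_per_second=0.5, clock=lambda: now[0])
r1 = bucket.check()   # allowed, one token left
r2 = bucket.check()   # allowed, bucket empty
r3 = bucket.check()   # blocked with Retry-After
now[0] = 2.0          # two seconds later one token has refilled
r4 = bucket.check()   # allowed again
```

The fail-closed rule above would wrap `check` so that an error from the backing store produces an explicit `RATE_LIMITER_UNAVAILABLE` response instead of letting the request through.
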
## References
- RFC 8594 (Sunset header), RFC 9745 (Deprecation header, 2025), RFC 9110 §10.2.3 (`Retry-After`) [E1 — IETF].
- draft-ietf-httpapi-ratelimit-headers (https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/) [E1 — active working group draft, deployed by Cloudflare + GitHub].
- Relay Cursor Connections (https://relay.dev/graphql/connections.htm) [E1].
- Stripe API versioning post (https://stripe.com/blog/api-versioning) [E2 — production-documented 2017 onward].
- GitHub v3 → v4 migration notes, Shopify API versioning [E2].
- Evidence grade [E2] — all three policies are production-deployed at Stripe, GitHub, Shopify, Cloudflare.