API Documentation

ContextZip converts any URL into clean, LLM-ready Markdown. One API call replaces complex HTML parsing pipelines — saving up to 90% on token costs.

ℹ️

Base URL: All API endpoints are available at https://contextzip.com. The interactive OpenAPI spec is at /v1/openapi.json.

Quick Start

Get your first clean Markdown response in under 60 seconds.

1. Get an API key

2. Make your first request

curl -X POST https://contextzip.com/v1/extract \
  -H "X-API-Key: czk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article","mode":"clean"}'

3. Parse the response

{
  "status": "success",
  "data": {
    "title": "Example Article",
    "markdown": "# Example Article\n\nClean content here...",
    "byline": "John Smith",
    "tokens_saved": 84210,
    "word_count": 1240
  },
  "cached": false,
  "cost": 0.005,
  "mode": "clean"
}

💡

Pro tip: Identical URL requests within 24 hours return cached results — completely free. Cache hits respond in under 100ms.

Authentication

All API requests require authentication via an API key passed in the request header.

Header	Value	Notes
X-API-Key	string	Your API key. Starts with `czk_`. Obtain from Admin Panel.

⚠️

Keep your API key secret. Never expose it in client-side code or public repositories. Rotate it immediately if compromised.

POST /v1/extract

Synchronously extracts and returns clean Markdown from the provided URL. For very slow pages (>15s render time), consider using async mode.

POST https://contextzip.com/v1/extract

Renders the page using a headless browser, extracts main content, converts to Markdown, and returns the result. Results are cached for 24h.

Request Body

Parameter	Type	Required	Description
url	string	Required	The URL to extract content from. Must be a valid HTTP/HTTPS URL.
mode	string	Optional	Extraction mode: `clean` (default), `raw`, or `summary`. See Extraction Modes.
async	boolean	Optional	If `true`, returns a job ID immediately instead of waiting for completion. Default: `false`.
webhook_url	string	Optional	URL to receive a POST callback when async job completes.

Response

Field	Type	Description
status	string	`"success"` or `"error"`
data.title	string	Page title extracted from the document
data.markdown	string	The clean Markdown content
data.byline	string	Author name if detectable
data.tokens_saved	number	Estimated tokens saved vs raw HTML
data.word_count	number	Word count of the extracted content
cached	boolean	Whether this response was served from cache
cost	number	Cost in USD. Zero for cache hits.
mode	string	Extraction mode used

GET /v1/jobs/:jobId

Poll the status and result of an async extraction job.

GET https://contextzip.com/v1/jobs/{jobId}

Returns the current status of an async job. Once status is completed, the result field contains the extracted Markdown.

Job Status Values

Status	Description
`pending`	Job is in the queue, not yet started
`processing`	Browser is rendering the page
`completed`	Extraction finished. Result available in `result` field.
`failed`	Extraction failed. Check `error` field for details.

GET /health

Returns the health status of all system components.

curl https://contextzip.com/health

{
  "status": "healthy",
  "uptime": 86400,
  "services": {
    "database": "ok",
    "redis": "ok",
    "queue": "ok"
  }
}

Extraction Modes

clean $0.005/req · Recommended

Uses Mozilla Readability to identify and extract only the main article content. Strips ads, navigation, sidebars, footers, cookie banners, and other boilerplate. Best signal-to-noise ratio for LLM ingestion.

raw $0.003/req

Converts the entire rendered page to Markdown. Preserves all content including navigation and sidebar. Use when you need maximum coverage or the page structure is non-standard.

summary $0.001/req

Returns the first 500 tokens of clean content. Ideal for quick page previews, duplicate detection, or when you only need a snippet of the content.

Async Jobs

For pages that take longer than 30 seconds to render, use async mode. Submit the job and poll for completion, or receive a webhook callback.

# 1. Submit async job
curl -X POST https://contextzip.com/v1/extract \
  -H "X-API-Key: czk_your_key" \
  -d '{"url":"https://example.com","async":true,"webhook_url":"https://your.app/webhook"}'

# Response: {"jobId": "job_abc123", "status": "pending"}

# 2. Poll for result
curl https://contextzip.com/v1/jobs/job_abc123 \
  -H "X-API-Key: czk_your_key"

Caching

ContextZip caches all successful extractions for 24 hours per URL+mode combination. Cache hits are:

Free — $0 cost, regardless of mode
Fast — Sub-100ms response time
Shared — Any user's extraction warms the cache for everyone

Check the cached: true field in the response to know if you're receiving a cached result.

Rate Limits

Limit	Value	Notes
Default rate limit	60 req/min	Per API key
Sync timeout	30 seconds	Use async mode for slow pages
Max URL length	2048 chars	—
Response max size	10 MB	Raw Markdown output

Rate limit headers are included in every response:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1709123456

Error Codes

Status	Code	Description
400	`INVALID_URL`	The provided URL is malformed or missing
401	`UNAUTHORIZED`	Missing or invalid API key
402	`PAYMENT_REQUIRED`	Skyfire payment token missing or invalid
429	`RATE_LIMITED`	You've exceeded your rate limit. Wait before retrying.
500	`EXTRACTION_FAILED`	Browser failed to render the page
503	`SERVICE_UNAVAILABLE`	Service temporarily unavailable

All errors follow this shape:

{
  "status": "error",
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid"
  }
}

Pricing

Mode	Price	Best for
`summary`	$0.001	Quick previews, snippets, classification
`raw`	$0.003	Full-page extraction, non-article pages
`clean`	$0.005	LLM ingestion, RAG, research agents
`cached`	$0.000	Any repeated URL within 24h

💳

A2A Payments: ContextZip supports autonomous agent-to-agent payments via Skyfire. AI agents can pay for extractions directly — no human billing setup required. Pass the Skyfire JWT token in the skyfire-pay-id header.