API Documentation
ContextZip converts any URL into clean, LLM-ready Markdown. One API call replaces complex HTML parsing pipelines, saving up to 90% on token costs.
Base URL: https://contextzip.com. The interactive OpenAPI spec is at /v1/openapi.json.
Quick Start
Get your first clean Markdown response in under 60 seconds.
1. Get an API key
Log in to the Admin Panel and generate an API key under API Keys.
2. Make your first request
curl -X POST https://contextzip.com/v1/extract \
  -H "X-API-Key: czk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article","mode":"clean"}'
3. Parse the response
{
"status": "success",
"data": {
"title": "Example Article",
"markdown": "# Example Article\n\nClean content here...",
"byline": "John Smith",
"tokens_saved": 84210,
"word_count": 1240
},
"cached": false,
"cost": 0.005,
"mode": "clean"
}
Authentication
All API requests require authentication via an API key passed in the request header.
| Header | Value | Notes |
|---|---|---|
| X-API-Key | string | Your API key. Starts with czk_. Obtain from Admin Panel. |
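As a minimal sketch of the header requirement above, the following Python builds an authenticated request using only the standard library. The helper name `build_extract_request` is illustrative and not part of any ContextZip SDK, and the `czk_` prefix check is a client-side convenience; the server remains the authority on whether a key is valid.

```python
import json
import urllib.request

BASE_URL = "https://contextzip.com"


def build_extract_request(api_key: str, url: str, mode: str = "clean") -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /v1/extract."""
    # Local sanity check only: documented keys start with "czk_".
    if not api_key.startswith("czk_"):
        raise ValueError("API keys are expected to start with 'czk_'")
    body = json.dumps({"url": url, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen(req, timeout=30)` would send it; that step needs a real key and network access, so it is left out of the sketch.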
POST /v1/extract
Synchronously extracts and returns clean Markdown from the provided URL. For very slow pages (>15s render time), consider using async mode.
https://contextzip.com/v1/extract
Renders the page using a headless browser, extracts main content, converts to Markdown, and returns the result. Results are cached for 24h.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to extract content from. Must be a valid HTTP/HTTPS URL. |
| mode | string | Optional | Extraction mode: clean (default), raw, or summary. See Extraction Modes. |
| async | boolean | Optional | If true, returns a job ID immediately instead of waiting for completion. Default: false. |
| webhook_url | string | Optional | URL to receive a POST callback when async job completes. |
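The parameter table above can be mirrored by a small client-side validator, which catches obvious mistakes before a request is spent. This is a sketch under assumptions: `build_payload` and its `run_async` argument are hypothetical names, and the 2048-character cap is taken from the Rate Limits table in this document.

```python
from typing import Optional
from urllib.parse import urlparse

VALID_MODES = {"clean", "raw", "summary"}
MAX_URL_LENGTH = 2048  # "Max URL length" in the Rate Limits table


def build_payload(url: str, mode: str = "clean", run_async: bool = False,
                  webhook_url: Optional[str] = None) -> dict:
    """Return a JSON-serialisable request body for POST /v1/extract."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be a valid HTTP/HTTPS URL")
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"url exceeds {MAX_URL_LENGTH} characters")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    payload = {"url": url, "mode": mode}
    # Optional fields are only included when set, so server defaults apply otherwise.
    if run_async:
        payload["async"] = True
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload
```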
Response
| Field | Type | Description |
|---|---|---|
| status | string | "success" or "error" |
| data.title | string | Page title extracted from the document |
| data.markdown | string | The clean Markdown content |
| data.byline | string | Author name if detectable |
| data.tokens_saved | number | Estimated tokens saved vs raw HTML |
| data.word_count | number | Word count of the extracted content |
| cached | boolean | Whether this response was served from cache |
| cost | number | Cost in USD. Zero for cache hits. |
| mode | string | Extraction mode used |
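One way to consume these fields is to map the decoded JSON onto a typed record. The sketch below assumes the response has already been parsed into a `dict`; the `ExtractResult` class and `parse_extract_response` function are illustrative names, and `byline` is treated as optional because the table describes it as present only "if detectable".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractResult:
    title: str
    markdown: str
    byline: Optional[str]
    tokens_saved: int
    word_count: int
    cached: bool
    cost: float
    mode: str


def parse_extract_response(payload: dict) -> ExtractResult:
    """Convert a successful /v1/extract response body into an ExtractResult."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    data = payload["data"]
    return ExtractResult(
        title=data["title"],
        markdown=data["markdown"],
        byline=data.get("byline"),  # may be absent when no author is detected
        tokens_saved=data["tokens_saved"],
        word_count=data["word_count"],
        cached=payload["cached"],
        cost=payload["cost"],
        mode=payload["mode"],
    )
```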
GET /v1/jobs/:jobId
Poll the status and result of an async extraction job.
https://contextzip.com/v1/jobs/{jobId}
Returns the current status of an async job. Once status is completed, the result field contains the extracted Markdown.
Job Status Values
| Status | Description |
|---|---|
| pending | Job is in the queue, not yet started |
| processing | Browser is rendering the page |
| completed | Extraction finished. Result available in result field. |
| failed | Extraction failed. Check error field for details. |
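The four statuses above suggest a simple polling loop: keep fetching until the job is `completed` or `failed`. The sketch below is one possible shape, not an official client. The `fetch_job` callable is a hypothetical function you supply (for example, a wrapper around GET /v1/jobs/{jobId}) that returns the decoded JSON, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_job(fetch_job: Callable[[str], dict], job_id: str,
                 interval: float = 2.0, timeout: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job finishes and return the value of its `result` field."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = job["status"]
        if status == "completed":
            return job["result"]
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        # "pending" and "processing" both mean: not done yet, try again later.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        sleep(interval)
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.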
GET /health
Returns the health status of all system components.
curl https://contextzip.com/health

{
  "status": "healthy",
  "uptime": 86400,
  "services": { "database": "ok", "redis": "ok", "queue": "ok" }
}
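A monitoring script might reduce the health payload to the list of components that need attention. The sketch below assumes any service value other than "ok" indicates a problem; the document only shows the healthy case, so that interpretation and the `failing_services` name are assumptions.

```python
def failing_services(health: dict) -> list:
    """Return the names of services whose reported state is not "ok"."""
    services = health.get("services", {})
    return sorted(name for name, state in services.items() if state != "ok")
```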
Extraction Modes
clean $0.005/req · Recommended
Uses Mozilla Readability to identify and extract only the main article content. Strips ads, navigation, sidebars, footers, cookie banners, and other boilerplate. Best signal-to-noise ratio for LLM ingestion.
raw $0.003/req
Converts the entire rendered page to Markdown. Preserves all content including navigation and sidebar. Use when you need maximum coverage or the page structure is non-standard.
summary $0.001/req
Returns the first 500 tokens of clean content. Ideal for quick page previews, duplicate detection, or when you only need a snippet of the content.
Async Jobs
For pages that take longer than 30 seconds to render, use async mode. Submit the job and poll for completion, or receive a webhook callback.
# 1. Submit async job
curl -X POST https://contextzip.com/v1/extract \
  -H "X-API-Key: czk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","async":true,"webhook_url":"https://your.app/webhook"}'
# Response: {"jobId": "job_abc123", "status": "pending"}

# 2. Poll for result
curl https://contextzip.com/v1/jobs/job_abc123 \
  -H "X-API-Key: czk_your_key"
Caching
ContextZip caches all successful extractions for 24 hours per URL+mode combination. Cache hits are:
- Free: $0 cost, regardless of mode
- Fast: sub-100ms response time
- Shared: any user's extraction warms the cache for everyone
Check the cached: true field in the response to know if you're receiving a cached result.
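Because every response carries both `cached` and `cost`, a client can track its own hit rate and spend without extra API calls. The following is a rough sketch; `summarize_usage` is a hypothetical helper that takes a list of already-decoded response bodies.

```python
def summarize_usage(responses: list) -> dict:
    """Aggregate cache hits and spend over a batch of /v1/extract responses."""
    total = len(responses)
    hits = sum(1 for r in responses if r.get("cached"))
    spend = sum(r.get("cost", 0.0) for r in responses)
    return {
        "requests": total,
        "cache_hits": hits,
        "hit_rate": hits / total if total else 0.0,
        # Rounded to suppress floating-point noise in the sum.
        "spend_usd": round(spend, 6),
    }
```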
Rate Limits
| Limit | Value | Notes |
|---|---|---|
| Default rate limit | 60 req/min | Per API key |
| Sync timeout | 30 seconds | Use async mode for slow pages |
| Max URL length | 2048 chars | - |
| Response max size | 10 MB | Raw Markdown output |
Rate limit headers are included in every response:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1709123456
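A client can use these headers to pause before it runs into the limit. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this document does not state; `wait_seconds` is an illustrative name.

```python
import time
from typing import Mapping, Optional


def wait_seconds(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, based on rate limit headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    remaining = int(h.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(h["x-ratelimit-reset"])  # assumed: epoch seconds
    current = time.time() if now is None else now
    return float(max(0.0, reset_at - current))
```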
Error Codes
| Status | Code | Description |
|---|---|---|
| 400 | INVALID_URL | The provided URL is malformed or missing |
| 401 | UNAUTHORIZED | Missing or invalid API key |
| 402 | PAYMENT_REQUIRED | Skyfire payment token missing or invalid |
| 429 | RATE_LIMITED | You've exceeded your rate limit. Wait before retrying. |
| 500 | EXTRACTION_FAILED | Browser failed to render the page |
| 503 | SERVICE_UNAVAILABLE | Service temporarily unavailable |
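The status codes above split naturally into errors worth retrying and errors that will fail again unchanged. The sketch below retries only 429 and 503 with exponential backoff. Treating 500 as non-retryable is an assumption, since the table does not say whether a failed render is transient. `send` is a hypothetical callable returning an `(http_status, body)` pair.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {429, 503}  # RATE_LIMITED, SERVICE_UNAVAILABLE


def should_retry(http_status: int) -> bool:
    return http_status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Tuple[int, dict]], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not should_retry(status) or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```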
All errors follow this shape:
{
"status": "error",
"error": {
"code": "INVALID_URL",
"message": "The provided URL is not valid"
}
}
Pricing
| Mode | Price | Best for |
|---|---|---|
| summary | $0.001 | Quick previews, snippets, classification |
| raw | $0.003 | Full-page extraction, non-article pages |
| clean | $0.005 | LLM ingestion, RAG, research agents |
| cached | $0.000 | Any repeated URL within 24h |
Skyfire payment tokens are sent in the skyfire-pay-id header.
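The per-mode prices in the Pricing table make budgeting a matter of simple arithmetic: billable requests times the mode's price, with cache hits costing nothing. The sketch below uses `Decimal` to avoid floating-point drift on sub-cent amounts; `estimate_cost` is an illustrative helper, and the cache-hit count is something you would have to estimate yourself.

```python
from decimal import Decimal

PRICE_PER_REQUEST = {
    "summary": Decimal("0.001"),
    "raw": Decimal("0.003"),
    "clean": Decimal("0.005"),
}


def estimate_cost(mode: str, requests: int, cache_hits: int = 0) -> Decimal:
    """Estimate spend in USD for `requests` calls, of which `cache_hits` are free."""
    if mode not in PRICE_PER_REQUEST:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 0 <= cache_hits <= requests:
        raise ValueError("cache_hits must be between 0 and requests")
    return PRICE_PER_REQUEST[mode] * (requests - cache_hits)
```

For example, 1,000 `clean` requests with 250 cache hits leave 750 billable calls, or $3.75.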