API Documentation

ContextZip converts any URL into clean, LLM-ready Markdown. One API call replaces complex HTML parsing pipelines โ€” saving up to 90% on token costs.

โ„น๏ธ
Base URL: All API endpoints are available at https://contextzip.com. The interactive OpenAPI spec is at /v1/openapi.json.

Quick Start

Get your first clean Markdown response in under 60 seconds.

1. Get an API key

Log in to the Admin Panel and generate an API key under API Keys.

2. Make your first request

curl -X POST https://contextzip.com/v1/extract \
  -H "X-API-Key: czk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article","mode":"clean"}'

3. Parse the response

{
  "status": "success",
  "data": {
    "title": "Example Article",
    "markdown": "# Example Article\n\nClean content here...",
    "byline": "John Smith",
    "tokens_saved": 84210,
    "word_count": 1240
  },
  "cached": false,
  "cost": 0.005,
  "mode": "clean"
}
๐Ÿ’ก
Pro tip: Identical URL requests within 24 hours return cached results โ€” completely free. Cache hits respond in under 100ms.

Authentication

All API requests require authentication via an API key passed in the request header.

HeaderValueNotes
X-API-KeystringYour API key. Starts with czk_. Obtain from Admin Panel.
โš ๏ธ
Keep your API key secret. Never expose it in client-side code or public repositories. Rotate it immediately if compromised.

POST /v1/extract

Synchronously extracts and returns clean Markdown from the provided URL. For very slow pages (>15s render time), consider using async mode.

POST https://contextzip.com/v1/extract

Renders the page using a headless browser, extracts main content, converts to Markdown, and returns the result. Results are cached for 24h.

Request Body

ParameterTypeRequiredDescription
url string Required The URL to extract content from. Must be a valid HTTP/HTTPS URL.
mode string Optional Extraction mode: clean (default), raw, or summary. See Extraction Modes.
async boolean Optional If true, returns a job ID immediately instead of waiting for completion. Default: false.
webhook_url string Optional URL to receive a POST callback when async job completes.

Response

FieldTypeDescription
statusstring"success" or "error"
data.titlestringPage title extracted from the document
data.markdownstringThe clean Markdown content
data.bylinestringAuthor name if detectable
data.tokens_savednumberEstimated tokens saved vs raw HTML
data.word_countnumberWord count of the extracted content
cachedbooleanWhether this response was served from cache
costnumberCost in USD. Zero for cache hits.
modestringExtraction mode used

GET /v1/jobs/:jobId

Poll the status and result of an async extraction job.

GET https://contextzip.com/v1/jobs/{jobId}

Returns the current status of an async job. Once status is completed, the result field contains the extracted Markdown.

Job Status Values

StatusDescription
pendingJob is in the queue, not yet started
processingBrowser is rendering the page
completedExtraction finished. Result available in result field.
failedExtraction failed. Check error field for details.

GET /health

Returns the health status of all system components.

curl https://contextzip.com/health

{
  "status": "healthy",
  "uptime": 86400,
  "services": {
    "database": "ok",
    "redis": "ok",
    "queue": "ok"
  }
}

Extraction Modes

clean $0.005/req ยท Recommended

Uses Mozilla Readability to identify and extract only the main article content. Strips ads, navigation, sidebars, footers, cookie banners, and other boilerplate. Best signal-to-noise ratio for LLM ingestion.

raw $0.003/req

Converts the entire rendered page to Markdown. Preserves all content including navigation and sidebar. Use when you need maximum coverage or the page structure is non-standard.

summary $0.001/req

Returns the first 500 tokens of clean content. Ideal for quick page previews, duplicate detection, or when you only need a snippet of the content.

Async Jobs

For pages that take longer than 30 seconds to render, use async mode. Submit the job and poll for completion, or receive a webhook callback.

# 1. Submit async job
curl -X POST https://contextzip.com/v1/extract \
  -H "X-API-Key: czk_your_key" \
  -d '{"url":"https://example.com","async":true,"webhook_url":"https://your.app/webhook"}'

# Response: {"jobId": "job_abc123", "status": "pending"}

# 2. Poll for result
curl https://contextzip.com/v1/jobs/job_abc123 \
  -H "X-API-Key: czk_your_key"

Caching

ContextZip caches all successful extractions for 24 hours per URL+mode combination. Cache hits are:

  • Free โ€” $0 cost, regardless of mode
  • Fast โ€” Sub-100ms response time
  • Shared โ€” Any user's extraction warms the cache for everyone

Check the cached: true field in the response to know if you're receiving a cached result.

Rate Limits

LimitValueNotes
Default rate limit60 req/minPer API key
Sync timeout30 secondsUse async mode for slow pages
Max URL length2048 charsโ€”
Response max size10 MBRaw Markdown output

Rate limit headers are included in every response:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1709123456

Error Codes

StatusCodeDescription
400INVALID_URLThe provided URL is malformed or missing
401UNAUTHORIZEDMissing or invalid API key
402PAYMENT_REQUIREDSkyfire payment token missing or invalid
429RATE_LIMITEDYou've exceeded your rate limit. Wait before retrying.
500EXTRACTION_FAILEDBrowser failed to render the page
503SERVICE_UNAVAILABLEService temporarily unavailable

All errors follow this shape:

{
  "status": "error",
  "error": {
    "code": "INVALID_URL",
    "message": "The provided URL is not valid"
  }
}

Pricing

ModePriceBest for
summary$0.001Quick previews, snippets, classification
raw$0.003Full-page extraction, non-article pages
clean$0.005LLM ingestion, RAG, research agents
cached$0.000Any repeated URL within 24h
๐Ÿ’ณ
A2A Payments: ContextZip supports autonomous agent-to-agent payments via Skyfire. AI agents can pay for extractions directly โ€” no human billing setup required. Pass the Skyfire JWT token in the skyfire-pay-id header.