What teams build with
ContextZip

From autonomous research agents to production RAG pipelines: see how AI teams use ContextZip to cut token costs and ship faster.

🔬
Research Agents

Let your agent read
the entire web

Research agents using Claude or GPT-4 often hit context limits reading raw HTML. ContextZip gives them clean, token-efficient Markdown, so they can read 10x more sources per run without hitting limits or blowing budgets.

  • 90% fewer tokens per web page on average
  • Clean content means higher-quality LLM outputs
  • Cache means popular pages are always free
  • Works with LangChain, AutoGen, CrewAI, and raw API calls
See integration guide →
Python · LangChain Tool
import os
import requests
from langchain.agents import AgentType, initialize_agent
from langchain.tools import Tool
from langchain_openai import ChatOpenAI

CONTEXTZIP_KEY = os.environ["CONTEXTZIP_KEY"]

def read_web_page(url: str) -> str:
    """Fetch clean Markdown from any URL."""
    r = requests.post(
        "https://contextzip.com/v1/extract",
        headers={"X-API-Key": CONTEXTZIP_KEY},
        json={"url": url, "mode": "clean"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"]["markdown"]

# Register as an agent tool
web_reader = Tool(
    name="read_url",
    func=read_web_page,
    description="Read and extract content from any web URL. "
                "Returns clean Markdown. Use for articles, docs, reports.",
)

# Add to your agent
agent = initialize_agent(
    tools=[web_reader, ...],
    llm=ChatOpenAI(model="gpt-4o"),
    agent=AgentType.OPENAI_FUNCTIONS,
)
🦾
OpenClaw · Quick Start

Add web reading to
OpenClaw in 5 minutes

OpenClaw agents can read any URL out of the box with ContextZip. Define the tool once in your YAML config, set the API key, and your agent can browse the web for research, fact-checking, or monitoring.

  • No code changes: pure YAML tool definition
  • Agent pays per-use via Skyfire A2A payments
  • Works with any OpenClaw agent configuration
  • Responses automatically cached for fast repeated reads
View on GitHub →
YAML · OpenClaw tool.yml
# tools/read_url.yml: add to your OpenClaw agent

name: read_url
description: |
  Fetch and extract clean Markdown content from any web URL.
  Use when you need to read articles, documentation, news,
  or any web page to gather information.

endpoint: https://contextzip.com/v1/extract
method: POST

headers:
  X-API-Key: ${CONTEXTZIP_API_KEY}
  Content-Type: application/json

body_template:
  url: ${url}
  mode: "clean"

response_mapping:
  content: data.markdown
  title: data.title
  cached: cached

parameters:
  - name: url
    type: string
    description: "The full URL of the web page to read"
    required: true
    validation:
      pattern: "^https?://"

---
# .env: set your key
CONTEXTZIP_API_KEY=czk_your_key_here
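The config above rejects tool calls whose `url` parameter fails the `^https?://` pattern. If you also call the endpoint from your own scripts, the same guard is one line of Python; a sketch mirroring that validation rule:

```python
import re

# Same pattern as validation.pattern in the tool config above
URL_PATTERN = re.compile(r"^https?://")

def is_valid_url(url: str) -> bool:
    """Reject non-http(s) inputs before they reach the extract endpoint."""
    return bool(URL_PATTERN.match(url))
```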
📚
RAG Pipelines

Clean Markdown,
straight into your vector DB

Building a RAG system that ingests web content? Skip the HTML preprocessing step entirely. ContextZip returns structured Markdown that's ready to chunk and embed: no BeautifulSoup, no custom parsers.

  • Structured headings map naturally to chunk boundaries
  • No preprocessing pipeline to maintain
  • Token count included for chunking strategy
  • Works with Pinecone, Weaviate, Qdrant, pgvector
See API reference →
Python · Pinecone RAG
import os
import requests
from openai import OpenAI
from pinecone import Pinecone

CONTEXTZIP_KEY = os.environ["CONTEXTZIP_KEY"]
client = OpenAI()
pc = Pinecone(api_key=os.environ["PINECONE_KEY"])
index = pc.Index("my-knowledge-base")

def ingest_url(url: str, doc_id: str):
    # 1. Extract clean Markdown
    r = requests.post(
        "https://contextzip.com/v1/extract",
        headers={"X-API-Key": CONTEXTZIP_KEY},
        json={"url": url, "mode": "clean"},
        timeout=30,
    )
    r.raise_for_status()
    content = r.json()["data"]["markdown"]

    # 2. Chunk by headings (Markdown makes this trivial)
    chunks = [c.strip() for c in content.split("\n## ") if c.strip()]

    # 3. Embed and upsert
    vectors = []
    for i, chunk in enumerate(chunks):
        emb = client.embeddings.create(
            input=chunk, model="text-embedding-3-small"
        ).data[0].embedding
        vectors.append({
            "id": f"{doc_id}_{i}",
            "values": emb,
            "metadata": {"url": url, "chunk": chunk[:500]}
        })

    index.upsert(vectors=vectors)
    print(f"Ingested {len(vectors)} chunks from {url}")
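One subtlety: splitting on `"\n## "` strips the `## ` prefix from every chunk after the first, so the heading text is no longer marked as a heading inside the chunk. A small pure-Python helper that keeps each heading attached to its section:

```python
import re
from typing import List

def chunk_by_headings(markdown: str) -> List[str]:
    """Split Markdown on level-2 headings, keeping each '## ' heading
    with the text that follows it."""
    # Split *before* every "## " that starts a line, so the
    # heading marker stays inside its chunk.
    parts = re.split(r"\n(?=## )", markdown)
    return [p.strip() for p in parts if p.strip()]
```

Swap this in for the `split` call when the heading text matters for retrieval (it usually does, since headings are strong relevance signals).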
📈
Price Monitoring

Track prices and product
data at scale

Price monitoring bots traditionally scrape raw HTML and parse fragile CSS selectors. ContextZip delivers structured Markdown; feed it to an LLM to extract prices, specs, or any structured data without maintaining custom parsers.

  • No selector maintenance: the LLM handles extraction
  • Handles JavaScript-heavy SPAs (React, Vue, Angular)
  • 24h cache prevents re-scraping unchanged pages
  • Rate limiting keeps you under site restrictions
Python · Price Monitor
import json
import os
import requests
from openai import OpenAI

CONTEXTZIP_KEY = os.environ["CONTEXTZIP_KEY"]
client = OpenAI()

def extract_price(product_url: str) -> dict:
    # Get clean page content
    r = requests.post(
        "https://contextzip.com/v1/extract",
        headers={"X-API-Key": CONTEXTZIP_KEY},
        json={"url": product_url, "mode": "clean"},
        timeout=30,
    )
    r.raise_for_status()
    page = r.json()["data"]["markdown"]

    # Let GPT extract structured data
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f"Extract: name, price, currency, in_stock\n\n{page}"
        }]
    )
    return json.loads(result.choices[0].message.content)

# Monitor 1000 product pages for ~$3 total
prices = [extract_price(url) for url in product_urls]
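Even with `response_format={"type": "json_object"}`, the model's reply arrives as a JSON *string* and can occasionally be malformed. A defensive parse helper (stdlib only) keeps a 1000-page sweep from aborting on one bad response; returning `None` for failures is an assumption about how you want to handle them:

```python
import json
from typing import Optional

def parse_price_json(raw: str) -> Optional[dict]:
    """Parse the model's JSON reply; return None instead of raising
    so one malformed response doesn't abort a large batch."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    return data if isinstance(data, dict) else None
```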
📰
News & Intelligence

Build AI-powered
news aggregators

Aggregate and summarize news from dozens of sources without wrestling with different site structures. ContextZip normalizes every page to clean Markdown, so your summarization pipeline stays simple.

  • Works on any news site regardless of structure
  • Byline and date extraction included
  • Async mode for batch processing dozens of articles
  • Summary mode for quick headline + snippet extraction
JavaScript · News Digest
const sources = [
  "https://techcrunch.com/latest",
  "https://hnrss.org/frontpage",
  "https://news.ycombinator.com",
];

async function buildDigest(urls) {
  // Parallel extraction: all cached after first run
  const pages = await Promise.all(
    urls.map(url =>
      fetch("https://contextzip.com/v1/extract", {
        method: "POST",
        headers: {
          "X-API-Key": process.env.CONTEXTZIP_KEY,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ url, mode: "summary" }), // $0.001/req
      }).then(r => r.json())
    )
  );

  const combined = pages
    .map(p => `### ${p.data.title}\n${p.data.markdown}`)
    .join("\n\n---\n\n");

  return summarizeWithLLM(combined); // Your summarization step
}
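The combine step in the JavaScript above is pure string work, so it ports directly. The same formatting in Python, assuming each page dict has the `data.title` / `data.markdown` response shape used throughout this page:

```python
from typing import List

def build_digest_markdown(pages: List[dict]) -> str:
    """Join extracted pages into one Markdown digest, mirroring
    the JavaScript combine step: '### title' headers separated
    by horizontal rules."""
    sections = [
        f"### {p['data']['title']}\n{p['data']['markdown']}"
        for p in pages
    ]
    return "\n\n---\n\n".join(sections)
```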
🎯
Competitive Intelligence

Monitor competitors
automatically

Track competitor pricing pages, product launches, and blog posts. Set up a scheduled job that runs nightly; ContextZip caches unchanged pages, so you only pay when content actually changes.

  • Cache invalidation forces fresh crawl when needed
  • Diff-detect changes using LLM comparison
  • Alert on pricing changes, new product launches
  • Run nightly for fraction of traditional scraper costs
Python · Competitive Monitor
import hashlib
import os
import requests
from openai import OpenAI

CONTEXTZIP_KEY = os.environ["CONTEXTZIP_KEY"]
client = OpenAI()
DB = {}  # your actual DB here

def check_for_changes(competitor_url: str):
    r = requests.post(
        "https://contextzip.com/v1/extract",
        headers={"X-API-Key": CONTEXTZIP_KEY},
        json={"url": competitor_url, "mode": "clean"},
        timeout=30,
    )
    data = r.json()

    # Skip if cached (no content change)
    if data.get("cached"):
        return None

    content = data["data"]["markdown"]
    content_hash = hashlib.sha256(content.encode()).hexdigest()[:12]

    previous = DB.get(competitor_url)
    DB[competitor_url] = {"hash": content_hash, "content": content}

    if previous and previous["hash"] != content_hash:
        return summarize_diff(previous["content"], content)

def summarize_diff(old, new):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"What changed between these two versions?\n\n"
                       f"OLD:\n{old[:2000]}\n\nNEW:\n{new[:2000]}"
        }]
    )
    return resp.choices[0].message.content
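Hashing the raw Markdown fires on trivial changes too: reflowed whitespace, a casing tweak, a footer timestamp. Normalizing before hashing cuts false alerts. A sketch that handles whitespace and case; extend the regex for dates or visit counters if your targets include them:

```python
import hashlib
import re

def content_fingerprint(markdown: str) -> str:
    """Hash a normalized version of the page so reflowed or
    re-cased text doesn't register as a change."""
    normalized = re.sub(r"\s+", " ", markdown).strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]
```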

Ready to build?

Get an API key and make your first request in under 60 seconds.
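A minimal first request, matching the payload shape used in every example above. Set `CONTEXTZIP_KEY` to your dashboard key; the helper just assembles the pieces so they're easy to reuse:

```python
import os
import requests

ENDPOINT = "https://contextzip.com/v1/extract"

def build_extract_request(url: str, mode: str = "clean") -> dict:
    """Assemble the request arguments used by every example above."""
    return {
        "url": ENDPOINT,
        "headers": {"X-API-Key": os.environ.get("CONTEXTZIP_KEY", "")},
        "json": {"url": url, "mode": mode},
        "timeout": 30,
    }

if __name__ == "__main__":
    # First request: extract any page as clean Markdown
    r = requests.post(**build_extract_request("https://example.com"))
    print(r.json()["data"]["markdown"])
```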