POST /v1/extract

Scrapes the page, then passes the content to Claude with your schema and optional prompt. Returns structured JSON — no regex, no CSS selectors.

Cost: 5 credits per request

Request

curl -X POST https://www.webglean.com/v1/extract \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "schema": {
      "stories": [{ "title": "string", "points": "number", "url": "string" }]
    },
    "prompt": "Extract the top stories from the page"
  }'

Body parameters

ParameterTypeDefaultDescription
urlstringrequiredThe URL to scrape and extract from
schemaobjectrequiredJSON schema describing the structure you want
promptstringoptionalNatural language instruction for the AI

Schema format

The schema is a plain JSON object describing the shape of data you want back. Use primitive types as strings:

{
  "title": "string",
  "price": "number",
  "inStock": "boolean",
  "features": ["string"]
}

Response

{
  "success": true,
  "data": {
    "stories": [
      { "title": "Show HN: I built a web scraper", "points": 342, "url": "https://..." },
      { "title": "Ask HN: Best tools for AI pipelines?", "points": 218, "url": "https://..." }
    ]
  }
}

Use cases

  • Extract product prices, inventory, and descriptions from e-commerce pages
  • Pull contact info, bios, or job listings from company pages
  • Summarize article metadata (author, date, tags, word count)
  • Parse event details (name, date, location, ticket price)

Errors

CodeReason
401Invalid API key
402Insufficient credits
429Rate limit exceeded
400Missing or invalid url, neither schema nor prompt provided, or the target domain doesn't exist
504Extraction timed out — try again
502The target site refused the connection, blocked automated access, or had an SSL error
500Extraction failed for another reason