WebGlean

POST /v1/extract

Scrapes the page, then passes the content to Claude along with your schema and optional prompt. Returns structured JSON; no regex or CSS selectors required.

Cost: 5 credits per request

Request

curl -X POST https://www.webglean.com/v1/extract \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "schema": {
      "stories": [{ "title": "string", "points": "number", "url": "string" }]
    },
    "prompt": "Extract the top stories from the page"
  }'
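
The same request can be built from Python using only the standard library. This is a minimal sketch: the helper names `build_extract_request` and `extract` and the 60-second timeout are assumptions for illustration, not part of the WebGlean API. The endpoint, headers, and body fields mirror the curl example above.

```python
import json
import urllib.request

API_URL = "https://www.webglean.com/v1/extract"


def build_extract_request(api_key, url, schema, prompt=None):
    """Build (but do not send) a POST request for /v1/extract."""
    body = {"url": url, "schema": schema}
    if prompt is not None:
        body["prompt"] = prompt  # optional natural-language instruction
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract(api_key, url, schema, prompt=None, timeout=60):
    """Send the request and return the decoded JSON response body."""
    request = build_extract_request(api_key, url, schema, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before any credits are spent.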

Body parameters

Parameter	Type	Required	Description
`url`	string	required	The URL to scrape and extract from
`schema`	object	required	JSON object describing the structure you want (see Schema format)
`prompt`	string	optional	Natural language instruction for the AI

Schema format

The schema is a plain JSON object describing the shape of data you want back. Use primitive types as strings:

{
  "title": "string",
  "price": "number",
  "inStock": "boolean",
  "features": ["string"]
}
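
Because the schema is just a shape description, it can also be used on the client side to sanity-check what comes back. The sketch below is one possible checker, not something the API provides: the name `matches_schema` is hypothetical, and it assumes that a one-element list means "array of that type" and that every key in the schema is present in the result. Whether the API may return `null` for fields it cannot find is not stated here, so this strict check would report such a value as a mismatch.

```python
# Maps the schema's type names to the Python types they decode to.
PRIMITIVES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def matches_schema(value, schema):
    """Return True if `value` has the shape described by `schema`."""
    if isinstance(schema, str):
        expected = PRIMITIVES.get(schema)
        if expected is None:
            return False  # unknown type name
        # bool is a subclass of int in Python, so reject it for "number".
        if schema == "number" and isinstance(value, bool):
            return False
        return isinstance(value, expected)
    if isinstance(schema, list):
        # A one-element list describes an array of that element type.
        return (
            isinstance(value, list)
            and len(schema) == 1
            and all(matches_schema(item, schema[0]) for item in value)
        )
    if isinstance(schema, dict):
        return isinstance(value, dict) and all(
            key in value and matches_schema(value[key], sub)
            for key, sub in schema.items()
        )
    return False
```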

Response

{
  "success": true,
  "data": {
    "stories": [
      { "title": "Show HN: I built a web scraper", "points": 342, "url": "https://..." },
      { "title": "Ask HN: Best tools for AI pipelines?", "points": 218, "url": "https://..." }
    ]
  }
}
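
A short sketch of consuming this response in Python follows. The `unwrap` helper is a hypothetical name, and since the body of a failed response is not documented here, the helper only checks the `success` flag and makes no assumption about other fields.

```python
import json

# The example response shown above, as it would arrive over the wire.
SAMPLE = """
{
  "success": true,
  "data": {
    "stories": [
      { "title": "Show HN: I built a web scraper", "points": 342, "url": "https://..." },
      { "title": "Ask HN: Best tools for AI pipelines?", "points": 218, "url": "https://..." }
    ]
  }
}
"""


def unwrap(response_text):
    """Parse a response body and return its `data` object, or raise."""
    payload = json.loads(response_text)
    if payload.get("success") is not True:
        raise ValueError(f"extraction did not succeed: {payload!r}")
    return payload["data"]


stories = unwrap(SAMPLE)["stories"]
titles = [story["title"] for story in stories]
total_points = sum(story["points"] for story in stories)
```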

Use cases

Extract product prices, inventory, and descriptions from e-commerce pages
Pull contact info, bios, or job listings from company pages
Summarize article metadata (author, date, tags, word count)
Parse event details (name, date, location, ticket price)
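
As an illustration of the last use case, a request body for event details might look like the sketch below. The field names (`name`, `date`, `location`, `ticketPrice`), the example URL, and the prompt wording are all hypothetical; only the `url`, `schema`, and `prompt` keys and the primitive type names come from this page.

```python
import json

# Hypothetical schema for an event page; choose field names to suit your data.
event_schema = {
    "name": "string",
    "date": "string",
    "location": "string",
    "ticketPrice": "number",
}

payload = {
    "url": "https://example.com/events/launch-party",  # placeholder URL
    "schema": event_schema,
    "prompt": "Extract the event details from the page",
}

# The JSON text to send as the POST body.
body = json.dumps(payload)
```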

Errors

Code	Reason
`401`	Invalid API key
`402`	Insufficient credits
`429`	Rate limit exceeded
`400`	Missing or invalid `url`, neither `schema` nor `prompt` provided, or the target domain doesn't exist
`504`	Extraction timed out; retry the request
`502`	The target site refused the connection, blocked automated access, or had an SSL error
`500`	Extraction failed for another reason
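
One way to act on these codes from a client is sketched below. The `call_with_retry` helper is a hypothetical name, and the retry policy is an assumption: the table only says to try again for `504`, and treating `429` as retryable after a pause is a common convention rather than documented behavior. All other codes are re-raised immediately, since retrying an invalid key or an empty credit balance will not help.

```python
import time
import urllib.error

# Assumed retry policy: 504 per the table above, 429 by convention.
RETRYABLE = {429, 504}


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `send()` and retry retryable HTTP errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Give up on non-retryable codes or once attempts are exhausted.
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Here `send` is any zero-argument callable that performs the request and raises `urllib.error.HTTPError` on a non-2xx status, which is what `urllib.request.urlopen` does by default. Since the page does not say whether failed requests consume credits, keeping `max_attempts` small is the cautious choice.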