POST /v1/extract
Scrapes the page, then passes the content to Claude with your schema and optional prompt. Returns structured JSON — no regex, no CSS selectors.
Cost: 5 credits per request
Request
curl -X POST https://www.webglean.com/v1/extract \
-H "Authorization: Bearer wg_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"schema": {
"stories": [{ "title": "string", "points": "number", "url": "string" }]
},
"prompt": "Extract the top stories from the page"
}'
Body parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
url | string | required | The URL to scrape and extract from |
schema | object | required | JSON schema describing the structure you want |
prompt | string | optional | Natural language instruction for the AI |
Schema format
The schema is a plain JSON object describing the shape of data you want back. Use primitive types as strings:
{
"title": "string",
"price": "number",
"inStock": "boolean",
"features": ["string"]
}
Response
{
"success": true,
"data": {
"stories": [
{ "title": "Show HN: I built a web scraper", "points": 342, "url": "https://..." },
{ "title": "Ask HN: Best tools for AI pipelines?", "points": 218, "url": "https://..." }
]
}
}
Use cases
- Extract product prices, inventory, and descriptions from e-commerce pages
- Pull contact info, bios, or job listings from company pages
- Summarize article metadata (author, date, tags, word count)
- Parse event details (name, date, location, ticket price)
Errors
| Code | Reason |
|---|---|
401 | Invalid API key |
402 | Insufficient credits |
429 | Rate limit exceeded |
400 | Missing or invalid url, neither schema nor prompt provided, or the target domain doesn't exist |
504 | Extraction timed out — try again |
502 | The target site refused the connection, blocked automated access, or had an SSL error |
500 | Extraction failed for another reason |
WebGlean