POST /v1/batch/scrape · GET /v1/batch/scrape/:id
Submits many URLs at once. Unlike a single scrape, a batch runs asynchronously — the POST returns immediately with a batch ID, and you poll for results.
Cost: 1 credit per URL, deducted upfront
Start a batch
curl -X POST https://www.webglean.com/v1/batch/scrape \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://example.org"],
    "format": "markdown"
  }'
You can also pass items instead of urls if you want to attach your own ID to each result (useful for matching results back to rows in your own database):
{
  "items": [
    { "id": "product-1", "url": "https://example.com/products/1" },
    { "id": "product-2", "url": "https://example.com/products/2" }
  ]
}
If both are provided, items takes precedence.
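As a rough illustration of the items form, the sketch below builds a request body from rows you might pull out of your own database. The helper name `build_batch_body` and the `(row_id, url)` row shape are hypothetical, and converting each id to a string is an assumption based on the string ids in the example above.

```python
import json


def build_batch_body(rows, fmt="markdown"):
    """Build a batch request body that tags each URL with a caller-supplied id.

    `rows` is an iterable of (row_id, url) pairs (a hypothetical shape).
    Ids are converted to strings, which is an assumption, not a documented rule.
    """
    items = [{"id": str(row_id), "url": url} for row_id, url in rows]
    return {"items": items, "format": fmt}


body = build_batch_body([
    ("product-1", "https://example.com/products/1"),
    ("product-2", "https://example.com/products/2"),
])
print(json.dumps(body, indent=2))
```

Because each result carries the id back, you can look up the originating row without comparing URLs.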
Body parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string[] | — | URLs to scrape. Required unless items is provided. |
| items | {id?, url}[] | — | URLs with an optional caller-supplied id, returned alongside each result. Takes precedence over urls. |
| format | string | "markdown" | Output format for every item: markdown, html, text, json |
| onlyMainContent | boolean | true | Strip nav, ads, footers, sidebars |
Max batch size is 500 URLs per request.
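If you have more than 500 URLs, one option is to split them client-side and submit one batch per slice. This is a minimal sketch; the helper name `chunk_urls` is hypothetical, and the only value taken from this page is the 500-URL limit.

```python
MAX_BATCH_SIZE = 500  # per-request limit stated above


def chunk_urls(urls, max_batch_size=MAX_BATCH_SIZE):
    """Yield consecutive slices of `urls`, each at most `max_batch_size` long."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(urls), max_batch_size):
        yield urls[start:start + max_batch_size]


urls = [f"https://example.com/page/{n}" for n in range(1200)]
print([len(chunk) for chunk in chunk_urls(urls)])
```

Each slice becomes the urls array of its own POST, so you end up with one batch ID per slice to poll.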
Response
{
  "success": true,
  "id": "8f14e45f-ceea-467e-bd7c-9f42c8f5a1b2",
  "total": 2
}
Returned with status 202 — the batch is still processing when you get this response.
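For reference, here is roughly what the same submission could look like from Python using only the standard library. The function names are hypothetical; the URL, headers, and body fields mirror the curl example above. Error handling is omitted: `urllib` raises `HTTPError` for non-2xx responses.

```python
import json
import urllib.request

BATCH_URL = "https://www.webglean.com/v1/batch/scrape"


def build_submit_request(api_key, body):
    """Build (but do not send) the POST request that starts a batch."""
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_batch(api_key, body):
    """Send the request and return the parsed JSON response (includes the batch id)."""
    with urllib.request.urlopen(build_submit_request(api_key, body)) as resp:
        return json.load(resp)
```

Calling `submit_batch("wg_your_key", {"urls": ["https://example.com"]})["id"]` would then give you the batch ID to poll with.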
Poll for results
curl https://www.webglean.com/v1/batch/scrape/8f14e45f-ceea-467e-bd7c-9f42c8f5a1b2 \
  -H "Authorization: Bearer wg_your_key"
Response
{
  "success": true,
  "status": "done",
  "total": 2,
  "completed": 1,
  "failed": 1,
  "creditsUsed": 2,
  "results": [
    {
      "id": "product-1",
      "url": "https://example.com/products/1",
      "status": "done",
      "data": {
        "markdown": "# Product 1\n\n...",
        "html": "<h1>Product 1</h1>...",
        "text": "Product 1\n\n...",
        "metadata": { "title": "Product 1", "statusCode": 200 }
      }
    },
    {
      "id": "product-2",
      "url": "https://example.com/products/2",
      "status": "failed",
      "error": "Page failed to load"
    }
  ]
}
id is only present on a result if you supplied one via items. data is only present when status is done; error is only present when status is failed.
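One way to consume this response is to split the results into successes and failures. The sketch below is an assumption about how you might do that, not part of the API: `split_results` is a hypothetical helper, it falls back to the URL as the key when no id was supplied, and it ignores any per-item status other than done or failed.

```python
def split_results(batch):
    """Return (succeeded, failed) dicts keyed by the item's id, or its URL if no id."""
    succeeded, failed = {}, {}
    for result in batch.get("results", []):
        key = result.get("id", result["url"])
        if result["status"] == "done":
            succeeded[key] = result["data"]    # data is only present when done
        elif result["status"] == "failed":
            failed[key] = result["error"]      # error is only present when failed
    return succeeded, failed


batch = {
    "status": "done",
    "results": [
        {"id": "product-1", "url": "https://example.com/products/1",
         "status": "done", "data": {"markdown": "# Product 1"}},
        {"id": "product-2", "url": "https://example.com/products/2",
         "status": "failed", "error": "Page failed to load"},
    ],
}
succeeded, failed = split_results(batch)
print(sorted(succeeded), failed)
```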
Status values
| Status | Meaning |
|---|---|
| pending | Batch is queued |
| processing | Some items still running |
| done | All items finished (check each result's status for per-item success/failure) |
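A polling loop built on these status values might look like the sketch below. The name `wait_for_batch`, the 2-second interval, and the 5-minute timeout are all assumptions for illustration; this page does not recommend a polling frequency. `fetch_status` is any callable you provide that performs the GET and returns the parsed JSON.

```python
import time


def wait_for_batch(fetch_status, interval=2.0, timeout=300.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until the batch status is "done", then return the payload.

    Raises TimeoutError if the batch is still pending or processing after
    `timeout` seconds. `sleep` and `clock` are injectable to make testing easy.
    """
    deadline = clock() + timeout
    while True:
        batch = fetch_status()
        if batch["status"] == "done":
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch still {batch['status']!r} after {timeout}s")
        sleep(interval)


# Simulated responses standing in for real GET requests.
responses = iter([{"status": "pending"}, {"status": "processing"}, {"status": "done"}])
final = wait_for_batch(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

A batch-level done only means every item has finished, so you still need to check each result's own status afterwards.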
Errors
| Code | Reason |
|---|---|
| 401 | Invalid API key |
| 400 | Invalid JSON, missing both urls and items, batch exceeds 500 items, or an item URL is invalid |
| 402 | Insufficient credits for the full batch |
| 429 | Rate limit exceeded |
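A client could turn these codes into readable messages and a retry decision along the following lines. The reasons are copied from the table above. Treating only 429 as retryable is an assumption based on common HTTP practice, not something this page specifies, and both helper names are hypothetical.

```python
ERROR_REASONS = {
    400: "Invalid JSON, missing both urls and items, batch exceeds 500 items, "
         "or an item URL is invalid",
    401: "Invalid API key",
    402: "Insufficient credits for the full batch",
    429: "Rate limit exceeded",
}


def describe_error(status_code):
    """Return the documented reason for a status code, or a generic fallback."""
    return ERROR_REASONS.get(status_code, f"Unexpected HTTP status {status_code}")


def should_retry(status_code):
    """Assumption: only rate-limit errors are worth retrying without changing the request."""
    return status_code == 429


for code in (401, 402, 429):
    print(code, describe_error(code), "retry" if should_retry(code) else "do not retry")
```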