POST /v1/batch/scrape · GET /v1/batch/scrape/:id

Submits many URLs in a single request. Unlike a single scrape, a batch runs asynchronously: the POST returns immediately with a batch ID, and you poll that ID for results.

Cost: 1 credit per URL, deducted upfront

Start a batch

curl -X POST https://www.webglean.com/v1/batch/scrape \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://example.org"],
    "format": "markdown"
  }'

You can also pass items instead of urls if you want to attach your own ID to each result (useful for matching results back to rows in your own database):

{
  "items": [
    { "id": "product-1", "url": "https://example.com/products/1" },
    { "id": "product-2", "url": "https://example.com/products/2" }
  ]
}

If both are provided, items takes precedence.
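As an illustration of matching results back to your own rows, the following minimal sketch builds an items body from (row ID, URL) pairs. The function name build_batch_body and the "product-" ID prefix are hypothetical and chosen only for this example; the body keys (items, id, url, format) are the ones documented here.

```python
import json


def build_batch_body(rows, fmt="markdown"):
    """Build a batch request body that tags each URL with a caller-supplied ID.

    `rows` is a hypothetical iterable of (row_id, url) pairs, for example
    the result of a database query.
    """
    items = [{"id": f"product-{row_id}", "url": url} for row_id, url in rows]
    return {"items": items, "format": fmt}


rows = [
    (1, "https://example.com/products/1"),
    (2, "https://example.com/products/2"),
]
body = build_batch_body(rows)
print(json.dumps(body, indent=2))
```

Because each result echoes the id you supplied, you can look up the original row without comparing URLs.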

Body parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string[] | - | URLs to scrape. Required unless items is provided. |
| items | {id?, url}[] | - | URLs with an optional caller-supplied id, returned alongside each result. Takes precedence over urls. |
| format | string | "markdown" | Output format for every item: markdown, html, text, json |
| onlyMainContent | boolean | true | Strip nav, ads, footers, sidebars |

Max batch size is 500 URLs per request.
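If you have more than 500 URLs, one option is to split them into several batches on the client side. The sketch below assumes you submit each chunk as its own batch request; the helper name chunk_urls is hypothetical.

```python
MAX_BATCH_SIZE = 500  # documented limit per request


def chunk_urls(urls, max_batch_size=MAX_BATCH_SIZE):
    """Split a list of URLs into consecutive chunks of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        urls[start:start + max_batch_size]
        for start in range(0, len(urls), max_batch_size)
    ]


urls = [f"https://example.com/page/{n}" for n in range(1200)]
batches = chunk_urls(urls)
print([len(batch) for batch in batches])  # [500, 500, 200]
```

At 1 credit per URL, the 1200 URLs above would cost 1200 credits in total across the three batches.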

Response

{
  "success": true,
  "id": "8f14e45f-ceea-467e-bd7c-9f42c8f5a1b2",
  "total": 2
}

The response is returned with HTTP status 202 (Accepted): the batch is still processing when you receive it.
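The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the function names make_batch_request and start_batch are hypothetical, and the code assumes the 202 response body has the shape shown above.

```python
import json
import urllib.request

API_URL = "https://www.webglean.com/v1/batch/scrape"


def make_batch_request(api_key, urls, fmt="markdown"):
    """Build (but do not send) the POST request that starts a batch."""
    body = json.dumps({"urls": urls, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_batch(api_key, urls, fmt="markdown"):
    """Send the request and return the batch ID from the 202 response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so only success statuses reach the status check below.
    """
    request = make_batch_request(api_key, urls, fmt)
    with urllib.request.urlopen(request) as response:
        if response.status != 202:
            raise RuntimeError(f"Unexpected status {response.status}")
        return json.load(response)["id"]
```

Calling start_batch("wg_your_key", ["https://example.com"]) would return the batch ID to use when polling.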

Poll for results

curl https://www.webglean.com/v1/batch/scrape/8f14e45f-ceea-467e-bd7c-9f42c8f5a1b2 \
  -H "Authorization: Bearer wg_your_key"

Response

{
  "success": true,
  "status": "done",
  "total": 2,
  "completed": 1,
  "failed": 1,
  "creditsUsed": 2,
  "results": [
    {
      "id": "product-1",
      "url": "https://example.com/products/1",
      "status": "done",
      "data": {
        "markdown": "# Product 1\n\n...",
        "html": "<h1>Product 1</h1>...",
        "text": "Product 1\n\n...",
        "metadata": { "title": "Product 1", "statusCode": 200 }
      }
    },
    {
      "id": "product-2",
      "url": "https://example.com/products/2",
      "status": "failed",
      "error": "Page failed to load"
    }
  ]
}

id is only present on a result if you supplied one via items. data is only present when status is done; error is only present when status is failed.
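Given those rules, a finished batch can be separated into successes and failures with a few lines of code. The sketch below is one way to do it; split_results is a hypothetical helper, and it falls back to the URL as the key when no id was supplied.

```python
def split_results(batch):
    """Separate a batch response into {key: data} and {key: error} dicts.

    The key is the caller-supplied id when present, otherwise the URL.
    """
    succeeded, failed = {}, {}
    for result in batch.get("results", []):
        key = result.get("id", result["url"])
        if result["status"] == "done":
            succeeded[key] = result["data"]
        elif result["status"] == "failed":
            failed[key] = result["error"]
    return succeeded, failed


# Trimmed version of the example response above.
batch = {
    "status": "done",
    "results": [
        {
            "id": "product-1",
            "url": "https://example.com/products/1",
            "status": "done",
            "data": {"markdown": "# Product 1\n\n..."},
        },
        {
            "id": "product-2",
            "url": "https://example.com/products/2",
            "status": "failed",
            "error": "Page failed to load",
        },
    ],
}
succeeded, failed = split_results(batch)
print(sorted(succeeded))  # ['product-1']
print(failed)             # {'product-2': 'Page failed to load'}
```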

Status values

| Status | Meaning |
|---|---|
| pending | Batch is queued |
| processing | Some items still running |
| done | All items finished (check each result's status for per-item success/failure) |
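A polling loop follows directly from these statuses: keep fetching until the batch reports done. The sketch below is an assumption-laden illustration rather than a recommended client. wait_for_batch is a hypothetical helper, the 2-second interval and 300-second timeout are arbitrary defaults, and fetch_status stands in for whatever function performs the GET and returns the parsed JSON.

```python
import time


def wait_for_batch(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until the batch status is "done" or the timeout passes.

    fetch_status is any zero-argument callable returning the parsed poll response.
    """
    deadline = time.monotonic() + timeout
    while True:
        batch = fetch_status()
        if batch["status"] == "done":
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Batch still {batch['status']} after {timeout}s")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "done", "total": 2, "completed": 1, "failed": 1},
])
final = wait_for_batch(lambda: next(canned), interval=0)
print(final["status"])  # done
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library and makes it easy to test.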

Errors

| Code | Reason |
|---|---|
| 401 | Invalid API key |
| 400 | Invalid JSON, missing both urls and items, batch exceeds 500 items, or an item URL is invalid |
| 402 | Insufficient credits for the full batch |
| 429 | Rate limit exceeded |
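Client code can map these status codes to an action. The sketch below is illustrative only: classify_error is a hypothetical helper, and treating 429 as the only retryable code is an assumption of this example, not documented API behavior.

```python
ERROR_REASONS = {
    400: "Invalid request body (bad JSON, no urls/items, too many items, or invalid URL)",
    401: "Invalid API key",
    402: "Insufficient credits for the full batch",
    429: "Rate limit exceeded",
}

# Assumption: only rate-limit errors are worth retrying without changing the request.
RETRYABLE_CODES = {429}


def classify_error(status_code):
    """Return (reason, should_retry) for an HTTP error status code."""
    reason = ERROR_REASONS.get(status_code, f"Unexpected status {status_code}")
    return reason, status_code in RETRYABLE_CODES


print(classify_error(429))  # ('Rate limit exceeded', True)
print(classify_error(402))  # ('Insufficient credits for the full batch', False)
```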