WebGlean

POST /v1/crawl · GET /v1/crawl/:id

Starts a breadth-first crawl from the given URL, following internal links up to the specified depth and page count. Returns a job ID immediately — poll to get results.

Cost: 1 credit per page crawled

Start a crawl

curl -X POST https://www.webglean.com/v1/crawl \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxDepth": 2,
    "maxPages": 20
  }'

Body parameters

Parameter	Type	Default	Description
`url`	string	required	The root URL to start crawling from
`maxDepth`	number	`2`	Max BFS depth (1 = root page only, 2 = root + linked pages)
`maxPages`	number	`10`	Maximum number of pages to crawl

Response

{
  "success": true,
  "id": "550e8400-e29b-41d4-a716-446655440000"
}

Poll for results

curl https://www.webglean.com/v1/crawl/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer wg_your_key"

Response

{
  "success": true,
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://docs.example.com",
    "status": "done",
    "pagesCrawled": 12,
    "maxPages": 20,
    "creditsUsed": 12,
    "createdAt": "2025-06-01T12:00:00Z",
    "completedAt": "2025-06-01T12:00:42Z",
    "pages": [
      {
        "url": "https://docs.example.com",
        "markdown": "# Welcome\n\n...",
        "html": "<h1>Welcome</h1>...",
        "text": "Welcome\n\n...",
        "metadata": { "title": "Welcome", "statusCode": 200 }
      },
      {
        "url": "https://docs.example.com/quickstart",
        "markdown": "# Quickstart\n\n...",
        "html": "<h1>Quickstart</h1>...",
        "text": "Quickstart\n\n...",
        "metadata": { "title": "Quickstart", "statusCode": 200 }
      }
    ]
  }
}

pages only includes pages that finished successfully — a page that failed mid-crawl is simply omitted rather than included with an error.

Status values

Status	Meaning
`pending`	Job is queued
`processing`	Crawl in progress
`done`	Crawl finished — see `pagesCrawled` vs `maxPages` for how far it got
`failed`	Crawl encountered a fatal error

Notes

Credits are charged per page as it's crawled, not upfront for the whole job — creditsUsed grows as the crawl progresses.
If you run out of credits mid-crawl, the crawl stops early rather than failing. The job is marked done with whatever pages it completed before credits ran out.