POST /v1/crawl · GET /v1/crawl/:id

Starts a breadth-first crawl from the given URL, following internal links up to the specified depth and page count. Returns a job ID immediately — poll to get results.

Cost: 1 credit per page crawled

Start a crawl

curl -X POST https://www.webglean.com/v1/crawl \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxDepth": 2,
    "maxPages": 20
  }'

Body parameters

ParameterTypeDefaultDescription
urlstringrequiredThe root URL to start crawling from
maxDepthnumber2Max BFS depth (1 = root page only, 2 = root + linked pages)
maxPagesnumber10Maximum number of pages to crawl

Response

{
  "success": true,
  "id": "550e8400-e29b-41d4-a716-446655440000"
}

Poll for results

curl https://www.webglean.com/v1/crawl/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer wg_your_key"

Response

{
  "success": true,
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://docs.example.com",
    "status": "done",
    "pagesCrawled": 12,
    "maxPages": 20,
    "creditsUsed": 12,
    "createdAt": "2025-06-01T12:00:00Z",
    "completedAt": "2025-06-01T12:00:42Z",
    "pages": [
      {
        "url": "https://docs.example.com",
        "markdown": "# Welcome\n\n...",
        "html": "<h1>Welcome</h1>...",
        "text": "Welcome\n\n...",
        "metadata": { "title": "Welcome", "statusCode": 200 }
      },
      {
        "url": "https://docs.example.com/quickstart",
        "markdown": "# Quickstart\n\n...",
        "html": "<h1>Quickstart</h1>...",
        "text": "Quickstart\n\n...",
        "metadata": { "title": "Quickstart", "statusCode": 200 }
      }
    ]
  }
}

pages only includes pages that finished successfully — a page that failed mid-crawl is simply omitted rather than included with an error.

Status values

StatusMeaning
pendingJob is queued
processingCrawl in progress
doneCrawl finished — see pagesCrawled vs maxPages for how far it got
failedCrawl encountered a fatal error

Notes

  • Credits are charged per page as it's crawled, not upfront for the whole job — creditsUsed grows as the crawl progresses.
  • If you run out of credits mid-crawl, the crawl stops early rather than failing. The job is marked done with whatever pages it completed before credits ran out.