POST /v1/crawl · GET /v1/crawl/:id
Starts a breadth-first crawl from the given URL, following internal links up to the specified depth and page count. Returns a job ID immediately; poll GET /v1/crawl/:id with that ID to get the results.
Cost: 1 credit per page crawled
Start a crawl
curl -X POST https://www.webglean.com/v1/crawl \
-H "Authorization: Bearer wg_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"maxDepth": 2,
"maxPages": 20
}'
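The same request can be built from Python. The sketch below uses only the standard library and constructs the request without sending it. `build_crawl_request` and `API_BASE` are hypothetical names chosen for illustration; the endpoint, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://www.webglean.com/v1"  # base URL taken from the curl example


def build_crawl_request(api_key, url, max_depth=2, max_pages=10):
    """Build (but do not send) a POST /v1/crawl request.

    Hypothetical helper: the keyword defaults mirror the documented
    parameter defaults (maxDepth 2, maxPages 10).
    """
    body = json.dumps({"url": url, "maxDepth": max_depth, "maxPages": max_pages})
    return urllib.request.Request(
        f"{API_BASE}/crawl",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same values as the curl example; "wg_your_key" is a placeholder key.
req = build_crawl_request("wg_your_key", "https://docs.example.com", 2, 20)
```

To actually start the crawl, pass `req` to `urllib.request.urlopen` with a real key and parse the JSON body of the reply.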
Body parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | The root URL to start crawling from |
| `maxDepth` | number | 2 | Max BFS depth (1 = root page only, 2 = root + linked pages) |
| `maxPages` | number | 10 | Maximum number of pages to crawl |
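To make the interaction between `maxDepth` and `maxPages` concrete, here is a minimal sketch of a depth- and page-limited breadth-first traversal over an in-memory link graph. It is an illustration of the documented semantics, not the service's actual crawler: `bfs_crawl_order` and the `site` graph are made up, no pages are fetched, and the internal-link filtering a real crawl performs is left out.

```python
from collections import deque


def bfs_crawl_order(links, root, max_depth=2, max_pages=10):
    """Return URLs in the order a depth- and page-limited BFS would visit them.

    `links` maps each URL to the URLs it links to (a stand-in for fetching).
    Depth is counted as in the parameter table: the root page is depth 1.
    """
    visited = [root]
    seen = {root}
    queue = deque([(root, 1)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are kept, but their links are not followed
        for nxt in links.get(url, []):
            if nxt in seen:
                continue  # already scheduled; avoids loops such as /a -> /
            if len(visited) >= max_pages:
                break  # page budget exhausted
            seen.add(nxt)
            visited.append(nxt)
            queue.append((nxt, depth + 1))
    return visited


# Hypothetical site structure used only for this example.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
}

root_only = bfs_crawl_order(site, "/", max_depth=1)           # ["/"]
two_levels = bfs_crawl_order(site, "/", max_depth=2)          # ["/", "/a", "/b"]
capped = bfs_crawl_order(site, "/", max_depth=3, max_pages=4)  # ["/", "/a", "/b", "/a/1"]
```

Whichever limit is reached first ends the traversal, so a deep `maxDepth` with a small `maxPages` still returns at most `maxPages` pages.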
Response
{
"success": true,
"id": "550e8400-e29b-41d4-a716-446655440000"
}
Poll for results
curl https://www.webglean.com/v1/crawl/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer wg_your_key"
Response
{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"url": "https://docs.example.com",
"status": "done",
"pagesCrawled": 12,
"maxPages": 20,
"creditsUsed": 12,
"createdAt": "2025-06-01T12:00:00Z",
"completedAt": "2025-06-01T12:00:42Z",
"pages": [
{
"url": "https://docs.example.com",
"markdown": "# Welcome\n\n...",
"html": "<h1>Welcome</h1>...",
"text": "Welcome\n\n...",
"metadata": { "title": "Welcome", "statusCode": 200 }
},
{
"url": "https://docs.example.com/quickstart",
"markdown": "# Quickstart\n\n...",
"html": "<h1>Quickstart</h1>...",
"text": "Quickstart\n\n...",
"metadata": { "title": "Quickstart", "statusCode": 200 }
}
]
}
}
`pages` includes only the pages that finished successfully. A page that failed mid-crawl is omitted entirely rather than included with an error.
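Because failed pages are dropped silently, a client that cares about specific URLs has to check for them itself. The following sketch shows one way to do that, assuming the response shape shown above and exact string matching on `url` (no normalization of trailing slashes or fragments). `pages_by_url`, `missing_urls`, and the `/faq` URL are hypothetical.

```python
def pages_by_url(job):
    """Index the returned pages by their URL."""
    return {page["url"]: page for page in job["data"]["pages"]}


def missing_urls(job, expected_urls):
    """Return the expected URLs that are absent from `pages`.

    An absent URL either failed mid-crawl or was never reached; the
    response does not say which.
    """
    returned = pages_by_url(job)
    return [url for url in expected_urls if url not in returned]


# Trimmed-down version of the poll response shown above.
job = {
    "success": True,
    "data": {
        "status": "done",
        "pages": [
            {"url": "https://docs.example.com", "markdown": "# Welcome\n\n..."},
            {"url": "https://docs.example.com/quickstart", "markdown": "# Quickstart\n\n..."},
        ],
    },
}

# "/faq" is a made-up URL used only to show the missing case.
missing = missing_urls(job, ["https://docs.example.com", "https://docs.example.com/faq"])
```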
Status values
| Status | Meaning |
|---|---|
| `pending` | Job is queued |
| `processing` | Crawl in progress |
| `done` | Crawl finished; see `pagesCrawled` vs `maxPages` for how far it got |
| `failed` | Crawl encountered a fatal error |
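A polling loop built on these status values might look like the sketch below. It treats `done` and `failed` as terminal and keeps polling while the job is `pending` or `processing`. Several things here are assumptions rather than documented behavior: `wait_for_crawl` is a hypothetical helper, the interval and timeout values are arbitrary, and the code assumes every poll response carries a `data.status` field as in the example response. The HTTP call is injected as `fetch_job` so the loop can be exercised without network access.

```python
import time

TERMINAL_STATUSES = {"done", "failed"}  # per the status table


def wait_for_crawl(fetch_job, job_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return its `data`.

    `fetch_job(job_id)` is a caller-supplied function that performs
    GET /v1/crawl/:id and returns the parsed JSON response.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_job(job_id)["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl {job_id} still {data['status']} after {timeout}s")
        sleep(interval)  # wait before the next poll
```

The caller still needs to inspect the returned `status`, since a `failed` job is returned just like a `done` one.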
Notes
- Credits are charged per page as it's crawled, not upfront for the whole job, so `creditsUsed` grows as the crawl progresses.
- If you run out of credits mid-crawl, the crawl stops early rather than failing. The job is marked `done` with whatever pages it completed before credits ran out.
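Since a crawl that runs out of credits still ends as `done`, the status alone does not reveal an early stop. One rough check is to compare `pagesCrawled` with `maxPages`, as in this sketch. `crawl_shortfall` is a hypothetical helper, and a shortfall is only a hint: the fields shown in the example response do not distinguish running out of credits from the site simply having fewer reachable pages.

```python
def crawl_shortfall(data):
    """Return how many pages short of maxPages a finished crawl ended.

    Returns None while the job is not `done`. `data` is the `data` object
    from a GET /v1/crawl/:id response.
    """
    if data["status"] != "done":
        return None
    return max(data["maxPages"] - data["pagesCrawled"], 0)


# Values from the example poll response above.
shortfall = crawl_shortfall({"status": "done", "pagesCrawled": 12, "maxPages": 20})
```

At 1 credit per page, `maxPages` also serves as an upper bound on what a single job can cost.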