POST /v1/scrape
Renders the page with a full Playwright browser (JavaScript enabled), cleans content with Mozilla Readability, and returns it in your chosen format.
Cost: 1 credit per request
Request
```bash
curl -X POST https://www.webglean.com/v1/scrape \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'
```
Body parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | The URL to scrape |
| `format` | string | `"markdown"` | Output format: `markdown`, `html`, `text`, or `json` |
| `onlyMainContent` | boolean | `true` | Strip navigation, ads, footers, and sidebars |
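A minimal Python sketch of the same request using only the standard library. The helper name `build_scrape_request` is hypothetical and this is not an official SDK; the endpoint, headers, and body fields are taken from the curl example and the parameter table. The client-side check on `format` simply mirrors the four listed values; the server remains the authority on what it accepts.

```python
import json
import urllib.request

API_URL = "https://www.webglean.com/v1/scrape"
# The four output formats listed in the parameter table.
VALID_FORMATS = {"markdown", "html", "text", "json"}


def build_scrape_request(api_key: str, url: str, fmt: str = "markdown",
                         only_main_content: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request.

    Hypothetical helper that mirrors the curl example.
    """
    if not url:
        raise ValueError("url is required")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}")
    body = {"url": url, "format": fmt, "onlyMainContent": only_main_content}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a single `urllib.request.urlopen(req, timeout=60)` call; the 60-second timeout is an arbitrary choice, since this page does not state how long a scrape may take.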
Formats
- `markdown`: Clean Markdown, ideal for LLMs and RAG pipelines
- `html`: Cleaned HTML with scripts and styles removed
- `text`: Plain text, no markup
- `json`: Structured metadata (title, description, links, word count)
Response
```json
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "html": "<h1>Example Domain</h1><p>This domain is for use...</p>",
    "text": "Example Domain\n\nThis domain is for use...",
    "metadata": {
      "title": "Example Domain",
      "description": null,
      "url": "https://example.com",
      "statusCode": 200
    }
  }
}
```
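The sketch below parses a response body shaped like the example and pulls out the requested content plus the metadata. It assumes that each text format's content sits under `data.<format>`, as the `markdown`, `html`, and `text` keys in the example suggest. Where the `json` format's output lands is not shown on this page, so the helper deliberately handles only the three text formats. The `ScrapeResult` type and `parse_scrape_response` name are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ScrapeResult:
    content: str
    metadata: Dict[str, Any]


def parse_scrape_response(raw: str, fmt: str = "markdown") -> ScrapeResult:
    """Extract the content for `fmt` and the metadata block.

    Hypothetical helper; assumes the response layout shown in the example.
    """
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload}")
    if fmt not in ("markdown", "html", "text"):
        raise ValueError("only markdown, html and text are handled here")
    data = payload["data"]
    if fmt not in data:
        raise KeyError(f"response has no {fmt!r} field")
    return ScrapeResult(content=data[fmt], metadata=data.get("metadata", {}))
```

Failing loudly on a missing key, rather than returning an empty string, makes it obvious if the response only carries the format that was requested.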
Errors
| HTTP status | Reason |
|---|---|
| 400 | Missing or invalid `url`, or the target domain doesn't exist |
| 401 | Invalid API key |
| 402 | Insufficient credits |
| 408 | Scrape timed out; try again |
| 429 | Rate limit exceeded |
| 500 | Scrape failed for another reason |
| 502 | The target site refused the connection, blocked automated access, or had an SSL error |
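A sketch of client-side handling for these status codes. It treats 200 as success, retries 408 (which the table says to try again) and 429 with exponential backoff, and treats every other status as final. The backoff schedule, attempt limit, and the injectable `send` callable are assumptions for illustration, not documented API behavior; this page does not say whether a `Retry-After` header is returned, so none is read.

```python
import time
from typing import Callable, Tuple

# Statuses the error table marks as transient (408) or rate-limited (429).
RETRYABLE = {408, 429}

ERROR_REASONS = {
    400: "Missing or invalid url, or the target domain doesn't exist",
    401: "Invalid API key",
    402: "Insufficient credits",
    408: "Scrape timed out",
    429: "Rate limit exceeded",
    500: "Scrape failed for another reason",
    502: "Target refused the connection, blocked automated access, or had an SSL error",
}


class ScrapeError(Exception):
    def __init__(self, status: int):
        self.status = status
        super().__init__(f"{status}: {ERROR_REASONS.get(status, 'Unknown error')}")


def with_retries(send: Callable[[], Tuple[int, str]], max_attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `send` until it succeeds or a non-retryable error occurs.

    `send` is a hypothetical callable returning (status_code, body).
    Between retries it waits base_delay * 2**attempt seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise ScrapeError(status)
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With `urllib`, a non-2xx response raises `urllib.error.HTTPError`, whose `code` attribute carries the status, so a `send` wrapper would catch it and return `(err.code, err.read().decode())`. Note that each retried request may consume another credit.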