POST /v1/scrape

Renders the page with a full Playwright browser (JavaScript enabled), cleans content with Mozilla Readability, and returns it in your chosen format.

Cost: 1 credit per request

Request

curl -X POST https://www.webglean.com/v1/scrape \
  -H "Authorization: Bearer wg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'
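
The same request can be sketched in Python with only the standard library. This is an illustration rather than an official client: the helper names `build_scrape_request` and `scrape` and the 60-second timeout are assumptions, while the endpoint, headers, and body fields come from the curl example.

```python
import json
import urllib.request

API_URL = "https://www.webglean.com/v1/scrape"  # endpoint from the curl example


def build_scrape_request(api_key: str, url: str, fmt: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": url, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(api_key: str, url: str, fmt: str = "markdown", timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body.

    The timeout value is an assumption, not a documented limit.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = build_scrape_request(api_key, url, fmt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without spending a credit.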

Body parameters

Parameter        Type     Default     Description
url              string   required    The URL to scrape
format           string   "markdown"  Output format: markdown, html, text, json
onlyMainContent  boolean  true        Strip nav, ads, footers, sidebars
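
A request body can be checked locally before it is sent, so an obviously invalid request is caught before it reaches the API. The sketch below is a minimal example: `build_payload` is a hypothetical helper, and the local validation is an assumption about what is useful to check, not a description of what the server does.

```python
ALLOWED_FORMATS = ("markdown", "html", "text", "json")  # values listed for "format"


def build_payload(url: str, fmt: str = "markdown", only_main_content: bool = True) -> dict:
    """Return a request body using the documented parameter names and defaults."""
    if not url:
        raise ValueError("url is required")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    return {
        "url": url,
        "format": fmt,
        "onlyMainContent": only_main_content,
    }
```

For instance, `build_payload("https://example.com", fmt="text", only_main_content=False)` yields a body that keeps navigation and footers and asks for plain text.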

Formats

  • markdown — Clean Markdown, ideal for LLMs and RAG pipelines
  • html — Cleaned HTML with scripts/styles removed
  • text — Plain text, no markup
  • json — Structured metadata: title, description, links, word count

Response

{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "html": "<h1>Example Domain</h1><p>This domain is for use...</p>",
    "text": "Example Domain\n\nThis domain is for use...",
    "metadata": {
      "title": "Example Domain",
      "description": null,
      "url": "https://example.com",
      "statusCode": 200
    }
  }
}
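
Reading the response might look like the sketch below. `extract_content` is a hypothetical helper, and it assumes the content key under `data` matches the format name, which holds for `markdown`, `html`, and `text` in the example above; the key used for the `json` format is not shown here, so it is not handled.

```python
import json

# Trimmed copy of the example response above.
SAMPLE = json.loads("""
{
  "success": true,
  "data": {
    "markdown": "# Example Domain",
    "metadata": {
      "title": "Example Domain",
      "description": null,
      "url": "https://example.com",
      "statusCode": 200
    }
  }
}
""")


def extract_content(response: dict, fmt: str = "markdown") -> tuple:
    """Return (content, metadata) from a decoded response body."""
    if not response.get("success"):
        raise ValueError("response does not report success")
    data = response.get("data") or {}
    if fmt not in data:
        raise KeyError(f"no {fmt!r} field in response data")
    return data[fmt], data.get("metadata", {})


content, metadata = extract_content(SAMPLE, "markdown")
print(metadata["title"], metadata["statusCode"])
```

Note that `description` can be `null`, so callers should not assume every metadata field is a string.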

Errors

Code  Reason
401   Invalid API key
402   Insufficient credits
429   Rate limit exceeded
400   Missing/invalid url, or the target domain doesn't exist
408   Scrape timed out; try again
502   The target site refused the connection, blocked automated access, or had an SSL error
500   Scrape failed for another reason
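
One way to act on these codes in a client is sketched below. The reasons are copied from the table; the choice of which codes to retry (408 and 429 only), the limit of three attempts, and the exponential backoff are assumptions for illustration, since the table only suggests retrying for 408.

```python
ERROR_REASONS = {
    400: "Missing/invalid url, or the target domain doesn't exist",
    401: "Invalid API key",
    402: "Insufficient credits",
    408: "Scrape timed out",
    429: "Rate limit exceeded",
    500: "Scrape failed for another reason",
    502: "The target site refused the connection, blocked automated access, or had an SSL error",
}

# Assumption: only timeouts and rate limits are treated as transient.
RETRYABLE = frozenset({408, 429})


def describe_error(status: int) -> str:
    """Map an HTTP status code to the documented reason."""
    return ERROR_REASONS.get(status, f"Unexpected status {status}")


def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether to retry; attempt is 0 for the first try."""
    return status in RETRYABLE and attempt + 1 < max_attempts


def backoff_seconds(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ..."""
    return base * (2 ** attempt)
```

With `urllib`, the status code is available as `err.code` on the `urllib.error.HTTPError` raised for non-2xx responses, so a retry loop can pass it to `should_retry` and sleep for `backoff_seconds(attempt)` between attempts.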