WebGlean

Official client libraries for Node.js and Python. Both wrap every endpoint (scrape, crawl, extract, map, search, monitor, batch scrape), throw a typed error on any non-2xx response, and include polling helpers for the two async endpoints (crawl, batch/scrape).

Installation

npm install webglean       # Node.js — requires Node 18+
pip install webglean       # Python — requires Python 3.9+

Package pages: webglean on npm · webglean on PyPI

Node.js

import { WebGlean } from "webglean";

// Falls back to the WEBGLEAN_API_KEY env var if apiKey isn't passed.
const client = new WebGlean({ apiKey: process.env.WEBGLEAN_API_KEY });

const { markdown } = await client.scrape({ url: "https://example.com" });
console.log(markdown);

Methods

| Method | Endpoint |
| --- | --- |
| `scrape(params)` | `POST /v1/scrape` |
| `crawl(params)` / `getCrawl(id)` / `crawlAndWait(id, opts?)` | `POST /v1/crawl`, `GET /v1/crawl/:id` |
| `extract<T>(params)` | `POST /v1/extract` |
| `map(params)` | `POST /v1/map` |
| `search(params)` | `POST /v1/search` |
| `createMonitor(params)` / `listMonitors()` / `getMonitor(id)` / `deleteMonitor(id)` | `POST` / `GET` / `GET` / `DELETE /v1/monitor` |
| `batchScrape(params)` / `getBatch(id)` / `batchScrapeAndWait(id, opts?)` | `POST /v1/batch/scrape`, `GET /v1/batch/scrape/:id` |

Crawling and waiting for the result

const { id } = await client.crawl({ url: "https://example.com", maxDepth: 2, maxPages: 20 });

// Polls GET /v1/crawl/:id every 2s (configurable) until status is "done" or "failed".
const result = await client.crawlAndWait(id, { pollIntervalMs: 2000, timeoutMs: 10 * 60_000 });

for (const page of result.pages) {
  console.log(page.url, page.markdown.slice(0, 80));
}

batchScrapeAndWait(id, opts?) works the same way for jobs started with POST /v1/batch/scrape, polling GET /v1/batch/scrape/:id.

Errors

Any non-2xx response throws a WebGleanError whose .status is the HTTP status code and whose .message is taken from the API's error body:

import { WebGlean, WebGleanError } from "webglean";

try {
  await client.scrape({ url: "https://example.com" });
} catch (err) {
  if (err instanceof WebGleanError) {
    console.error(err.status, err.message); // e.g. 402 "Insufficient credits"
  } else {
    throw err; // not an API error: don't swallow it
  }
}

Python

from webglean import WebGlean

# Falls back to the WEBGLEAN_API_KEY env var if api_key isn't passed.
client = WebGlean(api_key="wg_your_key")

result = client.scrape("https://example.com")
print(result["markdown"])
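The fallback described in the comment above amounts to "explicit key first, then the environment". A minimal sketch of that resolution order, using a hypothetical `resolve_api_key` helper that is not part of the SDK. Raising `ValueError` when neither source provides a key is an assumption about what a client might do, not documented behavior:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(api_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: prefer the explicit key, else WEBGLEAN_API_KEY.

    Mirrors the documented fallback order only; it is not the SDK's code.
    Raising on a missing key is an assumption.
    """
    key = api_key or env.get("WEBGLEAN_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set WEBGLEAN_API_KEY")
    return key


print(resolve_api_key("wg_explicit", env={}))                     # wg_explicit
print(resolve_api_key(env={"WEBGLEAN_API_KEY": "wg_from_env"}))  # wg_from_env
```

Passing `env` explicitly keeps the demo independent of whatever is set in the real environment.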

Methods

| Method | Endpoint |
| --- | --- |
| `scrape(url, format=, only_main_content=)` | `POST /v1/scrape` |
| `crawl(url, ...)` / `get_crawl(id)` / `crawl_and_wait(id, ...)` | `POST /v1/crawl`, `GET /v1/crawl/:id` |
| `extract(url, schema=, prompt=)` | `POST /v1/extract` |
| `map(url, max_urls=, search=)` | `POST /v1/map` |
| `search(query, num_results=, country=, lang=)` | `POST /v1/search` |
| `create_monitor(url, ...)` / `list_monitors()` / `get_monitor(id)` / `delete_monitor(id)` | `POST` / `GET` / `GET` / `DELETE /v1/monitor` |
| `batch_scrape(urls=, items=, ...)` / `get_batch(id)` / `batch_scrape_and_wait(id, ...)` | `POST /v1/batch/scrape`, `GET /v1/batch/scrape/:id` |

Crawling and waiting for the result

crawl_id = client.crawl("https://example.com", max_depth=2, max_pages=20)

# Polls GET /v1/crawl/:id every 2s (configurable) until status is "done" or "failed".
result = client.crawl_and_wait(crawl_id, poll_interval=2.0, timeout=600.0)

for page in result["pages"]:
    print(page["url"], page["markdown"][:80])

batch_scrape_and_wait(batch_id, ...) works the same way for jobs started with POST /v1/batch/scrape, polling GET /v1/batch/scrape/:id.
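The waiting helpers boil down to a poll loop: fetch the job, stop on a terminal status, otherwise sleep and retry until a deadline. A minimal sketch under stated assumptions: `poll_until_done`, `PollTimeoutError` and `FakeClock` are hypothetical names, not SDK APIs. `fetch_job` stands in for one GET request, and `PollTimeoutError` mimics the documented WebGleanError with status 0 on a client-side timeout. The sketch simply returns the final job body for both "done" and "failed":

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = ("done", "failed")


class PollTimeoutError(Exception):
    """Stand-in for WebGleanError with status 0: no HTTP response, client-side timeout."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.status = 0


def poll_until_done(
    fetch_job: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_job() until its "status" is terminal or the timeout expires.

    fetch_job stands in for one GET /v1/crawl/:id (or GET /v1/batch/scrape/:id).
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job  # both "done" and "failed" are returned to the caller as-is
        remaining = deadline - clock()
        if remaining <= 0:
            raise PollTimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(min(poll_interval, remaining))  # never sleep past the deadline


class FakeClock:
    """Deterministic clock for the demo: sleep() advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Simulated job: "running" twice, then "done". No real waiting happens.
clock = FakeClock()
statuses = iter(["running", "running", "done"])
job = poll_until_done(lambda: {"status": next(statuses)},
                      poll_interval=2.0, timeout=10.0,
                      clock=clock.time, sleep=clock.sleep)
print(job["status"], clock.now)  # done 4.0
```

Injecting `clock` and `sleep` is a design choice for testability. It lets the timeout path be exercised without real waiting, and `time.monotonic` keeps the deadline immune to wall-clock changes.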

The client can also be used as a context manager, which closes its underlying HTTP connection pool when the with block exits:

with WebGlean(api_key="...") as client:
    result = client.scrape("https://example.com")

Errors

Any non-2xx response raises a WebGleanError whose .status is the HTTP status code and whose message (str(err)) is taken from the API's error body:

from webglean import WebGlean, WebGleanError

try:
    client.scrape("https://example.com")
except WebGleanError as err:
    print(err.status, str(err))  # e.g. 402 "Insufficient credits"
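The rule "any non-2xx response raises a typed error" can be sketched as a single check on each response. This is a minimal illustration, not the SDK's implementation. `ApiError` and `check_response` are hypothetical names. The `{"error": "..."}` error-body shape is an assumption, since the actual body format isn't documented here:

```python
import json
from typing import Any


class ApiError(Exception):
    """Hypothetical stand-in for WebGleanError: HTTP status plus a message."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)  # so str(err) returns the message
        self.status = status


def check_response(status: int, body: str) -> Any:
    """Return the parsed JSON body on 2xx; raise ApiError on anything else.

    Assumption: error bodies look like {"error": "..."}; anything else falls back
    to a generic "HTTP <status>" message.
    """
    if 200 <= status < 300:
        return json.loads(body)
    try:
        message = json.loads(body).get("error") or f"HTTP {status}"
    except (ValueError, AttributeError):  # non-JSON body, or JSON that isn't an object
        message = f"HTTP {status}"
    raise ApiError(status, message)


try:
    check_response(402, '{"error": "Insufficient credits"}')
except ApiError as err:
    print(err.status, str(err))  # 402 Insufficient credits
```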

Notes shared by both SDKs

crawlAndWait / crawl_and_wait and batchScrapeAndWait / batch_scrape_and_wait throw (Node) or raise (Python) a WebGleanError with status 0 if timeoutMs / timeout is exceeded. Status 0 means there was no HTTP response: the timeout is enforced on the client side.
Unlike every other endpoint, map() does not wrap its response in data. Both SDKs handle this for you and return { links, total } directly.
search() can return an individual result with markdown: null and error set when that specific page failed to scrape, even though the overall search call succeeded. Check error on each item rather than assuming every result has markdown.
The /v1/batch/scrape endpoint does not auto-prepend https:// to bare-domain URLs the way every other endpoint does, so pass full URLs (https://example.com, not example.com) when calling batchScrape / batch_scrape directly. The CLI and MCP server both normalize this automatically.
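The last two notes translate into small defensive steps when you call the SDKs directly. This is a minimal sketch under stated assumptions. `normalize_url` and `split_search_results` are hypothetical helpers, not the CLI's or MCP server's actual normalization, and `normalize_url` only handles the missing-scheme case without validating URLs. The item dicts mimic the documented `markdown` / `error` fields, and the `url` key is an assumption. The normalized list is what you would pass as `urls=` to `batch_scrape`:

```python
from typing import Any, Dict, Iterable, List, Tuple


def normalize_url(url: str) -> str:
    """Prepend https:// to a URL that has no scheme, e.g. a bare domain."""
    url = url.strip()
    return url if "://" in url else f"https://{url}"


def split_search_results(
    items: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split search items into (scraped, failed) instead of assuming markdown exists."""
    scraped: List[Dict[str, Any]] = []
    failed: List[Dict[str, Any]] = []
    for item in items:
        # A page that failed to scrape has error set and markdown None.
        if item.get("error") or item.get("markdown") is None:
            failed.append(item)
        else:
            scraped.append(item)
    return scraped, failed


urls = [normalize_url(u) for u in ["example.com", "https://example.org/docs", " docs.example.net "]]
print(urls)  # ['https://example.com', 'https://example.org/docs', 'https://docs.example.net']

items = [
    {"url": "https://example.com", "markdown": "# Example", "error": None},
    {"url": "https://example.org", "markdown": None, "error": "Timeout"},
]
scraped, failed = split_search_results(items)
print(len(scraped), len(failed))  # 1 1
```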