Official client libraries for Node.js and Python. Both wrap every endpoint (scrape, crawl, extract, map, search, monitor, batch scrape), raise a typed error (WebGleanError) on any non-2xx response, and include polling helpers for the two asynchronous endpoints (crawl and batch scrape).

Installation

npm install webglean       # Node.js — requires Node 18+
pip install webglean       # Python — requires Python 3.9+

Package pages: webglean on npm · webglean on PyPI

Node.js

import { WebGlean } from "webglean";

// Falls back to the WEBGLEAN_API_KEY env var if apiKey isn't passed.
const client = new WebGlean({ apiKey: process.env.WEBGLEAN_API_KEY });

const { markdown } = await client.scrape({ url: "https://example.com" });
console.log(markdown);

Methods

Method | Endpoint
scrape(params) | POST /v1/scrape
crawl(params) / getCrawl(id) / crawlAndWait(id, opts?) | POST /v1/crawl, GET /v1/crawl/:id
extract<T>(params) | POST /v1/extract
map(params) | POST /v1/map
search(params) | POST /v1/search
createMonitor(params) / listMonitors() / getMonitor(id) / deleteMonitor(id) | POST / GET / GET / DELETE /v1/monitor
batchScrape(params) / getBatch(id) / batchScrapeAndWait(id, opts?) | POST /v1/batch/scrape, GET /v1/batch/scrape/:id

Crawling and waiting for the result

const { id } = await client.crawl({ url: "https://example.com", maxDepth: 2, maxPages: 20 });

// Polls GET /v1/crawl/:id every 2s (configurable) until status is "done" or "failed".
const result = await client.crawlAndWait(id, { pollIntervalMs: 2000, timeoutMs: 10 * 60_000 });

for (const page of result.pages) {
  console.log(page.url, page.markdown.slice(0, 80));
}

batchScrapeAndWait(id, opts?) works the same way for batches started with batchScrape (POST /v1/batch/scrape): it polls GET /v1/batch/scrape/:id.

Errors

Any non-2xx response throws a WebGleanError with .status and .message taken from the API's error body:

import { WebGlean, WebGleanError } from "webglean";

try {
  await client.scrape({ url: "https://example.com" });
} catch (err) {
  if (err instanceof WebGleanError) {
    console.error(err.status, err.message); // e.g. 402 "Insufficient credits"
  }
}

Python

from webglean import WebGlean

# Falls back to the WEBGLEAN_API_KEY env var if api_key isn't passed.
client = WebGlean(api_key="wg_your_key")

result = client.scrape("https://example.com")
print(result["markdown"])

Methods

Method | Endpoint
scrape(url, format=, only_main_content=) | POST /v1/scrape
crawl(url, ...) / get_crawl(id) / crawl_and_wait(id, ...) | POST /v1/crawl, GET /v1/crawl/:id
extract(url, schema=, prompt=) | POST /v1/extract
map(url, max_urls=, search=) | POST /v1/map
search(query, num_results=, country=, lang=) | POST /v1/search
create_monitor(url, ...) / list_monitors() / get_monitor(id) / delete_monitor(id) | POST / GET / GET / DELETE /v1/monitor
batch_scrape(urls=, items=, ...) / get_batch(id) / batch_scrape_and_wait(id, ...) | POST /v1/batch/scrape, GET /v1/batch/scrape/:id

Crawling and waiting for the result

crawl_id = client.crawl("https://example.com", max_depth=2, max_pages=20)

# Polls GET /v1/crawl/:id every 2s (configurable) until status is "done" or "failed".
result = client.crawl_and_wait(crawl_id, poll_interval=2.0, timeout=600.0)

for page in result["pages"]:
    print(page["url"], page["markdown"][:80])

batch_scrape_and_wait(batch_id, ...) works the same way for batches started with batch_scrape (POST /v1/batch/scrape): it polls GET /v1/batch/scrape/:id.
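
For intuition, the wait helpers can be pictured as a plain polling loop. The sketch below is not the SDK's actual implementation: poll_until_done, fetch_status, and PollTimeout are hypothetical names, and it assumes the job is returned as a dict with a "status" key. The only details taken from this page are the 2-second default interval, the terminal statuses "done" and "failed", and the timeout surfacing as an error with status 0.

```python
import time
from typing import Any, Callable, Dict


class PollTimeout(Exception):
    """Hypothetical stand-in for the SDK's status-0 timeout error."""

    status = 0


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        # "done" and "failed" are the two terminal statuses named on this page.
        if job.get("status") in ("done", "failed"):
            return job
        # Give up if the next poll would land past the deadline.
        if clock() + poll_interval > deadline:
            raise PollTimeout(f"job did not finish within {timeout}s")
        sleep(poll_interval)
```

With the real client, fetch_status would presumably be something like lambda: client.get_crawl(crawl_id). The sleep and clock parameters exist only so the loop can be exercised without real waiting.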

The client can also be used as a context manager to close its underlying HTTP connection pool: with WebGlean(api_key="...") as client: ....

Errors

Any non-2xx response raises a WebGleanError with .status set and the message (available via str(err)) taken from the API's error body:

from webglean import WebGlean, WebGleanError

try:
    client.scrape("https://example.com")
except WebGleanError as err:
    print(err.status, str(err))  # e.g. 402 "Insufficient credits"
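
When deciding how to react to a failure, it can help to reduce err.status to a coarse category first. The helper below is an illustrative sketch, not part of the SDK: classify_failure and its category names are hypothetical. The 402 "Insufficient credits" case and the status 0 client-side timeout come from this page; the 4xx/5xx split is ordinary HTTP convention.

```python
def classify_failure(status: int) -> str:
    """Map a WebGleanError-style status code to a coarse category."""
    if status == 0:
        return "timeout"  # client-side timeout, no HTTP response
    if status == 402:
        return "credits"  # e.g. "Insufficient credits"
    if 400 <= status < 500:
        return "client_error"  # the request itself needs fixing
    if 500 <= status < 600:
        return "server_error"  # may be worth retrying later
    return "unknown"
```

Inside an except WebGleanError as err: block, this would be called as classify_failure(err.status).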

Notes shared by both SDKs

  • crawlAndWait / crawl_and_wait and batchScrapeAndWait / batch_scrape_and_wait throw (Node.js) or raise (Python) a WebGleanError with status 0 if timeoutMs / timeout is exceeded. Status 0 means there was no HTTP response: the timeout is enforced client-side.
  • Unlike every other endpoint, map()'s response is not wrapped in data. Both SDKs handle this for you and return { links, total } directly.
  • search() can return a per-result item with markdown: null and error set if that specific page failed to scrape, even though the overall search call succeeded — check error on each item rather than assuming every result has markdown.
  • The /v1/batch/scrape endpoint does not auto-prepend https:// to bare-domain URLs the way every other endpoint does — pass full URLs (https://example.com, not example.com) when using batchScrape / batch_scrape directly. The CLI and MCP server both normalize this for you automatically.
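
The last two notes can be handled with small helpers on the caller's side. The following is a minimal sketch, not SDK code: ensure_scheme and split_search_results are hypothetical names, and it assumes search results arrive as a list of dicts carrying the markdown and error keys described above.

```python
from typing import Any, Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Prepend https:// to a bare domain; leave full http(s) URLs untouched."""
    candidate = url.strip()
    if urlsplit(candidate).scheme in ("http", "https"):
        return candidate
    return f"{default_scheme}://{candidate.lstrip('/')}"


def split_search_results(
    items: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate results that scraped successfully from those that failed."""
    ok: List[Dict[str, Any]] = []
    failed: List[Dict[str, Any]] = []
    for item in items:
        # A failed item has error set and markdown equal to None.
        if item.get("error") or item.get("markdown") is None:
            failed.append(item)
        else:
            ok.append(item)
    return ok, failed
```

For example, a list of user-supplied domains could be passed through [ensure_scheme(u) for u in urls] before being handed to batch_scrape.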