Official client libraries for Node.js and Python. Both wrap every endpoint (scrape, crawl, extract, map, search, monitor, batch scrape), throw a typed error on any non-2xx response, and include polling helpers for the two async endpoints (crawl, batch/scrape).
Installation
npm install webglean # Node.js — requires Node 18+
pip install webglean # Python — requires Python 3.9+
Package pages: webglean on npm · webglean on PyPI
Node.js
import { WebGlean } from "webglean";
// Falls back to the WEBGLEAN_API_KEY env var if apiKey isn't passed.
const client = new WebGlean({ apiKey: process.env.WEBGLEAN_API_KEY });
const { markdown } = await client.scrape({ url: "https://example.com" });
console.log(markdown);
Methods
| Method | Endpoint |
|---|---|
| scrape(params) | POST /v1/scrape |
| crawl(params) / getCrawl(id) / crawlAndWait(id, opts?) | POST /v1/crawl, GET /v1/crawl/:id |
| extract<T>(params) | POST /v1/extract |
| map(params) | POST /v1/map |
| search(params) | POST /v1/search |
| createMonitor(params) / listMonitors() / getMonitor(id) / deleteMonitor(id) | POST / GET / GET / DELETE /v1/monitor |
| batchScrape(params) / getBatch(id) / batchScrapeAndWait(id, opts?) | POST /v1/batch/scrape, GET /v1/batch/scrape/:id |
Crawling and waiting for the result
const { id } = await client.crawl({ url: "https://example.com", maxDepth: 2, maxPages: 20 });
// Polls GET /v1/crawl/:id every 2s (configurable) until status is "done" or "failed".
const result = await client.crawlAndWait(id, { pollIntervalMs: 2000, timeoutMs: 10 * 60_000 });
for (const page of result.pages) {
  console.log(page.url, page.markdown.slice(0, 80));
}
batchScrapeAndWait(id, opts?) works the same way for POST /v1/batch/scrape.
Errors
Any non-2xx response throws a WebGleanError with .status and .message taken from the API's error body:
import { WebGlean, WebGleanError } from "webglean";
try {
  await client.scrape({ url: "https://example.com" });
} catch (err) {
  if (err instanceof WebGleanError) {
    console.error(err.status, err.message); // e.g. 402 "Insufficient credits"
  }
}
Python
from webglean import WebGlean
# Falls back to the WEBGLEAN_API_KEY env var if api_key isn't passed.
client = WebGlean(api_key="wg_your_key")
result = client.scrape("https://example.com")
print(result["markdown"])
Methods
| Method | Endpoint |
|---|---|
| scrape(url, format=, only_main_content=) | POST /v1/scrape |
| crawl(url, ...) / get_crawl(id) / crawl_and_wait(id, ...) | POST /v1/crawl, GET /v1/crawl/:id |
| extract(url, schema=, prompt=) | POST /v1/extract |
| map(url, max_urls=, search=) | POST /v1/map |
| search(query, num_results=, country=, lang=) | POST /v1/search |
| create_monitor(url, ...) / list_monitors() / get_monitor(id) / delete_monitor(id) | POST / GET / GET / DELETE /v1/monitor |
| batch_scrape(urls=, items=, ...) / get_batch(id) / batch_scrape_and_wait(id, ...) | POST /v1/batch/scrape, GET /v1/batch/scrape/:id |
Crawling and waiting for the result
crawl_id = client.crawl("https://example.com", max_depth=2, max_pages=20)
# Polls GET /v1/crawl/:id every 2s (configurable) until status is "done" or "failed".
result = client.crawl_and_wait(crawl_id, poll_interval=2.0, timeout=600.0)
for page in result["pages"]:
    print(page["url"], page["markdown"][:80])
batch_scrape_and_wait(batch_id, ...) works the same way for POST /v1/batch/scrape.
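The polling behaviour described above can be sketched roughly as follows. This is not the SDK's actual implementation: `poll_until_finished`, `fetch_status`, and `PollTimeout` are hypothetical names used only for illustration, and the `"running"` status in the demo is an assumed placeholder for any non-terminal state. The real helpers raise a WebGleanError with status 0 on timeout, which the stand-in exception merely imitates.

```python
import time
from typing import Any, Callable, Dict


class PollTimeout(Exception):
    """Stand-in for the SDK's timeout error (hypothetical, not part of webglean)."""

    status = 0  # imitates the documented "status 0" for client-side timeouts


def poll_until_finished(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports "done" or "failed", or time runs out."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in ("done", "failed"):
            return job
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval > deadline:
            raise PollTimeout(f"job did not finish within {timeout} seconds")
        sleep(poll_interval)


if __name__ == "__main__":
    # Fake job that finishes on the third poll, standing in for GET /v1/crawl/:id.
    responses = iter([
        {"status": "running"},  # assumed non-terminal status
        {"status": "running"},
        {"status": "done", "pages": []},
    ])
    result = poll_until_finished(lambda: next(responses), poll_interval=0.01, timeout=1.0)
    print(result["status"])
```

Note that a `"failed"` status is returned to the caller rather than raised in this sketch, so the caller still needs to inspect `status` on the result.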
The client can also be used as a context manager to close its underlying HTTP connection pool: with WebGlean(api_key="...") as client: ....
Errors
Any non-2xx response raises a WebGleanError. Its .status attribute holds the HTTP status code, and str(err) gives the message from the API's error body:
from webglean import WebGlean, WebGleanError
try:
    client.scrape("https://example.com")
except WebGleanError as err:
    print(err.status, str(err))  # e.g. 402 "Insufficient credits"
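One way a caller might branch on `.status` is sketched below. The `WebGleanError` class here is a local stand-in so the example runs without the SDK installed; its constructor signature is an assumption, and only the `.status` attribute and `str(err)` behaviour come from the text above. The `classify` helper is hypothetical, and treating 429 and 5xx as retryable is a generic HTTP convention rather than documented WebGlean behaviour.

```python
class WebGleanError(Exception):
    """Local stand-in for webglean.WebGleanError (constructor shape is assumed)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def classify(err: WebGleanError) -> str:
    """Map an error to a coarse category a caller might branch on."""
    if err.status == 0:
        return "timeout"    # no HTTP response: a client-side wait timed out
    if err.status == 429 or err.status >= 500:
        return "retryable"  # generic HTTP convention, not WebGlean-specific
    return "fatal"          # remaining 4xx, e.g. 402 "Insufficient credits"


if __name__ == "__main__":
    try:
        raise WebGleanError(402, "Insufficient credits")
    except WebGleanError as err:
        print(err.status, str(err), classify(err))
```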
Notes shared by both SDKs
- crawlAndWait / crawl_and_wait and batchScrapeAndWait / batch_scrape_and_wait throw/raise a WebGleanError with status 0 (no HTTP response, because this is a client-side timeout) if timeoutMs / timeout is exceeded.
- map()'s response is not wrapped in data like every other endpoint. Both SDKs handle this for you and just return { links, total }.
- search() can return a per-result item with markdown: null and error set if that specific page failed to scrape, even though the overall search call succeeded. Check error on each item rather than assuming every result has markdown.
- The /v1/batch/scrape endpoint does not auto-prepend https:// to bare-domain URLs the way every other endpoint does. Pass full URLs (https://example.com, not example.com) when using batchScrape / batch_scrape directly. The CLI and MCP server both normalize this for you automatically.
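Two of the notes above lend themselves to small guard helpers. The sketch below is illustrative only: `ensure_scheme` and `split_search_results` are hypothetical helpers, not part of either SDK, and the `"url"` key in the sample items is an assumption (the notes only confirm the `markdown` and `error` fields).

```python
from typing import Any, Dict, List, Tuple


def ensure_scheme(url: str) -> str:
    """Prepend https:// to a bare domain; leave URLs that already have a scheme alone."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return "https://" + url


def split_search_results(
    items: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate results that scraped successfully from those carrying an error."""
    ok = [item for item in items if not item.get("error")]
    failed = [item for item in items if item.get("error")]
    return ok, failed


if __name__ == "__main__":
    # Normalize before handing URLs to batch_scrape, which expects full URLs.
    urls = [ensure_scheme(u) for u in ["example.com", "https://example.org/docs"]]
    print(urls)

    # Sample search items; the field layout beyond markdown/error is assumed.
    sample = [
        {"url": "https://a.example", "markdown": "# A", "error": None},
        {"url": "https://b.example", "markdown": None, "error": "fetch failed"},
    ]
    ok, failed = split_search_results(sample)
    print(len(ok), len(failed))
```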