serp.fast

Open Source

Notable repositories

Curated selection of search, extraction, and browser automation repositories. Each entry includes editorial context.

AI Scraping

Crawl4AI · 30K+ stars · Python

Open-source async crawler optimized for LLM data extraction.

The fastest-growing AI scraping tool. Async-first, produces clean markdown for LLM pipelines. The open-source answer to Firecrawl.
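
To make "clean markdown for LLM pipelines" concrete, here is a minimal sketch of the general idea using only Python's standard library. It is not Crawl4AI's API or implementation: the `MarkdownExtractor` class and `html_to_markdown` function are hypothetical names invented for this illustration, and the set of tags handled is deliberately tiny.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: keep headings, paragraphs and list items, drop page chrome."""

    SKIP = {"script", "style", "nav", "footer"}          # content we never want
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}   # tag -> markdown prefix

    def __init__(self):
        super().__init__()
        self.lines = []        # finished markdown blocks
        self._prefix = None    # prefix of the block currently being read, if any
        self._buffer = []      # text fragments of the current block
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._prefix = self.HEADINGS[tag]
        elif tag == "li":
            self._prefix = "- "
        elif tag == "p":
            self._prefix = ""

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("li", "p"):
            # Collapse runs of whitespace and emit the block if it has any text.
            text = " ".join("".join(self._buffer).split())
            if text and self._prefix is not None:
                self.lines.append(self._prefix + text)
            self._prefix, self._buffer = None, []

    def handle_data(self, data):
        # Only keep text that is inside a tracked block and outside skipped chrome.
        if self._skip_depth == 0 and self._prefix is not None:
            self._buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    return "\n\n".join(parser.lines)


SAMPLE = (
    "<html><head><style>body{}</style></head><body>"
    "<nav><p>Menu</p></nav><h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<ul><li>One</li><li>Two</li></ul><script>x()</script></body></html>"
)
print(html_to_markdown(SAMPLE))
```

Real tools handle far more (links, tables, code blocks, boilerplate detection), but the core trade is the same: discard navigation and scripts, keep the readable structure.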

ScrapeGraphAI · 23K+ stars · Python

LLM-orchestrated web scraping with automatic graph-based extraction.

Novel approach: uses LLMs to plan and execute scraping tasks. Impressive for complex extraction but LLM costs add up at scale.

Browser Use · 78K+ stars · Python

AI framework that gives LLMs the ability to control a web browser.

The most-starred AI browser tool by a wide margin. Lets AI agents browse the web like a human. The future of web interaction, today.

Firecrawl · 93K+ stars · TypeScript

Turn websites into LLM-ready markdown with a single API call.

The standard for AI-friendly web scraping. Open-source core, excellent hosted API. If you're building RAG or AI data pipelines, start here.

Stagehand · 8K+ stars · TypeScript

Natural language browser automation built on Playwright.

From the Browserbase team. Tell it what to do in plain English and it drives Playwright. Early-stage, but the developer experience is unmatched.

Anti-Detection

playwright-extra · 1K+ stars · TypeScript

Plugin framework for Playwright with stealth and ad-blocking plugins.

The stealth plugin masks common automation signals, making Playwright much harder for most anti-bot systems to detect. Essential for production scraping of protected sites.

curl-impersonate · 14K+ stars · C

curl with browser TLS fingerprints to bypass anti-bot detection.

Clever approach: make curl look like a real browser at the TLS level. Works surprisingly well against Cloudflare and Akamai.
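
To show what "looking like a browser at the TLS level" can mean, the sketch below computes a JA3-style fingerprint, one widely used way of summarizing a TLS ClientHello. JA3 is brought in here as an illustrative assumption: the entry above does not say which fingerprinting schemes a given anti-bot vendor uses, and the numeric values in `CLIENT_A` and `CLIENT_B` are sample data, not the handshake of any particular browser or curl build. The function names are hypothetical.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Join ClientHello fields JA3-style: commas between fields, dashes within a field."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(tls_version)] + ["-".join(str(v) for v in group) for group in groups]
    return ",".join(fields)


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string; this is the value usually compared."""
    raw = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Two illustrative clients offering the same ciphers in a different order.
CLIENT_A = (769, [47, 53, 5, 10], [0, 10, 11], [23, 24, 25], [0])
CLIENT_B = (769, [53, 47, 10, 5], [0, 10, 11], [23, 24, 25], [0])

print(ja3_string(*CLIENT_A))
print(ja3_hash(*CLIENT_A))
print(ja3_hash(*CLIENT_B))
```

Because the order of ciphers and extensions feeds the hash, a stock HTTP client and a browser produce different fingerprints even when both speak valid TLS. Reproducing the browser's exact handshake is the gap a tool like curl-impersonate aims to close.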

nodriver · 5K+ stars · Python

Python web automation without a traditional webdriver dependency.

Successor to undetected-chromedriver. Removes the webdriver dependency entirely, making detection even harder.

Browser Automation

Playwright · 70K+ stars · TypeScript

Cross-browser automation library for Chromium, Firefox, and WebKit.

The backbone of modern scraping stacks. Microsoft-backed, fast, reliable. If you're doing JS-rendered scraping, you're probably using this.

Puppeteer · 89K+ stars · TypeScript

Chrome DevTools Protocol automation for Node.js.

Still the most popular browser automation tool by stars. Playwright is technically superior but Puppeteer's ecosystem is massive.

Selenium · 31K+ stars · Java

Browser automation framework supporting multiple languages and browsers.

The grandfather of browser automation. Still relevant for legacy projects and teams with existing Selenium infrastructure. Modern projects should pick Playwright.

undetected-chromedriver · 10K+ stars · Python

Custom Selenium chromedriver that avoids detection by anti-bot services.

Solves a real problem: getting past Cloudflare and similar anti-bot systems. Fragile by nature (Chrome updates break it regularly), but few tools do this job for Selenium-based stacks.

Data Parsing

Beautiful Soup · N/A stars · Python

Python library for pulling data out of HTML and XML files.

Every Python developer's first scraping tool. Simple, well-documented, battle-tested. For parsing, not crawling.

Cheerio · 28K+ stars · TypeScript

Fast, flexible jQuery-like HTML parser for Node.js.

The Node.js equivalent of Beautiful Soup. Incredibly fast for server-side HTML parsing. Pairs perfectly with Crawlee or raw HTTP requests.

lxml · 2.7K+ stars · Python

High-performance XML and HTML processing library for Python.

When Beautiful Soup is too slow. C-backed, XPath support, handles malformed HTML. The performance choice for heavy parsing workloads.

Scraping Frameworks

Scrapy · 53K+ stars · Python

Fast, high-level web crawling and scraping framework for Python.

The OG Python scraping framework. Mature ecosystem, steep learning curve, but nothing matches it for large-scale structured crawling pipelines.

Crawlee · 16K+ stars · TypeScript

Web scraping and browser automation library for Node.js.

From the Apify team. Best-in-class for JavaScript scraping with built-in Playwright and Cheerio support. The TypeScript-first approach is refreshing.

Colly · 23K+ stars · Go

Elegant scraping framework for Go with a clean callback API.

If you're a Go shop, this is your only real option. Fast, concurrent, and well-maintained. Limited compared to Scrapy's ecosystem.

MechanicalSoup · 4.7K+ stars · Python

Python library for automating interaction with websites.

Simple alternative to Scrapy for small projects. Wraps requests + BeautifulSoup. Perfect for quick scripts, not for production pipelines.

Scrapling · 31K+ stars · Python

Adaptive web scraping framework with smart element tracking, anti-bot bypass, and stealth browser mode.

Hit 31K stars within months of launch. The adaptive selector engine, which finds elements even after a site redesign, is something no other framework does. Three fetcher modes, MCP server, BSD-3 licensed.
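
The idea behind re-finding an element after a redesign can be sketched with the standard library alone. This is a deliberately simplified illustration and not Scrapling's actual algorithm or API: it assumes an element can be summarized as a string "signature" of tag, attributes, and text, and it picks whichever element in the new page is most similar to the saved one. All names (`ElementCollector`, `collect`, `signature`, `relocate`) are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Record tag, attributes and direct text for every element in a document."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._stack = []   # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        record = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(record)
        self._stack.append(record)

    def handle_endtag(self, tag):
        # Pop until the matching open tag is closed (tolerates unclosed children).
        while self._stack:
            if self._stack.pop()["tag"] == tag:
                break

    def handle_data(self, data):
        if self._stack:
            self._stack[-1]["text"] += data.strip()


def collect(html):
    parser = ElementCollector()
    parser.feed(html)
    return parser.elements


def signature(element):
    """Flatten an element into one comparable string."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(element["attrs"].items()))
    return f"{element['tag']} {attrs} {element['text']}"


def relocate(saved, new_html):
    """Return the element in new_html whose signature is closest to the saved one."""
    target = signature(saved)
    return max(
        collect(new_html),
        key=lambda el: SequenceMatcher(None, target, signature(el)).ratio(),
    )


OLD = '<div><span class="price" id="p1">$19.99</span><span class="name">Widget</span></div>'
NEW = ('<section><p class="product-name">Widget</p>'
       '<strong class="price-tag" data-id="p1">$19.99</strong></section>')

# Remember the price element from the old layout, then find it in the new one.
saved = next(el for el in collect(OLD) if el["attrs"].get("class") == "price")
match = relocate(saved, NEW)
print(match["tag"], match["attrs"], match["text"])
```

A CSS selector such as `span.price` would return nothing on the redesigned page, while similarity matching still lands on the renamed element. A production engine would presumably weigh more signals (position in the tree, siblings, parent attributes) than this sketch does.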