Notable repositories
AI Scraping
The fastest-growing AI scraping tool. Async-first, produces clean markdown for LLM pipelines. The open-source answer to Firecrawl.
Novel approach: uses LLMs to plan and execute scraping tasks. Impressive for complex extraction but LLM costs add up at scale.
The most-starred AI browser tool by a wide margin. Lets AI agents browse the web like a human. The future of web interaction, today.
The standard for AI-friendly web scraping. Open-source core, excellent hosted API. If you're building RAG or AI data pipelines, start here.
From the Browserbase team. Tell it what to do in plain English and it drives Playwright. Early-stage, but the developer experience is unmatched.
Anti-Detection
A stealth plugin that makes Playwright much harder for most anti-bot systems to detect. Essential for production scraping of protected sites.
Clever approach: make curl look like a real browser at the TLS level. Works surprisingly well against Cloudflare and Akamai.
Successor to undetected-chromedriver. Removes the webdriver dependency entirely, making detection even harder.
Browser Automation
The backbone of modern scraping stacks. Microsoft-backed, fast, reliable. If you're doing JS-rendered scraping, you're probably using this.
Still the most popular browser automation tool by stars. Playwright is technically superior but Puppeteer's ecosystem is massive.
The grandfather of browser automation. Still relevant for legacy projects and teams with existing Selenium infrastructure. Modern projects should pick Playwright.
Solves a real problem: getting past Cloudflare and similar anti-bot systems. Fragile by nature (Chrome updates break it regularly) but nothing else does this job.
Data Parsing
Every Python developer's first scraping tool. Simple, well-documented, battle-tested. For parsing, not crawling.
The Node.js equivalent of Beautiful Soup. Incredibly fast for server-side HTML parsing. Pairs perfectly with Crawlee or raw HTTP requests.
When Beautiful Soup is too slow. C-backed, XPath support, handles malformed HTML. The performance choice for heavy parsing workloads.
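The "for parsing, not crawling" distinction above can be shown in a few lines: a parser turns HTML you already have into data, and it never fetches pages or follows links. The sketch below uses Python's standard-library html.parser as a stand-in so it runs without installing anything. It is not the API of any library listed here, and the names LinkExtractor, extract_links, and SAMPLE_HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs from anchor tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string that is already in memory; no network access."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical input with unclosed <li> tags, the kind of markup real pages serve.
SAMPLE_HTML = """
<ul>
  <li><a href="/docs">Docs</a>
  <li><a href="/blog">Blog</a>
</ul>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

Fetching the page, following the extracted links, retrying, and rate limiting are the crawler's job, which is why the parsers in this section are usually paired with a framework or a plain HTTP client.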
Scraping Frameworks
The OG Python scraping framework. Mature ecosystem, steep learning curve, but nothing matches it for large-scale structured crawling pipelines.
From the Apify team. Best-in-class for JavaScript scraping with built-in Playwright and Cheerio support. The TypeScript-first approach is refreshing.
If you're a Go shop, this is your only real option. Fast, concurrent, and well-maintained. Limited compared to Scrapy's ecosystem.
Simple alternative to Scrapy for small projects. Wraps requests + BeautifulSoup. Perfect for quick scripts, not for production pipelines.
Hit 31K stars within months of launch. The adaptive selector engine, which finds elements even after a site redesign, is something no other framework does. Three fetcher modes, MCP server, BSD-3 licensed.
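To make the adaptive-selector idea above concrete, here is a toy sketch of one way such a feature could work: save a fingerprint of the element you care about, then, after a redesign, pick the element on the new page whose fingerprint is most similar. This is an assumption for illustration only, not the framework's actual algorithm or API. It uses only the standard library, and ElementCollector, fingerprint, relocate, and the sample markup are hypothetical names and data.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not capture text.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Records every element as a dict of tag, attributes, and direct text."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        element = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(element)
        if tag not in VOID_TAGS:
            self._stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._stack:
            self._stack[-1]["text"] += data.strip()


def fingerprint(element):
    """Flatten tag, sorted attributes, and text into one comparable string."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(element["attrs"].items()))
    return f"{element['tag']} {attrs} {element['text']}"


def relocate(saved, html, threshold=0.5):
    """Return the element in html most similar to saved, or None if no
    candidate reaches the (arbitrary, illustrative) similarity threshold."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    target = fingerprint(saved)
    best, best_score = None, 0.0
    for element in collector.elements:
        score = SequenceMatcher(None, target, fingerprint(element)).ratio()
        if score > best_score:
            best, best_score = element, score
    return best if best_score >= threshold else None


# Fingerprint saved before the redesign (hypothetical data).
saved = {"tag": "span",
         "attrs": {"class": "price", "id": "product-price"},
         "text": "$19.99"}

# After the redesign, the tag, class, and id have all changed.
REDESIGNED = (
    '<div class="card"><h2 class="title">Widget</h2>'
    '<p class="price-tag" id="price">$19.99</p>'
    '<a href="/buy">Buy now</a></div>'
)

match = relocate(saved, REDESIGNED)
print(match)
```

A fixed CSS selector such as span#product-price would return nothing on the redesigned page, while the similarity search still lands on the price element. A production implementation would presumably weigh more signals (position in the tree, siblings, parent attributes) than this string comparison does.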