serp.fast

Comparing browser infrastructure for web scraping

6 min read · serp.fast

Browser automation underpins most serious web scraping operations. JavaScript-heavy sites, single-page applications, and increasingly sophisticated anti-bot systems all require rendering pages in a real browser. The choice of browser infrastructure affects cost, reliability, and whether your setup can scale.

This comparison covers two layers: the open-source frameworks that control browsers, and the cloud services that run them. Both matter, and most production setups use a combination.

The frameworks: Playwright, Puppeteer, Selenium

Playwright

Playwright has become the default for new projects. The data bears this out: npm weekly downloads sit at roughly 37 million (plus another 11 million for @playwright/test), and a 2025 TestGuild survey found that 45.1% of QA professionals had adopted it, more than double Selenium's 22.1%. GitHub stars exceed 83,000. Microsoft maintains it actively, and the developer experience (auto-wait, cross-browser support for Chromium, Firefox, and WebKit, built-in tracing) is why adoption accelerated.

For scraping specifically, Playwright's advantages are the auto-wait mechanism (reduces flaky scrapes on dynamic pages), network interception (filter or modify requests mid-flight), and multi-browser support (test your scraper against different rendering engines). It runs in Python, Node.js, Java, and .NET.
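A minimal sketch of auto-wait and network interception together, using Playwright's Python sync API. The set of blocked resource types, the default selector, and the function names are illustrative assumptions, not recommendations from any vendor.

```python
# Resource types to skip. This set is an illustrative assumption; tune it per target site.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True for requests the scraper does not need to download."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def scrape_text(url: str, selector: str = "h1") -> str:
    """Load a page with heavy assets blocked and return the text of one element."""
    # Imported lazily so should_block() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle_route(route):
        # Network interception: abort unwanted requests, let everything else through.
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.route("**/*", handle_route)
            page.goto(url)
            # Auto-wait: inner_text() waits for the element to appear
            # (up to the default timeout), so no manual sleep is needed.
            return page.locator(selector).first.inner_text()
        finally:
            browser.close()
```

Swapping `p.chromium` for `p.firefox` or `p.webkit` is how the same scraper would be run against another rendering engine.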

The limitation is resource consumption. Each browser instance needs real memory and CPU. Running hundreds of concurrent Playwright instances requires either powerful servers or a cloud browser service.
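To make the sizing question concrete, here is a rough, memory-only estimate. The per-instance footprint (300 MB), the 16 GB server, and the 1 GB reserved for the operating system are all assumptions chosen for illustration; measure your own workload before relying on numbers like these, and note that CPU is ignored entirely.

```python
import math


def max_concurrent_browsers(server_ram_mb: int, per_browser_mb: int,
                            reserved_mb: int = 1024) -> int:
    """How many browser instances fit in RAM after reserving headroom for the OS."""
    usable_mb = server_ram_mb - reserved_mb
    return max(usable_mb // per_browser_mb, 0)


def servers_needed(target_concurrency: int, per_server: int) -> int:
    """Number of servers required to reach a target number of concurrent browsers."""
    if per_server <= 0:
        raise ValueError("each server must be able to host at least one browser")
    return math.ceil(target_concurrency / per_server)


# Assumed inputs: 16 GB server, 300 MB per browser instance.
per_server = max_concurrent_browsers(server_ram_mb=16 * 1024, per_browser_mb=300)
print(per_server)                       # 51
print(servers_needed(500, per_server))  # 10
```

Under these assumptions, 500 concurrent instances already means a ten-server fleet, which is the point at which a managed service starts to look attractive.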

Puppeteer

Google's Puppeteer pioneered the Node.js browser automation category. It still pulls roughly 6 million npm weekly downloads (plus 11 million for puppeteer-core), and its GitHub stars exceed 89,000 — technically the most-starred of the three. The Chromium-only constraint is both its limitation and its advantage: less complexity, deeper Chrome DevTools Protocol integration, and a lighter dependency tree.

For teams already embedded in the Chrome ecosystem or those needing advanced CDP features — custom JavaScript injection, fine-grained network control, performance profiling — Puppeteer remains a strong choice. It is also the native interface for many cloud browser services, since most run Chromium.
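As an illustration of what direct CDP access looks like, the sketch below reads Chromium's performance metrics through a raw CDP session. Puppeteer itself is a Node.js library; to keep the examples in one language, this uses Playwright's Python API, which exposes CDP sessions for Chromium and can attach to a remote Chromium over CDP. The function names and the optional `cdp_endpoint` argument are hypothetical.

```python
from typing import Optional


def metrics_to_dict(response: dict) -> dict:
    """Flatten a Performance.getMetrics response into {metric_name: value}."""
    return {m["name"]: m["value"] for m in response.get("metrics", [])}


def page_metrics(url: str, cdp_endpoint: Optional[str] = None) -> dict:
    """Open a page and return Chromium's performance metrics via a CDP session."""
    # Imported lazily so metrics_to_dict() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        if cdp_endpoint:
            # Attach to an already running Chromium, for example a hosted one.
            browser = p.chromium.connect_over_cdp(cdp_endpoint)
        else:
            browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context()
            page = context.new_page()
            # Raw CDP session: send protocol commands directly to the browser.
            session = context.new_cdp_session(page)
            session.send("Performance.enable")
            page.goto(url)
            return metrics_to_dict(session.send("Performance.getMetrics"))
        finally:
            browser.close()
```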

The gap with Playwright is widening, though. Playwright's multi-browser support and more active feature development have drawn away new projects. Puppeteer is well-maintained but no longer the default recommendation for greenfield work.

Selenium

Selenium has been around since 2004. It supports every major language (Java, Python, C#, Ruby, JavaScript) and every major browser. Its Python package alone sees over 50 million monthly downloads on PyPI. It powers tests in over 354,000 repositories and appears in over 10,000 US job postings.

Those numbers reflect entrenchment, not necessarily preference. Selenium's WebDriver protocol adds latency compared to Playwright and Puppeteer, which hold a persistent connection to the browser (CDP, in the case of Chromium). Configuration is more involved, flakiness is more common without careful setup, and the developer experience lags behind Playwright's.

That said, if your team works primarily in Java or C# — where Playwright's support is less mature — or if you have years of existing Selenium infrastructure, there is no compelling reason to rewrite everything. The tool works. It is just no longer where the frontier is.

The cloud browser services

Running browser instances at scale, meaning hundreds or thousands concurrently, is an infrastructure problem, and a new category of companies exists specifically to solve it. This is one of the fastest-moving segments in web data infrastructure, with over $100 million in funding across the category in the past year alone.

Browserbase

The most funded player in the space, with $67.5 million raised including a $40 million Series B in June 2025 at a $300 million valuation. CEO Paul Klein IV framed the thesis directly: "Two years ago, AI agents browsing the web sounded like science fiction. Today, they're here — and they need better infrastructure."

Browserbase has served over 50 million browser sessions across more than 1,000 companies and 20,000 developers. Customers include Perplexity, 11x, and Vercel. The company also developed Stagehand, an open-source SDK for browser automation that has gained traction as an abstraction layer on top of Playwright.

Glenn Solomon of Notable Capital, who led the Series B, called it "the Stripe for browser automation."

Browser Use

The open-source phenomenon of the category. Browser Use, created by ETH Zurich graduates Magnus Muller and Gregor Zunic, hit over 78,000 GitHub stars, approaching the totals of Playwright and Puppeteer. The company was part of Y Combinator's Winter 2025 batch and raised a $17 million seed round from Felicis Ventures in March 2025.

The core idea is making websites accessible to AI agents. As Muller explained to SiliconANGLE: "A lot of agents rely on vision-based systems and try and navigate websites through screenshots, and in the process, things break. We convert [websites] into something agents can understand."

Co-founder Zunic told TechCrunch: "In our minds, there will be more agents on the web than humans by the end of the year." Daily downloads of the package grew from roughly 5,000 to 28,000 in a single week during March 2025.

Browserless

The most mature player, founded in November 2017 by Joel Griffith — years before the AI agent wave. Griffith bootstrapped it on $500. In a Failory interview, he explained the core problem: "Chrome (which is the defacto headless browser) is just crazy-hungry for resources, so babysitting it and providing a web-service around it is crucial."

Browserless offers a Docker-based deployment model with pricing from $25 to $200 per month. It handles the operational burden of running headless Chrome at scale: concurrency limits, resource management, session cleanup. For teams that need a straightforward managed Chromium service without the AI-agent abstractions of newer entrants, it remains a solid and proven option.
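To show what that operational burden looks like in miniature, here is a sketch of a concurrency limit with guaranteed session cleanup, using only asyncio. It is not Browserless's implementation: `FakeSession` is a hypothetical stand-in for a real browser session, and the limit of 3 is arbitrary.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a browser session; tracks how many are open."""
    open_count = 0
    peak_open = 0

    def __init__(self):
        FakeSession.open_count += 1
        FakeSession.peak_open = max(FakeSession.peak_open, FakeSession.open_count)

    async def fetch(self, url: str) -> str:
        await asyncio.sleep(0.01)  # simulate page load time
        return f"<html>{url}</html>"

    def close(self):
        FakeSession.open_count -= 1


async def fetch_all(urls, max_concurrency: int = 3):
    """Fetch every URL while never holding more than max_concurrency sessions."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:      # concurrency limit
            session = FakeSession()
            try:
                return await session.fetch(url)
            finally:
                session.close()    # cleanup runs even if fetch() raises

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))
```

Calling `asyncio.run(fetch_all(urls))` returns one result per URL. A real deployment also has to handle crashed browsers, timeouts, and memory leaks, which is the work a managed service takes on.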

Lightpanda

The performance outlier. Lightpanda is a headless browser built from scratch in Zig, neither a Chromium fork nor a WebKit patch. Their published benchmarks show it completing a 100-page Puppeteer workload in 2.3 seconds with 24MB peak RAM on an AWS EC2 m5.large instance. Chrome took 25.2 seconds and 207MB for the same task. That works out to roughly 11x faster execution and about 9x less memory.
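The ratios follow directly from the published figures quoted above. A quick check, using only those four numbers:

```python
# Figures from Lightpanda's published benchmark (100-page Puppeteer workload).
lightpanda_seconds, chrome_seconds = 2.3, 25.2
lightpanda_mb, chrome_mb = 24, 207

speedup = chrome_seconds / lightpanda_seconds
memory_ratio = chrome_mb / lightpanda_mb

print(f"{speedup:.1f}x faster, {memory_ratio:.1f}x less memory")
# prints: 11.0x faster, 8.6x less memory
```

The memory figure rounds up to 9x; the unrounded ratio is closer to 8.6x.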

Founder Francis Bouvier wrote on the company blog: "The current stack wasn't built for what's coming. Automation is no longer a niche use case, it's becoming the primary driver of web traffic." On the language choice, he added: "I chose Zig because I'm not smart enough to build a big project in C++ or Rust."

Lightpanda raised a pre-seed in June 2025 led by ISAI, with angels including Arthur Mensch (Mistral CEO) and Thomas Wolf (Hugging Face). It has over 22,000 GitHub stars but remains in beta. The performance claims are striking, but production readiness is still a question mark for teams that need full web platform compatibility.

Anchor Browser

Anchor takes a different approach: deterministic browser automation for enterprise. Founded by Unit 8200 alumni Idan Raman, Dor Dankner, and Guy Ben Simhon, the company raised a $6 million seed led by Blumberg Capital in November 2025.

The distinguishing feature is a Cloudflare Verified Bot partnership — Anchor's bot traffic is cryptographically signed so website operators can verify who is accessing their site. As Raman told The AI Insider: "Most of the web is still inaccessible to AI because it wasn't designed for machines."

Other notable players

Airtop (formerly Switchboard) raised $13.8 million from Sequoia Capital and offers SOC 2 Type II and HIPAA-compliant browser infrastructure, a differentiator for healthcare and financial services use cases. Hyperbrowser, backed by Y Combinator, Accel, and SV Angel, claims sub-second browser launch times with support for thousands of concurrent sessions. Steel.dev, an open-source project, focuses on reducing the token volume sent to LLMs by stripping page content down to essentials before extraction.

Decision framework

The choice depends on three variables: scale, budget, and whether you are building for AI agents or traditional scraping.

Under 10,000 pages per day, self-hosted: Playwright on your own servers is the pragmatic default. The tooling is excellent, the community is large, and you avoid per-session cloud costs. Use Puppeteer only if you have a specific need for deep CDP access or are already invested in it.

Over 10,000 pages per day: Evaluate cloud browser services against the cost of managing your own fleet. Browserbase is the most proven at scale. Browserless is the most mature and cost-predictable. Lightpanda is worth watching for workloads where raw speed matters and you can tolerate beta-stage tooling.

AI agent workloads: Browser Use is the open-source starting point, and its GitHub traction reflects real utility. For managed infrastructure, Browserbase is the best-funded option, and Airtop is the one to look at when SOC 2 Type II or HIPAA compliance is a requirement. Anchor is worth evaluating if verified bot access and deterministic execution matter for your use case.

Existing Selenium infrastructure: Do not rewrite it unless you have a specific reason. Selenium works. It is slower and more maintenance-heavy than Playwright, but migration cost is real, and "newer" is not the same as "better for your situation."
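The framework above can be written down as a small function. This is one possible encoding, not a definitive rule set: the order in which the checks run, the treatment of exactly 10,000 pages as "over", and the parameter names are all assumptions made for the sketch.

```python
KEEP_SELENIUM = "Keep Selenium unless there is a specific reason to migrate"
BROWSER_USE = "Start with Browser Use"
MANAGED_AGENT = "Evaluate Browserbase, Airtop, or Anchor"
PLAYWRIGHT = "Self-hosted Playwright"
PUPPETEER = "Self-hosted Puppeteer"
CLOUD = "Evaluate Browserbase, Browserless, or Lightpanda against running your own fleet"


def recommend(pages_per_day: int, ai_agent: bool = False,
              has_selenium: bool = False, needs_deep_cdp: bool = False,
              needs_managed: bool = False) -> str:
    """Map the article's decision framework onto a single recommendation."""
    # Assumed priority: existing investment first, then workload type, then scale.
    if has_selenium:
        return KEEP_SELENIUM
    if ai_agent:
        return MANAGED_AGENT if needs_managed else BROWSER_USE
    if pages_per_day < 10_000:
        return PUPPETEER if needs_deep_cdp else PLAYWRIGHT
    return CLOUD


print(recommend(2_000))                  # Self-hosted Playwright
print(recommend(50_000, ai_agent=True))  # Start with Browser Use
```

A real decision would also weigh budget, which the function leaves out because the framework gives no thresholds for it.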

The full comparison of browser automation tools is in our directory.
