serp.fast

ClawBench

Open source benchmark evaluating AI browser agents on 153 everyday tasks across 144 live websites, with request interception and full behavioral trace capture.

Benchmarks are how you separate marketing claims from measured reality. Instead of trusting vendor-reported numbers, a benchmark runs the same tasks against each system under a shared methodology and publishes the results. For AI product builders choosing an agentic extraction or search stack, a trustworthy benchmark is the single best input to the build-vs-buy decision, and a fast way to spot when a category is still too immature to rely on.

Features

JS Rendering
Structured Output
Open Source
Self-Hosted Option
Pricing: Free

Editorial assessment

A rare benchmark run on real production websites rather than sandboxed environments. Five layers of behavioral data (session replay, screenshots, HTTP traffic, reasoning traces, and browser actions) make failure analysis tractable, while a request interceptor blocks irreversible actions such as payments and bookings before they fire. It ships as `pip install clawbench-eval`, with an interactive leaderboard and trace viewer at claw-bench.com. The current numbers are sobering: Claude Sonnet 4.6 tops the leaderboard at 33.3%, GLM-5 trails at 24.2%, and no model exceeds 50% in any category. Finance and academic tasks are easier; travel and dev tasks are much harder. If you're building or choosing an agentic extraction stack, this is the honest scoreboard to test against.
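The interception idea can be illustrated as a simple predicate that decides whether an outgoing request looks like an irreversible, state-changing action. This is a minimal sketch under stated assumptions, not ClawBench's implementation: the `Request` type, the `should_block` function, and the keyword list are all hypothetical, and a production interceptor would more likely rely on per-site or per-task rules than on URL keywords.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical path tokens suggesting an irreversible real-world side effect.
# Illustrative only; ClawBench's actual blocking rules may differ entirely.
IRREVERSIBLE_TOKENS = {
    "checkout", "pay", "payment", "payments", "purchase",
    "order", "orders", "booking", "bookings", "reserve", "reservation",
}

# Page loads and searches are normally GETs; payments and bookings are usually
# submitted with a state-changing method.
STATE_CHANGING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass(frozen=True)
class Request:
    method: str
    url: str


def should_block(req: Request) -> bool:
    """Return True if the request looks like an irreversible action."""
    if req.method.upper() not in STATE_CHANGING_METHODS:
        return False  # read-only requests pass through
    path = urlparse(req.url).path.lower()
    # Split the path on anything that is not a letter, e.g. "/api/booking-submit"
    # becomes ["", "api", "booking", "submit"].
    tokens = re.split(r"[^a-z]+", path)
    return any(tok in IRREVERSIBLE_TOKENS for tok in tokens)


for r in [
    Request("GET", "https://shop.example.com/checkout"),
    Request("POST", "https://shop.example.com/api/search"),
    Request("POST", "https://shop.example.com/api/checkout/confirm"),
]:
    print(r.method, r.url, "BLOCK" if should_block(r) else "allow")
```

In a Playwright-based harness, a predicate like this could be called from a `page.route("**/*", handler)` callback, aborting matching requests with `route.abort()` and passing the rest through with `route.continue_()`. Keying on state-changing methods leaves ordinary page loads and searches untouched, at the cost of missing any irreversible action a site happens to trigger with a GET.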

How ClawBench compares

Browser Use

Browser Use is one of the open-source agent frameworks you'd actually run through ClawBench to see how it performs.

Stagehand

Stagehand, Browserbase's TypeScript agent SDK, is a direct target for ClawBench-style evaluation.

Skyvern

Skyvern is the vision-based browser automation framework worth benchmarking against ClawBench's live-website tasks.

Frequently asked questions

What is ClawBench?

ClawBench is an open-source benchmark that evaluates AI browser agents on 153 everyday tasks across 144 live websites, with request interception and full behavioral trace capture. It falls under the Benchmarks category in our directory. Because it is open source, you can inspect the code and self-host it.

How much does ClawBench cost?

ClawBench is free to use.

What are the best alternatives to ClawBench?

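The directory lists Browser Use, Stagehand, and Skyvern alongside ClawBench. These are browser agent frameworks rather than competing benchmarks: each is a candidate you could run through ClawBench's tasks to see how it performs. See the comparison section above for how each one relates to ClawBench.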

Does ClawBench support JavaScript rendering?

Yes. ClawBench evaluates agents in a real browser on live websites, so dynamic pages that load content through JavaScript frameworks like React, Vue, or Angular are part of what agents must handle.

Does ClawBench provide structured output?

Yes, ClawBench returns structured output (typically JSON), making it straightforward to integrate into AI pipelines, RAG systems, and data processing workflows.
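As an illustration of that kind of integration, the sketch below turns per-task outcomes into overall and per-category pass rates, similar in shape to the figures quoted in the editorial assessment. The JSON schema (`task_id`, `category`, `success`), the toy records, and the `success_rates` helper are hypothetical; they are not ClawBench's documented output format or real results.

```python
import json
from collections import defaultdict

# Hypothetical results: one record per task with its category and outcome.
# Toy data for illustration only, not real ClawBench results.
RESULTS_JSON = """
[
  {"task_id": "t1", "category": "finance", "success": true},
  {"task_id": "t2", "category": "finance", "success": false},
  {"task_id": "t3", "category": "travel", "success": false},
  {"task_id": "t4", "category": "travel", "success": false},
  {"task_id": "t5", "category": "academic", "success": true}
]
"""


def success_rates(records: list[dict]) -> tuple[float, dict[str, float]]:
    """Return the overall pass rate and a per-category pass rate."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for rec in records:
        total[rec["category"]] += 1
        passed[rec["category"]] += int(rec["success"])  # True -> 1, False -> 0
    overall = sum(passed.values()) / sum(total.values())
    per_category = {cat: passed[cat] / total[cat] for cat in total}
    return overall, per_category


overall, per_category = success_rates(json.loads(RESULTS_JSON))
print(f"overall: {overall:.1%}")
for cat, rate in sorted(per_category.items()):
    print(f"{cat}: {rate:.1%}")
```

Counting passes and totals per category in a single pass keeps the aggregation independent of how many categories appear in the file.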

Can I self-host ClawBench?

Yes, ClawBench offers a self-hosted option, giving you full control over the infrastructure, data privacy, and deployment environment.
