Benchmarks (5)

Public benchmarks and leaderboards that measure how AI browser agents, scraping APIs, and search tools actually perform.

5 strong choices for AI builders

Selected from 5 tools in this category. Equal-weighted; not ranked 1–5.

Editor's pick

ClawBench

Open-source benchmark evaluating AI browser agents on 153 everyday tasks across 144 live websites, with request interception and full behavioral trace capture.

Free

Editor's pick

WebArena

Reproducible benchmark of 812 long-horizon web tasks across self-hosted realistic websites (e-commerce, forum, GitLab, CMS, maps) – one of the most-cited agent evals of 2024–2026.

Free

Editor's pick

Mind2Web

Generalist web agent benchmark with 2,350 tasks across 137 real websites in 31 domains – measures cross-site, cross-domain transfer rather than single-site mastery.

Free

Editor's pick

OSWorld

Computer-use benchmark with 369 real tasks across Ubuntu, Windows, and macOS environments – the reference eval for agents that act on full operating systems, not just browsers.

Free

Editor's pick

WebVoyager

Live-web benchmark of 643 tasks across 15 real websites (Allrecipes, Amazon, Apple, ArXiv, BBC News, GitHub, Google variants, etc.) for end-to-end multimodal web agents.

Free