serp.fast

Scrapy (Most Popular)

The original Python web crawling framework – battle-tested, extensible, and the foundation of the modern scraping ecosystem.

By Nathan Kessler

Each tool is evaluated against our methodology using public docs, vendor demos, and hands-on testing.

Open source scraping frameworks give engineering teams full control over their web data pipeline. You choose where to deploy, how to scale, and what data to collect – with no vendor lock-in or per-request pricing. The trade-off is infrastructure maintenance and anti-bot engineering, which commercial APIs handle for you.

Features

JS Rendering: No (requires Splash or Playwright integration)
Structured Output: Yes
Open Source: Yes
Self-Hosted Option: Yes
Pricing: Free

Editorial assessment

53K+ GitHub stars and 15+ years of production use make Scrapy the most trusted crawling framework. The middleware system and extensive plugin ecosystem handle nearly any scraping challenge. There is no built-in JavaScript rendering, so modern SPAs need a Splash or Playwright integration. The learning curve is steeper than newer tools, and the callback-based architecture feels dated.

How Scrapy compares

Crawlee

Crawlee is the modern alternative with built-in JS rendering and a cleaner async architecture.

Crawl4AI

Crawl4AI is built for AI workloads with LLM-ready output, something Scrapy was never designed for.

Playwright

Playwright handles JS-heavy sites that Scrapy can't touch without plugins.

Frequently asked questions

Is Scrapy free or paid?

Scrapy is free. It is a BSD-licensed open-source framework with no paid tiers, license fees, or usage limits in the project itself. You install it as a Python package and run it on your own machine or servers. Your only real costs are the infrastructure you host it on and any third-party proxy or rendering services you add. Zyte, the company behind it, sells separate hosted products, but the framework stays free.

Is Scrapy open source?

Yes. Scrapy is open source under the BSD license, with public code on GitHub, tens of thousands of stars, and more than 15 years of production history. It is maintained by Zyte with many community contributors. The middleware and pipeline architecture is built to be extended, so you can read, fork, and modify any part of it. There is no closed core or proprietary tier gating features behind a paywall.
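As a rough illustration of that extension model, an item pipeline is an ordinary Python class with a `process_item` method. The sketch below assumes dict-like items with a hypothetical `title` field and uses the classic `(item, spider)` signature. The class name `NormalizeTitlePipeline` and the `myproject.pipelines` module path are made up for this example.

```python
class NormalizeTitlePipeline:
    """Hypothetical item pipeline: trims and collapses whitespace in 'title'."""

    def process_item(self, item, spider):
        # Scrapy calls process_item once per scraped item; whatever is
        # returned is handed to the next pipeline in ITEM_PIPELINES.
        title = item.get("title")
        if isinstance(title, str):
            item["title"] = " ".join(title.split())
        return item


# settings.py fragment: enable the pipeline (lower numbers run earlier).
# The module path is an assumption about the project layout.
ITEM_PIPELINES = {
    "myproject.pipelines.NormalizeTitlePipeline": 300,
}
```

Because the class has no dependency on Scrapy itself, it can be unit-tested with a plain dict before it is wired into a project.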

Does Scrapy render JavaScript?

Not on its own. Scrapy fetches and parses raw HTML, so content rendered client-side by JavaScript will not appear in responses by default. To handle single-page apps and dynamic sites you integrate a browser layer, usually scrapy-playwright, which plugs in as a download handler, or the older Splash service, which is wired in through the scrapy-splash middleware. That works but adds setup and overhead. If most of your targets are JavaScript-heavy, a browser-first tool may fit better.
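For a sense of the setup involved, here is a minimal settings sketch for the scrapy-playwright route, assuming that package and a Playwright browser are already installed. The variable `PLAYWRIGHT_META` is a hypothetical helper name used only in this example; the other names are settings that scrapy-playwright documents.

```python
# settings.py fragment, assuming the scrapy-playwright package is installed
# (pip install scrapy-playwright, then: playwright install chromium).

# Route HTTP and HTTPS downloads through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Per-request opt-in (hypothetical helper name): pass this dict as `meta=`
# on a scrapy.Request. Requests without the "playwright" key are fetched
# without a browser.
PLAYWRIGHT_META = {"playwright": True}
```

In a spider, a request would opt in with something like `scrapy.Request(url, meta={"playwright": True})`, so only the pages that need a browser pay the rendering cost.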

Can Scrapy be self-hosted?

Yes. Scrapy is a Python library you run yourself, so self-hosting is the standard way to use it. You can run spiders locally, on your own servers, in containers, or on a scheduler. Zyte offers Scrapy Cloud as a managed deployment option if you would rather not operate the infrastructure, but nothing forces you onto it. You keep full control over scheduling, storage, and proxy configuration.

What is Scrapy best used for?

Scrapy suits large, structured crawls of mostly static or server-rendered sites where you need scheduling, retries, throttling, deduplication, and clean export to JSON, CSV, or storage backends. Teams reach for it when a crawl runs continuously or spans many pages and the middleware and pipeline model pays off. It is less ideal for quick one-off jobs or sites that depend heavily on JavaScript, where the setup cost is harder to justify.
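Most of those capabilities are switched on through project settings rather than custom code. The fragment below is a sketch with illustrative values, not tuning advice, and the output paths are hypothetical. Duplicate-request filtering is enabled by default, so it has no entry here.

```python
# settings.py fragment; the values are illustrative, not recommendations.

# Retries: failed requests are retried up to RETRY_TIMES times
# in addition to the first attempt.
RETRY_ENABLED = True
RETRY_TIMES = 3

# Throttling: a base delay plus AutoThrottle, which adapts the delay
# to observed latency.
DOWNLOAD_DELAY = 0.5  # seconds between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 4
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0

# Export: write scraped items to JSON and CSV feeds.
# The output paths are hypothetical.
FEEDS = {
    "output/items.json": {"format": "json"},
    "output/items.csv": {"format": "csv"},
}
```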

How does Scrapy compare to Crawlee?

Both are open-source crawling frameworks. Crawlee is newer and works in Node and Python, with browser automation through Playwright and Puppeteer treated as a first-class feature rather than an add-on. Scrapy is Python-only and has a longer track record, a larger plugin ecosystem, and a more mature middleware system, but no native JavaScript rendering. Choose Crawlee if browser rendering is central. Choose Scrapy for established Python pipelines and large structured crawls.

