serp.fast

Mozilla Readability

The pure-JavaScript library behind Firefox Reader View: it extracts the primary article from an HTML document, with no dependencies.

By Nathan Kessler · Updated

Each tool is evaluated against our methodology using public docs, vendor demos, and hands-on testing.

Open source scraping frameworks give engineering teams full control over their web data pipeline. You choose where to deploy, how to scale, and what data to collect – with no vendor lock-in or per-request pricing. The trade-off is infrastructure maintenance and anti-bot engineering, which commercial APIs handle for you.

Features

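JS Rendering: No (expects an already-parsed DOM)
Structured Output: Article fields (title, byline, text content, HTML)
Open Source: Yes (Apache 2.0)
Self-Hosted Option: Yes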
Pricing: Free

Editorial assessment

The same rule-based extractor that powers Firefox Reader View, packaged as a standalone JavaScript module with zero dependencies (in Node you supply the DOM yourself, typically via jsdom). It is fast, deterministic, and aggressive about stripping boilerplate, and its quality holds up surprisingly well against larger systems on news and article content. It is the natural choice for Node pipelines or browser-side extraction. Pair it with Cheerio if you also need DOM traversal, reach for Trafilatura instead in Python, or use Crawl4AI or Firecrawl when you need rendering and fetching alongside extraction.

How Mozilla Readability compares

Trafilatura

Trafilatura is the Python equivalent and tends to win head-to-head accuracy benchmarks on long-tail content.

Cheerio

Cheerio is a general jQuery-like parser – use it alongside Readability when you also need custom DOM selection.

Crawl4AI

Crawl4AI handles fetching, JS rendering, and LLM-ready Markdown output, all of which Readability leaves to you.

Frequently asked questions

Is Mozilla Readability open source?

Yes. Mozilla Readability is open source under the Apache 2.0 license, with the source on GitHub. It is the same extractor that powers Firefox Reader View, packaged as a standalone module. Because the license is permissive, you can use it in commercial products without paying or releasing your own source code; you only need to keep the license and attribution notices. The repository is maintained by Mozilla.

How much does Mozilla Readability cost?

Mozilla Readability is free. It is an open-source library installed from npm, with no paid tier, no usage limits, and no commercial license to buy. Your only cost is the compute you run it on. That differs from hosted extraction services like Firecrawl, which charge per request or page. If you self-host extraction, Readability adds nothing to the bill.

Does Mozilla Readability render JavaScript?

No. Mozilla Readability does not fetch pages or run JavaScript. It takes an already-parsed DOM document and extracts the main article from it, returning the title, byline, text content, and an HTML version. In Node you supply the DOM yourself, usually via jsdom. Note that parse() modifies the document it is given, so pass a copy if you still need the original. For pages that need a real browser to render content first, pair it with a headless browser that renders the page before extraction, or use Crawl4AI or Firecrawl instead.

How does Mozilla Readability compare to Trafilatura?

Both extract the main article and strip boilerplate. Mozilla Readability is a JavaScript library that stays consistent on standard news and article pages. Trafilatura is a Python library that tends to score higher on mixed extraction benchmarks and handles messier pages better, partly because it falls back on several methods. Choose Readability for Node or browser pipelines, Trafilatura when your stack is Python.

What is Mozilla Readability best used for?

It is best for pulling clean article text out of HTML you already have, inside a Node or browser pipeline. Common uses include reader-mode views, building text corpora for LLMs, and feed processing. It is rule-based and deterministic, so the output is predictable. Pair it with Cheerio if you also need DOM traversal. It is not a crawler or fetcher, so handle retrieval separately.

Can Mozilla Readability be self-hosted?

Yes. Mozilla Readability is a library you run inside your own code, so it is self-hosted by nature, with no external service involved. You install it from npm and call it on a DOM document, in a browser or in a Node process with jsdom. Nothing leaves your environment, which suits privacy-sensitive extraction. Note that Readability does not sanitize its output: if you extract from untrusted pages and render the resulting HTML, run it through a sanitizer such as DOMPurify first.
