Mozilla Readability
Open-source scraping frameworks give engineering teams full control over their web data pipeline. You choose where to deploy, how to scale, and what data to collect – with no vendor lock-in or per-request pricing. The trade-off is that you take on infrastructure maintenance and anti-bot engineering, which commercial APIs handle for you.
Frequently asked questions
Is Mozilla Readability open source?
Yes. Mozilla Readability is open source under the Apache 2.0 license, with the code on GitHub. It is the same extractor that powers Firefox Reader View, packaged as a standalone module. Because it is permissively licensed, you can use it in commercial products without paying or sharing your own source. The repository is maintained by Mozilla.
How much does Mozilla Readability cost?
Mozilla Readability is free. It is an open-source library installed from npm, with no paid tier, no usage limits, and no commercial license to buy. Your only cost is the compute you run it on. That differs from hosted extraction services like Firecrawl, which charge per request or page. If you self-host extraction, Readability adds nothing to the bill.
Does Mozilla Readability render JavaScript?
No. Mozilla Readability does not fetch pages or run JavaScript. It takes an already-parsed DOM document and extracts the main article from it, returning the title, byline, text content, and an HTML version. In Node you supply the DOM yourself, usually via jsdom. For pages that need a real browser to render content first, pair it with a fetching layer or use Crawl4AI or Firecrawl instead.
How does Mozilla Readability compare to Trafilatura?
Both extract the main article and strip boilerplate. Mozilla Readability is a JavaScript library that stays consistent on standard news and article pages. Trafilatura is a Python library that tends to score higher on mixed extraction benchmarks and handles messier pages better, partly because it falls back on other extraction methods when its own heuristics come up short. Choose Readability for Node or browser pipelines, and Trafilatura when your stack is Python.
What is Mozilla Readability best used for?
It is best for pulling clean article text out of HTML you already have, inside a Node or browser pipeline. Common uses include reader-mode views, building text corpora for LLMs, and feed processing. It is rule-based and deterministic, so the output is predictable. Pair it with Cheerio if you also need DOM traversal. It is not a crawler or fetcher, so handle retrieval separately.
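For the corpus-building use case, one possible shape for the step after extraction is shown below. It is only a sketch: it assumes the article object carries title, byline, and textContent fields, and the function name toCorpusLine, the record layout, and the sample values are illustrative choices rather than anything the library prescribes.

```javascript
/**
 * Turn one extracted article into a single JSON Lines record.
 * `article` is assumed to have `title`, `byline`, and `textContent`.
 * `url` and the record's field names are illustrative.
 */
function toCorpusLine(article, url) {
  // Collapse any run of whitespace to one space and trim the ends.
  const collapse = (value) => (value || "").replace(/\s+/g, " ").trim();

  // Keep paragraph breaks: split on blank lines, clean each paragraph,
  // drop empty ones, and rejoin with a single blank line.
  const paragraphs = (article.textContent || "")
    .split(/\n\s*\n/)
    .map(collapse)
    .filter((p) => p.length > 0);

  const record = {
    url,
    title: collapse(article.title),
    byline: collapse(article.byline) || null,
    text: paragraphs.join("\n\n"),
  };

  // JSON.stringify escapes newlines, so each record stays on one line.
  return JSON.stringify(record);
}

// Hypothetical extracted article.
const line = toCorpusLine(
  {
    title: "  Sample article ",
    byline: null,
    textContent: "First paragraph.\n\n   Second paragraph.\n",
  },
  "https://example.com/posts/sample"
);
console.log(line);
```

Because the extraction is deterministic, running the same HTML through this step twice should give the same line, which makes deduplication and re-runs easier to reason about.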
Can Mozilla Readability be self-hosted?
Yes. Mozilla Readability is a library you run inside your own code, so it is self-hosted by nature, with no external service involved. You install it from npm and call it on a DOM document, in a browser or a Node process with jsdom. Nothing leaves your environment, which suits privacy-sensitive extraction. Note that it does not sanitize untrusted HTML, so add a sanitizer like DOMPurify for that.
