Question 1

Is Mozilla Readability open source?

Accepted Answer

Yes. Mozilla Readability is open source under the Apache 2.0 license, with the code on GitHub. It is the same extractor that powers Firefox Reader View, packaged as a standalone module. Because it is permissively licensed, you can use it in commercial products without paying or releasing your own source, provided you keep the license and attribution notices that Apache 2.0 requires. The repository is maintained by Mozilla.

Question 2

How much does Mozilla Readability cost?

Accepted Answer

Mozilla Readability is free. It is an open-source library installed from npm, with no paid tier, no usage limits, and no commercial license to buy. Your only cost is the compute you run it on. That differs from hosted extraction services like Firecrawl, which charge per request or page. If you self-host extraction, Readability adds nothing to the bill.

Question 3

Does Mozilla Readability render JavaScript?

Accepted Answer

No. Mozilla Readability does not fetch pages or run JavaScript. It takes an already-parsed DOM document and extracts the main article from it, returning the title, byline, text content, and an HTML version. In Node you supply the DOM yourself, usually via jsdom. For pages that only show their content after client-side rendering, render them first in a headless browser such as Puppeteer or Playwright and pass the resulting HTML to Readability, or use a tool that bundles rendering, such as Crawl4AI or Firecrawl.

Question 4

How does Mozilla Readability compare to Trafilatura?

Accepted Answer

Both extract the main article and strip boilerplate. Mozilla Readability is a JavaScript library that gives consistent results on conventional news and article pages. Trafilatura is a Python library that tends to score higher on mixed extraction benchmarks and handles messier pages better, partly because it falls back on several methods. Choose Readability for Node or browser pipelines, and Trafilatura when your stack is Python.

Question 5

What is Mozilla Readability best used for?

Accepted Answer

It is best for pulling clean article text out of HTML you already have, inside a Node or browser pipeline. Common uses include reader-mode views, building text corpora for LLMs, and feed processing. It is rule-based and deterministic, so the same input produces the same output. If you also need selector-based scraping of other parts of the page, a tool like Cheerio can handle that separately, but Readability itself needs a real DOM document, such as one from jsdom. It is not a crawler or fetcher, so handle retrieval separately.

Question 6

Can Mozilla Readability be self-hosted?

Accepted Answer

Yes. Mozilla Readability is a library you run inside your own code, so it is self-hosted by nature, with no external service involved. You install it from npm and call it on a DOM document, in a browser or a Node process with jsdom. Nothing leaves your environment, which suits privacy-sensitive extraction. Note that it does not sanitize untrusted HTML, so if you render the extracted HTML anywhere, pass it through a sanitizer like DOMPurify first.
