The legal status of web scraping in 2026
The legal landscape for web scraping in the United States is more settled than the discourse suggests. A line of court rulings between 2019 and 2024 (hiQ Labs v. LinkedIn, Meta Platforms v. Bright Data, and a string of Computer Fraud and Abuse Act cases) established a workable framework: scraping public, unauthenticated data is generally lawful; scraping authenticated or terms-of-service-restricted data is risky; and DMCA, copyright, and contract law remain the active battlegrounds.
This post summarizes where the case law actually stands as of April 2026, what AI product builders should pay attention to, and what most "is scraping legal?" articles get wrong.
What the courts have decided
hiQ Labs v. LinkedIn — public data is not "without authorization"
The hiQ case, filed in 2017 and finally settled in late 2022, established in the Ninth Circuit that scraping publicly accessible LinkedIn profiles likely does not violate the CFAA's "without authorization" provision. The court's 2019 ruling, issued on a preliminary-injunction appeal and reaffirmed in 2022 after the Supreme Court sent the case back for reconsideration, held that the CFAA targets unauthorized access to protected computer systems, not access to publicly available web pages.
The settlement itself was less favorable to scrapers than the headline suggests: hiQ ultimately agreed to a permanent injunction barring it from scraping LinkedIn, and LinkedIn won on contract-breach claims tied to its terms of service. The CFAA precedent stood; the practical right to scrape LinkedIn at scale did not.
What this means for AI builders. If you are extracting data from public web pages (pages a logged-out user could see in a browser), your CFAA exposure is low under current Ninth Circuit precedent, which binds only courts in that circuit. If you are bypassing authentication, evading rate limits in ways a court might call "unauthorized," or violating a clickwrap agreement you accepted, the analysis changes.
Meta Platforms v. Bright Data — public data, public default
In January 2024, a Northern District of California judge granted summary judgment to Bright Data in Meta's lawsuit alleging unlawful scraping of Facebook and Instagram. The ruling reinforced that data exposed to the public web is, by default, scrapable, and that platform terms of service do not unilaterally convert public data into protected data.
Meta did not appeal. The decision is now widely cited as the cleanest post-hiQ articulation of the public-data principle in U.S. law.
CFAA, DMCA, and the active battlegrounds
The CFAA is mostly resolved for public scraping. The active legal pressure has shifted to three areas:
DMCA Section 1201 — circumventing technical access controls. Google's 2024 DMCA action against SerpApi and parallel cease-and-desist letters to other SERP providers turn on whether anti-bot protections (CAPTCHAs, JavaScript challenges, IP fingerprinting) qualify as "technological measures" under § 1201. The case law here is genuinely unsettled, and courts have split on whether bot-detection systems count.
Copyright — particularly relevant to LLM training. The wave of 2023–2025 lawsuits against OpenAI, Anthropic, Microsoft, and Stability AI by news publishers, authors, and artists is reshaping what data can be collected for model training. The New York Times v. OpenAI case is the most-watched. Outcomes will affect the legality of training on scraped data more than the legality of scraping itself.
Contract law — terms of service breach. Even where scraping is not a CFAA violation, a court can enforce a scraper's prior agreement to a site's terms. This is what bit hiQ in the LinkedIn settlement. If your scraper accepts terms of service to register an account, you are exposed to contract claims even if criminal liability is off the table.
What AI builders actually need to do
Most "scraping legal" content is written for SEO, not for someone deciding whether to ship an AI product. Here is the practical version.
1. Distinguish public from authenticated. Public unauthenticated pages are the safe zone under current U.S. case law. Authenticated content, paywalled content, and content behind clickwrap terms are risky. If your AI product scrapes only public pages, you are operating inside the hiQ and Bright Data precedents.
2. Watch DMCA, not CFAA. The active legal action is around circumvention of anti-bot measures. If your scraping vendor markets "stealth" or "anti-detect" capabilities, ask them how they handle DMCA exposure. Some vendors have started offering legal indemnification; most have not.
3. Care about the training-data lawsuits if you fine-tune. If you build on top of a frontier model and don't train your own, the OpenAI/Anthropic copyright cases mostly affect your model providers' liability, not yours. If you train or fine-tune on scraped data, the answer changes.
4. Don't rely on "fair use" as a single defense. Fair use is a fact-specific multi-factor analysis. It is not a get-out-of-court-free card. Courts have ruled both ways on transformative-use analyses for AI training; the Authors Guild v. OpenAI and similar cases will refine the doctrine over the next two to three years.
5. Know who has sued your scraping vendor. Apify, Bright Data, Oxylabs, and SerpApi are large enough to be named in suits. Their litigation history is public and informative. If your vendor has been sued repeatedly by the same plaintiff and lost, some of that risk carries over to you as its customer.
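The checklist above can be condensed into a rough triage helper. This is only a sketch of how the areas discussed in this post map onto facts about a scraping setup. The `ScrapingPlan` fields and the `areas_to_review` function are hypothetical names invented for illustration, and the output is a list of topics to raise with counsel, not a legal conclusion.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScrapingPlan:
    """Hypothetical description of a scraping setup (all fields are illustrative)."""
    bypasses_authentication: bool = False        # logs in, or gets past a login wall
    accepted_terms_of_service: bool = False      # e.g. clicked through terms to register
    circumvents_anti_bot_measures: bool = False  # CAPTCHAs, JS challenges, fingerprinting
    trains_on_scraped_data: bool = False         # trains or fine-tunes a model on the data


def areas_to_review(plan: ScrapingPlan) -> List[str]:
    """Return the legal areas this post associates with each flagged fact."""
    areas = []
    if plan.bypasses_authentication:
        areas.append("CFAA")
    if plan.accepted_terms_of_service:
        areas.append("contract")
    if plan.circumvents_anti_bot_measures:
        areas.append("DMCA 1201")
    if plan.trains_on_scraped_data:
        areas.append("copyright")
    return areas


# A vendor that solves CAPTCHAs, feeding a fine-tuning pipeline:
print(areas_to_review(ScrapingPlan(circumvents_anti_bot_measures=True,
                                   trains_on_scraped_data=True)))
```

An empty list means only that none of the four flags was set. It corresponds to the public, logged-out case covered by the hiQ and Bright Data rulings, and it says nothing about facts the flags do not capture.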
The specific cases worth tracking in 2026
Reddit v. Anthropic — filed June 2025, alleging unauthorized commercial use of Reddit content. Settlement or summary judgment expected late 2026.
New York Times v. OpenAI / Microsoft — copyright infringement claim. Discovery is ongoing; trial date unconfirmed. Outcome will set the most consequential AI-training precedent to date.
SerpApi DMCA matter — Google's 2024 DMCA action against SerpApi appears to have produced a confidential resolution; details have not been published. Other SERP providers received parallel notices.
EU AI Act enforcement — the Act entered into force in August 2024 with phased application. Article 53 transparency obligations for general-purpose AI models have applied since August 2025, and the European Commission's powers to enforce them begin in August 2026; data-source disclosure obligations will indirectly affect the scraping infrastructure beneath those models.
What to remember
The U.S. legal status of scraping public web data is closer to "established and lawful" than the typical conference panel suggests. The active risk is not the CFAA; it is contract enforcement, DMCA circumvention, and the still-open copyright questions around AI training. AI builders should choose vendors that scrape only public data, maintain a clean compliance posture, and don't market the things that get them sued.
The single best filter is: would a logged-out user be able to see this page in a browser? If yes, you are working inside the post-hiQ mainstream. If no, you have decisions to make.
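One way to approximate the logged-out test in a pipeline is to request the page with no cookies or credentials and look at what comes back. The sketch below is a heuristic under stated assumptions: the `LOGIN_SEGMENTS` set, the `AnonymousResponse` type, and both function names are hypothetical, and a page that returns 200 but hides its content behind a login modal will slip through. Passing the check is a signal to record, not a legal determination.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
import urllib.error
import urllib.request

# Assumed, illustrative path segments that often indicate a login wall.
LOGIN_SEGMENTS = {"login", "signin", "sign-in", "auth", "checkpoint"}


@dataclass(frozen=True)
class AnonymousResponse:
    """Result of a request sent with no cookies and no credentials."""
    requested_url: str
    final_url: str  # URL after any redirects
    status: int     # HTTP status code


def looks_public(resp: AnonymousResponse) -> bool:
    """Heuristic: does a logged-out client appear to receive the page itself?"""
    # Anything other than a 2xx (401, 403, 404, ...) is not evidence of public access.
    if not 200 <= resp.status < 300:
        return False
    # A redirect that lands on a login-style path suggests a login wall.
    if resp.final_url != resp.requested_url:
        segments = urlparse(resp.final_url).path.lower().split("/")
        if LOGIN_SEGMENTS.intersection(segments):
            return False
    return True


def fetch_anonymously(url: str, timeout: float = 10.0) -> AnonymousResponse:
    """Fetch a URL with urllib's default opener, which stores and sends no cookies."""
    request = urllib.request.Request(url, headers={"User-Agent": "public-page-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return AnonymousResponse(url, response.geturl(), response.status)
    except urllib.error.HTTPError as err:
        # The status code alone decides the outcome for error responses.
        return AnonymousResponse(url, url, err.code)
```

In use, `looks_public(fetch_anonymously(url))` gives a per-URL flag you can log alongside each scraped record, so there is a trail showing the page was reachable without an account at collection time.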
This post is general information, not legal advice. If your product depends on scraping at scale, talk to a lawyer who has handled CFAA, DMCA, and contract-breach cases — not a generalist.