Crawl Budget
Crawl budget is the term for the maximum number of pages a search engine or other crawler will fetch from a given site within a time window. Google's crawl budget for a site is determined by two factors: crawl capacity (how fast the site responds; sites that load slowly or return errors get less budget) and crawl demand (how much interest there is in crawling the site; popular, frequently updated sites get more).

For most small-to-medium sites, crawl budget is not a binding constraint. For very large sites it becomes critical: pages that fall outside the crawl budget never get indexed. The practical levers for crawl budget management are response speed (fast sites get more crawl), URL hygiene (eliminate duplicates, parameter explosions, and infinite calendars), sitemap quality (a clean XML sitemap helps the crawler find priority pages), and internal link structure (PageRank flow concentrates crawl on important pages).

Removing low-value URLs from the crawl set frees budget for the pages you care about, but the two common tools are not interchangeable: a robots.txt disallow prevents crawling and so saves budget, while a noindex tag removes a page from the index but still requires the crawler to fetch the page to see the tag.

For AI builders running content sites, crawl budget rarely matters until you cross thousands of indexed URLs. Once you do, programmatic SEO patterns (city pages, category pages, alternative listings) can balloon URL counts and exhaust budget on low-value pages. The correction is the same as for traditional SEO: prune ruthlessly, link generously to your priority pages, and keep response times under 200ms.
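URL hygiene usually starts in robots.txt. A minimal sketch of pruning parameter explosions and an infinite archive; the parameter names (`sort`, `sessionid`) and the `/calendar/` path are hypothetical examples, not a recommended universal block list:

```
User-agent: *
# Block parameter variants from faceted navigation (hypothetical params)
Disallow: /*?sort=
Disallow: /*?sessionid=
# Block an infinite calendar archive (hypothetical path)
Disallow: /calendar/

Sitemap: https://example.com/sitemap.xml
```

Blocked URLs are never fetched, which is exactly what reclaims budget; verify with a crawl-testing tool before deploying, since an over-broad pattern can block pages you want indexed.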