
Rate Limiting

Rate limiting is the practice of capping how many requests a client can make to a server within a given time window. It is enforced both server-side (the API returns 429 Too Many Requests once you exceed the budget) and at the network layer (Cloudflare, AWS WAF, and similar services rate-limit per IP before requests reach the origin). Rate limits exist to prevent abuse, protect infrastructure, and maintain quality of service for legitimate users.

For scrapers, rate limiting is the most common reason requests start failing. The two practical responses are slowing down (rate-shaping your client to stay under the limit) and spreading out (using proxy rotation so requests appear to come from many clients). Most production scraping platforms combine both. A typical default is one request per second per IP, with concurrency configured at the IP-pool level.

For AI builders, rate-limit awareness should be built into any scraping system from the start. Hitting limits is fine; what matters is detecting 429 responses, backing off with exponential delay, and resuming gracefully. Hardcoded sleeps between requests are fragile; respecting `Retry-After` headers and tracking per-host request rates is the durable approach. Most scraping APIs absorb this complexity for you, but if you build your own pipeline, plan for rate limiting on day one.
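A minimal sketch of that retry loop, using Python's `requests` library. The endpoint URL, the attempt cap, and the timeout are illustrative assumptions, not values from any particular API:

```python
import random
import time

import requests

MAX_ATTEMPTS = 5  # illustrative cap; tune for your pipeline


def fetch_with_backoff(url: str, session: requests.Session) -> requests.Response:
    """GET a URL, backing off exponentially on 429 responses."""
    for attempt in range(MAX_ATTEMPTS):
        response = session.get(url, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Prefer the server's own hint; fall back to exponential delay.
        # Note: Retry-After may also be an HTTP date; for brevity this
        # sketch assumes the delta-seconds form.
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            # Jitter spreads retries out so clients don't stampede in sync.
            delay = (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {MAX_ATTEMPTS} attempts: {url}")
```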
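Tracking per-host rate can be as simple as a token bucket keyed by hostname. This sketch assumes the one-request-per-second default mentioned above; the class name and defaults are hypothetical:

```python
import time
from collections import defaultdict
from urllib.parse import urlparse


class PerHostLimiter:
    """Token bucket per hostname: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate: float = 1.0, burst: float = 1.0):
        self.rate = rate
        self.burst = burst
        self.tokens: dict[str, float] = defaultdict(lambda: burst)
        self.last: dict[str, float] = {}

    def acquire(self, url: str) -> None:
        host = urlparse(url).netloc
        now = time.monotonic()
        last = self.last.get(host, now)
        # Refill tokens for the time elapsed since the last request to this host.
        self.tokens[host] = min(self.burst, self.tokens[host] + (now - last) * self.rate)
        if self.tokens[host] < 1.0:
            # Not enough budget yet: sleep until one full token has accrued.
            time.sleep((1.0 - self.tokens[host]) / self.rate)
            self.tokens[host] = 1.0
        self.tokens[host] -= 1.0
        self.last[host] = time.monotonic()
```

Calling `limiter.acquire(url)` before each request keeps a multi-host crawler under its per-host budget without resorting to one global sleep that slows every host down.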

Related tools