429 Too Many Requests: How Rate Limiting Actually Works
Technical breakdown of how websites implement rate limiting. Covers token bucket, sliding window, 429 responses, and proxy-based solutions.
You’re scraping an API or website, everything works perfectly for the first 100 requests, then suddenly: 429 Too Many Requests. Your scraper stops dead. You wait 60 seconds, try again, get another 429. You add a delay, but it’s not enough. You have no idea what the actual limits are.
This article explains how rate limiting works under the hood — the algorithms, the signals, and the infrastructure decisions — so you can design scrapers that respect limits when appropriate and bypass them when your use case demands it.
What a 429 Response Means
HTTP 429 is defined in RFC 6585. Unlike a 403 Forbidden (which means “you’re not allowed”), 429 specifically means “you’re allowed, but you’ve sent too many requests.”
Key differences:
| Status | Meaning | Typical Cause |
|---|---|---|
| 403 | Access denied | IP blocked, bot detected, authentication required |
| 429 | Rate limited | Too many requests in time window |
| 503 | Service unavailable | Server overloaded or maintenance |
A 429 response should include a Retry-After header telling you when to try again. In practice, many sites omit this header, so your client needs a fallback backoff strategy.
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: text/plain

Rate limit exceeded. Please retry after 60 seconds.
```
How Rate Limiting Algorithms Work
1. Fixed Window Counter
The simplest algorithm. Count requests in fixed time windows (e.g., 1-minute blocks):
```
Window: 12:00:00 - 12:00:59 → 0/100 requests used
Request at 12:00:45 → 1/100
Request at 12:00:46 → 2/100
...
Request at 12:00:55 → 100/100 ← LIMIT REACHED
Request at 12:00:56 → 429     ← BLOCKED
Window resets: 12:01:00 → 0/100
```
Weakness: Burst attacks at window boundaries. If you send 100 requests at 12:00:59 and 100 more at 12:01:00, you’ve sent 200 requests in 2 seconds while technically staying within limits.
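In code, a fixed-window counter is just a per-window map of counts. A minimal in-memory sketch (a real deployment would key this in Redis instead of a Python dict):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per client in fixed time windows."""
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (client, window_id) -> count

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)  # e.g. which minute we are in
        key = (client, window_id)
        if self.counts[key] >= self.limit:
            return False                     # would answer with a 429
        self.counts[key] += 1
        return True
```

Note how the counter resets completely at each window boundary, which is exactly what enables the boundary-burst weakness described above.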
2. Sliding Window Log
Tracks the exact timestamp of every request:
```
Check: "How many requests in the last 60 seconds?"
If count >= limit → 429
Else → Allow
```
This prevents the boundary burst attack but is memory-intensive.
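A sketch of the log-based approach, keeping one timestamp per request (the memory cost mentioned above is visible in the per-client deque):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Stores a timestamp for every request; memory grows with request rate."""
    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.log = {}  # client -> deque of request timestamps

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        q = self.log.setdefault(client, deque())
        while q and now - q[0] >= self.window:
            q.popleft()               # drop timestamps outside the window
        if len(q) >= self.limit:
            return False              # 429
        q.append(now)
        return True
```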
3. Token Bucket (Most Common)
The industry standard for API rate limiting. A “bucket” holds tokens:
- Tokens are added at a fixed rate (e.g., 10 per second)
- Each request consumes one token
- If the bucket is empty, the request is rejected (429)
- The bucket has a maximum capacity (burst limit)
```
Bucket capacity: 100 tokens
Refill rate:     10 tokens/second
Steady state:    10 req/s sustained
Burst:           up to 100 requests instantly, then wait for refill
```
This is why some APIs allow short bursts but throttle sustained traffic.
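The refill arithmetic can be sketched as a small class (an in-memory sketch; production systems typically keep the bucket state in Redis):

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/second up to a burst `capacity`."""
    def __init__(self, rate=10.0, capacity=100.0, start=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full, so an initial burst is allowed
        self.last = time.monotonic() if start is None else start

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to the time elapsed since the last check
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False             # bucket empty -> 429
```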
4. Sliding Window Counter
A hybrid approach that divides time into sub-windows and uses weighted counting:
```
Current window (12:01): 85 requests
Previous window (12:00): 120 requests
Time into current window: 30s (50%)

Weighted count: 85 + (120 × 0.5) = 145
Limit: 150
Result: ALLOW (5 remaining)
```
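The weighted count can be written directly (a sketch; `elapsed_fraction` is how far into the current window we are):

```python
def sliding_window_allow(curr_count, prev_count, elapsed_fraction, limit):
    """Weight the previous window's count by its remaining overlap with
    the sliding window: 50% into the current window keeps 50% of the
    previous window's count."""
    weighted = curr_count + prev_count * (1 - elapsed_fraction)
    return weighted < limit, weighted
```

For the example above, `sliding_window_allow(85, 120, 0.5, 150)` gives a weighted count of 145, under the limit of 150.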
What Gets Rate Limited (and What Doesn’t)
Rate limits are typically applied per:
| Scope | Example | How to Identify |
|---|---|---|
| IP address | 100 req/min per IP | Most common for public sites |
| API key | 1000 req/day per key | SaaS APIs |
| User account | 500 req/hour per user | Authenticated endpoints |
| Endpoint | 10 req/min to /search | Expensive operations |
| Global | 10,000 req/min total | Small sites with limited capacity |
Many sites apply multiple rate limits simultaneously:
- 10 requests per second per IP (burst protection)
- 1,000 requests per minute per IP (sustained protection)
- 10,000 requests per day per IP (daily cap)
You can hit any of these independently.
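Stacked limits mean a request must pass every tier independently. A sketch of that check, with hypothetical tiers matching the list above:

```python
# Hypothetical stacked limits: (max requests, window in seconds)
LIMITS = [
    (10, 1),           # 10 requests per second (burst protection)
    (1_000, 60),       # 1,000 per minute (sustained protection)
    (10_000, 86_400),  # 10,000 per day (daily cap)
]

def check_all(counts_by_window):
    """counts_by_window maps window-seconds -> requests seen in that window.
    Returns the first tier that is exceeded, or None if all pass."""
    for limit, window in LIMITS:
        if counts_by_window.get(window, 0) >= limit:
            return (limit, window)   # this tier would trigger the 429
    return None
```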
Rate Limiting Infrastructure
Modern rate limiting is typically implemented at one of these layers:
CDN/Edge Level (Cloudflare, AWS CloudFront)
- Fastest to apply (before request reaches origin)
- Usually IP-based or geographic
- Cloudflare’s WAF can combine rate limiting with bot detection
API Gateway (Kong, Nginx, AWS API Gateway)
- Applied after CDN but before application code
- Can use API keys, user accounts, or custom identifiers
- Often configurable per endpoint
Application Level (Custom middleware)
- Most flexible but slowest
- Can use complex logic (account tier, payment status)
- Redis-backed token buckets are the standard implementation
```python
# Typical Redis-based rate limiter (Python)
import redis

r = redis.Redis()

def is_rate_limited(client_ip, limit=100, window=60):
    key = f"rate_limit:{client_ip}"
    count = r.incr(key)        # atomic; creates the key at 1 if missing
    if count == 1:
        r.expire(key, window)  # start the window on the first request only
    return count > limit
```

Using `INCR` first and setting the expiry only on the first request keeps the check atomic; a naive get-then-increment version has a race condition under concurrent requests.
How to Handle 429 Responses
Strategy 1: Exponential Backoff
The standard approach for API integrations:
```python
import random
import time

import requests

def request_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 429:
            # Honor Retry-After when present; otherwise back off exponentially
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            jitter = random.uniform(0, retry_after * 0.1)
            time.sleep(retry_after + jitter)
            continue
        return response
    raise Exception("Rate limit exceeded after max retries")
```

The jitter matters: without it, many clients that got rate limited at the same moment retry at the same moment and collide again.
Strategy 2: Request Throttling
Proactively limit your request rate to stay below known thresholds:
```python
import time

class Throttler:
    def __init__(self, requests_per_second=5):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0.0

    def wait(self):
        elapsed = time.time() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.time()
```
Strategy 3: IP Rotation with Proxies
When rate limits are per-IP and you need higher throughput, distribute requests across multiple IPs using residential proxies:
```
Without proxies:
  1 IP × 10 req/min = 10 req/min total

With 100 rotating residential IPs:
  100 IPs × 10 req/min = 1,000 req/min total
```
This is the most common approach for large-scale data collection. Bright Data and Oxylabs handle IP rotation automatically, distributing your requests across millions of residential IPs.
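A minimal sketch of client-side rotation, assuming a hypothetical static list of proxy URLs (managed providers usually expose a single gateway endpoint that rotates IPs for you instead):

```python
import itertools

# Hypothetical proxy endpoints -- replace with your provider's URLs
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]
_rotation = itertools.cycle(PROXIES)

def next_proxy_config():
    """Return a proxies dict for the next proxy in the pool, so each
    request can leave through a different IP."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}
```

Pass the result to your HTTP client, e.g. `requests.get(url, proxies=next_proxy_config())`, so per-IP limits apply to each proxy separately.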
Strategy 4: Distributed Architecture
For serious scale, distribute your scraping across multiple processes and machines:
Architecture:
├─ Job Queue (Redis/RabbitMQ)
│ └─ Contains target URLs with priority
├─ Worker Pool (10-100 workers)
│ ├─ Each worker uses a different proxy
│ ├─ Each worker respects per-IP rate limits
│ └─ Failed requests go back to queue
├─ Rate Limiter (shared Redis)
│ └─ Tracks global rate across all workers
└─ Results Store (PostgreSQL/S3)
└─ Deduplication and storage
Detecting Rate Limit Patterns
Before you start scraping, you can often identify rate limits through:
- API documentation — best case, limits are published
- Response headers — look for `X-RateLimit-*` headers such as `X-RateLimit-Limit: 100`, `X-RateLimit-Remaining: 23`, `X-RateLimit-Reset: 1677777600`
- Empirical testing — start slow, increase speed, note when 429s begin
- `Retry-After` header — tells you the exact wait time
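A small helper for reading those headers when present (a sketch; the `X-RateLimit-*` names are a convention rather than a standard, and `Retry-After` may also be an HTTP date instead of seconds):

```python
def parse_rate_limit_headers(headers):
    """Extract common rate-limit hints from response headers.
    Returns None for any header the server did not send."""
    def to_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None
    return {
        "limit": to_int("X-RateLimit-Limit"),
        "remaining": to_int("X-RateLimit-Remaining"),
        "reset": to_int("X-RateLimit-Reset"),   # usually a Unix timestamp
        "retry_after": to_int("Retry-After"),   # seconds to wait, when numeric
    }
```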
Rate Limiting vs. Bot Detection
Important distinction: rate limiting and bot detection are separate systems that often work together.
| Feature | Rate Limiting | Bot Detection |
|---|---|---|
| Triggers on | Request volume | Request characteristics |
| Response code | 429 | 403 or challenge page |
| Solution | Slow down or rotate IPs | Fix fingerprint, use browser |
| Tracking scope | Usually per-IP | Across all traffic |
A premium proxy provider helps with both: IP rotation handles rate limits, and residential IPs avoid bot detection. For targets with aggressive bot detection, see our guides on Cloudflare Turnstile and Datadome.
Key Takeaways
- 429 means “slow down”, not “go away.” It’s a temporary limit, not a permanent block.
- Token bucket is the dominant algorithm — understand burst capacity vs. sustained rate.
- Multiple rate limits often stack — per-second, per-minute, and per-day limits simultaneously.
- Always check `Retry-After` and `X-RateLimit-*` headers before implementing workarounds.
- IP rotation is the standard solution for per-IP rate limits at scale.
- Respect rate limits on APIs where you have a legitimate account — push limits only on public data that you have a right to access.
Related Reads
- HTTP 403 Forbidden — When the issue isn’t rate limiting but bot detection
- Cloudflare Error 1020 — Cloudflare’s WAF often combines rate limiting with fingerprinting
- Best Residential Proxy Providers 2026 — IP rotation infrastructure for rate limit bypass
- Cheapest DIY Residential Proxy — Build your own rotation pool
- Protect Your Home IP — Why you need protection when running proxy infrastructure
ProxyOps Team
Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.