
429 Too Many Requests: How Rate Limiting Actually Works

Technical breakdown of how websites implement rate limiting. Covers token bucket, sliding window, 429 responses, and proxy-based solutions.

By ProxyOps Team


You’re scraping an API or website, everything works perfectly for the first 100 requests, then suddenly: 429 Too Many Requests. Your scraper stops dead. You wait 60 seconds, try again, get another 429. You add a delay, but it’s not enough. You have no idea what the actual limits are.

This article explains how rate limiting works under the hood — the algorithms, the signals, and the infrastructure decisions — so you can design scrapers that respect limits when appropriate and bypass them when your use case demands it.


What a 429 Response Means

HTTP 429 is defined in RFC 6585. Unlike a 403 Forbidden (which means “you’re not allowed”), 429 specifically means “you’re allowed, but you’ve sent too many requests.”

Key differences:

Status | Meaning             | Typical Cause
403    | Access denied       | IP blocked, bot detected, authentication required
429    | Rate limited        | Too many requests in time window
503    | Service unavailable | Server overloaded or maintenance

A 429 response should include a Retry-After header telling you when to try again. In practice, many sites omit it, so your client needs a fallback wait strategy.

HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: text/plain

Rate limit exceeded. Please retry after 60 seconds.

How Rate Limiting Algorithms Work

1. Fixed Window Counter

The simplest algorithm. Count requests in fixed time windows (e.g., 1-minute blocks):

Window: 12:00:00 - 12:00:59 → 0/100 requests used
Request at 12:00:45 → 1/100
Request at 12:00:46 → 2/100
...
Request at 12:00:55 → 100/100 ← LIMIT REACHED
Request at 12:00:56 → 429 ← BLOCKED

Window resets: 12:01:00 → 0/100

Weakness: Burst attacks at window boundaries. If you send 100 requests at 12:00:59 and 100 more at 12:01:00, you’ve sent 200 requests in 2 seconds while technically staying within limits.
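
A fixed window counter fits in a few lines. This in-memory sketch (illustrative, not any particular site's implementation) shows the reset-at-the-boundary behavior that enables the burst weakness:

```python
import time

class FixedWindowLimiter:
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window        # window length in seconds
        self.window_start = 0.0
        self.count = 0

    def allow(self):
        now = time.time()
        if now - self.window_start >= self.window:
            self.window_start = now  # counter resets at the window boundary
            self.count = 0
        if self.count >= self.limit:
            return False             # would produce a 429
        self.count += 1
        return True
```

Because the counter drops to zero at the boundary, two full bursts can straddle it, which is exactly the weakness described above.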

2. Sliding Window Log

Tracks the exact timestamp of every request:

Check: "How many requests in the last 60 seconds?"
If count >= limit → 429
Else → Allow

This prevents the boundary burst attack but is memory-intensive.
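
A minimal sliding window log keeps one timestamp per request, which is also why the approach is memory-intensive (sketch):

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.log = deque()  # one timestamp per allowed request

    def allow(self):
        now = time.time()
        # Evict timestamps that fell out of the sliding window.
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False    # would produce a 429
        self.log.append(now)
        return True
```

At 1,000 req/min per client, the server stores 1,000 timestamps per client at all times, hence the memory cost.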

3. Token Bucket (Most Common)

The industry standard for API rate limiting. A “bucket” holds tokens:

  • Tokens are added at a fixed rate (e.g., 10 per second)
  • Each request consumes one token
  • If the bucket is empty, the request is rejected (429)
  • The bucket has a maximum capacity (burst limit)

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

Steady state: 10 req/s sustained
Burst: Up to 100 requests instantly, then wait for refill

This is why some APIs allow short bursts but throttle sustained traffic.
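
A token bucket sketch, assuming the capacity and refill rate from the example above:

```python
import time

class TokenBucket:
    def __init__(self, capacity=100, refill_rate=10):
        self.capacity = capacity        # burst limit
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # bucket starts full
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens < 1:
            return False                # empty bucket → 429
        self.tokens -= 1
        return True
```

A full bucket absorbs a burst of `capacity` requests instantly; after that, throughput converges to `refill_rate` per second, which is the burst-then-throttle behavior described above.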

4. Sliding Window Counter

A hybrid approach that divides time into sub-windows and uses weighted counting:

Current window (12:01): 85 requests
Previous window (12:00): 120 requests
Time into current window: 30s (50%)

Weighted count: 85 + (120 × 0.5) = 145
Limit: 150
Result: ALLOW (5 remaining)
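
The weighted count above reduces to a one-line function:

```python
def weighted_count(current, previous, fraction_elapsed):
    # The previous window contributes in proportion to how much of it
    # still overlaps the sliding window: weight = 1 - fraction_elapsed.
    return current + previous * (1 - fraction_elapsed)

# 30 s into a 60 s window → fraction_elapsed = 0.5
count = weighted_count(current=85, previous=120, fraction_elapsed=0.5)
# 85 + 120 × 0.5 = 145, under the 150 limit → allow
```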

What Gets Rate Limited (and What Doesn’t)

Rate limits are typically applied per:

Scope        | Example               | Where You'll See It
IP address   | 100 req/min per IP    | Most common for public sites
API key      | 1,000 req/day per key | SaaS APIs
User account | 500 req/hour per user | Authenticated endpoints
Endpoint     | 10 req/min to /search | Expensive operations
Global       | 10,000 req/min total  | Small sites with limited capacity

Many sites apply multiple rate limits simultaneously:

  • 10 requests per second per IP (burst protection)
  • 1,000 requests per minute per IP (sustained protection)
  • 10,000 requests per day per IP (daily cap)

You can hit any of these independently.
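
One way to model stacked limits is to require each request to pass every limiter. This sketch uses simple fixed-window counters per tier; real implementations vary:

```python
import time

class MultiLimiter:
    def __init__(self, limits):
        # limits: list of (max_requests, window_seconds) tiers
        self.limits = [{"max": m, "window": w, "start": 0.0, "count": 0}
                       for m, w in limits]

    def allow(self):
        now = time.time()
        for tier in self.limits:
            if now - tier["start"] >= tier["window"]:
                tier["start"], tier["count"] = now, 0  # window rolled over
        # A request is rejected if it would exceed ANY tier.
        if any(tier["count"] >= tier["max"] for tier in self.limits):
            return False
        for tier in self.limits:
            tier["count"] += 1
        return True
```

With tiers like `[(10, 1), (1000, 60), (10000, 86400)]`, a scraper that stays under the per-second burst limit can still exhaust the daily cap, which is why each tier has to be tracked independently.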


Rate Limiting Infrastructure

Modern rate limiting is typically implemented at one of these layers:

CDN/Edge Level (Cloudflare, AWS CloudFront)

  • Fastest to apply (before request reaches origin)
  • Usually IP-based or geographic
  • Cloudflare’s WAF can combine rate limiting with bot detection

API Gateway (Kong, Nginx, AWS API Gateway)

  • Applied after CDN but before application code
  • Can use API keys, user accounts, or custom identifiers
  • Often configurable per endpoint

Application Level (Custom middleware)

  • Most flexible but slowest
  • Can use complex logic (account tier, payment status)
  • Redis-backed token buckets are the standard implementation

# Typical Redis-based rate limiter (Python), fixed-window style
import redis

r = redis.Redis()

def is_rate_limited(client_ip, limit=100, window=60):
    key = f"rate_limit:{client_ip}"
    count = r.incr(key)        # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, window)  # start the window on the first request only
    return count > limit

How to Handle 429 Responses

Strategy 1: Exponential Backoff

The standard approach for API integrations:

import time
import random
import requests

def request_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 429:
            # Prefer Retry-After when present; otherwise back off exponentially
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            jitter = random.uniform(0, retry_after * 0.1)  # desynchronize retries
            time.sleep(retry_after + jitter)
            continue
        return response
    raise Exception("Rate limit exceeded after max retries")

Strategy 2: Request Throttling

Proactively limit your request rate to stay below known thresholds:

import time

class Throttler:
    def __init__(self, requests_per_second=5):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0

    def wait(self):
        elapsed = time.time() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.time()

Strategy 3: IP Rotation with Proxies

When rate limits are per-IP and you need higher throughput, distribute requests across multiple IPs using residential proxies:

Without proxies:
1 IP × 10 req/min = 10 req/min total

With 100 rotating residential IPs:
100 IPs × 10 req/min = 1,000 req/min total

This is the most common approach for large-scale data collection. Bright Data and Oxylabs handle IP rotation automatically, distributing your requests across millions of residential IPs.


Strategy 4: Distributed Architecture

For serious scale, distribute your scraping across multiple processes and machines:

Architecture:
├─ Job Queue (Redis/RabbitMQ)
│   └─ Contains target URLs with priority
├─ Worker Pool (10-100 workers)
│   ├─ Each worker uses a different proxy
│   ├─ Each worker respects per-IP rate limits
│   └─ Failed requests go back to queue
├─ Rate Limiter (shared Redis)
│   └─ Tracks global rate across all workers
└─ Results Store (PostgreSQL/S3)
    └─ Deduplication and storage
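
A worker from the pool above might look like this sketch; the fetch function, proxy handle, and shared limiter are stand-ins for real components (a Redis-backed limiter and RabbitMQ queue in production):

```python
import time
from queue import Queue, Empty

def worker(jobs: Queue, results: list, fetch, limiter, proxy):
    while True:
        try:
            url = jobs.get_nowait()
        except Empty:
            return                 # queue drained, worker exits
        if not limiter.allow():    # shared rate limiter across all workers
            jobs.put(url)          # requeue the job and back off briefly
            time.sleep(0.1)
            continue
        try:
            results.append(fetch(url, proxy))
        except Exception:
            jobs.put(url)          # failed requests go back to the queue
```

In a real deployment each worker would be pinned to a different proxy, failed jobs would carry a retry count so they can't requeue forever, and `results` would be a database or object store rather than a list.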

Detecting Rate Limit Patterns

Before you start scraping, you can often identify rate limits through:

  1. API documentation — best case, limits are published
  2. Response headers — look for X-RateLimit-* headers:
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 23
    X-RateLimit-Reset: 1677777600
  3. Empirical testing — start slow, increase speed, note when 429s begin
  4. Retry-After header — tells you the exact wait time
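
Reading those hints from a response is straightforward; header names vary by API, so treat X-RateLimit-* as a common convention, not a guarantee:

```python
def parse_rate_limit(headers):
    # Defaults to 0 when a header is absent; real code should treat
    # missing headers as "unknown" rather than zero.
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        "reset": int(headers.get("X-RateLimit-Reset", 0)),  # Unix timestamp
    }

info = parse_rate_limit({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "23",
    "X-RateLimit-Reset": "1677777600",
})
# info["remaining"] == 23 → 23 requests left in the current window
```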

Rate Limiting vs. Bot Detection

Important distinction: rate limiting and bot detection are separate systems that often work together.

Feature       | Rate Limiting            | Bot Detection
Triggers on   | Request volume           | Request characteristics
Response code | 429                      | 403 or challenge page
Solution      | Slow down or rotate IPs  | Fix fingerprint, use browser
Per-target?   | Usually per-IP           | Across all traffic

A premium proxy provider helps with both: IP rotation handles rate limits, and residential IPs avoid bot detection. For targets with aggressive bot detection, see our guides on Cloudflare Turnstile and Datadome.


Key Takeaways

  1. 429 means “slow down”, not “go away.” It’s a temporary limit, not a permanent block.
  2. Token bucket is the dominant algorithm — understand burst capacity vs. sustained rate.
  3. Multiple rate limits often stack — per-second, per-minute, and per-day limits simultaneously.
  4. Always check Retry-After and X-RateLimit-* headers before implementing workarounds.
  5. IP rotation is the standard solution for per-IP rate limits at scale.
  6. Respect rate limits on APIs where you have a legitimate account — push limits only on public data that you have a right to access.


ProxyOps Team

Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.