
429 Too Many Requests: How Rate Limiting Actually Works

Technical breakdown of how websites implement rate limiting. Covers token bucket, sliding window, 429 responses, and proxy-based solutions.

By ProxyOps Team


You’re scraping an API or website, everything works perfectly for the first 100 requests, then suddenly: 429 Too Many Requests. Your scraper stops dead. You wait 60 seconds, try again, get another 429. You add a delay, but it’s not enough. You have no idea what the actual limits are.

This article explains how rate limiting works under the hood — the algorithms, the signals, and the infrastructure decisions — so you can design scrapers that respect limits when appropriate and bypass them when your use case demands it.


What a 429 Response Means

HTTP 429 is defined in RFC 6585. Unlike a 403 Forbidden (which means “you’re not allowed”), 429 specifically means “you’re allowed, but you’ve sent too many requests.”

Key differences:

Status | Meaning             | Typical Cause
403    | Access denied       | IP blocked, bot detected, authentication required
429    | Rate limited        | Too many requests in time window
503    | Service unavailable | Server overloaded or maintenance

A 429 response should include a Retry-After header telling you when to try again. In practice, many sites omit it, so your client needs a fallback wait strategy.

HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: text/plain

Rate limit exceeded. Please retry after 60 seconds.

How Rate Limiting Algorithms Work

1. Fixed Window Counter

The simplest algorithm. Count requests in fixed time windows (e.g., 1-minute blocks):

Window: 12:00:00 - 12:00:59 → 0/100 requests used
Request at 12:00:45 → 1/100
Request at 12:00:46 → 2/100
...
Request at 12:00:55 → 100/100 ← LIMIT REACHED
Request at 12:00:56 → 429 ← BLOCKED

Window resets: 12:01:00 → 0/100

Weakness: Burst attacks at window boundaries. If you send 100 requests at 12:00:59 and 100 more at 12:01:00, you’ve sent 200 requests in 2 seconds while technically staying within limits.
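
A fixed window counter fits in a few lines. This in-memory sketch (illustrative, not any particular site's implementation) shows the reset-at-the-boundary behavior that enables the burst weakness:

```python
import time

class FixedWindowLimiter:
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window        # window length in seconds
        self.window_start = 0.0
        self.count = 0

    def allow(self):
        now = time.time()
        if now - self.window_start >= self.window:
            self.window_start = now  # counter resets at the window boundary
            self.count = 0
        if self.count >= self.limit:
            return False             # would produce a 429
        self.count += 1
        return True
```

Because the counter drops to zero at the boundary, two full bursts can straddle it, which is exactly the weakness described above.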

2. Sliding Window Log

Tracks the exact timestamp of every request:

Check: "How many requests in the last 60 seconds?"
If count >= limit → 429
Else → Allow

This prevents the boundary burst attack but is memory-intensive.
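
A minimal sliding window log keeps one timestamp per request, which is also why the approach is memory-intensive (sketch):

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.log = deque()  # one timestamp per allowed request

    def allow(self):
        now = time.time()
        # Evict timestamps that fell out of the sliding window.
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False    # would produce a 429
        self.log.append(now)
        return True
```

At 1,000 req/min per client, the server stores 1,000 timestamps per client at all times, hence the memory cost.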

3. Token Bucket (Most Common)

The industry standard for API rate limiting. A “bucket” holds tokens:

  • Tokens are added at a fixed rate (e.g., 10 per second)
  • Each request consumes one token
  • If the bucket is empty, the request is rejected (429)
  • The bucket has a maximum capacity (burst limit)

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

Steady state: 10 req/s sustained
Burst: Up to 100 requests instantly, then wait for refill

This is why some APIs allow short bursts but throttle sustained traffic.
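
A token bucket sketch, assuming the capacity and refill rate from the example above:

```python
import time

class TokenBucket:
    def __init__(self, capacity=100, refill_rate=10):
        self.capacity = capacity        # burst limit
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # bucket starts full
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens < 1:
            return False                # empty bucket → 429
        self.tokens -= 1
        return True
```

A full bucket absorbs a burst of `capacity` requests instantly; after that, throughput converges to `refill_rate` per second, which is the burst-then-throttle behavior described above.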

4. Sliding Window Counter

A hybrid approach that divides time into sub-windows and uses weighted counting:

Current window (12:01): 85 requests
Previous window (12:00): 120 requests
Time into current window: 30s (50%)

Weighted count: 85 + (120 × 0.5) = 145
Limit: 150
Result: ALLOW (5 remaining)
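
The weighted count above reduces to a one-line function:

```python
def weighted_count(current, previous, fraction_elapsed):
    # The previous window contributes in proportion to how much of it
    # still overlaps the sliding window: weight = 1 - fraction_elapsed.
    return current + previous * (1 - fraction_elapsed)

# 30 s into a 60 s window → fraction_elapsed = 0.5
count = weighted_count(current=85, previous=120, fraction_elapsed=0.5)
# 85 + 120 × 0.5 = 145, under the 150 limit → allow
```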

What Gets Rate Limited (and What Doesn’t)

Rate limits are typically applied per:

Scope        | Example               | Where You'll See It
IP address   | 100 req/min per IP    | Most common for public sites
API key      | 1,000 req/day per key | SaaS APIs
User account | 500 req/hour per user | Authenticated endpoints
Endpoint     | 10 req/min to /search | Expensive operations
Global       | 10,000 req/min total  | Small sites with limited capacity

Many sites apply multiple rate limits simultaneously:

  • 10 requests per second per IP (burst protection)
  • 1,000 requests per minute per IP (sustained protection)
  • 10,000 requests per day per IP (daily cap)

You can hit any of these independently.
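
One way to model stacked limits is to require each request to pass every limiter. This sketch uses simple fixed-window counters per tier; real implementations vary:

```python
import time

class MultiLimiter:
    def __init__(self, limits):
        # limits: list of (max_requests, window_seconds) tiers
        self.limits = [{"max": m, "window": w, "start": 0.0, "count": 0}
                       for m, w in limits]

    def allow(self):
        now = time.time()
        for tier in self.limits:
            if now - tier["start"] >= tier["window"]:
                tier["start"], tier["count"] = now, 0  # window rolled over
        # A request is rejected if it would exceed ANY tier.
        if any(tier["count"] >= tier["max"] for tier in self.limits):
            return False
        for tier in self.limits:
            tier["count"] += 1
        return True
```

With tiers like `[(10, 1), (1000, 60), (10000, 86400)]`, a scraper that stays under the per-second burst limit can still exhaust the daily cap, which is why each tier has to be tracked independently.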


Rate Limiting Infrastructure

Modern rate limiting is typically implemented at one of these layers:

CDN/Edge Level (Cloudflare, AWS CloudFront)

  • Fastest to apply (before request reaches origin)
  • Usually IP-based or geographic
  • Cloudflare’s WAF can combine rate limiting with bot detection

API Gateway (Kong, Nginx, AWS API Gateway)

  • Applied after CDN but before application code
  • Can use API keys, user accounts, or custom identifiers
  • Often configurable per endpoint

Application Level (Custom middleware)

  • Most flexible but slowest
  • Can use complex logic (account tier, payment status)
  • Redis-backed token buckets are the standard implementation

# Typical Redis-based rate limiter (Python), fixed-window style
import redis

r = redis.Redis()

def is_rate_limited(client_ip, limit=100, window=60):
    key = f"rate_limit:{client_ip}"
    count = r.incr(key)        # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, window)  # start the window on the first request only
    return count > limit

How to Handle 429 Responses

Strategy 1: Exponential Backoff

The standard approach for API integrations:

import time
import random
import requests

def request_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 429:
            # Prefer Retry-After when present; otherwise back off exponentially
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            jitter = random.uniform(0, retry_after * 0.1)  # desynchronize retries
            time.sleep(retry_after + jitter)
            continue
        return response
    raise Exception("Rate limit exceeded after max retries")

Strategy 2: Request Throttling

Proactively limit your request rate to stay below known thresholds:

import time

class Throttler:
    def __init__(self, requests_per_second=5):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0

    def wait(self):
        elapsed = time.time() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.time()

Strategy 3: IP Rotation with Proxies

When rate limits are per-IP and you need higher throughput, distribute requests across multiple IPs using residential proxies:

Without proxies:
1 IP × 10 req/min = 10 req/min total

With 100 rotating residential IPs:
100 IPs × 10 req/min = 1,000 req/min total

This is the most common approach for large-scale data collection. Bright Data and Oxylabs handle IP rotation automatically, distributing your requests across millions of residential IPs.


Strategy 4: Distributed Architecture

For serious scale, distribute your scraping across multiple processes and machines:

Architecture:
├─ Job Queue (Redis/RabbitMQ)
│   └─ Contains target URLs with priority
├─ Worker Pool (10-100 workers)
│   ├─ Each worker uses a different proxy
│   ├─ Each worker respects per-IP rate limits
│   └─ Failed requests go back to queue
├─ Rate Limiter (shared Redis)
│   └─ Tracks global rate across all workers
└─ Results Store (PostgreSQL/S3)
    └─ Deduplication and storage
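
A worker from the pool above might look like this sketch; the fetch function, proxy handle, and shared limiter are stand-ins for real components (a Redis-backed limiter and RabbitMQ queue in production):

```python
import time
from queue import Queue, Empty

def worker(jobs: Queue, results: list, fetch, limiter, proxy):
    while True:
        try:
            url = jobs.get_nowait()
        except Empty:
            return                 # queue drained, worker exits
        if not limiter.allow():    # shared rate limiter across all workers
            jobs.put(url)          # requeue the job and back off briefly
            time.sleep(0.1)
            continue
        try:
            results.append(fetch(url, proxy))
        except Exception:
            jobs.put(url)          # failed requests go back to the queue
```

In a real deployment each worker would be pinned to a different proxy, failed jobs would carry a retry count so they can't requeue forever, and `results` would be a database or object store rather than a list.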

Detecting Rate Limit Patterns

Before you start scraping, you can often identify rate limits through:

  1. API documentation — best case, limits are published
  2. Response headers — look for X-RateLimit-* headers:
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 23
    X-RateLimit-Reset: 1677777600
  3. Empirical testing — start slow, increase speed, note when 429s begin
  4. Retry-After header — tells you the exact wait time
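
Reading those hints from a response is straightforward; header names vary by API, so treat X-RateLimit-* as a common convention, not a guarantee:

```python
def parse_rate_limit(headers):
    # Defaults to 0 when a header is absent; real code should treat
    # missing headers as "unknown" rather than zero.
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        "reset": int(headers.get("X-RateLimit-Reset", 0)),  # Unix timestamp
    }

info = parse_rate_limit({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "23",
    "X-RateLimit-Reset": "1677777600",
})
# info["remaining"] == 23 → 23 requests left in the current window
```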

Rate Limiting vs. Bot Detection

Important distinction: rate limiting and bot detection are separate systems that often work together.

Feature       | Rate Limiting            | Bot Detection
Triggers on   | Request volume           | Request characteristics
Response code | 429                      | 403 or challenge page
Solution      | Slow down or rotate IPs  | Fix fingerprint, use browser
Per-target?   | Usually per-IP           | Across all traffic

A premium proxy provider helps with both: IP rotation handles rate limits, and residential IPs avoid bot detection. For targets with aggressive bot detection, see our guides on Cloudflare Turnstile and Datadome.


Key Takeaways

  1. 429 means “slow down”, not “go away.” It’s a temporary limit, not a permanent block.
  2. Token bucket is the dominant algorithm — understand burst capacity vs. sustained rate.
  3. Multiple rate limits often stack — per-second, per-minute, and per-day limits simultaneously.
  4. Always check Retry-After and X-RateLimit-* headers before implementing workarounds.
  5. IP rotation is the standard solution for per-IP rate limits at scale.
  6. Respect rate limits on APIs where you have a legitimate account — push limits only on public data that you have a right to access.


ProxyOps Team

Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.