Rate Limiting

A technique that controls the number of requests a user or system can make to an API or service within a specified time period.

Also known as: Throttling, Request Limiting

What is Rate Limiting?

Rate limiting is a technique used to control the rate of requests that clients can make to an API or service. It protects against abuse, ensures fair usage, maintains service stability, and manages costs.

Why Rate Limit?

Security

  • Prevent brute force attacks
  • Block credential stuffing
  • Mitigate DDoS
  • Stop scraping

Stability

  • Protect backend systems
  • Ensure availability
  • Manage load

Business

  • Enforce usage tiers
  • Control costs
  • Ensure fair resource sharing

Rate Limiting Strategies

Fixed Window

  • X requests per time window
  • Simple to implement
  • Permits bursts at window boundaries (up to twice the limit in a short span)
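The fixed window approach can be sketched as a per-key counter that resets whenever a new window starts. This is a minimal in-memory illustration, not a production implementation: the class name `FixedWindowLimiter`, its parameters, and the injectable `clock` argument are assumptions made for the example.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.counts = {}    # key -> (window_index, request_count)

    def allow(self, key):
        # Every timestamp maps to exactly one window index.
        window_index = int(self.clock() // self.window_seconds)
        stored_index, count = self.counts.get(key, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self.counts[key] = (window_index, count + 1)
        return True
```

Because the counter resets abruptly, a client can spend its full limit at the end of one window and again at the start of the next, which is the boundary burst noted above.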

Sliding Window

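  • Counts requests over a rolling window, giving a smoother request distribution
  • More complex to implement than a fixed window
  • Better protection against bursts at window boundaries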
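One common way to realize a sliding window is to keep a log of recent request timestamps per key and count only those inside the rolling window. The sketch below assumes this "sliding log" variant; the class name `SlidingWindowLimiter` and the injectable `clock` are hypothetical, and memory use grows with the limit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any rolling window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.logs = {}  # key -> deque of accepted request timestamps

    def allow(self, key):
        now = self.clock()
        log = self.logs.setdefault(key, deque())
        # Discard timestamps that have aged out of the rolling window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Since the window moves with each request, there is no reset moment for a client to exploit.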

Token Bucket

  • Allows controlled bursts
  • Refills over time
  • Flexible: burst size and sustained rate can be tuned separately
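A token bucket can be sketched with two numbers: a capacity that bounds the burst size and a refill rate that sets the sustained rate. The following is a minimal single-bucket illustration; the class name `TokenBucket`, the `cost` parameter, and the injectable `clock` are assumptions for the example.

```python
import time


class TokenBucket:
    """Each request spends tokens; tokens refill at a steady rate."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A client that has been idle can send up to `capacity` requests at once, after which it is held to roughly `refill_rate` requests per second.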

Leaky Bucket

  • Constant output rate
  • Queues excess requests
  • Smooths traffic
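The leaky bucket can be sketched as a bounded queue that is drained on a fixed schedule. This illustration assumes that requests arriving when the queue is full are rejected, which is one common choice; the class name `LeakyBucket` and the methods `submit` and `drain` are hypothetical.

```python
import time
from collections import deque


class LeakyBucket:
    """Queue incoming requests and release them at a constant rate."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity             # maximum queued requests
        self.leak_interval = 1.0 / leak_rate  # seconds between releases
        self.clock = clock
        self.queue = deque()
        self.next_leak = clock()              # time the next request may leave

    def submit(self, request):
        if len(self.queue) >= self.capacity:
            return False  # bucket is full, so the request is rejected
        if not self.queue:
            # Idle time earns no credit: restart the schedule from now.
            self.next_leak = max(self.next_leak, self.clock())
        self.queue.append(request)
        return True

    def drain(self):
        """Return the queued requests that have become due, in order."""
        now = self.clock()
        released = []
        while self.queue and self.next_leak <= now:
            released.append(self.queue.popleft())
            self.next_leak += self.leak_interval
        return released
```

In this sketch a worker would call `drain()` periodically and process whatever it returns; if the worker runs late, several due requests are released together, but the average output rate stays at `leak_rate`.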

Implementation Levels

Application: Per-endpoint limits.

User/API Key: Per-account limits.

IP Address: Per-source limits.

Global: Total service capacity.
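These levels can be combined by deriving one counter key per level and requiring a request to pass all of them. The sketch below is only an illustration: the `Request` fields, the `LIMITS` values, and the `check` function are hypothetical, and window resets are left out to keep the focus on how keys are scoped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    endpoint: str
    api_key: str
    ip: str


# Hypothetical per-level limits, chosen only for the example.
LIMITS = {"endpoint": 1000, "user": 100, "ip": 300, "global": 5000}


def limit_keys(req):
    """Build one counter key per implementation level."""
    return [
        f"endpoint:{req.endpoint}",
        f"user:{req.api_key}",
        f"ip:{req.ip}",
        "global",
    ]


def check(req, counters):
    """Return the name of the level that rejects the request, or None."""
    keys = limit_keys(req)
    for key in keys:
        scope = key.split(":", 1)[0]
        if counters.get(key, 0) >= LIMITS[scope]:
            return scope
    # Count the request only after every level has accepted it.
    for key in keys:
        counters[key] = counters.get(key, 0) + 1
    return None
```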

Response Handling

HTTP 429: Too Many Requests, the status code returned when a client exceeds its limit.

Retry-After Header: Tells the client when to retry.

X-RateLimit Headers

  • X-RateLimit-Limit
  • X-RateLimit-Remaining
  • X-RateLimit-Reset
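A rejected request might be answered as sketched below. The `X-RateLimit-*` headers are a widely used convention rather than a formal standard, and their exact meaning varies between services; this example assumes `X-RateLimit-Reset` carries a Unix timestamp and `Retry-After` carries a delay in seconds. The function names are hypothetical.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch):
    """Headers that tell the client where it stands against its limit."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset is expressed as Unix time in seconds.
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }


def too_many_requests(limit, reset_epoch, now):
    """Build a (status, headers, body) tuple for a rejected request."""
    headers = rate_limit_headers(limit, 0, reset_epoch)
    # Retry-After as whole seconds, rounded up so clients do not retry early.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return 429, headers, "Too Many Requests"
```

Sending the same `X-RateLimit-*` headers on successful responses as well lets clients slow down before they hit the limit.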

Best Practices

  • Use multiple strategies
  • Communicate limits clearly
  • Provide rate limit headers
  • Allow reasonable bursts
  • Consider tiered limits
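Tiered limits can be expressed as a small lookup table from plan name to limit settings. The tier names and numbers below are invented for illustration, and falling back to the most restrictive tier for unknown plans is one possible design choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimit:
    requests_per_minute: int  # sustained rate
    burst: int                # extra headroom for short bursts


# Hypothetical tiers and values, chosen only for the example.
TIERS = {
    "free": TierLimit(requests_per_minute=60, burst=10),
    "pro": TierLimit(requests_per_minute=600, burst=50),
    "enterprise": TierLimit(requests_per_minute=6000, burst=200),
}


def limit_for(tier):
    """Look up a tier's limits, defaulting to the most restrictive tier."""
    return TIERS.get(tier, TIERS["free"])
```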