Rate Limiting Strategies for SaaS APIs
Rate limiting protects your API from abuse, prevents a single user from consuming all resources, and keeps your infrastructure costs predictable. This article covers the three main algorithms, when to use each, and how Styrby implements rate limiting with Upstash Redis.
Three Algorithms
Fixed Window
Divide time into fixed intervals (e.g., 1 minute) and count requests per interval. When the count exceeds the limit, reject requests until the next interval starts.
```typescript
// Fixed window: 100 requests per minute
const key = `rate:${userId}:${Math.floor(Date.now() / 60000)}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 60);
if (count > 100) return { status: 429, retryAfter: 60 - ((Date.now() % 60000) / 1000) };
```

Pro: Simple to implement and understand. Low memory: one counter per user per window.
Con: Burst problem at window boundaries. A user can send 100 requests at second 59 of window 1, then 100 more at second 0 of window 2. That is 200 requests in 2 seconds despite a 100/minute limit.
Sliding Window
Combines the current window count with a weighted portion of the previous window to smooth out boundary bursts:
```typescript
// Sliding window: 100 requests per minute
const now = Date.now();
const currentWindow = Math.floor(now / 60000);
const previousWindow = currentWindow - 1;
const elapsed = (now % 60000) / 60000; // fraction of the current window elapsed, 0.0 to 1.0
const currentCount = Number(await redis.get(`rate:${userId}:${currentWindow}`)) || 0;
const previousCount = Number(await redis.get(`rate:${userId}:${previousWindow}`)) || 0;
// Weight the previous window by how much of it still overlaps the sliding window
const weightedCount = previousCount * (1 - elapsed) + currentCount;
if (weightedCount >= 100) return { status: 429 };
// Record this request; the 120s TTL keeps the counter alive long enough
// to serve as "previous window" data for the next window's check
const count = await redis.incr(`rate:${userId}:${currentWindow}`);
if (count === 1) await redis.expire(`rate:${userId}:${currentWindow}`, 120);
```

Pro: Eliminates the boundary burst problem. Still uses fixed memory per user.
Con: The weighted count is an approximation, not exact: it assumes requests in the previous window arrived at a steady rate. For most applications, that is close enough.
Token Bucket
Each user has a bucket that fills with tokens at a steady rate. Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity that limits bursts.
```typescript
// Token bucket: refills at 100 tokens/minute, max burst of 20
interface Bucket {
  tokens: number;
  lastRefill: number; // ms timestamp of the last refill calculation
}

function checkRateLimit(bucket: Bucket): { allowed: boolean; bucket: Bucket } {
  const now = Date.now();
  const elapsed = (now - bucket.lastRefill) / 1000;
  const refillRate = 100 / 60; // tokens per second

  // Refill tokens based on elapsed time, capped at the burst capacity
  const newTokens = Math.min(
    20, // max burst capacity
    bucket.tokens + elapsed * refillRate
  );

  if (newTokens < 1) {
    // Advance lastRefill too; keeping the old timestamp would
    // double-count the refill on the next call
    return { allowed: false, bucket: { tokens: newTokens, lastRefill: now } };
  }
  return {
    allowed: true,
    bucket: { tokens: newTokens - 1, lastRefill: now },
  };
}
```

Pro: Natural burst handling. Allows short bursts up to the bucket capacity while enforcing the average rate.
Con: Slightly more complex. Requires storing the token count and last refill timestamp per user.
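Because that bucket state must survive between requests, it has to live somewhere shared. Here is a minimal sketch of persisting it, assuming an Upstash-style Redis client; the key scheme and function name are illustrative, and the read-modify-write is not atomic (production code would wrap it in a Lua script or use a library):

```typescript
// Sketch: load, check, and persist the bucket per user.
// Not atomic: concurrent requests for the same user can race.
async function allowRequest(userId: string): Promise<boolean> {
  const key = `bucket:${userId}`;
  const stored = await redis.get<Bucket>(key);
  const bucket = stored ?? { tokens: 20, lastRefill: Date.now() }; // new users start with a full bucket
  const { allowed, bucket: updated } = checkRateLimit(bucket);
  await redis.set(key, JSON.stringify(updated), { ex: 300 }); // drop idle buckets after 5 minutes
  return allowed;
}
```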
When to Use Each
| Algorithm | Best For | Avoid When |
|---|---|---|
| Fixed Window | Simple APIs, internal services, prototyping | Boundary bursts would cause problems |
| Sliding Window | Public APIs, SaaS products, general use | You need exact (not approximate) counting |
| Token Bucket | APIs that should allow controlled bursts | Simple rate limits suffice |
Per-Endpoint vs. Per-User Limits
Styrby uses both:
- Per-user global limit: 1,000 requests per minute across all endpoints. Prevents any single user from overwhelming the system.
- Per-endpoint limits: Sensitive endpoints have tighter limits. The session creation endpoint allows 10 requests per minute. The cost data endpoint allows 60 requests per minute.
Both limits apply simultaneously. A user might be within their global limit but hit the per-endpoint limit on a specific route.
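As a sketch of how the two layers compose (the `RateLimiter` interface and function names here are illustrative, not Styrby's actual code; the `Ratelimit` instances from `@upstash/ratelimit` shown in the next section have a compatible shape):

```typescript
// Every request must pass the global limit AND its endpoint limit
interface RateLimiter {
  limit(key: string): Promise<{ success: boolean }>;
}

async function enforceLimits(
  userId: string,
  globalLimiter: RateLimiter,   // e.g., 1,000/min across all endpoints
  endpointLimiter: RateLimiter  // e.g., 10/min for session creation
): Promise<boolean> {
  // Check the global limit first so a globally throttled user
  // is rejected before touching endpoint-level state
  const global = await globalLimiter.limit(userId);
  if (!global.success) return false;
  const endpoint = await endpointLimiter.limit(userId);
  return endpoint.success;
}
```

Note that with this ordering, a request rejected at the endpoint level has already consumed one global token; swapping the order, or refunding on rejection, is a policy choice.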
Implementation with Upstash Redis
Styrby uses Upstash Redis for rate limiting because it provides a serverless Redis instance that works well with Vercel and Supabase Edge Functions. The @upstash/ratelimit library handles the algorithm implementation:
```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

// Sliding window: 10 session creations per minute
const sessionCreateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "1 m"),
  prefix: "ratelimit:session-create",
});

// In the API route handler
export async function POST(request: Request) {
  const userId = getUserId(request);
  const { success, limit, remaining, reset } =
    await sessionCreateLimit.limit(userId);

  if (!success) {
    // reset is a Unix timestamp in milliseconds; Retry-After wants seconds
    const retryAfterSeconds = Math.ceil((reset - Date.now()) / 1000);
    return Response.json(
      { error: "RATE_LIMITED", message: "Too many requests", retryAfter: retryAfterSeconds },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": String(limit),
          "X-RateLimit-Remaining": String(remaining),
          "X-RateLimit-Reset": String(reset),
          "Retry-After": String(retryAfterSeconds),
        },
      }
    );
  }

  // Process the request...
}
```

Rate Limit Headers
Always include rate limit information in response headers so clients can self-regulate:
- `X-RateLimit-Limit`: Maximum requests allowed in the window
- `X-RateLimit-Remaining`: Requests remaining in the current window
- `X-RateLimit-Reset`: Unix timestamp when the window resets
- `Retry-After`: Seconds until the client should retry (on 429 responses only)
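On the client side, these headers make polite retry behavior straightforward. A minimal sketch using fetch, with a single retry for brevity (a real client would cap retries and add jitter):

```typescript
// Sketch: respect Retry-After on 429 responses
async function callApi(url: string): Promise<Response> {
  const res = await fetch(url);
  if (res.status !== 429) return res;
  const retryAfter = Number(res.headers.get("Retry-After") ?? "1");
  await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  return fetch(url);
}
```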
Monitoring and Tuning
Track how often rate limits fire. If legitimate users regularly hit limits, the thresholds are too low. If limits never fire, they might be too high to provide protection. Review rate limit metrics monthly and adjust based on actual usage patterns.
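One lightweight way to collect those metrics is to log a structured event on every rejection. A sketch (the event shape is illustrative; in production this would feed a metrics pipeline rather than stdout):

```typescript
// Sketch: emit a structured event whenever a limit fires, so monthly
// reviews can see which endpoints and users are hitting limits
function recordRateLimitHit(endpoint: string, userId: string, limit: number): void {
  console.log(
    JSON.stringify({
      event: "rate_limit.exceeded",
      endpoint,
      userId,
      limit,
      at: new Date().toISOString(),
    })
  );
}

// Usage in the handler above:
// if (!success) recordRateLimitHit("session-create", userId, limit);
```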
Ready to manage your AI agents from one place?
Styrby gives you cost tracking, remote permissions, and session replay across five agents.