Rate Limiting

Rate limiting controls how many requests a client can make to your API within a given time window. It protects your backend from traffic spikes, enforces fair usage across consumers, and enables tiered access for different customer plans.

Zuplo's rate limiter uses a sliding window algorithm enforced globally across all edge locations. When a client exceeds the limit, they receive a 429 Too Many Requests response with a retry-after header indicating when they can retry.
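To make the sliding window idea concrete, here is a minimal in-memory sketch. It is not Zuplo's implementation, which is enforced globally across edge locations. The class name `SlidingWindowLimiter` and its `check` method are hypothetical, and the sketch assumes a simple log of timestamps per key.

```typescript
// Illustrative sliding-window log limiter; NOT Zuplo's implementation.
class SlidingWindowLimiter {
  // Timestamps (ms) of accepted requests, per bucket key.
  private readonly hits = new Map<string, number[]>();
  private readonly requestsAllowed: number;
  private readonly windowMs: number;

  constructor(requestsAllowed: number, windowMs: number) {
    this.requestsAllowed = requestsAllowed;
    this.windowMs = windowMs;
  }

  // Returns 0 when the request is allowed, otherwise the number of
  // seconds to wait before retrying.
  check(key: string, nowMs: number): number {
    const windowStart = nowMs - this.windowMs;
    // Keep only the hits that still fall inside the window ending now.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);

    if (recent.length >= this.requestsAllowed) {
      this.hits.set(key, recent);
      // The oldest hit leaves the window at oldest + windowMs.
      const oldest = recent[0] ?? nowMs;
      return Math.ceil((oldest + this.windowMs - nowMs) / 1000);
    }

    recent.push(nowMs);
    this.hits.set(key, recent);
    return 0;
  }
}

// Two requests allowed per 60-second window.
const limiter = new SlidingWindowLimiter(2, 60_000);
const first = limiter.check("client-a", 0); // allowed
const second = limiter.check("client-a", 10_000); // allowed
const third = limiter.check("client-a", 30_000); // rejected, wait 30 seconds
const fourth = limiter.check("client-a", 60_000); // allowed, the hit at t=0 expired
```

Unlike a fixed window that resets on a clock boundary, the window here always ends at the current request, so a burst that straddles a boundary cannot double the effective limit.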

Rate limiting policies

Zuplo provides two rate limiting policies, each suited to different levels of complexity.

Rate Limiting policy

The Rate Limiting policy enforces a single request counter per time window. Configure a maximum number of requests, a time window, and how to identify callers.


Code
{
  "name": "my-rate-limit-policy",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}

Use this policy when you need a straightforward "X requests per Y minutes" limit.

Complex Rate Limiting policy

The Complex Rate Limiting policy supports multiple named counters in a single policy. Each counter tracks a different resource or unit of work.


Code
{
  "name": "my-complex-rate-limit-policy",
  "policyType": "complex-rate-limit-inbound",
  "handler": {
    "export": "ComplexRateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "timeWindowMinutes": 1,
      "limits": {
        "requests": 100,
        "compute": 500
      }
    }
  }
}

You can override counter increments programmatically per request using ComplexRateLimitInboundPolicy.setIncrements(). This is useful for usage-based pricing where different endpoints consume different amounts of a resource (for example, counting compute units or tokens instead of raw requests).
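As a rough sketch of variable cost per request, the function below maps a route path to counter increments. The route paths and cost values are hypothetical, and the exact arguments of setIncrements() shown in the comment (the request context plus a map of counter names to amounts) are an assumption to verify against the policy reference.

```typescript
// Counter name -> amount to add for one request.
type Increments = Record<string, number>;

// Hypothetical per-route costs. The counter names match the keys of the
// "limits" object in the policy configuration above.
const ROUTE_COSTS: Record<string, Increments> = {
  "/reports": { requests: 1, compute: 25 },
  "/search": { requests: 1, compute: 5 },
};

// Returns how much each counter should grow for a request to `path`.
// Unknown routes fall back to one unit of each counter.
function incrementsFor(path: string): Increments {
  return ROUTE_COSTS[path] ?? { requests: 1, compute: 1 };
}

// Inside a Zuplo handler or policy the result would be passed on, assuming
// the call takes the context and an increments map:
//   ComplexRateLimitInboundPolicy.setIncrements(context, incrementsFor(path));

const reportCost = incrementsFor("/reports");
const defaultCost = incrementsFor("/health");
```

With the limits shown earlier (100 requests and 500 compute units per minute), a caller hitting only /reports would exhaust the compute counter after 20 requests, well before the request counter.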

Choosing a policy

Scenario	Policy
Fixed requests-per-minute limit for all callers	Rate Limiting
Different limits per customer tier (free vs. paid)	Rate Limiting with a custom function
Counting multiple resources (requests + compute units)	Complex Rate Limiting
Usage-based billing with variable cost per request	Complex Rate Limiting with dynamic increments

How `rateLimitBy` works

The rateLimitBy option determines how the rate limiter groups requests into buckets. Both policies support the same four modes.

`ip`

Groups requests by the client's IP address. No authentication is required. This is the simplest option and works well for public APIs or as a first layer of protection.

`user`

Groups requests by the authenticated user's identity (request.user.sub). When using API key authentication, the sub value is the consumer name you assigned when creating the API key. When using JWT authentication, it comes from the token's sub claim.

This is the recommended mode for authenticated APIs because it ties limits to the actual consumer rather than a shared IP address.

`function`

Groups requests using a custom TypeScript function that you provide. The function returns a CustomRateLimitDetails object containing a grouping key and, optionally, overridden values for requestsAllowed and timeWindowMinutes.

This mode enables dynamic rate limiting where limits vary based on customer tier, route parameters, or any other request property.

`all`

Applies a single shared counter across all requests to the route, regardless of who makes them. Use this for global rate limits on endpoints that call resource-constrained backends.
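The four modes can be pictured as four ways of deriving a bucket key from a request. The sketch below is illustrative only: the `SketchRequest` type, the `bucketKey` function, and the key prefixes are hypothetical, and returning undefined for a missing user is an assumption of this sketch rather than documented behavior.

```typescript
type RateLimitBy = "ip" | "user" | "function" | "all";

// Simplified request shape used only in this sketch.
interface SketchRequest {
  ip: string;
  userSub?: string; // stands in for request.user.sub
}

// Returns the bucket key for a request, or undefined when no key can be
// derived (for example "user" mode without an authenticated user).
function bucketKey(
  mode: RateLimitBy,
  req: SketchRequest,
  customKey?: (req: SketchRequest) => string | undefined,
): string | undefined {
  switch (mode) {
    case "ip":
      return `ip:${req.ip}`;
    case "user":
      return req.userSub === undefined ? undefined : `user:${req.userSub}`;
    case "function":
      return customKey?.(req);
    case "all":
      return "all";
  }
}

// Two consumers behind the same NAT gateway share an IP address.
const alice: SketchRequest = { ip: "203.0.113.7", userSub: "alice" };
const bob: SketchRequest = { ip: "203.0.113.7", userSub: "bob" };

const sharedByIp = bucketKey("ip", alice) === bucketKey("ip", bob); // true
const sharedByUser = bucketKey("user", alice) === bucketKey("user", bob); // false
```

The last two lines show why user mode suits authenticated APIs: callers behind one IP address share a bucket in ip mode but get separate buckets in user mode.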

Dynamic rate limiting with custom functions

When rateLimitBy is set to "function", you provide a TypeScript module that determines the rate limit at request time. The function signature is:


Code
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

export function rateLimit(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails | undefined {
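  const user = request.user;

  // Without an authenticated user there is no sub to key on. As one possible
  // fallback, group these requests into a shared bucket that uses the
  // policy's default limits. Running an authentication policy first avoids
  // this branch entirely.
  if (!user) {
    return { key: "anonymous" };
  }

  // Consumers marked as premium in their metadata get the higher limit.
  if (user.data.customerType === "premium") {
    return {
      key: user.sub,
      requestsAllowed: 1000,
      timeWindowMinutes: 1,
    };
  }

  // Everyone else gets the standard limit.
  return {
    key: user.sub,
    requestsAllowed: 50,
    timeWindowMinutes: 1,
  };
}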

The CustomRateLimitDetails object has the following properties:

key - The string used to group requests into rate limit buckets
requestsAllowed (optional) - Overrides the policy's requestsAllowed value
timeWindowMinutes (optional) - Overrides the policy's timeWindowMinutes value

Returning undefined skips rate limiting for that request entirely.

The function can also be async if you need to look up limits from a database or external service. See Per-user rate limiting using a database for a complete example using the ZoneCache for performance.
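The shape of an async lookup with caching might look like the sketch below. It is self-contained, so it uses a local `LimitDetails` type in place of CustomRateLimitDetails, a plain Map with a TTL in place of ZoneCache, and a hypothetical `fetchLimitFromDb` function in place of a real database call. None of these names come from the Zuplo runtime.

```typescript
// Local stand-in for CustomRateLimitDetails.
interface LimitDetails {
  key: string;
  requestsAllowed?: number;
  timeWindowMinutes?: number;
}

let dbLookups = 0;

// Hypothetical database call; here it derives the limit from the name.
async function fetchLimitFromDb(sub: string): Promise<number> {
  dbLookups += 1;
  return sub.startsWith("premium-") ? 1000 : 50;
}

// In-memory TTL cache standing in for ZoneCache.
const CACHE_TTL_MS = 60_000;
const limitCache = new Map<string, { value: number; expiresAt: number }>();

// Resolves the limit for a consumer, hitting the database only on a
// cache miss or after the cached entry has expired.
async function rateLimitFor(
  sub: string,
  nowMs: number = Date.now(),
): Promise<LimitDetails> {
  const cached = limitCache.get(sub);
  let requestsAllowed: number;

  if (cached !== undefined && cached.expiresAt > nowMs) {
    requestsAllowed = cached.value;
  } else {
    requestsAllowed = await fetchLimitFromDb(sub);
    limitCache.set(sub, {
      value: requestsAllowed,
      expiresAt: nowMs + CACHE_TTL_MS,
    });
  }

  return { key: sub, requestsAllowed, timeWindowMinutes: 1 };
}
```

Caching matters here because the function runs on every request. Without it, each rate limit check would add a database round trip to the request path.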

Wire the function into the policy configuration using the identifier option:


Code
{
  "export": "RateLimitInboundPolicy",
  "module": "$import(@zuplo/runtime)",
  "options": {
    "rateLimitBy": "function",
    "requestsAllowed": 50,
    "timeWindowMinutes": 1,
    "identifier": {
      "export": "rateLimit",
      "module": "$import(./modules/rate-limit)"
    }
  }
}

The requestsAllowed and timeWindowMinutes values in the policy configuration serve as defaults. The custom function can override them per request.

Combining rate limiting with authentication

Rate limiting works best when combined with authentication so that limits apply per consumer rather than per IP. A typical policy pipeline is:

Authentication (e.g., API Key Authentication) -- validates credentials and populates request.user
Rate Limiting with rateLimitBy: "user" -- enforces per-consumer limits using request.user.sub

With API key authentication, the consumer's metadata (stored when creating the key) is available at request.user.data. A custom rate limit function can read fields like customerType or plan from the metadata to apply tiered limits.

Rate limiting and monetization

If you use Zuplo's Monetization feature, the monetization policy handles quota enforcement based on subscription plans. You can still add a rate limiting policy after the monetization policy to provide per-second or per-minute spike protection on top of monthly billing quotas. These serve different purposes:

Monetization quotas enforce monthly or billing-period usage limits tied to a subscription plan
Rate limiting protects against short-duration traffic spikes that could overwhelm your backend

Combining multiple rate limit policies

You can apply multiple rate limiting policies to the same route. For example, you might enforce both a per-minute and a per-hour limit. When using multiple policies, apply the longest time window first, followed by shorter durations.
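The ordering rule can be expressed as a small helper that sorts policies by window length. The policy names and the `WindowLimit` type below are hypothetical. The sketch only shows the ordering and does not configure anything.

```typescript
// Minimal description of a rate limiting policy for ordering purposes.
interface WindowLimit {
  name: string;
  timeWindowMinutes: number;
  requestsAllowed: number;
}

// Returns policy names ordered with the longest time window first.
function inboundOrder(policies: WindowLimit[]): string[] {
  return [...policies] // copy so the caller's array is not mutated
    .sort((a, b) => b.timeWindowMinutes - a.timeWindowMinutes)
    .map((p) => p.name);
}

const ordered = inboundOrder([
  { name: "per-minute-limit", timeWindowMinutes: 1, requestsAllowed: 100 },
  { name: "per-hour-limit", timeWindowMinutes: 60, requestsAllowed: 2000 },
]);
// ordered is ["per-hour-limit", "per-minute-limit"]
```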

Additional options

Both rate limiting policies support the following additional options:

headerMode - Set to "retry-after" (default) to include the retry-after header in 429 responses, or "none" to omit it
mode - Set to "strict" (default) for synchronous enforcement, or "async" for non-blocking checks that may allow some requests over the limit
throwOnFailure - Set to true to return an error if the rate limit service is unreachable, or false (default) to allow the request through
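On the client side, the retry-after header can drive the wait before the next attempt. The helper below is a sketch with a hypothetical name, `retryDelayMs`. It accepts both forms that HTTP permits for this header (a number of seconds or an HTTP date), because which form a given response uses is not stated here.

```typescript
// Converts a retry-after header value into a delay in milliseconds.
// Returns undefined when the header is missing (for example with
// headerMode "none") or cannot be parsed.
function retryDelayMs(
  headerValue: string | null,
  nowMs: number,
): number | undefined {
  if (headerValue === null) return undefined;

  const trimmed = headerValue.trim();

  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: an HTTP date; wait until that moment, never a negative time.
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

const fromSeconds = retryDelayMs("30", 0);
const fromMissing = retryDelayMs(null, 0);
```

A caller would typically read the value with response.headers.get("retry-after") after a 429 response and fall back to its own backoff when the result is undefined.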

Last modified on March 27, 2026

