Two layers of rate limiting protect expensive generation flows before costs drift
Understand how Vidbyte protects the API with request and token windows
The backend currently uses a dual-layer rate-limiting model. One limiter controls request bursts over a short window. The other controls total token consumption over a rolling window. Together they protect cost-intensive generation routes without forcing every route family to reinvent its own traffic rules.
Limiter 1: requests per minute. Redis-backed request throttling protects the API from burst traffic.
Limiter 2: tokens per 30 minutes. Mongo-backed token windows protect total spend across rolling usage.
Weighted routes: yes. More expensive endpoints can count for more than one request unit.
Mechanics
Vidbyte applies both a short request window and a rolling token budget
The request limiter protects against bursts by tracking how many calls arrive in a short, minute-level window. The token limiter protects against cost blowouts by tracking how many tokens the authenticated identity consumes over a rolling 30-minute window.
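A minimal in-memory sketch of how the two checks could combine for one identity. It assumes a sliding log of timestamped events for both windows and uses the Free-tier values (5 requests per minute, 100,000 tokens per 30 minutes). `RollingWindow`, `admit`, and `record_usage` are hypothetical names, and the deques stand in for the Redis and Mongo stores the backend actually uses. Checking the request window before the token budget, and charging tokens after a call completes, are also assumptions rather than documented behavior.

```python
from collections import defaultdict, deque


class RollingWindow:
    """Sum of (timestamp, amount) events per identity over a rolling window.

    Illustrative in-memory stand-in; the backend keeps request counts in
    Redis and token usage in Mongo.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # identity -> deque of (timestamp, amount), oldest first
        self.events = defaultdict(deque)

    def used(self, identity, now):
        q = self.events[identity]
        # Drop events that have aged out of the window.
        while q and q[0][0] <= now - self.window:
            q.popleft()
        return sum(amount for _, amount in q)

    def record(self, identity, amount, now):
        self.events[identity].append((now, amount))


# Free-tier limits: 5 requests per minute, 100,000 tokens per 30 minutes.
request_window = RollingWindow(limit=5, window_seconds=60)
token_window = RollingWindow(limit=100_000, window_seconds=30 * 60)


def admit(identity, now):
    """Return 200 if admitted, 429 for the request limit, 402 for the token budget."""
    if request_window.used(identity, now) + 1 > request_window.limit:
        return 429
    if token_window.used(identity, now) >= token_window.limit:
        return 402
    # Assumption: only admitted calls count against the request window.
    request_window.record(identity, 1, now)
    return 200


def record_usage(identity, tokens, now):
    """Charge the tokens a completed generation actually consumed."""
    token_window.record(identity, tokens, now)
```

Charging tokens after the call reflects that token usage is typically known only once generation finishes, which is why the token check looks at past consumption rather than reserving budget up front.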
These checks happen after identity is resolved, which means the system can apply limits in the context of the actual authenticated account instead of trying to guess from anonymous traffic alone.
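To illustrate why resolving identity first matters, here is a sketch in which two API keys owned by the same account draw from one shared request budget, and an unrecognized key is rejected before any limiter runs. The key store, the 401 response for unknown credentials, and the fixed per-minute bucket (used here for brevity instead of a rolling window) are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical key store: key-a and key-b belong to the same account.
API_KEYS = {"key-a": "acct-42", "key-b": "acct-42", "key-c": "acct-7"}
REQUESTS_PER_MINUTE = 5  # Free-tier request limit

# (account_id, minute_bucket) -> requests admitted; fixed window for brevity.
counts: Counter = Counter()


def resolve_identity(api_key: str) -> Optional[str]:
    """Map a credential to the account that owns it (hypothetical lookup)."""
    return API_KEYS.get(api_key)


def handle(api_key: str, now: float) -> int:
    account = resolve_identity(api_key)
    if account is None:
        return 401  # rejected before any limiter is consulted
    # Limits are keyed by account, not by key or client address.
    bucket = (account, int(now // 60))
    if counts[bucket] >= REQUESTS_PER_MINUTE:
        return 429
    counts[bucket] += 1
    return 200
```

Because the bucket key is the account ID, rotating between keys of the same account does not multiply the budget, while a different account keeps its own allowance.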
Current backend configuration
The repo currently defines these tier-level limits
Free: 5 requests per minute and 100,000 tokens per 30 minutes.
Explorer: 10 requests per minute and 500,000 tokens per 30 minutes.
Pioneer: 50 requests per minute and 5,000,000 tokens per 30 minutes.
Enterprise: 200 requests per minute and 50,000,000 tokens per 30 minutes.
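One way to hold these tiers as data, with a small helper that works out how the two limits interact. The numbers are copied from the tier list; the `TierLimits` class, `limits_for`, and `break_even_tokens` are hypothetical names. The break-even figure assumes every call costs one request unit and the client sends at the request cap for the full 30 minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    requests_per_minute: int
    tokens_per_30_min: int


# Values from the tier list; the layout itself is illustrative.
TIER_LIMITS = {
    "free": TierLimits(5, 100_000),
    "explorer": TierLimits(10, 500_000),
    "pioneer": TierLimits(50, 5_000_000),
    "enterprise": TierLimits(200, 50_000_000),
}


def limits_for(tier: str) -> TierLimits:
    try:
        return TIER_LIMITS[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def break_even_tokens(tier: str) -> float:
    """Average tokens per call at which both limits run out together over 30 minutes."""
    limits = limits_for(tier)
    max_requests = limits.requests_per_minute * 30  # requests possible in 30 minutes
    return limits.tokens_per_30_min / max_requests
```

For the Free tier this gives 100,000 / (5 × 30) ≈ 667 tokens per call: if calls average more than that, the token budget is exhausted before 30 minutes of maximum-rate requests are used up. Weighted routes lower the number of possible calls and push this threshold higher.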
Weighted costs
Some endpoints count as more than one request because they are heavier
The current backend configuration already models certain expensive endpoints with heavier request weights. That means one call can consume more than one unit of request budget even when it still looks like a single HTTP request from the client side.
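A sketch of weighted accounting within a single one-minute window, assuming a Free-tier budget of 5 request units. The route paths and weights in `ROUTE_WEIGHTS` are hypothetical; the configuration described here does not say which endpoints are weighted or by how much.

```python
from typing import List

# Hypothetical weights: generation counts as 3 units, retrieval as 1.
ROUTE_WEIGHTS = {"/v1/search": 1, "/v1/generate": 3}
DEFAULT_WEIGHT = 1


def request_units(route: str) -> int:
    return ROUTE_WEIGHTS.get(route, DEFAULT_WEIGHT)


def simulate(routes: List[str], limit: int) -> List[int]:
    """Replay calls inside one window; return 200 or 429 for each call."""
    used = 0
    statuses = []
    for route in routes:
        cost = request_units(route)
        if used + cost > limit:
            statuses.append(429)  # would exceed the unit budget
        else:
            used += cost
            statuses.append(200)
    return statuses


calls = ["/v1/generate", "/v1/search", "/v1/generate", "/v1/search"]
```

With these assumed weights, `simulate(calls, 5)` rejects the third call after only two HTTP requests have been admitted, yet the cheaper fourth call still fits in the remaining unit.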
As a rule of thumb, generation-heavy routes should be treated as more operationally expensive than lightweight retrieval routes.
Failure behavior
Watch for the status code and headers rather than retrying blindly
429 (Too Many Requests) indicates the request-rate limiter has been exceeded.
402 (Payment Required) indicates the token-window budget has been exceeded.
Successful responses can include rate-limit headers that tell you how much budget remains.
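A client-side sketch of that advice: treat 429 as a short-lived condition worth retrying after a delay, and treat 402 as a signal to stop until the token window frees capacity. The `X-RateLimit-Remaining` header name and the 60-second default backoff are assumptions, and honoring `Retry-After` (a standard HTTP header) assumes the API sends it; confirm the actual header names against real responses.

```python
from typing import Mapping, Optional, Tuple

# Assumed header name for illustration only.
REMAINING_HEADER = "X-RateLimit-Remaining"
DEFAULT_BACKOFF_SECONDS = 60.0  # one request window; an assumption

# Note: plain dict lookups are case-sensitive; most HTTP clients expose
# case-insensitive header mappings.


def remaining_budget(headers: Mapping[str, str]) -> Optional[int]:
    """Read the remaining budget from a successful response, if present."""
    value = headers.get(REMAINING_HEADER)
    return int(value) if value is not None and value.isdigit() else None


def next_action(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[float]]:
    """Map a response to (action, delay_seconds) instead of retrying blindly."""
    if status == 429:
        # Request-rate limit: recovers within the short window, so wait and retry.
        retry_after = headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else DEFAULT_BACKOFF_SECONDS
        return ("retry", delay)
    if status == 402:
        # Token budget: quick retries keep failing until the 30-minute window rolls forward.
        return ("stop", None)
    if 200 <= status < 300:
        return ("proceed", None)
    return ("fail", None)
```

Only the seconds form of `Retry-After` is handled here; the HTTP-date form falls back to the default backoff.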