Understand how Vidbyte protects the API with request and token windows
The backend currently uses a dual-layer rate-limiting model. One limiter controls request bursts over a short window. The other controls total token consumption over a rolling window. Together they protect cost-intensive generation routes without forcing every route family to reinvent its own traffic rules.
Vidbyte applies both a short request window and a rolling token budget
The request limiter protects against bursts by tracking how many calls arrive within a one-minute window. The token limiter protects against cost blowouts by tracking how many tokens the authenticated identity consumes over a rolling 30-minute window.
These checks happen after identity is resolved, which means the system can apply limits in the context of the actual authenticated account instead of trying to guess from anonymous traffic alone.
The repo currently defines these tier-level limits
| Tier | Requests / minute | Tokens / 30 minutes |
|---|---|---|
| Free | 5 | 100,000 |
| Explorer | 10 | 500,000 |
| Pioneer | 50 | 5,000,000 |
| Enterprise | 200 | 50,000,000 |
These values are sourced from the current backend rate-limiting configuration in `backend/RATE_LIMITING.md`.
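To make the interaction between the two limiters concrete, here is a minimal sketch that pairs a request window with a token window per account, using the tier values from the table. It is not the backend's implementation: the names `TIER_LIMITS`, `RollingWindow`, and `AccountLimiter` are hypothetical, the request window is modelled as a sliding window (the backend may use a fixed one), and recording token usage after generation is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field

# Tier limits copied from the table: (requests per minute, tokens per 30 minutes).
TIER_LIMITS = {
    "free": (5, 100_000),
    "explorer": (10, 500_000),
    "pioneer": (50, 5_000_000),
    "enterprise": (200, 50_000_000),
}


@dataclass
class RollingWindow:
    """Totals the amounts recorded during the last `window_seconds` seconds."""

    limit: int
    window_seconds: float
    events: deque = field(default_factory=deque)  # (timestamp, amount) pairs

    def used(self, now: float) -> int:
        # Drop entries that have aged out of the window, then total the rest.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()
        return sum(amount for _, amount in self.events)

    def record(self, amount: int, now: float) -> None:
        self.events.append((now, amount))


class AccountLimiter:
    """Hypothetical per-account pairing of a request window and a token window."""

    def __init__(self, tier: str):
        requests_per_minute, tokens_per_30_min = TIER_LIMITS[tier]
        self.requests = RollingWindow(requests_per_minute, 60)
        self.tokens = RollingWindow(tokens_per_30_min, 30 * 60)

    def admit(self, now: float) -> int:
        """Return the HTTP status this sketch would answer with."""
        if self.requests.used(now) >= self.requests.limit:
            return 429  # request window exhausted
        if self.tokens.used(now) >= self.tokens.limit:
            return 402  # token budget exhausted
        self.requests.record(1, now)
        return 200

    def record_tokens(self, amount: int, now: float) -> None:
        # Assumption: token usage is only known once generation has finished.
        self.tokens.record(amount, now)
```

In this sketch a Free account that sends five requests within a minute gets a 429 on the sixth, and an account that has already consumed its full token budget gets a 402 until older usage ages out of the 30-minute window.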
Some endpoints count as more than one request because they are heavier
The current backend configuration assigns heavier request weights to certain expensive endpoints. One call to such an endpoint can consume more than one unit of request budget, even though the client sends only a single HTTP request.
As a rule of thumb, generation-heavy routes should be treated as more operationally expensive than lightweight retrieval routes.
| Weight | Current routes |
|---|---|
| 3 | /api/quickhits/create, /api/roadmap/create, /api/socratic/start, /api/socratic/analyze |
| 2 | /api/quickhits/expand, /api/socratic/answer |
| 1 | /api/socratic/hint, /api/socratic/reveal |
The backend weighting table currently includes internal or adjacent routes alongside the public Vidbyte v1 endpoints.
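The effect of weights on a per-minute budget is simple arithmetic, sketched below. The route weights are copied from the table; the helper names `request_cost` and `max_calls_per_minute` are hypothetical, and treating unlisted routes as weight 1 is an assumption rather than documented behavior.

```python
# Request weights copied from the weighting table.
ROUTE_WEIGHTS = {
    "/api/quickhits/create": 3,
    "/api/roadmap/create": 3,
    "/api/socratic/start": 3,
    "/api/socratic/analyze": 3,
    "/api/quickhits/expand": 2,
    "/api/socratic/answer": 2,
    "/api/socratic/hint": 1,
    "/api/socratic/reveal": 1,
}


def request_cost(path: str) -> int:
    """Units of request budget one call consumes (assumed 1 for unlisted routes)."""
    return ROUTE_WEIGHTS.get(path, 1)


def max_calls_per_minute(requests_per_minute: int, path: str) -> int:
    """How many calls to `path` fit in one minute if no other route is called."""
    return requests_per_minute // request_cost(path)
```

For example, with the Free tier's 5 requests per minute, `max_calls_per_minute(5, "/api/quickhits/create")` is 1: a single weight-3 call leaves only 2 units, which is not enough for a second one.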
Watch for the status code and headers rather than retrying blindly
A `429` response indicates the request-rate limiter has been exceeded.
A `402` response indicates the token-window budget has been exceeded.
Successful responses can include rate-limit headers that tell you how much budget remains.
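Because the two status codes point at different windows, a client can reasonably treat them differently. The sketch below is one possible client-side policy, not something Vidbyte prescribes: it backs off with jitter after a 429 and declines to schedule a short retry after a 402. The function name `delay_after_limit_response` and its default delays are hypothetical.

```python
import random
from typing import Optional


def delay_after_limit_response(
    status: int,
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if a short retry will not help.

    `attempt` is the number of retries already made for this request.
    """
    if status == 429:
        # The request window is short, so exponential backoff with full
        # jitter spreads retries out without waiting longer than max_delay.
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        return random.uniform(0, ceiling)
    if status == 402:
        # The token window is much longer; retrying within seconds would
        # most likely hit the same exhausted budget.
        return None
    raise ValueError(f"not a rate-limit status: {status}")
```

A caller would sleep for the returned delay before retrying a 429, and on `None` surface the budget exhaustion to the user or defer the work.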
Rate-limit headers
Common response headers
X-RateLimit-Remaining-Requests
X-RateLimit-Reset-Requests
X-RateLimit-Remaining-Tokens
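A client can read these headers after each response to see how much budget is left. The sketch below assumes the header values are plain integers, which this page does not confirm, and deliberately does not interpret the unit of `X-RateLimit-Reset-Requests`. The helper name `read_budget` is hypothetical.

```python
from typing import Dict, Mapping, Optional

BUDGET_HEADERS = (
    "X-RateLimit-Remaining-Requests",
    "X-RateLimit-Reset-Requests",
    "X-RateLimit-Remaining-Tokens",
)


def read_budget(headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Extract the rate-limit headers as integers; None if missing or not an integer."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    budget: Dict[str, Optional[int]] = {}
    for name in BUDGET_HEADERS:
        raw = lowered.get(name.lower())
        try:
            budget[name] = int(raw) if raw is not None else None
        except ValueError:
            budget[name] = None  # value present but not an integer (assumption violated)
    return budget
```

Passing the header mapping from any HTTP client's response object yields a dictionary keyed by the canonical header names, with `None` for anything the server did not send.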