Vidbyte

API documentation

Understand how Vidbyte protects the API with request and token windows

The backend currently uses a dual-layer rate-limiting model. One limiter controls request bursts over a short window. The other controls total token consumption over a rolling window. Together they protect cost-intensive generation routes without forcing every route family to reinvent its own traffic rules.

Vidbyte applies both a short request window and a rolling token budget

The request limiter protects against bursts by counting how many calls arrive within a one-minute window. The token limiter protects against cost blowouts by tracking how many tokens the authenticated identity consumes over a rolling 30-minute window.

Both checks run after identity is resolved, so limits are applied per authenticated account rather than inferred from anonymous traffic.
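The two layers described above can be sketched as a small in-memory limiter. This is an illustration only, not the backend's implementation: the class name `DualLimiter`, its methods, the sliding-window bookkeeping, and the order of the checks (requests first, then tokens) are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DualLimiter:
    """Hypothetical dual-layer limiter for one authenticated identity."""

    max_requests: int                    # request units allowed per request window
    max_tokens: int                      # tokens allowed per token window
    request_window_s: float = 60.0       # short window: one minute
    token_window_s: float = 30 * 60.0    # rolling window: thirty minutes
    _requests: deque = field(default_factory=deque)  # (timestamp, weight)
    _tokens: deque = field(default_factory=deque)    # (timestamp, tokens)

    @staticmethod
    def _used(events: deque, now: float, window_s: float) -> int:
        # Drop events that have aged out of the window, then sum the rest.
        while events and events[0][0] <= now - window_s:
            events.popleft()
        return sum(amount for _, amount in events)

    def check(self, now: float, weight: int = 1) -> int:
        """Return 200 if admitted, 429 if the request window is full,
        or 402 if the token budget is spent (check order is an assumption)."""
        if self._used(self._requests, now, self.request_window_s) + weight > self.max_requests:
            return 429
        if self._used(self._tokens, now, self.token_window_s) >= self.max_tokens:
            return 402
        self._requests.append((now, weight))
        return 200

    def record_tokens(self, now: float, tokens: int) -> None:
        # Called once the token cost of a completed request is known.
        self._tokens.append((now, tokens))
```

Passing `now` explicitly keeps the sketch deterministic; a real service would more likely read a monotonic clock and keep the counters in shared storage.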

The repo currently defines these tier-level limits

| Tier | Requests / minute | Tokens / 30 minutes |
| --- | --- | --- |
| Free | 5 | 100,000 |
| Explorer | 10 | 500,000 |
| Pioneer | 50 | 5,000,000 |
| Enterprise | 200 | 50,000,000 |

These values are sourced from the current backend rate-limiting configuration in `backend/RATE_LIMITING.md`.
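For client-side planning, the tier limits can be mirrored in a small lookup. The sketch below copies the numbers listed above; the names `TierLimits`, `TIER_LIMITS`, and `limits_for` are hypothetical, and the values may drift from the backend configuration over time.

```python
from typing import NamedTuple


class TierLimits(NamedTuple):
    requests_per_minute: int
    tokens_per_30_min: int


# Values copied from the documented tier limits; names are illustrative.
TIER_LIMITS = {
    "free": TierLimits(5, 100_000),
    "explorer": TierLimits(10, 500_000),
    "pioneer": TierLimits(50, 5_000_000),
    "enterprise": TierLimits(200, 50_000_000),
}


def limits_for(tier: str) -> TierLimits:
    """Look up limits by tier name, ignoring case and surrounding spaces."""
    return TIER_LIMITS[tier.strip().lower()]
```

An unknown tier name raises `KeyError`, which is usually preferable to silently falling back to some default budget.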

Some endpoints count as more than one request because they are heavier

The current backend configuration already models certain expensive endpoints with heavier request weights. That means one call can consume more than one unit of request budget even when it still looks like a single HTTP request from the client side.

As a rule of thumb, generation-heavy routes should be treated as more operationally expensive than lightweight retrieval routes.

| Weight | Current routes |
| --- | --- |
| 3 | /api/quickhits/create, /api/roadmap/create, /api/socratic/start, /api/socratic/analyze |
| 2 | /api/quickhits/expand, /api/socratic/answer |
| 1 | /api/socratic/hint, /api/socratic/reveal |

The backend weighting table currently lists internal and adjacent routes alongside the public Vidbyte v1 endpoints, so not every route shown here is necessarily part of the public API.
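To see what the weights mean in practice, the sketch below maps each listed route to its weight and works out how many calls fit into a per-minute request budget. The function names are hypothetical, and treating unlisted routes as weight 1 is an assumption, not something the backend configuration confirms.

```python
# Weights copied from the documented route table.
ROUTE_WEIGHTS = {
    "/api/quickhits/create": 3,
    "/api/roadmap/create": 3,
    "/api/socratic/start": 3,
    "/api/socratic/analyze": 3,
    "/api/quickhits/expand": 2,
    "/api/socratic/answer": 2,
    "/api/socratic/hint": 1,
    "/api/socratic/reveal": 1,
}

DEFAULT_WEIGHT = 1  # assumption: unlisted routes cost one request unit


def request_cost(path: str) -> int:
    """Request units consumed by one call to `path`."""
    return ROUTE_WEIGHTS.get(path, DEFAULT_WEIGHT)


def max_calls_per_minute(path: str, requests_per_minute: int) -> int:
    """How many calls to `path` fit into a per-minute request budget."""
    return requests_per_minute // request_cost(path)
```

With the Free tier's 5 requests per minute, `max_calls_per_minute("/api/quickhits/create", 5)` evaluates to 1: a single weight-3 call leaves only 2 units, which is not enough for a second one.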

Watch for the status code and headers rather than retrying blindly

429 indicates the request-rate limiter has been exceeded.

402 indicates the token-window budget has been exceeded.

Successful responses can include rate-limit headers that tell you how much budget remains.
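A client can branch on these two status codes instead of retrying every failure the same way. The following is a minimal sketch: the `Action` names and the `action_for_status` helper are hypothetical, and how long to wait in each case is left to the caller because the source does not specify it.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    WAIT_FOR_REQUEST_WINDOW = "wait_for_request_window"
    WAIT_FOR_TOKEN_WINDOW = "wait_for_token_window"
    FAIL = "fail"


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to a coarse client-side action."""
    if 200 <= status < 300:
        return Action.PROCEED
    if status == 429:
        # Request-rate limiter exceeded: the short window should clear soon.
        return Action.WAIT_FOR_REQUEST_WINDOW
    if status == 402:
        # Token budget exceeded: quick retries are unlikely to help
        # until older usage rolls out of the token window.
        return Action.WAIT_FOR_TOKEN_WINDOW
    return Action.FAIL
```

Keeping the two limits distinct lets a client back off briefly on 429 while pausing or shrinking generation work on 402.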

Rate-limit headers

Common response headers

X-RateLimit-Remaining-Requests
X-RateLimit-Reset-Requests
X-RateLimit-Remaining-Tokens
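These headers can be read into a small structure on the client. The sketch below assumes the two "remaining" headers carry integer values and keeps the reset value as a raw string, because its format (seconds, timestamp, or something else) is not stated here. The names `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    remaining_requests: Optional[int]
    reset_requests: Optional[str]   # raw value; format is not documented here
    remaining_tokens: Optional[int]


def _to_int(value: Optional[str]) -> Optional[int]:
    # Missing or non-numeric header values become None instead of raising.
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract rate-limit headers; header names are matched case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        remaining_requests=_to_int(lowered.get("x-ratelimit-remaining-requests")),
        reset_requests=lowered.get("x-ratelimit-reset-requests"),
        remaining_tokens=_to_int(lowered.get("x-ratelimit-remaining-tokens")),
    )
```

Because the headers are described as something responses can include, every field is optional and absent headers simply yield `None`.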