Vidbyte

API documentation


Two layers of rate limiting protect expensive generation flows before costs drift

Understand how Vidbyte protects the API with request and token windows

The backend currently uses a dual-layer rate-limiting model. One limiter controls request bursts over a short window. The other controls total token consumption over a rolling window. Together they protect cost-intensive generation routes without forcing every route family to reinvent its own traffic rules.

Stable public contract

Limiter 1

Requests per minute

Redis-backed request throttling protects the API from burst traffic.

Limiter 2

Tokens per 30 minutes

Mongo-backed token windows protect total spend across rolling usage.

Weighted routes

Yes

More expensive endpoints can count for more than one request unit.

Mechanics

Vidbyte applies both a short request window and a rolling token budget

The request limiter protects against bursts by tracking how many calls arrive within a one-minute window. The token limiter protects against cost blowouts by tracking how many tokens the authenticated identity consumes over a longer, rolling 30-minute window.

These checks happen after identity is resolved, which means the system can apply limits in the context of the actual authenticated account instead of trying to guess from anonymous traffic alone.
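To make the two layers concrete, here is a minimal in-memory sketch of a per-identity request window plus a rolling token budget. It is an illustration only: the class and method names are hypothetical, plain Python dictionaries stand in for the Redis and Mongo stores, and the request window is modeled as a sliding 60-second window, which is an assumption because this page does not say whether the real window is fixed or sliding.

```python
import time
from collections import defaultdict, deque


class DualLimiter:
    """Illustrative in-memory stand-in for the request and token limiters."""

    def __init__(self, requests_per_minute, tokens_per_window, token_window_s=1800):
        self.requests_per_minute = requests_per_minute
        self.tokens_per_window = tokens_per_window
        self.token_window_s = token_window_s  # 30 minutes by default
        self._request_times = defaultdict(deque)  # identity -> call timestamps
        self._token_events = defaultdict(deque)   # identity -> (timestamp, tokens)

    def check_request(self, identity, now=None):
        """Record the call and return True if the identity is under its per-minute limit."""
        now = time.time() if now is None else now
        times = self._request_times[identity]
        # Drop calls that are older than the 60-second window.
        while times and now - times[0] >= 60:
            times.popleft()
        if len(times) >= self.requests_per_minute:
            return False
        times.append(now)
        return True

    def tokens_used(self, identity, now=None):
        """Sum the tokens the identity consumed inside the rolling window."""
        now = time.time() if now is None else now
        events = self._token_events[identity]
        # Drop usage that has aged out of the rolling window.
        while events and now - events[0][0] >= self.token_window_s:
            events.popleft()
        return sum(tokens for _, tokens in events)

    def check_tokens(self, identity, now=None):
        """Return True while the identity still has token budget left."""
        return self.tokens_used(identity, now) < self.tokens_per_window

    def record_tokens(self, identity, tokens, now=None):
        """Record token consumption after a generation call completes."""
        now = time.time() if now is None else now
        self._token_events[identity].append((now, tokens))
```

Both checks are keyed by `identity`, mirroring the point that limits are applied after the authenticated account is resolved.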

Current backend configuration

The repo currently defines these tier-level limits

Free: 5 requests per minute and 100,000 tokens per 30 minutes.

Explorer: 10 requests per minute and 500,000 tokens per 30 minutes.

Pioneer: 50 requests per minute and 5,000,000 tokens per 30 minutes.

Enterprise: 200 requests per minute and 50,000,000 tokens per 30 minutes.
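The tier limits above can be compared with a little arithmetic. The sketch below copies the four tiers into a dictionary (the dictionary and function names are hypothetical, not taken from the repo) and computes two derived figures: how many unit-weight requests fit into one 30-minute token window at the full request rate, and how many tokens each of those requests could average before the token budget runs out first. It ignores weighted routes.

```python
# Tier limits as listed on this page; the structure and key names are illustrative.
TIER_LIMITS = {
    "free": {"requests_per_minute": 5, "tokens_per_30_min": 100_000},
    "explorer": {"requests_per_minute": 10, "tokens_per_30_min": 500_000},
    "pioneer": {"requests_per_minute": 50, "tokens_per_30_min": 5_000_000},
    "enterprise": {"requests_per_minute": 200, "tokens_per_30_min": 50_000_000},
}


def max_requests_per_token_window(tier):
    """Unit-weight requests possible in 30 minutes at the full per-minute rate."""
    return TIER_LIMITS[tier]["requests_per_minute"] * 30


def avg_tokens_per_request(tier):
    """Average tokens per request at which both limits are exhausted together."""
    limits = TIER_LIMITS[tier]
    return limits["tokens_per_30_min"] / max_requests_per_token_window(tier)


for name in TIER_LIMITS:
    print(name, max_requests_per_token_window(name), round(avg_tokens_per_request(name)))
```

For example, a Free account sending 5 requests every minute for 30 minutes makes 150 requests, so it stays inside the token budget only if those requests average roughly 667 tokens or fewer.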

Weighted costs

Some endpoints count as more than one request because they are heavier

The current backend configuration already models certain expensive endpoints with heavier request weights. That means one call can consume more than one unit of request budget even when it still looks like a single HTTP request from the client side.

As a rule of thumb, generation-heavy routes should be treated as more operationally expensive than lightweight retrieval routes.
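A weighted route can be pictured as a lookup from route to request units. The route paths and weights below are invented for illustration, since this page does not list the real values; the sketch only shows how a weight reduces the number of calls that fit into a per-minute budget.

```python
# Hypothetical routes and weights; the real backend values are not listed here.
ROUTE_WEIGHTS = {
    "/generate": 5,  # assumed generation-heavy route
    "/lookup": 1,    # assumed lightweight retrieval route
}


def request_units(route):
    """Units of request budget one call consumes (defaults to 1)."""
    return ROUTE_WEIGHTS.get(route, 1)


def calls_per_minute(route, requests_per_minute):
    """How many calls to a route fit into the per-minute request budget."""
    return requests_per_minute // request_units(route)
```

Under these assumed weights, an Explorer account with 10 requests per minute could make only 2 calls per minute to `/generate`, even though each call is a single HTTP request.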

Failure behavior

Watch for the status code and headers rather than retrying blindly

429 indicates the request-rate limiter has been exceeded.

402 indicates the token-window budget has been exceeded.

Successful responses can include rate-limit headers that tell you how much budget remains.
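A client can branch on the two status codes rather than retrying every failure the same way. The sketch below is one possible policy, not a prescribed one: the function name and returned action labels are hypothetical, and it reads the standard HTTP `Retry-After` header only as an assumption, because this page does not name the headers the API actually sends.

```python
DEFAULT_RETRY_DELAY_S = 1.0  # assumed fallback when no usable header is present


def next_action(status_code, headers):
    """Map a response to an (action, delay_seconds) pair for the caller."""
    # Header names are case-insensitive in HTTP, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}

    if status_code == 429:
        # Request-rate limit exceeded: wait briefly, then retry.
        try:
            delay = float(normalized.get("retry-after"))
        except (TypeError, ValueError):
            # Header missing or not a plain number of seconds.
            delay = DEFAULT_RETRY_DELAY_S
        return ("retry", delay)

    if status_code == 402:
        # Token-window budget exceeded: an immediate retry will not help,
        # because usage must age out of the rolling 30-minute window.
        return ("pause", None)

    return ("ok", None)
```

Treating 402 differently from 429 matters because the token budget recovers over a much longer window than the request limiter does.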