# Rate Limiting
cmfy.cloud uses rate limiting to ensure fair resource distribution and protect the platform from abuse. This page explains how limits work and how to handle them.
## Types of Limits
Three types of limits protect the platform:
| Limit Type | What It Measures | Why It Exists |
|---|---|---|
| Requests Per Minute (RPM) | API calls per minute | Prevents API abuse |
| Concurrent Jobs | Jobs currently running | Prevents GPU monopolization |
| Queue Depth | Jobs waiting to start | Prevents queue flooding |
## Tier Defaults
Each plan has default limits:
| Tier | RPM | Concurrent Jobs | Queue Depth |
|---|---|---|---|
| Free | 60 | 2 | 100 |
| Pro | 300 | 10 | 100 |
| Enterprise | 1,000 | 50 | 100 |
Enterprise customers can request custom limits. Contact support to discuss your needs.
## Rate Limit Headers
Every API response includes rate limit information:
```http
HTTP/1.1 202 Accepted
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1699574400
X-Concurrent-Limit: 2
X-Concurrent-Current: 1
X-Queue-Limit: 100
X-Queue-Current: 3
```
| Header | Description |
|---|---|
| X-RateLimit-Limit | Your RPM limit |
| X-RateLimit-Remaining | Remaining requests in the current minute |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| X-Concurrent-Limit | Maximum concurrent jobs |
| X-Concurrent-Current | Currently running jobs |
| X-Queue-Limit | Maximum queued jobs |
| X-Queue-Current | Currently queued jobs |
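These headers can be folded into a single object for logging or throttling decisions. A minimal sketch (the helper name is ours, not part of the API); missing headers come back as null:

```javascript
// Collect the rate limit headers from a fetch() Response into one object.
// Header values arrive as strings, so parse them to numbers.
function parseRateLimitHeaders(headers) {
  const num = (name) => {
    const value = headers.get(name);
    return value == null ? null : parseInt(value, 10);
  };
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    reset: num('X-RateLimit-Reset'),
    concurrentLimit: num('X-Concurrent-Limit'),
    concurrentCurrent: num('X-Concurrent-Current'),
    queueLimit: num('X-Queue-Limit'),
    queueCurrent: num('X-Queue-Current'),
  };
}
```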
## Handling 429 Responses
When you exceed a limit, you receive a 429 Too Many Requests response:
### RPM Exceeded

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Try again in 45 seconds.",
    "retry_after": 45
  }
}
```
The Retry-After header tells you when to retry:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 45
```
### Concurrent Limit Reached

```json
{
  "error": {
    "code": "concurrent_limit_reached",
    "message": "Maximum concurrent jobs reached (2). Wait for a job to complete."
  }
}
```
This happens when you have the maximum number of jobs currently running. Wait for one to complete before submitting more.
### Queue Full

```json
{
  "error": {
    "code": "queue_full",
    "message": "Queue limit reached (100 jobs). Wait for existing jobs to complete.",
    "queued_jobs": 100
  }
}
```
Your personal queue is full. Wait for jobs to start processing before adding more.
## Best Practices
### 1. Implement Exponential Backoff
When you hit a rate limit, don't retry immediately. Use exponential backoff:
```javascript
async function submitWithBackoff(workflow, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch('/v1/jobs', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ prompt: workflow })
    });
    if (response.status === 429) {
      // Honor Retry-After when the server sends it; otherwise fall back
      // to exponential backoff. Never retry sooner than the server asks.
      const retryAfter = parseInt(response.headers.get('Retry-After') ?? '0', 10);
      const delay = Math.max(retryAfter * 1000, Math.pow(2, attempt) * 1000);
      await new Promise(r => setTimeout(r, delay));
      continue;
    }
    return response.json();
  }
  throw new Error('Max retries exceeded');
}
```
### 2. Monitor Your Usage
Check rate limit headers proactively:
```javascript
function checkLimits(response) {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
  const queueCurrent = parseInt(response.headers.get('X-Queue-Current'), 10);
  const queueLimit = parseInt(response.headers.get('X-Queue-Limit'), 10);
  if (remaining < 10) {
    console.warn('Approaching RPM limit');
  }
  if (queueLimit - queueCurrent < 2) {
    console.warn('Queue almost full');
  }
}
```
### 3. Use Webhooks Instead of Polling
Polling for job status consumes RPM:
❌ Bad: Poll every second (60+ requests/minute)

```
GET /v1/jobs/{id}
GET /v1/jobs/{id}
GET /v1/jobs/{id}
...
```

✓ Good: Use webhooks (1 request + 1 webhook)

```
POST /v1/jobs → webhook notification
```
Webhooks don't count against your rate limit.
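The webhook pattern can be as small as a map of pending jobs that each delivery settles. A minimal sketch; the payload shape (`job_id`, `status`, `output`) and the `completed` status value are assumptions for illustration, so check the webhook documentation for the actual schema:

```javascript
// Pending jobs awaiting a webhook delivery: job_id -> promise callbacks.
const jobs = new Map();

// Submit side: returns a promise that settles when the webhook arrives.
function waitForJob(jobId) {
  return new Promise((resolve, reject) => {
    jobs.set(jobId, { resolve, reject });
  });
}

// Webhook side: call this from your HTTP endpoint with the parsed JSON body.
// Returns false for unknown (or already settled) job ids.
function handleWebhook(payload) {
  const pending = jobs.get(payload.job_id);
  if (!pending) return false;
  jobs.delete(payload.job_id);
  if (payload.status === 'completed') {
    pending.resolve(payload.output);
  } else {
    pending.reject(new Error(`Job ${payload.job_id}: ${payload.status}`));
  }
  return true;
}
```

With this in place you submit once and await the promise instead of polling GET /v1/jobs/{id}.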
### 4. Batch When Possible
Instead of making many small requests:
❌ 100 separate API calls
Consider:
✓ Submit workflow that generates multiple images in one job
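For image generation, batching usually means one workflow that produces several outputs in a single job. A hedged sketch of a ComfyUI-style graph fragment — the node id and input names are illustrative, and whether batching applies depends on the nodes in your workflow:

```javascript
// One job that renders 4 images instead of 4 separate submissions.
// Node "5" here stands in for the latent-image node of your workflow;
// other nodes (prompt, sampler, decode) are omitted for brevity.
const workflow = {
  "5": {
    class_type: "EmptyLatentImage",
    inputs: { width: 1024, height: 1024, batch_size: 4 },
  },
  // ...remaining nodes of your graph
};
```

Each batched job still counts as one concurrent job and one queue slot.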
### 5. Respect Concurrent Limits
Don't submit more jobs than your concurrent limit allows:
```javascript
const MAX_CONCURRENT = 2; // Your tier's limit
const pendingJobs = new Set();

async function submitJob(workflow) {
  // Wait if at capacity
  while (pendingJobs.size >= MAX_CONCURRENT) {
    await new Promise(r => setTimeout(r, 1000));
  }
  const { job_id } = await submitWithBackoff(workflow);
  pendingJobs.add(job_id);
  // Remove job_id from pendingJobs in your webhook handler on completion
  return job_id;
}
```
## Understanding the Sliding Window
RPM limits use a sliding window algorithm:
At any moment, the system counts requests in the past 60 seconds. This is smoother than fixed windows (which would reset every minute).
Example with 60 RPM limit:
- T=0: 30 requests → 30 remaining
- T=30: 20 more requests → 10 remaining
- T=60: First 30 requests "expire" → 40 remaining
- T=90: Next 20 expire → 60 remaining
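The timeline above can be sketched as a pruned list of timestamps. This is an illustration of the algorithm, not cmfy.cloud's actual implementation:

```javascript
// Sliding-window counter: keep the timestamps of accepted requests from
// the past windowMs and prune them on every check.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns true if a request at time `now` (ms) is allowed, and records it.
  allow(now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop requests that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }

  // How many requests are still available in the current window.
  remaining(now = Date.now()) {
    const cutoff = now - this.windowMs;
    return this.limit - this.timestamps.filter((t) => t > cutoff).length;
  }
}
```

Because old requests expire one by one rather than all at once, capacity comes back gradually instead of in a burst at the top of each minute.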
## Error Reference
| HTTP Status | Error Code | Meaning |
|---|---|---|
| 429 | rate_limit_exceeded | RPM limit exceeded |
| 429 | concurrent_limit_reached | Max running jobs |
| 429 | queue_full | Personal queue at capacity |
## What's Next?
- Fair Queuing - How jobs are scheduled
- Architecture Overview - How the system works