
Rate Limiting

cmfy.cloud uses rate limiting to ensure fair resource distribution and protect the platform from abuse. This page explains how limits work and how to handle them.

Types of Limits

Three types of limits protect the platform:

| Limit Type | What It Measures | Why It Exists |
| --- | --- | --- |
| Requests Per Minute (RPM) | API calls per minute | Prevents API abuse |
| Concurrent Jobs | Jobs currently running | Prevents GPU monopolization |
| Queue Depth | Jobs waiting to start | Prevents queue flooding |

Tier Defaults

Each plan has default limits:

| Tier | RPM | Concurrent Jobs | Queue Depth |
| --- | --- | --- | --- |
| Free | 60 | 2 | 100 |
| Pro | 300 | 10 | 100 |
| Enterprise | 1,000 | 50 | 100 |

Custom Limits

Enterprise customers can request custom limits. Contact support to discuss your needs.

Rate Limit Headers

Every API response includes rate limit information:

HTTP/1.1 202 Accepted
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1699574400
X-Concurrent-Limit: 2
X-Concurrent-Current: 1
X-Queue-Limit: 100
X-Queue-Current: 3

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Your RPM limit |
| X-RateLimit-Remaining | Remaining requests this minute |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| X-Concurrent-Limit | Maximum concurrent jobs |
| X-Concurrent-Current | Currently running jobs |
| X-Queue-Limit | Maximum queued jobs |
| X-Queue-Current | Currently queued jobs |
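Because X-RateLimit-Reset is a Unix timestamp in seconds, a client can compute exactly how long to pause once X-RateLimit-Remaining reaches zero. A minimal sketch (the helper names are ours, not part of the API):

```javascript
// Compute the pause (in ms) until the rate-limit window resets.
// resetUnixSeconds comes from the X-RateLimit-Reset header; keeping
// this a pure function makes the arithmetic easy to test.
function msUntilReset(resetUnixSeconds, nowMs = Date.now()) {
  return Math.max(0, resetUnixSeconds * 1000 - nowMs);
}

// Example: pause before the next request when the window is exhausted.
async function pauseIfExhausted(response) {
  const remaining = Number(response.headers.get('X-RateLimit-Remaining'));
  if (remaining > 0) return;
  const resetAt = Number(response.headers.get('X-RateLimit-Reset'));
  await new Promise(r => setTimeout(r, msUntilReset(resetAt)));
}
```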

Handling 429 Responses

When you exceed a limit, you receive a 429 Too Many Requests response:

RPM Exceeded

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Try again in 45 seconds.",
    "retry_after": 45
  }
}

The Retry-After header tells you when to retry:

HTTP/1.1 429 Too Many Requests
Retry-After: 45

Concurrent Limit Reached

{
  "error": {
    "code": "concurrent_limit_reached",
    "message": "Maximum concurrent jobs reached (2). Wait for a job to complete."
  }
}

This happens when you have the maximum number of jobs currently running. Wait for one to complete before submitting more.

Queue Full

{
  "error": {
    "code": "queue_full",
    "message": "Queue limit reached (5 jobs). Wait for existing jobs to complete.",
    "queued_jobs": 5
  }
}

Your personal queue is full. Wait for jobs to start processing before adding more.

Best Practices

1. Implement Exponential Backoff

When you hit a rate limit, don't retry immediately. Use exponential backoff:

async function submitWithBackoff(workflow, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch('/v1/jobs', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`, // apiKey defined elsewhere
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ prompt: workflow })
    });

    if (response.status === 429) {
      // Honor the server's Retry-After hint, but never wait less than
      // the exponential backoff interval (1s, 2s, 4s, ...).
      const retryAfter = Number(response.headers.get('Retry-After')) || 60;
      const delay = Math.max(retryAfter * 1000, Math.pow(2, attempt) * 1000);
      await new Promise(r => setTimeout(r, delay));
      continue;
    }

    return response.json();
  }
  throw new Error('Max retries exceeded');
}

2. Monitor Your Usage

Check rate limit headers proactively:

function checkLimits(response) {
const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'));
const queueCurrent = parseInt(response.headers.get('X-Queue-Current'));
const queueLimit = parseInt(response.headers.get('X-Queue-Limit'));

if (remaining < 10) {
console.warn('Approaching RPM limit');
}
if (queueLimit - queueCurrent < 2) {
console.warn('Queue almost full');
}
}

3. Use Webhooks Instead of Polling

Polling for job status consumes RPM:

❌ Bad: Poll every second (60+ requests/minute)
GET /v1/jobs/{id}
GET /v1/jobs/{id}
GET /v1/jobs/{id}
...
✓ Good: Use webhooks (1 request + 1 webhook)
POST /v1/jobs → webhook notification

Webhooks don't count against your rate limit.
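As a sketch, a webhook consumer only needs to branch on the job's final status. The payload fields below (job_id, status) are assumptions for illustration; consult the webhook documentation for the actual schema:

```javascript
// Minimal webhook dispatcher. Field names (job_id, status) are
// assumed for illustration and may differ from the real payload.
function handleWebhook(payload) {
  switch (payload.status) {
    case 'completed':
      return { action: 'fetch_output', jobId: payload.job_id };
    case 'failed':
      return { action: 'resubmit', jobId: payload.job_id };
    default:
      // e.g. intermediate progress events: nothing to do yet
      return { action: 'ignore', jobId: payload.job_id };
  }
}
```

One request submits the job; one inbound webhook call replaces the entire polling loop.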

4. Batch When Possible

Instead of making many small requests:

❌ 100 separate API calls

Consider:

✓ Submit workflow that generates multiple images in one job

5. Respect Concurrent Limits

Don't submit more jobs than your concurrent limit allows:

const MAX_CONCURRENT = 2; // Your tier's limit
const pendingJobs = new Set();

async function submitJob(workflow) {
  // Wait if at capacity
  while (pendingJobs.size >= MAX_CONCURRENT) {
    await new Promise(r => setTimeout(r, 1000));
  }

  const { job_id } = await submitWithBackoff(workflow);
  pendingJobs.add(job_id);
  return job_id;
}

// Call this from your webhook handler when a job finishes,
// so waiting submitters can proceed.
function onJobComplete(jobId) {
  pendingJobs.delete(jobId);
}

Understanding the Sliding Window

RPM limits use a sliding window algorithm:

At any moment, the system counts requests in the past 60 seconds. This is smoother than fixed windows (which would reset every minute).

Example with 60 RPM limit:

  • T=0: 30 requests → 30 remaining
  • T=30: 20 more requests → 10 remaining
  • T=60: First 30 requests "expire" → 40 remaining
  • T=90: Next 20 expire → 60 remaining
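The worked example above can be reproduced with a small client-side counter that mirrors the server's sliding window; throttling locally this way avoids ever receiving a 429. This is a sketch of the algorithm, not the server's actual implementation (class and method names are ours):

```javascript
// Client-side sliding-window counter: counts requests in the
// trailing window, just as the worked example above does.
class SlidingWindowCounter {
  constructor(limit, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // send times of recent requests (ms)
  }

  // Drop timestamps older than the window, then record the request
  // if there is capacity. Returns false when the limit is hit.
  tryAcquire(now = Date.now()) {
    this.timestamps = this.timestamps.filter(t => t > now - this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }

  // How many requests are still allowed right now.
  remaining(now = Date.now()) {
    this.timestamps = this.timestamps.filter(t => t > now - this.windowMs);
    return this.limit - this.timestamps.length;
  }
}
```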

Error Reference

| HTTP Status | Error Code | Meaning |
| --- | --- | --- |
| 429 | rate_limit_exceeded | RPM limit exceeded |
| 429 | concurrent_limit_reached | Max running jobs |
| 429 | queue_full | Personal queue at capacity |
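Since all three conditions share the 429 status, clients should branch on error.code rather than on the status alone. A hedged sketch of such a dispatcher (the strategy names are ours, not prescribed by the API):

```javascript
// Map each documented 429 error code to a client-side strategy.
// The strategies themselves are illustrative choices.
function strategyFor429(errorBody) {
  switch (errorBody.error.code) {
    case 'rate_limit_exceeded': {
      // Wait out the window; prefer the server's retry_after hint.
      const hint = errorBody.error.retry_after;
      return { wait_ms: (typeof hint === 'number' ? hint : 60) * 1000 };
    }
    case 'concurrent_limit_reached':
      // Resubmit once a running job finishes (e.g. on a webhook).
      return { wait_for: 'job_completion' };
    case 'queue_full':
      // Back off until queued jobs start processing.
      return { wait_for: 'queue_drain' };
    default:
      return { wait_ms: 60000 };
  }
}
```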
