# Rate Limiting
cmfy.cloud uses rate limiting to ensure fair resource distribution and protect the platform from abuse. This page explains how limits work and how to handle them.
## Types of Limits
Three types of limits protect the platform:
| Limit Type | What It Measures | Why It Exists |
|---|---|---|
| Requests Per Minute (RPM) | API calls per minute | Prevents API abuse |
| Concurrent Jobs | Jobs currently running | Prevents GPU monopolization |
| Queue Depth | Jobs waiting to start | Prevents queue flooding |
## Tier Defaults
Each plan has default limits:
| Tier | RPM | Concurrent Jobs | Queue Depth |
|---|---|---|---|
| Free | 60 | 2 | 100 |
| Pro | 300 | 10 | 100 |
| Enterprise | 1,000 | 50 | 100 |
Enterprise customers can request custom limits. Contact support to discuss your needs.
## Rate Limit Headers
Every API response includes rate limit information:
```http
HTTP/1.1 202 Accepted
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1699574400
X-Concurrent-Limit: 2
X-Concurrent-Current: 1
X-Queue-Limit: 100
X-Queue-Current: 3
```
| Header | Description |
|---|---|
| X-RateLimit-Limit | Your RPM limit |
| X-RateLimit-Remaining | Remaining requests in the current minute |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| X-Concurrent-Limit | Maximum concurrent jobs |
| X-Concurrent-Current | Currently running jobs |
| X-Queue-Limit | Maximum queued jobs |
| X-Queue-Current | Currently queued jobs |
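These headers can be folded into a single object for logging or throttling decisions. A minimal sketch (the helper name is ours, not part of the API); missing headers come back as null:

```javascript
// Collect the rate limit headers from a fetch() Response into one object.
// Header values arrive as strings, so parse them to numbers.
function parseRateLimitHeaders(headers) {
  const num = (name) => {
    const value = headers.get(name);
    return value == null ? null : parseInt(value, 10);
  };
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    reset: num('X-RateLimit-Reset'),
    concurrentLimit: num('X-Concurrent-Limit'),
    concurrentCurrent: num('X-Concurrent-Current'),
    queueLimit: num('X-Queue-Limit'),
    queueCurrent: num('X-Queue-Current'),
  };
}
```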
## Handling 429 Responses
When you exceed a limit, you receive a 429 Too Many Requests response:
### RPM Exceeded

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Try again in 45 seconds.",
    "retry_after": 45
  }
}
```
The Retry-After header tells you when to retry:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 45
```
### Concurrent Limit Reached

```json
{
  "error": {
    "code": "concurrent_limit_reached",
    "message": "Maximum concurrent jobs reached (2). Wait for a job to complete."
  }
}
```
This happens when you have the maximum number of jobs currently running. Wait for one to complete before submitting more.
### Queue Full

```json
{
  "error": {
    "code": "queue_full",
    "message": "Queue limit reached (100 jobs). Wait for existing jobs to complete.",
    "queued_jobs": 100
  }
}
```
Your personal queue is full. Wait for jobs to start processing before adding more.
## Best Practices
### 1. Implement Exponential Backoff
When you hit a rate limit, don't retry immediately. Use exponential backoff:
```javascript
async function submitWithBackoff(workflow, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch('/v1/jobs', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ prompt: workflow })
    });
    if (response.status === 429) {
      // Honor Retry-After when the server sends it; otherwise fall back
      // to exponential backoff. Never retry sooner than the server asks.
      const retryAfter = parseInt(response.headers.get('Retry-After') ?? '0', 10);
      const delay = Math.max(retryAfter * 1000, Math.pow(2, attempt) * 1000);
      await new Promise(r => setTimeout(r, delay));
      continue;
    }
    return response.json();
  }
  throw new Error('Max retries exceeded');
}
```
### 2. Monitor Your Usage
Check rate limit headers proactively:
```javascript
function checkLimits(response) {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
  const queueCurrent = parseInt(response.headers.get('X-Queue-Current'), 10);
  const queueLimit = parseInt(response.headers.get('X-Queue-Limit'), 10);
  if (remaining < 10) {
    console.warn('Approaching RPM limit');
  }
  if (queueLimit - queueCurrent < 2) {
    console.warn('Queue almost full');
  }
}
```
### 3. Use Webhooks Instead of Polling
Polling for job status consumes RPM:
❌ Bad: Poll every second (60+ requests/minute)

```
GET /v1/jobs/{id}
GET /v1/jobs/{id}
GET /v1/jobs/{id}
...
```

✓ Good: Use webhooks (1 request + 1 webhook)

```
POST /v1/jobs → webhook notification
```
Webhooks don't count against your rate limit.
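The webhook pattern can be as small as a map of pending jobs that each delivery settles. A minimal sketch; the payload shape (`job_id`, `status`, `output`) and the `completed` status value are assumptions for illustration, so check the webhook documentation for the actual schema:

```javascript
// Pending jobs awaiting a webhook delivery: job_id -> promise callbacks.
const jobs = new Map();

// Submit side: returns a promise that settles when the webhook arrives.
function waitForJob(jobId) {
  return new Promise((resolve, reject) => {
    jobs.set(jobId, { resolve, reject });
  });
}

// Webhook side: call this from your HTTP endpoint with the parsed JSON body.
// Returns false for unknown (or already settled) job ids.
function handleWebhook(payload) {
  const pending = jobs.get(payload.job_id);
  if (!pending) return false;
  jobs.delete(payload.job_id);
  if (payload.status === 'completed') {
    pending.resolve(payload.output);
  } else {
    pending.reject(new Error(`Job ${payload.job_id}: ${payload.status}`));
  }
  return true;
}
```

With this in place you submit once and await the promise instead of polling GET /v1/jobs/{id}.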
### 4. Batch When Possible
Instead of making many small requests:
❌ 100 separate API calls
Consider:
✓ Submit workflow that generates multiple images in one job
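For image generation, batching usually means one workflow that produces several outputs in a single job. A hedged sketch of a ComfyUI-style graph fragment — the node id and input names are illustrative, and whether batching applies depends on the nodes in your workflow:

```javascript
// One job that renders 4 images instead of 4 separate submissions.
// Node "5" here stands in for the latent-image node of your workflow;
// other nodes (prompt, sampler, decode) are omitted for brevity.
const workflow = {
  "5": {
    class_type: "EmptyLatentImage",
    inputs: { width: 1024, height: 1024, batch_size: 4 },
  },
  // ...remaining nodes of your graph
};
```

Each batched job still counts as one concurrent job and one queue slot.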
### 5. Respect Concurrent Limits
Don't submit more jobs than your concurrent limit allows:
```javascript
const MAX_CONCURRENT = 2; // Your tier's limit
const pendingJobs = new Set();

async function submitJob(workflow) {
  // Wait if at capacity
  while (pendingJobs.size >= MAX_CONCURRENT) {
    await new Promise(r => setTimeout(r, 1000));
  }
  const { job_id } = await submitWithBackoff(workflow);
  pendingJobs.add(job_id);
  // Remove job_id from pendingJobs in your webhook handler on completion
  return job_id;
}
```
## Understanding the Sliding Window
RPM limits use a sliding window algorithm:
At any moment, the system counts requests in the past 60 seconds. This is smoother than fixed windows (which would reset every minute).
Example with 60 RPM limit:
- T=0: 30 requests → 30 remaining
- T=30: 20 more requests → 10 remaining
- T=60: First 30 requests "expire" → 40 remaining
- T=90: Next 20 expire → 60 remaining
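The timeline above can be sketched as a pruned list of timestamps. This is an illustration of the algorithm, not cmfy.cloud's actual implementation:

```javascript
// Sliding-window counter: keep the timestamps of accepted requests from
// the past windowMs and prune them on every check.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns true if a request at time `now` (ms) is allowed, and records it.
  allow(now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop requests that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }

  // How many requests are still available in the current window.
  remaining(now = Date.now()) {
    const cutoff = now - this.windowMs;
    return this.limit - this.timestamps.filter((t) => t > cutoff).length;
  }
}
```

Because old requests expire one by one rather than all at once, capacity comes back gradually instead of in a burst at the top of each minute.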
## Error Reference
| HTTP Status | Error Code | Meaning |
|---|---|---|
| 429 | rate_limit_exceeded | RPM limit exceeded |
| 429 | concurrent_limit_reached | Max running jobs |
| 429 | queue_full | Personal queue at capacity |
## What's Next?
- Fair Queuing - How jobs are scheduled
- Architecture Overview - How the system works