# Fair Queuing
cmfy.cloud serves many users with a shared pool of GPUs. Fair queuing ensures everyone gets reasonable access, even when the system is busy.
## The Problem: Noisy Neighbors
Without protection, a single user could:
- Submit thousands of jobs at once
- Fill the entire processing queue
- Monopolize all GPU resources
- Cause long waits for everyone else
This is the "noisy neighbor" problem - one heavy user degrades the experience for everyone.
## The Solution: Per-User Fair Queues
Instead of a single global queue, cmfy.cloud uses per-user queues with fair scheduling:
The scheduler cycles through active users, giving each one a turn. This prevents any single user from blocking others.
## How It Works

### Round-Robin Scheduling
Jobs are processed in a rotating order across users:
| Turn | User | Job Processed |
|---|---|---|
| 1 | User A | Job 1 |
| 2 | User B | Job 1 |
| 3 | User C | Job 1 |
| 4 | User A | Job 2 |
| 5 | User C | Job 2 |
| 6 | User A | Job 3 |
| 7 | User A | Job 4 |
| 8 | User A | Job 5 |
Notice how:
- User B had 1 job - processed on turn 2
- User C had 2 jobs - processed on turns 3 and 5
- User A had 5 jobs - spread across multiple turns
Even though User A submitted the most jobs, Users B and C weren't blocked.
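The rotation above can be modeled with per-user FIFO queues and a scheduler that cycles through users, serving one job per turn. This is a simplified sketch of the scheme, not cmfy.cloud's actual implementation:

```python
from collections import deque

def round_robin(queues):
    """Serve one job per user per turn, rotating across users.

    queues: dict mapping user name -> deque of job names (FIFO).
    Returns the list of (user, job) pairs in processing order.
    """
    order = []
    users = deque(queues)  # rotation order of users with pending jobs
    while users:
        user = users.popleft()
        order.append((user, queues[user].popleft()))
        if queues[user]:        # still has jobs: back of the rotation
            users.append(user)
    return order

# Reproduces the table: A submits 5 jobs, B submits 1, C submits 2.
schedule = round_robin({
    "A": deque(["Job 1", "Job 2", "Job 3", "Job 4", "Job 5"]),
    "B": deque(["Job 1"]),
    "C": deque(["Job 1", "Job 2"]),
})
print(schedule)
```

Running this yields the same turn order as the table: B and C finish early while A's remaining jobs fill the later turns.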
### Your Position in Queue
When you submit a job, you receive a queue position:
```json
{
  "job_id": "...",
  "status": "queued",
  "queue_position": 3,
  "estimated_wait_seconds": 45
}
```
This is your position in your personal queue, not the global queue. Your actual wait time depends on:
- How many other users have queued jobs
- How many jobs are ahead of you in your queue
- Current GPU availability
### Priority Tiers
Paid plans get priority processing through weighted scheduling:
| Plan | Priority Level | Effective Weight |
|---|---|---|
| Enterprise | High | 2× more likely to be selected |
| Pro | Normal | 1.5× more likely to be selected |
| Free | Low | Baseline |
Higher-weight users get more "turns" in the round-robin, resulting in faster processing.
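One common way to implement weighted turns (a sketch of the general technique, not necessarily cmfy.cloud's scheduler) is smooth weighted round-robin: each round, every user earns credit equal to their weight, the highest-credit user is served, and the served user pays back the total weight. Over time, each user's share of turns matches their share of the total weight:

```python
from collections import Counter

def pick_next(users):
    """Smooth weighted round-robin selection.

    users: dict user -> {"weight": float, "credit": float}
    Mutates credits in place and returns the chosen user.
    """
    total = sum(u["weight"] for u in users.values())
    for u in users.values():
        u["credit"] += u["weight"]
    chosen = max(users, key=lambda name: users[name]["credit"])
    users[chosen]["credit"] -= total
    return chosen

# Weights from the table above (Free is the 1.0 baseline).
users = {
    "Enterprise": {"weight": 2.0, "credit": 0.0},
    "Pro":        {"weight": 1.5, "credit": 0.0},
    "Free":       {"weight": 1.0, "credit": 0.0},
}
turns = Counter(pick_next(users) for _ in range(9))
print(turns)  # Enterprise: 4 of 9 turns, Pro: 3, Free: 2
```

Over 9 rounds the turn counts come out 4 : 3 : 2, matching the 2× : 1.5× : 1× weights exactly, and no user is ever starved.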
## What to Expect Under Load

### Light Load (Most of the Time)
When the system isn't busy:
- Jobs start almost immediately
- Queues are typically empty
- Wait times are minimal
### Moderate Load
During busy periods:
- Some queuing occurs
- Fair scheduling kicks in
- Wait times may be 30-60 seconds
### High Load (Rare)
During peak demand:
- Longer queues form
- Fair scheduling ensures everyone makes progress
- Paid tiers see shorter waits than free tiers
## Best Practices

### 1. Submit What You Need
Don't queue more jobs than you'll actually use. Extra queued jobs don't speed up processing and may hit your rate limits.
### 2. Use Webhooks
Instead of submitting jobs and polling, use webhooks. This frees up your concurrent slots and lets the system notify you when results are ready.
### 3. Consider Batching
If you have many similar generations, consider:
- Submitting in smaller batches
- Waiting for each batch to complete
- Checking results before submitting more
This is more efficient than queuing hundreds of jobs at once.
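The batch-and-wait pattern can be sketched as follows. Note that `submit_job` and `wait_for_completion` are hypothetical placeholders standing in for your actual API client calls:

```python
def submit_job(prompt):
    """Placeholder for the real submit call; returns a job ID."""
    return f"job-{abs(hash(prompt)) % 10000:04d}"

def wait_for_completion(job_ids):
    """Placeholder: poll or use webhooks until every job finishes."""
    pass

def generate_in_batches(prompts, batch_size=5):
    """Submit at most batch_size jobs at a time.

    Staying under the queue-depth limit avoids queue_full errors and
    keeps your per-user priority weight high (see Dynamic Weight
    Reduction below).
    """
    job_ids = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        ids = [submit_job(p) for p in batch]
        wait_for_completion(ids)  # check results before submitting more
        job_ids.extend(ids)
    return job_ids

print(generate_in_batches([f"prompt {i}" for i in range(12)]))
```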
### 4. Handle Queue Full Errors
If you hit your queue depth limit:
```json
{
  "error": {
    "code": "queue_full",
    "message": "Queue limit reached (5 jobs). Wait for existing jobs to complete."
  }
}
```
Wait for some jobs to complete before submitting more. Don't retry immediately - that just wastes API calls.
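A simple way to follow this advice is exponential backoff on `queue_full`. This is an illustrative sketch where `submit` stands in for whatever function performs the actual API request:

```python
import time

def submit_with_backoff(submit, payload, max_attempts=5, base_delay=2.0):
    """Retry submission on queue_full errors with exponential backoff.

    submit: callable taking the payload and returning the parsed API
    response as a dict. Any other error or a success is returned as-is.
    """
    for attempt in range(max_attempts):
        resp = submit(payload)
        error = resp.get("error")
        if not error or error.get("code") != "queue_full":
            return resp
        # Queue is full: wait progressively longer instead of
        # hammering the API with immediate retries.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("queue still full after retries")
```

Waiting 2s, 4s, 8s, ... between attempts gives your queued jobs time to drain, which is exactly what the error message asks for.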
## Dynamic Weight Reduction
To further prevent noisy neighbors, users with many queued jobs get lower priority:
`effective_weight = base_weight / (queue_depth + 1)`
| Queued Jobs | Weight Multiplier |
|---|---|
| 1 | 0.5× |
| 2 | 0.33× |
| 4 | 0.2× |
| 8 | 0.11× |
| 16 | 0.06× |
This ensures that users with small queues are processed quickly, while heavy users still make progress but don't monopolize resources.
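The formula above reproduces the table directly:

```python
def effective_weight(base_weight, queue_depth):
    """Effective weight = base weight / (queue_depth + 1)."""
    return base_weight / (queue_depth + 1)

# Multipliers for a base weight of 1.0 at each queue depth.
for depth in (1, 2, 4, 8, 16):
    print(depth, round(effective_weight(1.0, depth), 2))
# 1 -> 0.5, 2 -> 0.33, 4 -> 0.2, 8 -> 0.11, 16 -> 0.06
```

The multiplier shrinks roughly in proportion to queue depth, so doubling your queued jobs roughly halves your scheduling weight.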
## What's Next?
- Rate Limiting - Understand your tier's limits
- Architecture Overview - How the system works