Fair Queuing

cmfy.cloud serves many users with a shared pool of GPUs. Fair queuing ensures everyone gets reasonable access, even when the system is busy.

The Problem: Noisy Neighbors

Without protection, a single user could:

  • Submit thousands of jobs at once
  • Fill the entire processing queue
  • Monopolize all GPU resources
  • Cause long waits for everyone else

This is the "noisy neighbor" problem - one heavy user degrades the experience for everyone.

The Solution: Per-User Fair Queues

Instead of a single global queue, cmfy.cloud uses per-user queues with fair scheduling:

The scheduler cycles through active users, giving each one a turn. This prevents any single user from blocking others.

How It Works

Round-Robin Scheduling

Jobs are processed in a rotating order across users:

Turn | User   | Job Processed
---- | ------ | -------------
1    | User A | Job 1
2    | User B | Job 1
3    | User C | Job 1
4    | User A | Job 2
5    | User C | Job 2
6    | User A | Job 3
7    | User A | Job 4
8    | User A | Job 5

Notice how:

  • User B had 1 job - processed on turn 2
  • User C had 2 jobs - processed on turns 3 and 5
  • User A had 5 jobs - spread across multiple turns

Even though User A submitted the most jobs, Users B and C weren't blocked.
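The schedule above can be reproduced with a minimal sketch of per-user queues under round-robin. This is an illustration of the scheduling idea, not cmfy.cloud's actual implementation; the function and queue names are made up for the example.

```python
from collections import deque

def round_robin(queues):
    """Yield (user, job) pairs by cycling through per-user queues.

    A user stays in the rotation as long as they have queued jobs;
    once their queue is empty they drop out of the cycle.
    """
    order = deque(queues.keys())
    while order:
        user = order.popleft()
        if queues[user]:
            yield user, queues[user].popleft()
            order.append(user)  # still has (or may have) jobs: keep rotating

queues = {
    "User A": deque(["Job 1", "Job 2", "Job 3", "Job 4", "Job 5"]),
    "User B": deque(["Job 1"]),
    "User C": deque(["Job 1", "Job 2"]),
}
for turn, (user, job) in enumerate(round_robin(queues), start=1):
    print(turn, user, job)  # matches the table: A1, B1, C1, A2, C2, A3, A4, A5
```

Running this prints exactly the eight turns from the table: User A's five jobs are interleaved with the other users' jobs rather than processed as a block.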

Your Position in Queue

When you submit a job, you receive a queue position:

{
  "job_id": "...",
  "status": "queued",
  "queue_position": 3,
  "estimated_wait_seconds": 45
}

This is your position in your personal queue, not the global queue. Your actual wait time depends on:

  • How many other users have queued jobs
  • How many jobs are ahead of you in your queue
  • Current GPU availability
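A client can read these fields straight from the response. A minimal sketch (the response body and job ID here are placeholder values, not real output):

```python
import json

# Placeholder response body shaped like the example above
response_body = """
{
  "job_id": "job_abc123",
  "status": "queued",
  "queue_position": 3,
  "estimated_wait_seconds": 45
}
"""

job = json.loads(response_body)
if job["status"] == "queued":
    # queue_position counts jobs ahead of you in YOUR queue, not globally
    print(f"Position {job['queue_position']}, ~{job['estimated_wait_seconds']}s wait")
```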

Priority Tiers

Paid plans get priority processing through weighted scheduling:

Plan       | Priority Level | Effective Weight
---------- | -------------- | ----------------
Enterprise | High           | 2× more likely to be selected
Pro        | Normal         | 1.5× more likely to be selected
Free       | Low            | Baseline

Higher-weight users get more "turns" in the round-robin, resulting in faster processing.
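One way to picture weighted selection is a weighted random draw over active users, with weights matching the tiers above. This is a sketch of the concept only; the tier names and weights mirror the table, but the function is hypothetical:

```python
import random

# Base weights mirroring the tier table above (2x / 1.5x / baseline)
TIER_WEIGHTS = {"enterprise": 2.0, "pro": 1.5, "free": 1.0}

def pick_next_user(active_users, rng=random):
    """Pick the next user to serve, weighted by plan tier.

    active_users: list of (user_id, tier) pairs that have queued jobs.
    """
    weights = [TIER_WEIGHTS[tier] for _, tier in active_users]
    user_id, _ = rng.choices(active_users, weights=weights, k=1)[0]
    return user_id
```

Over many scheduling rounds, an Enterprise user ends up selected roughly twice as often as a Free user with the same queue depth.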

What to Expect Under Load

Light Load (Most of the Time)

When the system isn't busy:

  • Jobs start almost immediately
  • Queues are typically empty
  • Wait times are minimal

Moderate Load

During busy periods:

  • Some queuing occurs
  • Fair scheduling kicks in
  • Wait times may be 30-60 seconds

High Load (Rare)

During peak demand:

  • Longer queues form
  • Fair scheduling ensures everyone makes progress
  • Paid tiers see shorter waits than free tiers

Best Practices

1. Submit What You Need

Don't queue more jobs than you'll actually use. Extra queued jobs don't speed up processing and may hit your rate limits.

2. Use Webhooks

Instead of submitting jobs and polling, use webhooks. This frees up your concurrent slots and lets the system notify you when results are ready.

3. Consider Batching

If you have many similar generations, consider:

  • Submitting in smaller batches
  • Waiting for each batch to complete
  • Checking results before submitting more

This is more efficient than queueing hundreds of jobs at once.
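That batching pattern can be sketched as follows. The `submit` and `wait_for` callables are hypothetical stand-ins for your actual API client; swap in real calls:

```python
def submit_in_batches(jobs, submit, wait_for, batch_size=5):
    """Submit jobs a small batch at a time, waiting for each batch to finish.

    submit(job) -> job_id and wait_for(job_id) -> result are hypothetical
    client functions; replace them with your own API wrapper.
    """
    results = []
    for i in range(0, len(jobs), batch_size):
        batch = jobs[i:i + batch_size]
        ids = [submit(job) for job in batch]
        # Wait for the whole batch before submitting the next one,
        # so your per-user queue stays shallow.
        results.extend(wait_for(job_id) for job_id in ids)
    return results
```

Keeping the batch size at or below your queue depth limit also avoids `queue_full` errors entirely.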

4. Handle Queue Full Errors

If you hit your queue depth limit:

{
  "error": {
    "code": "queue_full",
    "message": "Queue limit reached (5 jobs). Wait for existing jobs to complete."
  }
}

Wait for some jobs to complete before submitting more. Don't retry immediately - that just wastes API calls.
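A simple way to avoid hammering the API is exponential backoff. This is a sketch; `submit` and `QueueFullError` are hypothetical names standing in for your client call and however it surfaces the `queue_full` error:

```python
import time

class QueueFullError(Exception):
    """Hypothetical exception raised when the queue_full error is returned."""

def submit_with_backoff(submit, job, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Retry submission with exponential backoff when the queue is full."""
    for attempt in range(max_attempts):
        try:
            return submit(job)
        except QueueFullError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)  # wait 2s, 4s, 8s, ...
```

Each retry waits twice as long as the last, giving your earlier jobs time to drain from the queue instead of burning API calls on immediate retries.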

Dynamic Weight Reduction

To further prevent noisy neighbors, users with many queued jobs get lower priority:

Effective weight = Base weight ÷ (queue_depth + 1)

Queued Jobs | Weight Multiplier
----------- | -----------------
1           | 0.5×
2           | 0.33×
4           | 0.2×
8           | 0.11×
16          | 0.06×

This ensures that users with small queues are processed quickly, while heavy users still make progress but don't monopolize resources.
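The formula above reduces to a one-line function, and evaluating it with a base weight of 1.0 reproduces the multipliers in the table:

```python
def effective_weight(base_weight, queue_depth):
    """Dynamic weight reduction: base weight divided by (queue_depth + 1)."""
    return base_weight / (queue_depth + 1)

# With a base weight of 1.0, the multipliers match the table above
for depth in (1, 2, 4, 8, 16):
    print(depth, round(effective_weight(1.0, depth), 2))
# 1 0.5, 2 0.33, 4 0.2, 8 0.11, 16 0.06
```

Note that a user with a single queued job already runs at half their base weight; the reduction is gradual, so heavy users are throttled rather than starved.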
