Fair Queuing

cmfy.cloud serves many users with a shared pool of GPUs. Fair queuing ensures everyone gets reasonable access, even when the system is busy.

The Problem: Noisy Neighbors

Without protection, a single user could:

  • Submit thousands of jobs at once
  • Fill the entire processing queue
  • Monopolize all GPU resources
  • Cause long waits for everyone else

This is the "noisy neighbor" problem - one heavy user degrades the experience for everyone.

The Solution: Per-User Fair Queues

Instead of a single global queue, cmfy.cloud uses per-user queues with fair scheduling:

The scheduler cycles through active users, giving each one a turn. This prevents any single user from blocking others.

How It Works

Round-Robin Scheduling

Jobs are processed in a rotating order across users:

Turn | User   | Job Processed
---- | ------ | -------------
1    | User A | Job 1
2    | User B | Job 1
3    | User C | Job 1
4    | User A | Job 2
5    | User C | Job 2
6    | User A | Job 3
7    | User A | Job 4
8    | User A | Job 5

Notice how:

  • User B had 1 job - processed on turn 2
  • User C had 2 jobs - processed on turns 3 and 5
  • User A had 5 jobs - spread across multiple turns

Even though User A submitted the most jobs, Users B and C weren't blocked.
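The schedule above can be reproduced with a minimal sketch of per-user queues under round-robin. This is an illustration of the scheduling idea, not cmfy.cloud's actual implementation; the function and queue names are made up for the example.

```python
from collections import deque

def round_robin(queues):
    """Yield (user, job) pairs by cycling through per-user queues.

    A user stays in the rotation as long as they have queued jobs;
    once their queue is empty they drop out of the cycle.
    """
    order = deque(queues.keys())
    while order:
        user = order.popleft()
        if queues[user]:
            yield user, queues[user].popleft()
            order.append(user)  # still has (or may have) jobs: keep rotating

queues = {
    "User A": deque(["Job 1", "Job 2", "Job 3", "Job 4", "Job 5"]),
    "User B": deque(["Job 1"]),
    "User C": deque(["Job 1", "Job 2"]),
}
for turn, (user, job) in enumerate(round_robin(queues), start=1):
    print(turn, user, job)  # matches the table: A1, B1, C1, A2, C2, A3, A4, A5
```

Running this prints exactly the eight turns from the table: User A's five jobs are interleaved with the other users' jobs rather than processed as a block.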

Your Position in Queue

When you submit a job, you receive a queue position:

{
  "job_id": "...",
  "status": "queued",
  "queue_position": 3,
  "estimated_wait_seconds": 45
}

This is your position in your personal queue, not the global queue. Your actual wait time depends on:

  • How many other users have queued jobs
  • How many jobs are ahead of you in your queue
  • Current GPU availability
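A client can read these fields straight from the response. A minimal sketch (the response body and job ID here are placeholder values, not real output):

```python
import json

# Placeholder response body shaped like the example above
response_body = """
{
  "job_id": "job_abc123",
  "status": "queued",
  "queue_position": 3,
  "estimated_wait_seconds": 45
}
"""

job = json.loads(response_body)
if job["status"] == "queued":
    # queue_position counts jobs ahead of you in YOUR queue, not globally
    print(f"Position {job['queue_position']}, ~{job['estimated_wait_seconds']}s wait")
```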

Priority Tiers

Paid plans get priority processing through weighted scheduling:

Plan       | Priority Level | Effective Weight
---------- | -------------- | ----------------
Enterprise | High           | 2× more likely to be selected
Pro        | Normal         | 1.5× more likely to be selected
Free       | Low            | Baseline

Higher-weight users get more "turns" in the round-robin, resulting in faster processing.
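One way to picture weighted selection is a weighted random draw over active users, with weights matching the tiers above. This is a sketch of the concept only; the tier names and weights mirror the table, but the function is hypothetical:

```python
import random

# Base weights mirroring the tier table above (2x / 1.5x / baseline)
TIER_WEIGHTS = {"enterprise": 2.0, "pro": 1.5, "free": 1.0}

def pick_next_user(active_users, rng=random):
    """Pick the next user to serve, weighted by plan tier.

    active_users: list of (user_id, tier) pairs that have queued jobs.
    """
    weights = [TIER_WEIGHTS[tier] for _, tier in active_users]
    user_id, _ = rng.choices(active_users, weights=weights, k=1)[0]
    return user_id
```

Over many scheduling rounds, an Enterprise user ends up selected roughly twice as often as a Free user with the same queue depth.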

What to Expect Under Load

Light Load (Most of the Time)

When the system isn't busy:

  • Jobs start almost immediately
  • Queues are typically empty
  • Wait times are minimal

Moderate Load

During busy periods:

  • Some queuing occurs
  • Fair scheduling kicks in
  • Wait times may be 30-60 seconds

High Load (Rare)

During peak demand:

  • Longer queues form
  • Fair scheduling ensures everyone makes progress
  • Paid tiers see shorter waits than free tiers

Best Practices

1. Submit What You Need

Don't queue more jobs than you'll actually use. Extra queued jobs don't speed up processing and may hit your rate limits.

2. Use Webhooks

Instead of submitting jobs and polling, use webhooks. This frees up your concurrent slots and lets the system notify you when results are ready.

3. Consider Batching

If you have many similar generations, consider:

  • Submitting in smaller batches
  • Waiting for each batch to complete
  • Checking results before submitting more

This is more efficient than queueing hundreds of jobs at once.
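That batching pattern can be sketched as follows. The `submit` and `wait_for` callables are hypothetical stand-ins for your actual API client; swap in real calls:

```python
def submit_in_batches(jobs, submit, wait_for, batch_size=5):
    """Submit jobs a small batch at a time, waiting for each batch to finish.

    submit(job) -> job_id and wait_for(job_id) -> result are hypothetical
    client functions; replace them with your own API wrapper.
    """
    results = []
    for i in range(0, len(jobs), batch_size):
        batch = jobs[i:i + batch_size]
        ids = [submit(job) for job in batch]
        # Wait for the whole batch before submitting the next one,
        # so your per-user queue stays shallow.
        results.extend(wait_for(job_id) for job_id in ids)
    return results
```

Keeping the batch size at or below your queue depth limit also avoids `queue_full` errors entirely.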

4. Handle Queue Full Errors

If you hit your queue depth limit:

{
  "error": {
    "code": "queue_full",
    "message": "Queue limit reached (5 jobs). Wait for existing jobs to complete."
  }
}

Wait for some jobs to complete before submitting more. Don't retry immediately - that just wastes API calls.
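A simple way to avoid hammering the API is exponential backoff. This is a sketch; `submit` and `QueueFullError` are hypothetical names standing in for your client call and however it surfaces the `queue_full` error:

```python
import time

class QueueFullError(Exception):
    """Hypothetical exception raised when the queue_full error is returned."""

def submit_with_backoff(submit, job, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Retry submission with exponential backoff when the queue is full."""
    for attempt in range(max_attempts):
        try:
            return submit(job)
        except QueueFullError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)  # wait 2s, 4s, 8s, ...
```

Each retry waits twice as long as the last, giving your earlier jobs time to drain from the queue instead of burning API calls on immediate retries.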

Dynamic Weight Reduction

To further prevent noisy neighbors, users with many queued jobs get lower priority:

Effective weight = Base weight ÷ (queue_depth + 1)

Queued Jobs | Weight Multiplier
----------- | -----------------
1           | 0.5×
2           | 0.33×
4           | 0.2×
8           | 0.11×
16          | 0.06×

This ensures that users with small queues are processed quickly, while heavy users still make progress but don't monopolize resources.
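The formula above reduces to a one-line function, and evaluating it with a base weight of 1.0 reproduces the multipliers in the table:

```python
def effective_weight(base_weight, queue_depth):
    """Dynamic weight reduction: base weight divided by (queue_depth + 1)."""
    return base_weight / (queue_depth + 1)

# With a base weight of 1.0, the multipliers match the table above
for depth in (1, 2, 4, 8, 16):
    print(depth, round(effective_weight(1.0, depth), 2))
# 1 0.5, 2 0.33, 4 0.2, 8 0.11, 16 0.06
```

Note that a user with a single queued job already runs at half their base weight; the reduction is gradual, so heavy users are throttled rather than starved.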
