Architecture Overview

cmfy.cloud is designed for speed and reliability. Understanding how requests flow through the system helps you make better integration decisions and troubleshoot issues.

The Big Picture

When you submit a workflow, it passes through four stages, in order: API Gateway, Router, Message Queue, and GPU Node.

Each stage has a specific job:

Stage	What It Does
API Gateway	Authenticates your request, validates the workflow, applies rate limits
Router	Finds the best GPU node based on cached models
Message Queue	Reliably delivers jobs to nodes (even if they're temporarily busy)
GPU Node	Executes your ComfyUI workflow and sends results

Asynchronous by Design

cmfy.cloud uses an async job model. Here's why:

Workflows take time - Image generation can take 5-60+ seconds depending on complexity
Resources are shared - Multiple users submit jobs to a pool of GPUs
Reliability matters - Jobs shouldn't be lost if a node goes offline

Key points when you submit a workflow:

You get a job_id immediately (within milliseconds)
The actual execution happens asynchronously
Results come via webhook or polling
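If you choose polling, the loop can be sketched roughly as follows. This is a minimal sketch, not the official client: `fetch_status` is a hypothetical callable you supply (for example, a wrapper around your HTTP client) that returns the job as a dict with a `status` field, and the interval and timeout values are arbitrary assumptions. The state names come from the job states table on this page.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these three states are final and a job never leaves them.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],  # hypothetical, supplied by you
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

Passing `fetch_status` and `sleep` in as parameters keeps the loop independent of any particular HTTP library and easy to exercise in tests.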

Request Flow Details

1. API Gateway

The gateway is your entry point: every request passes through it. The gateway checks:

Authentication - Is your API key valid?
Validation - Is the workflow properly formatted?
Rate limits - Are you within your tier's limits?
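The three checks above can be pictured as a short chain where the first failure rejects the request. The sketch below is illustrative only: the function name, the parameters, and the HTTP status codes (401, 400, 429, 202) are assumptions based on common HTTP conventions, not the documented cmfy.cloud behavior.

```python
from dataclasses import dataclass
from typing import Any, Set


@dataclass
class GatewayResult:
    ok: bool
    status: int  # assumed HTTP status code, for illustration only
    reason: str


def check_request(
    api_key: str,
    workflow: Any,
    requests_this_minute: int,
    valid_keys: Set[str],
    limit_per_minute: int,
) -> GatewayResult:
    """Run the gateway checks in order; stop at the first failure."""
    # 1. Authentication: is the API key valid?
    if api_key not in valid_keys:
        return GatewayResult(False, 401, "invalid API key")
    # 2. Validation: a stand-in check that the workflow is a non-empty object.
    if not isinstance(workflow, dict) or not workflow:
        return GatewayResult(False, 400, "workflow must be a non-empty object")
    # 3. Rate limits: is the caller within the tier's limit?
    if requests_this_minute >= limit_per_minute:
        return GatewayResult(False, 429, "rate limit exceeded")
    return GatewayResult(True, 202, "accepted")
```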

2. Intelligent Routing

The router's job is to minimize cold starts by routing to nodes that already have your models cached.

When a node has your models cached, you skip the download time entirely. See Cache-Aware Routing for details.
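To make the idea concrete, here is a minimal sketch of cache-aware selection: pick the node whose cache covers the most models the workflow needs. The scoring rule, the tie-break, and the names are assumptions for illustration; the real router may also weigh load, queue depth, or other signals (see Cache-Aware Routing).

```python
from typing import Dict, Set


def pick_node(required_models: Set[str], node_caches: Dict[str, Set[str]]) -> str:
    """Return the id of the node that already caches the most required models.

    Ties are broken by node id so the result is deterministic.
    """
    if not node_caches:
        raise ValueError("no nodes available")
    # max() keeps the first maximum, so iterating in sorted order
    # resolves ties in favour of the lowest node id.
    return max(
        sorted(node_caches),
        key=lambda node_id: len(required_models & node_caches[node_id]),
    )
```

For example, with a workflow that needs two models, a node caching both scores 2 and wins over a node caching only one, so the job avoids downloading anything.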

3. GPU Node Execution

Once a job reaches a GPU node, the node:

Checks if required models are on disk
Downloads any missing models
Loads models into GPU memory
Executes the ComfyUI workflow
Uploads output images
Sends results to your webhook
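The steps above show where a cold start costs time: only models missing from disk need a download. The sketch below plans the step list for a job; the function and step names are hypothetical and only mirror the list above.

```python
from typing import List, Set


def plan_execution(required_models: List[str], models_on_disk: Set[str]) -> List[str]:
    """Return the ordered steps a node would take for one job."""
    # Only models that are not already on disk need to be downloaded.
    missing = [m for m in required_models if m not in models_on_disk]
    steps = [f"download {m}" for m in missing]
    steps += [f"load {m} into GPU memory" for m in required_models]
    steps += ["execute workflow", "upload outputs", "send webhook"]
    return steps
```

When every required model is already on disk, the plan contains no download steps, which is the case the router tries to produce.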

Key Concepts

Jobs Have States

Every job goes through a lifecycle:

State	Description
`queued`	Waiting for an available GPU
`running`	Executing on a GPU node
`completed`	Finished successfully, results available
`failed`	Something went wrong
`cancelled`	You cancelled the job
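In client code, it can help to model these states explicitly. The sketch below uses the state names from the table; treating `completed`, `failed`, and `cancelled` as terminal is an assumption inferred from their descriptions.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: a job in one of these states will not change state again.
TERMINAL_STATES = frozenset(
    {JobState.COMPLETED, JobState.FAILED, JobState.CANCELLED}
)


def is_terminal(status: str) -> bool:
    """Return True if the job is finished; raises ValueError on unknown states."""
    return JobState(status) in TERMINAL_STATES
```

Raising on an unknown state, rather than silently treating it as non-terminal, surfaces any state this page does not list instead of leaving a client waiting forever.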

Webhooks Are Primary

While you can poll for status, webhooks are the preferred way to receive results:

# Your webhook receives:
{
  "job_id": "...",
  "status": "completed",
  "outputs": {
    "images": ["https://cdn.cmfy.cloud/..."]
  }
}

This is more efficient than polling and delivers results as soon as the job finishes.
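A webhook receiver mostly needs to parse this payload and pull out the image URLs. The following is a minimal sketch that works on the raw request body, using only the fields shown in the example above; the function name is hypothetical, and the shape of non-completed payloads is an assumption, so the code reads those fields defensively.

```python
import json
from typing import List, Tuple


def handle_webhook(body: bytes) -> Tuple[str, List[str]]:
    """Parse a webhook body and return (job_id, image_urls)."""
    payload = json.loads(body)
    job_id = payload["job_id"]
    # Assumption: only completed jobs carry outputs.
    if payload.get("status") != "completed":
        return job_id, []
    images = payload.get("outputs", {}).get("images", [])
    return job_id, list(images)
```

You would call this from whatever HTTP framework serves your webhook endpoint, then respond quickly and do any heavy processing (such as downloading the images) in the background.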

Fair Resource Sharing

cmfy.cloud serves multiple users simultaneously. To prevent any single user from monopolizing resources:

Per-user queues - Your jobs are scheduled fairly with others
Rate limits - Requests per minute and concurrent job limits
Priority tiers - Paid plans get faster processing

See Fair Queuing and Rate Limiting for details.
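One way to picture per-user queues is a round-robin scheduler that takes one job from each user in turn. This is only an illustrative sketch under that assumption; it ignores priority tiers and is not a description of the actual cmfy.cloud scheduler.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional


class FairQueue:
    """Round-robin over per-user FIFO queues."""

    def __init__(self) -> None:
        # user_id -> that user's pending job ids, in turn order
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def submit(self, user_id: str, job_id: str) -> None:
        self._queues.setdefault(user_id, deque()).append(job_id)

    def next_job(self) -> Optional[str]:
        """Pop one job from the user at the front, then move them to the back."""
        if not self._queues:
            return None
        user_id, queue = self._queues.popitem(last=False)
        job_id = queue.popleft()
        if queue:
            self._queues[user_id] = queue  # re-insert at the back of the rotation
        return job_id
```

With this scheme, a user who submits three jobs at once does not delay another user's single job by more than one slot.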

What's Next?

Now that you understand the architecture:

Workflows - Learn the ComfyUI workflow format
Cache-Aware Routing - Optimize for faster execution
Fair Queuing - Understand scheduling
Rate Limiting - Stay within your limits
