
Architecture Overview

cmfy.cloud is designed for speed and reliability. Understanding how requests flow through the system helps you make better integration decisions and troubleshoot issues.

The Big Picture

When you submit a workflow, it passes through the API Gateway, the Router, and the Message Queue before reaching a GPU node.

Each stage has a specific job:

| Stage | What It Does |
|---|---|
| API Gateway | Authenticates your request, validates the workflow, applies rate limits |
| Router | Finds the best GPU node based on cached models |
| Message Queue | Reliably delivers jobs to nodes (even if they're temporarily busy) |
| GPU Node | Executes your ComfyUI workflow and sends results |
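A common way for a queue to avoid losing jobs is acknowledgement: a delivered job stays tracked until the node confirms it finished, and goes back into the queue if the node disappears first. The in-memory sketch below illustrates that idea only; `AckQueue`, `deliver`, `ack`, and `node_lost` are hypothetical names and say nothing about the queue technology cmfy.cloud actually runs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AckQueue:
    """Hypothetical at-least-once queue: a job is forgotten only after it is acked."""
    pending: deque = field(default_factory=deque)  # job_ids waiting for a node
    in_flight: dict = field(default_factory=dict)  # job_id -> node_id it was handed to

    def publish(self, job_id: str) -> None:
        self.pending.append(job_id)

    def deliver(self, node_id: str) -> Optional[str]:
        """Hand the next pending job to a node; keep tracking it until it is acked."""
        if not self.pending:
            return None
        job_id = self.pending.popleft()
        self.in_flight[job_id] = node_id
        return job_id

    def ack(self, job_id: str) -> None:
        """The node reported completion, so the job can be dropped for good."""
        self.in_flight.pop(job_id, None)

    def node_lost(self, node_id: str) -> None:
        """The node went offline: return its unacked jobs to the front, in order."""
        lost = [j for j, n in self.in_flight.items() if n == node_id]
        for job_id in reversed(lost):
            del self.in_flight[job_id]
            self.pending.appendleft(job_id)
```

If `node-a` takes `job-1` and then drops offline, calling `node_lost("node-a")` means the next `deliver` hands `job-1` to another node instead of losing it.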

Asynchronous by Design

cmfy.cloud uses an async job model. Here's why:

  1. Workflows take time - Image generation can take 5-60+ seconds depending on complexity
  2. Resources are shared - Multiple users submit jobs to a pool of GPUs
  3. Reliability matters - Jobs shouldn't be lost if a node goes offline

Key points when you submit a workflow:

  • You get a job_id immediately (within milliseconds)
  • The actual execution happens asynchronously
  • Results come via webhook or polling
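Because the `job_id` comes back before the work is done, a client that cannot receive webhooks has to poll. Below is a minimal sketch, assuming a hypothetical `get_status(job_id)` function that wraps whatever status request you make (this page does not describe the endpoint) and returns a dict with a `"status"` key. The backoff and timeout values are illustrative, not documented limits.

```python
import time
from typing import Callable

# Terminal states from the job lifecycle table; any other state means "keep waiting".
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], dict],  # hypothetical status fetcher
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state, doubling the wait each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        job = get_status(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
```

Backing off keeps a 60-second job from costing dozens of status requests; webhooks avoid the polling entirely.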

Request Flow Details

1. API Gateway

The gateway is your entry point, and every request passes through it. For each request, the gateway checks:

  • Authentication - Is your API key valid?
  • Validation - Is the workflow properly formatted?
  • Rate limits - Are you within your tier's limits?
  • URL allowlists - Are model URLs from approved sources?
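The four checks above can be pictured as a short-circuiting sequence: the first failure rejects the request. This is a sketch only; `check_request`, the key `sk_test_123`, the host `models.example.com`, and the per-minute counter are hypothetical placeholders, not cmfy.cloud's actual keys, allowlist, or limits.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical placeholders: the real key store and approved model sources
# are not documented on this page.
VALID_KEYS = {"sk_test_123"}
ALLOWED_MODEL_HOSTS = {"models.example.com"}


def check_request(
    api_key: str,
    workflow: object,
    model_urls: List[str],
    requests_last_minute: int,
    per_minute_limit: int,
) -> Optional[str]:
    """Run the gateway checks in order; return an error message, or None if all pass."""
    # 1. Authentication
    if api_key not in VALID_KEYS:
        return "invalid API key"
    # 2. Validation (a shape check only in this sketch)
    if not isinstance(workflow, dict) or not workflow:
        return "workflow must be a non-empty JSON object"
    # 3. Rate limits (a simple per-minute count; the limit value is assumed)
    if requests_last_minute >= per_minute_limit:
        return "rate limit exceeded"
    # 4. URL allowlist: each model URL must be HTTPS from an approved host
    for url in model_urls:
        parsed = urlparse(url)
        if parsed.scheme != "https" or parsed.hostname not in ALLOWED_MODEL_HOSTS:
            return f"model URL not from an approved source: {url}"
    return None
```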

2. Intelligent Routing

The router's job is to minimize cold starts by routing to nodes that already have your models cached.

When a node has your models cached, you skip the download time entirely. See Cache-Aware Routing for details.
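One simple way to express "route to nodes that already have your models cached" is to score each node by how many of the job's required models it already holds. The sketch below assumes that scoring rule plus a tie-break on queue depth; `Node` and `pick_node` are hypothetical, and the router's real algorithm is described in Cache-Aware Routing.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    cached_models: Set[str] = field(default_factory=set)
    queue_depth: int = 0  # jobs already waiting on this node


def pick_node(required: Set[str], nodes: List[Node]) -> Node:
    """Prefer the node holding the most required models; on a tie, the shortest queue."""
    return max(
        nodes,
        key=lambda n: (len(required & n.cached_models), -n.queue_depth),
    )
```

With this rule, a node that has both `base` and `lora` cached beats one holding only `base`, even if its queue is slightly longer; if no node has anything cached, the least busy node wins.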

3. GPU Node Execution

Once a job reaches a GPU node, the node:

  1. Checks if required models are on disk
  2. Downloads any missing models
  3. Loads models into GPU memory
  4. Executes the ComfyUI workflow
  5. Uploads output images
  6. Sends results to your webhook
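The six steps above can be written as a plan that depends on what is already on disk, which shows why a warm node is faster: step 2 disappears. A minimal sketch; `plan_execution` and the step strings are hypothetical and only mirror the list.

```python
from typing import List, Set


def plan_execution(required: List[str], on_disk: Set[str]) -> List[str]:
    """Return the ordered steps a node would take for one job (illustrative only)."""
    missing = [m for m in required if m not in on_disk]              # 1. check disk
    steps = [f"download {m}" for m in missing]                        # 2. fetch missing models
    steps += [f"load {m}" for m in required]                          # 3. load into GPU memory
    steps += ["execute workflow", "upload outputs", "send webhook"]   # 4-6
    return steps
```

For `plan_execution(["base", "lora"], {"base"})` only `lora` is downloaded; with both models on disk, the plan starts straight at loading.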

Key Concepts

Jobs Have States

Every job goes through a lifecycle:

| State | Description |
|---|---|
| queued | Waiting for an available GPU |
| running | Executing on a GPU node |
| completed | Finished successfully, results available |
| failed | Something went wrong |
| cancelled | You cancelled the job |
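Client code that tracks jobs often models this lifecycle as a small state machine, so that a late or duplicate update cannot move a finished job backwards. The transitions below are an assumption inferred from the table; in particular, this page does not say whether a running job can be cancelled or whether a queued job can fail.

```python
# Assumed transitions, inferred from the state table (not a documented contract).
TRANSITIONS = {
    "queued": {"running", "cancelled"},
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def advance(current: str, new: str) -> str:
    """Move a job to a new state, rejecting transitions the lifecycle does not allow."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def is_terminal(state: str) -> bool:
    """completed, failed, and cancelled have no outgoing transitions."""
    return not TRANSITIONS[state]
```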

Webhooks Are Primary

While you can poll for status, webhooks are the preferred way to receive results. Your webhook receives a payload like this:

```json
{
  "job_id": "...",
  "status": "completed",
  "outputs": {
    "images": ["https://cdn.cmfy.cloud/..."]
  }
}
```

This is more efficient than polling and gives you results immediately.
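On the receiving side, the handler only needs to parse the JSON body and pick out the output URLs. A minimal sketch that relies only on the fields shown in the example payload (`job_id`, `status`, `outputs.images`); `handle_webhook` is a hypothetical name, and request signing or error fields are not covered because this page does not describe them.

```python
import json
from typing import List


def handle_webhook(body: bytes) -> List[str]:
    """Parse a webhook body and return image URLs for completed jobs."""
    payload = json.loads(body)
    if payload.get("status") != "completed":
        # failed or cancelled jobs: nothing to download
        return []
    return list(payload.get("outputs", {}).get("images", []))
```

In a real service you would call this from your web framework's request handler and return a success response promptly, doing any slow work (downloading images, updating a database) afterwards.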

Fair Resource Sharing

cmfy.cloud serves multiple users simultaneously. To prevent any single user from monopolizing resources:

  • Per-user queues - Your jobs are scheduled fairly with others
  • Rate limits - Requests per minute and concurrent job limits
  • Priority tiers - Paid plans get faster processing

See Fair Queuing and Rate Limiting for details.
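To make "per-user queues" concrete, here is a round-robin scheduler with a per-user concurrent job limit: each user with waiting work gets a turn, so one large backlog cannot starve everyone else. This is a hypothetical illustration (`FairScheduler`, `max_concurrent`); it ignores priority tiers, and cmfy.cloud's actual scheduling is described in Fair Queuing and Rate Limiting.

```python
from collections import OrderedDict, deque
from typing import Dict, Optional


class FairScheduler:
    """Round-robin across per-user queues with a per-user concurrency cap (sketch)."""

    def __init__(self, max_concurrent: int) -> None:
        self.queues = OrderedDict()  # user -> deque of job_ids, in turn order
        self.running: Dict[str, int] = {}
        self.max_concurrent = max_concurrent

    def submit(self, user: str, job_id: str) -> None:
        self.queues.setdefault(user, deque()).append(job_id)

    def next_job(self) -> Optional[str]:
        """Take one job from the first eligible user, then move that user to the back."""
        for user in list(self.queues):
            queue = self.queues[user]
            if queue and self.running.get(user, 0) < self.max_concurrent:
                self.queues.move_to_end(user)
                self.running[user] = self.running.get(user, 0) + 1
                return queue.popleft()
        return None  # nothing eligible right now

    def finish(self, user: str) -> None:
        """Free one of the user's concurrency slots when a job ends."""
        self.running[user] -= 1
```

If Alice submits three jobs and Bob one, with a limit of 2 the order is Alice, Bob, Alice; Alice's third job waits until one of her running jobs finishes.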

What's Next?

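Now that you understand the architecture, dig deeper into the components covered above in Cache-Aware Routing and in Fair Queuing and Rate Limiting.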
