Architecture Overview
cmfy.cloud is designed for speed and reliability. Understanding how requests flow through the system helps you make better integration decisions and troubleshoot issues.
The Big Picture
When you submit a workflow, it passes through four stages before it reaches a GPU, and each stage has a specific job:
| Stage | What It Does |
|---|---|
| API Gateway | Authenticates your request, validates the workflow, applies rate limits |
| Router | Finds the best GPU node based on cached models |
| Message Queue | Reliably delivers jobs to nodes (even if they're temporarily busy) |
| GPU Node | Executes your ComfyUI workflow and sends results |
Asynchronous by Design
cmfy.cloud uses an async job model. Here's why:
- Workflows take time - Image generation can take 5-60+ seconds depending on complexity
- Resources are shared - Multiple users submit jobs to a pool of GPUs
- Reliability matters - Jobs shouldn't be lost if a node goes offline
When you submit a workflow, the API responds right away and the work continues in the background. Key points:
- You get a job_id immediately (within milliseconds)
- The actual execution happens asynchronously
- Results come via webhook or polling
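If you poll instead of using a webhook, a client-side loop with exponential backoff keeps your request volume, and therefore your rate-limit usage, low. This is a minimal sketch, assuming you supply `fetch_status`, a hypothetical callable that returns the job's current state string (for example, by wrapping the status endpoint in the API reference). The delay and timeout defaults are illustrative, not service limits.

```python
import time
from typing import Callable

# Terminal states taken from the job lifecycle table on this page.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], str],  # hypothetical: returns e.g. "queued"
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # cap the backoff
```

The `sleep` parameter is injectable so you can test the loop without actually waiting.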
Request Flow Details
1. API Gateway
The gateway is your entry point: every request passes through it, and it checks:
- Authentication - Is your API key valid?
- Validation - Is the workflow properly formatted?
- Rate limits - Are you within your tier's limits?
- URL allowlists - Are model URLs from approved sources?
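Of these checks, the URL allowlist is one you can mirror on the client to catch rejections before you submit. The sketch below parses each URL and compares its hostname against a set. `ALLOWED_HOSTS` holds hypothetical placeholder hosts, since the actual list of approved sources is defined by cmfy.cloud, and exact-hostname matching is an assumption about how the gateway compares URLs.

```python
from urllib.parse import urlparse

# Hypothetical placeholder hosts; substitute the approved sources cmfy.cloud documents.
ALLOWED_HOSTS = {"models.example.com", "weights.example.org"}


def disallowed_model_urls(urls: list[str], allowed_hosts: set[str] = ALLOWED_HOSTS) -> list[str]:
    """Return the model URLs whose hostname is not on the allowlist."""
    rejected = []
    for url in urls:
        # .hostname is lowercased and excludes any port or credentials.
        host = urlparse(url).hostname or ""
        if host not in allowed_hosts:
            rejected.append(url)
    return rejected
```

Comparing the parsed hostname, rather than checking whether the URL string starts with an approved prefix, also rejects look-alike hosts that merely begin with an approved name.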
2. Intelligent Routing
The router's job is to minimize cold starts by routing to nodes that already have your models cached.
When a node has your models cached, you skip the download time entirely. See Cache-Aware Routing for details.
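The router's actual scoring is described in Cache-Aware Routing. As an assumption for illustration, the sketch below uses a simple rule: prefer the node that already caches the most of the job's required models, and break ties by the shortest queue. The `Node` type and the tie-breaking rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cached_models: set[str] = field(default_factory=set)
    queue_depth: int = 0  # jobs already waiting on this node


def pick_node(nodes: list[Node], required_models: set[str]) -> Node:
    """Pick the node with the most required models cached; ties go to the shorter queue."""
    return max(
        nodes,
        # Higher overlap wins; among equal overlap, a smaller queue_depth wins.
        key=lambda n: (len(required_models & n.cached_models), -n.queue_depth),
    )
```

Every cached model a node already holds is a download the job avoids, which is why overlap dominates the score here.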
3. GPU Node Execution
Once a job reaches a GPU node, the node:
- Checks if required models are on disk
- Downloads any missing models
- Loads models into GPU memory
- Executes the ComfyUI workflow
- Uploads output images
- Sends results to your webhook
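The six steps above can be read as a short pipeline. In this sketch, every callable (`is_on_disk`, `download`, `load_to_gpu`, `execute`, `upload`, `notify`) is a hypothetical stand-in for node internals that this overview doesn't describe. What it illustrates is that a model already cached on disk skips the download step entirely.

```python
from typing import Callable, Iterable


def run_job(
    required_models: Iterable[str],
    is_on_disk: Callable[[str], bool],
    download: Callable[[str], None],
    load_to_gpu: Callable[[str], None],
    execute: Callable[[], list[bytes]],
    upload: Callable[[bytes], str],
    notify: Callable[[list[str]], None],
) -> list[str]:
    required = list(required_models)
    # Steps 1-2: download only the models that are missing from disk.
    for model in required:
        if not is_on_disk(model):
            download(model)
    # Step 3: load every required model into GPU memory.
    for model in required:
        load_to_gpu(model)
    # Step 4: execute the ComfyUI workflow, producing raw output images.
    images = execute()
    # Step 5: upload each image and collect the resulting URLs.
    urls = [upload(image) for image in images]
    # Step 6: deliver the results to the webhook.
    notify(urls)
    return urls
```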
Key Concepts
Jobs Have States
Every job goes through a lifecycle:
| State | Description |
|---|---|
| queued | Waiting for an available GPU |
| running | Executing on a GPU node |
| completed | Finished successfully, results available |
| failed | Something went wrong |
| cancelled | You cancelled the job |
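If you track job state on your side (for example, to ignore a late or duplicate status update), a small state machine helps. The state names come from the lifecycle table. The allowed transitions are an assumption, since the table lists states rather than edges: a queued job can start or be cancelled, a running job can complete, fail, or be cancelled, and the last three states are terminal.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition edges; the lifecycle table documents states only.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING, JobState.CANCELLED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED, JobState.CANCELLED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
    JobState.CANCELLED: set(),
}


def is_terminal(state: JobState) -> bool:
    """A state with no outgoing transitions is terminal."""
    return not ALLOWED[state]


def can_transition(src: JobState, dst: JobState) -> bool:
    return dst in ALLOWED[src]
```

With this, an update that would move a completed job back to running can simply be discarded.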
Webhooks Are Primary
While you can poll for status, webhooks are the preferred way to receive results. Your webhook receives a payload like this:

```json
{
  "job_id": "...",
  "status": "completed",
  "outputs": {
    "images": ["https://cdn.cmfy.cloud/..."]
  }
}
```
This is more efficient than polling and delivers results as soon as the job finishes.
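On the receiving side, a handler mainly needs to parse the JSON body. This minimal sketch assumes the payload shape in the example above. It returns the output image URLs for completed jobs and an empty list otherwise, because this overview doesn't show payloads for other statuses. `handle_webhook` is a hypothetical name; you would call it from whatever HTTP framework serves your webhook endpoint.

```python
import json


def handle_webhook(body: bytes) -> list[str]:
    """Parse a webhook payload and return output image URLs for completed jobs."""
    payload = json.loads(body)
    if payload.get("status") != "completed":
        # Payload shapes for failed or cancelled jobs aren't shown here; no outputs.
        return []
    return list(payload.get("outputs", {}).get("images", []))
```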
Fair Resource Sharing
cmfy.cloud serves multiple users simultaneously. To prevent any single user from monopolizing resources:
- Per-user queues - Your jobs are scheduled fairly with others
- Rate limits - Requests per minute and concurrent job limits
- Priority tiers - Paid plans get faster processing
See Fair Queuing and Rate Limiting for details.
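The actual scheduler is described in Fair Queuing. As an assumption for illustration, plain round-robin over per-user queues captures the core idea: each user with pending jobs gets one dispatch per turn, so a user who submits many jobs can't starve the others. Priority tiers and concurrency limits are not modeled, and the `FairQueue` class is hypothetical.

```python
from collections import OrderedDict, deque
from typing import Optional


class FairQueue:
    """Round-robin across per-user FIFO queues."""

    def __init__(self) -> None:
        # Insertion order doubles as the rotation order.
        self._queues: OrderedDict[str, deque] = OrderedDict()

    def submit(self, user: str, job_id: str) -> None:
        self._queues.setdefault(user, deque()).append(job_id)

    def next_job(self) -> Optional[tuple[str, str]]:
        """Return (user, job_id) for the next job to dispatch, or None if idle."""
        if not self._queues:
            return None
        user, jobs = next(iter(self._queues.items()))
        job_id = jobs.popleft()
        if jobs:
            self._queues.move_to_end(user)  # back of the rotation
        else:
            del self._queues[user]  # no pending jobs left for this user
        return user, job_id
```

If alice submits three jobs and bob submits one, bob's job is dispatched second rather than fourth.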
What's Next?
Now that you understand the architecture:
- Workflows - Learn the ComfyUI workflow format
- Cache-Aware Routing - Optimize for faster execution
- Fair Queuing - Understand scheduling
- Rate Limiting - Stay within your limits