Image, video, audio, and 3D generation can take anywhere from 1 second to several minutes. The MCP server is designed around this: generate never blocks waiting for the result.

The pattern

┌──────────────┐                         ┌──────────────┐
│  AI client   │                         │  MCP server  │
└──────┬───────┘                         └──────┬───────┘
       │                                        │
       │── generate(tool, input) ─────────────▶│
       │                                        │  enqueue job
       │◀── { job_id, status: "pending" } ─────│  (returns in ~1s)
       │                                        │
       │── check_generation(job_id) ──────────▶│
       │◀── { status: "processing", done:false} │
       │                                        │
       │      ...wait poll_after_seconds...     │
       │                                        │
       │── check_generation(job_id) ──────────▶│
       │◀── { status: "success", output, done:true }
       │                                        │

Why async?

Before this pattern, generate blocked the HTTP connection waiting for the job to finish. For long videos (Veo 3.1 with 8s duration → ~2 minutes), connections would drop at Cloudflare’s 100s edge timeout or the MCP client’s own timeout, and the response would be lost even though the job completed.
By splitting into generate + check_generation, no individual HTTP request lasts more than a few hundred milliseconds — timeouts become irrelevant.

What the server returns

Every generate response includes hints to drive the polling loop:
{
  "job_id": "...",
  "status": "pending",
  "eta_seconds": 120,
  "poll_after_seconds": 10
}
  • eta_seconds — rough estimate; useful to set user expectations.
  • poll_after_seconds — how long your AI should wait before the next check_generation. Clamped to 3–10 seconds.
check_generation returns the same poll_after_seconds while done: false, and drops it when done: true.
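As a sketch, the response shapes above can be modeled in TypeScript. The type and function names here are illustrative, not official; the 3–10 second clamp is the one documented above, applied defensively on the client side as well:

```typescript
// Shapes inferred from the JSON responses shown above (names are illustrative).
interface GenerateResponse {
  job_id: string;
  status: 'pending';
  eta_seconds: number;        // rough estimate, for user expectations
  poll_after_seconds: number; // server clamps to 3–10
}

interface CheckResponse {
  status: 'pending' | 'processing' | 'success' | 'error';
  done: boolean;
  poll_after_seconds?: number; // present only while done is false
  output?: string;             // set on success
  error?: string;              // set on error
}

// Defensive client-side clamp mirroring the documented 3–10s range.
function nextDelaySeconds(r: { poll_after_seconds?: number }, fallback = 5): number {
  const raw = r.poll_after_seconds ?? fallback;
  return Math.min(10, Math.max(3, raw));
}
```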

How AI clients implement polling

Claude Code, Cursor, and similar clients already know this pattern. When they see poll_after_seconds in a tool response, they sleep and re-call. No client-side code required. If you’re writing a custom MCP client, the pseudocode is:
// Minimal sleep helper (works in Node and browsers).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Kick off the job; returns immediately with polling hints.
const { job_id, poll_after_seconds } = await mcp.call('generate', { tool, input });

let result = { done: false };
let interval = poll_after_seconds;

while (!result.done) {
  await sleep(interval * 1000);
  result = await mcp.call('check_generation', { generation_id: job_id });
  // The server may adjust the hint; keep the last known value if it's absent.
  interval = result.poll_after_seconds ?? interval;
}

if (result.status === 'success') {
  console.log(result.output);   // URL of the generated asset
} else {
  console.error(result.error);
}

Status values

Status        Meaning                          Terminal?
pending       Queued, not started yet          no
processing    Running                          no
success       Completed, output is the URL     yes
error         Failed, error has the message    yes
Keep polling while done: false. Stop when done: true.
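The table above reduces to a one-line check. This helper is a sketch (the function name is illustrative); in practice the `done` flag in each response carries the same information:

```typescript
// Status values from the table above.
type Status = 'pending' | 'processing' | 'success' | 'error';

// A status is terminal exactly when the server would report done: true.
function isTerminal(status: Status): boolean {
  return status === 'success' || status === 'error';
}
```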

Credits

Credits are charged upfront in generate, before the job is queued. If the job later fails, credits are automatically refunded — no action needed. This means:
  • generate can return insufficient_credits immediately if you’re out.
  • If check_generation eventually returns status: "error", your credits are already back.
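A sketch of how a client might branch on the two credit-related failure modes above. The function and message strings are illustrative assumptions, not part of the API; only the `insufficient_credits` code and the refund behavior come from the docs:

```typescript
// Illustrative only: 'insufficient_credits' surfaces immediately from generate;
// status "error" surfaces later from check_generation, after an automatic refund.
function describeFailure(source: 'generate' | 'check_generation', code: string): string {
  if (source === 'generate' && code === 'insufficient_credits') {
    return 'Out of credits: nothing was queued and nothing was charged.';
  }
  // A failed job refunds its upfront charge automatically.
  return `Job failed (${code}); the upfront credits were refunded.`;
}
```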

Webhooks vs polling

If you’re building a backend against the REST API, webhooks beat polling. MCP clients, though, are interactive: the AI is already waiting in the conversation, polling costs little, and a laptop has no stable webhook endpoint. In short: for production backends, use the REST API with webhooks; for chat and editor workflows, use MCP with polling.

Next steps

Tool reference

Exact shapes for every tool response.

Examples

Example prompts and conversations.