The single-provider problem
The simplest AI generation route picks one model and calls it. If the model is unavailable, returns an error, or produces unusable output, the route fails. The user sees an error, retries manually, or gives up.
This is fine for prototypes. In production workflows that run on a schedule or on user demand, a single-provider route becomes an operational risk: any provider outage, quota exhaustion, or API change breaks every generation that depends on it.
The fallback chain pattern
Instead of one provider, define an ordered list. Try the primary. If it fails, try the first fallback. If that fails, try the next. Record which provider actually delivered.
interface ProviderStrategy {
  primary: string;
  fallbacks: string[];
}

async function generateWithFallback(
  strategy: ProviderStrategy,
  generate: (provider: string) => Promise<string>
): Promise<{ result: string; provider: string; fallbackUsed: boolean }> {
  const chain = [strategy.primary, ...strategy.fallbacks];
  for (let i = 0; i < chain.length; i++) {
    const provider = chain[i];
    try {
      const result = await generate(provider);
      return { result, provider, fallbackUsed: i > 0 };
    } catch (err) {
      if (i === chain.length - 1) throw err; // last in chain: re-raise
      console.warn(`Provider ${provider} failed, trying ${chain[i + 1]}`);
    }
  }
  // Unreachable: the chain always contains the primary; kept to satisfy the compiler.
  throw new Error("All providers failed");
}
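The caller gets a result whenever any provider in the chain succeeds, and knows which provider delivered it. If every provider fails, the last error propagates to the caller. Fallback is invisible to the user unless they look at the provider label.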
Store fallback health
Recording what happened makes the system observable and debuggable:
interface ProviderHealth {
  lastSuccessAt?: string;
  lastFailureAt?: string;
  lastError?: string;
  lastResolvedSource?: string; // which provider actually ran
}
After each generation run, write the health record. This lets the admin surface show: "Primary provider last failed 3 days ago. Last run used fallback." Without these records, every failure looks like the first.
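One way to keep that record current is to fold each provider attempt into it as the run proceeds. The sketch below assumes one health record per generation route and ISO 8601 timestamps; the Attempt type and the recordAttempt helper are hypothetical names introduced for illustration, and persistence is left out.

```typescript
// Restated from the text so the snippet stands alone.
interface ProviderHealth {
  lastSuccessAt?: string;
  lastFailureAt?: string;
  lastError?: string;
  lastResolvedSource?: string; // which provider actually ran
}

// Hypothetical shape describing how a single provider attempt ended.
type Attempt =
  | { ok: true; provider: string }
  | { ok: false; provider: string; error: string };

// Returns an updated copy of the record; the input is not mutated.
function recordAttempt(
  health: ProviderHealth,
  attempt: Attempt,
  now: Date = new Date()
): ProviderHealth {
  const timestamp = now.toISOString();
  if (attempt.ok) {
    return {
      ...health,
      lastSuccessAt: timestamp,
      lastResolvedSource: attempt.provider,
    };
  }
  return { ...health, lastFailureAt: timestamp, lastError: attempt.error };
}

// A run where the primary fails and the first fallback delivers.
let health: ProviderHealth = {};
health = recordAttempt(health, {
  ok: false,
  provider: "gemini-imagen-3",
  error: "quota exceeded",
});
health = recordAttempt(health, { ok: true, provider: "lummi" });
```

After this run the record holds both the failure details and the provider that delivered, which is the information the admin message needs.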
The one-run override
The default strategy should be automatic and require no user input. But sometimes you know the primary is going to fail — planned maintenance, quota exhaustion — and you want to skip straight to a specific provider for one run.
The override is a request-time hint, not a settings change:
// Default: use whatever the configured strategy says
POST /api/generate/hero-image
{ "slug": "my-article" }

// Override: use this provider for this run only
POST /api/generate/hero-image
{ "slug": "my-article", "providerOverride": "lummi" }
The override does not change the stored strategy. The next run goes back to the default. This is the distinction between an escape hatch and a settings change — the escape hatch is temporary by design.
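On the server side, the override can be applied when the chain for a single run is built, leaving the stored strategy untouched. This is a minimal sketch: GenerateRequest and resolveChain are hypothetical names, and it assumes the overridden provider goes first while the rest of the configured chain still backs it up. A stricter variant could run the override alone.

```typescript
// Restated from the text so the snippet stands alone.
interface ProviderStrategy {
  primary: string;
  fallbacks: string[];
}

// Hypothetical request body; providerOverride is optional and never persisted.
interface GenerateRequest {
  slug: string;
  providerOverride?: string;
}

// Builds the provider order for one run without touching the stored strategy.
function resolveChain(
  strategy: ProviderStrategy,
  request: GenerateRequest
): string[] {
  const configured = [strategy.primary, ...strategy.fallbacks];
  const override = request.providerOverride;
  if (override === undefined) return configured;
  // Assumption: the override moves to the front; duplicates are dropped.
  return [override, ...configured.filter((p) => p !== override)];
}

const stored: ProviderStrategy = {
  primary: "gemini-imagen-3",
  fallbacks: ["lummi", "manual"],
};

const thisRun = resolveChain(stored, {
  slug: "my-article",
  providerOverride: "lummi",
});
const nextRun = resolveChain(stored, { slug: "my-article" });
```

Because resolveChain returns a new array and never writes to the strategy, the next request without an override gets the configured order again. The sketch does not check that the override names a known provider; a real route would likely reject unknown values.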
Surfacing fallback to the user
When a fallback was used, tell the user — but briefly. They do not need a detailed failure report for a generation that succeeded.
// In the API response
{
  "url": "https://...",
  "provider": "lummi",
  "fallbackUsed": true
}

// In the UI
fallbackUsed
  ? "Hero image ready (via fallback: Lummi)"
  : "Hero image ready"
This is enough to explain why the image looks slightly different from usual without alarming anyone.
What goes in the fallback chain
Good fallback targets:
- A slower but more reliable version of the same provider
- A different provider that produces compatible output
- A manual sentinel that marks the asset as needing human input, rather than failing silently
// Example chains
const chains = {
  heroImage: ["gemini-imagen-3", "lummi", "manual"],
  socialText: ["claude-sonnet-4-6", "claude-haiku-4-5", "manual"],
  videoClip: ["veo-2", "manual"],
};
The manual sentinel is important: it means "generation failed but the workflow continues — a human needs to provide this asset." This is better than an error that halts everything.
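For the sentinel to work, the chain runner has to treat it as a terminal state and not as a provider to call. The sketch below is one way to do that; the AssetResult type, the "needs-human" status, and the generateAsset and callProvider names are assumptions made for illustration, not part of any provider SDK.

```typescript
const MANUAL = "manual";

// Hypothetical result type: either a generated asset or a request for human input.
type AssetResult =
  | { status: "generated"; provider: string; url: string }
  | { status: "needs-human"; provider: "manual" };

async function generateAsset(
  chain: string[],
  callProvider: (provider: string) => Promise<string>
): Promise<AssetResult> {
  let lastError: unknown = new Error("Provider chain is empty");
  for (const provider of chain) {
    if (provider === MANUAL) {
      // Reaching the sentinel means every earlier provider failed.
      // Mark the asset for a human and let the workflow continue.
      return { status: "needs-human", provider: MANUAL };
    }
    try {
      const url = await callProvider(provider);
      return { status: "generated", provider, url };
    } catch (err) {
      lastError = err; // remember the failure and move to the next provider
    }
  }
  // No sentinel in the chain and nothing succeeded: surface the last failure.
  throw lastError;
}
```

With a chain such as ["veo-2", "manual"], a failed video provider resolves to the "needs-human" status and the caller can queue the asset for upload. A chain without the sentinel still throws when every provider fails.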
The product rule: defaults must be one-click
The fallback chain is infrastructure. The user should never have to configure it for a normal run. The only user interaction is the optional one-run override when they have a specific reason to deviate.
If your fallback system requires the user to select a provider before every generation, it has drifted from infrastructure into ceremony. Keep defaults automatic. Keep overrides optional and temporary.