Can Ceylan · Vienna-based, globally curious.

Cache AI API results by content hash to prevent cost explosions

Users upload the same image multiple times. AI APIs charge per call. A cache keyed on SHA-256 of the input bytes ensures you pay for each unique input once — not once per upload.

2026-04-18 · 3 min read · intermediate

The problem: duplicate inputs, duplicate costs

A user uploads a product photo. Your application calls an AI vision API to describe it and estimate a price. The API charges per call.

The same user uploads the same photo three times — or ten users upload the same stock image. Without caching, you pay for ten API calls that return identical results.

At low volume this is background noise. At scale, or with expensive models, it becomes a meaningful cost centre.

The solution: content-addressed caching

Instead of caching by filename or user ID, cache by a hash of the actual content:

import hashlib
import json

def get_cache_key(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

def get_ai_result(image_bytes: bytes) -> dict:
    # `db`, `call_ai_api`, and `now` stand in for your application's
    # database handle, AI client wrapper, and timestamp helper.
    key = get_cache_key(image_bytes)

    # Check cache first
    cached = db.query("SELECT result FROM ai_result_cache WHERE content_hash = ?", [key])
    if cached:
        db.execute(
            "UPDATE ai_result_cache SET hit_count = hit_count + 1, last_hit_at = ? WHERE content_hash = ?",
            [now(), key],
        )
        return json.loads(cached["result"])

    # Cache miss — call the API
    result = call_ai_api(image_bytes)

    # Store the result so every future duplicate is free
    db.execute(
        "INSERT INTO ai_result_cache (content_hash, result, created_at, hit_count) VALUES (?, ?, ?, 0)",
        [key, json.dumps(result), now()],
    )
    return result

The cache key is the SHA-256 of the image bytes. Identical images — regardless of filename, upload time, or user — produce the same key and hit the cache.

Why SHA-256 and not filename or URL

Filenames are not unique. photo.jpg from User A and photo.jpg from User B may be different images — or the same image. URLs change when files are moved or re-uploaded.

SHA-256 is a content fingerprint. It changes if and only if the bytes change. Two identical images always produce the same hash. No false cache hits, no false misses.
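A quick sanity check makes the property concrete. The byte strings below are made-up stand-ins for real image data:

```python
import hashlib

def get_cache_key(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

# The same bytes, uploaded twice (different filename, different user,
# different time), produce the same key...
first = get_cache_key(b"\x89PNG fake image data")
second = get_cache_key(b"\x89PNG fake image data")
assert first == second

# ...while changing a single byte produces an unrelated key.
tampered = get_cache_key(b"\x89PNG fake image datA")
assert tampered != first
```

The key is always a 64-character hex string, so it works cleanly as a `TEXT PRIMARY KEY`.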

The cache schema

CREATE TABLE ai_result_cache (
    content_hash TEXT PRIMARY KEY,  -- SHA-256 hex
    result       TEXT NOT NULL,      -- JSON-encoded API response
    created_at   TEXT NOT NULL,
    hit_count    INTEGER DEFAULT 0,
    last_hit_at  TEXT
);

hit_count and last_hit_at give you analytics — how often is the cache being hit, and which results are most frequently reused. This data is useful for understanding your actual API cost savings.
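Here is a minimal, self-contained sketch of those analytics queries, using an in-memory SQLite database and made-up rows (the hashes and counts are placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_result_cache (
        content_hash TEXT PRIMARY KEY,
        result       TEXT NOT NULL,
        created_at   TEXT NOT NULL,
        hit_count    INTEGER DEFAULT 0,
        last_hit_at  TEXT
    )
""")
rows = [
    ("a" * 64, "{}", "2026-04-01", 42, "2026-04-18"),
    ("b" * 64, "{}", "2026-04-02", 3,  "2026-04-10"),
    ("c" * 64, "{}", "2026-04-03", 0,  None),
]
conn.executemany("INSERT INTO ai_result_cache VALUES (?, ?, ?, ?, ?)", rows)

# Total API calls avoided so far
total_hits = conn.execute("SELECT SUM(hit_count) FROM ai_result_cache").fetchone()[0]

# Most-reused cached results first
top = conn.execute(
    "SELECT content_hash, hit_count FROM ai_result_cache ORDER BY hit_count DESC LIMIT 5"
).fetchall()
```

With these rows, `total_hits` is 45 and the top entry is the one with 42 hits.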

For text inputs, normalise before hashing

For text-based AI calls (classification, summarisation, extraction), normalise the input before hashing:

def normalise_text(text: str) -> str:
    return " ".join(text.lower().split())

def get_text_cache_key(text: str) -> str:
    normalised = normalise_text(text)
    return hashlib.sha256(normalised.encode()).hexdigest()

This ensures "Hello World" and "hello world" hit the same cache entry. Without normalisation, minor formatting differences produce cache misses for semantically identical inputs.
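Putting the two functions together, a couple of assertions show the behaviour:

```python
import hashlib

def normalise_text(text: str) -> str:
    return " ".join(text.lower().split())

def get_text_cache_key(text: str) -> str:
    normalised = normalise_text(text)
    return hashlib.sha256(normalised.encode()).hexdigest()

# Case and whitespace differences collapse to the same entry...
assert get_text_cache_key("Hello World") == get_text_cache_key("  hello\n\tWORLD ")

# ...but different content still misses.
assert get_text_cache_key("Hello World") != get_text_cache_key("goodbye world")
```

How aggressively to normalise is a judgement call: lowercasing changes meaning for case-sensitive inputs like code, so tune the function to your domain.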

Cache invalidation

Content-addressed caches have a natural invalidation policy: if the AI model changes and you want fresh results, clear the cache. The content hasn't changed, but your definition of a valid result has.

Add a model_version column to the cache table if you need to maintain multiple model versions simultaneously:

ALTER TABLE ai_result_cache ADD COLUMN model_version TEXT DEFAULT 'v1';

Query with both content_hash and model_version. This lets you run old and new model versions in parallel during a migration.
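A sketch of that two-key lookup, again with in-memory SQLite. Note one assumption: the table here is created fresh with a composite primary key on `(content_hash, model_version)`, rather than via the `ALTER TABLE` above, so the same image can be cached under both versions at once:

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_result_cache (
        content_hash  TEXT NOT NULL,
        model_version TEXT NOT NULL DEFAULT 'v1',
        result        TEXT NOT NULL,
        PRIMARY KEY (content_hash, model_version)
    )
""")

def lookup(image_bytes: bytes, model_version: str):
    key = hashlib.sha256(image_bytes).hexdigest()
    row = conn.execute(
        "SELECT result FROM ai_result_cache WHERE content_hash = ? AND model_version = ?",
        [key, model_version],
    ).fetchone()
    return json.loads(row[0]) if row else None

# During a migration, old and new model results coexist for the same image:
key = hashlib.sha256(b"same image").hexdigest()
conn.execute("INSERT INTO ai_result_cache VALUES (?, 'v1', ?)", [key, json.dumps({"price": 10})])
conn.execute("INSERT INTO ai_result_cache VALUES (?, 'v2', ?)", [key, json.dumps({"price": 12})])
```

A lookup for `'v3'` simply misses and falls through to a fresh API call, which is exactly the invalidation behaviour you want when rolling out a new model.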

The ROI

Measure it. SELECT SUM(hit_count) FROM ai_result_cache tells you how many API calls were avoided. Multiply by your per-call cost. The number is usually larger than expected.
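The arithmetic is one line. Both numbers below are hypothetical: plug in your own `SUM(hit_count)` and your provider's per-call price:

```python
total_hits = 45        # from: SELECT SUM(hit_count) FROM ai_result_cache
cost_per_call = 0.01   # USD per vision-API call; substitute your provider's rate

savings = total_hits * cost_per_call  # dollars not spent on duplicate calls
```

At 45 hits and one cent per call that is only $0.45, but the figure scales linearly with traffic and with model price.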
