The problem: duplicate inputs, duplicate costs
A user uploads a product photo. Your application calls an AI vision API to describe it and estimate a price. The API charges per call.
The same user uploads the same photo three times — or ten users upload the same stock image. Without caching, you pay for ten API calls that return identical results.
At low volume this is background noise. At scale, or with expensive models, it becomes a meaningful cost centre.
The solution: content-addressed caching
Instead of caching by filename or user ID, cache by a hash of the actual content:
```python
import hashlib
import json

def get_cache_key(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

def get_ai_result(image_bytes: bytes) -> dict:
    key = get_cache_key(image_bytes)

    # Check cache first
    cached = db.query("SELECT result FROM ai_result_cache WHERE content_hash = ?", [key])
    if cached:
        db.execute(
            "UPDATE ai_result_cache SET hit_count = hit_count + 1, last_hit_at = ? WHERE content_hash = ?",
            [now(), key],
        )
        return json.loads(cached["result"])

    # Cache miss — call the API
    result = call_ai_api(image_bytes)

    # Store result
    db.execute(
        "INSERT INTO ai_result_cache (content_hash, result, created_at, hit_count) VALUES (?, ?, ?, 0)",
        [key, json.dumps(result), now()],
    )
    return result
```
The cache key is the SHA-256 of the image bytes. Identical images — regardless of filename, upload time, or user — produce the same key and hit the cache.
Why SHA-256 and not filename or URL
Filenames are not unique. photo.jpg from User A and photo.jpg from User B may be different images — or the same image. URLs change when files are moved or re-uploaded.
SHA-256 is a content fingerprint. It changes if and only if the bytes change. Two identical images always produce the same hash. No false cache hits, no false misses.
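The property is easy to verify with the standard library alone (the short byte strings here are stand-ins for real image data):

```python
import hashlib

a = hashlib.sha256(b"same image bytes").hexdigest()
b = hashlib.sha256(b"same image bytes").hexdigest()
c = hashlib.sha256(b"same image bytes!").hexdigest()

print(a == b)  # True: identical bytes, identical cache key
print(a == c)  # False: one byte different, entirely different key
```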
The cache schema
```sql
CREATE TABLE ai_result_cache (
    content_hash TEXT PRIMARY KEY,  -- SHA-256 hex
    result       TEXT NOT NULL,     -- JSON-encoded API response
    created_at   TEXT NOT NULL,
    hit_count    INTEGER DEFAULT 0,
    last_hit_at  TEXT
);
```
hit_count and last_hit_at give you analytics: how often the cache is being hit, and which results are reused most. This data quantifies your actual API cost savings.
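Those two columns support simple analytics queries directly; for example, to see which cached results do the most work (column names from the schema above):

```sql
-- Top 10 most frequently reused cache entries
SELECT content_hash, hit_count, last_hit_at
FROM ai_result_cache
ORDER BY hit_count DESC
LIMIT 10;
```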
For text inputs, normalise before hashing
For text-based AI calls (classification, summarisation, extraction), normalise the input before hashing:
```python
def normalise_text(text: str) -> str:
    return " ".join(text.lower().split())

def get_text_cache_key(text: str) -> str:
    normalised = normalise_text(text)
    return hashlib.sha256(normalised.encode()).hexdigest()
```
This ensures "Hello World" and "hello world" hit the same cache entry. Without normalisation, minor formatting differences produce cache misses for semantically identical inputs.
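A quick check of that claim (the functions repeated here so the snippet runs on its own):

```python
import hashlib

def normalise_text(text: str) -> str:
    return " ".join(text.lower().split())

def get_text_cache_key(text: str) -> str:
    return hashlib.sha256(normalise_text(text).encode()).hexdigest()

# Case and whitespace differences collapse to one cache entry
print(get_text_cache_key("Hello   World") == get_text_cache_key("hello world"))  # True

# A real wording change still produces a different key
print(get_text_cache_key("hello world") == get_text_cache_key("hello, world"))  # False
```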
Cache invalidation
Content-addressed caches have a natural invalidation policy: if the AI model changes and you want fresh results, clear the cache. The content hasn't changed, but your definition of a valid result has.
Add a model_version column to the cache table if you need to maintain multiple model versions simultaneously:
```sql
ALTER TABLE ai_result_cache ADD COLUMN model_version TEXT DEFAULT 'v1';
```
Query with both content_hash and model_version. This lets you run old and new model versions in parallel during a migration.
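A sketch of the versioned lookup using sqlite3 (the content, results, and version strings are invented for illustration). Note that once two versions of the same content can coexist, the primary key has to become the composite (content_hash, model_version); with the original single-column primary key, inserting the second version's row would be rejected:

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_result_cache (
        content_hash  TEXT,
        model_version TEXT DEFAULT 'v1',
        result        TEXT NOT NULL,
        PRIMARY KEY (content_hash, model_version)
    )
""")

key = hashlib.sha256(b"some image bytes").hexdigest()

# Old and new model results for the same content coexist during a migration
conn.execute("INSERT INTO ai_result_cache VALUES (?, 'v1', ?)", (key, json.dumps({"price": 10})))
conn.execute("INSERT INTO ai_result_cache VALUES (?, 'v2', ?)", (key, json.dumps({"price": 12})))

# Query with both content_hash and model_version
row = conn.execute(
    "SELECT result FROM ai_result_cache WHERE content_hash = ? AND model_version = ?",
    (key, "v2"),
).fetchone()
print(json.loads(row[0]))  # {'price': 12}
```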
The ROI
Measure it. SELECT SUM(hit_count) FROM ai_result_cache tells you how many API calls were avoided. Multiply by your per-call cost. The number is usually larger than expected.
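A minimal sketch of that measurement (the hit counts and the $0.002 per-call price are made up; substitute your own figures):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ai_result_cache (content_hash TEXT PRIMARY KEY, hit_count INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO ai_result_cache (content_hash, hit_count) VALUES (?, ?)",
    [("a" * 64, 120), ("b" * 64, 45), ("c" * 64, 3)],
)

# Every hit is an API call you did not pay for
(calls_avoided,) = conn.execute("SELECT SUM(hit_count) FROM ai_result_cache").fetchone()
cost_per_call = 0.002  # hypothetical per-call API price in dollars

print(calls_avoided)                                  # 168
print(f"${calls_avoided * cost_per_call:.2f} saved")  # $0.34 saved
```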