The instinct: write a migration script
You add a new field to a stored record type. Existing records don't have it. The straightforward fix is a migration script: fetch every record, add the field, write it back.
Migration scripts are fine for relational databases with schema enforcement. For document stores, KV stores, and any system where records are read more often than they are written, they create unnecessary operational risk:
- The migration runs once, at a moment you choose, and you have to be present for it
- If it fails halfway through, you are left with partial state: some records have the new field and others do not
- You need to coordinate the migration with the deployment of the code that expects the new field
- For large record sets, the migration may time out or hit rate limits
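To make the partial-state risk concrete, here is a minimal sketch of a one-shot migration that fails partway through. Everything in it is hypothetical and chosen for illustration: the `Doc` shape, the `wordCount` field, the in-memory `table`, and the `flakyWrite` function that simulates a rate limit after two successful writes.

```typescript
// Hypothetical record shape; `wordCount` is the field being added.
interface Doc {
  id: string;
  title: string;
  wordCount?: number;
}

// In-memory stand-in for the data store (an assumption for this sketch).
const table = new Map<string, Doc>([
  ["a", { id: "a", title: "first post" }],
  ["b", { id: "b", title: "second post here" }],
  ["c", { id: "c", title: "third" }],
]);

let writeCount = 0;

// Simulated write that starts failing after two successful calls,
// standing in for a timeout or a rate limit.
function flakyWrite(doc: Doc): void {
  if (writeCount >= 2) throw new Error("rate limit exceeded");
  writeCount += 1;
  table.set(doc.id, doc);
}

// One-shot migration: fetch every record, add the field, write it back.
function migrateAll(docs: Doc[], write: (doc: Doc) => void): void {
  for (const doc of docs) {
    const wordCount = doc.title.split(/\s+/).filter(Boolean).length;
    write({ ...doc, wordCount });
  }
}

let failure: string | undefined;
try {
  // Snapshot the records first so the loop does not iterate a map it is mutating.
  migrateAll(Array.from(table.values()), flakyWrite);
} catch (err) {
  failure = (err as Error).message;
}

// Records that received the new field before the failure.
const migratedIds = Array.from(table.values())
  .filter((doc) => doc.wordCount !== undefined)
  .map((doc) => doc.id);
```

After the failure, records `a` and `b` carry the new field and `c` does not. Any code that assumes the field is always present now has to cope with both shapes until someone reruns the script.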
The alternative: normalize on read
Instead of migrating upfront, detect missing fields on every read and fill them lazily.
async function normalizeRecord(record: Article): Promise<Article> {
  // New field: distributionState, missing on older records
  let distributionState = record.distributionState;
  if (!distributionState) {
    distributionState = buildDistributionState({
      article: record,
      distributionMap: await getDistributionMap(),
      socialPosts: record.socialPosts,
    });
  }
  // New field: publishedAt, recoverable from legacy signals
  let publishedAt = record.publishedAt;
  if (record.published && !publishedAt) {
    publishedAt = await inferLegacyPublishedAt(record.slug);
  }
  const changed =
    distributionState !== record.distributionState ||
    publishedAt !== record.publishedAt;
  // Only write if something actually changed; without this guard every read would also write
  if (!changed) return record;
  // Copy the record rather than mutating the caller's object
  const updated = { ...record, distributionState, publishedAt };
  await kv.set(recordKey(record.slug), updated);
  return updated;
}
The pattern works in three phases:
- First read: the field is missing, so it gets computed and written back. One KV write.
- Every subsequent read: the field is present. The !changed guard returns early. Zero extra writes.
- Rollout complete: after every record has been read at least once, all records are normalized. No migration script was needed.
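The three phases can be observed with a small write-counting experiment. This is only a sketch under stated assumptions: `Note`, `CountingStore`, and `readNote` are hypothetical names, the derived field is a simple word count, and a synchronous in-memory `Map` stands in for the real (asynchronous) KV store to keep the example short.

```typescript
// Hypothetical record shape; `wordCount` is the newly added field.
interface Note {
  slug: string;
  body: string;
  wordCount?: number; // absent on older records
}

// In-memory stand-in for a KV store that counts writes.
class CountingStore {
  private data = new Map<string, Note>();
  writes = 0;

  get(key: string): Note | undefined {
    return this.data.get(key);
  }

  set(key: string, value: Note): void {
    this.writes += 1;
    this.data.set(key, value);
  }

  // Seed legacy data without counting it as a write.
  seed(value: Note): void {
    this.data.set(value.slug, value);
  }
}

// Normalize on read: compute the field only when it is absent.
function readNote(store: CountingStore, slug: string): Note | undefined {
  const record = store.get(slug);
  if (!record) return undefined;

  const wordCount =
    record.wordCount ?? record.body.split(/\s+/).filter(Boolean).length;

  const changed = wordCount !== record.wordCount;
  if (!changed) return record; // already normalized: no write

  const updated: Note = { ...record, wordCount };
  store.set(slug, updated);
  return updated;
}

const store = new CountingStore();
store.seed({ slug: "hello", body: "normalize on read" });

const first = readNote(store, "hello"); // backfills and writes once
const writesAfterFirstRead = store.writes;
const second = readNote(store, "hello"); // guard returns early
const writesAfterSecondRead = store.writes;
```

The first read performs one write and every later read performs none, so the write count stays at one no matter how often the record is fetched.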
The changed-guard is critical
Without the changed check, the normalize function writes on every read, even when nothing changed. This turns every GET into a GET + SET, multiplying write load and potentially triggering unnecessary index updates.
Use reference equality (!==) for object fields: if the field was already present and you didn't build a new object, the reference is unchanged, and changed stays false.
For primitive fields (strings, booleans), compare values directly.
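The difference between reusing and rebuilding an object field is easy to miss, so here is a small sketch of the guard in isolation. The `Post` and `DistributionState` shapes and the `hasChanged` helper are hypothetical, introduced only for this illustration.

```typescript
// Hypothetical shapes for illustration.
interface DistributionState {
  channels: string[];
}

interface Post {
  slug: string;
  publishedAt?: string;
  distributionState?: DistributionState;
}

// The guard: reference comparison for the object field,
// value comparison for the primitive field.
function hasChanged(before: Post, after: Post): boolean {
  return (
    after.distributionState !== before.distributionState ||
    after.publishedAt !== before.publishedAt
  );
}

const stored: Post = {
  slug: "a",
  publishedAt: "2024-01-01T00:00:00Z",
  distributionState: { channels: ["rss"] },
};

// Reusing the existing object keeps the reference identical.
const reused: Post = { ...stored };

// Rebuilding an equal-looking object creates a new reference.
const rebuilt: Post = { ...stored, distributionState: { channels: ["rss"] } };

const reusedChanged = hasChanged(stored, reused); // false
const rebuiltChanged = hasChanged(stored, rebuilt); // true
```

The consequence is that the guard only works if the normalize function builds a new object when the field is actually missing. If it rebuilds the object unconditionally, the references always differ and every read writes again.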
When to use this pattern
Good fit:
- Adding optional or derivable fields to stored records
- Fields that can be inferred from other existing data (timestamps from logs, structured state from legacy flat data)
- Systems where reads are frequent and records are accessed regularly
Not a good fit:
- Fields that are required immediately at write time and cannot be inferred from existing data
- Schema changes that alter how existing fields are interpreted (requires explicit migration)
- Relational databases with foreign key constraints
The "what not to rewrite" rule
Lazy backfill can introduce a subtle bug: if the derivation logic changes after some records have already been normalized, early-normalized records will have the old shape while un-normalized records will get the new shape.
The fix: only backfill when the field is completely absent. Never rewrite an existing field just because the derivation logic changed. If the logic needs to change for existing records, that is a deliberate migration decision, not a lazy normalization.
// ✓ Only backfill when missing
if (!record.distributionState) {
  record.distributionState = buildDistributionState(record);
}
// ✗ Don't rewrite existing state just because the build logic changed
// record.distributionState = buildDistributionState(record); // always overwrites
This rule ensures the backfill is idempotent and safe to run in production without supervision.
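A short sketch can show why the rule matters when the derivation logic changes between deploys. The `Item` shape, the `backfillSummary` helper, and the two derivation functions `deriveV1` and `deriveV2` are hypothetical. The sketch also assumes that "absent" means `undefined`, which is slightly stricter than the falsy check used above.

```typescript
// Hypothetical record shape; `summary` is the derived field.
interface Item {
  slug: string;
  tags: string[];
  summary?: string;
}

type Derive = (item: Item) => string;

// Backfill only when the field is absent; never overwrite an existing value.
function backfillSummary(item: Item, derive: Derive): Item {
  if (item.summary !== undefined) return item; // present: leave untouched
  return { ...item, summary: derive(item) };
}

// Two versions of the derivation logic, as if it changed between deploys.
const deriveV1: Derive = (item) => item.tags.join(", ");
const deriveV2: Derive = (item) => item.tags.join(" | ");

const legacy: Item = { slug: "a", tags: ["x", "y"] };

const afterV1 = backfillSummary(legacy, deriveV1); // summary becomes "x, y"
const afterV2 = backfillSummary(afterV1, deriveV2); // unchanged, same object
```

Running the backfill again with the newer logic returns the same object, so a record normalized under the old logic is never silently rewritten. Moving existing records to the new shape remains an explicit migration decision.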