Lazy backfill: roll out new data shapes without a migration script

When you add a new required field to stored records, the instinct is to write a migration script that processes everything upfront. A better approach is to normalize on read: detect missing fields, fill them lazily, and write each record back once, after which it is stable.

The instinct: write a migration script

You add a new field to a stored record type. Existing records don't have it. The straightforward fix is a migration script: fetch every record, add the field, write it back.

Migration scripts are fine for relational databases with schema enforcement. For document stores, KV stores, and any system where records are read more often than they are written, they create unnecessary operational risk:

The migration runs once, at a moment you choose, and you have to be present for it
If it fails halfway through, you have partial state
You need to coordinate the migration with the deployment of the code that expects the new field
For large record sets, the migration may time out or hit rate limits

The alternative: normalize on read

Instead of migrating upfront, detect missing fields on every read and fill them lazily.

async function normalizeRecord(record: Article): Promise<Article> {
  // New field: distributionState, missing on older records
  let distributionState = record.distributionState;
  if (!distributionState) {
    distributionState = buildDistributionState({
      article: record,
      distributionMap: await getDistributionMap(),
      socialPosts: record.socialPosts,
    });
  }

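  // New field: publishedAt, recoverable from legacy signals
  let publishedAt = record.publishedAt;
  if (record.published && !publishedAt) {
    // Fall back to the stored value if inference finds nothing, so a null
    // result is not mistaken for a change on every read
    publishedAt = (await inferLegacyPublishedAt(record.slug)) ?? record.publishedAt;
  }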

  const changed =
    distributionState !== record.distributionState ||
    publishedAt !== record.publishedAt;

  // Only write if something actually changed; otherwise every read would trigger a write
  if (!changed) return record;

  const updated = { ...record, distributionState, publishedAt };
  await kv.set(recordKey(record.slug), updated);
  return updated;
}

The pattern works in three phases:

First read: the field is missing, so it gets computed and written back. One KV write.
Every subsequent read: the field is present. The !changed guard returns early. Zero extra writes.
Rollout complete: after every record has been read at least once, all records are normalized. No migration script was needed.
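
The three phases can be observed with a small in-memory stand-in for the store. This is a minimal sketch, not the production code: CountingStore, inferPublishedAt, and the trimmed-down Article shape are hypothetical names invented for the illustration, and the store is synchronous to keep the example short.

```typescript
// Hypothetical, trimmed-down record shape used only for this illustration.
interface Article {
  slug: string;
  published: boolean;
  publishedAt?: string;
}

// Hypothetical in-memory store that counts backfill writes.
class CountingStore {
  private data = new Map<string, Article>();
  writes = 0;

  get(key: string): Article | undefined {
    return this.data.get(key);
  }

  set(key: string, value: Article): void {
    this.writes += 1;
    this.data.set(key, value);
  }

  // Seed legacy data without counting it as a backfill write.
  seed(value: Article): void {
    this.data.set(value.slug, value);
  }
}

// Stand-in for a real inference step such as inferLegacyPublishedAt.
function inferPublishedAt(_slug: string): string {
  return "2020-01-01T00:00:00.000Z";
}

function readArticle(store: CountingStore, slug: string): Article | undefined {
  const record = store.get(slug);
  if (!record) return undefined;

  let publishedAt = record.publishedAt;
  if (record.published && !publishedAt) {
    publishedAt = inferPublishedAt(record.slug);
  }

  // Already normalized: return without writing.
  if (publishedAt === record.publishedAt) return record;

  const updated: Article = { ...record, publishedAt };
  store.set(slug, updated);
  return updated;
}

const store = new CountingStore();
store.seed({ slug: "hello", published: true }); // legacy record, no publishedAt

const first = readArticle(store, "hello");  // phase 1: backfill and write
const writesAfterFirstRead = store.writes;
const second = readArticle(store, "hello"); // phase 2: guard returns early
const writesAfterSecondRead = store.writes;
```

The write counter goes up on the first read only; the second read finds the field present and returns the stored record unchanged.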

The changed-guard is critical

Without the changed check, the normalize function writes on every read, even when nothing changed. This turns every GET into a GET + SET, multiplying write load and potentially triggering unnecessary index updates.

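Compare object fields with strict inequality (!==), which checks references rather than contents: if the field was already present and you didn't build a new object, the reference is unchanged, and changed stays false. The flip side is that rebuilding an object that is already present always produces a new reference, so the guard only works if existing fields are passed through untouched.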

For primitive fields (strings, booleans), compare values directly.
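
To make the two comparison modes concrete, here is a small sketch of the guard in isolation. The hasChanged helper and the DistributionState shape are assumptions made up for this example; the original code computes changed inline.

```typescript
// Hypothetical shapes for illustration only.
interface DistributionState {
  channels: string[];
}

interface Article {
  slug: string;
  publishedAt?: string;
  distributionState?: DistributionState;
}

// Hypothetical helper: the same comparison the inline `changed` expression performs.
function hasChanged(before: Article, after: Article): boolean {
  return (
    before.distributionState !== after.distributionState || // objects: compared by reference
    before.publishedAt !== after.publishedAt // strings: compared by value
  );
}

const state: DistributionState = { channels: ["rss"] };
const stored: Article = {
  slug: "a",
  publishedAt: "2020-01-01",
  distributionState: state,
};

// Field passed through untouched: same reference, so no write is needed.
const passedThrough = hasChanged(stored, { ...stored });

// Rebuilding an equal-looking object yields a new reference and would trigger a write.
const rebuilt = hasChanged(stored, {
  ...stored,
  distributionState: { channels: ["rss"] },
});

// Strings compare by value, so an equal string built elsewhere is not a change.
const sameString = hasChanged(stored, {
  ...stored,
  publishedAt: "2020-" + "01-01",
});
```

Here passedThrough and sameString are false, while rebuilt is true even though the contents are identical. That is why the normalize function should only build a new object when the field is missing.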

When to use this pattern

Good fit:

Adding optional or derivable fields to stored records
Fields that can be inferred from other existing data (timestamps from logs, structured state from legacy flat data)
Systems where reads are frequent and records are accessed regularly

Not a good fit:

Fields that are required immediately at write time and cannot be inferred from existing data
Schema changes that alter how existing fields are interpreted (requires explicit migration)
Relational databases with foreign key constraints

The "what not to rewrite" rule

Lazy backfill can introduce a subtle bug: if the derivation logic changes after some records have already been normalized, early-normalized records will have the old shape while un-normalized records will get the new shape.

The fix: only backfill when the field is completely absent. Never rewrite an existing field just because the derivation logic changed. If the logic needs to change for existing records, that is a deliberate migration decision, not a lazy normalization.

// ✓ Only backfill when missing
if (!record.distributionState) {
  record.distributionState = buildDistributionState(record);
}

// ✗ Don't rewrite existing state just because the build logic changed
// record.distributionState = buildDistributionState(record); // always overwrites

This rule ensures the backfill is idempotent and safe to run in production without supervision.
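
The effect of the rule can be checked with a toy example in which the derivation logic changes between two reads. This is a sketch under stated assumptions: Rec, backfillLabel, deriveV1, and deriveV2 are hypothetical names, and the field is a simple string rather than a real distribution state.

```typescript
// Hypothetical record with one lazily backfilled field.
interface Rec {
  slug: string;
  label?: string;
}

type Derive = (r: Rec) => string;

// Backfill only when the field is completely absent.
function backfillLabel(record: Rec, derive: Derive): Rec {
  if (record.label !== undefined) return record; // present: leave untouched
  return { ...record, label: derive(record) };
}

// Two versions of the derivation logic, standing in for a code change over time.
const deriveV1: Derive = (r) => `v1:${r.slug}`;
const deriveV2: Derive = (r) => `v2:${r.slug}`;

const legacy: Rec = { slug: "post" };

// First read happens while v1 logic is deployed.
const normalized = backfillLabel(legacy, deriveV1);

// Later read happens after the logic changed to v2: the stored value is kept.
const afterLogicChange = backfillLabel(normalized, deriveV2);
```

The second call returns the same object it was given, so the record keeps its v1 value and no write is triggered. Moving existing records to the v2 shape would have to be done as an explicit migration.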