Can Ceylan · Vienna-based, globally curious.

Lazy backfill: roll out new data shapes without a migration script

When you add a new required field to stored records, the instinct is to write a migration script that processes everything upfront. A better approach normalizes on read: detect missing fields, fill them lazily, and write the result back once. After that first write, the record is stable.

2026-04-25·3 min read·intermediate

The instinct: write a migration script

You add a new field to a stored record type. Existing records don't have it. The straightforward fix is a migration script: fetch every record, add the field, write it back.

Migration scripts are fine for relational databases with schema enforcement. For document stores, KV stores, and any system where records are read more often than they are written, they create unnecessary operational risk:

  • The migration runs once, at a moment you choose, and you have to be present for it
  • If it fails halfway through, you have partial state
  • You need to coordinate the migration with the deployment of the code that expects the new field
  • For large record sets, the migration may time out or hit rate limits
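To make the partial-state risk concrete, here is a minimal sketch of an upfront migration against an in-memory store. Everything in it is hypothetical: the `Article` shape, the `store` map, the placeholder timestamp, and `flakySet`, which simulates a write that fails for one key the way a timeout or rate limit might.

```typescript
// Hypothetical record shape and in-memory store, for illustration only.
interface Article {
  slug: string;
  published: boolean;
  publishedAt?: string;
}

const store = new Map<string, Article>([
  ["a", { slug: "a", published: true }],
  ["b", { slug: "b", published: true }],
  ["c", { slug: "c", published: true }],
]);

// Simulated write that fails for one key, standing in for a timeout or rate limit.
function flakySet(key: string, value: Article): void {
  if (key === "b") throw new Error("rate limited");
  store.set(key, value);
}

// Upfront migration: fetch every record, add the field, write it back.
function migrateAll(): { migrated: string[]; error?: string } {
  const migrated: string[] = [];
  try {
    for (const [key, article] of Array.from(store.entries())) {
      // Placeholder value; a real script would derive it per record.
      flakySet(key, { ...article, publishedAt: "1970-01-01T00:00:00.000Z" });
      migrated.push(key);
    }
  } catch (err) {
    // The loop stops here: earlier records are migrated, later ones are not.
    return { migrated, error: (err as Error).message };
  }
  return { migrated };
}

const migrationResult = migrateAll();
```

In this run, record "a" has the new field while "b" and "c" do not, so code that assumes the field exists everywhere would break on two of the three records.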

The alternative: normalize on read

Instead of migrating upfront, detect missing fields on every read and fill them lazily.

async function normalizeRecord(record: Article): Promise<Article> {
  // New field: distributionState, missing on older records
  let distributionState = record.distributionState;
  if (!distributionState) {
    distributionState = buildDistributionState({
      article: record,
      distributionMap: await getDistributionMap(),
      socialPosts: record.socialPosts,
    });
  }

  // New field: publishedAt, recoverable from legacy signals
  let publishedAt = record.publishedAt;
  if (record.published && !publishedAt) {
    publishedAt = await inferLegacyPublishedAt(record.slug);
  }

  const changed =
    distributionState !== record.distributionState ||
    publishedAt !== record.publishedAt;

  // Only write if something actually changed; otherwise every read would also trigger a write
  if (!changed) return record;

  const updated = { ...record, distributionState, publishedAt };
  await kv.set(recordKey(record.slug), updated);
  return updated;
}

The pattern works in three phases:

  1. First read: the field is missing, so it gets computed and written back. One KV write.
  2. Every subsequent read: the field is present. The !changed guard returns early. Zero extra writes.
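  3. Rollout complete: once every record has been read at least once, all records are normalized. No migration script was needed. Records that are never read keep the old shape until something reads them, which is harmless as long as every read path goes through the normalizer.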
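The phases can be observed with a small in-memory sketch that counts writes. It is a simplification under stated assumptions: the store is a synchronous `Map` rather than an async KV client, and `legacyPublishDates` is a hypothetical lookup table standing in for `inferLegacyPublishedAt`.

```typescript
// Hypothetical record shape; only the fields this sketch needs.
interface Article {
  slug: string;
  published: boolean;
  publishedAt?: string;
}

// In-memory stand-in for the KV store, with a write counter.
const store = new Map<string, Article>();
let writeCount = 0;

function kvSet(key: string, value: Article): void {
  writeCount += 1;
  store.set(key, value);
}

// Hypothetical stand-in for inferLegacyPublishedAt: a fixed lookup table.
const legacyPublishDates: { [slug: string]: string } = {
  "hello-world": "2024-01-15T09:00:00.000Z",
};

function normalizeOnRead(slug: string): Article | undefined {
  const record = store.get(slug);
  if (!record) return undefined;

  let publishedAt = record.publishedAt;
  if (record.published && !publishedAt) {
    publishedAt = legacyPublishDates[slug];
  }

  // Changed guard: skip the write when nothing was filled in.
  if (publishedAt === record.publishedAt) return record;

  const updated: Article = { ...record, publishedAt };
  kvSet(slug, updated);
  return updated;
}

// Seed one legacy record directly, bypassing the write counter.
store.set("hello-world", { slug: "hello-world", published: true });

const firstRead = normalizeOnRead("hello-world"); // phase 1: backfill, one write
const writesAfterFirst = writeCount;
const secondRead = normalizeOnRead("hello-world"); // phase 2: guard returns early
const writesAfterSecond = writeCount;
```

Both counters end at 1: the first read pays for the backfill, and the second read finds the field present and writes nothing.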

The changed-guard is critical

Without the changed check, the normalize function writes on every read, even when nothing changed. This turns every GET into a GET + SET, multiplying write load and potentially triggering unnecessary index updates.

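Use reference equality (!==) for object fields: if the field was already present and you didn't build a new object, the reference is unchanged, and changed stays false. This only works if the replacement object is built when the field is missing. Building it unconditionally produces a new reference on every read, so changed is always true.

For primitive fields (strings, booleans), the same !== operator compares by value, so no separate check is needed.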
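A short sketch of why the guard depends on where the object is built. The `DistributionState` shape and the `buildState`, `fillWhenMissing`, and `alwaysRebuild` helpers are hypothetical names used only for this illustration.

```typescript
// Hypothetical shapes; the real DistributionState is not shown in the article.
interface DistributionState {
  channels: string[];
}

interface Article {
  slug: string;
  distributionState?: DistributionState;
}

interface GuardResult {
  record: Article;
  changed: boolean;
}

function buildState(): DistributionState {
  return { channels: [] };
}

// Builds a new object only when the field is missing, so an existing
// field keeps its reference and the guard reports no change.
function fillWhenMissing(record: Article): GuardResult {
  const distributionState = record.distributionState ?? buildState();
  const changed = distributionState !== record.distributionState;
  return { record: changed ? { ...record, distributionState } : record, changed };
}

// Pitfall: rebuilding unconditionally yields a fresh reference every time,
// so the guard reports a change even when the contents are identical.
function alwaysRebuild(record: Article): GuardResult {
  const distributionState = buildState();
  const changed = distributionState !== record.distributionState;
  return { record: changed ? { ...record, distributionState } : record, changed };
}

const legacy: Article = { slug: "hello-world" };
const first = fillWhenMissing(legacy); // field absent: changed is true
const second = fillWhenMissing(first.record); // same reference: changed is false
const rebuilt = alwaysRebuild(first.record); // new reference: changed is true
```

The `rebuilt` case is the one to watch for: the contents match what is stored, but the comparison still reports a change, and a write would follow on every read.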

When to use this pattern

Good fit:

  • Adding optional or derivable fields to stored records
  • Fields that can be inferred from other existing data (timestamps from logs, structured state from legacy flat data)
  • Systems where reads are frequent and records are accessed regularly

Not a good fit:

  • Fields that are required immediately at write time and cannot be inferred from existing data
  • Schema changes that alter how existing fields are interpreted (requires explicit migration)
  • Relational databases with foreign key constraints

The "what not to rewrite" rule

Lazy backfill can introduce a subtle bug: if the derivation logic changes after some records have already been normalized, early-normalized records will have the old shape while un-normalized records will get the new shape.

The fix: only backfill when the field is completely absent. Never rewrite an existing field just because the derivation logic changed. If the logic needs to change for existing records, that is a deliberate migration decision, not a lazy normalization.

// ✓ Only backfill when missing
if (!record.distributionState) {
  record.distributionState = buildDistributionState(record);
}

// ✗ Don't rewrite existing state just because the build logic changed
// record.distributionState = buildDistributionState(record); // always overwrites

This rule ensures the backfill is idempotent and safe to run in production without supervision.
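As a sketch of what the rule means in practice, suppose the derivation logic changes between two deploys. The two derive functions, the `status` field, and `backfillIfAbsent` are hypothetical, and this version returns a new object instead of mutating the record.

```typescript
// Hypothetical shapes and derivation logic, for illustration only.
interface Article {
  slug: string;
  socialPosts: string[];
  distributionState?: { status: string };
}

type Derive = (article: Article) => { status: string };

// Derivation logic as first deployed.
const deriveV1: Derive = (a) => ({
  status: a.socialPosts.length > 0 ? "posted" : "pending",
});

// Derivation logic after a later change.
const deriveV2: Derive = (a) => ({
  status: a.socialPosts.length > 0 ? "distributed" : "queued",
});

// Backfill only when the field is completely absent.
function backfillIfAbsent(record: Article, derive: Derive): Article {
  if (record.distributionState) return record;
  return { ...record, distributionState: derive(record) };
}

// Normalized before the logic changed.
const early = backfillIfAbsent({ slug: "a", socialPosts: ["x"] }, deriveV1);

// Read again after the change: the stored value is left alone.
const again = backfillIfAbsent(early, deriveV2);

// First read happens after the change: gets the new derivation.
const late = backfillIfAbsent({ slug: "b", socialPosts: ["y"] }, deriveV2);
```

Here `again` is the same object as `early` and still carries "posted", while `late` carries "distributed". The two records differ, but nothing was silently rewritten, and bringing them in line remains an explicit migration decision.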
