Can Ceylan · Vienna-based, globally curious.

Send a read-only agent first

One agent spent three hours chasing a build error. A second agent read the migrations against the query code in two minutes and found the real bug. The lesson isn't about which AI is smarter — it's about audit-first workflows.

2026-05-06 · 4 min read · intermediate

What happened

An app was returning a 500 on every request. One AI agent started debugging. Three hours later the page was loading, but the app was still broken — schedule data empty, admin dashboard throwing errors, authenticated queries blocked by RLS even after login.

A second agent was given the same codebase with one constraint: read-only. No edits, no dev server, no fixes. Just read and report.

It came back in two minutes with this:

  • The database column is booking_url. The app queries eversports_url. Silent failure — Supabase returns an empty array, not an error.
  • The database column is date DATE + start_time TIME. The app queries starts_at TIMESTAMPTZ. Every datetime filter in the admin returns nothing.
  • The signup route writes role and tenant_id into user_metadata. The RLS policies read from app_metadata. Every authenticated query is blocked at the DB layer.
  • The instance generator never writes tenant_id to a NOT NULL column. Every insert fails.

Four bugs. All of them invisible from the browser. All of them findable by reading two files side by side.
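That side-by-side read can be approximated mechanically. A minimal sketch of the idea, with illustrative SQL and an invented query column list modeled on the first two bugs (none of this is the original project's code): extract the column names a migration actually declares and diff them against the columns the application asks for.

```python
import re

# Columns declared in a migration (illustrative SQL, modeled on the bugs above)
MIGRATION_SQL = """
CREATE TABLE classes (
    id uuid PRIMARY KEY,
    booking_url text,
    date date NOT NULL,
    start_time time NOT NULL,
    tenant_id uuid NOT NULL
);
"""

# Columns the application layer queries (illustrative, matching the drift above)
QUERY_COLUMNS = ["id", "eversports_url", "starts_at", "tenant_id"]

def schema_columns(sql: str) -> set[str]:
    """Extract column names from the first CREATE TABLE body."""
    body = re.search(r"CREATE TABLE \w+ \((.*?)\);", sql, re.S).group(1)
    cols = set()
    for line in body.splitlines():
        line = line.strip().rstrip(",")
        # Skip table-level clauses; keep lines that start with a column name
        if line and not line.upper().startswith(("PRIMARY", "FOREIGN", "CONSTRAINT", "UNIQUE")):
            cols.add(line.split()[0])
    return cols

def drift(query_cols, sql) -> list[str]:
    """Query columns the schema does not define -- each one fails silently."""
    return sorted(set(query_cols) - schema_columns(sql))

print(drift(QUERY_COLUMNS, MIGRATION_SQL))  # ['eversports_url', 'starts_at']
```

A real auditor would walk every migration file and every query site, but even this toy version surfaces bugs one and two in a single pass.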

Why the first agent missed it

The first agent got stuck on a real problem — a poisoned environment variable from another process was crashing the build server. That was genuinely broken and worth fixing. But fixing it revealed a working page with broken data, and by then the agent was in fix-mode: iterating on symptoms, restarting servers, checking logs.

It never stopped to read the migration against the query.

This is a known failure mode in agentic debugging: once an agent starts executing, it develops momentum. Each fix produces a new error. Each new error gets addressed. The agent is always moving forward, never auditing from the start.

A read-only constraint forces the audit. You can't fix anything, so you have to understand everything first.

The schema drift pattern

The specific bug here — code and schema using different names for the same column — happens on almost every project that evolves past its first sprint.

Sprint 1 ships with booking_url. Sprint 5 renames it to eversports_url in the application layer but never writes the migration. Or the migration runs in one environment and not another. Or a developer writes new code against the intended schema before the migration is applied.

The gap is invisible in normal testing because:

  1. Supabase (and most ORMs) return an empty result, not an error, when you query a non-existent column in a select
  2. The page renders — it just renders nothing
  3. You're looking at the frontend, not the migration history

The only reliable way to catch it is to read the migrations and the query code together and check every column name.
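The type half of the drift (date DATE + start_time TIME in the schema, starts_at TIMESTAMPTZ in the code) is also worth making explicit. A hypothetical sketch of bridging the two shapes, so the gap is at least visible in code; the UTC timezone choice is an assumption, not from the original project:

```python
from datetime import date, time, datetime, timezone

def starts_at(d: date, t: time, tz=timezone.utc) -> datetime:
    """Combine the schema's separate date and start_time columns into the
    single timezone-aware timestamp the application filters on."""
    return datetime.combine(d, t, tzinfo=tz)

# A row as the schema actually stores it: date DATE + start_time TIME
row = {"date": date(2026, 5, 6), "start_time": time(18, 30)}
print(starts_at(row["date"], row["start_time"]).isoformat())
# 2026-05-06T18:30:00+00:00
```

Whether the right fix is a migration to a single timestamptz column or a bridge like this depends on the project; the point is that the mismatch only becomes obvious once both representations sit next to each other.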

The auth metadata pattern

The third bug from the list — writing to user_metadata when RLS reads from app_metadata — is subtler.

In Supabase, user_metadata is writable by the authenticated user. app_metadata is only writable by the service role. RLS policies that check auth.jwt()->'app_metadata'->>'tenant_id' are designed to be tamper-proof: a user can't escalate their own access by editing their metadata.

The implication is that your signup flow and your seed data must both write to app_metadata. If either writes to user_metadata, authenticated queries will silently fail RLS — the user is logged in, the JWT is valid, but the tenant_id the policy checks is in the wrong claim.

This bug produces a deeply confusing symptom: authenticated users can't read their own data. Everything looks correct from the application layer.

Again: only visible if you read the RLS policy and the signup route at the same time.
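The write-side/read-side mismatch can be checked directly against a decoded JWT payload. A minimal sketch — the claim names follow Supabase's JWT shape, but the payloads are invented for illustration:

```python
def tenant_claim_ok(jwt_payload: dict) -> tuple[bool, str]:
    """Check that tenant_id lives in app_metadata, where the RLS policy
    reads it. Returns (ok, reason)."""
    app_md = jwt_payload.get("app_metadata") or {}
    user_md = jwt_payload.get("user_metadata") or {}
    if "tenant_id" in app_md:
        return True, "tenant_id present in app_metadata"
    if "tenant_id" in user_md:
        return False, "tenant_id is in user_metadata; RLS reads app_metadata"
    return False, "tenant_id missing from both metadata claims"

# The buggy signup flow from the article: claims written to user_metadata
buggy = {"sub": "user-1", "user_metadata": {"role": "admin", "tenant_id": "t-1"}}
fixed = {"sub": "user-1", "app_metadata": {"tenant_id": "t-1"}}

print(tenant_claim_ok(buggy)[0])  # False
print(tenant_claim_ok(fixed)[0])  # True
```

Run against a real session token, a check like this turns "authenticated users can't read their own data" from a mystery into a one-line diagnosis.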

How to structure multi-agent debugging

When an agentic session has been running for more than an hour without resolving the root cause, the right move is to stop and run a read-only audit before continuing.

Concretely:

1. Freeze the executing agent. Stop iterating on symptoms.

2. Spawn a read-only auditor. Give it the codebase and a specific question: what do the queries expect that the schema doesn't have? What does the auth layer write that the RLS layer doesn't read?

3. Cross-read the contracts. The auditor should compare:

  • Migration files vs. active query code (column names, types, constraints)
  • Auth metadata writes (signup, seed) vs. RLS policy reads (jwt claims)
  • Insert statements vs. NOT NULL constraints

4. Fix from the audit, not from the logs. Once you have the full list of contract violations, fix them all in one migration. Don't patch one and watch what breaks next.
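The third cross-read in step 3 — insert statements vs. NOT NULL constraints — is the easiest to mechanize. A sketch, again with illustrative SQL and an invented payload modeled on the article's fourth bug:

```python
import re

# Illustrative migration, modeled on the fourth bug (tenant_id is required)
MIGRATION_SQL = """
CREATE TABLE class_instances (
    id uuid PRIMARY KEY,
    class_id uuid NOT NULL,
    tenant_id uuid NOT NULL,
    starts_on date NOT NULL,
    notes text
);
"""

def not_null_columns(sql: str) -> set[str]:
    """Column names declared NOT NULL (PRIMARY KEY implies NOT NULL)."""
    cols = set()
    for line in sql.splitlines():
        line = line.strip().rstrip(",")
        if re.search(r"\bNOT NULL\b|\bPRIMARY KEY\b", line, re.I):
            cols.add(line.split()[0])
    return cols

def missing_required(insert_payload: dict, sql: str) -> list[str]:
    """Required columns the insert never writes -- each is a guaranteed failure
    (unless the column has a server-side default, which this sketch ignores)."""
    return sorted(not_null_columns(sql) - insert_payload.keys())

# The generator's payload: tenant_id is never written
payload = {"id": "i-1", "class_id": "c-1", "starts_on": "2026-05-06"}
print(missing_required(payload, MIGRATION_SQL))  # ['tenant_id']
```

This is the audit output you fix from: a complete list of violations up front, instead of discovering them one failed insert at a time.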

The rule

Before an agent touches anything, it should be able to answer: what does the code expect the database to look like, and does the database actually look like that?

If it can't answer that question, it's guessing. And guessing at 2x speed is still guessing.

