What happened
An app was returning a 500 on every request. One AI agent started debugging. Three hours later the page was loading, but the app was still broken — schedule data empty, admin dashboard throwing errors, authenticated queries blocked by RLS even after login.
A second agent was given the same codebase with one constraint: read-only. No edits, no dev server, no fixes. Just read and report.
It came back in two minutes with this:
- The database column is booking_url. The app queries eversports_url. Silent failure: Supabase returns an empty array, not an error.
- The database columns are date DATE + start_time TIME. The app queries starts_at TIMESTAMPTZ. Every datetime filter in the admin returns nothing.
- The signup route writes role and tenant_id into user_metadata. The RLS policies read from app_metadata. Every authenticated query is blocked at the DB layer.
- The instance generator never writes tenant_id to a NOT NULL column. Every insert fails.
Four bugs. All of them invisible from the browser. All of them findable by reading two files side by side.
Why the first agent missed it
The first agent got stuck on a real problem — a poisoned environment variable from another process was crashing the build server. That was genuinely broken and worth fixing. But fixing it revealed a working page with broken data, and by then the agent was in fix-mode: iterating on symptoms, restarting servers, checking logs.
It never stopped to read the migration against the query.
This is a known failure mode in agentic debugging: once an agent starts executing, it develops momentum. Each fix produces a new error. Each new error gets addressed. The agent is always moving forward, never auditing from the start.
A read-only constraint forces the audit. You can't fix anything, so you have to understand everything first.
The schema drift pattern
The specific bug here, code and schema using different names for the same column, is common on projects that evolve past their first sprint.
Sprint 1 ships with booking_url. Sprint 5 renames it to eversports_url in the application layer but never writes the migration. Or the migration runs in one environment and not another. Or a developer writes new code against the intended schema before the migration is applied.
The gap is invisible in normal testing because:
- Nothing throws: the Supabase client reports a failed select in the result's error field, so code that reads only data sees no rows and no exception
- The page renders — it just renders nothing
- You're looking at the frontend, not the migration history
The only reliable way to catch it is to read the migrations and the query code together and check every column name.
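That cross-read can be partly mechanized. The following is a minimal sketch, not a real parser: it assumes simple one-column-per-line CREATE TABLE statements and double-quoted .from(...).select(...) chains. The table name classes and both inline samples are hypothetical stand-ins for your migration files and query code.

```python
from __future__ import annotations

import re

# Hypothetical inputs: a real audit would read these from the
# migrations directory and the application source files.
MIGRATION_SQL = """
CREATE TABLE classes (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    booking_url TEXT,
    date DATE,
    start_time TIME
);
"""

APP_CODE = """
const { data } = await supabase
  .from("classes")
  .select("id, eversports_url, starts_at")
"""

# Lines starting with these words declare table constraints, not columns.
CONSTRAINT_KEYWORDS = {"primary", "foreign", "unique", "check", "constraint"}


def schema_columns(sql: str) -> dict[str, set[str]]:
    """Map each table name to the column names declared in CREATE TABLE."""
    tables: dict[str, set[str]] = {}
    pattern = re.compile(r"CREATE TABLE\s+(\w+)\s*\((.*?)\);", re.S | re.I)
    for name, body in pattern.findall(sql):
        columns = set()
        for line in body.splitlines():
            words = line.strip().rstrip(",").split()
            if words and words[0].lower() not in CONSTRAINT_KEYWORDS:
                columns.add(words[0])
        tables[name] = columns
    return tables


def queried_columns(code: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs found in .from("t").select("a, b") chains."""
    pattern = re.compile(r'\.from\("(\w+)"\)\s*\.select\("([^"]*)"\)')
    pairs = []
    for table, cols in pattern.findall(code):
        for col in cols.split(","):
            col = col.strip()
            if col and col != "*":
                pairs.append((table, col))
    return pairs


def find_drift(sql: str, code: str) -> list[tuple[str, str]]:
    """List every queried column that no migration defines."""
    schema = schema_columns(sql)
    return [
        (table, col)
        for table, col in queried_columns(code)
        if col not in schema.get(table, set())
    ]


for table, column in find_drift(MIGRATION_SQL, APP_CODE):
    print(f"{table}.{column} is queried but not defined in any migration")
```

A script like this only flags names. It does not replace reading the two files together, because type mismatches (a DATE plus a TIME versus a single TIMESTAMPTZ) still need a human or an auditor agent to interpret.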
The auth metadata pattern
The second bug — writing to user_metadata when RLS reads from app_metadata — is subtler.
In Supabase, user_metadata is writable by the authenticated user. app_metadata is only writable by the service role. RLS policies that check auth.jwt()->'app_metadata'->>'tenant_id' are designed to be tamper-proof: a user can't escalate their own access by editing their metadata.
The implication is that your signup flow and your seed data must both write to app_metadata. If either writes to user_metadata, authenticated queries will silently fail RLS — the user is logged in, the JWT is valid, but the tenant_id the policy checks is in the wrong claim.
This bug produces a deeply confusing symptom: authenticated users can't read their own data. Everything looks correct from the application layer.
Again: only visible if you read the RLS policy and the signup route at the same time.
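To make the mismatch concrete, here is a small model of the policy check in plain Python. It is a sketch under assumptions: the function names, the sample row, and the tenant value studio-a are hypothetical, and the real check runs inside Postgres, not in application code.

```python
from __future__ import annotations

from typing import Optional


def policy_tenant_id(jwt_claims: dict) -> Optional[str]:
    """Mimic auth.jwt()->'app_metadata'->>'tenant_id' from the RLS policy."""
    return jwt_claims.get("app_metadata", {}).get("tenant_id")


def row_visible(jwt_claims: dict, row: dict) -> bool:
    """A row passes the policy only when the claim matches its tenant_id."""
    tenant = policy_tenant_id(jwt_claims)
    return tenant is not None and tenant == row["tenant_id"]


# Hypothetical row owned by tenant "studio-a".
row = {"id": 1, "tenant_id": "studio-a"}

# What the buggy signup route produces: tenant_id lands in user_metadata.
buggy_claims = {
    "sub": "user-1",
    "user_metadata": {"role": "admin", "tenant_id": "studio-a"},
    "app_metadata": {},
}

# What the policy needs: tenant_id written to app_metadata by the service role.
fixed_claims = {
    "sub": "user-1",
    "user_metadata": {},
    "app_metadata": {"role": "admin", "tenant_id": "studio-a"},
}

print(row_visible(buggy_claims, row))  # False: the policy finds no tenant_id
print(row_visible(fixed_claims, row))  # True
```

Both tokens are valid and both users are logged in. The only difference is which claim carries tenant_id, which is why the symptom looks like a data problem rather than an auth problem.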
How to structure multi-agent debugging
When an agentic session has been running for more than an hour without resolving the root cause, the right move is to stop and run a read-only audit before continuing.
Concretely:
1. Freeze the executing agent. Stop iterating on symptoms.
2. Spawn a read-only auditor. Give it the codebase and a specific question: what do the queries expect that the schema doesn't have? What does the auth layer write that the RLS layer doesn't read?
3. Cross-read the contracts. The auditor should compare:
- Migration files vs. active query code (column names, types, constraints)
- Auth metadata writes (signup, seed) vs. RLS policy reads (jwt claims)
- Insert statements vs. NOT NULL constraints
4. Fix from the audit, not from the logs. Once you have the full list of contract violations, fix them together in a single change (migration plus code). Don't patch one and watch what breaks next.
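The third comparison in step 3, insert statements against NOT NULL constraints, can be sketched the same way. This is an illustration under assumptions: the class_instances table, its columns other than tenant_id, and the payload values are hypothetical, and the parser handles only a single CREATE TABLE with one column per line.

```python
from __future__ import annotations

import re

# Hypothetical migration for the table the instance generator writes to.
MIGRATION_SQL = """
CREATE TABLE class_instances (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    class_id UUID NOT NULL,
    date DATE NOT NULL,
    start_time TIME NOT NULL,
    notes TEXT
);
"""


def required_columns(sql: str) -> set[str]:
    """Columns an insert must supply: NOT NULL or PRIMARY KEY, no DEFAULT."""
    body = re.search(r"\((.*)\);", sql, re.S).group(1)
    required = set()
    for line in body.splitlines():
        text = line.strip().rstrip(",")
        if not text:
            continue
        upper = text.upper()
        mandatory = "NOT NULL" in upper or "PRIMARY KEY" in upper
        if mandatory and "DEFAULT" not in upper:
            required.add(text.split()[0])
    return required


def missing_required(sql: str, payload: dict) -> list[str]:
    """Required columns that the insert payload never sets."""
    return sorted(required_columns(sql) - set(payload))


# Hypothetical payload built by the instance generator: no tenant_id.
payload = {"class_id": "c-1", "date": "2024-01-08", "start_time": "09:00"}

for column in missing_required(MIGRATION_SQL, payload):
    print(f"insert omits required column: {column}")
```

Run against every insert site, a check like this turns "every insert fails" from a runtime surprise into a line in the audit report.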
The rule
Before an agent touches anything, it should be able to answer: what does the code expect the database to look like, and does the database actually look like that?
If it can't answer that question, it's guessing. And guessing at 2x speed is still guessing.