Can Ceylan
Vienna-based, globally curious.

The data isolation audit: every endpoint must be scoped to the requesting user

The most common multi-tenant security bug is an endpoint that returns the right data for the right user, most of the time. A systematic audit ensures user_id filtering is never accidentally omitted.

2026-04-18 · intermediate

The bug that's easy to miss in review

You're building a multi-tenant application. A user can see their own data, not anyone else's.

You add a new endpoint:

@router.get("/api/searches")
async def get_searches(db = Depends(get_db)):
    return db.query("SELECT * FROM searches WHERE active = 1")

You test it. It returns your searches. Works correctly, for you, because you're the only user in development.

In production, with multiple users, it returns everyone's searches to everyone. The user_id filter is missing. You have a data isolation breach.
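To make the failure concrete, here is a minimal, self-contained sketch using the standard-library sqlite3 module rather than the db helper from the endpoint above. The table layout, the sample rows, and the function names list_searches_unscoped and list_searches_scoped are assumptions for illustration only.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with searches owned by two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE searches ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, keyword TEXT, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO searches (user_id, keyword, active) VALUES (?, ?, ?)",
        [(1, "bikes", 1), (2, "cameras", 1), (2, "lenses", 0)],
    )
    return conn


def list_searches_unscoped(conn: sqlite3.Connection) -> list[str]:
    # Bug: no user_id filter, so every active search is returned to every caller.
    rows = conn.execute("SELECT keyword FROM searches WHERE active = 1 ORDER BY id")
    return [row[0] for row in rows]


def list_searches_scoped(conn: sqlite3.Connection, user_id: int) -> list[str]:
    # Fix: the requesting user's id is a bound parameter in the WHERE clause.
    rows = conn.execute(
        "SELECT keyword FROM searches WHERE active = 1 AND user_id = ? ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

With only user 1 in the table, both functions return the same rows, which is exactly why the bug survives single-user testing.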

Why this happens

Data isolation failures are rarely intentional. They happen because:

  1. The developer tested with a single user account and didn't observe the failure
  2. The filter was present in some query functions and assumed in others
  3. A refactor removed the filter accidentally
  4. The endpoint was copied from a public-data endpoint and the filter wasn't added

Code review catches many bugs, but data isolation failures are hard to spot unless the reviewer is specifically looking for missing user_id clauses on every query.

The systematic audit

Before shipping any feature, audit every endpoint:

For each endpoint that returns data:
  □ Does the query include WHERE user_id = current_user.id?
  □ OR is this data intentionally public? (document why)
  □ OR is this an admin endpoint? (require admin role check, not just auth)
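
Part of the checklist above can be pre-screened mechanically. The sketch below is a rough heuristic, not a replacement for the manual audit: it walks a Python source string with the standard-library ast module and reports string literals that look like SELECT queries but never mention user_id. The function name find_unscoped_queries is hypothetical, and the heuristic will also flag intentionally public or admin queries, which is the point: each hit needs a documented reason.

```python
import ast
import re

# Matches anything that looks like "SELECT ... FROM", across line breaks.
SELECT_RE = re.compile(r"\bselect\b.+\bfrom\b", re.IGNORECASE | re.DOTALL)


def find_unscoped_queries(source: str) -> list[tuple[int, str]]:
    """Return (line number, SQL text) for SELECT strings without 'user_id'."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only plain string constants are inspected (this includes f-string parts).
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            sql = node.value
            if SELECT_RE.search(sql) and "user_id" not in sql.lower():
                findings.append((node.lineno, sql))
    return sorted(findings)
```

It only sees SQL written as string literals in the scanned source. Queries built by an ORM or assembled elsewhere still need the manual pass.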

For admin endpoints, the bypass must be explicit in the code, not an unexplained missing filter:

# Wrong: admin bypass by omitting the filter
if user.is_admin:
    return db.query("SELECT * FROM searches")
else:
    return db.query("SELECT * FROM searches WHERE user_id = ?", [user.id])

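# Right: explicit bypass with documentation
if user.is_admin:
    # Admin can see all searches for support purposes
    return db.query("SELECT * FROM searches ORDER BY created_at DESC")
# Everyone else stays scoped to their own rows
return db.query("SELECT * FROM searches WHERE user_id = ? ORDER BY created_at DESC", [user.id])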

The distinction matters for auditing. A filter that is silently omitted looks exactly like one that was forgotten. An explicit "# Admin can see all searches" comment makes the bypass visible and intentional.
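
One way to push this further is to make the bypass something the caller has to ask for. The sketch below is an assumption-laden illustration, not the code from this article: the User dataclass, the list_searches function, and the keyword-only all_users flag are all hypothetical names, and it uses sqlite3 in place of the db helper. The scoped query is the default, so an admin who does not opt in still only sees their own rows.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


def list_searches(conn: sqlite3.Connection, user: User, *, all_users: bool = False):
    """Return (user_id, keyword) rows, scoped to `user` unless explicitly bypassed."""
    if all_users:
        # The bypass is requested by name and checked against the role.
        if not user.is_admin:
            raise PermissionError("all_users=True requires an admin user")
        # Admin can see all searches for support purposes
        rows = conn.execute("SELECT user_id, keyword FROM searches ORDER BY id")
    else:
        rows = conn.execute(
            "SELECT user_id, keyword FROM searches WHERE user_id = ? ORDER BY id",
            (user.id,),
        )
    return [tuple(row) for row in rows]
```

A reviewer can then search the codebase for all_users=True and get the complete list of places where isolation is deliberately relaxed.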

Writing negative tests

For each scoped endpoint, write a test that verifies cross-user access is blocked:

def test_user_cannot_see_other_users_searches(client, user_a, user_b):
    # user_b creates a search
    client.post("/api/searches", json={"keyword": "test"}, headers=auth(user_b))

    # user_a fetches searches and should not see user_b's search
    res = client.get("/api/searches", headers=auth(user_a))
    assert res.status_code == 200  # a failed request must not pass as "no leak"
    keywords = [s["keyword"] for s in res.json()]
    assert "test" not in keywords

This test fails as soon as the user_id filter is missing, so a refactor cannot silently remove the filter without a test going red.
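
A negative assertion on its own can pass for the wrong reason, for example when the endpoint returns nothing at all. A small sketch of the same check with a positive control added, written against plain functions so it runs without a web framework. The in-memory SEARCHES list and the names get_searches_buggy, get_searches_fixed, and check_isolation are hypothetical stand-ins for the real endpoint and test.

```python
from typing import Callable

# Stand-in data store: user 2 owns the "test" search.
SEARCHES = [
    {"user_id": 1, "keyword": "bikes"},
    {"user_id": 2, "keyword": "test"},
]


def get_searches_buggy(user_id: int) -> list[dict]:
    # Missing filter: ignores user_id entirely.
    return list(SEARCHES)


def get_searches_fixed(user_id: int) -> list[dict]:
    return [s for s in SEARCHES if s["user_id"] == user_id]


def check_isolation(
    get_searches: Callable[[int], list[dict]],
    owner_id: int,
    other_id: int,
    keyword: str,
) -> None:
    """Assert that `keyword` is visible to its owner and to nobody else."""
    owner_keywords = [s["keyword"] for s in get_searches(owner_id)]
    other_keywords = [s["keyword"] for s in get_searches(other_id)]
    # Positive control: the owner must see the row, or the test proves nothing.
    assert keyword in owner_keywords, "owner cannot see their own search"
    # Negative check: the other user must not see it.
    assert keyword not in other_keywords, "search leaked to another user"
```

Running check_isolation against get_searches_fixed passes. Against get_searches_buggy it raises AssertionError on the negative check.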

Geolocation and sensitive fields: negative tests for field presence

Some data isolation isn't about users at all. It's about fields that should never appear in any response:

def test_api_response_contains_no_location_data(client, user):
    res = client.get("/api/products", headers=auth(user))
    for product in res.json():
        assert "location_lat" not in product
        assert "location_lng" not in product
        assert "seller_address" not in product

These negative tests verify that sensitive fields are absent from the response. They're easy to write and catch the "I added a new field to the model and forgot to exclude it from the serialiser" class of bug.
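
The "forgot to exclude it" bug is harder to write if the serialiser works from an allowlist instead of a blocklist. A minimal sketch of that idea, assuming products are plain dicts. PUBLIC_PRODUCT_FIELDS, serialize_product, and the id, title, and price fields are hypothetical names chosen for the example, not this article's actual model.

```python
# Only fields named here can ever reach a response.
PUBLIC_PRODUCT_FIELDS = ("id", "title", "price")


def serialize_product(record: dict) -> dict:
    """Copy allowlisted fields only; anything new on the model is dropped by default."""
    return {key: record[key] for key in PUBLIC_PRODUCT_FIELDS if key in record}


product_row = {
    "id": 7,
    "title": "Road bike",
    "price": 450,
    "location_lat": 48.2,
    "location_lng": 16.37,
    "seller_address": "hidden",
}
public_view = serialize_product(product_row)
```

With an allowlist, a newly added sensitive column stays private until someone deliberately exposes it. The negative test above then guards against that deliberate step going wrong.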

The rule

Every query that touches user data has user_id = current_user.id in its WHERE clause, or a documented, tested reason why it doesn't. There is no middle ground.
