
Category-aware scoring: why one median across mixed data produces noise

Computing a single price median across a mixed dataset produces false positives. Segment by category, subcategory, and relevant attributes before scoring — or your deals aren't deals.

2026-04-18 · 3 min read · intermediate

The false positive problem

You're building a deal detector. The logic: if the listed price is significantly below the median price for that item, flag it as a deal.

You compute the median across all listings for a search keyword. An item at €25 gets flagged as a deal because the median is €60.

But the €60 median is the median across mixed sizes. Size XL listings average €70; size XS listings average €25. The €25 item is median for its size — not a deal at all. Your detector is producing noise.
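In code, the naive version looks something like this (a minimal sketch; the field names and the flat listings list are illustrative, not the actual implementation):

import statistics

def naive_deal_score(item: dict, all_items: list[dict]) -> float:
    """Score against a single median over everything the keyword search returned."""
    keyword_median = statistics.median(i["price"] for i in all_items)
    return (keyword_median - item["price"]) / keyword_median

# The XS listing at €25 scores ~0.58 against a €60 keyword median,
# even though €25 is a perfectly normal price for XS.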

Why aggregating across attributes destroys signal

Price is not a single distribution across a product category — it's a family of distributions, one per meaningful attribute combination. Items returned by the same keyword search can differ in:

  • Size: a size W28 jean and a size W38 jean are not comparable price benchmarks
  • Condition: new with tags vs heavily used are different markets
  • Brand tier: luxury vs high street vs fast fashion have different price floors
  • Gender target: men's and women's versions of the same garment often price differently

If you compute a median across all of these, you're averaging apples and oranges. The result is a number that doesn't accurately describe any individual market.
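You can see the problem with toy numbers (the listings below are invented for illustration):

import statistics

listings = [
    {"size": "xs", "price": 24}, {"size": "xs", "price": 25}, {"size": "xs", "price": 27},
    {"size": "xl", "price": 68}, {"size": "xl", "price": 70}, {"size": "xl", "price": 72},
]

pooled = statistics.median(i["price"] for i in listings)   # 47.5
per_size = {
    size: statistics.median(i["price"] for i in listings if i["size"] == size)
    for size in ("xs", "xl")
}                                                           # {"xs": 25, "xl": 70}

The pooled median of 47.5 is a price nobody actually pays in either segment: against it, every XS listing looks like a bargain and every XL listing looks overpriced.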

The fix: segment first, score within segment

import statistics

def compute_segment_key(item: dict) -> str:
    """Create a segment key from the attributes that drive price."""
    size = normalise_size(item.get("size", ""))          # defined below
    condition = item.get("condition", "unknown")
    brand_tier = classify_brand(item.get("brand", ""))   # assumed helper: brand -> luxury / high street / fast fashion
    return f"{condition}:{size}:{brand_tier}"

def score_item(item: dict, all_items: list[dict]) -> float | None:
    segment = compute_segment_key(item)
    comparable = [i for i in all_items if compute_segment_key(i) == segment]

    if len(comparable) < 5:  # not enough data for a reliable median
        return None

    median_price = statistics.median(i["price"] for i in comparable)
    if median_price == 0:
        return None

    return (median_price - item["price"]) / median_price  # positive = below median

A positive score means the item is priced below its segment's median. Now you're comparing like with like.
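Wiring it together might look like this (a sketch; listings is assumed to be the already-fetched list of item dicts, and the 30% threshold is illustrative):

DEAL_THRESHOLD = 0.30  # flag items at least 30% below their segment median

deals = []
for item in listings:
    score = score_item(item, listings)
    if score is not None and score >= DEAL_THRESHOLD:
        deals.append((item, score))

Items in thin segments come back as None and are simply skipped rather than mis-scored.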

Size normalisation is non-trivial

Size labels are inconsistent. "M", "Medium", "38", "EU 38", "UK 10" can all mean the same thing — or different things depending on brand and gender targeting.

Before you can segment by size, you need a normalisation step:

# Canonical size -> raw label variants it covers (compared lower-cased).
# A real map also needs UK/US labels and brand-specific quirks.
SIZE_MAP = {
    "xs": ["xs", "extra small", "34", "eu 34"],
    "s":  ["s", "small", "36", "eu 36"],
    "m":  ["m", "medium", "38", "eu 38"],
    "l":  ["l", "large", "40", "eu 40"],
    "xl": ["xl", "extra large", "42", "eu 42"],
}

def normalise_size(raw: str) -> str:
    clean = raw.lower().strip()
    for canonical, variants in SIZE_MAP.items():
        if clean in variants:
            return canonical
    return "unknown"

Items with size = "unknown" should be scored against other unknown items — not mixed into the overall population.
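A quick sanity check of the behaviour (labels picked for illustration):

normalise_size("EU 38")    # -> "m"
normalise_size("Medium")   # -> "m"
normalise_size("W28")      # -> "unknown", scored only against other unknowns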

The minimum sample threshold

A segment median is only meaningful above a minimum sample size. With 3 items in a segment, the median is simply whichever listing happens to sit in the middle, so a single mispriced listing can shift it dramatically.

A practical floor: require at least 5–10 comparable items in a segment before computing a score. Below this threshold, return None and do not display a deal rating. A "no data" result is better than a false one.
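A toy illustration of why the floor matters (prices invented):

import statistics

thin = [25, 180, 200]        # three listings, two of them overpriced
statistics.median(thin)      # 180 -> any fairly priced item now looks like a huge deal

healthy = [24, 25, 26, 27, 28, 30, 32, 180, 200]
statistics.median(healthy)   # 28 -> the two outliers barely register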

What this looks like at category level

  • Electronics: size is irrelevant — score by condition and brand tier.
  • Clothing: size, condition, and brand tier all matter.
  • Collectibles: condition and specific model matter; size is irrelevant.

The attributes that define a segment are category-specific. A generic segmentation key won't work across all categories. You need a per-category definition of what makes two items comparable.
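One way to make that explicit is a per-category attribute map that drives a generalised segment key (a sketch; the attribute names are illustrative, and it assumes size and brand tier have already been normalised onto the item dict):

SEGMENT_ATTRIBUTES = {
    "clothing":     ["condition", "size", "brand_tier"],
    "electronics":  ["condition", "brand_tier"],
    "collectibles": ["condition", "model"],
}

def segment_key(item: dict, category: str) -> str:
    attrs = SEGMENT_ATTRIBUTES.get(category, ["condition"])
    return ":".join(str(item.get(attr, "unknown")) for attr in attrs)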

The underlying principle

Before computing any aggregate statistic (median, mean, standard deviation), ask: is this population actually homogeneous? If not, segment it until it is. Statistical measures are only meaningful within comparable groups.



