Can Ceylan · Vienna-based, globally curious.

Category-aware scoring: why one median across mixed data produces noise

Computing a single price median across a mixed dataset produces false positives. Segment by category, subcategory, and relevant attributes before scoring, or your deals aren't deals.

2026-04-18·3 min read·intermediate

The false positive problem

You're building a deal detector. The logic: if the listed price is significantly below the median price for that item, flag it as a deal.

You compute the median across all listings for a search keyword. An item at €25 gets flagged as a deal because the median is €60.

But the €60 median is computed across mixed sizes. Size XL listings have a median of €70; size XS listings have a median of €25. The €25 item is an XS, so it sits exactly at the median for its size and is not a deal at all. Your detector is producing noise.
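To make the false positive concrete, here is a small worked example. The listings are invented for illustration and chosen so the numbers match the scenario above: a pooled median of €60, an XS median of €25 and an XL median of €70. The `discount` helper is a hypothetical name for the same below-median fraction that `score_item` computes later in this article.

```python
import statistics

# Invented listings for one keyword search (prices in EUR), for illustration only.
listings = [
    {"size": "xs", "price": 22.0},
    {"size": "xl", "price": 55.0},
    {"size": "xs", "price": 25.0},
    {"size": "xl", "price": 65.0},
    {"size": "xl", "price": 70.0},
    {"size": "xs", "price": 28.0},
    {"size": "xl", "price": 75.0},
    {"size": "xl", "price": 80.0},
]

candidate = {"size": "xs", "price": 25.0}


def discount(price: float, median: float) -> float:
    """Fraction below the median; positive means cheaper than the benchmark."""
    return (median - price) / median


# Pooled median: every size mixed together.
pooled = statistics.median(item["price"] for item in listings)

# Segment median: only listings that share the candidate's size.
segment = statistics.median(
    item["price"] for item in listings if item["size"] == candidate["size"]
)

print(f"pooled median {pooled}, discount {discount(candidate['price'], pooled):.0%}")
# pooled median 60.0, discount 58%
print(f"segment median {segment}, discount {discount(candidate['price'], segment):.0%}")
# segment median 25.0, discount 0%
```

Against the pooled median the XS item looks 58% cheap; against its own size it is exactly at the benchmark.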

Why aggregating across attributes destroys signal

Price is not a single distribution across a product category; it's a family of distributions, one per meaningful attribute combination. Items in the same keyword search may differ in:

  • Size: a size W28 jean and a size W38 jean are not comparable price benchmarks
  • Condition: new with tags vs heavily used are different markets
  • Brand tier: luxury vs high street vs fast fashion have different price floors
  • Gender target: men's and women's versions of the same garment often price differently

If you compute one median across all of these, you're mixing apples and oranges. The result is a number that doesn't accurately describe any individual market.

The fix: segment first, score within segment

import statistics

def compute_segment_key(item: dict) -> str:
    """Create a segment key from the attributes that drive price."""
    size = normalise_size(item.get("size", ""))  # defined in the next section
    condition = item.get("condition", "unknown")
    # classify_brand is not shown in this article: it maps a brand name to a tier
    brand_tier = classify_brand(item.get("brand", ""))
    return f"{condition}:{size}:{brand_tier}"

def score_item(item: dict, all_items: list[dict]) -> float | None:
    segment = compute_segment_key(item)
    # Only listings in the same segment are valid price benchmarks.
    comparable = [i for i in all_items if compute_segment_key(i) == segment]

    if len(comparable) < 5:  # not enough data for a reliable median
        return None

    median_price = statistics.median(i["price"] for i in comparable)
    if median_price == 0:
        return None

    return (median_price - item["price"]) / median_price  # positive = below median

A positive score means the item is priced below its segment's median. Now you're comparing like with like.
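`score_item` recomputes the segment key of every listing each time it is called, so scoring a whole dataset that way is quadratic in the number of listings. A sketch of one alternative, assuming the medians can be computed once per scoring run: group the listings in a single pass, cache each segment's median, then score by lookup. `build_segment_medians`, `score_with_medians` and `MIN_SEGMENT_SIZE` are hypothetical names, and the key function is passed in so the block runs without `classify_brand`.

```python
from __future__ import annotations

import statistics
from collections import defaultdict
from typing import Callable

MIN_SEGMENT_SIZE = 5  # same floor as score_item


def build_segment_medians(
    items: list[dict], key: Callable[[dict], str]
) -> dict[str, float]:
    """Group items by segment key once and return each segment's median price.

    Segments below MIN_SEGMENT_SIZE are left out, so lookups for them miss.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item["price"])
    return {
        seg: statistics.median(prices)
        for seg, prices in groups.items()
        if len(prices) >= MIN_SEGMENT_SIZE
    }


def score_with_medians(
    item: dict, medians: dict[str, float], key: Callable[[dict], str]
) -> float | None:
    """Same score as score_item, but read from precomputed segment medians."""
    median_price = medians.get(key(item))
    if not median_price:  # segment too thin (missing) or a zero median
        return None
    return (median_price - item["price"]) / median_price  # positive = below median
```

In the article's setup you would pass `compute_segment_key` as `key`. The trade-off is that the medians are a snapshot and need rebuilding when the listings change.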

Size normalisation is non-trivial

Size labels are inconsistent. "M", "Medium", "38", "EU 38", "UK 10" can all mean the same thing, or different things depending on brand and gender targeting.

Before you can segment by size, you need a normalisation step:

SIZE_MAP = {
    "xs": ["xs", "extra small", "34", "eu 34"],
    "s":  ["s", "small", "36", "eu 36"],
    "m":  ["m", "medium", "38", "eu 38"],
    "l":  ["l", "large", "40", "eu 40"],
    "xl": ["xl", "extra large", "42", "eu 42"],
}

def normalise_size(raw: str) -> str:
    clean = raw.lower().strip()
    for canonical, variants in SIZE_MAP.items():
        if clean in variants:
            return canonical
    return "unknown"
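`SIZE_MAP` treats "38" the same for every listing, even though, as noted above, numeric labels can mean different things depending on gender targeting. A sketch of one way to account for that, assuming each listing carries a `gender` field: trust letter and word labels everywhere, but resolve numeric labels only through a gender-specific table, and fall back to "unknown" otherwise. The "women" rows reuse the numeric entries from `SIZE_MAP` on the assumption that those are women's EU sizes; they are not a verified size chart. `normalise_size_for_gender` and the table names are hypothetical.

```python
from __future__ import annotations

# Letter and word labels: assumed to mean the same thing regardless of gender.
LETTER_SIZES = {
    "xs": ["xs", "extra small"],
    "s": ["s", "small"],
    "m": ["m", "medium"],
    "l": ["l", "large"],
    "xl": ["xl", "extra large"],
}

# Numeric labels resolve only when the gender is known. These rows are ASSUMED
# to be women's EU sizes (copied from SIZE_MAP); other genders are left empty
# until filled from a real size chart.
NUMERIC_SIZES_BY_GENDER = {
    "women": {
        "xs": ["34", "eu 34"],
        "s": ["36", "eu 36"],
        "m": ["38", "eu 38"],
        "l": ["40", "eu 40"],
        "xl": ["42", "eu 42"],
    },
}


def _invert(table: dict[str, list[str]]) -> dict[str, str]:
    """Turn {canonical: [variants]} into {variant: canonical} for O(1) lookup."""
    return {v: canonical for canonical, variants in table.items() for v in variants}


_LETTER_LOOKUP = _invert(LETTER_SIZES)
_NUMERIC_LOOKUP = {g: _invert(t) for g, t in NUMERIC_SIZES_BY_GENDER.items()}


def normalise_size_for_gender(raw: str, gender: str) -> str:
    clean = " ".join(raw.lower().split())  # trim and collapse inner whitespace
    if clean in _LETTER_LOOKUP:
        return _LETTER_LOOKUP[clean]
    numeric = _NUMERIC_LOOKUP.get(gender.lower(), {})
    return numeric.get(clean, "unknown")
```

An unresolved numeric label becomes "unknown", which then forms its own segment instead of being guessed into the wrong one.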

Items with size = "unknown" should be scored against other unknown items, not mixed into the overall population.

The minimum sample threshold

A segment median is only meaningful above a minimum sample size. With 3 items in a segment, the median is simply the middle listing, and adding or removing a single listing can shift it substantially.

A practical floor: require at least 5-10 comparable items in a segment before computing a score. Below this threshold, return None and do not display a deal rating. A "no data" result is better than a false one.
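A small demonstration of why thin segments are unreliable, followed by the floor as a helper. The prices are invented; `segment_median` is a hypothetical name, and its `min_samples` default of 5 is taken from the lower end of the range suggested above.

```python
from __future__ import annotations

import statistics


def segment_median(prices: list[float], min_samples: int = 5) -> float | None:
    """Median of a segment's prices, or None when the segment is too thin to trust."""
    if len(prices) < min_samples:
        return None
    return statistics.median(prices)


# Three listings: the median is simply the middle one.
thin = [20.0, 25.0, 90.0]
print(statistics.median(thin))           # 25.0
print(statistics.median(thin + [95.0]))  # 57.5: one extra listing more than doubles it
print(segment_median(thin))              # None: below the floor, so no deal rating
```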

What this looks like at category level

For electronics, size is irrelevant; score by condition and brand tier.
For clothing, size, condition, and brand tier all matter.
For collectibles, condition and specific model matter; size is irrelevant.

The attributes that define a segment are category-specific. A generic segmentation key won't work across all categories. You need a per-category definition of what makes two items comparable.
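One way to express the per-category definitions is a lookup table from category to the attributes that define a segment. This sketch assumes each item already carries a `category` field and normalised attribute values (for example, `size` from `normalise_size` and `brand_tier` from a brand classifier); `SEGMENT_ATTRIBUTES` and `segment_key_for` are hypothetical names, and the table simply restates the three examples above.

```python
from __future__ import annotations

# Hypothetical per-category segment definitions, restating the examples above.
SEGMENT_ATTRIBUTES: dict[str, tuple[str, ...]] = {
    "electronics": ("condition", "brand_tier"),
    "clothing": ("size", "condition", "brand_tier"),
    "collectibles": ("condition", "model"),
}


def segment_key_for(item: dict) -> str | None:
    """Build a segment key from the attributes that matter for the item's category.

    Returns None for categories without a definition, so they are never scored
    against an arbitrary population.
    """
    category = item.get("category")
    attributes = SEGMENT_ATTRIBUTES.get(category)
    if attributes is None:
        return None
    # Missing attributes become "unknown", keeping those items in their own segment.
    values = [str(item.get(attr, "unknown")) for attr in attributes]
    return f"{category}:" + ":".join(values)
```

Prefixing the key with the category keeps segments from different categories apart, and attributes that don't apply (size for electronics) are ignored rather than fragmenting the segment.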

The underlying principle

Before computing any aggregate statistic (median, mean, standard deviation), ask: is this population actually homogeneous? If not, segment it until it is. Statistical measures are only meaningful within comparable groups.
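The homogeneity question can be approximated with a rough heuristic: split the population on a candidate attribute and compare each group's median with the pooled median. This is a sketch, not a statistical test; `median_spread_by`, `should_split_on` and the 20% `tolerance` are arbitrary illustrative choices, and small groups produce noisy gaps, so the minimum sample floor applies here too.

```python
from __future__ import annotations

import statistics
from collections import defaultdict


def median_spread_by(items: list[dict], attribute: str) -> dict[str, float]:
    """Relative gap between each group's median price and the pooled median.

    A large gap for some value of `attribute` suggests the pooled population
    mixes different markets. Returns {} when the pooled median is zero.
    """
    pooled = statistics.median(i["price"] for i in items)
    if pooled == 0:
        return {}
    groups: dict[str, list[float]] = defaultdict(list)
    for item in items:
        groups[str(item.get(attribute, "unknown"))].append(item["price"])
    return {
        value: (statistics.median(prices) - pooled) / pooled
        for value, prices in groups.items()
    }


def should_split_on(items: list[dict], attribute: str, tolerance: float = 0.2) -> bool:
    # tolerance is an arbitrary illustrative threshold, not a tuned value
    return any(abs(gap) > tolerance for gap in median_spread_by(items, attribute).values())
```

With the XS/XL listings from the opening example, the XS group sits about 58% below the pooled median, so `should_split_on(listings, "size")` returns True.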
