Category-aware scoring: why one median across mixed data produces noise

Computing a single price median across a mixed dataset produces false positives. Segment by category, subcategory, and relevant attributes before scoring — or your deals aren't deals.

The false positive problem

You're building a deal detector. The logic: if the listed price is significantly below the median price for that item, flag it as a deal.

You compute the median across all listings for a search keyword. An item at €25 gets flagged as a deal because the median is €60.

But that €60 figure is the median across mixed sizes. Size XL listings have a median around €70; size XS listings have a median around €25. The €25 item sits at the median for its size, so it is not a deal at all. Your detector is producing noise.
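To make the arithmetic concrete, here is a small sketch with made-up listings (the sizes and prices are illustrative assumptions, not real data). It compares the pooled median with the per-size median for a €25 size-XS item.

```python
import statistics

# Hypothetical listings for one search keyword: (size, price in EUR).
listings = [
    ("XS", 20), ("XS", 25), ("XS", 30),
    ("M", 55), ("M", 60), ("M", 65),
    ("XL", 65), ("XL", 70), ("XL", 75),
]

item_size, item_price = "XS", 25

# Benchmark 1: one median over everything the keyword returned.
pooled_median = statistics.median(price for _, price in listings)

# Benchmark 2: the median of listings in the same size only.
size_median = statistics.median(
    price for size, price in listings if size == item_size
)

# Relative discount against each benchmark (positive = below the median).
pooled_discount = (pooled_median - item_price) / pooled_median
size_discount = (size_median - item_price) / size_median

print(f"pooled median: {pooled_median}, discount: {pooled_discount:.0%}")
# pooled median: 60, discount: 58%
print(f"XS median: {size_median}, discount: {size_discount:.0%}")
# XS median: 25, discount: 0%
```

Against the pooled benchmark the item looks 58% cheaper than "normal"; against its own size it is priced exactly at the median.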

Why aggregating across attributes destroys signal

Price is not a single distribution across a product category — it's a family of distributions, one per meaningful attribute combination. Items in the same keyword search may have different:

Size: a size W28 jean and a size W38 jean are not comparable price benchmarks
Condition: new with tags vs heavily used are different markets
Brand tier: luxury vs high street vs fast fashion have different price floors
Gender target: men's and women's versions of the same garment often price differently

If you compute one median across all of these, you're pooling apples and oranges. The result is a number that describes none of the individual markets.

The fix: segment first, score within segment

import statistics

def compute_segment_key(item: dict) -> str:
    """Create a segment key from the attributes that drive price."""
    # normalise_size is defined in the next section; classify_brand is
    # assumed to map a brand name to a tier (luxury, high street, ...).
    # "or" also covers fields that are present but set to None.
    size = normalise_size(item.get("size") or "")
    condition = item.get("condition") or "unknown"
    brand_tier = classify_brand(item.get("brand") or "")
    return f"{condition}:{size}:{brand_tier}"

def score_item(item: dict, all_items: list[dict]) -> float | None:
    """Return the item's relative discount against its segment median."""
    segment = compute_segment_key(item)
    comparable = [i for i in all_items if compute_segment_key(i) == segment]

    if len(comparable) < 5:  # not enough data for a reliable median
        return None

    median_price = statistics.median(i["price"] for i in comparable)
    if median_price == 0:  # avoid dividing by zero
        return None

    return (median_price - item["price"]) / median_price  # positive = below median

A positive score means the item is priced below its segment's median. Now you're comparing like with like.

Size normalisation is non-trivial

Size labels are inconsistent. "M", "Medium", "38", "EU 38", "UK 10" can all mean the same thing — or different things depending on brand and gender targeting.

Before you can segment by size, you need a normalisation step:

SIZE_MAP = {
    "xs": ["xs", "extra small", "34", "eu 34"],
    "s":  ["s", "small", "36", "eu 36"],
    "m":  ["m", "medium", "38", "eu 38"],
    "l":  ["l", "large", "40", "eu 40"],
    "xl": ["xl", "extra large", "42", "eu 42"],
}

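def normalise_size(raw: str | None) -> str:
    """Map a raw size label to a canonical size, or "unknown"."""
    clean = (raw or "").lower().strip()  # tolerate missing or None labels
    for canonical, variants in SIZE_MAP.items():
        if clean in variants:
            return canonical
    return "unknown"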

Items with size = "unknown" should be scored against other unknown items — not mixed into the overall population.

The minimum sample threshold

A segment median is only meaningful above a minimum sample size. With 3 items in a segment, the median is simply the middle listing, so adding or removing a single listing can move it dramatically.

A practical floor: require at least 5–10 comparable items in a segment before computing a score. Below this threshold, return None and do not display a deal rating. A "no data" result is better than a false one.
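One rough way to see why small segments are fragile is to measure how far the median moves when a single listing is removed. The helper below (`max_leave_one_out_shift`, a hypothetical name) is a sketch for exploring your own data, not a formal statistical test, and the prices are made up.

```python
import statistics

def max_leave_one_out_shift(prices: list[float]) -> float:
    """Largest relative change in the median when any one price is dropped.

    Assumes positive prices, so the full median is never zero.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    full_median = statistics.median(prices)
    shifts = []
    for i in range(len(prices)):
        rest = prices[:i] + prices[i + 1:]  # every price except the i-th
        shifts.append(abs(statistics.median(rest) - full_median) / full_median)
    return max(shifts)

small = [20, 30, 90]                        # 3 listings
larger = [20, 24, 26, 28, 30, 32, 34, 90]   # 8 listings, same extremes

print(f"{max_leave_one_out_shift(small):.0%}")   # 100%
print(f"{max_leave_one_out_shift(larger):.0%}")  # 3%
```

With three listings, removing one can double the benchmark; with eight, the worst case here is a shift of about 3%.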

What this looks like at category level

For electronics, size is irrelevant — score by condition and brand tier.
For clothing, size, condition, and brand tier all matter.
For collectibles, condition and specific model matter; size is irrelevant.

The attributes that define a segment are category-specific. A generic segmentation key won't work across all categories. You need a per-category definition of what makes two items comparable.

The underlying principle

Before computing any aggregate statistic (median, mean, standard deviation), ask: is this population actually homogeneous? If not, segment it until it is. Statistical measures are only meaningful within comparable groups.
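A quick, informal check for homogeneity is to split the population by a candidate attribute and compare the group medians. The function below (`group_median_ratio`, a hypothetical name) is a heuristic sketch that assumes a non-empty list of items with positive prices; what counts as a "large" ratio is a judgment call for your data.

```python
import statistics
from collections import defaultdict

def group_median_ratio(items: list[dict], attribute: str) -> float:
    """Ratio of the highest to the lowest per-group median price.

    A ratio near 1 suggests the attribute barely splits prices; a large
    ratio suggests the groups behave like separate markets.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for item in items:
        groups[str(item.get(attribute, "unknown"))].append(item["price"])
    medians = [statistics.median(prices) for prices in groups.values()]
    return max(medians) / min(medians)

# Made-up listings: size splits prices strongly, colour does not.
items = [
    {"size": "xs", "colour": "blue", "price": 20},
    {"size": "xs", "colour": "black", "price": 25},
    {"size": "xs", "colour": "blue", "price": 30},
    {"size": "xl", "colour": "black", "price": 65},
    {"size": "xl", "colour": "blue", "price": 70},
    {"size": "xl", "colour": "black", "price": 75},
]

print(group_median_ratio(items, "size"))    # 2.8
print(group_median_ratio(items, "colour"))
```

Here the size groups have medians of 25 and 70, a ratio of 2.8, which is a strong hint that size belongs in the segment key.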