Summary
The priceHistogram extra-result currently treats every entity using PriceInnerRecordHandling.LOWEST_PRICE as a single data point — the entity's price-for-sale (i.e., the lowest winning price across its inner records). This collapses all per-inner-record price variation into one bucket contribution and produces a histogram that does not reflect the actual price distribution of the catalog.
This issue tracks fixing this so that for LOWEST_PRICE entities the histogram receives one data point per inner-record-id (selected by price-list priority, after individualPricePredicate is applied). NONE (all matching prices per entity) and SUM (one cumulated price per entity) keep their current semantics.
Why this is a bug, not a new mode
A LOWEST_PRICE entity (e.g. a master product with multiple variants) carries a family of prices — one per inner-record-id. Each inner-record price is a real, sellable price that the storefront can render. The histogram is supposed to answer the question "what prices are reachable in this candidate pool?" — a UI that uses it to draw a price-slider needs to see those per-variant prices, not just one representative value per master.
Today the histogram answers a strictly narrower question: "what are the prices-for-sale of the matching entities?". That is an artifact of code reuse (the same LowestPriceTerminationFormula.filteredPriceRecords is used for filtering, sorting, and histogram), not an intentional design choice. There is no requirement or use-case where the current behavior is the correct one — the bucket distribution it produces is systematically biased toward the cheapest variant of each entity.
Therefore: this is a defect in the histogram, not a new opt-in behavior. We will not add a new HistogramBehavior enum value or a request-level switch. The behavior simply changes — and is published as a breaking semantic change in the release notes (bucket counts and thresholds against LOWEST_PRICE catalogs will shift after upgrade; clients with hard-coded expectations must re-baseline).
Business goals
- Storefront UIs that render a price slider from
priceHistogram show representative price distributions for catalogs of master/variant products, not "one price per master" distributions that mask the spread.
- Faceted price filtering driven by histogram output produces buckets that correspond to actual purchasable prices rather than an arbitrary projection.
- Consistency with
SUM and NONE: each strategy contributes the prices that genuinely characterize the entity in that handling mode.
Current behavior — verified evidence
PriceHistogramTranslator collects FilteredPriceRecordAccessor instances with LookUp.SHALLOW, so it stops at LowestPriceTerminationFormula near the top of the price subtree.
LowestPriceTerminationFormula.computeInternal() already builds a per-inner-record map keyed by innerRecordId, populated in price-list priority order and gated by individualPricePredicate. It then collapses that map to a single lowestPrice per entity and stores only that one in filteredPriceRecords. The per-inner-record values — exactly the data the histogram needs — are computed and discarded today; the fix exposes them rather than recomputing them.
PriceHistogramComputer.getPriceRecords() consumes the entity-level array via FilteredPriceRecordsCollector, or reuses the FilteredPricesSorter.getPriceRecordsLookupResult() when a sorter is present (also entity-level only).
Cache flattening payloads involved:
FlattenedFormulaWithFilteredPrices — emitted by PlainPriceTerminationFormulaWithPriceFilter (NONE, SUM). Not touched by this issue.
FlattenedFormulaWithFilteredPricesAndFilteredOutRecords — emitted by LowestPriceTerminationFormula. Stays as-is for non-histogram queries; a new sibling class is introduced for histogram queries (see mechanics below).
Required behavior
For each scope/scenario:
PriceInnerRecordHandling |
Histogram data points per entity (after change) |
NONE |
All matching prices (unchanged) |
SUM |
One cumulated price per entity (unchanged) |
LOWEST_PRICE |
One per inner-record-id — the per-inner-record winner selected by price-list priority, after individualPricePredicate |
UNKNOWN |
n/a (data not fetched) |
Rough implementation mechanics
This is a sketch — a detailed plan will be authored from this assignment.
1. Planner-driven flag on LowestPriceTerminationFormula
Add a constructor parameter boolean collectPerInnerRecordPrices. Set to true by the planner only when PriceHistogramTranslator.createProducer(...) runs — i.e., only when a priceHistogram requirement is present. The flag participates in the formula's getHash() so cached and live variants don't collide.
This is the load-bearing piece for performance: it ensures the 99% of LOWEST_PRICE queries that don't request a histogram pay zero extra cost.
2. Side-output funnel populated only when flag is true
Inside computeInternal(), while we already iterate entityInnerRecordPrice.values() per entity to find the lowest, also append every value (per-inner-record winner) to a second CompositeObjectArray<PriceRecordContract> — gated on the flag. After the loop, materialize into a new ResolvedFilteredPriceRecords perInnerRecordPriceRecords field parallel to filteredPriceRecords. The entityInnerRecordPrice values already respect individualPricePredicate (line 355), so the side-output inherits that filter for free. sellingPricePredicate is not applied — it's the price-between user filter, already relaxed by UserFilterRelaxer.relax in PriceHistogramProducer.fabricate(...).
3. New accessor method
Add FilteredPriceRecords getFilteredPriceRecordsForHistogram(QueryExecutionContext ctx) to FilteredPriceRecordAccessor with a default delegating to getFilteredPriceRecords(...). Override only on:
LowestPriceTerminationFormula — returns the new side-output when flag set.
- The new flattened cache class (see step 5).
PlainPriceTerminationFormulaWithPriceFilter (NONE, SUM) and SumPriceTerminationFormula use the default → no behavior change for them.
4. PriceHistogramComputer rewired
For the simple branch (no filteringFormulaWithFilteredOutRecords): skip FilteredPriceRecordsCollector entirely for LowestPriceTerminationFormula accessors and read accessor.getFilteredPriceRecordsForHistogram(ctx).getPriceRecords() directly. The funnel only contains entities that already passed the predicate (lines 379-386 only push when anyPriceMatchesTheFilter), so no second walk is needed.
For the relaxed-tree branch: use a new histogram-flavored collector (collectFilteredPriceRecordsFromPriceRecordAccessorsForHistogram) on the remainder bitmap only.
The priceRecordsLookupResult shortcut (when a FilteredPricesSorter is present) must be skipped for LOWEST_PRICE, because the sorter's cached lookup holds entity-level prices only.
5. New cache flattening class — not an extension
Introduce a new sibling class to FlattenedFormulaWithFilteredPricesAndFilteredOutRecords (working name: FlattenedFormulaWithFilteredPricesForHistogram). It carries an additional PriceRecord[] perInnerRecordPriceRecords array and overrides the new accessor method to return it.
LowestPriceTerminationFormula.toSerializableFormula(...) chooses which flattened payload to emit based on the flag:
- Flag
false → existing FlattenedFormulaWithFilteredPricesAndFilteredOutRecords (untouched, no serialization-version bump, no eviction).
- Flag
true → new FlattenedFormulaWithFilteredPricesForHistogram.
The new class owns its own size estimator that charges perInnerRecordPriceRecords.length * PRICE_RECORD_SIZE correctly so the formula cache byte budget is not silently exceeded.
Net effect on cache:
- Non-histogram
LOWEST_PRICE queries: zero change. Existing cache entries remain valid across upgrade.
- Histogram
LOWEST_PRICE queries: separate cache slot (different hash because of the flag), larger payload that correctly reflects what's stored.
6. Tests
- Per-handling-mode functional test: an entity with N inner-record prices under
LOWEST_PRICE produces N histogram data points; SUM produces one cumulated value; NONE produces all matching prices.
- Mixed-catalog test: histogram bucket counts/thresholds across mixed handling modes match expectations.
- Cache test: histogram query is correctly served from cache on second invocation; non-histogram query against the same filter still hits the unchanged cache slot.
- Audit existing histogram tests against
LOWEST_PRICE fixtures — many will need updated expectations.
7. Release notes / breaking change entry
Mark as breaking change. Document that priceHistogram over LOWEST_PRICE catalogs now reflects per-inner-record prices, and bucket distributions will shift. Storefronts with hardcoded bucket expectations or end-to-end snapshot tests must rebaseline.
Performance considerations
LowestPriceTerminationFormula.computeInternal() is on the hot path of every priceForSale-bearing query, not just histogram queries. The implementation plan must be vigilant about:
- Memory allocations. The side-output adds up to N entries per entity (N = number of inner records) where today there is 1. Allocation patterns must amortize cleanly — chunked buffers over growing/copying arrays, reuse of intermediate maps already in scope, reference sharing instead of deep copies.
- Algorithmic complexity. The fix must not add a second pass over the entity set or duplicate the per-inner-record iteration already happening. The data needed for the histogram is computed today and discarded — the goal is to expose it, not recompute it.
- No cost when not asked for. Queries that don't request a
priceHistogram must pay zero extra cost (no extra allocations, no extra branches in the inner loop, no growth of the cached formula payload). This drives the planner-flag and separate-flattened-class decisions above.
- Cache impact. The formula cache is sized in bytes; any payload growth must be reflected in the size estimator. Existing cache entries for non-histogram queries must remain valid across upgrade — no serialization-version bump on shared types.
A focused performance review of the initial sketch produced the design choices documented in the mechanics section (planner flag, direct read of side-output, new sibling flattened class). The implementation plan must preserve those invariants and validate them with allocation profiling and microbenchmarks on a representative LOWEST_PRICE catalog before merge.
Alternatives explicitly considered and rejected
- Flat
int[] + per-entity offsets instead of PriceRecordContract[]. Rejected: the histogram cruncher consumes both price and entity PK from PriceRecordContract; refactoring its API saves negligible memory given reference copies are 8 bytes.
- Always populate the side-output, skip the flag. Rejected: regresses every non-histogram
LOWEST_PRICE query.
- Extend the existing flattened class with an optional field. Rejected in favor of a new sibling class — keeps the non-histogram cache path byte-for-byte identical.
- Cache-miss-on-histogram recompute instead of carrying the per-inner-record array in the cache payload. Rejected: defeats the formula cache for histogram queries entirely.
Acceptance criteria
- For
LOWEST_PRICE entities, the histogram reflects per-inner-record winning prices (one per inner-record-id, selected by price-list priority, after individualPricePredicate).
NONE and SUM behavior is unchanged (regression-tested).
- Non-histogram
LOWEST_PRICE queries show no measurable latency or allocation regression in benchmarks.
- Formula cache entries for non-histogram
LOWEST_PRICE queries are unchanged (no serialization-version bump on FlattenedFormulaWithFilteredPricesAndFilteredOutRecords).
- Formula cache size estimator for the new histogram-flavored flattened class correctly charges the per-inner-record array.
- Release notes include a breaking-change entry documenting the shift in histogram buckets for
LOWEST_PRICE catalogs.
Summary
The
priceHistogramextra-result currently treats every entity usingPriceInnerRecordHandling.LOWEST_PRICEas a single data point — the entity's price-for-sale (i.e., the lowest winning price across its inner records). This collapses all per-inner-record price variation into one bucket contribution and produces a histogram that does not reflect the actual price distribution of the catalog.This issue tracks fixing this so that for
LOWEST_PRICEentities the histogram receives one data point per inner-record-id (selected by price-list priority, afterindividualPricePredicateis applied).NONE(all matching prices per entity) andSUM(one cumulated price per entity) keep their current semantics.Why this is a bug, not a new mode
A
LOWEST_PRICEentity (e.g. a master product with multiple variants) carries a family of prices — one per inner-record-id. Each inner-record price is a real, sellable price that the storefront can render. The histogram is supposed to answer the question "what prices are reachable in this candidate pool?" — a UI that uses it to draw a price-slider needs to see those per-variant prices, not just one representative value per master.Today the histogram answers a strictly narrower question: "what are the prices-for-sale of the matching entities?". That is an artifact of code reuse (the same
LowestPriceTerminationFormula.filteredPriceRecordsis used for filtering, sorting, and histogram), not an intentional design choice. There is no requirement or use-case where the current behavior is the correct one — the bucket distribution it produces is systematically biased toward the cheapest variant of each entity.Therefore: this is a defect in the histogram, not a new opt-in behavior. We will not add a new
HistogramBehaviorenum value or a request-level switch. The behavior simply changes — and is published as a breaking semantic change in the release notes (bucket counts and thresholds againstLOWEST_PRICEcatalogs will shift after upgrade; clients with hard-coded expectations must re-baseline).Business goals
priceHistogramshow representative price distributions for catalogs of master/variant products, not "one price per master" distributions that mask the spread.SUMandNONE: each strategy contributes the prices that genuinely characterize the entity in that handling mode.Current behavior — verified evidence
PriceHistogramTranslatorcollectsFilteredPriceRecordAccessorinstances withLookUp.SHALLOW, so it stops atLowestPriceTerminationFormulanear the top of the price subtree.LowestPriceTerminationFormula.computeInternal()already builds a per-inner-record map keyed byinnerRecordId, populated in price-list priority order and gated byindividualPricePredicate. It then collapses that map to a singlelowestPriceper entity and stores only that one infilteredPriceRecords. The per-inner-record values — exactly the data the histogram needs — are computed and discarded today; the fix exposes them rather than recomputing them.PriceHistogramComputer.getPriceRecords()consumes the entity-level array viaFilteredPriceRecordsCollector, or reuses theFilteredPricesSorter.getPriceRecordsLookupResult()when a sorter is present (also entity-level only).Cache flattening payloads involved:
FlattenedFormulaWithFilteredPrices— emitted byPlainPriceTerminationFormulaWithPriceFilter(NONE, SUM). Not touched by this issue.FlattenedFormulaWithFilteredPricesAndFilteredOutRecords— emitted byLowestPriceTerminationFormula. Stays as-is for non-histogram queries; a new sibling class is introduced for histogram queries (see mechanics below).Required behavior
For each scope/scenario:
PriceInnerRecordHandlingNONESUMLOWEST_PRICEindividualPricePredicateUNKNOWNRough implementation mechanics
This is a sketch — a detailed plan will be authored from this assignment.
1. Planner-driven flag on
LowestPriceTerminationFormulaAdd a constructor parameter
boolean collectPerInnerRecordPrices. Set totrueby the planner only whenPriceHistogramTranslator.createProducer(...)runs — i.e., only when apriceHistogramrequirement is present. The flag participates in the formula'sgetHash()so cached and live variants don't collide.This is the load-bearing piece for performance: it ensures the 99% of
LOWEST_PRICEqueries that don't request a histogram pay zero extra cost.2. Side-output funnel populated only when flag is true
Inside
computeInternal(), while we already iterateentityInnerRecordPrice.values()per entity to find the lowest, also append every value (per-inner-record winner) to a secondCompositeObjectArray<PriceRecordContract>— gated on the flag. After the loop, materialize into a newResolvedFilteredPriceRecords perInnerRecordPriceRecordsfield parallel tofilteredPriceRecords. TheentityInnerRecordPricevalues already respectindividualPricePredicate(line 355), so the side-output inherits that filter for free.sellingPricePredicateis not applied — it's the price-between user filter, already relaxed byUserFilterRelaxer.relaxinPriceHistogramProducer.fabricate(...).3. New accessor method
Add
FilteredPriceRecords getFilteredPriceRecordsForHistogram(QueryExecutionContext ctx)toFilteredPriceRecordAccessorwith a default delegating togetFilteredPriceRecords(...). Override only on:LowestPriceTerminationFormula— returns the new side-output when flag set.PlainPriceTerminationFormulaWithPriceFilter(NONE, SUM) andSumPriceTerminationFormulause the default → no behavior change for them.4.
PriceHistogramComputerrewiredFor the simple branch (no
filteringFormulaWithFilteredOutRecords): skipFilteredPriceRecordsCollectorentirely forLowestPriceTerminationFormulaaccessors and readaccessor.getFilteredPriceRecordsForHistogram(ctx).getPriceRecords()directly. The funnel only contains entities that already passed the predicate (lines 379-386 only push whenanyPriceMatchesTheFilter), so no second walk is needed.For the relaxed-tree branch: use a new histogram-flavored collector (
collectFilteredPriceRecordsFromPriceRecordAccessorsForHistogram) on the remainder bitmap only.The
priceRecordsLookupResultshortcut (when aFilteredPricesSorteris present) must be skipped forLOWEST_PRICE, because the sorter's cached lookup holds entity-level prices only.5. New cache flattening class — not an extension
Introduce a new sibling class to
FlattenedFormulaWithFilteredPricesAndFilteredOutRecords(working name:FlattenedFormulaWithFilteredPricesForHistogram). It carries an additionalPriceRecord[] perInnerRecordPriceRecordsarray and overrides the new accessor method to return it.LowestPriceTerminationFormula.toSerializableFormula(...)chooses which flattened payload to emit based on the flag:false→ existingFlattenedFormulaWithFilteredPricesAndFilteredOutRecords(untouched, no serialization-version bump, no eviction).true→ newFlattenedFormulaWithFilteredPricesForHistogram.The new class owns its own size estimator that charges
perInnerRecordPriceRecords.length * PRICE_RECORD_SIZEcorrectly so the formula cache byte budget is not silently exceeded.Net effect on cache:
LOWEST_PRICEqueries: zero change. Existing cache entries remain valid across upgrade.LOWEST_PRICEqueries: separate cache slot (different hash because of the flag), larger payload that correctly reflects what's stored.6. Tests
LOWEST_PRICEproduces N histogram data points;SUMproduces one cumulated value;NONEproduces all matching prices.LOWEST_PRICEfixtures — many will need updated expectations.7. Release notes / breaking change entry
Mark as breaking change. Document that
priceHistogramoverLOWEST_PRICEcatalogs now reflects per-inner-record prices, and bucket distributions will shift. Storefronts with hardcoded bucket expectations or end-to-end snapshot tests must rebaseline.Performance considerations
LowestPriceTerminationFormula.computeInternal()is on the hot path of everypriceForSale-bearing query, not just histogram queries. The implementation plan must be vigilant about:priceHistogrammust pay zero extra cost (no extra allocations, no extra branches in the inner loop, no growth of the cached formula payload). This drives the planner-flag and separate-flattened-class decisions above.A focused performance review of the initial sketch produced the design choices documented in the mechanics section (planner flag, direct read of side-output, new sibling flattened class). The implementation plan must preserve those invariants and validate them with allocation profiling and microbenchmarks on a representative
LOWEST_PRICEcatalog before merge.Alternatives explicitly considered and rejected
int[]+ per-entity offsets instead ofPriceRecordContract[]. Rejected: the histogram cruncher consumes both price and entity PK fromPriceRecordContract; refactoring its API saves negligible memory given reference copies are 8 bytes.LOWEST_PRICEquery.Acceptance criteria
LOWEST_PRICEentities, the histogram reflects per-inner-record winning prices (one per inner-record-id, selected by price-list priority, afterindividualPricePredicate).NONEandSUMbehavior is unchanged (regression-tested).LOWEST_PRICEqueries show no measurable latency or allocation regression in benchmarks.LOWEST_PRICEqueries are unchanged (no serialization-version bump onFlattenedFormulaWithFilteredPricesAndFilteredOutRecords).LOWEST_PRICEcatalogs.