
feat: measurement and sim-hit tables for the Arrow/Parquet output #5586

Draft
murnanedaniel wants to merge 5 commits into acts-project:main from murnanedaniel:tracker-hits-v2-tables

murnanedaniel (Contributor) commented Jun 12, 2026

Follow-up to #5410: splits tracker-hit output into a reco table and a truth table, linked by row indices.

tracker_hits - one row per measurement (row index = the id in track hit_ids):

  • loc0, loc1, var_loc0, var_loc1 (full bound layout)
  • time, var_time
  • subspace bitmask (bit0/1/2 = loc0/loc1/time measured)
  • x, y, z (reco global)
  • detector, volume_id, layer_id, surface_id
  • size_loc0, size_loc1, n_channels, sum_activation, local_eta, local_phi, global_eta, global_phi, eta_angle, phi_angle
  • particle_ids, simhit_ids (nested truth links; merged clusters carry several contributors)
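
For readers of the table, a minimal sketch of decoding the subspace column, assuming the bit assignment listed above (bit0 = loc0, bit1 = loc1, bit2 = time). The helper name `measured_components` is hypothetical and not part of the bindings in this PR:

```python
# Bit positions as stated in the description: bit0=loc0, bit1=loc1, bit2=time.
SUBSPACE_BITS = {"loc0": 0, "loc1": 1, "time": 2}


def measured_components(subspace):
    """Return the names of the components flagged as measured in the bitmask.

    Hypothetical helper for illustration; not an ACTS API.
    """
    return [name for name, bit in SUBSPACE_BITS.items() if subspace & (1 << bit)]


# A two-dimensional measurement (loc0 + loc1) and a one-dimensional one (loc0 only).
pixel_like = measured_components(0b011)
strip_like = measured_components(0b001)
```

Because the local parameters and variances are always written in the full bound layout, the bitmask is what tells a reader which of those entries carry a real measurement.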

tracker_simhits - one row per sim-hit, all sim-hits, in container order (= the simhit_ids indices):

  • true_x, true_y, true_z, true_time
  • tpx, tpy, tpz, tE, dE
  • particle_id, hit_index
  • detector, volume_id, layer_id, surface_id
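
A small sketch of how the row-index links between the two tables can be followed, using plain Python lists as stand-ins for the columns of one event. The values are made up, and `resolve_links` is a hypothetical helper; the only thing taken from the description is that `simhit_ids` holds row indices into tracker_simhits:

```python
# Toy stand-ins for one event; column names follow the description,
# the values are invented for illustration.
tracker_simhits = {
    "particle_id": [7, 7, 9],
    "true_x": [1.0, 2.0, 2.1],
}
tracker_hits = {
    "simhit_ids": [[0], [1, 2]],   # second measurement is a merged cluster
    "particle_ids": [[7], [7, 9]],
}


def resolve_links(hits, simhits):
    """For each measurement row, return the particle_id of every linked sim-hit."""
    n_simhits = len(simhits["particle_id"])
    resolved = []
    for simhit_ids in hits["simhit_ids"]:
        # Row indices must point inside the sim-hit table of the same event.
        assert all(0 <= i < n_simhits for i in simhit_ids)
        resolved.append([simhits["particle_id"][i] for i in simhit_ids])
    return resolved


linked_particles = resolve_links(tracker_hits, tracker_simhits)

# Consistency between the tables, compared as sets because the relative
# ordering of particle_ids and simhit_ids is an assumption here.
for linked, stored in zip(linked_particles, tracker_hits["particle_ids"]):
    assert set(linked) == set(stored)
```

The same pattern applies from tracks to measurements: a track's hit_ids are row indices into tracker_hits.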

Only inputMeasurements and trackingGeometry are required. Clusters and truth inputs are optional (without them the shape columns are zeroed and the truth links are empty), all collection names are configurable, and the detector numbering can be supplied as a custom mapping. Python bindings and tests are included.
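
The optional-input behavior can be pictured with a short Python sketch. This is not the converter code (that lives in C++); `optional_columns` and its dict-shaped arguments are hypothetical and only mirror the rule stated above, zeros for shape columns without clusters and empty lists for links without truth:

```python
# Shape column names as listed for tracker_hits in the description.
SHAPE_COLUMNS = [
    "size_loc0", "size_loc1", "n_channels", "sum_activation",
    "local_eta", "local_phi", "global_eta", "global_phi",
    "eta_angle", "phi_angle",
]


def optional_columns(cluster=None, truth=None):
    """Build the optional part of one tracker_hits row (illustrative only)."""
    # Without a cluster every shape column is zero, so the schema stays stable.
    row = {name: 0 for name in SHAPE_COLUMNS}
    if cluster is not None:
        row.update({name: cluster[name] for name in SHAPE_COLUMNS})
    # Without truth inputs the link columns are empty lists.
    row["particle_ids"] = list(truth["particle_ids"]) if truth else []
    row["simhit_ids"] = list(truth["simhit_ids"]) if truth else []
    return row


reco_only_row = optional_columns()
```

A reader can therefore rely on the same set of columns whether the file came from full simulation, smearing-only digitization, or data.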

The shape columns give cheap cluster topology without writing out full cell lists (~1M cells/event at ttbar PU200). They currently come out zeroed: clusters reach the converter with empty channels even in geometric digitization, though localParameters fills them and ModuleClusters should preserve them. Am I misreading the flow, or are the cells dropped somewhere? The columns are there either way, so they'll fill in once the cells do.

Validated on the ODD up to ttbar PU200: ~2% merged measurements, <=0.006% unmatched sim-hits, all links resolve.
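
For anyone who wants to reproduce these numbers on their own output, a rough sketch of the three checks follows. It assumes "merged" means a measurement with more than one entry in `simhit_ids` and "unmatched" means a sim-hit row that no measurement references; `link_summary` is a hypothetical helper and the input is a toy event, not ODD data:

```python
def link_summary(simhit_ids_per_measurement, n_simhits):
    """Summarize merged measurements, unmatched sim-hits and link validity."""
    n_meas = len(simhit_ids_per_measurement)
    # Assumed definition: a merged measurement has several sim-hit contributors.
    merged = sum(1 for ids in simhit_ids_per_measurement if len(ids) > 1)
    referenced = {i for ids in simhit_ids_per_measurement for i in ids}
    in_range = {i for i in referenced if 0 <= i < n_simhits}
    return {
        "merged_fraction": merged / n_meas if n_meas else 0.0,
        "unmatched_simhit_fraction":
            (n_simhits - len(in_range)) / n_simhits if n_simhits else 0.0,
        # Every referenced index must be a valid row of the sim-hit table.
        "all_links_resolve": in_range == referenced,
    }


# Toy event: three measurements, five sim-hits, sim-hit 3 is never referenced.
summary = link_summary([[0], [1, 2], [4]], n_simhits=5)
```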

cc @paulgessinger @andiwand @benjaminhuth

murnanedaniel and others added 5 commits June 12, 2026 08:09
Emit tracker_hits as one row per measurement instead of one per sim-hit,
with the contributing sim-hits' truth (particle_id, true_x/true_y/true_z,
time) as nested list<list<>> columns (mirroring the calo contrib_* pattern).
The row index is the measurement id, so there is no measurement_id column and
tracks reference measurements by that index.

Tracks: hit_ids are measurement indices, num_measurements is dropped
(== len(hit_ids)), and hit_outlier is carried per measurement-state.

Sim-hits contributing to no measurement are dropped; the dropped fraction is
checked at runtime against maxUnmatchedSimHitFraction (default 0.1%).
ArrowSimHitOutputConverter becomes the TRUTH-table converter: one entry per
sim-hit in container order (the position is the sim-hit id the measurement
table references). Standalone-complete for re-digitization: true position +
time, 4-momentum at the hit (momentum4Before), deposited energy, particle
link (row index into the particle table), hit index along the trajectory,
and sensor identification. Drops the measurement inputs entirely; the
sim/reco split follows the Release-2 schema discussion (truth in one file,
reco in another, links between them).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…erter)

One entry per measurement (position = measurement id referenced by track
hit_ids): local parameters and variances always expanded to the full bound
layout with a subspace bitmask (bit0=loc0, bit1=loc1, bit2=time) saying
which components were measured; measured time + variance; reco global
position; sensor ids; cluster-shape features from the geometric digitization
(sizes, n_channels, sum_activation, local/global eta-phi and incidence
angles); and truth LINKS only - particle_ids (particle-table rows) and
simhit_ids (tracker_simhits rows, same event). The unmatched-sim-hit
fraction is a warning-only diagnostic now that the truth table is complete.

Python bindings for the converter + measurementSchema(); test_arrow.py gains
the two-table contract test (link resolution, particle consistency between
tables, subspace semantics, envelope checks).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
UnsafeAppend without Reserve writes past the buffer - heap corruption that
surfaced as a Sequencer::run() segfault on the first real event.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…erter

Generalizes the converter beyond geometric-digitization MC workflows:

- inputClusters optional: without it (e.g. smearing-only digitization) the
  shape columns are emitted as zeros, keeping the schema stable.
- inputSimHits + inputSimHitMeasurementsMap optional as a pair: without them
  (data, or reco-only conversion) the particle_ids/simhit_ids link columns
  are emitted as empty lists and the unmatched diagnostic is skipped.

Adds a reco-only unit test (no truth, no clusters wired): reco columns
populated, links empty, shape zeroed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions github-actions Bot added this to the next milestone Jun 12, 2026
@github-actions github-actions Bot added the Component - Examples and Component - Plugins labels Jun 12, 2026
github-actions (Contributor) commented

📊: Physics performance monitoring for 5c4c1b5
