Add detector id encoding by tdixon97 · Pull Request #222 · legend-exp/legend-pydataobj

tdixon97 · 2026-04-01T13:08:55Z

Adds methods to decode/encode the detector IDs.

Seems reasonable and is certainly well tested, but I am not so sure its the most direct implementation

codecov · 2026-04-01T13:10:44Z

Codecov Report

❌ Patch coverage is 93.47826% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.33%. Comparing base (f4c7fbb) to head (ba256ca).

Files with missing lines	Patch %	Lines
src/lgdo/detectorid.py	93.44%	12 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   85.53%   86.33%   +0.79%     
==========================================
  Files          20       21       +1     
  Lines        1645     1829     +184     
==========================================
+ Hits         1407     1579     +172     
- Misses        238      250      +12

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Adds first-class support for encoding/decoding LEGEND detector ID strings to/from the 32-bit integer representation defined by the LEGEND data format specification, with a comprehensive test suite and package-level exports.

Changes:

Introduces lgdo.detectorid with encode_detectorid / decode_detectorid implementing the spec (including special C variants and validation).
Adds extensive pytest coverage for spec vectors, round-trips, boundary cases, and invalid inputs.
Re-exports the new functions from lgdo.__init__ and updates codespell ignore words for domain terminology.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/lgdo/detectorid.py`	New implementation of detector ID encoding/decoding and validation logic.
`tests/test_detectorid.py`	New tests covering spec cases, validation errors, and round-trip behavior.
`src/lgdo/__init__.py`	Exposes `encode_detectorid`/`decode_detectorid` at the package top level.
`pyproject.toml`	Updates codespell ignore list to accommodate domain-specific “puls”.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

iguinn · 2026-04-01T17:32:50Z

I'm a little worried about performance on arrays, and I think it would be nice to have this vectorized (this would work great for doing array{detectorid} == encode_detectorid("...") but would require a loop to decode the array).

Unfortunately, numba doesn't make this easy because @vectorize and @guvectorize don't work with bytestrings. You can work with arrays of bytes, though, so a model for vectorizing could be:

from numba import jit, guvectorize
import numpy as np

@guvectorize("void(u4, u1[:])", "(),(m)")
def _test(x: int, out) -> str:
    out[:] = 0
    for i, c in enumerate(str(x)):
        out[i] = ord(c)

a = np.empty(9, dtype="u1")
test2(1234, a)
print(np.frombuffer(a, dtype="|S9"))

def test(x):
    x = np.array(x, dtype='u4')
    by = np.empty(x.shape + (9,), dtype="u1")
    _test(x, by)
    return np.frombuffer(by, dtype="|S9")

print(test3(1234))
print(test3([1234, 56789, 123456789]))

In practice this implementation could get quite annoying because you will have to deal more manually with conversions and positioning in the bytestring.

Another solution that could work for vectorizing this without having to go through all of that is to get the unique IDs and apply what you have to do the conversion. This would minimize the amount of string formatting that needs to be done. This would look something like:

a = np.concat([np.repeat(50, 5), np.repeat(100, 5), np.repeat(150, 5)])
vals, idx = np.unique(a, return_inverse=True)
out = np.empty_like(a, dtype="<U9")

for i, v in enumerate(vals):
    np.copyto(out, np.array(str(v), dtype="<U9"), where = (idx==i)) # replace str(v) with a more complicated formatting

print(out)

Having now written both of these out, I think the second solution is probably better

gipert · 2026-04-01T17:34:32Z

I had your same thought @iguinn but maybe we can support just slicing a int32 array with a string for now? and then later figure out a good implementation for arrays of strings?

tdixon97 · 2026-04-01T17:35:04Z

@iguinn I think your right to be concerned and your point is great, but do you think we should add in a second PR? I.e. first get something working even if its not highly performant?

Agent-Logs-Url: https://github.com/tdixon97/legend-pydataobj/sessions/4f7e5a5a-0b37-4a7b-95a2-f936a4c94a29 Co-authored-by: tdixon97 <56904179+tdixon97@users.noreply.github.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 1, 2026 13:08

tdixon97 changed the title ~~Copilot/add detector id encoding~~ Add detector id encoding Apr 1, 2026

Copilot started reviewing on behalf of tdixon97 April 1, 2026 13:09 View session

tdixon97 marked this pull request as draft April 1, 2026 13:09

Copilot AI reviewed Apr 1, 2026

View reviewed changes

Comment thread src/lgdo/detectorid.py Outdated

Comment thread src/lgdo/detectorid.py

tdixon97 marked this pull request as ready for review April 2, 2026 12:54

Copilot AI and others added 3 commits April 2, 2026 13:55

Add encode_detectorid/decode_detectorid per LEGEND detector ID spec

e75fd07

Agent-Logs-Url: https://github.com/tdixon97/legend-pydataobj/sessions/4f7e5a5a-0b37-4a7b-95a2-f936a4c94a29 Co-authored-by: tdixon97 <56904179+tdixon97@users.noreply.github.com>

Fix walrus operator in test - use literal 0x09 directly

fcb1aa8

Agent-Logs-Url: https://github.com/tdixon97/legend-pydataobj/sessions/4f7e5a5a-0b37-4a7b-95a2-f936a4c94a29 Co-authored-by: tdixon97 <56904179+tdixon97@users.noreply.github.com>

Update src/lgdo/detectorid.py

ba256ca

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tdixon97 force-pushed the copilot/add-detector-id-encoding branch from dffe9b0 to ba256ca Compare April 2, 2026 12:55

ggmarshall marked this pull request as draft April 7, 2026 14:46

ManuelHu mentioned this pull request May 12, 2026

Switch to DetectorID UIDs legend-exp/legend-pygeom-l200#216

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add detector id encoding#222

Add detector id encoding#222
tdixon97 wants to merge 3 commits into
legend-exp:mainfrom
tdixon97:copilot/add-detector-id-encoding

tdixon97 commented Apr 1, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

iguinn commented Apr 1, 2026

Uh oh!

gipert commented Apr 1, 2026

Uh oh!

tdixon97 commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

tdixon97 commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

iguinn commented Apr 1, 2026

Uh oh!

gipert commented Apr 1, 2026

Uh oh!

tdixon97 commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tdixon97 commented Apr 1, 2026 •

edited

Loading

codecov Bot commented Apr 1, 2026 •

edited

Loading