Skip to content

Add detector id encoding#222

Draft
tdixon97 wants to merge 3 commits into
legend-exp:mainfrom
tdixon97:copilot/add-detector-id-encoding
Draft

Add detector id encoding#222
tdixon97 wants to merge 3 commits into
legend-exp:mainfrom
tdixon97:copilot/add-detector-id-encoding

Conversation

@tdixon97

@tdixon97 tdixon97 commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Adds methods to decode/encode the detector IDs.

Seems reasonable and is certainly well tested, but I am not so sure its the most direct implementation

@iguinn ?

Copilot AI review requested due to automatic review settings April 1, 2026 13:08
@tdixon97 tdixon97 changed the title Copilot/add detector id encoding Add detector id encoding Apr 1, 2026
@tdixon97 tdixon97 marked this pull request as draft April 1, 2026 13:09
@codecov

codecov Bot commented Apr 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.47826% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.33%. Comparing base (f4c7fbb) to head (ba256ca).

Files with missing lines Patch % Lines
src/lgdo/detectorid.py 93.44% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   85.53%   86.33%   +0.79%     
==========================================
  Files          20       21       +1     
  Lines        1645     1829     +184     
==========================================
+ Hits         1407     1579     +172     
- Misses        238      250      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class support for encoding/decoding LEGEND detector ID strings to/from the 32-bit integer representation defined by the LEGEND data format specification, with a comprehensive test suite and package-level exports.

Changes:

  • Introduces lgdo.detectorid with encode_detectorid / decode_detectorid implementing the spec (including special C variants and validation).
  • Adds extensive pytest coverage for spec vectors, round-trips, boundary cases, and invalid inputs.
  • Re-exports the new functions from lgdo.__init__ and updates codespell ignore words for domain terminology.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/lgdo/detectorid.py New implementation of detector ID encoding/decoding and validation logic.
tests/test_detectorid.py New tests covering spec cases, validation errors, and round-trip behavior.
src/lgdo/__init__.py Exposes encode_detectorid/decode_detectorid at the package top level.
pyproject.toml Updates codespell ignore list to accommodate domain-specific “puls”.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/lgdo/detectorid.py Outdated
Comment thread src/lgdo/detectorid.py
@iguinn

iguinn commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

I'm a little worried about performance on arrays, and I think it would be nice to have this vectorized (this would work great for doing array{detectorid} == encode_detectorid("...") but would require a loop to decode the array).

Unfortunately, numba doesn't make this easy because @vectorize and @guvectorize don't work with bytestrings. You can work with arrays of bytes, though, so a model for vectorizing could be:

from numba import jit, guvectorize
import numpy as np

@guvectorize("void(u4, u1[:])", "(),(m)")
def _test(x: int, out) -> str:
    out[:] = 0
    for i, c in enumerate(str(x)):
        out[i] = ord(c)

a = np.empty(9, dtype="u1")
test2(1234, a)
print(np.frombuffer(a, dtype="|S9"))

def test(x):
    x = np.array(x, dtype='u4')
    by = np.empty(x.shape + (9,), dtype="u1")
    _test(x, by)
    return np.frombuffer(by, dtype="|S9")

print(test3(1234))
print(test3([1234, 56789, 123456789]))

In practice this implementation could get quite annoying because you will have to deal more manually with conversions and positioning in the bytestring.

Another solution that could work for vectorizing this without having to go through all of that is to get the unique IDs and apply what you have to do the conversion. This would minimize the amount of string formatting that needs to be done. This would look something like:

a = np.concat([np.repeat(50, 5), np.repeat(100, 5), np.repeat(150, 5)])
vals, idx = np.unique(a, return_inverse=True)
out = np.empty_like(a, dtype="<U9")

for i, v in enumerate(vals):
    np.copyto(out, np.array(str(v), dtype="<U9"), where = (idx==i)) # replace str(v) with a more complicated formatting

print(out)

Having now written both of these out, I think the second solution is probably better

@gipert

gipert commented Apr 1, 2026

Copy link
Copy Markdown
Member

I had your same thought @iguinn but maybe we can support just slicing a int32 array with a string for now? and then later figure out a good implementation for arrays of strings?

@tdixon97

tdixon97 commented Apr 1, 2026

Copy link
Copy Markdown
Contributor Author

@iguinn I think your right to be concerned and your point is great, but do you think we should add in a second PR? I.e. first get something working even if its not highly performant?

@tdixon97 tdixon97 marked this pull request as ready for review April 2, 2026 12:54
Copilot AI and others added 3 commits April 2, 2026 13:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@tdixon97 tdixon97 force-pushed the copilot/add-detector-id-encoding branch from dffe9b0 to ba256ca Compare April 2, 2026 12:55
@ggmarshall ggmarshall marked this pull request as draft April 7, 2026 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants