Add detector id encoding #222
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #222 +/- ##
==========================================
+ Coverage 85.53% 86.33% +0.79%
==========================================
Files 20 21 +1
Lines 1645 1829 +184
==========================================
+ Hits 1407 1579 +172
- Misses 238 250 +12
Pull request overview
Adds first-class support for encoding/decoding LEGEND detector ID strings to/from the 32-bit integer representation defined by the LEGEND data format specification, with a comprehensive test suite and package-level exports.
Changes:
- Introduces `lgdo.detectorid` with `encode_detectorid`/`decode_detectorid` implementing the spec (including special C variants and validation).
- Adds extensive pytest coverage for spec vectors, round-trips, boundary cases, and invalid inputs.
- Re-exports the new functions from `lgdo.__init__` and updates codespell ignore words for domain terminology.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/lgdo/detectorid.py` | New implementation of detector ID encoding/decoding and validation logic. |
| `tests/test_detectorid.py` | New tests covering spec cases, validation errors, and round-trip behavior. |
| `src/lgdo/__init__.py` | Exposes `encode_detectorid`/`decode_detectorid` at the package top level. |
| `pyproject.toml` | Updates codespell ignore list to accommodate domain-specific “puls”. |
I'm a little worried about performance on arrays, and I think it would be nice to have this vectorized (this would work great for doing

Unfortunately, numba doesn't make this easy because

In practice this implementation could get quite annoying because you will have to deal more manually with conversions and positioning in the bytestring. Another solution that could work for vectorizing this without having to go through all of that is to get the unique IDs and apply what you have to do the conversion. This would minimize the amount of string formatting that needs to be done. This would look something like:

Having now written both of these out, I think the second solution is probably better.
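For reference, the unique-ID approach described above could be sketched roughly as follows. This is a minimal illustration, not the code from the original comment: `decode_array` is a hypothetical helper name, and `decode_one` stands for any scalar decoder (for example the `decode_detectorid` function added in this PR, assuming it takes one integer and returns one string).

```python
import numpy as np


def decode_array(ids, decode_one):
    """Decode an integer ID array, calling decode_one once per distinct ID.

    decode_array and decode_one are hypothetical names used for illustration;
    decode_one is assumed to map a single int to a single str.
    """
    ids = np.asarray(ids)
    # uniq holds the distinct IDs; inverse maps every element to its slot in uniq.
    uniq, inverse = np.unique(ids, return_inverse=True)
    # The (slow) string formatting happens only for the distinct IDs.
    decoded = np.array([decode_one(int(u)) for u in uniq], dtype=object)
    # Broadcast the decoded strings back to the shape of the input.
    return decoded[inverse.reshape(ids.shape)]
```

The cost of the Python-level decoder then scales with the number of distinct detectors rather than with the array length, which is usually small compared with the number of entries.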
I had the same thought, @iguinn, but maybe for now we can just support slicing an int32 array with a string, and later figure out a good implementation for arrays of strings?
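One possible reading of "slicing an int32 array with a string" is to encode the string once and compare it against the integer array. The sketch below assumes that reading; `mask_by_name` is a hypothetical helper, and `encode_one` stands for a scalar encoder such as the `encode_detectorid` function from this PR (assumed to map one string to one integer).

```python
import numpy as np


def mask_by_name(ids, name, encode_one):
    """Return a boolean mask selecting entries of ids that match name.

    mask_by_name and encode_one are hypothetical names used for illustration;
    encode_one is assumed to map a single str to a single int.
    """
    ids = np.asarray(ids)
    # Encode the string exactly once, then do a vectorized integer comparison.
    return ids == encode_one(name)
```

Only one string is ever encoded, so no array-of-strings handling is needed for this use case.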
@iguinn I think you're right to be concerned and your point is great, but do you think we should add that in a second PR? I.e., first get something working even if it's not highly performant?
Agent-Logs-Url: https://github.com/tdixon97/legend-pydataobj/sessions/4f7e5a5a-0b37-4a7b-95a2-f936a4c94a29 Co-authored-by: tdixon97 <56904179+tdixon97@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
dffe9b0 to ba256ca
Adds methods to decode/encode the detector IDs.
Seems reasonable and is certainly well tested, but I am not so sure it's the most direct implementation.
@iguinn ?