Commit 8e4bc0d

Merge pull request #5 from msmk0/v2
Changes for version 2
2 parents 568cc23 + 71fe6ea commit 8e4bc0d

3 files changed, +64 -22 lines changed

README.md

+54-11
@@ -2,7 +2,7 @@ TrackML utility library
 =======================
 
 A python library to simplify working with the
-[High Energy Physics Tracking Machine Learning challenge](kaggle_trackml)
+[High Energy Physics Tracking Machine Learning challenge][kaggle_trackml]
 dataset.
 
 Installation
@@ -96,11 +96,10 @@ some hits can be left unassigned). The training dataset contains the recorded
 hits, their truth association to particles, and the initial parameters of those
 particles. The test dataset contains only the recorded hits.
 
-The dataset is provided as a set of plain `.csv` files (`.csv.gz` or `.csv.bz2`
-are also allowed). Each event has four associated files that contain hits, hit
-cells, particles, and the ground truth association between them. The common
-prefix (like `event000000000`) is fully constrained to be `event` followed by 9
-digits.
+The dataset is provided as a set of plain `.csv` files. Each event has four
+associated files that contain hits, hit cells, particles, and the ground truth
+association between them. The common prefix, e.g. `event000000010`, is always
+`event` followed by 9 digits.
 
 event000000000-hits.csv
 event000000000-cells.csv
@@ -122,7 +121,7 @@ a name starting with `submission`, e.g.
 The hits file contains the following values for each hit/entry:
 
 * **hit_id**: numerical identifier of the hit inside the event.
-* **x, y, z**: measured x, y, z position (in millimeters) of the hit in
+* **x, y, z**: measured x, y, z position (in millimeter) of the hit in
 global coordinates.
 * **volume_id**: numerical identifier of the detector group.
 * **layer_id**: numerical identifier of the detector layer inside the
@@ -159,7 +158,7 @@ The particles files contains the following values for each particle/entry:
 coordinates.
 * **px, py, pz**: initial momentum (in GeV/c) along each global axis.
 * **q**: particle charge (as multiple of the absolute electron charge).
-* **nhits**: number of hits generated by this particle
+* **nhits**: number of hits generated by this particle.
 
 All entries contain the generated information or ground truth.
 
@@ -171,7 +170,8 @@ particle/track.
 
 * **hit_id**: numerical identifier of the hit as defined in the hits file.
 * **particle_id**: numerical identifier of the generating particle as defined
-in the particles file.
+in the particles file. A value of 0 means that the hit did not originate
+from a reconstructible particle, but e.g. from detector noise.
 * **tx, ty, tz** true intersection point in global coordinates (in
 millimeters) between the particle trajectory and the sensitive surface.
 * **tpx, tpy, tpz** true particle momentum (in GeV/c) in the global
@@ -186,14 +186,57 @@ The submission file must associate each hit in each event to one and only one
 reconstructed particle track. The reconstructed tracks must be uniquely
 identified only within each event. Participants are advised to compress the
 submission file (with zip, bzip2, gzip) before submission to the
-[Kaggle site](kaggle_trackml).
+[Kaggle site][kaggle_trackml].
 
 * **event_id**: numerical identifier of the event; corresponds to the number
 found in the per-event file name prefix.
 * **hit_id**: numerical identifier of the hit inside the event as defined in
 the per-event hits file.
 * **track_id**: user-defined numerical identifier (non-negative integer) of
-the track
+the track.
+
+### Additional detector geometry information
+
+The detector modules that measure particles and generated the hits are organized
+into detector groups or volumes identified by a volume id. Inside a volume they
+are further grouped into layers identified by a layer id. Each layer can contain
+an arbitrary number of detector modules, the smallest geometrically distinct
+detector object, each identified by a module_id. Within each group detector
+modules are of the same type have e.g. the same granularity. All simulated
+detector modules are so-called semiconductor sensors that are build from thin
+silicon sensor chips. Each module can be represented by a two-dimensional,
+planar, bounded sensitive surface. These sensitive surfaces are subdivided into
+regular grids that define the detectors cells, the smallest granularity within
+the detector.
+
+Each module has a different position and orientation described in the detectors
+file. A local, right-handed coordinate system is defined on each sensitive
+surface such that the first two coordinates u and v are on the sensitive surface
+and the third coordinate w is normal to the surface. The orientation and
+position are defined by the following transformation
+
+pos_xyz = rotation_matrix * pos_uvw + offset
+
+that transform a position described in local coordinates u,v,w into the
+equivalent position x,y,z in global coordinates using a rotation matrix and
+an offset.
+
+* **volume_id**: numerical identifier of the detector group.
+* **layer_id**: numerical identifier of the detector layer inside the
+group.
+* **module_id**: numerical identifier of the detector module inside
+the layer.
+* **cx, cy, cz**: position of the local origin in the described in the global
+coordinate system (in millimeter).
+* **rot_xu, rot_xv, rot_xw, rot_yu, ...**: components of the rotation matrix
+to rotate from local u,v,w to global x,y,z coordinates.
+* **module_t**: thickness of the detector module (in millimeter).
+* **module_minhu, module_maxhu**: the minimum/maximum half-length of the
+module boundary along the local u direction (in millimeter).
+* **module_hv**: the half-length of the module boundary along the local v
+direction (in millimeter).
+* **pitch_u, pitch_v**: the size of detector cells along the local u and v
+direction (in millimeter).
 
 
 [cern]: https://home.cern
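
The transformation `pos_xyz = rotation_matrix * pos_uvw + offset` added to the README in this diff can be illustrated with a minimal NumPy sketch. The function names `local_to_global` and `global_to_local` and the example module (a 90 degree rotation about the z axis with an arbitrary offset) are assumptions made for illustration; they are not part of the trackml library, and the sketch does not read the detectors file.

```python
import numpy as np


def local_to_global(pos_uvw, rotation_matrix, offset):
    """Apply pos_xyz = rotation_matrix * pos_uvw + offset to one or more points.

    pos_uvw is an (N, 3) array (or a single 3-vector) of local u, v, w
    coordinates; the result has shape (N, 3).
    """
    pos_uvw = np.atleast_2d(np.asarray(pos_uvw, dtype=float))
    rotation_matrix = np.asarray(rotation_matrix, dtype=float)
    offset = np.asarray(offset, dtype=float)
    # Row-vector form of R @ p + c, evaluated for every point p at once.
    return pos_uvw @ rotation_matrix.T + offset


def global_to_local(pos_xyz, rotation_matrix, offset):
    """Invert the transformation; relies on the rotation matrix being orthonormal."""
    pos_xyz = np.atleast_2d(np.asarray(pos_xyz, dtype=float))
    rotation_matrix = np.asarray(rotation_matrix, dtype=float)
    offset = np.asarray(offset, dtype=float)
    # Row-vector form of R^T @ (x - c).
    return (pos_xyz - offset) @ rotation_matrix


# Hypothetical module: local axes rotated by 90 degrees about global z,
# local origin at (10, 20, 30) mm in global coordinates.
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
offset = np.array([10.0, 20.0, 30.0])

local_points = np.array([[1.0, 0.0, 0.0],
                         [0.0, 2.0, 0.0]])
global_points = local_to_global(local_points, rotation, offset)
print(global_points)
```

For this example module, the local u axis maps onto the global y direction, so the local point (1, 0, 0) ends up at (10, 21, 30) mm.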

setup.py

+1-1
@@ -11,7 +11,7 @@
 
 setup(
     name='trackml',
-    version='1',
+    version='2',
     description='TrackML utility library',
     long_description=long_description,
     long_description_content_type='text/markdown',

trackml/randomize.py

+9-10
@@ -6,12 +6,11 @@
 import numpy
 import numpy.random
 
-def _make_submission(mapping, track_ids, renumber=True):
+def _make_submission(hit_ids, track_ids, renumber=True):
     """Create a submission DataFrame with hit_id and track_id columns.
 
     Optionally renumbers the track_id to random small integers.
     """
-    hit_ids = mapping['hit_id']
     if renumber:
         unique_ids, inverse = numpy.unique(track_ids, return_inverse=True)
         numbers = numpy.arange(1, len(unique_ids) + 1, dtype=unique_ids.dtype)
@@ -23,18 +22,18 @@ def set_seed(seed):
     """Set the random seed used for randomness in this module."""
     numpy.random.seed(seed)
 
-def random_solution(truth, ntracks):
-    """Generate a completely random solution with the given number of particles.
+def random_solution(hits, ntracks):
+    """Generate a completely random solution with the given number of tracks.
 
     Parameters
     ----------
-    truth : pandas.DataFrame
-        Truth mapping must contain hit_id and particle_id columns.
+    hits : pandas.DataFrame
+        Hits information must contain hit_id column.
     ntracks : int
         Number of tracks the submission should contain.
     """
-    ids = numpy.random.randint(1, nparticles + 1, size=len(mapping), dtype='i4')
-    return _make_submission(truth, ids, renumber=False)
+    ids = numpy.random.randint(1, ntracks + 1, size=len(hits), dtype='i4')
+    return _make_submission(hits['hit_id'], ids, renumber=False)
 
 def drop_hits(truth, probability):
     """Drop hits from each track with a certain probability.
@@ -55,7 +54,7 @@ def drop_hits(truth, probability):
     fakeids = numpy.arange(fakeid0, fakeid0 + dropped_count, dtype='i8')
     # replace masked particle ids with fakes ones
     numpy.place(out, dropped_mask, fakeids)
-    return _make_submission(truth, out)
+    return _make_submission(truth['hit_id'], out)
 
 def shuffle_hits(truth, probability):
     """Randomly assign hits to a wrong particle with a certain probability.
@@ -73,4 +72,4 @@ def shuffle_hits(truth, probability):
     wrongparticles = numpy.random.choice(numpy.unique(out), size=shuffled_count)
     # replace masked particle ids with random valid ids
     numpy.place(out, shuffled_mask, wrongparticles)
-    return _make_submission(truth, out)
+    return _make_submission(truth['hit_id'], out)
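
The signature change in this file (helpers now receive the `hit_id` column directly instead of a whole mapping frame) can be exercised with a small standalone sketch. This is not the library's code: the diff shows only part of `_make_submission`, so the shuffle step below is an assumption based on the docstring ("random small integers"), and an explicit `numpy.random.Generator` argument (`rng`) is swapped in for the module-level seed purely to keep the example reproducible.

```python
import numpy as np
import pandas as pd


def make_submission(hit_ids, track_ids, renumber=True, rng=None):
    """Build a DataFrame with hit_id and track_id columns.

    Standalone stand-in for the private helper in the diff; the shuffle of
    the new labels is an assumption, not confirmed by the visible hunk.
    """
    rng = np.random.default_rng() if rng is None else rng
    track_ids = np.asarray(track_ids)
    if renumber:
        # Map arbitrary ids onto 1..n, where n is the number of distinct tracks.
        unique_ids, inverse = np.unique(track_ids, return_inverse=True)
        numbers = np.arange(1, len(unique_ids) + 1, dtype=unique_ids.dtype)
        rng.shuffle(numbers)  # assumed: randomize which track gets which label
        track_ids = numbers[inverse]
    return pd.DataFrame({'hit_id': np.asarray(hit_ids), 'track_id': track_ids})


def random_solution(hits, ntracks, rng=None):
    """Assign every hit to a uniformly random track id in 1..ntracks."""
    rng = np.random.default_rng() if rng is None else rng
    ids = rng.integers(1, ntracks + 1, size=len(hits), dtype=np.int32)
    return make_submission(hits['hit_id'], ids, renumber=False, rng=rng)


# Only a hit_id column is needed, which is why the new signature takes the
# hits frame rather than the truth frame.
hits = pd.DataFrame({'hit_id': [1, 2, 3, 4, 5]})
submission = random_solution(hits, ntracks=3, rng=np.random.default_rng(0))
print(submission)

# Renumbering keeps hits of the same track together under a new small label.
renumbered = make_submission([1, 2, 3, 4], [100, 100, 7, 42],
                             rng=np.random.default_rng(0))
print(renumbered)
```

Because `random_solution` needs nothing beyond `hit_id`, it can also be applied to the test dataset, which per the README contains only the recorded hits.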
