A python library to simplify working with the dataset of the tracking machine
5
-
learning challenge.
4
+
A python library to simplify working with the
5
+
[High Energy Physics Tracking Machine Learning challenge][kaggle_trackml]
6
+
dataset.
6
7
7
8
Installation
8
9
------------
@@ -50,9 +51,10 @@ for event_id, hits, cells, particles, truth in load_dataset('path/to/dataset'):
50
51
...
51
52
```
52
53
53
-
Each event is lazily loaded during the iteration. Options are available to
54
-
read only a subset of available events or only read selected parts, e.g. only
55
-
hits or only particles.
54
+
The dataset path can be the path to a directory or to a zip file containing the
55
+
event `.csv` files. Each event is lazily loaded during the iteration. Options
56
+
are available to read only a subset of available events or only read selected
57
+
parts, e.g. only hits or only particles.
56
58
57
59
To generate a random test submission from truth information and compute the
58
60
expected score:
@@ -65,8 +67,8 @@ shuffled = shuffle_hits(truth, 0.05) # 5% probability to reassign a hit
65
67
score = score_event(truth, shuffled)
66
68
```
67
69
68
-
All methods either take or return `pandas.DataFrame` objects. Please have a look
69
-
at the function docstrings for detailed documentation.
70
+
All methods either take or return `pandas.DataFrame` objects. You can have a
71
+
look at the function docstrings for detailed information.
70
72
71
73
Authors
72
74
-------
@@ -94,9 +96,11 @@ some hits can be left unassigned). The training dataset contains the recorded
94
96
hits, their truth association to particles, and the initial parameters of those
95
97
particles. The test dataset contains only the recorded hits.
96
98
97
-
The dataset is provided as a set of plain `.csv` files ('.csv.gz' or '.csv.bz2'
98
-
are also allowed)'. Each event has four associated files that contain hits,
99
-
hit cells, particles, and the ground truth association between them. The common prefix (like `event000000000`) is fully constrained to be `event` followed by 9 digits.
99
+
The dataset is provided as a set of plain `.csv` files (`.csv.gz` or `.csv.bz2`
100
+
are also allowed). Each event has four associated files that contain hits, hit
101
+
cells, particles, and the ground truth association between them. The common
102
+
prefix (like `event000000000`) is fully constrained to be `event` followed by 9
103
+
digits.
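
The naming rule above can be checked mechanically. The following is a minimal sketch, not part of the library: `parse_event_file` is a hypothetical helper, and the `particles` and `truth` part names are assumed by analogy with the `hits` and `cells` file names.

```python
import re

# Assumed pattern: `event` + 9 digits, a part name, `.csv`, and an optional
# `.gz` or `.bz2` suffix. The `particles` and `truth` part names are assumptions.
EVENT_FILE = re.compile(
    r"^(event\d{9})-(hits|cells|particles|truth)\.csv(?:\.(gz|bz2))?$"
)


def parse_event_file(name):
    """Return (prefix, part, compression), or None if the name does not match."""
    match = EVENT_FILE.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```

A name such as `event123-hits.csv` is rejected because the prefix does not have exactly 9 digits.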
100
104
101
105
event000000000-hits.csv
102
106
event000000000-cells.csv
@@ -132,15 +136,17 @@ are given here to simplify detector-specific data handling.
132
136
### Event hit cells
133
137
134
138
The cells file contains the constituent active detector cells that comprise each
135
-
hit. A cell is the smallest granularity inside each detector module, much like a pixel on a screen, except that depending on the volume_id a cell can be a square or a long rectangle. It is
136
-
identified by two channel identifiers that are unique within each detector
137
-
module and encode the position, much like row/column numbers of a matrix. A cell can provide signal information that the
138
-
detector module has recorded in addition to the position. Depending on the
139
-
detector type only one of the channel identifiers is valid, e.g. for the strip
140
-
detectors, and the value might have different resolution.
139
+
hit. A cell is the smallest granularity inside each detector module, much like a
140
+
pixel on a screen, except that depending on the volume_id a cell can be a square
141
+
or a long rectangle. It is identified by two channel identifiers that are unique
142
+
within each detector module and encode the position, much like column/row
143
+
numbers of a matrix. A cell can provide signal information that the detector
144
+
module has recorded in addition to the position. Depending on the detector type
145
+
only one of the channel identifiers is valid, e.g. for the strip detectors, and
146
+
the value might have different resolution.
141
147
142
148
* **hit_id**: numerical identifier of the hit as defined in the hits file.
143
-
* **ch0, ch1**: channel identifier/coordinates unique with one module.
149
+
* **ch0, ch1**: channel identifier/coordinates unique within one module.
144
150
* **value**: signal value information, e.g. how much charge a particle has
145
151
deposited.
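
To illustrate how cells relate to hits, the sketch below counts the cells of each hit and sums their signal values. It uses only the standard library; the toy rows are made up, and a header row with the column names listed above is assumed.

```python
import csv
import io
from collections import defaultdict

# Toy cells data, made up for illustration; a header row is assumed.
CELLS_CSV = """hit_id,ch0,ch1,value
1,10,20,0.25
1,10,21,0.50
2,7,3,1.00
"""


def summarize_cells(csv_text):
    """Map each hit_id to (number of cells, total signal value)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        hit_id = int(row["hit_id"])
        counts[hit_id] += 1
        totals[hit_id] += float(row["value"])
    return {hit_id: (counts[hit_id], totals[hit_id]) for hit_id in counts}


summary = summarize_cells(CELLS_CSV)
```

With `pandas`, the same aggregation would typically be a `groupby` on `hit_id`.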
146
152
@@ -149,7 +155,8 @@ detectors, and the value might have different resolution.
149
155
The particles file contains the following values for each particle/entry:
150
156
151
157
* **particle_id**: numerical identifier of the particle inside the event.
152
-
* **vx, vy, vz**: initial position (in millimeters) (vertex) in global coordinates.
158
+
* **vx, vy, vz**: initial position or vertex (in millimeters) in global
159
+
coordinates.
153
160
* **px, py, pz**: initial momentum (in GeV/c) along each global axis.
154
161
* **q**: particle charge (as multiple of the absolute electron charge).
155
162
* **nhits**: number of hits generated by this particle.
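
Derived quantities follow directly from the momentum components. As a small sketch (the helper names are hypothetical, and using the global z axis as the reference direction for the transverse momentum is an assumption):

```python
import math


def transverse_momentum(px, py):
    """Momentum magnitude (GeV/c) in the plane transverse to the global z axis."""
    return math.hypot(px, py)


def total_momentum(px, py, pz):
    """Magnitude (GeV/c) of the full momentum vector."""
    return math.sqrt(px * px + py * py + pz * pz)
```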
@@ -165,23 +172,31 @@ particle/track.
165
172
* **hit_id**: numerical identifier of the hit as defined in the hits file.
166
173
* **particle_id**: numerical identifier of the generating particle as defined
167
174
in the particles file.
168
-
* **tx, ty, tz** true intersection point in global coordinates (in millimeters) between
169
-
the particle trajectory and the sensitive surface.
170
-
* **tpx, tpy, tpz** true particle momentum (in GeV/c) in the global coordinate system
171
-
at the intersection point. The corresponding unit vector is tangent to the particle trajectory.
175
+
* **tx, ty, tz**: true intersection point in global coordinates (in
176
+
millimeters) between the particle trajectory and the sensitive surface.
177
+
* **tpx, tpy, tpz**: true particle momentum (in GeV/c) in the global
178
+
coordinate system at the intersection point. The corresponding vector
179
+
is tangent to the particle trajectory at the intersection point.
172
180
* **weight**: per-hit weight used for the scoring metric; total sum of weights
173
181
within one event equals one.
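
The normalization of the weights gives a cheap sanity check when loading truth data. A minimal sketch with made-up rows; `weights_are_normalized` is a hypothetical helper and the tolerance is an arbitrary choice:

```python
import math

# Toy truth rows (hit_id, particle_id, weight), made up for illustration.
truth_rows = [(1, 101, 0.5), (2, 101, 0.3), (3, 102, 0.2)]


def weights_are_normalized(rows, tol=1e-6):
    """Check that the per-hit weights of one event sum to one within `tol`."""
    total = math.fsum(weight for _, _, weight in rows)
    return math.isclose(total, 1.0, abs_tol=tol)
```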
174
182
175
183
### Dataset submission information
176
184
177
-
The submission file must associate each hit in each event to one and only one reconstructed particle track. The reconstructed tracks must be uniquely identified only within each event. Participants are advised to compress the submission file (with zip, bzip2, gzip) before submission to Kaggle site.
185
+
The submission file must associate each hit in each event to one and only one
186
+
reconstructed particle track. The reconstructed tracks must be uniquely
187
+
identified only within each event. Participants are advised to compress the
188
+
submission file (with zip, bzip2, gzip) before submission to the
189
+
[Kaggle site][kaggle_trackml].
178
190
179
191
* **event_id**: numerical identifier of the event; corresponds to the number
180
192
found in the per-event file name prefix.
181
-
* **hit_id**: numerical identifier (non negative integer) of the hit inside the event as defined in the per-event hits file.
182
-
* **track_id**: user defined numerical identifier (non negative integer) of the track
193
+
* **hit_id**: numerical identifier of the hit inside the event as defined in
194
+
the per-event hits file.
195
+
* **track_id**: user-defined numerical identifier (non-negative integer) of