Commit 70b9050
Support proper numpy integration for ~100x performance boost (#259)
# flatdata-py performance: vectorized access and scalar optimization
## What
Adds NumPy-based vectorized field access to flatdata-py and optimizes the scalar (element-by-element) read path. Also fixes a pre-existing bug in `read_value()` for unaligned 64-bit fields.
## Changes
### Vectorized access (`data_access.py`, `resources.py`)
- `read_field_vectorized()`: reads a bit-packed field from all vector elements at once via NumPy, returning an `ndarray`. Zero-copy over the mmap'd buffer.
- `Vector.__getattr__("field")` (that is, `vector.field`) returns a NumPy array for the field.
- `Vector.to_numpy()` / `to_data_frame()` return all fields at once.
- `_VectorSlice` gets the same vectorized methods.
- Results are cached per vector instance via `_as_numpy_2d()`.
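
The vectorized read described above can be illustrated with a minimal sketch. This is not the code from `data_access.py`: the function name `read_field_sketch`, its parameters, and the `pack` helper are hypothetical, the layout is assumed to be little-endian, and the sketch only handles fields whose bits span at most 8 bytes.

```python
import numpy as np


def read_field_sketch(buf, struct_size, offset_bits, width, signed=False):
    """Read one little-endian bit-packed field from every element in `buf`."""
    byte_offset, shift = divmod(offset_bits, 8)
    nbytes = (shift + width + 7) // 8
    if nbytes > 8:
        raise ValueError("this sketch only handles fields spanning at most 8 bytes")

    # Zero-copy 2D view over the buffer: one row of bytes per element.
    rows = np.frombuffer(buf, dtype=np.uint8).reshape(-1, struct_size)

    # Assemble the bytes that contain the field into one uint64 per element.
    raw = np.zeros(rows.shape[0], dtype=np.uint64)
    for i in range(nbytes):
        raw |= rows[:, byte_offset + i].astype(np.uint64) << np.uint64(8 * i)

    # Drop the bits below the field, then the bits above it.
    raw >>= np.uint64(shift)
    if width < 64:
        raw &= np.uint64((1 << width) - 1)
    if not signed:
        return raw

    # Sign-extend from `width` bits to a full int64.
    values = raw.astype(np.int64)
    if width < 64:
        sign_bit = np.int64(1 << (width - 1))
        values = (values ^ sign_bit) - sign_bit
    return values


def pack(a, b):
    # Hypothetical 4-byte element: `a` is 3 bits at offset 0, `b` is 12 bits at offset 3.
    return ((a & 0x7) | ((b & 0xFFF) << 3)).to_bytes(4, "little")


buf = pack(5, -7) + pack(2, 100) + pack(7, -2048)
col_a = read_field_sketch(buf, 4, 0, 3)
col_b = read_field_sketch(buf, 4, 3, 12, signed=True)
```

Only the initial `np.frombuffer` view is zero-copy; the shifted and masked result is a new array, which is one reason caching the result per vector instance pays off.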
### Pre-computed field readers (`data_access.py`, `structure.py`)
- `make_field_reader(offset, width, signed)` builds a specialized closure with all constants (byte offset, bit shift, mask, sign handling) pre-computed. Six variants cover the cross-product of field types.
- `Structure.__init_subclass__` builds a `_READERS` dict once per class.
- `__getattr__`, `as_dict`, `as_list`, `as_tuple`, `as_nparray` all use `_READERS`.
- `read_value()` is preserved as a thin wrapper around `make_field_reader` for one-off reads.
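
As a rough illustration of the pre-computed reader idea, the sketch below builds a closure whose constants are all derived once, up front. The name `make_reader_sketch` and the `(data, pos)` call signature of the returned function are assumptions, not the actual API, and only two variants (signed and unsigned) are shown rather than the six mentioned above.

```python
def make_reader_sketch(offset, width, signed):
    """Return a function (data, pos) -> int that reads one bit-packed field."""
    # Everything below is computed once, not on every read.
    byte_offset, shift = divmod(offset, 8)
    nbytes = (shift + width + 7) // 8
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    from_bytes = int.from_bytes  # local alias avoids repeated attribute lookups

    if signed:
        def read(data, pos):
            start = pos + byte_offset
            raw = (from_bytes(data[start:start + nbytes], "little") >> shift) & mask
            return (raw ^ sign_bit) - sign_bit  # sign-extend from `width` bits
    else:
        def read(data, pos):
            start = pos + byte_offset
            return (from_bytes(data[start:start + nbytes], "little") >> shift) & mask
    return read


# Hypothetical 4-byte element: 3-bit unsigned field at bit 0, 12-bit signed field at bit 3.
element = (5 | ((-7 & 0xFFF) << 3)).to_bytes(4, "little")
read_a = make_reader_sketch(0, 3, signed=False)
read_b = make_reader_sketch(3, 12, signed=True)
```

Choosing the variant when the closure is built means the per-read path contains no branching on field type, which is the point of building the readers once per class.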
### Bug fix (`data_access.py`)
- `read_value()` for 64-bit fields at non-byte-aligned offsets could return values wider than 64 bits (Python arbitrary-precision ints). The bit mask was only applied when `num_bits < 64`, missing the case where `offset_extra_bits > 0`. Fixed by masking when `num_bits < 64 or offset_extra_bits > 0`.
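
The effect of the mask condition can be reproduced with a simplified model. The function below is not the real `read_value()`; it reuses the names `num_bits` and `offset_extra_bits` from the description, assumes a little-endian layout, and takes a hypothetical `fixed` flag to switch between the old and new conditions.

```python
def read_unsigned_sketch(data, offset_bits, num_bits, fixed=True):
    """Read an unsigned bit-packed field, with either the old or the new mask rule."""
    offset_bytes, offset_extra_bits = divmod(offset_bits, 8)
    total_bytes = (offset_extra_bits + num_bits + 7) // 8
    value = int.from_bytes(data[offset_bytes:offset_bytes + total_bytes], "little")
    value >>= offset_extra_bits

    if fixed:
        needs_mask = num_bits < 64 or offset_extra_bits > 0  # new condition
    else:
        needs_mask = num_bits < 64  # old condition
    if needs_mask:
        value &= (1 << num_bits) - 1
    return value


# Hypothetical 9-byte element: 3-bit field, then a 64-bit field at bit 3,
# then a 5-bit field at bit 67 whose bits sit directly above the 64-bit field.
wide = 0x0123456789ABCDEF
packed = (0b101 | (wide << 3) | (0b11111 << 67)).to_bytes(9, "little")

old_result = read_unsigned_sketch(packed, 3, 64, fixed=False)
new_result = read_unsigned_sketch(packed, 3, 64, fixed=True)
```

With the old condition, the five bits of the neighbouring field remain above bit 63, so `old_result` needs more than 64 bits; with the new condition, `new_result` equals `wide`.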
### Other
- `__slots__ = ()` added to generated Structure subclasses (generator template + 10 golden files). Reduces instance size from 72 to 48 bytes.
- `Vector.__iter__` uses local variable caching to avoid repeated attribute lookups.
- Removed unnecessary `list()` on dict keys in `Archive.__getattr__`.
- Performance tips section added to `flatdata-py/README.md`.
- Version bump: flatdata-generator and flatdata-py both 0.4.10 → 0.4.11.
- CI workflow updated to install local generator before flatdata-py (`py.yml`).
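
To show why the `__slots__ = ()` change shrinks instances, here is a small standalone sketch. The classes are hypothetical stand-ins, and it assumes the base class itself declares `__slots__`, since an empty `__slots__` in a subclass only helps when no ancestor already provides a per-instance `__dict__`.

```python
class BaseSketch:
    # Stand-in for a base class that stores its state in slots.
    __slots__ = ("_data", "_pos")

    def __init__(self, data, pos):
        self._data = data
        self._pos = pos


class WithoutSlots(BaseSketch):
    # No __slots__ here, so every instance gets its own __dict__.
    pass


class WithSlots(BaseSketch):
    # Empty __slots__: no new attributes and no per-instance __dict__.
    __slots__ = ()


plain = WithoutSlots(b"", 0)
slotted = WithSlots(b"", 0)
```

The exact byte counts depend on the Python version and platform, so this sketch only checks for the presence of `__dict__` rather than reproducing the sizes quoted above.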
## Performance
Measured on a vector from a test archive (5.8M elements, 20 fields, 32 bytes each):
| Access pattern | Before | After |
|---|---|---|
| Scalar iteration (1 field) | 9.7s | 5.8s |
| Vectorized column access (1 field) | n/a | 0.07s |
---------
Signed-off-by: Christian Vetter <christian.vetter@here.com>

1 parent 0a6cb89 commit 70b9050
21 files changed
Lines changed: 436 additions & 24 deletions
File tree
- .github/workflows
- flatdata-generator
  - flatdata/generator/templates/py
  - tests/generators/py_expectations
    - archives
    - structs
- flatdata-py
  - flatdata/lib
  - tests