`flatdata-py` supports two data access patterns with very different performance characteristics on large archives.

Iterating over a vector yields one Python object per element. Each field access unpacks bits from the underlying memory-mapped data. This is fine for accessing individual elements or small ranges, but has significant per-element overhead for bulk operations:

```python
count = sum(1 for x in archive.links if x.speed_limit > 100)
```

For bulk operations, use the vectorized access methods that read fields directly into NumPy arrays:

```python
import numpy as np

# single column access, returns a pandas DataFrame
df = archive.links.speed_limit
count = len(df[df['speed_limit'] > 100])

# full NumPy structured array with all fields
arr = archive.links.to_numpy()
count = int(np.sum(arr['speed_limit'] > 100))

# slices work too
arr = archive.links[1000:2000].to_numpy()
df = archive.links[::10].to_data_frame()
```

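To get a feel for the difference between the two patterns without an archive at hand, the following sketch uses a plain NumPy structured array as a stand-in for the kind of array `to_numpy()` is described as returning. It does not use `flatdata-py`; the field layout (`speed_limit`, `length`) and the helper names `count_per_element` and `count_vectorized` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical stand-in for a structured array with a `speed_limit` field.
# The dtype and the random contents are assumptions for illustration.
rng = np.random.default_rng(seed=0)
links = np.zeros(100_000, dtype=[("speed_limit", np.uint8), ("length", np.uint32)])
links["speed_limit"] = rng.integers(0, 131, size=links.size, dtype=np.uint8)


def count_per_element(arr, threshold):
    """Visit every record in Python, one element at a time."""
    return sum(1 for rec in arr if rec["speed_limit"] > threshold)


def count_vectorized(arr, threshold):
    """Compare the whole column at once and count the matching entries."""
    return int(np.count_nonzero(arr["speed_limit"] > threshold))


slow = count_per_element(links, 100)
fast = count_vectorized(links, 100)
print(slow == fast)
```

Both functions return the same count; wrapping each call in `timeit.timeit` is one way to compare their cost on your own data.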
* Use `vector.field_name` (column access) when you only need one or a few fields.
* Use `vector.to_numpy()` or `vector.to_data_frame()` when you need all fields at once.
* Use `vector[i].field` for random access to individual elements.
* The underlying data is memory-mapped; the OS pages it from disk on demand. Vectorized results are materialized as NumPy arrays in RAM.
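The distinction between memory-mapped data and materialized arrays can be illustrated with NumPy alone. This is a generic sketch using `np.memmap` on a temporary file, not a description of how `flatdata-py` is implemented; the file name and contents are made up for the example.

```python
import os
import tempfile

import numpy as np

# Write a small binary file to stand in for archive data on disk.
path = os.path.join(tempfile.mkdtemp(), "speed_limits.bin")
np.arange(10_000, dtype=np.uint16).tofile(path)

# Memory-map the file: nothing is read in bulk here, the OS loads
# pages only when the corresponding elements are touched.
mapped = np.memmap(path, dtype=np.uint16, mode="r")

# A slice of a memmap is still backed by the mapped file.
window = mapped[1000:2000]

# Copying materializes the window as an ordinary in-RAM array.
in_ram = np.array(window)

print(type(window).__name__, type(in_ram).__name__, in_ram.nbytes)
```

The practical consequence is that slicing before materializing keeps RAM usage proportional to the slice, not to the whole file.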
## Using the inspector
`flatdata-py` comes with a handy tool called the `flatdata-inspector` to inspect the contents of an archive: