Commit 489bc83
feat(dataset): hybrid IP storage with parallel batch processing (#128)
* feat(dataset): hybrid IP storage with parallel batch processing
Implement memory-efficient hybrid storage that separates individual IPs
from CIDR ranges, significantly reducing memory usage for typical workloads.
## Changes
- Add IPMap using sync.Map for individual IP addresses (O(1) lookups)
- Keep BART (RangeSet) only for CIDR ranges requiring LPM
- Parallelize Add/Remove operations using sync.WaitGroup.Go() (Go 1.24+)
- CheckIP() checks IPMap first, falls back to RangeSet for range matching
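The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Store`, `rangeEntry`, and the linear scan over ranges are assumptions made to keep the example self-contained (the real range storage is a BART trie).

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
)

// Store is a hypothetical stand-in for the hybrid dataset: exact IPs live in
// a sync.Map, CIDR ranges in a slice that is scanned for the longest match.
// The commit uses a BART trie for ranges; the slice keeps this sketch small.
type Store struct {
	ips    sync.Map // netip.Addr -> remediation string
	ranges []rangeEntry
}

type rangeEntry struct {
	prefix      netip.Prefix
	remediation string
}

// Add routes a decision value to the map (single IP) or the range list (CIDR).
func (s *Store) Add(value, remediation string) error {
	if addr, err := netip.ParseAddr(value); err == nil {
		s.ips.Store(addr, remediation)
		return nil
	}
	prefix, err := netip.ParsePrefix(value)
	if err != nil {
		return fmt.Errorf("not an IP or CIDR: %q", value)
	}
	s.ranges = append(s.ranges, rangeEntry{prefix.Masked(), remediation})
	return nil
}

// CheckIP tries the exact-match map first, then falls back to the ranges and
// returns the remediation of the longest matching prefix.
func (s *Store) CheckIP(ip string) (string, bool) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", false
	}
	if v, ok := s.ips.Load(addr); ok {
		return v.(string), true
	}
	best, found, rem := -1, false, ""
	for _, e := range s.ranges {
		if e.prefix.Contains(addr) && e.prefix.Bits() > best {
			best, found, rem = e.prefix.Bits(), true, e.remediation
		}
	}
	return rem, found
}

func main() {
	s := &Store{}
	_ = s.Add("192.0.2.10", "ban")
	_ = s.Add("192.0.2.0/24", "captcha")
	fmt.Println(s.CheckIP("192.0.2.10")) // exact hit in the map
	fmt.Println(s.CheckIP("192.0.2.99")) // falls back to the range
}
```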
## Memory Optimization
Most real-world deployments receive individual IP decisions, not ranges.
BART's radix trie has significant per-entry overhead optimized for prefix
matching. By storing individual IPs in a simple map:
- **1K IPs**: BART ~598KB → Map significantly less
- **10K IPs**: BART ~5.6MB → Map significantly less
- **50K IPs**: BART ~26.5MB → Map significantly less
## Performance
- IPMap lookup: ~72 ns/op, 0 allocs
- RangeSet lookup: ~69 ns/op, 0 allocs
- Parallel batch processing for Add/Remove operations
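One way the parallel batch step might look is sketched below. The function name and the string-based split are assumptions for illustration; the sketch uses the classic `wg.Add`/`wg.Done` pattern rather than `sync.WaitGroup.Go()` so that it compiles on older toolchains as well.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// applyBatch splits decision values into single-IP and CIDR groups and
// processes the two groups concurrently. Each goroutine writes only to its
// own result slice, so no extra locking is needed. All names are hypothetical.
func applyBatch(values []string) (ips, ranges []string) {
	var ipOps, rangeOps []string
	for _, v := range values {
		if strings.Contains(v, "/") {
			rangeOps = append(rangeOps, v)
		} else {
			ipOps = append(ipOps, v)
		}
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		ips = append(ips, ipOps...) // stand-in for the IPMap batch add
	}()
	go func() {
		defer wg.Done()
		ranges = append(ranges, rangeOps...) // stand-in for the RangeSet batch add
	}()
	wg.Wait() // both groups are fully applied before returning
	return ips, ranges
}

func main() {
	ips, ranges := applyBatch([]string{"192.0.2.1", "10.0.0.0/8", "192.0.2.2"})
	fmt.Println(len(ips), len(ranges))
}
```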
## Breaking Changes
None - API unchanged, internal storage optimization only.
* fix(ipmap): prevent data races with concurrent readers
Address Copilot review comments:
1. Clone RemediationIdsMap before modifying in add() and remove()
   - Prevents race between Contains() readers and Add/Remove modifiers
   - sync.Map's Load→Modify→Store pattern is unsafe with mutable values
2. Fix version comment: sync.WaitGroup.Go was added in Go 1.23, not 1.24
3. Remove trailing whitespace in benchmark_test.go
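The clone-before-modify fix from item 1 can be illustrated with a small sketch. The `remediations` and `ipMap` types are hypothetical simplifications, and the example assumes a single writer; the point is only that a map already visible to readers is never mutated in place.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// remediations maps a remediation name to its origin (assumed shape).
type remediations map[string]string

// ipMap is a hypothetical, simplified IPMap keyed by the IP string.
type ipMap struct{ m sync.Map }

// add never mutates a map that readers may already hold: it clones the
// current value, edits the clone, and stores the clone back.
func (p *ipMap) add(ip, remediation, origin string) {
	next := remediations{}
	if cur, ok := p.m.Load(ip); ok {
		next = maps.Clone(cur.(remediations))
	}
	next[remediation] = origin
	p.m.Store(ip, next)
}

// get returns the current snapshot for ip, or nil if there is none.
func (p *ipMap) get(ip string) remediations {
	if cur, ok := p.m.Load(ip); ok {
		return cur.(remediations)
	}
	return nil
}

func main() {
	p := &ipMap{}
	p.add("192.0.2.1", "ban", "crowdsec")
	before := p.get("192.0.2.1") // a reader's snapshot
	p.add("192.0.2.1", "captcha", "cscli")
	fmt.Println(len(before), len(p.get("192.0.2.1"))) // snapshot is unchanged
}
```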
* refactor(dataset): rename BartUnifiedIPSet to BartRangeSet
More descriptive name since it now only handles CIDR ranges,
while individual IPs are stored in IPMap.
* feat(dataset): Consistent COW for lock free reads (#129)
* perf: lock-free reads for IPMap and CNSet
Use atomic pointers for completely lock-free reads across all data structures.
SPOA handlers never block, even during batch updates.
Changes:
1. IPMap: atomic.Pointer per entry for individual IPs
2. CNSet: atomic.Pointer for entire country map (small, cloning is cheap)
   - Added NewCNSet() constructor for consistency
   - Added defensive nil checks in Add/Remove/Contains
3. BartRangeSet: already uses atomic pointer (unchanged)
Concurrency model (consistent across all):
- Reads: atomic pointer load (instant, never blocks)
- Writes: clone → modify → atomic store (copy-on-write)
- Readers always see consistent state (old or new, never partial)
This ensures SPOA always has immediate access to decision data,
regardless of ongoing batch updates from the stream bouncer.
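The read/write model above (atomic load for reads, clone → modify → atomic store for writes) might look like the following for the country set. This is a sketch under assumptions: the field name `data` and the `map[string]string` payload are illustrative, and it relies on there being a single writer.

```go
package main

import (
	"fmt"
	"maps"
	"sync/atomic"
)

// CNSet holds the whole country map behind one atomic pointer.
// The map type (country code -> remediation) is an assumption.
type CNSet struct {
	data atomic.Pointer[map[string]string]
}

// NewCNSet returns a set whose pointer already refers to an empty map.
func NewCNSet() *CNSet {
	s := &CNSet{}
	empty := map[string]string{}
	s.data.Store(&empty)
	return s
}

// Contains is lock-free: one atomic load, then a plain map read on a
// snapshot that no writer will ever modify.
func (s *CNSet) Contains(cn string) (string, bool) {
	m := s.data.Load()
	if m == nil { // defensive: zero-value CNSet
		return "", false
	}
	r, ok := (*m)[cn]
	return r, ok
}

// Add copies the current map, edits the copy, and publishes it atomically.
// Readers see either the old map or the new one, never a partial update.
func (s *CNSet) Add(cn, remediation string) {
	next := map[string]string{}
	if old := s.data.Load(); old != nil {
		maps.Copy(next, *old)
	}
	next[cn] = remediation
	s.data.Store(&next)
}

func main() {
	s := NewCNSet()
	s.Add("FR", "ban")
	fmt.Println(s.Contains("FR"))
	fmt.Println(s.Contains("US"))
}
```

Cloning the whole map on every write is only reasonable because the country set is small; the per-entry pointers described for IPMap avoid copying a large map.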
* refactor(ipmap): simplify addLocked - remove unreachable LoadOrStore fallback
Since writeMu is held for the entire batch, no other writer can race.
The LoadOrStore fallback code was unreachable - simplified to use Store directly.
* Simplify RemediationMap: remove ID tracking, optimize cloning
- Changed from map[Remediation][]RemediationDetails to map[Remediation]string
- Removed ID tracking (LAPI only sends longest decisions)
- Simplified Add/Remove methods (no ID parameters)
- Optimized IPMap cloning: skip clone for empty maps and single-entry removals
- Updated all callers to use new simplified API
- Reduced memory allocations and GC pressure
* Remove writeMu mutex from IPMap - single writer guarantee
- Removed writeMu mutex (unnecessary with single writer thread)
- Renamed addLocked/removeLocked to add/remove
- Updated comments to reflect single-writer, multiple-reader model
- Stream handler processes decisions sequentially, no concurrent writes
- Atomic pointer operations handle reader-writer synchronization safely
* Address Copilot review comments: remove unused ID parameters
- Remove unused 'id' parameter from CNSet.Add and CNSet.Remove methods
- Update addCN/removeCN helper functions to match new signatures
- Update outdated comments: 'merging IDs' -> 'merging remediations'
- Optimize CNSet.Add preallocation: remove +1 to avoid overallocation
- Fix map initialization syntax in CNSet.Add
* Remove unused ID fields from operation structs
- Remove ID field from IPAddOp, IPRemoveOp, BartAddOp, BartRemoveOp
- Remove id field from internal cnOp struct
- Update all operation struct literals to remove ID assignments
- Update benchmark tests to match new struct definitions
- IDs are no longer tracked, since LAPI only sends the longest decision
* Add comprehensive metrics tests and optimize no-op handling
- Add comprehensive test suite for metrics tracking (IPMap, BartRangeSet, CNSet)
- Fix BartRangeSet no-op check to use exact prefix lookup (Get) instead of LPM
- Add HasRemediation and GetOriginForRemediation methods to BartRangeSet for exact prefix matching
- Optimize ipType assignment to only occur after no-op checks
- Remove pre-allocation of operation slices to avoid wasting memory when many decisions are no-ops
- Replace hardcoded metric increments with len(decisions) for better readability
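The difference between a longest-prefix match and an exact prefix lookup in the no-op check can be shown with a small sketch. The `rangeSet` type and its methods are hypothetical; a plain map stands in for the BART table.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rangeSet is a hypothetical stand-in for the range storage: prefix -> remediation.
type rangeSet map[netip.Prefix]string

// add stores the canonical (masked) form of the CIDR.
func (s rangeSet) add(cidr, remediation string) {
	s[netip.MustParsePrefix(cidr).Masked()] = remediation
}

// covered answers the LPM-style question: does any stored prefix contain cidr?
func (s rangeSet) covered(cidr string) bool {
	p := netip.MustParsePrefix(cidr).Masked()
	for stored := range s {
		if stored.Bits() <= p.Bits() && stored.Contains(p.Addr()) {
			return true
		}
	}
	return false
}

// hasExact answers the question the no-op check needs: is cidr itself stored?
func (s rangeSet) hasExact(cidr string) bool {
	_, ok := s[netip.MustParsePrefix(cidr).Masked()]
	return ok
}

func main() {
	s := rangeSet{}
	s.add("10.0.0.0/8", "ban")
	// The /16 is covered by the /8, but it is not stored itself, so adding
	// it must not be treated as a no-op.
	fmt.Println(s.covered("10.1.0.0/16"), s.hasExact("10.1.0.0/16")) // true false
}
```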
* Fix outdated comment: replace 'ID' with 'remediation'
- Update comment in bart_types.go to reflect that IDs are no longer tracked
- Addresses Copilot review feedback from PR #129
* Refactor Remove() to return error for cleaner duplicate delete handling
- Add ErrRemediationNotFound sentinel error
- Update RemediationMap.Remove() to return error instead of silently ignoring
- Update all call sites to use errors.Is() for cleaner error checking
- Simplify RemoveBatch logic - no need to check existence before calling Remove()
- Metrics are only updated for actual removals, not duplicate deletes
This makes the code easier to follow and more explicit about error handling.
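A minimal sketch of the sentinel-error pattern follows. `ErrRemediationNotFound` and `RemediationMap.Remove()` are named in this commit, but the signatures, the map's value type, and the `removeBatch` helper shown here are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRemediationNotFound signals that there was nothing to remove.
var ErrRemediationNotFound = errors.New("remediation not found")

// RemediationMap maps a remediation to its origin (assumed value meaning).
type RemediationMap map[string]string

// Remove deletes the remediation or reports that it was not present.
func (m RemediationMap) Remove(remediation string) error {
	if _, ok := m[remediation]; !ok {
		return ErrRemediationNotFound
	}
	delete(m, remediation)
	return nil
}

// removeBatch (hypothetical helper) returns how many entries were actually
// removed, which is the number a metrics counter should be decremented by.
func removeBatch(m RemediationMap, remediations []string) (int, error) {
	removed := 0
	for _, r := range remediations {
		err := m.Remove(r)
		if errors.Is(err, ErrRemediationNotFound) {
			continue // duplicate delete: not an error, not counted
		}
		if err != nil {
			return removed, err
		}
		removed++
	}
	return removed, nil
}

func main() {
	m := RemediationMap{"ban": "crowdsec"}
	n, err := removeBatch(m, []string{"ban", "ban", "captcha"})
	fmt.Println(n, err)
}
```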
* refactor(metrics): use WithLabelValues instead of With for better performance
Replace prometheus.With(Labels{...}) with WithLabelValues() to avoid
map allocation overhead. This aligns with the optimization from main branch.
- Remove unused prometheus import
- Update all metrics calls to use WithLabelValues
- Add comments indicating label order for clarity
* fix(lint): remove unused nolint directive in metrics tests
* fix: verify origin matches before removing decisions
- Add origin verification in IPMap.remove() to prevent removing decisions
  when origin has been overwritten (e.g., by CAPI)
- Add same origin check in BartRangeSet.RemoveBatch() for range removals
- Remove optimization that deleted IP when only one remediation existed
  to handle edge cases where same origin changes remediation types
- Ensures metrics are decremented correctly and prevents orphaned decisions
Fixes issue where unsubscribing from blocklists didn't properly remove
decisions when origins were overwritten by other sources.
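The origin check can be sketched as below. The `decisions` type and `remove` method are hypothetical simplifications of the per-IP state; the idea is that a delete only takes effect when the stored origin still matches the origin of the delete request.

```go
package main

import "fmt"

// decisions maps a remediation to the origin that currently owns it
// (an assumed, simplified shape of the per-IP state).
type decisions map[string]string

// remove deletes the remediation only when the stored origin still matches.
// It reports whether anything was removed, so callers can update metrics
// only for real removals.
func (d decisions) remove(remediation, origin string) bool {
	cur, ok := d[remediation]
	if !ok || cur != origin {
		return false // missing, or overwritten by another origin
	}
	delete(d, remediation)
	return true
}

func main() {
	d := decisions{"ban": "lists:example"}
	d["ban"] = "CAPI" // another source overwrites the origin

	fmt.Println(d.remove("ban", "lists:example")) // false: origin no longer matches
	fmt.Println(d.remove("ban", "CAPI"))          // true: owner removes its decision
}
```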
* fix: correct Go version in comments and add origin overwrite test
- Fix version documentation: sync.WaitGroup.Go was added in Go 1.22, not 1.23
- Add TestMetrics_OriginOverwriteAndDelete to verify metrics correctness
when same origin changes remediation types (ban -> captcha -> delete ban)
- Test ensures origin verification works correctly and metrics are accurate

1 parent 2b0de18, commit 489bc83
8 files changed in pkg/dataset: +1781, -218 lines