
feat: IP extension types, extract_ips filters, defang support, and mkdocs API reference#17

Merged
erichutchins merged 16 commits into main from refactor/rust-modernization on Mar 9, 2026


@erichutchins
Owner

Summary

  • IP extension types: IPv4 (UInt32) and IPAddress (Binary/16-byte) Arrow extension types with full Parquet/IPC round-trip support; to_ipv4(), to_address(), to_string() conversion functions
  • IP extraction overhaul: new extract_ips() with defang support (e.g. 192[.]168[.]1[.]1) and filter flags (ipv6, only_public, ignore_private, ignore_loopback, ignore_broadcast); extract_public_ips() and extract_private_ips() convenience shortcuts; extract_all_ips() deprecated in favour of extract_ips()
  • Rust modernization: upgraded ip-extract to 0.2.0, switched to match_iter() for defang support, bitmask-keyed extractor cache, removed unused regex dependency
  • Docs: new docs/ip-types.md (reference snippets + end-to-end Parquet workflow), docs/api-reference.md (auto-generated via mkdocstrings), expanded numpy-style docstrings on all public functions, README badges and IP types example
  • Version bump: 0.1.10 → 0.2.0
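For readers new to the storage layout: the IPv4 type is described as UInt32-backed and IPAddress as 16-byte Binary. A minimal stdlib-only sketch of what those physical encodings look like is shown here. The helper names are hypothetical (they are not the plugin's to_ipv4()/to_address() API), and storing IPv4 values inside the 16-byte form as IPv4-mapped IPv6 (::ffff:a.b.c.d) is our assumption, not something this PR states.

```python
import ipaddress

# IPv4-mapped prefix ::ffff:0:0/96 (assumption: how IPv4 would sit in 16 bytes).
V4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"


def ipv4_to_u32(text: str) -> int:
    """Dotted quad -> unsigned 32-bit integer, as a UInt32 column would hold it."""
    return int(ipaddress.IPv4Address(text))


def u32_to_ipv4(value: int) -> str:
    """Unsigned 32-bit integer -> dotted quad."""
    return str(ipaddress.IPv4Address(value))


def address_to_16_bytes(text: str) -> bytes:
    """Any IPv4/IPv6 literal -> 16 network-order bytes (IPv4 is mapped)."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return V4_MAPPED_PREFIX + addr.packed
    return addr.packed


def bytes_to_address(raw: bytes) -> str:
    """16 bytes -> text, unwrapping IPv4-mapped values back to dotted quads."""
    addr = ipaddress.IPv6Address(raw)
    if addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)
```

Fixed-width integers and byte strings compare and sort without string parsing, which is presumably part of the motivation for dedicated extension types.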

Test plan

  • All existing tests pass
  • New tests: defang extraction, public/private/loopback/broadcast filters, extract_public_ips, extract_private_ips, deprecation warnings
  • New tests: Parquet and IPC round-trips for IPv4 and IPAddress extension types
  • uv run --group docs mkdocs build --strict — zero warnings

🤖 Generated with Claude Code

erichutchins and others added 16 commits February 15, 2026 17:01
- Replace lazy_static with std::sync::OnceLock (removes external dependency)
- Eliminate all .unwrap() calls with proper error handling
- Implement ListString variant in BuilderWrapper for unified type handling
- Pre-allocate IPv6 trie capacity for better performance
- Add inline hints to hot path functions
- Add Criterion benchmarking framework
- Add thiserror dependency for future error types
- Apply clippy fixes and follow Polars linting conventions

Performance improvements: 1-4% faster on key operations (benchmarks included)
…cks and panics

Critical fixes from code review:
- Remove double-registration: delete extension.rs, register only from Python
- Remove IPv6 type (unused), keep IPv4 and IPAddress only
- Convert panic! to PolarsResult in utils.rs with error propagation
- Replace unwrap() with expect() in regex init with BUG messages
- Add SAFETY comments on unsafe Reader::open_mmap blocks
- Fix IpSeriesExt.to_string bug: strip extension wrapper via pl.lit()
- Restore deprecated ipv4_to_numeric/numeric_to_ipv4 methods with warnings

Improvements:
- Remove dead code: pl_ip_address_from_str, unused #[inline] on FFI functions
- Clean up stale comments, fix format string inconsistencies
- Add comprehensive test suite for extension types (19 tests, 100% pass)
- Document Polars upstream limitation (all-null extension panic)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from find_iter to match_iter for automatic defang handling.
Add filter flags (only_public, ignore_private, ignore_loopback,
ignore_broadcast) with cached extractors. Add pl_extract_private_ips.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
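The filter flags and cached extractors described in this commit can be illustrated with Python's ipaddress module. This is a sketch under stated assumptions: the bit values, the naive token regex, and the use of is_global for only_public and 255.255.255.255 for ignore_broadcast are ours. The real extractor is Rust code built on ip-extract, not this.

```python
import ipaddress
import re
from functools import lru_cache
from typing import Callable

# Hypothetical bit assignments (the PR does not document the real ones).
IPV6 = 1 << 0
ONLY_PUBLIC = 1 << 1
IGNORE_PRIVATE = 1 << 2
IGNORE_LOOPBACK = 1 << 3
IGNORE_BROADCAST = 1 << 4

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")
# Naive candidate tokens: runs of hex digits, dots and colons (assumption).
TOKEN = re.compile(r"[0-9A-Fa-f:.]{2,}")


def flags_to_mask(*, ipv6: bool = True, only_public: bool = False,
                  ignore_private: bool = False, ignore_loopback: bool = False,
                  ignore_broadcast: bool = False) -> int:
    """Pack the boolean filter flags into one integer cache key."""
    mask = 0
    for enabled, bit in ((ipv6, IPV6), (only_public, ONLY_PUBLIC),
                         (ignore_private, IGNORE_PRIVATE),
                         (ignore_loopback, IGNORE_LOOPBACK),
                         (ignore_broadcast, IGNORE_BROADCAST)):
        if enabled:
            mask |= bit
    return mask


@lru_cache(maxsize=32)  # 5 flags -> at most 2**5 = 32 distinct extractors
def build_extractor(mask: int) -> Callable[[str], list[str]]:
    """Build one extractor per flag combination; repeat calls hit the cache."""

    def keep(addr) -> bool:
        if addr.version == 6 and not mask & IPV6:
            return False
        if mask & ONLY_PUBLIC and not addr.is_global:
            return False
        if mask & IGNORE_PRIVATE and addr.is_private:
            return False
        if mask & IGNORE_LOOPBACK and addr.is_loopback:
            return False
        if mask & IGNORE_BROADCAST and addr == LIMITED_BROADCAST:
            return False
        return True

    def extract(text: str) -> list[str]:
        found = []
        for token in TOKEN.findall(text):
            try:
                # Drop a sentence-ending dot before parsing.
                addr = ipaddress.ip_address(token.rstrip("."))
            except ValueError:
                continue  # not an IP literal
            if keep(addr):
                found.append(str(addr))
        return found

    return extract
```

Keying the cache on a single integer means each distinct flag combination builds its extractor once; with five boolean flags there are at most 32 combinations, so a small bounded cache is enough.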
… API

Deprecate extract_all_ips in favor of extract_ips with filter parameters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
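Deprecating extract_all_ips() in favour of extract_ips() usually means keeping a thin wrapper that warns and forwards. Here is a hedged sketch of that pattern. Both functions are simplified, hypothetical stand-ins (whitespace tokenization, only an ipv6 flag), not the plugin's Polars expressions, and the real signatures may differ.

```python
import ipaddress
import warnings


def extract_ips(text: str, *, ipv6: bool = True) -> list[str]:
    """Hypothetical stand-in for the new API (whitespace tokens only)."""
    found = []
    for token in text.split():
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # not an IP literal
        if addr.version == 4 or ipv6:
            found.append(str(addr))
    return found


def extract_all_ips(text: str, ipv6: bool = True) -> list[str]:
    """Deprecated alias kept for backward compatibility; forwards to extract_ips()."""
    warnings.warn(
        "extract_all_ips() is deprecated; use extract_ips() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return extract_ips(text, ipv6=ipv6)
```

A test for such a wrapper can assert on the warning, for example with pytest.warns(DeprecationWarning).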
Update existing tests to use extract_ips. Add tests for defanged IPs,
only_public, extract_public_ips, extract_private_ips, ignore_private,
and deprecation warning for extract_all_ips.

Note: bracket-wrapped IPv6 addresses (e.g. [2001:db8::1]) are not extracted by
match_iter due to ip-extract's handling of ']' as a boundary character.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
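The defanged-IP tests exercise inputs like 192[.]168[.]1[.]1. In this PR that handling comes from ip-extract's match_iter; the following sketch uses a simpler, different technique (refang first, then match) with only re and ipaddress. The list of defang spellings it accepts is an assumption.

```python
import ipaddress
import re

# Defang spellings of "." handled here (an assumption; real tools vary).
DEFANGED_DOT = re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]", re.IGNORECASE)
# Dotted-quad candidates; octet range checking is left to ipaddress.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def refang(text: str) -> str:
    """Replace defanged dots with real dots."""
    return DEFANGED_DOT.sub(".", text)


def extract_defanged_ipv4(text: str) -> list[str]:
    """Refang, then return every valid IPv4 address in order of appearance."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(refang(text)):
        try:
            found.append(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            continue  # e.g. 999.1.1.1 has an out-of-range octet
    return found
```

Refanging before matching is easy to reason about, but it rewrites the text, so match offsets no longer line up with the original string. Handling defangs inside the matcher avoids that.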
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Restores extraction of [2001:db8::1] style addresses in XFF headers
and RFC 2732 bracket notation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
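The bracketed form [2001:db8::1] is how RFC 2732 (later RFC 3986) writes IPv6 literals so they can be followed by a port, and some proxies emit it in X-Forwarded-For values. The sketch below parses such a header value with the standard library. It is an illustrative header parser, not the free-text extractor this commit fixes, and the one-colon heuristic for IPv4:port is an assumption.

```python
import ipaddress


def parse_xff(header: str) -> list[str]:
    """Parse an X-Forwarded-For value, accepting [v6], [v6]:port and v4:port."""
    results = []
    for raw in header.split(","):
        item = raw.strip()
        if item.startswith("["):
            end = item.find("]")
            if end == -1:
                continue  # unterminated bracket, skip
            item = item[1:end]  # drop brackets and any trailing :port
        elif item.count(":") == 1:
            item = item.split(":", 1)[0]  # IPv4 with :port (heuristic)
        try:
            results.append(str(ipaddress.ip_address(item)))
        except ValueError:
            continue  # e.g. "unknown" or obfuscated identifiers
    return results
```

Bare IPv6 values without brackets contain several colons, so they fall through to ip_address() unchanged.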
Add tests verifying IPv4 and IPAddress extension types survive
Parquet and IPC write/read cycles with dtype and value preservation.

Add known-issue links to docstrings (all-null panic, to_list crash)
and dyn_display_value TODO stubs for when pola-rs/polars#26649 lands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…erence

New features:
- IPv4 and IPAddress Arrow extension types with Parquet/IPC round-trip support
- to_ipv4(), to_address(), to_string() conversion functions
- extract_ips() with defang support and filter flags (ipv6, only_public,
  ignore_private, ignore_loopback, ignore_broadcast)
- extract_public_ips(), extract_private_ips() convenience functions
- Deprecate extract_all_ips() in favour of extract_ips()

Docs:
- Add mkdocstrings[python] with numpy-style docstring rendering
- Expand all public function docstrings (Parameters, Returns, Examples)
- New docs/ip-types.md: reference snippets + end-to-end Parquet workflow
- New docs/api-reference.md: auto-generated API reference for all modules

Bump version 0.1.10 → 0.2.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nse, Claude, Gemini)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kdocstrings

- Remove "3.9" from test matrix (requires-python = ">=3.10")
- Remove min-versions-test job that tested against Python 3.9
- Update build job needs to remove min-versions-test dependency
- Switch docs.yml from bare pip to uv --group docs so mkdocstrings is installed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hon, enable uv cache

- Add -C debuginfo=0 to RUSTFLAGS to reduce Rust compile time and memory
- Set save-if: main on all rust-cache steps so PR runs don't thrash the cache
- Add enable-cache: true to all setup-uv steps
- Pin test-os Python to 3.12 (was unspecified, resolved to runner default)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@erichutchins erichutchins merged commit 687b527 into main Mar 9, 2026
54 of 57 checks passed
@erichutchins erichutchins deleted the refactor/rust-modernization branch March 9, 2026 13:38
