feat: add cdylib target, harden C FFI, fix capture offsets, and add timestamp event boundaries #210
Open
jackluo923 wants to merge 4 commits into y-scope:log-mechanic
added 4 commits
February 7, 2026 13:59
Add "cdylib" to crate-type so the library can be loaded via dlopen (e.g. Python cffi). Replace unconditional println! calls in DFA/NFA construction with a debug_println! macro gated behind an AtomicBool flag, controllable at runtime via debug::set_debug().
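A minimal sketch of how a runtime-gated debug macro of this kind could look. Only `debug_println!`, `set_debug`, and the use of an `AtomicBool` come from the commit message; the static's name, the `is_debug_enabled` helper, and the choice of `Ordering::Relaxed` are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag name; debug output is off by default.
static DEBUG_ENABLED: AtomicBool = AtomicBool::new(false);

/// Toggle debug output at runtime.
pub fn set_debug(enabled: bool) {
    DEBUG_ENABLED.store(enabled, Ordering::Relaxed);
}

/// Hypothetical helper that reads the current flag value.
pub fn is_debug_enabled() -> bool {
    DEBUG_ENABLED.load(Ordering::Relaxed)
}

/// Prints like `println!`, but only while the flag is set.
macro_rules! debug_println {
    ($($arg:tt)*) => {
        if is_debug_enabled() {
            println!($($arg)*);
        }
    };
}
```

With this shape the formatting arguments are not evaluated while the flag is off, so disabled debug output costs one atomic load per call site.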
Track consumed_end (exclusive byte offset past last consumed char)
instead of consumed (start of last consumed char). Use input[i..j]
(exclusive) for capture slices and input[..consumed_end] for the
lexeme. Pass consumed_end to final_operations for correct capture
end offsets.
Also zero-pad hex codes in pattern_for_delimiters ({:02x}) so the
regex parser accepts single-digit codepoints like \u{0a}.
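The two fixes in this commit can be illustrated with a small standalone sketch. It is not the lexer's DFA code: `scan_while` and `escape_delimiter` are hypothetical helpers that only demonstrate the difference between the start offset of the last consumed character and the exclusive end offset past it, and the effect of `{:02x}` versus `{:x}`.

```rust
/// Scans forward from byte offset `start` while `pred` holds.
/// Returns (start of the last consumed char, exclusive end past it),
/// or None if no character was consumed.
fn scan_while(
    input: &str,
    start: usize,
    pred: impl Fn(char) -> bool,
) -> Option<(usize, usize)> {
    let mut result = None;
    for (offset, ch) in input[start..].char_indices() {
        if !pred(ch) {
            break;
        }
        let char_start = start + offset;
        // The exclusive end must account for multi-byte characters.
        result = Some((char_start, char_start + ch.len_utf8()));
    }
    result
}

/// Formats a delimiter as a zero-padded `\u{..}` escape.
/// With `{:x}` a newline would come out as `\u{a}` instead of `\u{0a}`.
fn escape_delimiter(ch: char) -> String {
    format!("\\u{{{:02x}}}", ch as u32)
}
```

For `foo=123` scanned from offset 4, the last digit starts at offset 6 and the exclusive end is 7, so `input[4..7]` is `123`, while using 6 as the end of the slice yields `12`.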
…spection
- schema_set_delimiters and schema_add_rule now return bool instead of panicking on invalid UTF-8 or regex parse failures
- lexer_new returns *mut Lexer (null on failure) instead of Box<Lexer>
- lexer_delete takes *mut Lexer with explicit null check
- Add set_debug FFI to toggle debug output at runtime
- Add schema_rule_count and schema_rule_name FFI for rule introspection
- Accept underscores in regex capture group names (?<my_var>...)
- Fix doc comment typos: "interal" -> "internal", "nolonger" -> "no longer"
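The error-reporting pattern described in this commit might be sketched as follows. The `Schema` and `Lexer` types here are simplified stand-ins, the `demo_` function names and their signatures are hypothetical, and the "empty pattern" and "no rules" checks stand in for the real regex parsing and lexer construction. A real export would also need a `no_mangle` attribute, omitted here.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::ptr;

// Simplified stand-ins for the crate's real types.
pub struct Schema {
    pub rules: Vec<(String, String)>,
}
pub struct Lexer {
    pub rule_count: usize,
}

/// Returns false instead of panicking on null pointers, invalid UTF-8,
/// or a pattern the (stand-in) validation rejects.
pub unsafe extern "C" fn demo_schema_add_rule(
    schema: *mut Schema,
    name: *const c_char,
    pattern: *const c_char,
) -> bool {
    if schema.is_null() || name.is_null() || pattern.is_null() {
        return false;
    }
    // Caller must pass NUL-terminated strings.
    let (name, pattern) = unsafe { (CStr::from_ptr(name), CStr::from_ptr(pattern)) };
    let (Ok(name), Ok(pattern)) = (name.to_str(), pattern.to_str()) else {
        return false; // invalid UTF-8
    };
    if pattern.is_empty() {
        return false; // stand-in for a regex parse failure
    }
    let schema = unsafe { &mut *schema };
    schema.rules.push((name.to_owned(), pattern.to_owned()));
    true
}

/// Returns a heap-allocated lexer, or null on failure.
pub unsafe extern "C" fn demo_lexer_new(schema: *const Schema) -> *mut Lexer {
    if schema.is_null() {
        return ptr::null_mut();
    }
    let schema = unsafe { &*schema };
    if schema.rules.is_empty() {
        return ptr::null_mut(); // stand-in failure condition
    }
    Box::into_raw(Box::new(Lexer { rule_count: schema.rules.len() }))
}

/// Frees a lexer created by `demo_lexer_new`; a null pointer is a no-op.
pub unsafe extern "C" fn demo_lexer_delete(lexer: *mut Lexer) {
    if lexer.is_null() {
        return;
    }
    drop(unsafe { Box::from_raw(lexer) });
}
```

Returning a raw pointer rather than `Box<Lexer>` gives the caller a value it can test for null, which is how a cffi-style consumer would detect construction failure.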
Add is_timestamp field to Schema::Rule and add_timestamp_rule() method. Add is_event_start field to Fragment, set to true when a timestamp rule matches at byte offset 0 or immediately after a newline. This enables downstream consumers to split multi-line log events. Also add schema_add_timestamp_rule FFI and is_event_start to CLogFragment.
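The boundary condition in this commit (a timestamp rule matching at byte offset 0 or immediately after a newline) can be sketched on its own. The functions below are hypothetical and do not use the crate's `Fragment` or `Schema` types; `split_events` shows one way a downstream consumer might use the flag, assuming the match offsets are sorted and within the input.

```rust
/// True when a match at `match_start` should open a new log event:
/// the rule is a timestamp rule and the match begins at byte offset 0
/// or directly after a newline.
fn is_event_start(input: &str, match_start: usize, is_timestamp_rule: bool) -> bool {
    is_timestamp_rule
        && (match_start == 0 || input.as_bytes()[match_start - 1] == b'\n')
}

/// Splits `input` into events at every timestamp match that qualifies
/// as an event start. `timestamp_matches` holds the start offsets of
/// timestamp-rule matches, sorted ascending (an assumption of this sketch).
fn split_events<'a>(input: &'a str, timestamp_matches: &[usize]) -> Vec<&'a str> {
    let mut boundaries: Vec<usize> = timestamp_matches
        .iter()
        .copied()
        .filter(|&start| is_event_start(input, start, true))
        .collect();
    // Text before the first timestamp still belongs to some event.
    if boundaries.first() != Some(&0) {
        boundaries.insert(0, 0);
    }
    boundaries.push(input.len());
    boundaries
        .windows(2)
        .map(|w| &input[w[0]..w[1]])
        .filter(|event| !event.is_empty())
        .collect()
}
```

A timestamp that appears in the middle of a line does not start a new event, which keeps continuation lines such as stack traces attached to the event that produced them.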
Description
Add several improvements to the Rust lexer's build targets, FFI safety, correctness, and functionality:
- cdylib target: Add cdylib to crate-type so the library can be loaded via dlopen (e.g. Python cffi).
- Runtime-configurable debug output: Replace unconditional println! in DFA/NFA construction with a debug_println! macro gated behind an AtomicBool, controllable at runtime via debug::set_debug(). This lets upper-layer consumers (e.g. Python bindings) suppress noisy internal output by default and enable it only when needed.
- Fix capture boundary offsets: The DFA used inclusive byte offsets (input[i..=j], input[..=consumed]), causing incorrect capture slices. For example, matching (?<key>[a-z]+)=(?<val>[0-9]+) against foo=123 would return 12 instead of 123 for the val capture because the end offset pointed to the start of the last character rather than past it. Fix by tracking consumed_end as an exclusive byte offset and using input[i..j] / input[..consumed_end].
- Fix delimiter hex encoding: pattern_for_delimiters formatted codepoints with {:x}, producing \u{a} for newline, which the regex parser rejected. Fix by zero-padding with {:02x} to produce \u{0a}.
- Support underscores in capture group names: Named capture groups like (?<my_var>...) are common convention but were rejected by the regex parser. Add support by switching from alphanumeric1 to take_while1(|c| c.is_alphanumeric() || c == '_').
- Fix FFI panic on invalid input: FFI functions like schema_add_rule and lexer_new would panic (abort the calling process) on invalid UTF-8 or bad regex patterns. Now they return bool / null pointer so callers can handle errors gracefully.
- Harden FFI lifecycle: lexer_new returns *mut Lexer (null on failure) instead of Box<Lexer>. lexer_delete takes *mut Lexer with explicit null check. Add set_debug, schema_rule_count, and schema_rule_name introspection functions.
- Timestamp rules + event boundary detection: Add is_timestamp field to Schema::Rule and add_timestamp_rule() method. Add is_event_start field to Fragment, set when a timestamp rule matches at byte offset 0 or immediately after a newline, enabling downstream consumers to split multi-line log events.
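The capture-group-name rule from the list above (alphanumeric characters plus underscore, at least one character) can be sketched with the standard library alone. The description names parser-combinator functions (`alphanumeric1`, `take_while1`); this standalone version swaps those for plain `str` methods, and `parse_capture_name` is a hypothetical helper, not the parser's actual function.

```rust
/// Parses the `(?<name>` prefix of a named capture group.
/// Returns the name and the remaining input, or None if the prefix is
/// missing, the name is empty, or the name contains a character other
/// than an alphanumeric or `_`.
fn parse_capture_name(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix("(?<")?;
    // Length in bytes of the leading run of allowed name characters.
    let name_len = rest
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    if name_len == 0 {
        return None; // mirrors the "at least one character" requirement
    }
    let (name, after_name) = rest.split_at(name_len);
    // The name must be closed by `>` immediately.
    let remaining = after_name.strip_prefix('>')?;
    Some((name, remaining))
}
```

Under this rule `(?<my_var>...)` yields the name `my_var`, while a name containing a hyphen is rejected because the character after the allowed run is not `>`.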
Checklist
breaking change.
Validation performed
- cargo test: all unit tests pass, including new tests for capture boundaries and event boundary detection
- log-surgeon-ffi 0.1.0b10: the published Python package uses the Rust cdylib from this branch as its cffi backend, exercising the FFI, capture offsets, and timestamp event boundary features against real log data across Python 3.9–3.13 on Linux and macOS