Skip to content

Latest commit

 

History

History
110 lines (87 loc) · 5.44 KB

File metadata and controls

110 lines (87 loc) · 5.44 KB

coregex Architecture

Production-grade regex engine for Go achieving 3-3000x speedup over stdlib through multi-engine architecture and SIMD optimizations.

Execution Pipeline

Pattern → Parse → NFA Compile → Literal Extract → Strategy Select
                                                        ↓
                                    ┌──────────────────────────────────────┐
                                    │ Strategy (one of 15):                │
                                    │  UseNFA, UseDFA, UseBoth,            │
                                    │  UseReverseAnchored, UseReverseSuffix│
                                    │  UseOnePass, UseReverseInner,        │
                                    │  UseBoundedBacktracker, UseTeddy,    │
                                    │  UseReverseSuffixSet, UseCharClass,  │
                                    │  UseDigitPrefilter, UseAhoCorasick,  │
                                    │  UseComposite, UseAnchoredLiteral    │
                                    └──────────────────────────────────────┘
                                                        ↓
Input → Prefilter (memchr/memmem/teddy) → Engine Search → Match Result

Engine Architecture (Rust-aligned)

DFA Layer (dfa/lazy/)

  • Lazy DFA: On-demand state construction with byte class compression
  • Flat transition table: flatTrans[sid*stride+class] — single array lookup, no pointer chase
  • Byte-based cache limit: 2MB default (matches Rust hybrid_cache_capacity)
  • Cache clearing: Up to 5 clears before NFA fallback (Rust approach)
  • Acceleration: Detects self-loop states, uses SIMD memchr for skip-ahead
  • Integrated prefilter: Skip-ahead at start state in DFA loop (Rust hybrid/search.rs:232)
  • Per-goroutine cache: Immutable DFA + mutable DFACache (thread-safe)

PikeVM Layer (nfa/pikevm.go)

  • Dual SlotTable: Flat per-state capture storage (curr/next, swapped per byte)
    • Zero-allocation capture tracking (Rust ActiveStates pattern)
    • Stack-based epsilon closure with RestoreCapture frames
    • searchThread (12 bytes) vs legacy thread (40+ bytes with COW)
  • Integrated prefilter: Skip-ahead when no active threads (Rust pikevm.rs:1293)
  • SearchMode: Dynamic slot sizing (0=IsMatch, 2=Find, full=Captures)

BoundedBacktracker (nfa/backtrack.go)

  • Generation-based visited table (O(1) reset, uint16)
  • Visited limit: 256KB for UseNFA (Rust default), 64MB for UseBoundedBacktracker (POSIX)
  • Fallback to PikeVM when input exceeds capacity

Prefilter Layer (prefilter/)

  • AVX2 memchr: SIMD byte search (12x faster than bytes.IndexByte)
  • Memmem: SIMD substring search with Rabin-Karp fingerprinting
  • Teddy: SIMD multi-pattern matching (1-8 patterns, AVX2/SSSE3)
  • Aho-Corasick: DFA-based multi-pattern for >8 patterns
  • DigitPrefilter: SIMD digit detection for \d+ patterns

Memory Architecture

Per-Pattern (compile-time, shared immutable)

  • NFA graph (states, transitions)
  • DFA configuration (byte classes, start map)
  • Prefilter (literal tables, SIMD masks)
  • Strategy-specific searchers (reverse DFA, composite, etc.)

Per-Goroutine (search-time, pooled via sync.Pool)

  • SearchState holds all mutable search state
  • DFACache: flat transition table + state map (2MB default)
  • PikeVMState: dual SlotTable + thread queues + visited set
  • BacktrackerState: visited array + generation counter

Memory Budget (Kostya LangArena, 13 patterns, 7MB log)

Component v0.12.18 v0.12.19
Total alloc (FindAll) 89 MB 25 MB
RSS 353 MB 41 MB
FindAllSubmatch (5 pat, 50K matches) 554 MB 26 MB

Thread Safety

                    ┌──────────────┐
                    │   Engine     │ ← Immutable after compile
                    │  (shared)    │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              ↓            ↓            ↓
        ┌────────────┐ ┌────────────┐ ┌────────────┐
        │ SearchState│ │ SearchState│ │ SearchState│ ← Per-goroutine
        │(goroutine1)│ │(goroutine2)│ │(goroutine3)│    (sync.Pool)
        └────────────┘ └────────────┘ └────────────┘

Key Design Decisions

  1. Multi-engine: Strategy selection at compile time, not runtime
  2. Rust reference: Architecture mirrors Rust regex crate (lazy DFA, PikeVM, prefilters)
  3. Go stdlib compat: POSIX leftmost-longest semantics (differs from Rust leftmost-first)
  4. Zero-alloc hot paths: IsMatch(), FindIndices(), Count() — no heap allocation
  5. SIMD first: AVX2/SSSE3 prefilters for x86_64, pure Go fallback for other archs

References

  • Rust regex crate — primary architecture reference
  • RE2 — O(n) performance guarantees
  • Hyperscan — SIMD multi-pattern (Teddy algorithm)