Commit 862a8bb

chore: update pattern detection docs
Signed-off-by: Matthew Grigsby <[email protected]>
1 parent ed2f4c4 commit 862a8bb

2 files changed

+18
-46
lines changed

docs/reference/llm/index.md

+1
@@ -11,6 +11,7 @@ The **LLM (Large Language Model) Module** provides interfaces and implementation
1111

1212
### LLM Adapters
1313
Adapters to support multiple LLM providers:
14+
1415
- **`BaseVendorAdapter`** - Abstract base class for all LLM vendor adapters.
1516
- **`AnthropicAdapter`** - Adapter for Anthropic’s Claude models.
1617
- **`MistralAdapter`** - Adapter for Mistral AI models.

docs/reference/llm/pattern_detection/index.md

+17 -46
@@ -8,43 +8,11 @@ The **Pattern Detection Module** provides utilities for detecting predefined pat
88

99
## Core Components
1010

11-
<div style="display: flex; margin-bottom: 20px;">
12-
<div style="flex: 1; padding-right: 15px;">
13-
14-
### Automaton Classes
15-
16-
* **`AhoCorasickAutomaton`**
17-
✓ Trie-based pattern matching engine
18-
✓ Linear-time complexity
19-
✓ Simultaneous multi-pattern search
20-
21-
* **`AhoCorasickAutomatonNormalized`**
22-
✓ Whitespace-insensitive matching
23-
✓ Pattern normalization
24-
✓ Original-to-normalized index mapping
25-
26-
</div>
27-
<div style="flex: 1; padding-left: 15px;">
28-
29-
### Processor Classes
30-
31-
* **`BaseBufferedProcessor`**
32-
✓ Abstract base for text streaming
33-
✓ Buffer management
34-
✓ Chunk-based processing
35-
36-
* **`AhoCorasickBufferedProcessor`**
37-
✓ Exact pattern matching
38-
✓ YAML-configurable patterns
39-
✓ Streaming-ready implementation
40-
41-
* **`AhoCorasickBufferedProcessorNormalized`**
42-
✓ Whitespace-invariant detection
43-
✓ Flexible text matching
44-
✓ Preserves original text positions
45-
46-
</div>
47-
</div>
11+
| Automaton Classes | Processor Classes |
12+
|-------------------|-------------------|
13+
| **`AhoCorasickAutomaton`**<br>✓ Trie-based pattern matching engine<br>✓ Linear-time complexity<br>✓ Simultaneous multi-pattern search | **`BaseBufferedProcessor`**<br>✓ Abstract base for text streaming<br>✓ Buffer management<br>✓ Chunk-based processing |
14+
| **`AhoCorasickAutomatonNormalized`**<br>✓ Whitespace-insensitive matching<br>✓ Pattern normalization<br>✓ Original-to-normalized index mapping | **`AhoCorasickBufferedProcessor`**<br>✓ Exact pattern matching<br>✓ YAML-configurable patterns<br>✓ Streaming-ready implementation |
15+
| | **`AhoCorasickBufferedProcessorNormalized`**<br>✓ Whitespace-invariant detection<br>✓ Flexible text matching<br>✓ Preserves original text positions |
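
The "original-to-normalized index mapping" listed for the normalized classes can be illustrated with a minimal sketch. It assumes that normalization means dropping every whitespace character; the module's actual normalization rules are not shown in this diff. The helper names `normalize_with_mapping` and `find_ignoring_whitespace` are hypothetical, and `str.find` stands in for the automaton to keep the example short.

```python
def normalize_with_mapping(text):
    """Drop whitespace and record, for each kept character, its original index."""
    chars = []
    index_map = []  # index_map[j] = position in `text` of normalized char j
    for i, ch in enumerate(text):
        if not ch.isspace():
            chars.append(ch)
            index_map.append(i)
    return "".join(chars), index_map


def find_ignoring_whitespace(text, pattern):
    """Return (start, end) spans in the ORIGINAL text, end exclusive."""
    norm_text, index_map = normalize_with_mapping(text)
    norm_pattern, _ = normalize_with_mapping(pattern)
    if not norm_pattern:
        return []
    spans = []
    start = norm_text.find(norm_pattern)
    while start != -1:
        last = start + len(norm_pattern) - 1
        # Translate normalized positions back to original positions.
        spans.append((index_map[start], index_map[last] + 1))
        start = norm_text.find(norm_pattern, start + 1)
    return spans


text = "api  key: s e c r e t"
spans = find_ignoring_whitespace(text, "secret")
matched = [text[a:b] for a, b in spans]
```

Here `matched` holds the slice of the original text, spaces included, which is what "preserves original text positions" would mean under this assumption.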
4816

4917
### Utility Functions
5018

@@ -58,14 +26,14 @@ The **Pattern Detection Module** provides utilities for detecting predefined pat
5826
The module processes text in two primary stages:
5927

6028
1. **Pattern Preprocessing**
61-
- Patterns are loaded from YAML configuration
62-
- The Aho-Corasick automaton is constructed from patterns
63-
- Failure links connect states for efficient pattern transitions
29+
- Patterns are loaded from YAML configuration
30+
- The Aho-Corasick automaton is constructed from patterns
31+
- Failure links connect states for efficient pattern transitions
6432

6533
2. **Buffered Text Processing**
66-
- Text is processed in manageable chunks
67-
- Partial matches at chunk boundaries are preserved
68-
- Match information includes pattern name and position
34+
- Text is processed in manageable chunks
35+
- Partial matches at chunk boundaries are preserved
36+
- Match information includes pattern name and position
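
The two stages above can be sketched as follows. This is a textbook Aho-Corasick construction with chunked scanning, not the module's own code: `build_automaton` and `StreamMatcher` are hypothetical names, patterns are passed as a plain dict rather than loaded from YAML, and the sketch carries only the automaton state across chunks, which may differ from how the real processors buffer text.

```python
from collections import deque


def build_automaton(patterns):
    """Stage 1. `patterns` maps pattern name -> pattern string."""
    goto = [{}]   # goto[state][char] -> next state (the trie)
    fail = [0]    # failure link per state
    out = [[]]    # (name, length) of every pattern ending at each state
    for name, pat in patterns.items():
        if not pat:
            raise ValueError("empty patterns are not supported")
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((name, len(pat)))
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # link to the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


class StreamMatcher:
    """Stage 2. Feed text chunk by chunk; state survives chunk boundaries."""

    def __init__(self, patterns):
        self.goto, self.fail, self.out = build_automaton(patterns)
        self.state = 0
        self.offset = 0  # characters consumed in earlier chunks

    def feed(self, chunk):
        matches = []
        for i, ch in enumerate(chunk):
            while self.state and ch not in self.goto[self.state]:
                self.state = self.fail[self.state]
            self.state = self.goto[self.state].get(ch, 0)
            end = self.offset + i + 1
            for name, length in self.out[self.state]:
                matches.append((name, end - length, end))  # end exclusive
        self.offset += len(chunk)
        return matches


matcher = StreamMatcher({"greeting": "hello", "secret": "password"})
results = [matcher.feed(c) for c in ("say hel", "lo, pass", "word")]
```

Because the automaton state already encodes the longest partial match, a pattern split across two chunks is still reported, with positions expressed as offsets into the whole stream.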
6937

7038
### Text Normalization Pipeline
7139

@@ -81,16 +49,19 @@ The module processes text in two primary stages:
8149

8250
## Performance Considerations
8351

84-
* **Time Complexity**: O(n + m + k) where:
52+
**Time Complexity**: O(n + m + k), where:
53+
8554
* n = length of input text
8655
* m = total length of all patterns
8756
* k = number of pattern occurrences
8857

89-
* **Space Efficiency**:
58+
**Space Efficiency**:
59+
9060
* Buffered processing minimizes memory usage
9161
* Suitable for streaming applications with unbounded input
9262

93-
* **Flexibility vs. Performance**:
63+
**Flexibility vs. Performance**:
64+
9465
* Standard processors offer exact matching with minimal overhead
9566
* Normalized processors provide flexibility with slight computational cost
9667
