Commit a78b11f

Performance and extract work (#30)
* Add support for `extract` in pattern definitions, allowing the `extract` in `%{name:alias:extract}` to be retrieved from the pattern.
* Add `Pattern::pattern()` to `Matches` to get the pattern that was used to match this `Matches` instance.
* Reduced the scope of inline pattern definitions to the current pattern only (before 2.2.0 they were added globally, in 2.2.0 they were available to all nested patterns, and in 2.3.0 they are only available to the current pattern).
* `Grok::compile()` is now `&self` instead of `&mut self`.
* `Grok::with_default_patterns()` now uses `Cow` for the built-in patterns and allocates significantly less per `Grok` instance (as well as being much faster).
* Built-in patterns are available individually for reference in the new `patterns` module.

1 parent 6a443cb commit a78b11f

File tree

15 files changed, +496 -128 lines changed

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -4,6 +4,23 @@ All user visible changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/), as described
 for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md)

+## 2.3.0 - 2025-07-05
+
+* Add support for `extract` in pattern definitions, allowing the `extract` in
+  `%{name:alias:extract}` to be retrieved from the pattern.
+* Add `Pattern::pattern()` to `Matches` to get the pattern that was used to
+  match this `Matches` instance.
+* Reduced the scope of inline pattern definitions to the current pattern only
+  (before 2.2.0 they were added globally, in 2.2.0 they were available to all
+  nested patterns, and in 2.3.0 they are only available to the current
+  pattern).
+* `Grok::compile()` is now `&self` instead of `&mut self`.
+* `Grok::with_default_patterns()` now uses `Cow` for the built-in patterns and
+  allocates significantly less per `Grok` instance (as well as being much
+  faster).
+* Built-in patterns are available individually for reference in the new
+  `patterns` module.
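To illustrate why storing built-in patterns as `Cow` can reduce allocations, here is a minimal sketch using only the standard library. It is not the crate's implementation: `PatternTable`, `BUILT_IN`, and the two simplified pattern definitions are hypothetical stand-ins chosen for the example. Built-in entries borrow `'static` string data, and only patterns added at runtime are copied into owned strings.

```rust
use std::borrow::Cow;
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-ins for a table of built-in definitions.
const BUILT_IN: &[(&str, &str)] = &[("WORD", r"\b\w+\b"), ("INT", r"[+-]?[0-9]+")];

/// Hypothetical pattern table; not the crate's `Grok` type.
struct PatternTable {
    patterns: BTreeMap<Cow<'static, str>, Cow<'static, str>>,
}

impl PatternTable {
    /// Built-ins are stored as `Cow::Borrowed`, so no string data is copied.
    fn with_built_ins() -> Self {
        let patterns = BUILT_IN
            .iter()
            .map(|&(name, def)| (Cow::Borrowed(name), Cow::Borrowed(def)))
            .collect();
        PatternTable { patterns }
    }

    /// Patterns supplied at runtime are copied into owned strings.
    fn add(&mut self, name: &str, definition: &str) {
        self.patterns
            .insert(Cow::Owned(name.to_owned()), Cow::Owned(definition.to_owned()));
    }

    /// Looks up a definition by name, whether it is borrowed or owned.
    fn get(&self, name: &str) -> Option<&str> {
        self.patterns.get(name).map(|def| &**def)
    }
}
```

In this sketch, constructing the table allocates only the map nodes; the names and definitions of the built-ins are never cloned.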
+
 ## 2.2.0 - 2025-07-04

 * Rewrote the pattern parsing to avoid using regular expressions, making all
@@ -18,6 +35,8 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
 * (breaking) `Matches::len()` was removed as it was previously reporting the
   pattern name count. Use `Matches::iter().count()` instead.
 * (breaking) `Matches::is_empty()` was removed. Use `Matches::iter().count() == 0` instead.
+* (breaking) Inline pattern definitions are no longer added to the global
+  pattern list.

 ## 2.1.0 - 2025-05-29

Cargo.toml

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "grok"
-version = "2.2.0"
+version = "2.3.0"
 authors = ["Matt Mastracci <[email protected]>", "Michael Nitschinger <[email protected]>"]
 license = "Apache-2.0"
 readme = "README.md"
@@ -55,3 +55,7 @@ harness = false
 [[bench]]
 name = "simple"
 harness = false
+
+[[bench]]
+name = "pattern"
+harness = false

README.md

Lines changed: 58 additions & 13 deletions
@@ -1,22 +1,28 @@
 grok
 ====

-The `grok` library allows you to quickly parse and match potentially unstructured data into a structed result. It is especially helpful when parsing logfiles of all kinds. This [Rust](http://rust-lang.org) version is mainly a port from the [java version](https://github.com/thekrakken/java-grok) which in turn drew inspiration from the original [ruby version](https://github.com/logstash-plugins/logstash-filter-grok).
-
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Latest Version](https://img.shields.io/crates/v/grok.svg)](https://crates.io/crates/grok)
 [![Documentation](https://docs.rs/grok/badge.svg)](https://docs.rs/grok)
 ![Continuous Integration](https://github.com/mmastrac/grok/actions/workflows/ci.yml/badge.svg?branch=main)

+The `grok` library allows you to quickly parse and match potentially
+unstructured data into a structed result. It is especially helpful when parsing
+logfiles of all kinds. This [Rust](http://rust-lang.org) version is mainly a
+port from the [Java version](https://github.com/thekrakken/java-grok) which in
+turn drew inspiration from the original [Ruby
+version](https://github.com/logstash-plugins/logstash-filter-grok).
+
 ## Usage
 Add this to your `Cargo.toml`:

 ```toml
 [dependencies]
-grok = "2.0"
+grok = "2.3"
 ```

-Here is a simple example which stores a pattern, compiles it and then matches a line on it:
+Here is a simple example which stores a pattern, compiles it and then matches a
+line on it:

 ```rust
 use grok::Grok;
@@ -41,11 +47,50 @@ fn main() {
 }
 ```

-Note that compiling the pattern is an expensive operation, so very similar to plain regex handling the `compile`
-operation should be performed once and then the `match_against` method on the pattern can be called repeatedly
-in a loop or iterator. The returned pattern is not bound to the lifetime of the original grok instance so it can
-be passed freely around. For performance reasons the `Match` returned is bound to the pattern lifetime so keep
-them close together or clone/copy out the containing results as needed.
+Note that compiling the pattern is an expensive operation, so very similar to
+plain regex handling the `compile` operation should be performed once and then
+the `match_against` method on the pattern can be called repeatedly in a loop or
+iterator. The returned pattern is not bound to the lifetime of the original grok
+instance so it can be passed freely around. For performance reasons the `Match`
+returned is bound to the pattern lifetime so keep them close together or
+clone/copy out the containing results as needed.
+
+## Pattern Syntax
+
+A grok pattern is a standard regular expression string with grok pattern
+placeholders embedded in it.
+
+The grok pattern placeholders are of the form
+`%{name:alias:extract=definition}`, where `name` is the name of the pattern,
+`alias` is the alias of the pattern, `extract` is the extract of the pattern,
+and `definition` is the definition of the pattern.
+
+- `name` is the name of the pattern and is required. It may contain any
+  alphanumeric character, or `_`.
+- `alias` is the alias of the pattern and is optional. It may contain any
+  alphanumeric character, or any of `_-[].`. If extract is provided, `alias` may
+  be empty.
+- `extract` is the extract of the pattern and is optional. It may contain any
+  alphanumeric character, or any of `_-[].`.
+- `definition` is the definition of the pattern and is optional. It may contain
+  any character other than `{` or `}`.
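As an illustration of the rules in the list above, here is a minimal Rust sketch that splits and validates the text between `%{` and `}`. It is not the crate's actual parser: `Placeholder`, `parse_placeholder`, and the two character predicates are hypothetical names. The sketch assumes that "alphanumeric" means ASCII alphanumeric, that everything after the first `=` is the definition, and that an empty `extract`, or an empty `alias` with no `extract`, is rejected.

```rust
/// The parts of one `%{name:alias:extract=definition}` placeholder.
/// Hypothetical type for illustration; not part of the `grok` crate API.
#[derive(Debug, PartialEq)]
struct Placeholder<'a> {
    name: &'a str,
    alias: Option<&'a str>,
    extract: Option<&'a str>,
    definition: Option<&'a str>,
}

/// `name` characters: alphanumeric or `_` (ASCII assumed).
fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// `alias` and `extract` characters: alphanumeric or any of `_-[].`.
fn is_alias_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "_-[].".contains(c)
}

/// Parses the text between `%{` and `}` into its parts.
fn parse_placeholder(body: &str) -> Result<Placeholder<'_>, String> {
    // Assumption: everything after the first `=` is the inline definition.
    let (head, definition) = match body.split_once('=') {
        Some((head, def)) => (head, Some(def)),
        None => (body, None),
    };
    if let Some(def) = definition {
        if def.contains('{') || def.contains('}') {
            return Err("definition may not contain `{` or `}`".to_string());
        }
    }

    // At most three `:`-separated fields: name, alias, extract.
    let mut fields = head.splitn(3, ':');
    let name = fields.next().unwrap_or("");
    let alias = fields.next();
    let extract = fields.next();

    if name.is_empty() || !name.chars().all(is_name_char) {
        return Err(format!("invalid name {:?}", name));
    }
    for field in alias.iter().chain(extract.iter()) {
        if !field.chars().all(is_alias_char) {
            return Err(format!("invalid alias or extract {:?}", field));
        }
    }
    if extract == Some("") {
        return Err("extract may not be empty".to_string());
    }
    // `alias` may be empty only when an `extract` follows it.
    let alias = match alias {
        Some("") if extract.is_none() => {
            return Err("alias may not be empty without an extract".to_string())
        }
        Some("") => None,
        other => other,
    };

    Ok(Placeholder { name, alias, extract, definition })
}
```

With this sketch, `parse_placeholder("NUMBER::int")` yields no alias and an `extract` of `int`, while `parse_placeholder("WORD:")` is rejected.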
+
+A literal `%` character may appear in a grok pattern as long as it is not
+followed by `{`. You can surround the percent with grouped parentheses
+`(%){..}`, a non-capturing group `(?:%){..}`, or use the `\x25` escape
+sequence, ie: `\x25{..}`.
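The literal `%` rule can be illustrated with a small scanner. This is a sketch only, not the crate's implementation: `placeholder_bodies` is a hypothetical helper that treats `%` as the start of a placeholder only when it is immediately followed by `{`, and it assumes placeholders are not nested.

```rust
/// Returns the text between `%{` and `}` for every placeholder in `pattern`.
/// A `%` that is not immediately followed by `{` is left alone as a literal.
/// Hypothetical helper for illustration; not part of the `grok` crate API.
fn placeholder_bodies(pattern: &str) -> Vec<&str> {
    let mut bodies = Vec::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("%{") {
        // Skip past the two-byte `%{` opener.
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                bodies.push(&after[..end]);
                rest = &after[end + 1..];
            }
            // Unterminated placeholder: this sketch simply stops scanning,
            // where a real parser would likely report an error.
            None => break,
        }
    }
    bodies
}
```

For a pattern such as `(?:%){2} \x25{3} 50%` the function returns an empty list, because none of the percent signs is directly followed by `{`.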
+
+For example, to match log messages like so:
+
+```text
+2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message
+```
+
+... the following pattern could be used:
+
+```text
+%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip}:%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}
+```

 ## Further Information

@@ -59,31 +104,31 @@ The default engine is `onig` for compatibility with previous 2.x releases:

 ```toml
 [dependencies]
-grok = { version = "2.0", features = ["onig"] }
+grok = { version = "2.3", features = ["onig"] }
 ```

 The `pcre2` engine is a more complete Rust regex library supporting
 backtracking, JIT compilation and is the fastest engine for most use cases:

 ```toml
 [dependencies]
-grok = { version = "2.0", default-features = false, features = ["pcre2"] }
+grok = { version = "2.3", default-features = false, features = ["pcre2"] }
 ```

 The `fancy-regex` engine is a more complete Rust regex library supporting
 backtracking:

 ```toml
 [dependencies]
-grok = { version = "2.0", default-features = false, features = ["fancy-regex"] }
+grok = { version = "2.3", default-features = false, features = ["fancy-regex"] }
 ```

 The `regex` engine is supported, but it does not support backtracking, so many
 patterns are unusable. This is not recommended for most use cases:

 ```toml
 [dependencies]
-grok = { version = "2.0", default-features = false, features = ["regex"] }
+grok = { version = "2.3", default-features = false, features = ["regex"] }
 ```

 ## License

benches/apache.rs

Lines changed: 8 additions & 8 deletions
@@ -11,7 +11,7 @@ fn main() {
 fn r#match(b: divan::Bencher) {
     let msg = r#"220.181.108.96 - - [13/Jun/2015:21:14:28 +0000] "GET /blog/geekery/xvfb-firefox.html HTTP/1.1" 200 10975 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)""#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}"#, false)
         .expect("Error while compiling!");

@@ -26,7 +26,7 @@ fn r#match(b: divan::Bencher) {
 fn no_match_start(b: divan::Bencher) {
     let msg = r#"tash-scale11x/css/fonts/Roboto-Regular.ttf HTTP/1.1" 200 41820 "http://semicomplete.com/presentations/logs"#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}"#, false)
         .expect("Error while compiling!");

@@ -41,7 +41,7 @@ fn no_match_start(b: divan::Bencher) {
 fn no_match_middle(b: divan::Bencher) {
     let msg = r#"220.181.108.96 - - [13/Jun/2015:21:14:28 +0000] "111 /blog/geekery/xvfb-firefox.html HTTP/1.1" 200 10975 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)""#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}"#, false)
         .expect("Error while compiling!");

@@ -56,7 +56,7 @@ fn no_match_middle(b: divan::Bencher) {
 fn no_match_end(b: divan::Bencher) {
     let msg = r#"220.181.108.96 - - [13/Jun/2015:21:14:28 +0000] "GET /blog/geekery/xvfb-firefox.html HTTP/1.1" 200 10975 "-" 1"#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}"#, false)
         .expect("Error while compiling!");

@@ -71,7 +71,7 @@ fn no_match_end(b: divan::Bencher) {
 fn match_anchor(b: divan::Bencher) {
     let msg = r#"220.181.108.96 - - [13/Jun/2015:21:14:28 +0000] "GET /blog/geekery/xvfb-firefox.html HTTP/1.1" 200 10975 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)""#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}$"#, false)
         .expect("Error while compiling!");

@@ -86,7 +86,7 @@ fn match_anchor(b: divan::Bencher) {
 fn no_match_start_anchor(b: divan::Bencher) {
     let msg = r#"tash-scale11x/css/fonts/Roboto-Regular.ttf HTTP/1.1" 200 41820 "http://semicomplete.com/presentations/logs"#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}$"#, false)
         .expect("Error while compiling!");

@@ -101,7 +101,7 @@ fn no_match_start_anchor(b: divan::Bencher) {
 fn no_match_middle_anchor(b: divan::Bencher) {
     let msg = r#"220.181.108.96 - - [13/Jun/2015:21:14:28 +0000] "111 /blog/geekery/xvfb-firefox.html HTTP/1.1" 200 10975 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)""#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}$"#, false)
         .expect("Error while compiling!");

@@ -116,7 +116,7 @@ fn no_match_middle_anchor(b: divan::Bencher) {
 fn no_match_end_anchor(b: divan::Bencher) {
     let msg = r#"220.181.108.96 - - [13/Jun/2015:21:14:28 +0000] "GET /blog/geekery/xvfb-firefox.html HTTP/1.1" 200 10975 "-" 1"#;

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r#"^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}$"#, false)
         .expect("Error while compiling!");

benches/log.rs

Lines changed: 4 additions & 4 deletions
@@ -11,7 +11,7 @@ fn main() {
 fn r#match(b: divan::Bencher) {
     let msg = "2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message";

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r"%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip}:%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}", false)
         .expect("Error while compiling!");

@@ -26,7 +26,7 @@ fn r#match(b: divan::Bencher) {
 fn no_match(b: divan::Bencher) {
     let msg = "2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message";

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r"%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip};%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}", false)
         .expect("Error while compiling!");

@@ -41,7 +41,7 @@ fn no_match(b: divan::Bencher) {
 fn match_anchor(b: divan::Bencher) {
     let msg = "2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message";

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r"^%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip}:%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}$", false)
         .expect("Error while compiling!");

@@ -56,7 +56,7 @@ fn match_anchor(b: divan::Bencher) {
 fn no_match_anchor(b: divan::Bencher) {
     let msg = "2016-09-19T18:19:00 [8.8.8.8;prd] DEBUG this is an example log message";

-    let mut grok = Grok::default();
+    let grok = Grok::default();
     let pattern = grok.compile(r"^%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip}:%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}$", false)
         .expect("Error while compiling!");

benches/pattern.rs

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+#![allow(clippy::incompatible_msrv)]
+// ^need 1.66 for `black_box`
+
+use grok::Grok;
+
+fn main() {
+    divan::main();
+}
+
+#[divan::bench]
+fn create_with_default_patterns(b: divan::Bencher) {
+    let grok = Grok::with_default_patterns();
+    divan::black_box(grok);
+    b.bench(|| {
+        let grok = Grok::with_default_patterns();
+        divan::black_box(grok);
+    });
+}
+
+#[divan::bench]
+fn parse_complex_pattern(b: divan::Bencher) {
+    let grok = Grok::with_default_patterns();
+    b.bench(|| {
+        let pattern = grok.compile("%{BACULA_LOGLINE}", false).unwrap();
+        divan::black_box(pattern);
+    });
+}
+
+#[divan::bench]
+fn parse_complex_pattern_alias_only(b: divan::Bencher) {
+    let grok = Grok::with_default_patterns();
+    b.bench(|| {
+        let pattern = grok.compile("%{BACULA_LOGLINE}", true).unwrap();
+        divan::black_box(pattern);
+    });
+}
