Commit 60646bd

Implement zero-copy tokenization for Word, SingleQuotedString, and Whitespace
Convert token string fields to use Cow<'a, str> to enable zero-copy tokenization for commonly used tokens:

- Word.value: Regular identifiers and keywords now borrow from source
- SingleQuotedString: String literals borrow when no escape processing needed
- Whitespace: Single-line and multi-line comments borrow from source

Also add benchmark for measuring tokenization performance
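The tokenizer source itself is not shown in this part of the diff, so the following is only a minimal sketch of the borrowing pattern the commit message describes, not the commit's actual code. The simplified `Token` enum and the helpers `lex_word` and `lex_single_quoted` are hypothetical names, and the only escape handled is the doubled quote `''`, which is an assumption made for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified token type (the real tokenizer has many more variants).
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    Word(Cow<'a, str>),
    SingleQuotedString(Cow<'a, str>),
}

/// Lexes an identifier or keyword at the start of `src`.
/// The token borrows its text from `src`, so no allocation happens.
pub fn lex_word(src: &str) -> Option<(Token<'_>, &str)> {
    let end = src
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(src.len());
    if end == 0 {
        return None;
    }
    Some((Token::Word(Cow::Borrowed(&src[..end])), &src[end..]))
}

/// Lexes a single-quoted string at the start of `src`.
/// Borrows when the literal contains no escapes; allocates only when
/// an escaped quote (`''`, assumed here) has to be collapsed.
pub fn lex_single_quoted(src: &str) -> Option<(Token<'_>, &str)> {
    let body = src.strip_prefix('\'')?;
    let mut owned: Option<String> = None;
    let mut chars = body.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c == '\'' {
            if let Some(&(_, '\'')) = chars.peek() {
                // First escape seen: switch to an owned buffer, copying the prefix.
                let buf = owned.get_or_insert_with(|| body[..i].to_string());
                buf.push('\'');
                chars.next(); // consume the second quote of the pair
            } else {
                // Closing quote: return borrowed text unless escapes forced a copy.
                let value = match owned {
                    Some(s) => Cow::Owned(s),
                    None => Cow::Borrowed(&body[..i]),
                };
                return Some((Token::SingleQuotedString(value), &body[i + 1..]));
            }
        } else if let Some(buf) = owned.as_mut() {
            buf.push(c);
        }
    }
    None // unterminated string literal
}
```

In this sketch, `lex_single_quoted("'abc' x")` yields a `Cow::Borrowed` slice of the input, while `lex_single_quoted("'it''s'")` has to build a new `String` and returns `Cow::Owned`. Callers that only read the value can treat both cases uniformly through `Deref<Target = str>`.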
1 parent 0f17b32 commit 60646bd

12 files changed

+1258
-217
lines changed

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -18,4 +18,7 @@ Cargo.lock
 
 *.swp
 
-.DS_store
+.DS_store
+
+# dhat profiler output files
+dhat*.json

Cargo.toml

Lines changed: 7 additions & 1 deletion
@@ -48,6 +48,7 @@ visitor = ["sqlparser_derive"]
 bigdecimal = { version = "0.4.1", features = ["serde"], optional = true }
 log = "0.4"
 recursive = { version = "0.1.1", optional = true}
+unicase = "2.7"
 
 serde = { version = "1.0", default-features = false, features = ["derive", "alloc"], optional = true }
 # serde_json is only used in examples/cli, but we have to put it outside
@@ -60,7 +61,12 @@ sqlparser_derive = { version = "0.4.0", path = "derive", optional = true }
 simple_logger = "5.0"
 matches = "0.1"
 pretty_assertions = "1"
+sysinfo = "0.30"
+dhat = "0.3.3"
+criterion = "0.5"
 
 [package.metadata.docs.rs]
 # Document these features on docs.rs
-features = ["serde", "visitor"]
+features = ["serde", "visitor"]
+
+

profile.json.gz

5.62 KB
Binary file not shown.

sqlparser_bench/Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -31,3 +31,7 @@ criterion = "0.7"
 [[bench]]
 name = "sqlparser_bench"
 harness = false
+
+[[bench]]
+name = "tokenize_bench"
+harness = false
