Release 0.15.1 · dathere/qsv-dateparser

Performance: parse-dispatch optimization

Speeds up Parse::parse on its hot path — the failed parse attempts that dominate qsv stats --infer-dates on non-date columns, where every value previously ran the full regex is_match chain before failing. Four behavior-preserving changes:

Structural byte pre-filter (cannot_be_date, backed by a 256-entry DATE_BYTE lookup table). Any input containing a byte that cannot appear in an accepted format (_, #, non-ASCII, etc.) returns Err immediately, skipping the regex chain.
Dispatch reorder — the cheap regex-gated families run first; the two parsers without a family regex gate (unix_timestamp, rfc2822) move last. Result-preserving: floats match no family gate (so still reach unix_timestamp), and rfc2822 inputs always carry a timezone that the $-anchored month_dmy_* regexes reject — while month_dmy_* only succeeds without a timezone, which makes rfc2822 fail. The two are mutually exclusive.
unix_timestamp first-byte gate — bail before fast_float2::parse unless the first byte is one of digit + - . i I n N (the only leads fast_float2 accepts; it rejects leading whitespace).
rfc2822 colon gate — every RFC 2822 datetime has a time-of-day, so colon-free inputs can't be rfc2822; skip parse_from_rfc2822 for them.

Benchmarks (M4 Max, release, 1000 values/iter)

path	0.15.0	0.15.1	change
non-date strings, `parse_failures` (`category_value_N`)	125 µs	60 µs	−52%
genuine ISO datetimes, `parse_throughput`	398 µs	370 µs	−7%
non-date words, `parse_word_failures`	133 µs	127 µs	−5%
accepted-formats mix, `parse_all`	8.2 µs	7.9 µs	−4%

Correctness

No public API change and byte-identical parse results — verified three ways:

All 26 unit tests + 6 doctests pass (doctests cover the full accepted-formats and DMY lists). New prefilter_rejects_non_date_strings regression test pins the pre-filter (rejects junk; still accepts every separator a real date uses, plus the bare inf/nan the unix-timestamp path accepts).
Integration: qsv stats output is byte-identical before/after on a mixed-format dataset, and qsv's full cargo test stats -F all_features suite passes (752 passed, 0 failed) against this release.

Why it matters for qsv

qsv stats --infer-dates (and, via the stats cache, frequency, schema, tojsonl, sqlp, joinp, pivotp, describegpt) parse dates through this crate. Genuine date-typed columns infer ~3% faster at the qsv level; non-date columns are unaffected. (Integration profiling showed the larger remaining --infer-dates overhead on non-date-heavy data lives in qsv's own sniff step, not this crate — tracked for follow-up on the qsv side.)

Dev-only

Added parse_failures and parse_word_failures benches for the failure hot paths.
Updated parse-order docs in CLAUDE.md.
A RegexSet single-pass classification idea was prototyped and rejected — measured ~1.6–2.1× slower on the common early-gate date path (no early exit; per-is_match setup overhead, not scanning, dominates).

Full diff

0.15.0...0.15.1 — PR #9.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.15.1

Choose a tag to compare

Sorry, something went wrong.