investigate outliers in Rebar regex benchmarks #96413

Open
@danmoseley

The Rebar regex benchmarks have been updated to .NET 8.0. The good news is that, in aggregate, we now appear to be faster than PCRE on this benchmark, and faster than Rust regex on a few individual benchmarks. However, there are a few outliers where .NET is significantly behind Rust regex, which is one of the fastest engines available.

See
BurntSushi/rebar#10 (comment)
BurntSushi/rebar#10 (comment)
These comments include the command line to reproduce the numbers. The repo itself is very well documented and has powerful scripts.

The task here is to investigate any of the biggest outliers, of which these are the largest:

benchmark                                       dotnet/compiled        rust/regex
---------                                       ---------------        ----------
curated/03-date/ascii                           1162.4 KB/s (139.40x)  158.2 MB/s (1.00x)
curated/03-date/unicode                         1167.8 KB/s (137.05x)  156.3 MB/s (1.00x)
curated/12-dictionary/single                    1436.7 KB/s (507.60x)  712.2 MB/s (1.00x)

and determine whether there is an optimization we are missing that would help close the gap.
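For anyone reading the table, the parenthesized factors can be reproduced from the throughput columns. The following is a minimal sketch, not part of Rebar itself: the helper names are hypothetical, and it assumes Rebar reports binary units (1 MB/s = 1024 KB/s), which is inferred only from the fact that the published ratios match under that assumption.

```python
# Sketch: recompute the slowdown factors shown in the table above.
# Assumption (inferred from the numbers, not confirmed by Rebar's docs):
# throughput uses binary units, i.e. 1 MB/s = 1024 KB/s.
UNIT_BYTES = {"B/s": 1, "KB/s": 1024, "MB/s": 1024**2, "GB/s": 1024**3}


def to_bytes_per_sec(text):
    """Convert a string such as '1162.4 KB/s' to bytes per second."""
    value, unit = text.split()
    return float(value) * UNIT_BYTES[unit]


def slowdown(slower, faster):
    """Return how many times slower `slower` is than `faster`."""
    return to_bytes_per_sec(faster) / to_bytes_per_sec(slower)


# Rows copied from the table: (dotnet/compiled, rust/regex).
rows = {
    "curated/03-date/ascii": ("1162.4 KB/s", "158.2 MB/s"),
    "curated/03-date/unicode": ("1167.8 KB/s", "156.3 MB/s"),
    "curated/12-dictionary/single": ("1436.7 KB/s", "712.2 MB/s"),
}

for name, (dotnet, rust) in rows.items():
    print(f"{name}: {slowdown(dotnet, rust):.2f}x")
```

Small differences from the published factors are expected, because the throughput values in the table are already rounded.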

It is OK to look at the Rust regex code; it is MIT licensed: https://docs.rs/crate/regex/latest/source/LICENSE-MIT.

NOTE: this benchmark does not support source-generated regexes; these numbers are from the compiled engine (dotnet/compiled). However, almost all optimizations in one apply to the other.
