Skip to content

Inlining compiled parser regex in PHP for 300x+ cold-cache performance increase #8078

@LauJosefsen

Description

@LauJosefsen

Hi all, this issue is an invitation for discussion of the following issue and possible solution:

Problem and motivation

I recently discovered some of our PHP deployments using this package takes 170-200ms to parse a user agent. This made me think myself that it was not reasonable this parsing took the equivalent time as 50 simple database queries including roundtrip?

I took the freedom and created some phpbench benchmark for this project, and ran every default client parser through it. Each run was with opcache enabled, and i tested both with and without the default static cache. Each benchmark had 10 warmup runs and 100 repetitions.

As it can be seen in the table below, on a cold cache, a lot of parsers take multiple miliseconds, with the Bot parser even above 40ms. This is a huge deal, especially for non-long-running workers (php-fpm, apache) where the StaticCache cannot be re-utilized between requests.

Possible solution: Inlining compiled datastructures in the PHP files

I then selected one of the simpler parsers for optimization: FeedReader

I quickly realized 2 things:

  1. Parsing YAML and inverting the data structure everytime is expensive - I suppose this is why caching has been implemented, which reduced the time taken from 1071.0us to 5.1us.
  2. Even in best case scenario with a hot cache, a good chunk (relatively) of processing time is spent in the cache logic.

I then made a FastFeedReader proof-of-concept class which inlined the parsed and compiled yaml in the class, and removed the caching logic. This simple change reduced the cold-cache runtime of the FeedReader from 1071.0us to ~3us and the warm-cache runtime from 5.1us to the same ~3us. This result can also be seen in the table below. This is a significant 300x+ speedup on a cold cache.

All code and tests can be found here: https://github.com/matomo-org/device-detector/compare/master...LauJosefsen:device-detector:test-inline-regex?expand=1

I also did the same test for the slowest parser, the Bot parser, and it went from 41,623us to 44us (almost 1000x) on a cold cache, and from 47us to 44us on a warm cache. I did not submit the code for this test, as it includes an almost 4000 line long inlined array.

Through this issue I want to open a discussion about whether it would make sense to inline these definitions. This could even be done without any restructure in how the client definitions is maintained today, by utilizing a script that automatically converts the yaml definitions to the precalculated inline definitions similar to what I've done in FastFeedReader.

+------+---------------------+-----------------------+--------------------------------------------------+------+------------+--------------+--------------+----------------+
| iter | benchmark           | subject               | set                                              | revs | mem_peak   | time_avg     | comp_z_value | comp_deviation |
+------+---------------------+-----------------------+--------------------------------------------------+------+------------+--------------+--------------+----------------+
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\FeedReader          | 100  | 990,832b   | 1,070.970μs  | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\FastFeedReader      | 100  | 990,832b   | 2.880μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\MobileApp           | 100  | 1,053,880b | 18,128.460μs | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\MediaPlayer         | 100  | 990,832b   | 1,234.480μs  | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\PIM                 | 100  | 990,832b   | 1,062.250μs  | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\Browser             | 100  | 1,255,672b | 26,825.500μs | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Client\Library             | 100  | 990,832b   | 4,763.100μs  | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\HbbTv               | 100  | 1,145,664b | 2.400μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\ShellTv             | 100  | 1,145,664b | 2.370μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\Notebook            | 100  | 1,146,104b | 2.150μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\Console             | 100  | 1,146,184b | 588.540μs    | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\CarBrowser          | 100  | 1,146,200b | 306.030μs    | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\Camera              | 100  | 1,146,184b | 193.550μs    | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\PortableMediaPlayer | 100  | 1,146,280b | 983.970μs    | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Device\Mobile              | 100  | 1,146,184b | 4.060μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsers          | DeviceDetector\Parser\Bot                        | 100  | 1,677,032b | 41,625.440μs | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\FeedReader          | 100  | 990,880b   | 5.150μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\FastFeedReader      | 100  | 990,880b   | 3.190μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\MobileApp           | 100  | 1,053,928b | 50.030μs     | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\MediaPlayer         | 100  | 990,880b   | 6.120μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\PIM                 | 100  | 990,880b   | 4.810μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\Browser             | 100  | 1,255,608b | 728.660μs    | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\Library             | 100  | 990,880b   | 8.740μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\HbbTv               | 100  | 1,145,712b | 2.300μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\ShellTv             | 100  | 1,145,712b | 2.650μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Notebook            | 100  | 1,146,152b | 2.050μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Console             | 100  | 1,146,232b | 4.230μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\CarBrowser          | 100  | 1,146,248b | 4.600μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Camera              | 100  | 1,146,232b | 4.420μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\PortableMediaPlayer | 100  | 1,146,328b | 4.520μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Mobile              | 100  | 1,146,232b | 3.970μs      | +0.00σ       | +0.00%         |
| 0    | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Bot                        | 100  | 1,677,080b | 47.000μs     | +0.00σ       | +0.00%         |
+------+---------------------+-----------------------+--------------------------------------------------+------+------------+--------------+--------------+----------------+

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions