-
Notifications
You must be signed in to change notification settings - Fork 507
Description
Hi all, this issue is an invitation for discussion of the following issue and possible solution:
Problem and motivation
I recently discovered some of our PHP deployments using this package takes 170-200ms to parse a user agent. This made me think myself that it was not reasonable this parsing took the equivalent time as 50 simple database queries including roundtrip?
I took the freedom and created some phpbench benchmark for this project, and ran every default client parser through it. Each run was with opcache enabled, and i tested both with and without the default static cache. Each benchmark had 10 warmup runs and 100 repetitions.
As it can be seen in the table below, on a cold cache, a lot of parsers take multiple miliseconds, with the Bot parser even above 40ms. This is a huge deal, especially for non-long-running workers (php-fpm, apache) where the StaticCache cannot be re-utilized between requests.
Possible solution: Inlining compiled datastructures in the PHP files
I then selected one of the simpler parsers for optimization: FeedReader
I quickly realized 2 things:
- Parsing YAML and inverting the data structure everytime is expensive - I suppose this is why caching has been implemented, which reduced the time taken from 1071.0us to 5.1us.
- Even in best case scenario with a hot cache, a good chunk (relatively) of processing time is spent in the cache logic.
I then made a FastFeedReader proof-of-concept class which inlined the parsed and compiled yaml in the class, and removed the caching logic. This simple change reduced the cold-cache runtime of the FeedReader from 1071.0us to ~3us and the warm-cache runtime from 5.1us to the same ~3us. This result can also be seen in the table below. This is a significant 300x+ speedup on a cold cache.
All code and tests can be found here: https://github.com/matomo-org/device-detector/compare/master...LauJosefsen:device-detector:test-inline-regex?expand=1
I also did the same test for the slowest parser, the Bot parser, and it went from 41,623us to 44us (almost 1000x) on a cold cache, and from 47us to 44us on a warm cache. I did not submit the code for this test, as it includes an almost 4000 line long inlined array.
Through this issue I want to open a discussion about whether it would make sense to inline these definitions. This could even be done without any restructure in how the client definitions is maintained today, by utilizing a script that automatically converts the yaml definitions to the precalculated inline definitions similar to what I've done in FastFeedReader.
+------+---------------------+-----------------------+--------------------------------------------------+------+------------+--------------+--------------+----------------+
| iter | benchmark | subject | set | revs | mem_peak | time_avg | comp_z_value | comp_deviation |
+------+---------------------+-----------------------+--------------------------------------------------+------+------------+--------------+--------------+----------------+
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\FeedReader | 100 | 990,832b | 1,070.970μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\FastFeedReader | 100 | 990,832b | 2.880μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\MobileApp | 100 | 1,053,880b | 18,128.460μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\MediaPlayer | 100 | 990,832b | 1,234.480μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\PIM | 100 | 990,832b | 1,062.250μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\Browser | 100 | 1,255,672b | 26,825.500μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Client\Library | 100 | 990,832b | 4,763.100μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\HbbTv | 100 | 1,145,664b | 2.400μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\ShellTv | 100 | 1,145,664b | 2.370μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\Notebook | 100 | 1,146,104b | 2.150μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\Console | 100 | 1,146,184b | 588.540μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\CarBrowser | 100 | 1,146,200b | 306.030μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\Camera | 100 | 1,146,184b | 193.550μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\PortableMediaPlayer | 100 | 1,146,280b | 983.970μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Device\Mobile | 100 | 1,146,184b | 4.060μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsers | DeviceDetector\Parser\Bot | 100 | 1,677,032b | 41,625.440μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\FeedReader | 100 | 990,880b | 5.150μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\FastFeedReader | 100 | 990,880b | 3.190μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\MobileApp | 100 | 1,053,928b | 50.030μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\MediaPlayer | 100 | 990,880b | 6.120μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\PIM | 100 | 990,880b | 4.810μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\Browser | 100 | 1,255,608b | 728.660μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Client\Library | 100 | 990,880b | 8.740μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\HbbTv | 100 | 1,145,712b | 2.300μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\ShellTv | 100 | 1,145,712b | 2.650μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Notebook | 100 | 1,146,152b | 2.050μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Console | 100 | 1,146,232b | 4.230μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\CarBrowser | 100 | 1,146,248b | 4.600μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Camera | 100 | 1,146,232b | 4.420μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\PortableMediaPlayer | 100 | 1,146,328b | 4.520μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Device\Mobile | 100 | 1,146,232b | 3.970μs | +0.00σ | +0.00% |
| 0 | DeviceDetectorBench | benchParsersWithCache | DeviceDetector\Parser\Bot | 100 | 1,677,080b | 47.000μs | +0.00σ | +0.00% |
+------+---------------------+-----------------------+--------------------------------------------------+------+------------+--------------+--------------+----------------+