speed up new router implementation #279
Conversation
This makes sense and was probably foreseeable. The previous version created
the MM matchers ahead of time. I guess with a forking server, this makes
the most sense: longer start-up, but faster operation.
On Tue, Dec 31, 2024, 6:04 PM Kyle Plump wrote:
related to #278
in the new Trie leaf implementation, Mustermann matchers were created at
the time of evaluation, causing a semi-significant slowdown. we have all of
the information to create the matcher when building the trie - so, this fix
creates the matcher at the time of leaf creation (leaving a fallback in
#match), while maintaining the benefits of having leaves as introduced in
2.2, with this PR <#273>
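The fix described here can be sketched in plain Ruby. This is an illustrative stand-in, not the actual hanami-router code: the class is simplified, and a Regexp with named captures stands in for a real Mustermann matcher.

```ruby
# Sketch of "create the matcher at leaf creation time, with a fallback in #match".
# Names and the Regexp compile step are illustrative assumptions.
class Leaf
  def initialize(route)
    @route = route
    @matcher = compile(route) # eager: cost paid once, when the trie is built
  end

  # Fallback: if the eager step were ever skipped, the matcher is still
  # created lazily on first use.
  def match(path)
    (@matcher ||= compile(@route)).match(path)
  end

  private

  # Stand-in for building a Mustermann matcher: ":name" segments become
  # named captures.
  def compile(route)
    Regexp.new("\\A" + route.gsub(/:(\w+)/) { "(?<#{$1}>[^/]+)" } + "\\z")
  end
end
```

The boot-time cost moves into `initialize`, which is exactly the trade-off discussed below: slower startup, faster matching per request.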
this brings us back to similar performance as hanami-router v2.1, with
the exception of startup (this fix is temporarily called 2.2.1):
rps.png: https://github.com/user-attachments/assets/e4a135d5-53f5-49ca-8be0-acab1d4674d9
log_rps.png: https://github.com/user-attachments/assets/5a31d0b9-a928-4a32-8103-71fa2db5d4f0
runtime_with_startup.png: https://github.com/user-attachments/assets/4514f85f-abb1-42da-8d3c-2d72485ce156
this slowdown on boot is expected - more work (creating leaves /
mustermann) is happening. the way to get around this would likely be a new
structure other than a trie (maybe someday!), or maybe evaluating /
building the trie async (😬)
please let me know if you'd like more test cases (especially looking for thoughts from @dcr8898). thanks!
You can view, comment on, or merge this pull request online at: #279

Commit Summary
- 9d0cebd: initialize matcher on leaf init

File Changes (1 file)
- M lib/hanami/router/leaf.rb (1)

Patch Links:
- https://github.com/hanami/router/pull/279.patch
- https://github.com/hanami/router/pull/279.diff
Sweet! The 10+ second startup time for 10,000 routes is still worrisome to me; is there any way we could bring that down? Is it possible to postpone creating leaves until we need them, or batch them up somehow? I guess that's what you mean by making it async, but I could be wrong.
Postponing object creation in leaves is what was causing the problem: it meant creating a Mustermann matcher each time we had a potential route match.
Kyle's fix moves that cost to boot up by creating all possible matchers at
boot time instead.
I would note that 2.2 should create many fewer MM matchers compared to 2.1.
2.2 creates one matcher for each complete defined route (that contains a
dynamic element). 2.1 was creating a MM matcher for every dynamic element,
which could mean multiple matchers for a route, depending on the number of
dynamic elements in the route.
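A hypothetical before/after count makes the difference concrete. The route strings here are invented for illustration; the matchers are only counted, not built:

```ruby
routes = [
  "/books/:book_id/chapters/:id", # two dynamic segments
  "/users/:id",                   # one dynamic segment
  "/about"                        # static: no matcher needed under 2.2
]

# 2.1-style: one matcher per dynamic element
matchers_2_1 = routes.sum { |r| r.scan(/:\w+/).size }  # => 3

# 2.2-style: one matcher per complete route that has any dynamic element
matchers_2_2 = routes.count { |r| r.include?(":") }    # => 2
```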
I haven't seen how the 10,000 routes are structured in this test. It's
possible they don't reflect real-world scenarios.
It's also possible that the 2.2 router code that constructs the routing
trie could be better optimized for performance. I think it's more or less
idiomatic Ruby code. I don't think there's much there you would call
"clever." But idiomatic Ruby is not necessarily the fastest Ruby. Jeremy's
Polished Ruby book goes into that, but I have much to learn in that area. I
am happy to dig deeper.
I really appreciate Kyle zeroing in on the main problem so quickly! 🧐 Kudos to him!
howdy gentlemen thank you for the thoughts! i think i made some headway here

context

i did some poking around to see if i could figure out where the slowdown on boot was, and surprise surprise, it's our friend Mustermann. Damian is correct - in theory we're creating significantly fewer matchers, since we only create them for defined routes rather than for each individual dynamic segment (for reference, the original design). so in 2.1, we would have matchers for things like musty

in this section of the mustermann docs we learn two things:

point #2 is the important part. when giving dynamic segments in 2.1, it was very possible we were sharing matchers between routes. a matcher would match i confirmed this by looking at the memory usage of in Sean's original test, memory was similar between proposal

route caching 🤔

i think there could be a world where we pre-compile the Trie on the first boot, and then save it to a gitignore'd file on disk. on subsequent loads we could check whether the routes have changed. if they haven't, we should have constant boot times. if they have changed, we compile the changed routes into a new version of that file. i think this could be an acceptable solution, since long startup times really only start to take effect when you have a significant number of routes, and i imagine very few people are going to add 10k routes before starting the app for the first time. the consequence, though, would be adding more complexity to the Hanami boot process. if this is a direction people are interested in, i'd love to create a separate issue for tracking that effort

lmk what you think! thank you as always
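The route-caching idea could be sketched roughly like this. Everything here is an assumption for illustration: `CACHE_PATH`, the trivial `build_trie`, and the digest scheme are invented, and whether a real trie full of Mustermann matchers would survive a `Marshal` round-trip is an open question.

```ruby
require "digest"
require "tmpdir"

CACHE_PATH = File.join(Dir.tmpdir, "router_trie.cache") # hypothetical location

# Trivial stand-in for the expensive trie-construction step.
def build_trie(routes)
  routes.each_with_object({}) { |r, trie| trie[r] = r.split("/").reject(&:empty?) }
end

def load_or_build_trie(routes)
  digest = Digest::SHA256.hexdigest(routes.join("\n"))
  if File.exist?(CACHE_PATH)
    cached_digest, blob = File.binread(CACHE_PATH).split("\n", 2)
    # Routes unchanged since last boot: skip the build entirely.
    return Marshal.load(blob) if cached_digest == digest
  end
  trie = build_trie(routes) # the expensive step we want to skip on warm boots
  File.binwrite(CACHE_PATH, "#{digest}\n#{Marshal.dump(trie)}")
  trie
end
```

On the first boot the trie is built and written to disk; subsequent boots with the same route definitions read it back, which is the "constant boot time" property described above.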
follow up here: ended up squeezing some more juice out of the 2.2.1 implementation (now named 2.2.12, although idk if this is the correct usage of semver or not 🤷). updated graphs (on my machine at least): as you can see, memory usage is pretty much the same; this is still because of the issue discussed above. i did some extra benchmarking and realized we were leaving a lot on the table with

please let me know if you see any issues with this updated approach!
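Based on the follow-up discussion, the optimization in question appears to involve splitting paths on a literal string instead of a Regexp. A minimal sketch of that difference:

```ruby
# Segmenting a request path two ways: both produce the same segments,
# but the literal-string split avoids the Regexp engine on every lookup.
path = "/books/42/chapters/7"

via_regexp = path.split(%r{/}).reject(&:empty?)
via_string = path.split("/").reject(&:empty?)

# via_regexp == via_string  # => true; both are ["books", "42", "chapters", "7"]
```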
Thank you for continuing to look into this! I've been swamped on my end through the holidays.

Optimizing Hanami Router 2.2

Your switch from RegEx to string splitting is what I meant about finding the most performant implementation for our approach. It's good to see that 2.2.1 is faster than 2.1 when implemented properly. There may be more wins to be had with similar optimizations. Jeremy Evans' book has a lot of tips for this.

Using Mustermann will always require a trade-off

In terms of boot time vs response time, I believe this will always be a trade-off as long as we are using Mustermann. I don't know if it would be possible to pre-compile the trie structure somehow. I've never seen that done. You would essentially have to serialize Ruby objects and then de-serialize them at runtime. Not sure if this would be possible or faster. Interesting thought. At present, I consider a longer boot time for faster run time in 2.2.1 to be the better trade-off (compared to the opposite in 2.2).

Mustermann discussion

With regard to Mustermann, I would bet that Hanami's use of tries allows it to compare favorably with any other framework that also uses Mustermann. This doesn't apply to Roda, since Roda does not seem to use Mustermann. However, it would be hard for Hanami Router to match Roda's speed, because Roda's routing config is essentially a trie structure implemented in code that does not use Mustermann (see the "Faster" section here). Therefore, it's faster both at boot (there is no routing compile step at all) and at run time. (You can read a comprehensive write-up of Roda here.) Hanami uses Mustermann because of the power and flexibility it provides out of the box, and because it's battle tested. With that said, there are other options.

Options for moving forward

I recommend that we continue to use Mustermann for now, as improved in this PR, but we should consider other options that could further improve Hanami Router, such as:

Other thoughts

As a general rule, the more flexible we allow our routes to be, the more overhead we should expect. Part of the reason Roda is fast is that it implements a small set of routing options to start (for example, only

Other considerations include developer experience. Hanami Router's configuration of routes is very similar to that of Rails, and (I think) very readable. Roda's presentation is very different, and perhaps not as readable (could just be my lack of familiarity). The fastest possible implementation would have developers specify a routing trie data structure directly in the routes file, but the DX for this would be poor. Trade-offs of some kind will always exist.

@kyleplump do you have time to discuss your code changes before we recommend merging them?
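For contrast with the DSL-plus-trie approach, here is a dependency-free sketch of "routing as code" in the style attributed to Roda above. This is not Roda's actual API, just an illustration of why such routing has no boot-time compile step: the branch structure *is* the trie, and dispatch is ordinary method execution.

```ruby
# Hypothetical routing-as-code dispatcher: nothing is compiled at boot.
def route(path)
  case path.split("/").reject(&:empty?)
  in ["books"]     then "books index"
  in ["books", id] then "book #{id}"
  in ["about"]     then "about page"
  else "not found"
  end
end
```

The trade-off named above applies directly: this is fast and has zero startup cost, but every routing rule is hand-written code rather than a declarative definition.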
Thanks for finding this and writing up your thoughts @kyleplump @dcr8898!

Pre-compiling the trie is an interesting idea but adds too much complexity. It'd be a cool experiment for sure, but it's too novel to add at this point.

In terms of versioning proposals, I think it'd be better to refer to them via something like 2.2.0.proposal1 and 2.2.0.proposal2 or

Is there some way we could take a hybrid approach here? That is, building Mustermann leaves ahead of time for the first N routes, then after that just building the matchers lazily. We could use benchmarking to figure out what N should be by default, and allow that N to be configurable.

Similarly, could we make pre-building leaves an option? i.e. N=0 is valid and implements (essentially) the previous behavior from 2.1. I don't know if it would be safe, but I could see wanting lazily built matchers in dev and test environments, then building them ahead of time in production for increased performance (at the expense of slower startup time).
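The hybrid idea might look roughly like this. The class name, the Regexp stand-in for a Mustermann matcher, and the default of 100 are all illustrative assumptions:

```ruby
# Eagerly compile matchers for the first N routes; everything past the
# eager window is compiled lazily, on first use.
class HybridMatchers
  def initialize(routes, eager_limit: 100)
    @matchers = {}
    routes.first(eager_limit).each { |r| @matchers[r] = compile(r) } # boot cost
  end

  def matcher_for(route)
    @matchers[route] ||= compile(route) # lazy path for the long tail
  end

  private

  # Stand-in for building a Mustermann matcher.
  def compile(route)
    Regexp.new("\\A" + route.gsub(/:(\w+)/) { "(?<#{$1}>[^/]+)" } + "\\z")
  end
end
```

With `eager_limit: 0` this degenerates to fully lazy (roughly the 2.1-era behavior), and with a limit above the route count it is fully eager, so both ends of the proposal fall out of one knob.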
@cllns These are some interesting ideas for sure. My initial feeling is that this would be a case of premature optimization. I don't know how many apps there are out there with 10K routes. I would think a huge app would have 1-2K routes (based on my limited experience). If the graph above is logarithmic, this would imply startup times of 1-2 seconds for such apps. Do you think that would be too long for dev & test environments? My feeling is that we should wait until developers are actually feeling that pain before we introduce more complexity. For example, I would like to speak with @kyleplump more to see if we can squeeze the current memory usage a little more, and maybe the startup time, with further optimizations.

If we do explore your suggestions, what would be the gain of compiling some matchers ahead of time and not others? As far as this benchmark is concerned, I presume it attempts to exercise all of the generated routes more or less equally. This is actually one factor that makes the r10k tool a little unrealistic. In a real-world app, there would definitely be some distribution of popular and less-popular routes, but this distribution would not be captured by an arbitrary division of route handling based on a number.

What might work in this case is the original 2.2 strategy and some kind of fast, in-memory cache of created Mustermann matchers. I don't know if this is worthwhile, but it would provide a compromise approach. (EDIT: I have no idea how this would work in a concurrent environment. Maybe it is best to compile the routes first, and then potentially freeze the router. 🤷)

The second part of your suggestion, handling dev and test environments differently than production, could be implemented pretty easily (I think 🤔), but I still wouldn't recommend introducing that additional complexity until actual developer experience starts to suffer.
i appreciate you both thinking about this so deeply! i love hearing your perspectives - trying to keep up with you two is helping me learn a ton :)

i like this idea of pre-generating them all on production builds. is there a world where we keep it simple, at first, and just lazy load the matchers when running

regardless - i'd love to keep poking around and seeing if we can be more 'clever' with our optimizations and make this router blazingly fast (side note: do people still say this?). be happy to bounce ideas off of anyone, but def @dcr8898 if you're volunteering

on versioning: once we iron out a game plan, i can make the modifications, close this PR and create a formal 'proposal' laying out the changes made, if that would be better? 2.2 fixes the bug, we're now just talking about optimization (which is the name of the game for a project like this). point being: if we bundle it all up nicely it can get merged whenever. thanks again!
@kyleplump I'm in favor of polishing this PR up and merging it to fix the problem. Then we can experiment with further changes at our leisure. Let me know when you want to meet up. I think we can make a few more optimizations before submitting. I might still say blazing fast, but I'm likely an anomaly. 🤔
i've cleaned up some of the code / made some final touches / ran some benchmarks. i believe it's now in a place i'm comfortable with (and hopefully you guys too!)

the results

here are the r10k results from hanami 2.1, 2.2, and then this change (2.2.proposal1). i ran this a few times, and they all fell in these ratios (although with the obvious disclaimer of 'this is just my machine', etc):

the changes

there were two main changes that led to these numbers:

the reasoning

with the router bug fix introduced in 2.2,

the second main change is memoizing route parts. when constructing the

again, i would argue this tradeoff is worth it. the jump in RPS performance is significant enough that i would say developers would rather have their pages load faster in exchange for a few MB of memory.

the summary

i wanted to sum up this whole PR conversation in one place, and hopefully this does that. the code changes are minimal, so it's more a conversation of acceptable trade-offs. this is my opinion, but i'm interested to hear @cllns thoughts. Thanks again!
Thank you for this @kyleplump! I really appreciate the way that you zeroed in on the performance issues introduced by 2.2. I believe that the performance of the router "in use" is paramount, and more important than the startup time. This is the explicit tradeoff of using Mustermann. I'm in favor of merging this PR -- and remaining open to a discussion of changing or replacing our dependency on Mustermann.

I think this benchmark tool is useful in general. It's definitely great for marketing, but more limited in reflecting real-world performance of the router. I would caution against making significant decisions based on this tool without taking a closer look at its methodology.
Thanks for doing this @kyleplump! I added some in-line notes / requests for changes.
While it's not common to have 10k+ routes, it's certainly possible and a 10 second start-up time makes starting the development server painful, and makes frequently running system/request/feature tests while developing untenable.
Is there some way we can make the lazy/eager creation of the leaf nodes configurable? Then users can decide whether the trade-off is worth it for them (and possibly change the configuration based on environment, e.g. only create them eagerly in production, when delaying startup is acceptable in exchange for faster performance). Or does that bring back the bugs we were fixing with the new implementation?
Rather than do it that way for the whole router instance, I think my idea from earlier about eagerly creating the first N routes then letting the rest be lazily created could work. It may be an arbitrary number, but we can assume any app (especially that has 10k routes!) will not be hitting all the routes anywhere close to uniformly. Instead, they'll have a small number of routes that are hit frequently. It's already good practice to put your hot routes at the top of the file and your long-tail ones at the bottom, so I think this could be fine. The long-tail ones can have worse routing performance.
But thinking that through leads me to think of another, more explicit, option of adding `lazy { }` (and `eager { }`?) blocks to let users specifically opt out (or opt in?) of pre-creating the leaf nodes for their routes. I think this would be a novel approach for Ruby routers and we should be hesitant to add syntax for implementation details, but this is a significant trade-off, so surfacing it to users might be prudent.
What are your thoughts? @kyleplump @dcr8898 I appreciate how much thought and work you've both put into this :)
lib/hanami/router/trie.rb (outdated):

    _, *segments = path.split(SEGMENT_SEPARATOR)
    ...
    @segments_map[path] = segments
    segments

Suggested change: remove the trailing `segments` line.
Hash assignment returns the value being assigned so I don't think this last line does anything
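That behavior is easy to confirm: a Ruby assignment expression evaluates to the assigned value, so a trailing `segments` after `@segments_map[path] = segments` is redundant.

```ruby
map = {}
segments = ["books", ":id"]

# The assignment expression itself yields the assigned object.
result = (map["/books/:id"] = segments)
# result.equal?(segments)  # => true: the very same object is returned
```

(Note this holds even though `Hash#[]=` as a method returns something else; assignment syntax always evaluates to the right-hand side.)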
lib/hanami/router/trie.rb (outdated):

    _, *segments = path.split(SEGMENT_SEPARATOR)
    ...

There's an empty line here that should be removed, and the contents of the `else` branch need to be indented.
lib/hanami/router.rb (outdated):

    @@ -923,7 +923,7 @@ def _params(env, params)
       end

       env[PARAMS].merge!(::Rack::Utils.parse_nested_query(env[::Rack::QUERY_STRING]))
    -  env[PARAMS].merge!(params)
    +  env[PARAMS].merge!(params) unless params.empty?
Is this change related? If the params are often not empty, then this check is slower than just merging the empty hash. If if it's often empty then it can be a useful optimization, but I think we should discuss that separately to decide if the tradeoff is worth it. I guess it might be negligible either way.
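For what it's worth, the guard only changes the cost, not the result: merging an empty hash is a no-op, so both forms leave the params identical.

```ruby
guarded   = { "id" => "42" }
unguarded = { "id" => "42" }
params = {}

guarded.merge!(params) unless params.empty? # skips the method call entirely
unguarded.merge!(params)                    # merges a no-op empty hash

# guarded == unguarded  # => true either way; only call overhead differs
```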
thank you for sticking with me @cllns! I really appreciate all your patience here 😄 i see your point - while it's a tradeoff i would certainly make, i can see how a framework might not want to make this decision for everyone, across the board. this idea of configurability that you're mentioning is also pretty in line with Hanami (this is why you're the pro)

if we were going to allow this configurability, how would you feel about something like this:

my concern with point 2 is the potential complexity of the configuration. what happens if someone wants to disable preloading generally with a setting, but then also sets

i wanted to leave the last paragraph in because i still think it's an important point to raise, but thinking about this more i'd err on the side of 'yes'. as long as we provide sensible defaults, giving the user fine-grained control over how their apps behave, if they want it, is nothing but a good thing in my mind. at least, these are the things i would like as a user :)

on the point of 'sensible defaults', for the first N routes being pre-compiled, i like the arbitrary number of '100'. that's roughly where things started to fall off the rails in the original PR
I feel strongly that adding further complexity here falls in the category of premature optimization. We are discussing significant changes based solely on a benchmarking tool that may not accurately reflect real-world conditions. I don't think we should pursue this unless and until there are actual Hanami apps experiencing pain as a result of the router design. With the changes Kyle has made, I think we have much to celebrate in the improved speed and correct operation of the router. 🚀

If the new startup time is linear relative to the number of routes, I feel that one second per thousand routes is reasonable, even for development and testing environments. I would guess that an app with 10K+ routes would have a test suite with run times measured in hours, and a 10 second hit to startup time would not be considered significant. (For comparison, I heard a Shopify engineer in 2018 say that their full test suite took 20 minutes to run when fully parallelized on six physical computers. That's two hours total run time. Ten seconds on each machine, or even each thread, would be meaningless in that scenario.) Even if devs were running only partial tests, I think our startup time would be considered understandable in an app with 10K routes (and I seriously doubt that Rails is any better on this measure).

With all that said, if we do make matcher caching configurable, I would recommend switching based on the HANAMI_ENV env variable and just skipping caching when in DEV or TEST environments. I would always cache in PROD. Another idea is to have a dedicated env variable that defaults caching to false/off in DEV or TEST when not set. I can't think of any advantage in caching some routes and not others (keeping in mind that my experience is limited). Even at its slowest (no caching) the router is still processing several thousand requests per second. This should be plenty for DEV and TEST.
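The HANAMI_ENV-based switch could be as small as this. `HANAMI_ROUTER_EAGER` is a hypothetical name for the dedicated override variable; only `HANAMI_ENV` is a real Hanami convention.

```ruby
# Cache (eagerly compile) matchers unless we're in dev or test, with an
# optional per-deployment override via a hypothetical HANAMI_ROUTER_EAGER var.
def eager_matchers?(env = ENV)
  return env["HANAMI_ROUTER_EAGER"] == "true" if env.key?("HANAMI_ROUTER_EAGER")

  !%w[development test].include?(env.fetch("HANAMI_ENV", "development"))
end
```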
As a final suggestion, since all of this pain is caused by the cost of using Mustermann, I think we should examine that decision more closely. We have chosen to use Mustermann in Rails mode. Perhaps a different mode would be more efficient? We don't use all of the features of Rails mode anyway, since they are not all compatible with our trie-based implementation. Perhaps we could create a Hanami mode that meets our specific requirements. Or perhaps we should brainstorm an entirely different solution apart from Mustermann.
👋 (i have one last thought to chime in, then i'll leave it alone)

so as an alternative idea: what if we had 'router initialization' be two steps? the first pass is constructing the

considerations:

so in our 10k routes world, we'd achieve 2.1-level startup times (likely even faster, since 2.1 was also creating mustermann objects), benefiting from the lazy-load approach, but in the background we'd be generating these mustermann objects to achieve the 'pre-compiled' RPS after ~10 seconds, or whenever the 'second pass' finishes. this behavior would be on by default in dev, off by default in prod, and we could offer a setting to toggle it on / off in either environment
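A hedged sketch of the two-step initialization: serve immediately with lazily built matchers while a background thread eagerly fills the cache. The class name and the Regexp stand-in for Mustermann are illustrative; a Mutex guards the hash shared between the request path and the warmup thread.

```ruby
class TwoPassMatchers
  def initialize(routes)
    @matchers = {}
    @lock = Mutex.new
    # Second pass: eagerly compile everything in the background while
    # requests are already being served via the lazy path below.
    @warmup = Thread.new { routes.each { |r| matcher_for(r) } }
  end

  # First pass behavior: build on demand. The Mutex prevents the warmup
  # thread and a request thread from racing on the shared hash.
  def matcher_for(route)
    @lock.synchronize { @matchers[route] ||= compile(route) }
  end

  def wait_for_warmup
    @warmup.join
  end

  private

  # Stand-in for building a Mustermann matcher.
  def compile(route)
    Regexp.new("\\A" + route.gsub(/:(\w+)/) { "(?<#{$1}>[^/]+)" } + "\\z")
  end
end
```

A single coarse lock like this is the simplest correct option; it serializes matcher creation but not matching itself once a matcher exists.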
. . . and concurrency has entered the room. 😱 I am out of my depth on that one. Not sure if having the two processes mucking with the same hash will be problematic. How is contention handled in those situations? I would love to learn more about this.

I suggest that we merge Kyle's changes now, since they appear to be correct and solve the current problem, and move further improvements to separate issues. I think this will speed things along and allow for further in-depth conversation and experimentation. For example, I think there may be issues with caching
@kyleplump's first post:

Environment

An important point here is that this project (

Confirm test suite passes if done lazily

Before we concern ourselves too much with how end-users would configure this, can you confirm that we wouldn't have any regressions if we went back to doing it lazily, based on some condition? To test it out, you could switch behavior based on a (temporary! for purposes of validating this only) ENV var, e.g.

If you could do that and show graphs comparing

Configuration proposals

I agree the configuration can get complicated if we have multiple ways to do it (since we have to consider overrides or incompatible options). We'd have to be very careful with that, to make sure it's not confusing or incorrect. Ideally we could just have one setting whose value can be evaluated in Ruby (which can check the environment it's being evaluated in, if desired). What do you think of my idea of configuring this as a block within the router, e.g.

    Hanami::Router.new do
      # Not valid syntax, just a proposal
      eager do
        root to: ->(env) { [200, {}, ["Created at boot"]] }
      end

      lazy do
        get "about", ->(env) { [200, {}, ["Created when first called"]] }
      end
    end

If we wanted to make it more configurable from there then we could do something like

@kyleplump's second post:
I appreciate the creative thinking here! I think two processes for a router is too much complexity though.

I explained above, but request frequencies have a long-tail distribution. You can imagine a homepage is going to be hit many times more than the license page of a library in a superadmin-only portal that's accessed for quarterly reports. In fact, that route may never be visited before a new deploy, meaning the time and memory spent during start-up to create the leaf node for it was 'wasted' in a sense.

Yes, I hear you about premature optimization, and I disagree. I know the benchmarking tool (

One of Hanami's strengths is that it's a framework that helps people build and maintain large apps, so 10,000 routes is something we should support. If our performance only exploded at, e.g., one million routes, then I wouldn't care so much, but an app with 10,000 routes is reasonable.

You and @kyleplump are the experts on the implementation here, so I'll leave this to you. It looks like Sinatra mode is pretty similar, based on a glance. We have to be careful with backwards compatibility here though, but if you think another mode is appropriate (or creating a Hanami mode, if warranted) you can explore that. I'd be hesitant to create (and maintain) our own string matching solution since we're standing on the shoulders of giants :)

Also just to be clear, we need @timriley's input and approval on whatever course of action we end up with to solve #278. @kyleplump @dcr8898 Thanks again for both of your efforts and thoughts on this 🌸
@cllns @kyleplump I want to keep expressing my position, but I am sensitive to the fact that written communication can often sound more contentious than it is intended to be. So I want you both to know that I am coming from a place of only wanting to keep Hanami as vibrant and healthy as possible. 💜 I also really appreciate the willingness to debate.

With regard to premature optimization, I don't think it's premature because there are no 10K-route Hanami apps yet, but because there are no people or projects of any size coming to us to say, "the Hanami router start-up time is a problem for me." Until that happens, the startup time problem remains theoretical.

@cllns With regard to route frequency distribution, I understand your point, but you seem to have drifted into a production-centered viewpoint. I thought we were all of the position that route matchers should be pre-compiled in production? If we aren't, then that is definitely my position. In dev, route frequency is a non-issue, and in test, route frequency probably has no relation at all to route frequencies in production. For example, login might be the hottest route in production but have only a few end-to-end tests, whereas that quarterly report route you described might have a hundred tests even though it's only run once every three months in production (you know, to make sure the 👨‍💼s look good 😉).

At the end of the day, optimizing router performance for every environment and situation is an interesting problem. It's the kind of thing that gets the mental juices flowing--I mean, Kyle's ideas here are creative AF--but added complexity is added complexity. As Sean pointed out, the solution would have to work for Hanami Router in both stand-alone and framework contexts--an additional dimension of complexity. I don't think we should assume this maintenance burden any sooner than we have to--and hopefully never at all.

I recommend merging this PR and continuing to iterate, talk, and innovate at pace.
As maintainers, we also have to anticipate potential problems before they are reported, in line with what our priorities are for the project. It's impossible to know the exact metric, but I've heard and generally assume that only somewhere around 10% of people report bugs they come across. Being able to support 10k routes is table stakes for a framework that supports large codebases IMO. If we merged this as-is, without configurability, I'd hate for someone to read this PR (or discover via running/seeing r10k graphs) and come away with the impression that Hanami can't handle a large number of routes.

Since creating leaf nodes eagerly vs lazily seems to be an intractable trade-off, I think this should be a configurable option: let users choose if they're willing to accept increased start-up time and memory usage in exchange for increased throughput (requests per second). They can do that based on environment if they choose, but we're agnostic to how they make that decision from the perspective of this library (which doesn't know which environment it's being run in).

The performance in production is what's most important. Throughput generally does not matter for development or testing, since routing is likely a negligible part of the response time. But startup time in development and test does matter, since we promote having a fast test suite that you can run often.

I think that's a safe default for the fullstack framework (

Next steps

In the spirit of friendly and candid feedback, I think we have both explained our positions on this well enough :) This discussion is already quite long. We'll see what @kyleplump and @timriley have to say (also curious to hear others' opinions on this if anyone has been lurking this discussion and has an opinion). No urgency on this, just wanted to summarize next steps since they're scattered above:
this is all expected, with one surprising twist: mustermann doesn't play any role in startup time (or at least a negligible one). i think it's the existence of the

i don't know what to do about this yet, but will keep cracking at it. just wanted to throw this out there!
This is my last comment on eager vs. lazy. I swear. 🙄

If we are calling our startup time "slow," I would ask: in comparison to what? It's not really appropriate to compare our approach to Roda or Sinatra, since those routers define their routes in code. They have no startup time because they just call the code. We have chosen not to use this approach in order to make route definition more flexible and developer-friendly. This requires us to translate the route definition DSL into a routing data structure. If we compare Hanami router to Rails, which takes a similar approach, I'd be willing to bet that Hanami router has faster startup time. This doesn't mean we shouldn't try to improve startup time, but it does imply that eliminating startup time will likely require a completely different approach. I like the current Hanami router approach, and I would like to try to optimize it further to see how far we can go with it.
On a separate note, would you believe that this video was in my watch queue and came up this morning? I play conference talk videos while I do certain mindless tasks. This one held my full attention: Optimization Techniques Used by Benchmark Winners by Jeremy Evans - RubyKaigi 2019. I mention it here because just about every second is relevant to what we are doing now. I encourage everyone in this conversation to watch it. And if that nerd snipe worked, then watch Aaron Patterson's RubyConf 2024 keynote, where he describes how the Shopify team has changed Ruby itself in order to fix some of the performance issues pointed out in Jeremy's talk (he doesn't actually mention Jeremy or anything like that). I think these talks go to show that maintainability and performance are orthogonal measures of code, and tradeoffs must be made between them.