
speed up new router implementation #279

Open · wants to merge 6 commits into base: main
Conversation

kyleplump

related to #278

in the new Trie leaf implementation, Mustermann matchers were created at evaluation time, causing a semi-significant slowdown. we have all of the information needed to create the matcher when building the trie - so this fix creates the matcher at the time of leaf creation (leaving a fallback in #match), while maintaining the benefits of having leaves as introduced in 2.2
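the shape of the change, reduced to a sketch (this is illustrative, not the PR's actual code: the class is simplified and a plain Regexp stands in for the real Mustermann matcher):

```ruby
# Hypothetical sketch of the leaf change; a Regexp stands in for the
# real Mustermann matcher, and names are simplified for illustration.
class Leaf
  def initialize(route, to)
    @route = route
    @to = to
    # eager: build the matcher while the trie is being constructed
    @matcher = compile(route)
  end

  def match(path)
    # fallback kept in #match, as in the PR, in case @matcher is nil
    @matcher ||= compile(@route)
    @matcher.match?(path) ? @to : nil
  end

  private

  def compile(route)
    # stand-in for Mustermann.new(route, type: :rails)
    Regexp.new("\\A" + route.gsub(/:\w+/, "[^/]+") + "\\z")
  end
end

leaf = Leaf.new("/entity/:id", :show)
leaf.match("/entity/42")  # => :show
leaf.match("/entity")     # => nil
```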

this brings us back to similar performance as hanami-router v2.1, with the exception of startup (this fix is temporarily called 2.2.1):

(charts: rps, log_rps, runtime_with_startup)

this slowdown on boot is expected - more work (creating leaves / mustermann) is happening. the way to get around this would likely be a new structure other than a trie (maybe someday!), or maybe evaluating / building the trie async (😬)

please let me know if you'd like more test cases (especially looking for thoughts from @dcr8898) . thanks!

@dcr8898
Contributor

dcr8898 commented Jan 1, 2025 via email

@cllns
Member

cllns commented Jan 1, 2025

Sweet! The 10+ second startup time for 10,000 routes is still worrisome to me. Is there any way we could bring that down?

Is it possible to postpone creating leaves until we need them, or batch them up somehow? I guess that's what you mean by making it async, but I could be wrong.

@dcr8898
Contributor

dcr8898 commented Jan 1, 2025 via email

@kyleplump
Author

@cllns @dcr8898

howdy gentlemen

thank you for the thoughts! i think i made some headway here

context

i did some poking around to see if i could figure out where the slowdown on boot was, and surprise surprise, it's our friend Mustermann.

Damian is correct - in theory we're creating significantly fewer matchers, since we're only creating them for defined routes, rather than for each individual dynamic segment. (for reference, the original design).

so in 2.1, we would have matchers for things like :id, or :uuid instead of full paths like /entity/:id or /some-route/:unique.

(screenshot: Mustermann docs)

in this section of the mustermann docs we learn two things:

  1. creating pattern matching objects is (intentionally) expensive
  2. when calling #new, and passing the same arguments, mustermann might just return an already created object instead of initializing a new one.

point #2 is the important part. when given dynamic segments in 2.1, it was very possible we were sharing matchers between routes. a matcher would match :id, regardless of what route that dynamic segment was used in, using the same underlying object. so in theory - fewer matchers. in practice - probably not.
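a toy illustration of the difference (pure Ruby; a memoizing factory mimics the caching behavior the Mustermann docs describe, and nothing here is hanami-router's actual code):

```ruby
# A memoizing factory mimicking Mustermann.new's documented behavior of
# returning an already-created pattern when given identical arguments.
PATTERN_CACHE = {}

def pattern_for(source)
  PATTERN_CACHE[source] ||= Regexp.new("\\A" + source.gsub(/:\w+/, "[^/]+") + "\\z")
end

# 2.1-style: matchers keyed by dynamic segment, shared across routes
a = pattern_for(":id")
b = pattern_for(":id")
a.equal?(b)  # => true, one shared object

# 2.2.1-style: matchers keyed by full path, so nothing is shared
x = pattern_for("/entity/:id")
y = pattern_for("/some-route/:unique")
x.equal?(y)  # => false, one object per route
```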

i confirmed this by looking at the memory usage of 2.2.1:

(chart: memory)

in Sean's original test, memory was similar between 2.1 and 2.2. this is because we were creating the matchers in the leaves at runtime. when 'precompiling' the routes, we see that initial memory usage is much higher. more evidence that we're initializing many more objects

proposal

route caching 🤔

i think there could be a world where we pre-compile the Trie on the first boot, and then save it to a gitignore'd file on disk. on subsequent loads, we could check if the routes have changed. if they haven't, we should have constant boot times. if they have changed, compile the changed routes into a new version of that file. i think this could be an acceptable solution, since long startup times really only start to take effect when you have a significant number of routes, and i imagine very few people are going to add 10k routes before starting the app for the first time. though, the consequence of this would be adding more complexity to the Hanami boot process. if this is a direction people are interested in, i'd love to create a separate issue for tracking that effort
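a rough sketch of what that cache could look like (every file name and helper here is invented for illustration; note also that real Mustermann patterns may not survive Marshal, which is part of why this would need its own issue):

```ruby
require "digest"
require "tmpdir"

# Hypothetical cache location, not part of hanami-router
CACHE_FILE = File.join(Dir.tmpdir, "routes_trie.cache")

# stand-in for the real (expensive) trie construction
def build_trie(routes_source)
  routes_source.lines.map(&:strip).reject(&:empty?).each_with_object({}) do |route, trie|
    trie[route] = true
  end
end

def load_or_build_trie(routes_source)
  digest = Digest::SHA256.hexdigest(routes_source)
  if File.exist?(CACHE_FILE)
    saved_digest, payload = File.binread(CACHE_FILE).split("\n", 2)
    # routes unchanged since the last boot => skip the compile step
    return Marshal.load(payload) if saved_digest == digest
  end
  trie = build_trie(routes_source)
  File.binwrite(CACHE_FILE, "#{digest}\n#{Marshal.dump(trie)}")
  trie
end

routes = "/entity/:id\n/some-route/:unique\n"
first  = load_or_build_trie(routes)  # builds and writes the cache
second = load_or_build_trie(routes)  # read back from disk
```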

lmk what you think! thank you as always

@kyleplump
Author

kyleplump commented Jan 3, 2025

@cllns @dcr8898

follow up here: ended up squeezing some more juice out of the 2.2.1 implementation (now named 2.2.12 although idk if this is the correct usage of semver or not 🤷 )

updated graphs (on my machine at least):
(charts: rps, log_rps, runtime_with_startup, memory)

as you can see, memory usage is pretty much the same - this is still because of the issue discussed above.

i did some extra benchmarking and realized we were leaving a lot on the table with Trie#segments_from. turns out this function was taking most of the time in our implementation. we should be splitting based on a string instead of a regex (i didn't just know this off the top of my head, i found this and then tried it), and we should also be memoizing segments (#put can do the split and cache, #find can just read segments from this cache)
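in sketch form, the two changes look something like this (names approximate the PR, but this is an illustrative reduction, not the actual diff):

```ruby
SEGMENT_SEPARATOR = "/"  # a plain String, not a Regexp

class SegmentCache
  def initialize
    @segments_map = {}
  end

  # the idea: #put does the split and caches it; #find reads the cache.
  # Both paths funnel through this memoized lookup.
  def segments_from(path)
    @segments_map[path] ||= begin
      # String#split with a String separator skips the regexp engine
      _, *segments = path.split(SEGMENT_SEPARATOR)
      segments
    end
  end
end

cache = SegmentCache.new
cache.segments_from("/entity/42")  # => ["entity", "42"], computed once
cache.segments_from("/entity/42")  # cache hit, same array object
```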

please let me know if you see any issues with this updated approach!

@dcr8898
Contributor

dcr8898 commented Jan 6, 2025

@kyleplump

Thank you for continuing to look into this! I've been swamped on my end through the holidays.

Optimizing Hanami Router 2.2

Your switch from RegEx to string splitting is what I meant about finding the most performant implementation for our approach. It's good to see that 2.2.1 is faster than 2.1 when implemented properly. There may be more wins to be had with similar optimizations. Jeremy Evans' book has a lot of tips for this.

Using Mustermann will always require a trade-off

In terms of boot time vs response time, I believe this will always be a trade-off as long as we are using Mustermann. I don't know if it would be possible to pre-compile the trie structure somehow. I've never seen that done. You would essentially have to serialize Ruby objects and then de-serialize them at runtime. Not sure if this would be possible or faster. Interesting thought.

At present, I consider a longer boot-time for faster run-time in 2.2.1 to be the better trade-off (compared to the opposite in 2.2).

Mustermann discussion

With regard to Mustermann, I would bet that Hanami's use of tries allows it to compare favorably with any other framework that also uses Mustermann. This doesn't apply to Roda, since Roda does not seem to use Mustermann.

However, it would be hard for Hanami Router to match Roda's speed, because Roda's routing config is essentially a trie structure implemented in code that does not use Mustermann (see the "Faster" section here). Therefore, it's faster both at boot (there is no routing compile step at all) and during run time. (You can read a comprehensive write-up of Roda here.)

Hanami uses Mustermann because of the power and flexibility it provides out of the box, and because it's battle tested. With that said, there are other options.

Options for moving forward

I recommend that we continue to use Mustermann for now, as improved in this PR, but we should consider other options that could further improve Hanami Router, such as:

  • Contribute improvements to Mustermann.
  • Consider using a different Mustermann type (other than Rails), or even creating a Hanami Mustermann type.
  • Drop Mustermann and create a new matching strategy, perhaps using Roda for inspiration.

Other thoughts

As a general rule, the more flexible we allow our routes to be, the more overhead we should expect. Part of the reason Roda is fast is because it implements a small set of routing options to start (for example, only GET and POST are available by default). More complex options are available as plugins, but users don't incur their added overhead unless they choose to use them. If we do refactor Hanami Router along a different path, we could consider a similar approach of simple first.

Other considerations include developer experience. Hanami Router's configuration of routes is very similar to that of Rails, and (I think) very readable. Roda's presentation is very different, and perhaps not as readable (could just be my lack of familiarity). The fastest possible implementation would have developers specify a routing trie data structure directly in the routes file, but the DX for this would be poor. Trade-offs of some kind will always exist.

@kyleplump do you have time to discuss your code changes before we recommend merging them?

@cllns
Member

cllns commented Jan 6, 2025

Thanks for finding this and writing up your thoughts @kyleplump @dcr8898!

Pre-compiling the trie is an interesting idea but adds too much complexity. It'd be a cool experiment for sure, but it's too novel to add at this point.

In terms of versioning proposals, I think it'd be better to refer to them via something like 2.2.0.proposal1 and 2.2.0.proposal2 or .fixN etc. Just something that signifies that it's not an actual gem release. At some point we'll make a 2.2.1 gem release and then the chart & discussion will be inaccurately labeled, which is confusing for future readers.

Is there some way we could take a hybrid approach here? That is, building Mustermann leaves ahead of time, for the first N routes, then after that just build the matchers lazily. We could use benchmarking to figure out what N should be by default, and allow that N to be configurable.

Similarly, could we make pre-building leaves an option? i.e. N=0 is valid and implements (essentially) the previous behavior from 2.1.

I don't know if it would be safe, but I could see wanting lazy built matchers in dev and test environments, then build them ahead of time in production for increased performance (at the expense of slower startup time).
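For what it's worth, the hybrid could be sketched roughly like this (all names, including `eager_limit`, are hypothetical, and a Regexp stands in for Mustermann):

```ruby
# Hypothetical hybrid: eagerly compile matchers for the first N routes,
# lazily compile the rest on first use.
class HybridRouter
  def initialize(routes, eager_limit: 100)
    @matchers = {}
    # eager: precompile matchers for the first N (presumably hottest) routes
    routes.first(eager_limit).each { |route| @matchers[route] = compile(route) }
  end

  # everything past N is compiled on first use; eager_limit: 0 would
  # reproduce the fully lazy behavior
  def matcher_for(route)
    @matchers[route] ||= compile(route)
  end

  private

  def compile(route)
    # stand-in for Mustermann.new(route, type: :rails)
    Regexp.new("\\A" + route.gsub(/:\w+/, "[^/]+") + "\\z")
  end
end

router = HybridRouter.new(["/hot/:id", "/cold/:id"], eager_limit: 1)
router.matcher_for("/cold/:id").match?("/cold/7")  # compiled on demand
```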

@dcr8898
Contributor

dcr8898 commented Jan 6, 2025

@cllns These are some interesting ideas for sure.

My initial feeling is that this would be a case of premature optimization. I don't know how many apps there are out there with 10K routes. I would think a huge app would have 1-2K routes (based on my limited experience). If the graph above is logarithmic, this would imply startup times of 1-2 seconds for such apps. Do you think that would be too long for dev & test environments? My feeling is that we should wait until developers are actually feeling that pain before we introduce more complexity.

For example, I would like to speak with @kyleplump more to see if we can squeeze the current memory usage a little more, and maybe the startup time, with further optimizations.

If we do explore your suggestions, what would be the gain of compiling some matchers ahead of time and not others? As far as this benchmark is concerned, I presume it attempts to exercise all of the generated routes more or less equally. This is actually one factor that makes the r10k tool a little unrealistic.

In a real-world app, there would definitely be some distribution of popular and less-popular routes, but this distribution would not be captured by an arbitrary division of route handling based on a number. What might work in this case is the original 2.2 strategy and some kind of fast, in-memory cache of created Mustermann matchers. I don't know if this is worthwhile, but it would provide a compromise approach. (EDIT: I have no idea how this would work in a concurrent environment. Maybe it is best to compile the routes first, and then potentially freeze the router. 🤷 )

The second part of your suggestion, handling dev and test environments differently than production, could be implemented pretty easily (I think 🤔 ), but I still wouldn't recommend introducing that additional complexity until actual developer experience starts to suffer.

@kyleplump
Author

@cllns @dcr8898

i appreciate you both thinking about this so deeply! i love hearing your perspectives - trying to keep up with you two is helping me learn a ton :)

i like this idea of pre-generating them all on production builds. is there a world where we keep it simple, at first, and just lazy load the matchers when running hanami dev, and pre-generate them on prod builds? as long as we document it, and maybe even provide the option to override this behavior, i feel like thats a big win. it allows developers the ability to see everything 'live' and un-cached, but once they're ready to ship a build we can eagerly optimize. i've worked with some tools that do similar-ish things, so as long as we're transparent, there's a precedent.

regardless - i'd love to keep poking around and seeing if we can be more 'clever' with our optimizations and make this router blazingly fast (side note: do people still say this?). be happy to bounce ideas off of anyone, but def @dcr8898 if youre volunteering

on versioning: once we iron out a game plan, i can make the modifications, close this PR and create a formal 'proposal' laying out changes made if that would be better? 2.2 fixes the bug, we're now just talking about optimization (which is the name of the game for a project like this). point being: if we bundle it all up nicely it can get merged whenever.

thanks again!

@dcr8898
Contributor

dcr8898 commented Jan 7, 2025

@kyleplump I'm in favor of polishing this PR up and merging it to fix the problem. Then we can experiment with further changes at our leisure. Let me know when you want to meet up. I think we can make a few more optimizations before submitting.

I might still say blazing fast, but I'm likely an anomaly. 🤔

@kyleplump
Author

hi @cllns @dcr8898 👋

i've cleaned up some of the code / made some final touches / ran some benchmarks. i believe it's now in a place i'm comfortable with (and hopefully you guys are too!)


the results

here are the r10k results from hanami 2.1, 2.2, and then this change (2.2.proposal1). i ran this a few times, and they all fell in these ratios (although the obvious disclaimer of 'this is just my machine', etc):

(charts: rps, log_rps, memory, runtime_with_startup)

the changes

there were two main changes that led to these numbers:

  • creating Mustermann matchers during the initial construction of the Trie
  • memoizing the mapping of full route paths (strings) to route parts (array of strings)

the reasoning

with the router bug fix introduced in 2.2, Mustermann matchers were created at a much higher rate, and also lazily. this lazy object creation led to the degradation in performance. by moving the time of creation up to the initial pass of the routes file / creation of the Trie, we can achieve performance similar to v2.1 - with the exception of 'initial start up'. one consequence of this design is that apps with a significant number of routes (> 1000) see worse startup times than router 2.1. I would argue this is an acceptable tradeoff:

  • we still maintain the flexibility given to us by using Mustermann at all
  • we should be skeptical of 2.1's performance anyway: there was a bug in the router, which meant significantly fewer mustermann matchers were created, and incorrect behavior. so we'd be comparing the "correct" version to an "incorrect" version of the router (i realize the unrealistic word choice here, but hopefully you see the point)

the second main change is memoizing route parts. when constructing the Trie during first load, we split the route into parts to insert them into the data structure as nodes. this is unavoidable. BUT, we then re-split the route into parts to walk the structure whenever matching routes. remembering the association of 'full route' to 'route parts' discovered during the initial creation allows us to skip that (surprisingly) expensive String#split, and gives us RPS performance better than 2.2, and even 2.1 for that matter. the consequence of this:

  • slightly more memory usage than 2.2, since we're holding another object in memory

again, I would argue this tradeoff is worth it. the jump in RPS performance is significant enough, i would say developers would rather have their pages load faster in exchange for a few MB of memory.
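the String-vs-Regexp split point is easy to reproduce with a micro-benchmark (illustrative only, not the PR's actual benchmark; absolute numbers will vary by machine):

```ruby
require "benchmark"

path = "/entity/42/comments/7"
n = 100_000

Benchmark.bm(13) do |x|
  # a Regexp separator goes through the regexp engine on every call
  x.report("regexp split") { n.times { path.split(%r{/}) } }
  # a String separator takes a faster internal path
  x.report("string split") { n.times { path.split("/") } }
end

# both produce identical segments, so the swap is behavior-preserving
path.split(%r{/}) == path.split("/")  # => true
```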

the summary

i wanted to sum up this whole PR conversation in one place, and hopefully this does that. the code changes are minimal, so it's more a conversation about acceptable trade-offs. this is my opinion, but i'm interested to hear @cllns's thoughts.

Thanks again!

@dcr8898
Contributor

dcr8898 commented Jan 27, 2025

Thank you for this @kyleplump! I really appreciate the way that you zeroed in on the performance issues introduced by 2.2.

I believe that the performance of the router "in use" is paramount, and more important than the startup time. This is the explicit tradeoff of using Mustermann. I'm in favor of merging this PR -- and remaining open to a discussion of changing or replacing our dependency on Mustermann.

I think this benchmark tool is useful in general. It's definitely great for marketing, but more limited in reflecting real-world performance of the router. I would caution against making significant decisions based on this tool without taking a closer look at its methodology.

@cllns cllns modified the milestone: ley Jan 27, 2025
Member

@cllns cllns left a comment


Thanks for doing this @kyleplump! I added some in-line notes / requests for changes.

While it's not common to have 10k+ routes, it's certainly possible and a 10 second start-up time makes starting the development server painful, and makes frequently running system/request/feature tests while developing untenable.

Is there some way we can make the lazy/eager creation of the leaf nodes configurable? Then users can decide whether the trade-off is worth it for them (and possibly change the configuration based on environment, e.g. only create them eagerly in production, when delaying startup is acceptable in exchange for faster performance). Or does that bring back the bugs we were fixing with the new implementation?

Rather than do it that way for the whole router instance, I think my idea from earlier about eagerly creating the first N routes then letting the rest be lazily created could work. It may be an arbitrary number, but we can assume any app (especially that has 10k routes!) will not be hitting all the routes anywhere close to uniformly. Instead, they'll have a small number of routes that are hit frequently. It's already good practice to put your hot routes at the top of the file and your long-tail ones at the bottom, so I think this could be fine. The long-tail ones can have worse routing performance.

But thinking that through leads me to think of another, more explicit, option of adding lazy { } (and eager { }?) blocks to let users specifically opt out of (or in to?) pre-creating the leaf nodes for their routes. I think this would be a novel approach for Ruby routers and we should be hesitant to add syntax for implementation details, but this is a significant trade-off, so surfacing it to users might be prudent.

What are your thoughts? @kyleplump @dcr8898 I appreciate how much thought and work you've both put into this :)

```ruby
_, *segments = path.split(SEGMENT_SEPARATOR)

@segments_map[path] = segments
segments
```
Member


Suggested change: remove the trailing `segments` line.

Hash assignment returns the value being assigned so I don't think this last line does anything
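A quick demonstration of the point (plain Ruby semantics, nothing hanami-specific; variable names are illustrative):

```ruby
# Hash#[]= evaluates to the assigned value, so an expression like
# `@segments_map[path] = segments` already returns `segments`;
# a trailing bare `segments` line is redundant.
segments_map = {}
segments = ["entity", "42"]
result = (segments_map["/entity/42"] = segments)
result.equal?(segments)  # => true, the assignment returned the value
```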

```ruby
_, *segments = path.split(SEGMENT_SEPARATOR)
```

Member


There's an empty line here that should be removed, and the contents of the else branch need to be indented

```diff
@@ -923,7 +923,7 @@ def _params(env, params)
   end

   env[PARAMS].merge!(::Rack::Utils.parse_nested_query(env[::Rack::QUERY_STRING]))
-  env[PARAMS].merge!(params)
+  env[PARAMS].merge!(params) unless params.empty?
```
Member


Is this change related? If the params are often not empty, then this check is slower than just merging the empty hash. If it's often empty then it can be a useful optimization, but I think we should discuss that separately to decide if the tradeoff is worth it. I guess it might be negligible either way.

@kyleplump
Author

thank you for sticking with me @cllns ! I really appreciate all your patience here 😄

i see your point - while it's a tradeoff i would certainly make, i can see how a framework might not want to make this decision for everyone, across the board. this idea of configurability that you're mentioning is also pretty in line with Hanami (this is why you're the pro)

if we were going to allow this configurability, how would you feel about something like this:

  • eager loading is off by default in development, on by default in production. allow the user to explicitly opt in or out in either case (a setting?, e.g.: setting :enable_route_preloading, constructor ..., or should this be an app config?)
  • eager load only the first N routes by default, allowing the ability to override this number

my concern with point 2 is the potential complexity of the configuration. what happens if someone wants to disable preloading generally with a setting, but then also sets :preload_n_routes = 10? would this be adding unnecessary complexity to the router because we'd need to handle and communicate config errors / conflicts like this? should we only allow point 2?


i wanted to leave the last paragraph in because i still think it's an important point to raise, but thinking about this more i'd err on the side of 'yes'. as long as we provide sensible defaults, giving the user fine-grained control over how their apps behave if they want it is nothing but a good thing in my mind. at least, these are the things i would like as a user :)

on the point of 'sensible defaults', for the first N routes being pre-compiled, i like the arbitrary number of '100'. thats roughly where things started to fall off the rails in the original PR

@dcr8898
Contributor

dcr8898 commented Jan 28, 2025

@cllns @kyleplump

I feel strongly that adding further complexity here falls in the category of premature optimization. We are discussing significant changes based solely on a benchmarking tool that may not accurately reflect real world conditions. I don't think we should pursue this unless and until there are actual Hanami apps experiencing pain as a result of the router design.

With the changes Kyle has made, I think we have much to celebrate with the improved speed and correct operation of the router. 🚀

If the new startup time is linear relative to the number of routes, I feel that one second per thousand routes is reasonable, even for development and testing environments. I would guess that an app with 10K+ routes would have a test suite with run times measured in hours, and a 10 second hit to startup time would not be considered significant. (For comparison, I heard a Shopify engineer in 2018 say that their full test suite took 20 minutes to run when fully parallelized on six physical computers. That's two hours total run time. Ten seconds on each machine, or even each thread, would be meaningless in that scenario.) Even if devs were running only partial tests, I think our startup time would be considered understandable in an app with 10K routes (and I seriously doubt that Rails is any better on this measure).

With all that said, if we do make matcher caching configurable, I would recommend switching based on the HANAMI_ENV env variable and just skip caching when in DEV or TEST environments. I would always cache in PROD. Another idea is to have a dedicated env variable that defaults to caching false/off in DEV or TEST when not set.

I can't think of any advantage in caching some routes and not others (keeping in mind that my experience is limited). Even at its slowest (no caching) the router is still processing several thousand requests per second. This should be plenty for DEV and TEST.

As an aside, one of the things I love about Hanami is that I need very few end-to-end tests. I find it way more productive to unit test and component test, with only a few tests to make sure routes and other things are wired together.

As a final suggestion, since all of this pain is caused by the cost of using Mustermann, I think we should examine that decision more closely. We have chosen to use Mustermann in Rails mode. Perhaps a different mode would be more efficient? We don't use all of the features of Rails mode anyway, since they are not all compatible with our trie-based implementation. Perhaps we could create a Hanami mode that meets our specific requirements. Or, perhaps we should brainstorm an entirely different solution apart from Mustermann.

@kyleplump
Author

👋 (i have one last thought to chime in, then i'll leave it alone)

so as an alternative idea:

what if we had 'router initialization' be two steps? the first pass constructs the trie, without any matchers; the second pass re-walks the trie and creates the matchers at the leaves in a separate process. an example:

  • hanami server started, we need to figure out the routing scheme
  • create the trie structure, no leaf matchers yet
  • router is 'interactive' at this point
  • in a forked process, walk the trie and create the matchers at each leaf node

considerations:

  • any route that is matched before the 'second pass' gets to it, will lazily create the mustermann matcher at that time
  • the 'second pass' process is killed when its complete
  • each node is marked as 'has matcher' or not, and when the dev server is auto restarted after a file change, the process is created / killed again, only creating potentially new mustermann objects
  • other (albeit long lived) processes are created during the boot phase, so there is at least some precedent for spawning threads at this stage

so in our 10k routes world, we'd achieve 2.1-level startup times (likely even faster, since 2.1 was also creating mustermann objects), benefitting from the lazy load approach, but in the background we can generate these mustermann objects to achieve the 'pre-compile' RPS after ~10 seconds, or whenever the 'second pass' finishes. this behavior would be on by default in dev, off by default in prod, and we could offer a setting to toggle it on / off in either environment
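a thread-based reduction of the idea (the proposal above suggests a forked process; a Thread is used here only to keep the sketch self-contained, every name is hypothetical, and the concurrency concerns are only waved at with a single Mutex):

```ruby
# Hypothetical two-pass router: the route table is usable immediately,
# while a background "second pass" warms up the matchers.
class TwoPassRouter
  def initialize(routes)
    @matchers = {}
    @mutex = Mutex.new
    @routes = routes
    # pass 1 is done here: routes are registered, no matchers exist yet.
    # pass 2: precompile matchers in the background.
    @warmer = Thread.new { routes.each { |route| matcher_for(route) } }
  end

  # a route hit before the warmer reaches it compiles its matcher lazily;
  # presence in @matchers effectively marks a node as 'has matcher'
  def matcher_for(route)
    @mutex.synchronize do
      # stand-in for Mustermann.new(route, type: :rails)
      @matchers[route] ||= Regexp.new("\\A" + route.gsub(/:\w+/, "[^/]+") + "\\z")
    end
  end

  def wait_for_warmup
    @warmer.join
  end
end

router = TwoPassRouter.new(["/entity/:id", "/some-route/:unique"])
router.matcher_for("/entity/:id").match?("/entity/42")  # works immediately
```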

@dcr8898
Contributor

dcr8898 commented Jan 28, 2025

@kyleplump @cllns

. . . and concurrency has entered the room. 😱 I am out of my depth on that one. Not sure if having the two processes mucking with the same hash will be problematic. How is contention handled in those situations? I would love to learn more about this.

I suggest that we merge Kyle's changes now, since they appear to be correct and solve the current problem, and move further improvements to separate issues. I think this will speed things along and allow for further in-depth conversation and experimentation.

For example, I think there may be issues with caching segments_map that could be avoided by never creating arrays of segments. I would like to explore that further in a separate issue/pr.

@cllns
Member

cllns commented Jan 28, 2025

@kyleplump's first post:

if we were going to allow this configurability, how would you feel about something like this:

* eager loading is off by default in development, on by default in production. allow the user to explicitly opt in or out in either case (a setting?, e.g.: `setting :enable_route_preloading, constructor ...`, or should this be an app config?)

* eager load only the first N routes by default, allowing the ability to override this number

my concern with point 2 is the potential complexity of the configuration. what happens if someone wants to disable preloading generally with a setting, but then also sets :preload_n_routes = 10? would this be adding unnecessary complexity to the router because we'd need to handle and communicate config errors / conflicts like this? should we only allow point 2?

Environment

An important point here is that this project (hanami-router) is a standalone Ruby router, not just a router for the full-stack Hanami framework (hanami/hanami). The idea of development/test/production environments only exists in the full-stack framework. We can offer special configuration for the router when using the full-stack framework but we also would need a way to configure this when using hanami-router by itself. Generally, a router is agnostic to which environment it's deployed in (other than adding normal Ruby conditionals around some protected routes). (Also relevant to @dcr8898's post, which I comment on more below)

Confirm test suites passes if done lazily

Before we concern ourselves too much with how end-users would configure this, can you confirm that we wouldn't have any regressions if we went back to doing it lazily, based on some condition? To test it out, you could switch behavior based on a (temporary! for purposes of validating this only) ENV var, e.g. ENV['LAZILY_CREATE_LEAF_NODES'] and then run the test suite with that as true and false.

If you could do that and show graphs comparing 2.2.proposal2-lazy and 2.2.proposal2-eager, that would be terrific :)

Configuration proposals

I agree the configuration can get complicated if we have multiple ways to do it (since we have to consider overrides or incompatible options). We'd have to be very careful with that, to make sure it's not confusing or incorrect. Ideally we could just have one setting whose value can be evaluated in Ruby (which can check the environment its being evaluated in, if desired).

What do you think of my idea of configuring this as a block within the router, e.g.

```ruby
Hanami::Router.new do
  # Not valid syntax, just a proposal
  eager do
    root to: ->(env) { [200, {}, ["Created at boot"]] }
  end

  lazy do
    get "about", ->(env) { [200, {}, ["Created when first called"]] }
  end
end
```

If we wanted to make it more configurable from there then we could do something like lazy(unless: Hanami.env == :production) or eager(max: 1000). I could see this getting messy in terms of configuration as well, but just want to get your feedback on the general idea.

@kyleplump's second post:

what if we had 'router initialization' be two steps? the first pass is constructing the trie, without any matchers, the second pass is re-going over the trie and creating the matchers at the leaves in a separate process? an example:
...

I appreciate the creative thinking here! I think two processes for a router is too much complexity though.

@dcr8898:

I can't think of any advantage in caching some routes and not others (keeping in mind that my experience is limited). Even at its slowest (no caching) the router is still processing several thousand requests per second. This should be plenty for DEV and TEST.

I explained above, but request frequencies have a long-tail distribution. You can imagine a homepage is going to be hit many times more than the license page of a library in a superadmin-only portal that's accessed for quarterly reports. In fact, that route may never be visited before a new deploy, meaning the time and memory spent during start-up to create the leaf node for it was 'wasted' in a sense.

I feel strongly that adding further complexity here falls in the category of premature optimization. We are discussing significant changes based solely on a benchmarking tool that may not accurately reflect real world conditions. I don't think we should pursue this unless and until there are actual Hanami apps experiencing pain as a result of the router design.

Yes, I hear you about premature optimization and I disagree. I know the benchmarking tool (r10k) isn't fully accurate in terms of how routers are used (each route is hit once, so it's the opposite of a long-tail distribution), but it does show us that our start-up time (and memory usage) increases dramatically when we have a large number of routes. That much is accurate.

One of Hanami's strengths is that it's a framework that helps people build and maintain large apps, so 10,000 routes is something we should support. If our performance only blew up at, say, one million routes, then I wouldn't care so much, but an app with 10,000 routes is reasonable.

As a final suggestion, since all of this pain is caused by the cost of using Mustermann, I think we should examine that decision more closely. We have chosen to use Mustermann in Rails mode. Perhaps a different mode would be more efficient? We don't use all of the features of Rails mode anyway, since they are not all compatible with our trie-based implementation. Perhaps we could create a Hanami mode that meets our specific requirements. Or, perhaps we should brainstorm an entirely different solution apart from Mustermann.

You and @kyleplump are the experts on the implementation here, so I'll leave this to you. It looks like Sinatra mode is pretty similar, based on a glance. We have to be careful with backwards compatibility here though, but if you think another mode is appropriate (or creating a Hanami mode, if warranted) you can explore that. I'd be hesitant to create (and maintain) our own string matching solution since we're standing on the shoulders of giants :)

Also just to be clear, we need @timriley's input and approval on whatever course of action we end up with to solve #278.

@kyleplump @dcr8898 Thanks again for both of your efforts and thoughts on this 🌸

@dcr8898
Contributor

dcr8898 commented Jan 28, 2025

@cllns @kyleplump I want to keep expressing my position, but I am sensitive to the fact that written communication can often sound more contentious than it is intended to sound. So I want you both to know that I am coming from a place of only wanting to keep Hanami as vibrant and healthy as possible. 💜 I also really appreciate the willingness to debate.

With regard to premature optimization, I don't think it's premature because there are no 10K-route Hanami apps yet, but because there are no people or projects of any size that are coming to us to say, "the Hanami router start-up time is a problem for me." Until that happens, the startup time problem remains theoretical.

@cllns With regard to route frequency distribution, I understand your point, but you seem to have drifted into a production-centered viewpoint. I thought we were all of the position that route matchers should be pre-compiled in production? If we aren't, then that is definitely my position. In dev, route frequency is a non-issue, and in test route frequency probably has no relation at all to route frequencies in production. For example, login might be the hottest route in production but only has a few end-to-end tests, whereas that quarterly report route you described might have a hundred tests even though it's only run once every three months in production (you know, to make sure the 👨‍💼s look good 😉 ).

At the end of the day, optimizing router performance for every environment and situation is an interesting problem. It's the kind of thing that gets the mental juices flowing--I mean, Kyle's ideas here are creative AF--but added complexity is added complexity. As Sean pointed out, the solution would have to work for Hanami Router in both stand-alone and framework contexts--an additional dimension of complexity. I don't think we should assume this maintenance burden any sooner than we have to--and hopefully never at all.

I recommend merging this PR and continuing to iterate, talk, and innovate at pace.

@cllns
Member

cllns commented Jan 29, 2025

@dcr8898:

I want to keep expressing my position, but I am sensitive to the fact that written communication can often sound more contentious than it is intended to sound. So I want you both to know that I am coming from a place of only wanting to keep Hanami as vibrant and healthy as possible. 💜 I also really appreciate the willingness to debate.

With regard to premature optimization, I don't think it's premature because there are no 10K-route Hanami apps yet, but because there are no people or projects of any size that are coming to us to say, "the Hanami router start-up time is a problem for me." Until that happens, the startup time problem remains theoretical.

As maintainers, we also have to anticipate potential problems before they are reported, in line with what our priorities are for the project. It's impossible to know the exact metric, but I've heard and generally assume that only somewhere around 10% of people report bugs they come across.

Being able to support 10k routes is table stakes for a framework that supports large codebases IMO. If we merged this as-is, without configurability, I'd hate for someone to read this PR (or discover via running/seeing r10k graphs) and come away with the impression that Hanami can't handle a large number of routes.

Since creating leaf nodes eagerly vs lazily seems to be an intractable trade-off, I think this should be a configurable option: let users choose if they're willing to accept increased start-up time and memory usage in exchange for increased throughput (requests per second). They can do that based on environment if they choose, but we're agnostic to how they make that decision from the perspective of this library (which doesn't know about which environment it's being run in).
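As a sketch of what that trade-off looks like at the leaf level, with a plain `Regexp` standing in for Mustermann (the `Leaf` class and its methods here are illustrative, not hanami-router's actual code):

```ruby
# Hypothetical sketch: a leaf that compiles its matcher either at boot
# (eager) or on first use, memoized (lazy).
class Leaf
  def initialize(pattern, eager:)
    @pattern = pattern
    @matcher = compile if eager # pay the cost at boot...
  end

  def match(path)
    m = matcher.match(path)
    m && m.named_captures
  end

  private

  def matcher
    @matcher ||= compile # ...or on the first hit, then memoize
  end

  def compile
    # Turn ":id"-style segments into named capture groups.
    source = @pattern.gsub(/:(\w+)/) { "(?<#{Regexp.last_match(1)}>[^/]+)" }
    /\A#{source}\z/
  end
end

eager_leaf = Leaf.new("/books/:id", eager: true)  # slower boot, fast first request
lazy_leaf  = Leaf.new("/books/:id", eager: false) # fast boot, slower first request
eager_leaf.match("/books/1") # => {"id" => "1"}
```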

With regard to route frequency distribution, I understand your point, but you seem to have drifted into a production-centered viewpoint.

The performance in production is what's most important. Throughput generally does not matter for development nor testing, since routing is likely a negligible part of the response time. But startup time in development & test does matter, since we promote having a fast test suite that you can run often.

I thought we were all of the position that route matchers should be pre-compiled in production? If we aren't, then that is definitely my position.

I think that's a safe default for the fullstack framework (hanami/hanami) if we can ensure behavior is identical, but leave it as configurable if users don't want that.

Next steps

In the spirit of friendly and candid feedback, I think we have both explained our positions on this well enough :) This discussion is already quite long. We'll see what @kyleplump and @timriley have to say (also curious to hear others' opinions on this if anyone has been lurking this discussion and has an opinion).

No urgency on this, just wanted to summarize next steps since they're scattered above:

  • Confirm our test suite passes via both eager and lazy creation of leaf nodes. The ENV['LAZILY_CREATE_LEAF_NODES'] approach I suggested above may also be good for our CI (with a better name), since we'd have to ensure identical behavior with both methods and that's a simple way to run the whole suite two different ways
  • Generate and share the r10k graphs comparing the two methods
  • Address the Rubocop violations
  • Address my code review comments
  • Discuss how this can be configured (whether on/off for whole router instance, max N routes, lazy/eager blocks, other)
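The dual-mode CI run from the first bullet could be as small as a matrix over the env var (hypothetical fragment assuming GitHub Actions and the suggested variable name):

```yaml
# Hypothetical CI fragment: run the full suite once per leaf-creation mode.
strategy:
  matrix:
    lazy_leaves: ["true", "false"]
steps:
  - run: bundle exec rspec
    env:
      LAZILY_CREATE_LEAF_NODES: ${{ matrix.lazy_leaves }}
```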

@kyleplump
Author

@cllns @dcr8898

here are the updated graphs:
rps
memory
runtime_with_startup

this is all expected, with one surprising twist: mustermann doesn't play any role in startup time (or at least a negligible one). i think it's the existence of the Leaf class at all. in the past (2.1) we just assigned a string to the final node; now we are creating an entirely new object (or potentially a list of objects). i'm not sure how that operation will ever compare to an assignment.
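a rough way to see the difference is a micro-benchmark of the two shapes of work (`LeafNode` below is a stand-in struct, not the router's real class):

```ruby
require "benchmark"

# Rough illustration: v2.1 stored a plain string at the terminal trie node,
# while v2.2 allocates a leaf object per route. LeafNode is a stand-in.
LeafNode = Struct.new(:route, :params)

n = 10_000
Benchmark.bm(8) do |bm|
  bm.report("string") { n.times { |i| "route_#{i}" } }
  bm.report("object") { n.times { |i| LeafNode.new("route_#{i}", {}) } }
end
```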

i don't know what to do about this yet, but will keep cracking at it. just wanted to throw this out there!

@dcr8898
Contributor

dcr8898 commented Jan 30, 2025

This is my last comment on eager vs. lazy. I swear. 🙄

If we are calling our startup time "slow," I would ask, in comparison to what? It's not really appropriate to compare our approach to Roda or Sinatra, since those routers define their routes in code. They have no startup time because they just call the code.

We have chosen not to use this approach in order to make route definition more flexible and developer-friendly. This requires us to translate the route definition DSL into a routing data structure. If we compare Hanami router to Rails, which takes a similar approach, I'd be willing to bet that Hanami router has faster startup time.

This doesn't mean we shouldn't try to improve startup time, but it does imply that eliminating startup time will likely require a completely different approach. I like the current Hanami router approach, and I would like to try to optimize it further to see how far we can go with it.

@dcr8898
Contributor

dcr8898 commented Jan 30, 2025

On a separate note, would you believe that this video was in my watch queue and came up this morning? I play conference talk videos while I do certain mindless tasks. This one held my full attention.

Optimization Techniques Used by Benchmark Winners by Jeremy Evans - RubyKaigi 2019

I mention it here because just about every second is relevant to what we are doing now. I encourage everyone in this conversation to watch it.

And if that nerd snipe worked, then watch Aaron Patterson's RubyConf 2024 keynote where he describes how the Shopify team has changed Ruby itself in order to fix some of the performance issues pointed out in Jeremy's talk (he doesn't actually mention Jeremy or anything like that).

I think these talks go to show that maintainability and performance are orthogonal measures of code, and tradeoffs must be made between them.

Kyle Plump and others added 2 commits February 7, 2025 14:10
@kyleplump kyleplump requested a review from cllns February 7, 2025 19:14