Add 'fire and forget' mode to OSB #964
Signed-off-by: Michael Oviedo <[email protected]>
osbenchmark/benchmark.py
```python
    default=None
)
test_run_parser.add_argument(
    "--fire-and-forget",
```
nit: can we come up with some other name, like --no-await or --sustain etc?
```python
        self.complete.set()
        await self._cleanup()


class UnhingedExecutor:
```
Nit: Same here. Can we change this to AsyncNoAwaitExecutor or something similar, to maintain the naming convention?
```python
        self.logger.info("Client id [%s] is running now.", self.client_id)

    async def _fire_and_forget_request(self, params: dict, expected_scheduled_time: float, total_start: float) -> None:
```
nit: same here for the naming.
LGTM apart from some naming conventions. Can you run a test in timed mode, maybe for 15-30 minutes, and share some more results? It would also be a good idea to test it with ramp-up mode.

Description
Adds a new 'fire and forget' mode to OSB.
Rather than OSB's traditional client model of 'fire request -> await response -> record metrics -> move on', this mode lets each client fire requests without awaiting responses or recording metrics. This allows OSB to easily sustain high throughput when load testing a cluster. It is intended for users who don't need precise measurements for each request but would rather test their clusters against very high sustained throughput levels.
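To illustrate why dropping the await raises the achievable rate, here is a minimal asyncio sketch of the two client models. It is not OSB's executor code: `fake_request`, `closed_loop` and `fire_and_forget` are hypothetical names, and `asyncio.sleep` stands in for a real request with a fixed simulated latency.

```python
import asyncio
import time


async def fake_request(latency_s: float) -> None:
    # Hypothetical stand-in for a real client call: no network I/O, just latency.
    await asyncio.sleep(latency_s)


async def closed_loop(n: int, interval_s: float, latency_s: float) -> float:
    """Traditional model: send, await the response, then move to the next slot."""
    start = time.perf_counter()
    next_send = start
    for _ in range(n):
        delay = next_send - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)  # wait for the scheduled send time
        await fake_request(latency_s)   # client is blocked until the "response" arrives
        next_send += interval_s
    return time.perf_counter() - start


async def fire_and_forget(n: int, interval_s: float, latency_s: float) -> float:
    """Fire-and-forget model: spawn one task per request and move on immediately."""
    pending = set()
    start = time.perf_counter()
    next_send = start
    for _ in range(n):
        delay = next_send - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)
        task = asyncio.create_task(fake_request(latency_s))
        pending.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(pending.discard)
        next_send += interval_s
    dispatch_time = time.perf_counter() - start
    # Drain outstanding tasks only so the demo exits cleanly; nothing is recorded.
    await asyncio.gather(*pending)
    return dispatch_time
```

With 10 requests scheduled every 0.05 s and 0.2 s of simulated latency, the closed loop needs at least about 2 s because every send waits for the previous response, while the fire-and-forget loop finishes dispatching in roughly 0.45 s, independent of latency.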
The drawback, of course, is that OSB returns no information on how the cluster is performing; users have to rely on external polling or measurements, such as the performance charts that can be viewed in the AWS console for their cluster.
The PR introduces a new flag, `--fire-and-forget`, which tells OSB to choose the DeterministicScheduler. Unlike the UnitAwareScheduler used for most benchmarks, this scheduler doesn't care about request metadata; it only calculates the throughput for each client and tells each client to send requests at a fixed rate.

The flag also tells OSB to use a new request executor, called the UnhingedExecutor. Unlike the AsyncExecutor, this executor creates separate asynchronous tasks that send requests without awaiting them, ensuring requests are sent at the specified rate regardless of latency or failure rates. To sum it up, if each executor needs to send 2 requests per second, this mode tells it to create 2 async tasks per second, each of which sends one request, rather than trying to send 1 request every 0.5 seconds on its own.
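The scheduling arithmetic can be sketched as follows. This assumes the task's target throughput is divided evenly among its clients and that each client sends at a fixed interval of `1 / rate`; `per_client_rate` and `deterministic_offsets` are hypothetical helpers for illustration, not the API of OSB's DeterministicScheduler.

```python
from typing import List


def per_client_rate(total_target_throughput: float, clients: int) -> float:
    """Assumption: the target throughput is split evenly across all clients."""
    if clients <= 0 or total_target_throughput <= 0:
        raise ValueError("clients and throughput must be positive")
    return total_target_throughput / clients


def deterministic_offsets(rate_per_client: float, duration_s: float) -> List[float]:
    """Send times (seconds from start) at a fixed interval of 1 / rate.

    Each offset is computed as i / rate rather than by repeated addition,
    so floating-point error does not accumulate over long runs.
    """
    offsets = []
    i = 0
    while i / rate_per_client < duration_s:
        offsets.append(i / rate_per_client)
        i += 1
    return offsets


rate = per_client_rate(500, 250)        # hypothetical: 500 TPS over 250 clients
print(rate)                             # 2.0 requests per second per client
print(deterministic_offsets(rate, 2.0)) # [0.0, 0.5, 1.0, 1.5]
```

In fire-and-forget mode, each offset would become one newly created task, whether or not earlier requests have returned; an awaiting client can instead fall behind these offsets whenever latency exceeds the interval.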
This mode can consume resources very quickly, so users should make sure their hardware can handle the thousands of async tasks created when using this flag. They are also likely to run into an `OSError` ("too many open files") with all of the connections being established. The maximum number of open files, which includes network sockets, can be checked with `ulimit -n` and changed with the same command, e.g. `ulimit -n 2048`.
Issues Resolved
#958
Testing
New unit tests, plus test runs in 'fire and forget' mode against an OpenSearch cluster at 500 TPS:
Unit tests produce this warning since we don't await the async tasks:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.