Tom0Brien commented Apr 20, 2025

This PR adds a script that benchmarks both the runtime and the "control" performance (the cost achieved on the task) of the algorithms. It should make it easier to compare algorithms and to measure how code changes affect runtime performance.

Usage

Run the benchmark on a specific task:

python hydrax/benchmark/run_benchmark.py --task Pendulum

TODO

  • Add a more general configuration setup so that each task can have its own config for both running examples and benchmarking
  • Clean up and simplify the code so that it is easy to maintain
  • Add tests

Future plans

  • Add a script for automatic hyperparameter tuning built on the benchmarking script (an algorithm from Evosax could probably be used, or Optuna would also work well).
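
As a rough illustration of what such a tuning loop might look like, here is a minimal sketch that uses plain random search as a stand-in for Evosax or Optuna. The `evaluate` function and the parameter names `noise_level` and `temperature` are hypothetical placeholders, not part of this PR. In practice `evaluate` would call the benchmarking script and return the average cost.

```python
import random


def evaluate(noise_level, temperature):
    """Hypothetical stand-in for one benchmark run; returns an average cost."""
    # A toy quadratic bowl, used only so the sketch runs on its own.
    return (noise_level - 0.3) ** 2 + (temperature - 0.1) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters uniformly and keep the lowest-cost set."""
    rng = random.Random(seed)
    best_params, best_cost = None, float("inf")
    for _ in range(n_trials):
        # Assumed search ranges, chosen arbitrarily for illustration.
        params = {
            "noise_level": rng.uniform(0.0, 1.0),
            "temperature": rng.uniform(0.01, 1.0),
        }
        cost = evaluate(**params)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


if __name__ == "__main__":
    params, cost = random_search(n_trials=50)
    print(f"best params: {params}, cost: {cost:.6f}")
```

A dedicated optimizer would replace the sampling step inside the loop, while the objective (one benchmark run returning a scalar cost) could stay the same.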

Example results

[Figures: Pendulum_comparison, Pendulum_cost_over_time, Pendulum_cost_vs_performance]

Benchmarking 5 controllers on task: Pendulum
Benchmarking PredictiveSampling on Pendulum...
Benchmarking MPPI on Pendulum...
Benchmarking CEM on Pendulum...
Benchmarking CMA_ES on Pendulum...
Benchmarking Sep_CMA_ES on Pendulum...
Benchmark complete in 66.43 seconds! Results saved to /home/tom/Projects/hydrax/hydrax/benchmark/results

Performance Summary:
Controller           Avg Cost        Final Cost      Avg Plan Time (s)    Realtime Rate  
------------------------------------------------------------------------------------------
PredictiveSampling   1.097615        0.000011        0.014874             0.95           
CEM                  1.974123        0.000000        0.012524             1.20           
Sep_CMA_ES           2.057362        0.000000        0.025399             0.68           
CMA_ES               3.726031        3.682149        0.032584             0.54           
MPPI                 3.818880        4.001694        0.012006             1.25
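
For readers who want to reproduce a summary like the one above, here is a minimal sketch of how the four columns might be computed from per-step records. The `StepRecord` structure and the `summarize` helper are hypothetical and not taken from this PR. In particular, the realtime rate is assumed here to be simulated time divided by wall-clock planning time, and the script may define it differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StepRecord:
    """One control step of a benchmark rollout (hypothetical structure)."""
    cost: float       # running cost observed at this step
    plan_time: float  # wall-clock seconds spent planning at this step
    sim_dt: float     # simulated seconds advanced by this step


def summarize(records):
    """Reduce a list of StepRecord entries to the summary columns."""
    total_plan_time = sum(r.plan_time for r in records)
    total_sim_time = sum(r.sim_dt for r in records)
    return {
        "avg_cost": mean(r.cost for r in records),
        "final_cost": records[-1].cost,
        "avg_plan_time": total_plan_time / len(records),
        # Assumed definition: values above 1 mean planning keeps up
        # with the simulated clock.
        "realtime_rate": total_sim_time / total_plan_time,
    }


if __name__ == "__main__":
    demo = [StepRecord(2.0, 0.25, 0.5), StepRecord(1.0, 0.75, 0.5)]
    print(summarize(demo))
```

Keeping the per-step records and reducing them at the end makes it straightforward to add further columns later, such as a worst-case plan time.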

Tom0Brien mentioned this pull request May 29, 2025