Conversation

@rly
Contributor

@rly rly commented Sep 9, 2025

Fix #149, fix #148.

Params are now just a tuple of dictionaries. Each benchmark takes one parameter dictionary. File read benchmarks have keys "name", "https_url". Data slicing benchmarks have keys "name", "https_url", "object_name", and "slice_range". I could create dataclasses instead of using dicts, but this keeps things pretty flexible while we iterate. Let me know if you think that would be worth changing. This change adds a couple more boilerplate lines in each benchmark function to unpack the dictionary, but I think that is OK.

I added a "name" key because it helps with display and logging; it is not used anywhere else.
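To make the new scheme concrete, here is a minimal sketch of what a benchmark using the tuple-of-dicts parameterization might look like. The key names ("name", "https_url") come from this description; the function name and URL are illustrative, not the actual benchmark code.

```python
# Params are a tuple of dictionaries; each benchmark run receives one dict.
params = (
    {
        "name": "EcephysTestCase",
        "https_url": "https://dandiarchive.s3.amazonaws.com/blobs/30c/6cc/"
        "30c6cc7b-4d17-4237-9786-66623a6c65eb",
    },
)


def track_network_read_example(case: dict):
    # The couple of boilerplate lines mentioned above: unpack the dict.
    name = case["name"]
    https_url = case["https_url"]
    # ... open and read the file at https_url here ...
    return name, https_url


name, url = track_network_read_example(params[0])
```

Data slicing benchmarks would unpack "object_name" and "slice_range" the same way.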

BaseBenchmark now just holds common settings for benchmarks, setting:

    rounds = 1
    repeat = 1
    warmup_time = 0.0

I updated the docs accordingly.

Note: The "results" key in the intermediate results json file from asv now looks like this:

"results": {
  "network_tracking_remote_file_reading.HDF5H5pyFileReadBenchmark.track_network_read_hdf5_h5py_fsspec_https_no_cache": [
    [
      true,
      true,
      true
    ],
    [
      [
        "{'name': 'EcephysTestCase', 'https_url': 'https://dandiarchive.s3.amazonaws.com/blobs/30c/6cc/30c6cc7b-4d17-4237-9786-66623a6c65eb'}",
        "{'name': 'OphysTestCase', 'https_url': 'https://dandiarchive.s3.amazonaws.com/blobs/38c/c24/38cc240b-77c5-418a-9040-a7f582ff6541'}",
        "{'name': 'IcephysTestCase', 'https_url': 'https://dandiarchive.s3.amazonaws.com/blobs/c98/3a4/c983a4e1-097a-402c-bda8-e6a41cb7e24a'}"
      ]
    ],
    "594b2cc8d025210bda2b732969917c4a169cac0d9eba056eedb9452627346a38",
    1757392268936,
    32.685,
    null,
    null,
    null,
    null,
    null,
    null,
    [
      {
        "total_transfer_in_number_of_packets": 7869,
        "total_traffic_in_number_of_web_packets": 620,
        "amount_downloaded_in_number_of_packets": 7249,
        "amount_uploaded_in_number_of_packets": 620,
        "total_transfer_in_bytes": 10744542,
        "amount_downloaded_in_bytes": 10698584,
        "amount_uploaded_in_bytes": 45958,
        "total_transfer_time_in_seconds": 1.8256669999998583,
        "network_total_time_in_seconds": 2.2093868255615234
      },
      {
        "total_transfer_in_number_of_packets": 2916,
        "total_traffic_in_number_of_web_packets": 105,
        "amount_downloaded_in_number_of_packets": 2811,
        "amount_uploaded_in_number_of_packets": 105,
        "total_transfer_in_bytes": 4163646,
        "amount_downloaded_in_bytes": 4155891,
        "amount_uploaded_in_bytes": 7755,
        "total_transfer_time_in_seconds": 0.9806429999999909,
        "network_total_time_in_seconds": 1.3765041828155518
      },
      {
        "total_transfer_in_number_of_packets": 3840,
        "total_traffic_in_number_of_web_packets": 370,
        "amount_downloaded_in_number_of_packets": 3470,
        "amount_uploaded_in_number_of_packets": 370,
        "total_transfer_in_bytes": 5166375,
        "amount_downloaded_in_bytes": 5135446,
        "amount_uploaded_in_bytes": 30929,
        "total_transfer_time_in_seconds": 0.9876550000000466,
        "network_total_time_in_seconds": 1.157383918762207
      }
    ]
  ]
},

I adjusted _reduce_results.py to account for that -- results[1] is already a serialization of the single parameter dict. No flattening is required.

@rly
Contributor Author

rly commented Sep 9, 2025

@CodyCBakerPhD I don't think the database or flask app need to change because the parameter case is still a string, but since the benchmark results/parameters are in a new format, does the database version need to be bumped?

@rly rly requested a review from CodyCBakerPhD September 9, 2025 06:06
@CodyCBakerPhD
Collaborator

does the database version need to be bumped?

Definitely

@CodyCBakerPhD
Collaborator

@rly Testing it out now (literally on my own device and mechanistically by adding actual tests to the CI lol)

@CodyCBakerPhD
Collaborator

@rly Ran as expected, but the results file does not get handled:

E:\GitHub\nwb_benchmarks\src\nwb_benchmarks\setup\_reduce_results.py:62: UserWarning: In intermediate results for test case time_remote_slicing.HDF5PyNWBRemfilePreloadedNoCacheContinuousSliceBenchmark.time_slice:
        Length mismatch between parameters (1) and result samples (15)!
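For context, a hypothetical reconstruction of the kind of consistency check that would emit this warning: compare the number of parameter cases against the number of result samples for a test case. The function name and exact message format are illustrative, not the actual _reduce_results.py code.

```python
import warnings


def check_case_lengths(test_case_name, parameter_cases, result_samples):
    # Warn when the serialized parameter list and the result samples
    # disagree in length, instead of silently misaligning them.
    if len(parameter_cases) != len(result_samples):
        warnings.warn(
            f"In intermediate results for test case {test_case_name}:\n"
            f"\tLength mismatch between parameters ({len(parameter_cases)}) "
            f"and result samples ({len(result_samples)})!"
        )
```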

Removed unnecessary git fetch command from workflow.
@rly
Contributor Author

rly commented Sep 9, 2025

Huh. I thought I tested reduce_results before pushing, but maybe not. I think results[1] should be results[1][0]. I'll test locally and update.
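A minimal illustration of the off-by-one-level indexing at issue, assuming the asv "results" entry shape shown earlier in this thread: index 1 holds a list whose first element is the list of serialized parameter dicts, so one extra `[0]` is needed. The entry below is a trimmed stand-in, not real benchmark output.

```python
# Trimmed stand-in for one entry under the asv "results" key.
results_entry = [
    [True, True, True],  # index 0: per-case success flags
    [  # index 1: a one-element list wrapping the parameter strings
        [
            "{'name': 'EcephysTestCase', 'https_url': 'https://example'}",
            "{'name': 'OphysTestCase', 'https_url': 'https://example'}",
        ]
    ],
]

# results_entry[1] is the wrapper list; the parameter strings are one
# level deeper, hence results_entry[1][0] rather than results_entry[1].
parameter_cases = results_entry[1][0]
```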

@CodyCBakerPhD
Collaborator

FYI I ran nwb_benchmarks run --bench time_remote_slicing.HDF5PyNWBRemfilePreloadedNoCacheContinuousSliceBenchmark.time_slice for testing

@CodyCBakerPhD
Collaborator

@rly CI is up and running, and reproduces the issue I see (even in the file_reading benchmark)

@rly
Contributor Author

rly commented Sep 9, 2025

Tests pass. This all looks good to me. I have not done a full run through the benchmarks on this branch but will do so after merging.

@CodyCBakerPhD
Collaborator

Works for me!

@CodyCBakerPhD CodyCBakerPhD merged commit a4f20be into main Sep 9, 2025
3 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the refactor_params branch September 9, 2025 19:29
Development

Successfully merging this pull request may close these issues.

Continuous slicing benchmarks run many more iterations than they should
asv warmup phase can be expensive