[Feature] Collector Profiling #3324

Merged: vmoens merged 4 commits into gh/vmoens/192/base from gh/vmoens/192/head on Jan 13, 2026
vmoens (Collaborator) commented Jan 12, 2026

[ghstack-poisoned]
pytorch-bot (Bot) commented Jan 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3324

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 4 Unrelated Failures

As of commit 9b7db7b with merge base 9d34dbe:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions (Bot, Contributor) commented Jan 12, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 164. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}10$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_tensor_to_bytestream_speed[pickle] | 85.8015μs | 85.3738μs | 11.7132 KOps/s | 12.4718 KOps/s | $\textbf{\color{#d91a1a}-6.08\%}$ |
| test_tensor_to_bytestream_speed[torch.save] | 0.1396ms | 0.1390ms | 7.1947 KOps/s | 7.2032 KOps/s | $\color{#d91a1a}-0.12\%$ |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1185s | 0.1178s | 8.4901 Ops/s | 8.2432 Ops/s | $\color{#35bf28}+2.99\%$ |
| test_tensor_to_bytestream_speed[numpy] | 2.6853μs | 2.6607μs | 375.8405 KOps/s | 369.5669 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_tensor_to_bytestream_speed[safetensors] | 40.3278μs | 40.1785μs | 24.8889 KOps/s | 24.3816 KOps/s | $\color{#35bf28}+2.08\%$ |
| test_simple | 0.5394s | 0.5386s | 1.8568 Ops/s | 1.7663 Ops/s | $\textbf{\color{#35bf28}+5.12\%}$ |
| test_transformed | 1.1079s | 1.1016s | 0.9077 Ops/s | 0.8821 Ops/s | $\color{#35bf28}+2.90\%$ |
| test_serial | 1.6431s | 1.6394s | 0.6100 Ops/s | 0.5942 Ops/s | $\color{#35bf28}+2.66\%$ |
| test_parallel | 1.2207s | 1.1236s | 0.8900 Ops/s | 0.9085 Ops/s | $\color{#d91a1a}-2.04\%$ |
| test_step_mdp_speed[True-True-True-True-True] | 0.3225ms | 44.7526μs | 22.3451 KOps/s | 22.1090 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_step_mdp_speed[True-True-True-True-False] | 57.1000μs | 24.7872μs | 40.3433 KOps/s | 38.8693 KOps/s | $\color{#35bf28}+3.79\%$ |
| test_step_mdp_speed[True-True-True-False-True] | 63.6410μs | 24.8371μs | 40.2624 KOps/s | 39.7370 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_step_mdp_speed[True-True-True-False-False] | 41.4210μs | 13.4956μs | 74.0983 KOps/s | 71.4730 KOps/s | $\color{#35bf28}+3.67\%$ |
| test_step_mdp_speed[True-True-False-True-True] | 82.6110μs | 46.9208μs | 21.3125 KOps/s | 20.6143 KOps/s | $\color{#35bf28}+3.39\%$ |
| test_step_mdp_speed[True-True-False-True-False] | 59.2110μs | 27.0531μs | 36.9643 KOps/s | 35.1964 KOps/s | $\textbf{\color{#35bf28}+5.02\%}$ |
| test_step_mdp_speed[True-True-False-False-True] | 59.7910μs | 27.5231μs | 36.3332 KOps/s | 35.7223 KOps/s | $\color{#35bf28}+1.71\%$ |
| test_step_mdp_speed[True-True-False-False-False] | 47.4110μs | 16.2418μs | 61.5694 KOps/s | 59.7729 KOps/s | $\color{#35bf28}+3.01\%$ |
| test_step_mdp_speed[True-False-True-True-True] | 98.7910μs | 50.3530μs | 19.8598 KOps/s | 19.7136 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_step_mdp_speed[True-False-True-True-False] | 79.1110μs | 30.1145μs | 33.2066 KOps/s | 32.3974 KOps/s | $\color{#35bf28}+2.50\%$ |
| test_step_mdp_speed[True-False-True-False-True] | 71.9210μs | 27.5465μs | 36.3022 KOps/s | 36.1006 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_step_mdp_speed[True-False-True-False-False] | 47.9000μs | 16.2292μs | 61.6172 KOps/s | 59.5511 KOps/s | $\color{#35bf28}+3.47\%$ |
| test_step_mdp_speed[True-False-False-True-True] | 86.2310μs | 52.4007μs | 19.0837 KOps/s | 18.7303 KOps/s | $\color{#35bf28}+1.89\%$ |
| test_step_mdp_speed[True-False-False-True-False] | 67.2310μs | 32.8058μs | 30.4824 KOps/s | 29.8864 KOps/s | $\color{#35bf28}+1.99\%$ |
| test_step_mdp_speed[True-False-False-False-True] | 64.9410μs | 29.9318μs | 33.4092 KOps/s | 32.9261 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_step_mdp_speed[True-False-False-False-False] | 47.0010μs | 18.9726μs | 52.7076 KOps/s | 51.1067 KOps/s | $\color{#35bf28}+3.13\%$ |
| test_step_mdp_speed[False-True-True-True-True] | 90.5220μs | 50.3847μs | 19.8473 KOps/s | 19.5266 KOps/s | $\color{#35bf28}+1.64\%$ |
| test_step_mdp_speed[False-True-True-True-False] | 67.0410μs | 30.2002μs | 33.1124 KOps/s | 32.3397 KOps/s | $\color{#35bf28}+2.39\%$ |
| test_step_mdp_speed[False-True-True-False-True] | 2.4233ms | 31.7143μs | 31.5316 KOps/s | 31.1709 KOps/s | $\color{#35bf28}+1.16\%$ |
| test_step_mdp_speed[False-True-True-False-False] | 49.7710μs | 18.1360μs | 55.1391 KOps/s | 53.7846 KOps/s | $\color{#35bf28}+2.52\%$ |
| test_step_mdp_speed[False-True-False-True-True] | 0.1192ms | 52.4959μs | 19.0491 KOps/s | 18.5648 KOps/s | $\color{#35bf28}+2.61\%$ |
| test_step_mdp_speed[False-True-False-True-False] | 68.6710μs | 32.7473μs | 30.5368 KOps/s | 29.3468 KOps/s | $\color{#35bf28}+4.06\%$ |
| test_step_mdp_speed[False-True-False-False-True] | 71.7010μs | 33.7230μs | 29.6533 KOps/s | 28.7632 KOps/s | $\color{#35bf28}+3.09\%$ |
| test_step_mdp_speed[False-True-False-False-False] | 49.0310μs | 20.5676μs | 48.6202 KOps/s | 46.9061 KOps/s | $\color{#35bf28}+3.65\%$ |
| test_step_mdp_speed[False-False-True-True-True] | 98.9310μs | 55.2891μs | 18.0867 KOps/s | 17.6247 KOps/s | $\color{#35bf28}+2.62\%$ |
| test_step_mdp_speed[False-False-True-True-False] | 69.4310μs | 36.1942μs | 27.6287 KOps/s | 27.2096 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_step_mdp_speed[False-False-True-False-True] | 82.3910μs | 34.2123μs | 29.2293 KOps/s | 28.7969 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_step_mdp_speed[False-False-True-False-False] | 53.1410μs | 20.6937μs | 48.3238 KOps/s | 46.6809 KOps/s | $\color{#35bf28}+3.52\%$ |
| test_step_mdp_speed[False-False-False-True-True] | 90.0420μs | 57.3721μs | 17.4301 KOps/s | 16.7734 KOps/s | $\color{#35bf28}+3.91\%$ |
| test_step_mdp_speed[False-False-False-True-False] | 76.5210μs | 38.3608μs | 26.0683 KOps/s | 25.3428 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_step_mdp_speed[False-False-False-False-True] | 85.6610μs | 35.7213μs | 27.9945 KOps/s | 26.8623 KOps/s | $\color{#35bf28}+4.21\%$ |
| test_step_mdp_speed[False-False-False-False-False] | 56.0210μs | 23.2490μs | 43.0125 KOps/s | 41.9220 KOps/s | $\color{#35bf28}+2.60\%$ |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8631s | 0.7572s | 1.3206 Ops/s | 1.3207 Ops/s | $-0.01\%$ |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7226s | 0.6199s | 1.6132 Ops/s | 1.5940 Ops/s | $\color{#35bf28}+1.20\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7354s | 1.6533s | 0.6048 Ops/s | 0.6028 Ops/s | $\color{#35bf28}+0.33\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5243s | 1.4475s | 0.6909 Ops/s | 0.6908 Ops/s | $\color{#35bf28}+0.01\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9782s | 1.8999s | 0.5263 Ops/s | 0.5229 Ops/s | $\color{#35bf28}+0.66\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7746s | 1.6881s | 0.5924 Ops/s | 0.5938 Ops/s | $\color{#d91a1a}-0.25\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7829s | 4.6876s | 0.2133 Ops/s | 0.2115 Ops/s | $\color{#35bf28}+0.88\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5968s | 4.4642s | 0.2240 Ops/s | 0.2234 Ops/s | $\color{#35bf28}+0.27\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.1056s | 2.0367s | 0.4910 Ops/s | 0.5122 Ops/s | $\color{#d91a1a}-4.14\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7999s | 1.6827s | 0.5943 Ops/s | 0.6041 Ops/s | $\color{#d91a1a}-1.63\%$ |
| test_values[generalized_advantage_estimate-True-True] | 10.4172ms | 9.9815ms | 100.1853 Ops/s | 96.9328 Ops/s | $\color{#35bf28}+3.36\%$ |
| test_values[vec_generalized_advantage_estimate-True-True] | 15.2978ms | 11.1162ms | 89.9586 Ops/s | 89.1720 Ops/s | $\color{#35bf28}+0.88\%$ |
| test_values[td0_return_estimate-False-False] | 0.2504ms | 0.1311ms | 7.6251 KOps/s | 7.6151 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_values[td1_return_estimate-False-False] | 27.5384ms | 27.0761ms | 36.9330 Ops/s | 35.3013 Ops/s | $\color{#35bf28}+4.62\%$ |
| test_values[vec_td1_return_estimate-False-False] | 11.4765ms | 11.1382ms | 89.7815 Ops/s | 89.1144 Ops/s | $\color{#35bf28}+0.75\%$ |
| test_values[td_lambda_return_estimate-True-False] | 41.0481ms | 40.3234ms | 24.7995 Ops/s | 23.9287 Ops/s | $\color{#35bf28}+3.64\%$ |
| test_values[vec_td_lambda_return_estimate-True-False] | 12.1896ms | 11.1333ms | 89.8210 Ops/s | 90.0584 Ops/s | $\color{#d91a1a}-0.26\%$ |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.9015ms | 8.7984ms | 113.6568 Ops/s | 110.3098 Ops/s | $\color{#35bf28}+3.03\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7421ms | 1.4534ms | 688.0590 Ops/s | 621.2887 Ops/s | $\textbf{\color{#35bf28}+10.75\%}$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4661ms | 0.4120ms | 2.4272 KOps/s | 2.3494 KOps/s | $\color{#35bf28}+3.31\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 19.3431ms | 18.5190ms | 53.9986 Ops/s | 33.6367 Ops/s | $\textbf{\color{#35bf28}+60.53\%}$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.9166ms | 1.7220ms | 580.7095 Ops/s | 587.2251 Ops/s | $\color{#d91a1a}-1.11\%$ |
| test_dqn_speed[False-None] | 1.5741ms | 1.3873ms | 720.8089 Ops/s | 712.6010 Ops/s | $\color{#35bf28}+1.15\%$ |
| test_dqn_speed[False-backward] | 1.9468ms | 1.8942ms | 527.9251 Ops/s | 522.2657 Ops/s | $\color{#35bf28}+1.08\%$ |
| test_dqn_speed[True-None] | 0.6374ms | 0.5334ms | 1.8746 KOps/s | 1.7906 KOps/s | $\color{#35bf28}+4.69\%$ |
| test_dqn_speed[True-backward] | 1.3263ms | 0.9957ms | 1.0043 KOps/s | 1.0184 KOps/s | $\color{#d91a1a}-1.38\%$ |
| test_dqn_speed[reduce-overhead-None] | 0.6654ms | 0.5338ms | 1.8735 KOps/s | 1.8826 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_dqn_speed[reduce-overhead-backward] | 1.0115ms | 0.9568ms | 1.0451 KOps/s | 872.9981 Ops/s | $\textbf{\color{#35bf28}+19.72\%}$ |
| test_ddpg_speed[False-None] | 3.1516ms | 2.8185ms | 354.7964 Ops/s | 344.8763 Ops/s | $\color{#35bf28}+2.88\%$ |
| test_ddpg_speed[False-backward] | 4.1440ms | 4.0268ms | 248.3380 Ops/s | 241.7169 Ops/s | $\color{#35bf28}+2.74\%$ |
| test_ddpg_speed[True-None] | 1.4668ms | 1.3724ms | 728.6349 Ops/s | 724.5624 Ops/s | $\color{#35bf28}+0.56\%$ |
| test_ddpg_speed[True-backward] | 2.4632ms | 2.3402ms | 427.3061 Ops/s | 422.0276 Ops/s | $\color{#35bf28}+1.25\%$ |
| test_ddpg_speed[reduce-overhead-None] | 1.4793ms | 1.3742ms | 727.7180 Ops/s | 721.6857 Ops/s | $\color{#35bf28}+0.84\%$ |
| test_ddpg_speed[reduce-overhead-backward] | 2.3649ms | 2.3259ms | 429.9338 Ops/s | 427.8062 Ops/s | $\color{#35bf28}+0.50\%$ |
| test_sac_speed[False-None] | 8.4662ms | 7.9077ms | 126.4588 Ops/s | 124.1445 Ops/s | $\color{#35bf28}+1.86\%$ |
| test_sac_speed[False-backward] | 11.6645ms | 11.1506ms | 89.6809 Ops/s | 88.1091 Ops/s | $\color{#35bf28}+1.78\%$ |
| test_sac_speed[True-None] | 2.3058ms | 2.1402ms | 467.2355 Ops/s | 461.2622 Ops/s | $\color{#35bf28}+1.29\%$ |
| test_sac_speed[True-backward] | 6.2274ms | 4.3512ms | 229.8219 Ops/s | 230.4812 Ops/s | $\color{#d91a1a}-0.29\%$ |
| test_sac_speed[reduce-overhead-None] | 2.3171ms | 2.1376ms | 467.8233 Ops/s | 440.4409 Ops/s | $\textbf{\color{#35bf28}+6.22\%}$ |
| test_sac_speed[reduce-overhead-backward] | 4.1284ms | 4.0281ms | 248.2590 Ops/s | 239.2787 Ops/s | $\color{#35bf28}+3.75\%$ |
| test_redq_speed[False-None] | 10.8769ms | 10.3835ms | 96.3065 Ops/s | 66.3710 Ops/s | $\textbf{\color{#35bf28}+45.10\%}$ |
| test_redq_speed[False-backward] | 18.6169ms | 18.0109ms | 55.5220 Ops/s | 55.3167 Ops/s | $\color{#35bf28}+0.37\%$ |
| test_redq_speed[True-None] | 4.8031ms | 4.3604ms | 229.3359 Ops/s | 224.4598 Ops/s | $\color{#35bf28}+2.17\%$ |
| test_redq_speed[True-backward] | 9.8685ms | 9.5347ms | 104.8795 Ops/s | 100.7929 Ops/s | $\color{#35bf28}+4.05\%$ |
| test_redq_speed[reduce-overhead-None] | 4.6997ms | 4.3822ms | 228.1933 Ops/s | 229.9822 Ops/s | $\color{#d91a1a}-0.78\%$ |
| test_redq_speed[reduce-overhead-backward] | 10.0886ms | 9.7251ms | 102.8265 Ops/s | 102.4035 Ops/s | $\color{#35bf28}+0.41\%$ |
| test_redq_deprec_speed[False-None] | 11.2974ms | 10.7620ms | 92.9198 Ops/s | 89.4563 Ops/s | $\color{#35bf28}+3.87\%$ |
| test_redq_deprec_speed[False-backward] | 16.0014ms | 15.4660ms | 64.6581 Ops/s | 62.1568 Ops/s | $\color{#35bf28}+4.02\%$ |
| test_redq_deprec_speed[True-None] | 3.7330ms | 3.4783ms | 287.4973 Ops/s | 273.4529 Ops/s | $\textbf{\color{#35bf28}+5.14\%}$ |
| test_redq_deprec_speed[True-backward] | 7.6200ms | 7.1606ms | 139.6522 Ops/s | 129.3399 Ops/s | $\textbf{\color{#35bf28}+7.97\%}$ |
| test_redq_deprec_speed[reduce-overhead-None] | 3.8534ms | 3.4341ms | 291.1986 Ops/s | 270.0923 Ops/s | $\textbf{\color{#35bf28}+7.81\%}$ |
| test_redq_deprec_speed[reduce-overhead-backward] | 7.3509ms | 7.1176ms | 140.4969 Ops/s | 130.5623 Ops/s | $\textbf{\color{#35bf28}+7.61\%}$ |
| test_td3_speed[False-None] | 8.1877ms | 7.9907ms | 125.1461 Ops/s | 123.4213 Ops/s | $\color{#35bf28}+1.40\%$ |
| test_td3_speed[False-backward] | 11.2914ms | 10.8657ms | 92.0329 Ops/s | 91.7397 Ops/s | $\color{#35bf28}+0.32\%$ |
| test_td3_speed[True-None] | 1.8715ms | 1.8161ms | 550.6209 Ops/s | 546.6978 Ops/s | $\color{#35bf28}+0.72\%$ |
| test_td3_speed[True-backward] | 3.7845ms | 3.6392ms | 274.7870 Ops/s | 273.4079 Ops/s | $\color{#35bf28}+0.50\%$ |
| test_td3_speed[reduce-overhead-None] | 1.8578ms | 1.8007ms | 555.3401 Ops/s | 545.4125 Ops/s | $\color{#35bf28}+1.82\%$ |
| test_td3_speed[reduce-overhead-backward] | 3.7811ms | 3.6647ms | 272.8705 Ops/s | 226.0663 Ops/s | $\textbf{\color{#35bf28}+20.70\%}$ |
| test_cql_speed[False-None] | 29.7526ms | 26.1193ms | 38.2858 Ops/s | 37.4432 Ops/s | $\color{#35bf28}+2.25\%$ |
| test_cql_speed[False-backward] | 39.0140ms | 35.4994ms | 28.1695 Ops/s | 27.8134 Ops/s | $\color{#35bf28}+1.28\%$ |
| test_cql_speed[True-None] | 12.6725ms | 12.2161ms | 81.8589 Ops/s | 79.3043 Ops/s | $\color{#35bf28}+3.22\%$ |
| test_cql_speed[True-backward] | 17.8596ms | 17.3799ms | 57.5378 Ops/s | 55.3570 Ops/s | $\color{#35bf28}+3.94\%$ |
| test_cql_speed[reduce-overhead-None] | 12.5676ms | 12.2375ms | 81.7159 Ops/s | 77.8623 Ops/s | $\color{#35bf28}+4.95\%$ |
| test_cql_speed[reduce-overhead-backward] | 18.0911ms | 17.6337ms | 56.7097 Ops/s | 54.5869 Ops/s | $\color{#35bf28}+3.89\%$ |
| test_a2c_speed[False-None] | 5.6370ms | 5.3862ms | 185.6582 Ops/s | 180.3173 Ops/s | $\color{#35bf28}+2.96\%$ |
| test_a2c_speed[False-backward] | 12.2106ms | 11.7687ms | 84.9708 Ops/s | 85.1824 Ops/s | $\color{#d91a1a}-0.25\%$ |
| test_a2c_speed[True-None] | 3.8762ms | 3.6870ms | 271.2217 Ops/s | 265.9275 Ops/s | $\color{#35bf28}+1.99\%$ |
| test_a2c_speed[True-backward] | 9.0710ms | 8.5627ms | 116.7860 Ops/s | 117.0281 Ops/s | $\color{#d91a1a}-0.21\%$ |
| test_a2c_speed[reduce-overhead-None] | 3.8553ms | 3.7016ms | 270.1503 Ops/s | 267.1898 Ops/s | $\color{#35bf28}+1.11\%$ |
| test_a2c_speed[reduce-overhead-backward] | 9.5029ms | 8.8208ms | 113.3678 Ops/s | 115.4669 Ops/s | $\color{#d91a1a}-1.82\%$ |
| test_ppo_speed[False-None] | 6.2761ms | 5.8510ms | 170.9119 Ops/s | 165.7791 Ops/s | $\color{#35bf28}+3.10\%$ |
| test_ppo_speed[False-backward] | 12.8645ms | 12.3480ms | 80.9845 Ops/s | 78.9605 Ops/s | $\color{#35bf28}+2.56\%$ |
| test_ppo_speed[True-None] | 3.9824ms | 3.6082ms | 277.1452 Ops/s | 272.7688 Ops/s | $\color{#35bf28}+1.60\%$ |
| test_ppo_speed[True-backward] | 8.5943ms | 8.3070ms | 120.3799 Ops/s | 115.0123 Ops/s | $\color{#35bf28}+4.67\%$ |
| test_ppo_speed[reduce-overhead-None] | 3.8954ms | 3.6161ms | 276.5392 Ops/s | 277.3318 Ops/s | $\color{#d91a1a}-0.29\%$ |
| test_ppo_speed[reduce-overhead-backward] | 9.1808ms | 8.7676ms | 114.0562 Ops/s | 113.3118 Ops/s | $\color{#35bf28}+0.66\%$ |
| test_reinforce_speed[False-None] | 4.8249ms | 4.5988ms | 217.4489 Ops/s | 215.0287 Ops/s | $\color{#35bf28}+1.13\%$ |
| test_reinforce_speed[False-backward] | 7.6368ms | 7.2941ms | 137.0963 Ops/s | 134.8059 Ops/s | $\color{#35bf28}+1.70\%$ |
| test_reinforce_speed[True-None] | 3.2839ms | 2.8632ms | 349.2620 Ops/s | 332.6660 Ops/s | $\color{#35bf28}+4.99\%$ |
| test_reinforce_speed[True-backward] | 8.0302ms | 7.6635ms | 130.4884 Ops/s | 123.2038 Ops/s | $\textbf{\color{#35bf28}+5.91\%}$ |
| test_reinforce_speed[reduce-overhead-None] | 3.2771ms | 2.8420ms | 351.8610 Ops/s | 336.2448 Ops/s | $\color{#35bf28}+4.64\%$ |
| test_reinforce_speed[reduce-overhead-backward] | 8.3520ms | 7.9345ms | 126.0324 Ops/s | 122.8236 Ops/s | $\color{#35bf28}+2.61\%$ |
| test_iql_speed[False-None] | 25.4356ms | 20.1520ms | 49.6229 Ops/s | 48.4786 Ops/s | $\color{#35bf28}+2.36\%$ |
| test_iql_speed[False-backward] | 31.2473ms | 30.2358ms | 33.0734 Ops/s | 32.6392 Ops/s | $\color{#35bf28}+1.33\%$ |
| test_iql_speed[True-None] | 8.9894ms | 8.5117ms | 117.4847 Ops/s | 112.6972 Ops/s | $\color{#35bf28}+4.25\%$ |
| test_iql_speed[True-backward] | 17.6398ms | 16.6679ms | 59.9955 Ops/s | 58.5755 Ops/s | $\color{#35bf28}+2.42\%$ |
| test_iql_speed[reduce-overhead-None] | 9.1071ms | 8.5403ms | 117.0918 Ops/s | 113.2489 Ops/s | $\color{#35bf28}+3.39\%$ |
| test_iql_speed[reduce-overhead-backward] | 17.5032ms | 17.0875ms | 58.5222 Ops/s | 56.9353 Ops/s | $\color{#35bf28}+2.79\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 7.4807ms | 6.0612ms | 164.9840 Ops/s | 163.7036 Ops/s | $\color{#35bf28}+0.78\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6546ms | 0.3815ms | 2.6210 KOps/s | 3.1978 KOps/s | $\textbf{\color{#d91a1a}-18.04\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6815ms | 0.3636ms | 2.7500 KOps/s | 2.8709 KOps/s | $\color{#d91a1a}-4.21\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1558ms | 5.7897ms | 172.7210 Ops/s | 169.6486 Ops/s | $\color{#35bf28}+1.81\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.5893ms | 0.3715ms | 2.6920 KOps/s | 3.1576 KOps/s | $\textbf{\color{#d91a1a}-14.75\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6249ms | 0.3526ms | 2.8362 KOps/s | 3.2202 KOps/s | $\textbf{\color{#d91a1a}-11.92\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7004ms | 1.4274ms | 700.5718 Ops/s | 695.5183 Ops/s | $\color{#35bf28}+0.73\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.7005ms | 1.3488ms | 741.3829 Ops/s | 758.1530 Ops/s | $\color{#d91a1a}-2.21\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.9354ms | 6.0787ms | 164.5082 Ops/s | 165.9366 Ops/s | $\color{#d91a1a}-0.86\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.4443ms | 0.5209ms | 1.9198 KOps/s | 2.1227 KOps/s | $\textbf{\color{#d91a1a}-9.56\%}$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6174ms | 0.4186ms | 2.3891 KOps/s | 2.3957 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9913ms | 5.8307ms | 171.5050 Ops/s | 169.3602 Ops/s | $\color{#35bf28}+1.27\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9870ms | 0.2995ms | 3.3386 KOps/s | 3.4709 KOps/s | $\color{#d91a1a}-3.81\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4866ms | 0.2811ms | 3.5573 KOps/s | 2.8459 KOps/s | $\textbf{\color{#35bf28}+25.00\%}$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9599ms | 5.7473ms | 173.9958 Ops/s | 170.4022 Ops/s | $\color{#35bf28}+2.11\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7420ms | 0.3316ms | 3.0153 KOps/s | 2.7741 KOps/s | $\textbf{\color{#35bf28}+8.70\%}$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5754ms | 0.3191ms | 3.1335 KOps/s | 3.1920 KOps/s | $\color{#d91a1a}-1.83\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.4978ms | 5.9538ms | 167.9598 Ops/s | 166.7015 Ops/s | $\color{#35bf28}+0.75\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.2044ms | 0.5208ms | 1.9203 KOps/s | 2.0192 KOps/s | $\color{#d91a1a}-4.90\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7763ms | 0.4728ms | 2.1150 KOps/s | 2.1321 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.6495s | 17.8915ms | 55.8924 Ops/s | 48.9864 Ops/s | $\textbf{\color{#35bf28}+14.10\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.3573ms | 1.9272ms | 518.8969 Ops/s | 565.6595 Ops/s | $\textbf{\color{#d91a1a}-8.27\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 10.2532ms | 1.3043ms | 766.6973 Ops/s | 819.4081 Ops/s | $\textbf{\color{#d91a1a}-6.43\%}$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 8.0103ms | 5.0483ms | 198.0846 Ops/s | 197.5164 Ops/s | $\color{#35bf28}+0.29\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9701ms | 1.7882ms | 559.2259 Ops/s | 593.8622 Ops/s | $\textbf{\color{#d91a1a}-5.83\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 9.0390ms | 1.2486ms | 800.8920 Ops/s | 930.4443 Ops/s | $\textbf{\color{#d91a1a}-13.92\%}$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5989s | 17.1510ms | 58.3056 Ops/s | 188.2613 Ops/s | $\textbf{\color{#d91a1a}-69.03\%}$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.1016ms | 1.9301ms | 518.1077 Ops/s | 497.6909 Ops/s | $\color{#35bf28}+4.10\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.1877ms | 1.0365ms | 964.7936 Ops/s | 667.6922 Ops/s | $\textbf{\color{#35bf28}+44.50\%}$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 40.1569ms | 36.6285ms | 27.3011 Ops/s | 27.7092 Ops/s | $\color{#d91a1a}-1.47\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.7161ms | 17.5209ms | 57.0748 Ops/s | 56.4848 Ops/s | $\color{#35bf28}+1.04\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.1065ms | 36.7364ms | 27.2210 Ops/s | 26.5363 Ops/s | $\color{#35bf28}+2.58\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.4461ms | 18.2953ms | 54.6588 Ops/s | 55.5657 Ops/s | $\color{#d91a1a}-1.63\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 41.6159ms | 38.4478ms | 26.0093 Ops/s | 25.5876 Ops/s | $\color{#35bf28}+1.65\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.3736ms | 19.1366ms | 52.2560 Ops/s | 51.7892 Ops/s | $\color{#35bf28}+0.90\%$ |
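
For readers checking the Change column: the reported values appear consistent with the relative throughput difference between this PR ("Ops") and the repository HEAD ("Ops on Repo HEAD"), after converting KOps/s to Ops/s. The sketch below is not the benchmark bot's code; the helper names `to_ops_per_second` and `percent_change` are hypothetical, and the formula is inferred from the rows above. Because the table shows rounded throughputs, a recomputed value can differ from the reported one in the last digit.

```python
# Scale factors for the two throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1_000.0}


def to_ops_per_second(cell: str) -> float:
    """Convert a table cell such as '1.0451 KOps/s' to plain Ops/s."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(pr_cell: str, head_cell: str) -> float:
    """Relative change of PR throughput versus HEAD throughput, in percent."""
    pr_ops = to_ops_per_second(pr_cell)
    head_ops = to_ops_per_second(head_cell)
    return (pr_ops / head_ops - 1.0) * 100.0


# A few (Ops, Ops on Repo HEAD) pairs copied from the table above.
rows = {
    "test_tensor_to_bytestream_speed[pickle]": ("11.7132 KOps/s", "12.4718 KOps/s"),
    "test_gae_speed[vec_generalized_advantage_estimate-True-32-512]": ("53.9986 Ops/s", "33.6367 Ops/s"),
    "test_dqn_speed[reduce-overhead-backward]": ("1.0451 KOps/s", "872.9981 Ops/s"),
}

for name, (pr_cell, head_cell) in rows.items():
    print(f"{name}: {percent_change(pr_cell, head_cell):+.2f}%")
```

The third row mixes units (KOps/s against Ops/s), which is why the conversion step matters before taking the ratio.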

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 13, 2026
Add `ProfileConfig` and profiling support to collectors. This allows
profiling collector rollouts using PyTorch's profiler across all
collector types (single-process and multi-process).


ghstack-source-id: 80a964d
Pull-Request: #3324
@vmoens vmoens merged commit 9b7db7b into gh/vmoens/192/base Jan 13, 2026
99 of 106 checks passed
@vmoens vmoens deleted the gh/vmoens/192/head branch January 13, 2026 18:50
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
