chore: robust time travel tests · langchain-ai/langgraph@9ac09ea

Triggered via pull request March 4, 2026 20:31 by sydney-runkle (synchronize #7024, branch sr/new-time-travel-tests)

Status: Success
Total duration: 21m 49s
Artifacts: –

bench.yml (on: pull_request)

benchmark: 21m 47s

Annotations: 2 notices

Comparison against main: libs/langgraph/tests/test_replay.py#L0

+-----------------------------------------+---------+-----------------------+
| Benchmark                               | main    | changes               |
+=========================================+=========+=======================+
| react_agent_100x_sync                   | 728 ms  | 694 ms: 1.05x faster  |
+-----------------------------------------+---------+-----------------------+
| react_agent_100x                        | 764 ms  | 732 ms: 1.04x faster  |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_15x600_checkpoint_sync   | 78.5 ms | 75.3 ms: 1.04x faster |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_9x1200_checkpoint_sync   | 68.0 ms | 65.3 ms: 1.04x faster |
+-----------------------------------------+---------+-----------------------+
| wide_dict_9x1200_checkpoint_sync        | 38.6 ms | 37.1 ms: 1.04x faster |
+-----------------------------------------+---------+-----------------------+
| react_agent_100x_checkpoint_sync        | 727 ms  | 700 ms: 1.04x faster  |
+-----------------------------------------+---------+-----------------------+
| wide_state_15x600_checkpoint_sync       | 49.1 ms | 47.4 ms: 1.04x faster |
+-----------------------------------------+---------+-----------------------+
| react_agent_100x_checkpoint             | 763 ms  | 737 ms: 1.04x faster  |
+-----------------------------------------+---------+-----------------------+
| wide_state_25x300_checkpoint_sync       | 29.6 ms | 28.7 ms: 1.03x faster |
+-----------------------------------------+---------+-----------------------+
| wide_dict_15x600_checkpoint_sync        | 49.3 ms | 47.9 ms: 1.03x faster |
+-----------------------------------------+---------+-----------------------+
| wide_dict_15x600_checkpoint             | 54.3 ms | 52.8 ms: 1.03x faster |
+-----------------------------------------+---------+-----------------------+
| wide_dict_25x300_checkpoint_sync        | 29.4 ms | 28.6 ms: 1.03x faster |
+-----------------------------------------+---------+-----------------------+
| react_agent_10x_checkpoint_sync         | 22.7 ms | 22.3 ms: 1.02x faster |
+-----------------------------------------+---------+-----------------------+
| wide_state_9x1200_checkpoint_sync       | 38.1 ms | 37.5 ms: 1.02x faster |
+-----------------------------------------+---------+-----------------------+
| wide_dict_9x1200_checkpoint             | 43.2 ms | 42.6 ms: 1.02x faster |
+-----------------------------------------+---------+-----------------------+
| wide_state_15x600_checkpoint            | 54.1 ms | 53.4 ms: 1.01x faster |
+-----------------------------------------+---------+-----------------------+
| wide_state_25x300_checkpoint            | 33.6 ms | 33.2 ms: 1.01x faster |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_9x1200_checkpoint        | 71.9 ms | 71.1 ms: 1.01x faster |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_25x300_checkpoint_sync   | 41.8 ms | 41.4 ms: 1.01x faster |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_25x300_checkpoint        | 46.0 ms | 45.7 ms: 1.01x faster |
+-----------------------------------------+---------+-----------------------+
| wide_dict_25x300_checkpoint             | 33.2 ms | 33.1 ms: 1.00x faster |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_15x600_checkpoint        | 81.3 ms | 81.0 ms: 1.00x faster |
+-----------------------------------------+---------+-----------------------+
| pydantic_state_9x1200_sync              | 38.4 ms | 38.4 ms: 1.00x faster |
+-----------------------------------------+---------+-----------------------+
| wide_state_9x1200_checkpoint            | 42.7 ms | 42.8 ms: 1.00x slower |
+-----------------------------------------+---------+-----------------------+
| react_agent_10x                         | 24.5 ms | 24.7 ms: 1.01x slower |
+-----------------------------------------+---------+-----------------------+
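The "changes" column reads as the ratio of the two means, stated as "faster" when the branch mean is lower and "slower" when it is higher. The sketch below recomputes that label from the rounded means in the table. It assumes the ratio is the larger mean divided by the smaller one, formatted to two decimals; this mirrors how the pyperf comparison reads but is not pyperf's own code, and `format_ratio` is a hypothetical helper.

```python
def format_ratio(main_ms: float, changes_ms: float) -> str:
    """Describe how the branch ("changes") compares to main.

    Assumption: ratio = larger mean / smaller mean, two decimals.
    pyperf works from unrounded samples, so recomputing from the
    rounded table values can differ in the last digit.
    """
    if changes_ms <= main_ms:
        return f"{main_ms / changes_ms:.2f}x faster"
    return f"{changes_ms / main_ms:.2f}x slower"


# (main mean, changes mean) in ms, copied from the comparison table.
rows = {
    "react_agent_100x_sync": (728, 694),
    "pydantic_state_15x600_checkpoint_sync": (78.5, 75.3),
    "wide_state_9x1200_checkpoint": (42.7, 42.8),
    "react_agent_10x": (24.5, 24.7),
}

for name, (main_ms, changes_ms) in rows.items():
    print(f"{name}: {format_ratio(main_ms, changes_ms)}")
```

Recomputing wide_dict_9x1200_checkpoint from its rounded means (43.2 / 42.6 ≈ 1.014) gives 1.01x instead of the reported 1.02x. That is the rounding effect noted in the docstring, so the table's own ratio should be taken as authoritative.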

Benchmark results: libs/langgraph/tests/test_replay.py#L0

Results are listed in run order. Entries marked "(unstable)" were each preceded by the following pyperf warning, shown once here:

WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

fanout_to_subgraph_10x: Mean +- std dev: 29.4 ms +- 0.7 ms (unstable)
fanout_to_subgraph_10x_sync: Mean +- std dev: 28.5 ms +- 0.2 ms
fanout_to_subgraph_10x_checkpoint: Mean +- std dev: 31.4 ms +- 0.6 ms (unstable)
fanout_to_subgraph_10x_checkpoint_sync: Mean +- std dev: 30.3 ms +- 0.2 ms
fanout_to_subgraph_100x: Mean +- std dev: 318 ms +- 24 ms
fanout_to_subgraph_100x_sync: Mean +- std dev: 278 ms +- 2 ms
fanout_to_subgraph_100x_checkpoint: Mean +- std dev: 329 ms +- 26 ms (unstable)
fanout_to_subgraph_100x_checkpoint_sync: Mean +- std dev: 297 ms +- 3 ms
react_agent_10x: Mean +- std dev: 24.7 ms +- 0.7 ms (unstable)
react_agent_10x_sync: Mean +- std dev: 20.6 ms +- 0.9 ms (unstable)
react_agent_10x_checkpoint: Mean +- std dev: 26.6 ms +- 0.6 ms (unstable)
react_agent_10x_checkpoint_sync: Mean +- std dev: 22.3 ms +- 1.1 ms (unstable)
react_agent_100x: Mean +- std dev: 732 ms +- 10 ms (unstable)
react_agent_100x_sync: Mean +- std dev: 694 ms +- 5 ms
...........

chore: robust time travel tests #5756