Cannot reproduce Figure 10 pick success rate (28%) with public pi05_droid_jointpos checkpoint #14

@www-Ye

Summary

I'm trying to reproduce the zero-shot evaluation results in Figure 10 using the public pi05_droid_jointpos checkpoint and the official evaluation code, but I consistently get a much lower success rate on the Pick task (roughly 14-18% across my runs vs. the reported 28%).

Setup

  • Checkpoint: pi05_droid_jointpos, downloaded via gsutil cp -r gs://openpi-assets/checkpoints/pi05_droid_jointpos . as instructed in the README
  • Benchmark: FrankaPickHardBench_20260206_json_benchmark (procthor-objaverse, 2000 episodes)
  • Eval config: PiPolicyEvalConfig (policy_dt_ms=66.0)
  • task_horizon_steps: 300 (I also ran with 500; see Results)
  • Eval command:
    from molmo_spaces.evaluation.eval_main import run_evaluation
    # PiPolicyEvalConfig must also be imported; its import line is not shown here.
    results = run_evaluation(
        eval_config_cls=PiPolicyEvalConfig,
        benchmark_dir="...FrankaPickHardBench_20260206_json_benchmark",
        checkpoint_path="checkpoints/pi05_droid_jointpos",
        task_horizon_steps=300,
        num_workers=16,
    )
  • OpenPI server: serve_policy.py --policy.config=pi05_droid_jointpos

Results

Setting | Pick Success Rate
Paper Figure 10 (π0.5) | 28%
My result (objaverse 20260131, horizon=500) | ~15.9%
My result (objaverse 20251016_from_20250610, horizon=500) | ~18.4%
My result (objaverse 20251016_from_20250610, horizon=300) | ~14.1%

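To check whether sampling noise alone could explain the gap, here is a rough sketch that computes 95% Wilson score intervals for the three rates above. It assumes each run evaluated all 2000 benchmark episodes and that success counts can be recovered from the rounded percentages; both are assumptions, not values taken from the evaluation logs. The names wilson_interval, N_EPISODES, and my_rates are purely illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the Wilson score interval (low, high) for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumption: every run covered the full 2000-episode benchmark.
N_EPISODES = 2000
REPORTED_RATE = 0.28  # Paper Figure 10 (pi0.5)

# Rates copied from the Results table; counts are reconstructed from them.
my_rates = {
    "objaverse 20260131, horizon=500": 0.159,
    "objaverse 20251016_from_20250610, horizon=500": 0.184,
    "objaverse 20251016_from_20250610, horizon=300": 0.141,
}

intervals = {
    name: wilson_interval(round(rate * N_EPISODES), N_EPISODES)
    for name, rate in my_rates.items()
}

for name, (low, high) in intervals.items():
    covers = low <= REPORTED_RATE <= high
    print(f"{name}: [{low:.3f}, {high:.3f}] covers reported 28%: {covers}")
```

Under these assumptions, even the best run's interval tops out at roughly 20%, well short of 28%, so the gap looks systematic rather than episode-level variance.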
Questions

  1. Which objaverse version? The code accepts both 20260131 and 20251016_from_20250610. Which was used for the reported results?

  2. Are there any other undocumented configuration details (e.g., specific server-side settings, random seeds) that might affect reproducibility?

Environment

  • Python 3.10, conda environment
  • MuJoCo with EGL rendering
  • OpenPI server on the same machine

Any guidance on reproducing the reported numbers would be appreciated.
