├── analyze_phase7_observation.py # Offline analyzer for saved Phase 7 trajectories/logs
├── probe_rollout_speed_candidates.py # Performance probe for rollout-like generate modes without changing library code
└── ursa_model/ # Self-contained URSA model code used by actor and PRM loading
```

## File Roles

### 1. Primary training path

- `run_grpo_math_prm_ursa_8b.sh`
  - Main launcher for the current URSA-MATH Stage 3 reproduction path.
  - Wires actor path, reward path, dataset path, FSDP settings, W&B, and rollout options.
- `train_colocate.py`
  - Real training entry used by `torchrun`.
  - Loads actor / reference / reward model / dataset / trainer and starts the LightRFT PPO-GRPO loop.
- `ursa_actor.py`
  - URSA-specific actor wrapper.
  - Makes LightRFT load `UrsaForConditionalGeneration` instead of a generic VLM auto-class.

### 2. Reward and scoring path

- `reward_models.py`
  - Contains all reward model classes used in this example directory.
  - The active Stage 3 path is `MathPRMReward` and the PS-GRPO reward mapping built on top of URSA-RM-8B.
  - Historical Qwen2VL multi-reward classes are still present in this file, but they are not part of the current URSA-MATH Stage 3 training path.
- `reward_models_utils.py`
  - Handles reward model loading and reward function dispatch.
  - Maps labels such as `math_prm` and `math_psgrpo` onto the current URSA reward path.
- `prm_infer_score.py`
  - Standalone helper for step-level PRM inference.
  - Useful when comparing LightRFT reward behavior against URSA-MATH reference scoring.
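
As a rough illustration of the label dispatch described for `reward_models_utils.py`, the sketch below routes a label to a scoring callable through a plain dictionary. Only the labels `math_prm` and `math_psgrpo` come from this README. The function names, the signature, and the placeholder scoring rules (mean and min) are hypothetical and are not the actual LightRFT or PS-GRPO logic.

```python
from typing import Callable, Dict, List

# Hypothetical scorer signature: per-step PRM scores in, one scalar reward out.
RewardFn = Callable[[List[float]], float]


def score_math_prm(step_scores: List[float]) -> float:
    """Placeholder: collapse step-level scores into one value (mean here)."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0


def score_math_psgrpo(step_scores: List[float]) -> float:
    """Placeholder standing in for the PS-GRPO mapping (min here, NOT the real rule)."""
    return min(step_scores) if step_scores else 0.0


# Label -> scorer table; the two keys are the labels named in this README.
REWARD_REGISTRY: Dict[str, RewardFn] = {
    "math_prm": score_math_prm,
    "math_psgrpo": score_math_psgrpo,
}


def dispatch_reward(label: str, step_scores: List[float]) -> float:
    """Look up the scorer for a label and fail loudly on unknown labels."""
    try:
        reward_fn = REWARD_REGISTRY[label]
    except KeyError:
        raise ValueError(f"Unknown reward label: {label!r}") from None
    return reward_fn(step_scores)
```

Failing on an unknown label, rather than silently falling back to a default scorer, makes a mislabeled manifest row visible at the first batch instead of skewing rewards unnoticed.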

### 3. Data preparation and compatibility

- `prepare_ursa_stage3_manifest.py`
  - Converts raw `MMathCoT-1M` Stage 3 data into the `prompt / images / reference / label` schema expected by LightRFT.
  - Also performs a lightweight dataset/collate smoke check.
- `prepare_ursa_engine_checkpoint.py`
  - Optional helper for engine experiments.
  - Builds an engine-friendly wrapper checkpoint with the local URSA model code and `auto_map` metadata so vLLM/SGLang can at least attempt to load URSA.
- `sitecustomize.py`
  - Local runtime/import hook used to keep this example stack compatible under the frozen Docker baseline.
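
To make the `prompt / images / reference / label` schema concrete, here is a minimal sketch of a conversion step followed by a field-level smoke check. The four output keys come from this README. The raw keys (`question`, `image`, `answer`), the default label value, and both helper names are assumptions for illustration and may not match the real `MMathCoT-1M` layout or the actual script.

```python
import json
from typing import Any, Dict, List

# The four fields named in this README.
REQUIRED_KEYS = ("prompt", "images", "reference", "label")


def to_manifest_record(raw: Dict[str, Any], label: str = "math_psgrpo") -> Dict[str, Any]:
    """Convert one raw sample into the four-field schema.

    The raw keys "question", "image", and "answer" are hypothetical.
    """
    return {
        "prompt": raw["question"].strip(),
        "images": [raw["image"]],  # stored as a list so multi-image rows fit the same schema
        "reference": str(raw["answer"]).strip(),
        "label": label,
    }


def smoke_check(records: List[Dict[str, Any]]) -> None:
    """Raise if any record is missing a required field or has no image."""
    for index, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {index} is missing {missing}")
        if not record["images"]:
            raise ValueError(f"record {index} has no images")


# Made-up sample, only to exercise the two helpers.
raw_samples = [{"question": " What is 2 + 3? ", "image": "images/0001.png", "answer": 5}]
manifest = [to_manifest_record(sample) for sample in raw_samples]
smoke_check(manifest)
print(json.dumps(manifest[0], indent=2))
```

A check like this only validates the record shape. It does not exercise image loading or the collate function, which the real smoke check is described as covering.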
95
+
96
+
### 4. Validation, smoke, and observation tools
97
+
98
+
-`check_phase2_alignment.py`
99
+
- Verifies that LightRFT `MathPRMReward` remains aligned with the URSA reference scorer on a concrete sample.
100
+
-`check_hf_rollout.py`
101
+
- Minimal end-to-end validation for LightRFT local `hf` rollout.
102
+
- Compares `gather_and_generate()` output against direct `actor.generate()`.
103
+
-`check_phase6_script_alignment.py`
104
+
- Static checker that confirms the Stage 3 launcher still matches the intended defaults.
105
+
-`test_phase2_alignment.py`
106
+
- Current regression test file for the URSA Stage 3 path.
107
+
- Covers alignment, reward mapping, answer extraction, rollout helper behavior, and related utilities.
108
+
-`run_phase3_smoke.sh`
109
+
- Time-boxed Phase 3 smoke launcher for “can it run, does it trend normally, and do we clean up GPUs afterward”.
110
+
-`run_phase7_observation.sh`
111
+
- Bounded full-data observation launcher for later-stage analysis.
112
+
-`analyze_phase7_observation.py`
113
+
- Offline analyzer for saved trajectories and training logs.
114
+
- Computes the Phase 7 health checklist and PRM image-ablation summary.
115
+
-`probe_rollout_speed_candidates.py`
116
+
- Minimal benchmark for rollout-like decode speed.
117
+
- Used to compare `fsdp_train_gc`, `fsdp_train_no_gc`, `fsdp_eval_no_gc`, and `raw_eval_no_gc` without modifying `lightrft/` itself.
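
The kind of comparison `probe_rollout_speed_candidates.py` is described as making can be sketched as a small timing harness over named modes. The four mode names come from this README. Everything else is hypothetical: `time_modes` and `fake_generate` are illustrative helpers, and the sleep durations are arbitrary stand-ins for real decode calls, not measurements.

```python
import time
from typing import Callable, Dict


def time_modes(modes: Dict[str, Callable[[], int]], repeats: int = 3) -> Dict[str, float]:
    """Return the best tokens-per-second over `repeats` runs for each mode.

    Each callable runs one generation pass and returns the number of new tokens.
    """
    results: Dict[str, float] = {}
    for name, run_once in modes.items():
        best = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            new_tokens = run_once()
            elapsed = time.perf_counter() - start
            if elapsed > 0:
                best = max(best, new_tokens / elapsed)
        results[name] = best
    return results


def fake_generate(delay_s: float, new_tokens: int = 64) -> Callable[[], int]:
    """Stand-in for a real generate call: sleeps instead of decoding."""
    def run_once() -> int:
        time.sleep(delay_s)
        return new_tokens
    return run_once


# Arbitrary delays chosen only so the harness has something to time.
modes = {
    "fsdp_train_gc": fake_generate(0.004),
    "fsdp_train_no_gc": fake_generate(0.003),
    "fsdp_eval_no_gc": fake_generate(0.002),
    "raw_eval_no_gc": fake_generate(0.002),
}

for name, tokens_per_s in time_modes(modes, repeats=2).items():
    print(f"{name:>18}: {tokens_per_s:10.1f} tokens/s")
```

With real GPU work, each timed region would also need a device synchronization (for example `torch.cuda.synchronize()`) before reading the clock, since CUDA kernels launch asynchronously.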

### 5. Self-contained URSA runtime

- `ursa_model/`
  - Local copy of the URSA model stack needed by both the actor and the PRM.
  - Includes config, processor, image processor, projector, vision backbones, and model definitions.
  - This directory is what allows the current Stage 3 path to run without importing code directly from the external URSA-MATH repo.

## Current Entry Points

If you only care about the active Stage 3 reproduction path, the files you usually need are: