[Fix] Merge previous contributions from fw/refactor to lite (areal-project#163)

garrett4wade · meizhiyu.mzy · zhaochenyang20 · web-flow · commit 3bf9c85e400a · 2025-07-10T12:56:24.000+08:00
* initial proposal
* add arealite
* .
* change api
* .
* remove LOG_ROOT
* remove MODEL_SAVE_PATH
* remove PARAM_REALLOC_PATH, DATASET_CACHE
* prepare for testing
* prepare for testing
* ready for run
* local run
* tests mainly pass
* format
* .
* amend cluster.py
* .
* .
* client test pass
* pass rollout test
* remove unused imports
* add arealite readme
* change api
* .
* .
* .
* .
* .
* .
* .
* .
* format
* .
* implement iteraptable generation (areal-project#112)
Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
* .
* fix
* .
* .
* .
* pass controller generate batch test
* .
* refactor rollout controller into worker and controller
* .
* .
* .
* change to async rollout
* pass rollout controller test
* pass test
* .
* update readme
* .
* sft debug
* .
* add license
* remove unused files
* remove unused args in ppo
* add hf engine wrapper (areal-project#116)
* add hf engine
* fix issues
* fix ppo bugs and add test
* add hf client interface and modify cli args
* fix bugs
* fix issues
* Merge fw/refactor
* Finish hf wrapper test
* add test
---------
Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
* format
* format
* .
* refine hf engine
* .
* fix
* add fsdp engine and sft tests
* .
* .
* .
* pass ppo unittest
* pass ppo and rollout controller tests
* clear unused imports
* rename ppo to grpo
* change reward function organization
* reorganize code
* add dataset api
* .
* .
* .
* format
* chmod fix
* .
* rename workflow to collector
* refactor llm_client location
* .
* .
* fix llm server api
* refactor config structure
* .
* fix tests
* .
* .
* .
* Fix unresolved issue in SFTTrainer PR (areal-project#139)
* .
* .
* efficient loading
* format
* .
* .
* .
* .
* .
* .
* Add CI for testing AReaLite (areal-project#150)
* ci: add test-arealite
* ci: add checkout before running test-arealite
* ci: add USERNAME
* ci: add test script
* ci: add GitHub mirror
* ci: fix typo
* ci: clone one commit
* ci: fix condition
* ci: set command timeout to 60m
* ci: enable pip cache
* ci: optimize container lifecycle
* ci: split into many stages
* ci(test-arealite): fix typo
* ci: fix wrong env
* ci: fix pytest
* ci: uninstall transformer-engine
* ci: uninstall transformer-engine
* ci: fix model paths
* ci: show stdout/stderr
* ci: fix not clean up
* ci: backup sglang
* ci: remove tmp repo dir when run
* ci: fix docker run exit 1 condition
* ci(test-arealite): limit the concurrency and extend command timeout
* .
* merge fw/refactor
* revert some changes
* fix
---------
Co-authored-by: meizhiyu.mzy <meizhiyu.mzy@antgroup.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
Co-authored-by: Jayon02 <qiujiangc@outlook.com>
Co-authored-by: root <meizhiyu.mzy>
Co-authored-by: Zijian Zhang <futrime@outlook.com>
diff --git a/arealite/README.md b/arealite/README.md
@@ -737,4 +737,4 @@ dataloader = StatefulDataLoader(
 )
 for data in dataloader:
     assert isinstance(data, list)
-```
+```
diff --git a/arealite/api/cli_args.py b/arealite/api/cli_args.py
@@ -4,6 +4,9 @@
 from pathlib import Path
 from typing import Dict, List, Optional, Tuple
 
+import uvloop
+
+uvloop.install()
 from hydra import compose as hydra_compose
 from hydra import initialize as hydra_init
 from omegaconf import MISSING, OmegaConf
diff --git a/pyproject.toml b/pyproject.toml
@@ -53,9 +53,9 @@ dependencies = [
     "hydra-core==1.4.0.dev1",
     "packaging",
     "tabulate",
+    "gymnasium>=1.1.1",
     "torchdata",
     "autoflake",
-    "gymnasium",
     "tensordict",
     
     # Monitoring and logging
diff --git a/realhf/api/core/data_api.py b/realhf/api/core/data_api.py
@@ -8,6 +8,7 @@
 import random
 import time
 from contextlib import contextmanager
+from functools import lru_cache
 
 # NOTE: We don't use wildcard importing here because the type
 # `Sequence` has a very similar name to `SequenceSample`.
@@ -47,6 +48,7 @@
 RL_TASKS = ["math", "code", "rlhf", "stem"]
 
 
+@lru_cache(maxsize=8)
 def load_hf_tokenizer(
     model_name_or_path: str,
     fast_tokenizer=True,
diff --git a/requirements.txt b/requirements.txt
@@ -69,8 +69,8 @@ word2number
 Pebble
 timeout-decorator
 prettytable
+gymnasium>=1.1.1
 swanlab[dashboard]
 torchdata
 autoflake
-gymnasium
-tensordict
+tensordict
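
Note on the `@lru_cache(maxsize=8)` decorator added to `load_hf_tokenizer` in `realhf/api/core/data_api.py`: repeated calls with the same arguments now return the cached object instead of reloading it. The sketch below illustrates that behavior with a hypothetical stand-in, `load_tokenizer_stub`, which is not part of the repository and returns a plain dict rather than a real tokenizer. It is only meant to show the caching semantics, not the actual loader.

```python
from functools import lru_cache

load_calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=8)
def load_tokenizer_stub(model_name_or_path: str, fast_tokenizer: bool = True) -> dict:
    """Hypothetical stand-in for an expensive tokenizer load."""
    global load_calls
    load_calls += 1
    return {"path": model_name_or_path, "fast": fast_tokenizer, "pad_token": None}


first = load_tokenizer_stub("some/model")
second = load_tokenizer_stub("some/model")  # served from the cache

# Both names refer to the same object, so a mutation made through one
# is visible through the other.
first["pad_token"] = "<pad>"
print(first is second, second["pad_token"], load_calls)
print(load_tokenizer_stub.cache_info())
```

Two consequences seem worth keeping in mind when reviewing this change: every argument passed to a cached function must be hashable, and callers that mutate the returned object affect all other callers that receive the same cached instance. `load_tokenizer_stub.cache_clear()` drops all cached entries if a fresh load is needed.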
