System Info
- `Accelerate` version: 1.13.0.dev0
- Platform: Linux-6.8.0-100-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /data/hukongtao/miniconda3/envs/transformer_engine/bin/accelerate
- Python version: 3.12.12
- Numpy version: 2.4.2
- PyTorch version: 2.10.0+cu128
- PyTorch accelerator: CUDA
- System RAM: 125.56 GB
- GPU type: NVIDIA GeForce RTX 4090 D
- `Accelerate` default config: Not found
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
```bash
cd benchmarks/fp8/transformer_engine/
python non_distributed.py
```
Expected behavior
The script should run successfully. Instead, it fails with:
```
Traceback (most recent call last):
  File "/mnt/hukongtao/codebase/accelerate/benchmarks/fp8/transformer_engine/non_distributed.py", line 118, in <module>
    baseline_not_trained, baseline_trained = train_baseline()
                                             ^^^^^^^^^^^^^^^^
  File "/mnt/hukongtao/codebase/accelerate/benchmarks/fp8/transformer_engine/non_distributed.py", line 73, in train_baseline
    assert trained_model_results["accuracy"] > base_model_results["accuracy"], (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Accuracy should be higher for the trained model: 0.685 > 0.685
```
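The failing line uses a strict `>` comparison, so the assertion fails whenever the trained and untrained accuracies are identical, as they are here (0.685 in both cases). The snippet below is a minimal sketch of that check only. The dictionaries are hypothetical stand-ins for `base_model_results` and `trained_model_results` in `non_distributed.py`, filled with the values from the error message, and the message format is assumed from the traceback.

```python
# Hypothetical stand-ins for the evaluation results in non_distributed.py;
# the values mirror the ones reported in the AssertionError above.
base_model_results = {"accuracy": 0.685}
trained_model_results = {"accuracy": 0.685}

# The benchmark requires a strict improvement, so equal accuracies evaluate to False.
improved = trained_model_results["accuracy"] > base_model_results["accuracy"]
print(improved)

try:
    # Same condition as line 73 of the benchmark; message format is assumed.
    assert trained_model_results["accuracy"] > base_model_results["accuracy"], (
        f"Accuracy should be higher for the trained model: "
        f"{trained_model_results['accuracy']} > {base_model_results['accuracy']}"
    )
except AssertionError as err:
    print(f"AssertionError: {err}")
```

In other words, training in the baseline run did not change the evaluation accuracy at all, which is what trips the check.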