
AssertionError for FP8 Benchmarks #3954

@Hukongtao

Description

System Info

- `Accelerate` version: 1.13.0.dev0
- Platform: Linux-6.8.0-100-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /data/hukongtao/miniconda3/envs/transformer_engine/bin/accelerate
- Python version: 3.12.12
- Numpy version: 2.4.2
- PyTorch version: 2.10.0+cu128
- PyTorch accelerator: CUDA
- System RAM: 125.56 GB
- GPU type: NVIDIA GeForce RTX 4090 D
- `Accelerate` default config:
        Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

cd benchmarks/fp8/transformer_engine/
python non_distributed.py

Expected behavior

The benchmark script should run to completion. Instead, it fails with:

Traceback (most recent call last):
  File "/mnt/hukongtao/codebase/accelerate/benchmarks/fp8/transformer_engine/non_distributed.py", line 118, in <module>
    baseline_not_trained, baseline_trained = train_baseline()
                                             ^^^^^^^^^^^^^^^^
  File "/mnt/hukongtao/codebase/accelerate/benchmarks/fp8/transformer_engine/non_distributed.py", line 73, in train_baseline
    assert trained_model_results["accuracy"] > base_model_results["accuracy"], (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Accuracy should be higher for the trained model: 0.685 > 0.685
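The failing check is a strict greater-than comparison. It therefore fails whenever training leaves evaluation accuracy unchanged, as in the 0.685 vs. 0.685 result above. The following is a minimal sketch of that comparison. The dictionaries are hypothetical stand-ins for the script's `trained_model_results` and `base_model_results`, and only the `"accuracy"` key is modeled:

```python
# Hypothetical stand-ins for the evaluation results compared in
# non_distributed.py; only the "accuracy" key is modeled here.
base_model_results = {"accuracy": 0.685}
trained_model_results = {"accuracy": 0.685}


def trained_is_better(trained: dict, base: dict) -> bool:
    """Same strict check as the failing assert: trained accuracy must exceed base accuracy."""
    return trained["accuracy"] > base["accuracy"]


# Equal accuracies fail the strict comparison, which is the condition
# that raises the AssertionError in the traceback.
print(trained_is_better(trained_model_results, base_model_results))  # False
```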
