Failing tests on MPS #1707

Open
SalmanMohammadi opened this issue Sep 27, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@SalmanMohammadi
Collaborator

SalmanMohammadi commented Sep 27, 2024

Several of our tests fail on MPS because precision differences between MPS and CUDA/CPU fall outside the numerical tolerances the tests use:

$ pytest tests

FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched-generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched-generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched_left_padded-generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched_left_padded-generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens[generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens[generation_model_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping[generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping[generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping_left_padded[generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping_left_padded[generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/models/llama3_1/test_position_embeddings.py::TestLlama3ScaledRoPE::test_forward - AssertionError: actual: -83.30372619628906, expected: -83.15229797363281
FAILED tests/torchtune/models/llama3_1/test_position_embeddings.py::TestLlama3ScaledRoPE::test_forward_with_curr_pos - AssertionError: actual: -83.30372619628906, expected: -83.15229797363281
FAILED tests/torchtune/models/llama3_1/test_position_embeddings.py::TestLlama3ScaledRoPE::test_forward_with_2d_pos_ids - AssertionError: actual: -83.30372619628906, expected: -83.15229797363281
FAILED tests/torchtune/modules/test_position_embeddings.py::TestRotaryPositionEmbedding::test_forward - AssertionError: actual: 2165.59619140625, expected: 2165.705322265625
FAILED tests/torchtune/modules/test_position_embeddings.py::TestRotaryPositionEmbedding::test_forward_with_curr_pos - AssertionError: actual: 2165.59619140625, expected: 2165.705322265625
FAILED tests/torchtune/modules/test_position_embeddings.py::TestRotaryPositionEmbedding::test_forward_with_packed_pos - AssertionError: actual: 2165.59619140625, expected: 2165.705322265625
FAILED tests/torchtune/modules/test_position_embeddings.py::TestPhi3RotaryPositionalEmbeddings::test_forward - AssertionError: actual: -381.06915283203125, expected: -381.06201171875
FAILED tests/torchtune/modules/test_transformer_decoder.py::TestTransformerCrossAttentionLayer::test_forward - AssertionError: actual: 1.7740381956100464, expected: 1.7762000560760498
FAILED tests/torchtune/modules/test_transformer_decoder.py::TestTransformerDecoder::test_forward - AssertionError: actual: 20.48008918762207, expected: 20.479999542236328

$ pytest tests -m integration_test
FAILED tests/recipes/test_ppo_full_finetune_single_device.py::TestPPOFullFinetuneSingleDeviceRecipe::test_loss - AssertionError: Scalars are not close!
FAILED tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results - AssertionError: assert 'Country maior Connection Kohćutsójcustomulas Sometimes Security' in 'INFO     torchtune.utils._logging:_logging.py:101 Running InferenceRecipe with resolv...

The PPO integration test likely fails because the recipe relies on a generation step.
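For context, a minimal sketch of one possible mitigation, assuming we would rather relax tolerances when running on MPS than regenerate expected values per backend. The helper name `assert_scalar_close` and the tolerance values below are illustrative only, not something torchtune currently ships:

```python
import torch

# Hypothetical per-device tolerances; the exact values are illustrative,
# not the ones torchtune uses.
_ON_MPS = torch.backends.mps.is_available() and torch.backends.mps.is_built()
ATOL = 1e-3 if _ON_MPS else 1e-6
RTOL = 1e-3 if _ON_MPS else 1e-6


def assert_scalar_close(actual: float, expected: float) -> None:
    """Compare scalar test outputs with device-dependent tolerances."""
    torch.testing.assert_close(
        torch.tensor(actual), torch.tensor(expected), atol=ATOL, rtol=RTOL
    )


# The RotaryPositionEmbedding failure above corresponds to a relative error
# of roughly 5e-5 (|2165.596... - 2165.705...| / 2165.705...), so a looser
# MPS rtol would absorb it while keeping CPU/CUDA tolerances tight.
if _ON_MPS:
    assert_scalar_close(2165.59619140625, 2165.705322265625)
```

An alternative would be to skip the affected tests on MPS (e.g. with `pytest.mark.skipif(torch.backends.mps.is_available(), ...)`) until the backend differences are better understood.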

SalmanMohammadi added the bug label on Sep 27, 2024