
lightning detects wrong backend for HPU #358

@Delaunay

Description

Running the lightning benchmark on a Gaudi3 machine, Lightning reports the HPU as a CUDA device ("GPU available: True (cuda)", "HPU available: False") and initializes the distributed run with the nccl backend:
lightning-gpus.0 [stderr] ============================= HABANA PT BRIDGE CONFIGURATION ===========================
lightning-gpus.0 [stderr]  PT_HPU_LAZY_MODE = 0
lightning-gpus.0 [stderr]  PT_HPU_RECIPE_CACHE_CONFIG = ,false,1024
lightning-gpus.0 [stderr]  PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
lightning-gpus.0 [stderr]  PT_HPU_LAZY_ACC_PAR_MODE = 1
lightning-gpus.0 [stderr]  PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
lightning-gpus.0 [stderr]  PT_HPU_EAGER_PIPELINE_ENABLE = 1
lightning-gpus.0 [stderr]  PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
lightning-gpus.0 [stderr]  PT_HPU_ENABLE_LAZY_COLLECTIVES = 0
lightning-gpus.0 [stderr] ---------------------------: System Configuration :---------------------------
lightning-gpus.0 [stderr] Num CPU Cores : 512
lightning-gpus.0 [stderr] CPU RAM       :
lightning-gpus.0 [stderr] 2377534492 KB
lightning-gpus.0 [stderr] ------------------------------------------------------------------------------
lightning-gpus.0 [stderr] /home/ubuntu/hpu/results/venv/torch/lib/python3.10/site-packages/habana_frameworks/torch/gpu_migration/__init__.py:46: UserWarning: apex not installed, gpu_migration will not swap api for this package.
lightning-gpus.0 [stderr]   warnings.warn(
lightning-gpus.0 [stdout] HPU cannot disable tf32
lightning-gpus.0 [stderr] /home/ubuntu/hpu/results/venv/torch/lib/python3.10/site-packages/lightning/pytorch/utilities/imports.py:40: Import of lightning_habana package failed for some compatibility issues:
lightning-gpus.0 [stderr] `habana_dataloader` package is not installed.
lightning-gpus.0 [stderr] Using bfloat16 Automatic Mixed Precision (AMP)
lightning-gpus.0 [stderr] GPU available: True (cuda), used: True
lightning-gpus.0 [stderr] TPU available: False, using: 0 TPU cores
lightning-gpus.0 [stderr] HPU available: False, using: 0 HPUs
lightning-gpus.0 [stderr] You are using a CUDA device ('GAUDI3') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
lightning-gpus.0 [stderr] Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8
lightning-gpus.0 [stderr] ----------------------------------------------------------------------------------------------------
lightning-gpus.0 [stderr] distributed_backend=nccl
lightning-gpus.0 [stderr] All distributed processes registered. Starting with 8 processes
lightning-gpus.0 [stderr] ----------------------------------------------------------------------------------------------------
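Lightning only reports HPUs when the lightning_habana plugin imports cleanly; since that import fails in the log above (missing `habana_dataloader`), auto-detection falls through to CUDA, which gpu_migration has remapped onto the Gaudi device. A minimal sketch of that fallback behaviour, assuming this selection order; `hpu_available` and `pick_accelerator` are illustrative names, not Lightning's actual API:

```python
# Sketch of the accelerator fallback suspected in this issue.
# Not Lightning's real selection code, just the observed behaviour.
import importlib.util

def hpu_available() -> bool:
    # Lightning can only use HPUs if the lightning_habana plugin imports;
    # a missing dependency (e.g. habana_dataloader) makes this False.
    return importlib.util.find_spec("lightning_habana") is not None

def pick_accelerator(cuda_visible: bool) -> str:
    # Simplified auto-selection: HPU plugin first, then CUDA, then CPU.
    if hpu_available():
        return "hpu"
    if cuda_visible:
        return "cuda"
    return "cpu"

# With gpu_migration active, torch.cuda queries are redirected to the Gaudi
# device, so cuda_visible is True and the HPU ends up labelled as CUDA.
print(pick_accelerator(cuda_visible=True))
```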

Reproduction steps

git clone https://github.com/mila-iqia/milabench.git -b gaudi3
cd milabench/docker
sudo docker build --build-arg CACHEBUST=`git rev-parse gaudi3` -f Dockerfile-hpu -t dockerfile-hpu .

sudo docker run -it \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --shm-size 50G \
  --cap-add=sys_nice \
  --net=host \
  dockerfile-hpu:latest bash

. $MILABENCH_VENV/bin/activate
milabench prepare --use-current-env --select lightning
milabench run --use-current-env --select lightning
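Before re-running, it may help to confirm which Habana packages actually import inside the container, since the log blames a missing `habana_dataloader` for the lightning_habana failure. A small diagnostic sketch (module names taken from the log above):

```python
# Probe the imports that Lightning's HPU support depends on.
import importlib

def probe(modules):
    """Try importing each module, recording 'OK' or the failure message."""
    results = {}
    for mod in modules:
        try:
            importlib.import_module(mod)
            results[mod] = "OK"
        except Exception as exc:  # ImportError or plugin compatibility errors
            results[mod] = f"FAILED ({exc})"
    return results

for mod, status in probe(
    ["habana_frameworks", "habana_dataloader", "lightning_habana"]
).items():
    print(f"{mod}: {status}")
```

If `habana_dataloader` or `lightning_habana` reports FAILED, Lightning will not expose the HPU accelerator and the mis-detection above is expected.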
