bug: the new image doesn't have the accelerate command in the PATH #666

@HarikrishnanBalagopal

Describe the bug

The image built from main doesn't have the accelerate command available in the PATH.

      imageID: 'quay.io/foundation-model-stack/fms-hf-tuning@sha256:4f677383d504502fa73d5fe3b62048189211cf4beaaf8d56ca26045228c33ac8'
      image: 'quay.io/foundation-model-stack/fms-hf-tuning:main-nvcr-latest'
1000800000@example:~$ echo $SHELL
/bin/bash
1000800000@example:~$ accelerate
bash: accelerate: command not found

Platform

Please provide details about the environment you are using, including the following:

  • Interpreter version: Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
  • Library version: latest main
1000800000@example:~/fms-hf-tuning$ pwd
/app/fms-hf-tuning

1000800000@example:~/fms-hf-tuning$ git log
commit 3c27af0e6886485838e914a0813828316c8d9b8c (grafted, HEAD -> main, origin/main)
Author: Dushyant Behl <dushyantbehl@users.noreply.github.com>
Date:   Wed Mar 4 21:05:14 2026 +0530

    Add app folder in nvcr image to mimic dockerfile (#665)
    
    Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>

Sample Code

You can bring up the container locally with docker/podman.

Here's a Pod YAML if you want to run it in a K8s/OpenShift cluster:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: mycontainer
      image: 'quay.io/foundation-model-stack/fms-hf-tuning:main-nvcr-latest'
      command:
        - bash
        - '-c'
        - |
          echo 'sleeping...'
          tail -f /dev/null

Go inside the container and try to run accelerate:

oc exec -it <podname> -- bash
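
The cluster steps above can also be scripted. This is a minimal sketch, not something run as part of this report: it assumes the Pod YAML above is saved as pod.yaml (a hypothetical filename), that the pod is named example as in that manifest, and that `oc` (or `kubectl`) is already logged in to the cluster. The `repro` function and the `CLI` variable are names made up for illustration.

```shell
# Client binary: `oc` on OpenShift, `kubectl` on plain Kubernetes.
CLI="${CLI:-oc}"

# Create the pod from a manifest, wait until it is Ready, then check
# non-interactively whether `accelerate` resolves on the PATH inside it.
# $1 = manifest file, $2 = pod name
repro() {
    "$CLI" apply -f "$1" &&
    "$CLI" wait --for=condition=Ready "pod/$2" --timeout=300s &&
    "$CLI" exec "$2" -- bash -c 'command -v accelerate || echo "accelerate: not on PATH"'
}

# Usage (not executed here):
#   repro pod.yaml example
```

With the image described in this issue, the last step would be expected to print the "not on PATH" message, matching the interactive session.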

Expected behavior

The accelerate command should be available in the PATH.

Observed behavior

$ accelerate
bash: accelerate: command not found

Additional context

$ docker run --rm -it --entrypoint bash quay.io/foundation-model-stack/fms-hf-tuning:main-nvcr-latest

root@af4efe7bc238:~# accelerate
bash: accelerate: command not found

root@af4efe7bc238:~# pwd
/app

root@af4efe7bc238:~# ls -la
total 20
drwxrwxr-x.  1 root root   42 Mar  4 16:55 .
drwxr-xr-x.  1 root root   10 Mar  5 07:55 ..
-rw-r--r--.  1 root root 3247 Mar  4 15:35 accelerate_fsdp_defaults.yaml
-rwxr-xr-x.  1 root root 8294 Mar  4 15:35 accelerate_launch.py
drwxr-xr-x.  2 root root   22 Mar  4 16:55 build
drwxr-xr-x. 13 root root 4096 Mar  4 15:44 fms-hf-tuning

root@af4efe7bc238:~# echo $PATH
/usr/local/lib/python3.12/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/amazon/efa/bin:/opt/tensorrt/bin

root@af4efe7bc238:~# echo $VIRTUAL_ENV

root@af4efe7bc238:~# echo $CONDA_PREFIX

root@af4efe7bc238:~# python3
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
root@af4efe7bc238:~# python3 --version
Python 3.12.3
root@af4efe7bc238:~# python
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
root@af4efe7bc238:~# python --version
Python 3.12.3
root@af4efe7bc238:~# which python
/usr/bin/python
root@af4efe7bc238:~# which python3
/usr/bin/python3

root@af4efe7bc238:~# python3
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/usr/local/lib/python3.12/dist-packages', '/usr/local/lib/python3.12/dist-packages/nvfuser-0.2.25a0+6627725-py3.12-linux-x86_64.egg', '/usr/local/lib/python3.12/dist-packages/lightning_thunder-0.2.0.dev0-py3.12.egg', '/usr/local/lib/python3.12/dist-packages/opt_einsum-3.4.0-py3.12.egg', '/usr/local/lib/python3.12/dist-packages/dill-0.3.9-py3.12.egg', '/usr/local/lib/python3.12/dist-packages/lightning_utilities-0.12.0-py3.12.egg', '/usr/local/lib/python3.12/dist-packages/looseversion-1.3.0-py3.12.egg', '/usr/local/lib/python3.12/dist-packages/sympy-1.13.1-py3.12.egg', '/usr/lib/python3/dist-packages']
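
A possible follow-up check, which was not run in the session above: the output so far does not show whether the accelerate Python package is missing from the image entirely, or whether the package is installed but its console script landed in a directory that is not on the PATH. The sketch below separates the two cases. It assumes bash and python3 are available, as they are in the session above; `on_path` and `module_importable` are hypothetical helper names, not part of any tool.

```shell
# Report whether a command resolves on the current PATH.
# Returns 0 if found, 1 otherwise.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
        return 0
    fi
    echo "$1: not found on PATH"
    return 1
}

# Report whether a Python module is importable by a given interpreter.
# $1 = interpreter, $2 = module name
# Returns 0 if importable, 1 otherwise (also when the interpreter is missing).
module_importable() {
    command -v "$1" >/dev/null 2>&1 || { echo "$1: interpreter not found"; return 1; }
    if "$1" -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$2') else 1)" 2>/dev/null; then
        echo "$2: importable by $1"
        return 0
    fi
    echo "$2: not importable by $1"
    return 1
}

on_path accelerate || true
module_importable python3 accelerate || true

# Directory where this interpreter's install scheme places console scripts.
if command -v python3 >/dev/null 2>&1; then
    python3 -c "import sysconfig; print('scripts dir:', sysconfig.get_path('scripts'))"
fi
```

If the module is importable but the command is not found, comparing the printed scripts directory against the PATH shown above would narrow things down; if the module is not importable either, the package is likely not installed for /usr/bin/python3 at all.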
