[BUG: utils.validate_data has inconsistent use of tabs and spaces in indentation which leads to crashing] #95

@Hackerbone

Python Version

(venv) (base) admin@testbench:~/mistral-finetune$ python -VV
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]

Pip Freeze

absl-py==2.1.0
annotated-types==0.7.0
attrs==24.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
grpcio==1.66.0
idna==3.8
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
Markdown==3.7
MarkupSafe==2.1.5
mistral_common==1.3.4
mpmath==1.3.0
networkx==3.3
numpy==2.1.0
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
protobuf==5.27.3
pydantic==2.8.2
pydantic_core==2.20.1
PyYAML==6.0.2
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
safetensors==0.4.4
sentencepiece==0.2.0
simple_parsing==0.1.5
six==1.16.0
sympy==1.13.2
tensorboard==2.17.1
tensorboard-data-server==0.7.2
termcolor==2.4.0
tiktoken==0.7.0
torch==2.2.0
tqdm==4.66.5
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
Werkzeug==3.0.4
xformers==0.0.24

Reproduction Steps

  1. Clone the repository
  2. Change directory to mistral-finetune
  3. Run the validation script: python -m utils.validate_data --train_yaml example/7B.yaml

Output:

(venv) (base) admin@testbench:~/mistral-finetune$ python -m utils.validate_data --train_yaml example/7B.yaml 
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/conda/lib/python3.10/runpy.py", line 157, in _get_module_details
    code = loader.get_code(mod_name)
  File "<frozen importlib._bootstrap_external>", line 1017, in get_code
  File "<frozen importlib._bootstrap_external>", line 947, in source_to_code
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/admin/mistral-finetune/utils/validate_data.py", line 113
    else:
TabError: inconsistent use of tabs and spaces in indentation

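Python fails to compile utils/validate_data.py: the else: on line 113 is indented inconsistently with the surrounding block (a mix of tabs and spaces), so a TabError is raised before any of the script's code runs.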
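For context, the same class of error can be reproduced in isolation. The snippet below is a minimal sketch, not code from utils/validate_data.py: the three-line `source` string and the helper `compile_error_name` are made up for illustration. It compiles a block whose first indented line uses a tab and whose second uses eight spaces, which Python 3 rejects because the two lines only line up under a particular tab width.

```python
# Illustrative source, not taken from the repository:
# line 2 is indented with one tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"


def compile_error_name(text):
    """Return the name of the exception raised by compile(), or None."""
    try:
        compile(text, "<demo>", "exec")
    except SyntaxError as exc:  # TabError is a subclass of SyntaxError
        return type(exc).__name__
    return None


print(compile_error_name(source))
```

Because the error is raised while compiling the module, it appears as soon as the file is imported or run with -m, regardless of which branch would have executed.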

Expected Behavior

The script should run without crashing. The else branch on line 113 should use the same indentation as the rest of the if/elif chain.
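One way to check this expectation without running the full validation is to byte-compile the file with the standard-library py_compile module. This is only a sketch: the helper name `compiles_cleanly` is hypothetical, and note that py_compile writes a .pyc file into a __pycache__ directory next to the source as a side effect.

```python
import py_compile
import sys


def compiles_cleanly(path):
    """Return True if the file at `path` byte-compiles without a syntax error."""
    try:
        # doraise=True turns compile failures into PyCompileError
        # instead of printing them to stderr.
        py_compile.compile(path, doraise=True)
    except py_compile.PyCompileError as exc:
        print(exc.msg, file=sys.stderr)
        return False
    return True
```

Under these assumptions, `compiles_cleanly("utils/validate_data.py")` would return False on the affected revision and True once the indentation is fixed.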

Additional Context

No response

Suggested Solutions

The fix is to re-indent the else branch so that it lines up with the if/elif chain. I will open a PR for this and link it here.

From this:

        if params_config["dim"] == 4096 and params_config.get("moe") is None:
            model_id = "open-mistral-7b"
        elif params_config["dim"] == 4096 and params_config.get("moe") is not None:
            model_id = "open-mixtral-8x7b"
        elif params_config["dim"] == 6144:
            model_id = "open-mixtral-8x22b"
        elif params_config["dim"] == 12288:
            model_id = "mistral-large-latest"
        elif params_config["dim"] == 5120:
            model_id = "open-mistral-nemo"
    else:
            raise ValueError("Provided model folder seems incorrect.")
    else:
        model_id = train_args.model_id_or_path

To this:

        if params_config["dim"] == 4096 and params_config.get("moe") is None:
            model_id = "open-mistral-7b"
        elif params_config["dim"] == 4096 and params_config.get("moe") is not None:
            model_id = "open-mixtral-8x7b"
        elif params_config["dim"] == 6144:
            model_id = "open-mixtral-8x22b"
        elif params_config["dim"] == 12288:
            model_id = "mistral-large-latest"
        elif params_config["dim"] == 5120:
            model_id = "open-mistral-nemo"
        else:
            raise ValueError("Provided model folder seems incorrect.")
    else:
        model_id = train_args.model_id_or_path
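Since tabs and spaces can look identical in an editor or in a rendered issue, it may help to locate the offending lines programmatically. The helper below is a rough sketch with a hypothetical name (`lines_with_tab_indent`); it only flags lines whose leading whitespace contains a tab and does not try to judge whether the indentation is otherwise valid.

```python
def lines_with_tab_indent(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append(lineno)
    return hits


# Illustrative input, not the repository file: only line 3 starts with a tab.
sample = "if a:\n        x = 1\n\telse:\n        y = 2\n"
print(lines_with_tab_indent(sample))
```

The standard library also ships a checker for ambiguous indentation, which can be run as python -m tabnanny utils/validate_data.py.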
