Bug: TOKENIZERS_PARALLELISM_ENV constant has wrong/truncated value in oneshot.py #2595

@kuishou68

Bug Description

The constant TOKENIZERS_PARALLELISM_ENV in src/llmcompressor/entrypoints/oneshot.py has been assigned a corrupted/truncated value:

TOKENIZERS_PARALLELISM_ENV = "TOKENI...LISM"

It should be:

TOKENIZERS_PARALLELISM_ENV = "TOKENIZERS_PARALLELISM"

Impact

This bug means that:

  1. Oneshot.__init__ sets the environment variable TOKENI...LISM=false instead of the correct TOKENIZERS_PARALLELISM=false.
  2. The HuggingFace tokenizer warning about parallelism conflicts is never suppressed, so the original issue "[Help Wanted] Tokenzier warning messages" (#2007), which the PR "fix: suppress tokenizer parallelism warning in oneshot" (#2183) was meant to fix, is still present.
  3. The warning message in the code references the wrong variable name as well: it prints TOKENI...LISM=false as the setting that suppresses the warning, which misleads users.
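The effect described in the list above can be illustrated with a small standalone sketch. It does not import llmcompressor; `set_parallelism_flag` is a hypothetical stand-in for whatever Oneshot.__init__ does, and the use of `setdefault` is an assumption, not a claim about the actual implementation. A plain dict is used in place of os.environ so that running it has no side effects.

```python
# Hypothetical stand-ins for the two values discussed in this issue.
BROKEN_NAME = "TOKENI...LISM"  # value as reported above, kept verbatim
CORRECT_NAME = "TOKENIZERS_PARALLELISM"  # name the tokenizers library reads


def set_parallelism_flag(env_name: str, environ: dict) -> None:
    """Assumed behavior: set the flag to "false" unless it is already set."""
    environ.setdefault(env_name, "false")


# With the broken constant, the flag lands under the wrong key.
env_broken = {}
set_parallelism_flag(BROKEN_NAME, env_broken)

# With the corrected constant, the expected key is populated.
env_fixed = {}
set_parallelism_flag(CORRECT_NAME, env_fixed)

print(CORRECT_NAME in env_broken)  # False
print(env_fixed[CORRECT_NAME])     # false
```

In the broken case the variable that the tokenizers library actually checks is never written, which is why the warning would still appear.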

Steps to Reproduce

from llmcompressor.entrypoints.oneshot import TOKENIZERS_PARALLELISM_ENV
print(repr(TOKENIZERS_PARALLELISM_ENV))  # prints 'TOKENI...LISM' instead of 'TOKENIZERS_PARALLELISM'

The associated unit tests in tests/llmcompressor/transformers/oneshot/test_tokenizer_parallelism.py check that os.environ[_TOKENIZERS_PARALLELISM_ENV] is set. Because they look the variable up through the constant itself, and the constant's value is wrong, those tests pass while the real env var TOKENIZERS_PARALLELISM remains unset.
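One way to keep this kind of self-referential check from hiding the bug is to assert against the literal string instead of the constant. The following is a minimal sketch of that idea, not the project's actual test code; `check_env_constant` and `EXPECTED_ENV_NAME` are hypothetical names, and a plain dict stands in for os.environ.

```python
# The literal name is spelled out here on purpose, independent of the constant.
EXPECTED_ENV_NAME = "TOKENIZERS_PARALLELISM"


def check_env_constant(constant_value: str, environ: dict) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if constant_value != EXPECTED_ENV_NAME:
        problems.append(f"constant is {constant_value!r}, expected {EXPECTED_ENV_NAME!r}")
    if EXPECTED_ENV_NAME not in environ:
        problems.append(f"{EXPECTED_ENV_NAME} is not set")
    return problems


# Simulate the state described in this issue (value kept verbatim).
broken_constant = "TOKENI...LISM"
simulated_environ = {broken_constant: "false"}

# A check that goes through the constant passes even though the value is wrong.
self_referential_ok = broken_constant in simulated_environ
print(self_referential_ok)  # True

# A check against the literal name reports both problems.
problems = check_env_constant(broken_constant, simulated_environ)
print(len(problems))  # 2
```

A real regression test would presumably import the constant from llmcompressor.entrypoints.oneshot and compare it to the literal in the same way.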

Fix

Change line 35 of src/llmcompressor/entrypoints/oneshot.py from:

TOKENIZERS_PARALLELISM_ENV = "TOKENI...LISM"

to:

TOKENIZERS_PARALLELISM_ENV = "TOKENIZERS_PARALLELISM"

This was likely introduced by a find/replace or editor issue that truncated the string value.
