
Failure to build the GPT-J Docker image after successful installation of tensorrt-llm #2022

Open
@Bob123Yang

Description

Hi @arjunsuresh

When I ran the command below to build the Docker image for GPT-J:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
    --model=gptj-99 \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker --quiet \
    --test_query_count=50

I got the failure shown below. I'm not sure whether it is related to the existing Docker image (built for ResNet50 several days earlier) or not.

Successfully installed tensorrt-llm

[notice] A new release of pip is available: 23.3.1 -> 24.3.1
[notice] To update, run: python3 -m pip install --upgrade pip
Initializing model from /mnt/models/GPTJ-6B/checkpoint-final
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.48s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /mnt/models/GPTJ-6B/checkpoint-final
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading calibration dataset
Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 363, in <module>
    main(args)
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 255, in main
    calib_dataloader = get_calib_dataloader(
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 187, in get_calib_dataloader
    dataset = load_dataset("cnn_dailymail", name="3.0.0", split="train")
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 1849, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 1731, in dataset_module_factory
    raise e1 from None
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 1618, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'cnn_dailymail' on the Hub (LocalEntryNotFoundError)
make: *** [Makefile:102: devel_run] Error 1
make: Leaving directory '/home/bob1/CM/repos/local/cache/2479e8f0ba164d4c/repo/docker'

CM error: Portable CM script failed (name = get-ml-model-gptj, return code = 256)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
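
For reference, the ConnectionError (LocalEntryNotFoundError) appears to mean that the container could neither reach the Hugging Face Hub nor find a cached copy of the calibration dataset. A minimal sketch to check the failing call in isolation, assuming the datasets package is installed in the environment where it is run:

    # Reproduces the load_dataset call from quantize.py (line 187 in the traceback).
    from datasets import load_dataset

    dataset = load_dataset("cnn_dailymail", name="3.0.0", split="train")
    print(dataset)

If this works on the host but fails inside the container, one possible workaround is to pre-download the dataset on the host and mount the Hugging Face cache into the container (its location is controlled by the HF_DATASETS_CACHE or HF_HOME environment variables); I have not verified whether the CM docker flow already does this.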
