🐛 Describe the bug
```
$ python torchchat.py export stories110M --dtype float16 --output-dso-path stories.so
Using device=cuda
Setting max_seq_length to 300 for DSO export.
Loading model...
Time to load model: 0.44 seconds
-----------------------------------------------------------
Exporting model using AOT Inductor to /content/torchchat-1/stories.so
W1017 20:10:20.554000 7389 torch/_export/__init__.py:225] +============================+
W1017 20:10:20.554000 7389 torch/_export/__init__.py:226] | !!! WARNING !!! |
W1017 20:10:20.555000 7389 torch/_export/__init__.py:227] +============================+
W1017 20:10:20.555000 7389 torch/_export/__init__.py:228] torch._export.aot_compile() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export()) instead.
The generated DSO model can be found at: /content/torchchat-1/stories.so
2024-10-17 20:12:01.733978: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-17 20:12:01.753928: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-17 20:12:01.759899: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-17 20:12:01.774061: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-17 20:12:02.820101: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=cuda
Loading model...
Time to load model: 0.48 seconds
-----------------------------------------------------------
```
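Side note on the deprecation warning in the export log: it points at the newer AOTI packaging API. Below is a rough sketch of that flow on a toy module, just to show the shape of the suggested call; it is not the torchchat export path, and the module and inputs are placeholders I made up.

```python
# Rough sketch of the replacement flow the deprecation warning suggests,
# NOT torchchat's exporter. Toy module and inputs are placeholders.
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# Export the module, then AOT-compile and package it in one step;
# aoti_compile_and_package returns the path of the compiled artifact.
ep = torch.export.export(Toy(), (torch.randn(2, 8),))
pkg_path = torch._inductor.aoti_compile_and_package(ep)
print(pkg_path)
```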
```
$ python torchchat.py eval stories110M --dtype float16 --dso-path stories.so --limit 5
2024-10-17:20:12:09,610 INFO [huggingface.py:162] Using device 'cuda'
config.json: 100% 665/665 [00:00<00:00, 3.09MB/s]
model.safetensors: 100% 548M/548M [00:05<00:00, 101MB/s]
generation_config.json: 100% 124/124 [00:00<00:00, 733kB/s]
tokenizer_config.json: 100% 26.0/26.0 [00:00<00:00, 132kB/s]
vocab.json: 100% 1.04M/1.04M [00:00<00:00, 4.67MB/s]
merges.txt: 100% 456k/456k [00:00<00:00, 1.09MB/s]
tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 2.14MB/s]
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
2024-10-17:20:12:27,047 WARNING [task.py:763] [Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
2024-10-17:20:12:27,047 WARNING [task.py:775] [Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
2024-10-17:20:12:27,047 WARNING [task.py:763] [Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
2024-10-17:20:12:27,047 WARNING [task.py:775] [Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
2024-10-17:20:12:27,047 WARNING [task.py:763] [Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
2024-10-17:20:12:27,047 WARNING [task.py:775] [Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
wikitext_document_level.py: 100% 10.7k/10.7k [00:00<00:00, 39.4MB/s]
README.md: 100% 7.78k/7.78k [00:00<00:00, 32.7MB/s]
Repo card metadata block was not found. Setting CardData to empty.
2024-10-17:20:12:29,949 WARNING [repocard.py:107] Repo card metadata block was not found. Setting CardData to empty.
Downloading data: 100% 4.72M/4.72M [00:00<00:00, 7.37MB/s]
Generating test split: 62 examples [00:00, 656.90 examples/s]
Generating train split: 629 examples [00:00, 1999.28 examples/s]
Generating validation split: 60 examples [00:00, 2830.26 examples/s]
2024-10-17:20:12:32,165 INFO [task.py:395] Building contexts for wikitext on rank 0...
100% 5/5 [00:00<00:00, 420.70it/s]
2024-10-17:20:12:32,178 INFO [evaluator.py:362] Running loglikelihood_rolling requests
0% 0/5 [00:01<?, ?it/s]
Time to run eval: 23.69s.
Traceback (most recent call last):
  File "/content/torchchat-1/torchchat.py", line 92, in <module>
    eval_main(args)
  File "/content/torchchat-1/torchchat/usages/eval.py", line 271, in main
    result = eval(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/content/torchchat-1/torchchat/usages/eval.py", line 217, in eval
    eval_results = evaluate(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 373, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/models/huggingface.py", line 840, in loglikelihood_rolling
    string_nll = self._loglikelihood_tokens(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/models/huggingface.py", line 1074, in _loglikelihood_tokens
    logits = torch.gather(logits, 2, cont_toks.unsqueeze(-1)).squeeze(
RuntimeError: Size does not match at dimension 1 expected index [1, 1537, 1] to be smaller than self [1, 1, 32000] apart from dimension 2
```
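The shapes in the error look like the DSO path returned logits for a single position ([1, 1, 32000]) while lm_eval built per-token targets for the whole 1537-token window; that interpretation is my guess, but the gather constraint itself is easy to reproduce with the shapes copied from the message:

```python
# Minimal reproduction of the gather failure; shapes are taken from the
# error message above. torch.gather requires index.size(d) <= input.size(d)
# for every dimension d except `dim`.
import torch

logits = torch.randn(1, 1, 32000)               # logits for one position only
cont_toks = torch.randint(0, 32000, (1, 1537))  # targets for 1537 positions

# index is [1, 1537, 1] but logits has size 1 at dim 1, so this raises the
# same "Size does not match at dimension 1 ..." RuntimeError.
torch.gather(logits, 2, cont_toks.unsqueeze(-1)).squeeze(-1)
```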
Versions
```
Collecting environment information...
PyTorch version: 2.6.0.dev20241002+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.30.4
Libc version: glibc-2.35
Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.1.85+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 535.104.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 3
BogoMIPS: 4000.28
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32 KiB (1 instance)
L1i cache: 32 KiB (1 instance)
L2 cache: 1 MiB (1 instance)
L3 cache: 38.5 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0,1
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Vulnerable (Syscall hardening enabled)
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.13.0
[pip3] pytorch-triton==3.1.0+cf34004b8a
[pip3] torch==2.6.0.dev20241002+cu121
[pip3] torchao==0.5.0
[pip3] torchaudio==2.4.1+cu121
[pip3] torchsummary==1.5.1
[pip3] torchtune==0.4.0.dev20241010+cu121
[pip3] torchvision==0.20.0.dev20241002+cu121
[conda] Could not collect
```