Description
Your current environment information
libibverbs not available, ibv_fork_init skipped
- hwloc 2.4.1rc3-git received invalid information from the operating system.
- Failed with: intersection without inclusion
- while inserting Group0 (cpuset 0x000000ff,,0x000000ff) at Package (P#0 cpuset 0x3fffffff,0xffffffff)
- coming from: linux:sysfs:numa
- The following FAQ entry in the hwloc documentation may help:
- What should I do when hwloc reports "operating system" warnings?
- Otherwise please report this error message to the hwloc user's mailing list,
- along with the files generated by the hwloc-gather-topology script.
- hwloc will now ignore this invalid topology information and continue.
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OneFlow version: path: ['/mnt/nfs/prd-llm-15/data1/nidongwang/anaconda3/envs/cogvideox/lib/python3.11/site-packages/oneflow'], version: 0.9.1.dev20241120+cu118, git_commit: cbb0a3e, cmake_build_type: Release, rdma: True, mlir: True, enterprise: False
Nexfort version: 0.1.dev261
OneDiff version: 1.2.1.dev28+g424c81a8
OneDiffX version: none
OS: Ubuntu 20.04 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.18.0
Libc version: glibc-2.31
Python version: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-48-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.6.77
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090
GPU 2: NVIDIA GeForce RTX 4090
GPU 3: NVIDIA GeForce RTX 4090
GPU 4: NVIDIA GeForce RTX 4090
GPU 5: NVIDIA GeForce RTX 4090
GPU 6: NVIDIA GeForce RTX 4090
GPU 7: NVIDIA GeForce RTX 4090
Nvidia driver version: 535.171.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 124
On-line CPU(s) list: 0-123
Thread(s) per core: 2
Core(s) per socket: 31
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7542 32-Core Processor
Stepping: 0
CPU MHz: 2899.998
BogoMIPS: 5799.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 3.9 MiB
L1i cache: 3.9 MiB
L2 cache: 31 MiB
L3 cache: 240 MiB
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-123
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid amd_dcm tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd virt_ssbd arat umip rdpid arch_capabilities
Versions of relevant libraries:
[pip3] diffusers==0.31.0
[pip3] numpy==2.1.3
[pip3] torch==2.3.0
[pip3] torchao==0.4.0
[pip3] torchaudio==2.3.0
[pip3] torchvision==0.18.0
[pip3] transformers==4.46.3
[pip3] triton==2.3.0
[conda] numpy 2.1.3 pypi_0 pypi
[conda] torch 2.3.0 pypi_0 pypi
[conda] torchao 0.4.0 pypi_0 pypi
[conda] torchaudio 2.3.0 pypi_0 pypi
[conda] torchvision 0.18.0 pypi_0 pypi
[conda] triton 2.3.0 pypi_0 pypi
🐛 Describe the bug
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/xDiT/./examples/cogvideox_example_t2v_by_file.py", line 123, in <module>
[rank0]: main()
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/xDiT/./examples/cogvideox_example_t2v_by_file.py", line 40, in main
[rank0]: pipe = xFuserCogVideoXPipeline.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/xDiT/xfuser/model_executor/pipelines/pipeline_cogvideox.py", line 56, in from_pretrained
[rank0]: return cls(pipeline, engine_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/xDiT/xfuser/model_executor/pipelines/base_pipeline.py", line 81, in __init__
[rank0]: pipeline.transformer = self._convert_transformer_backbone(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/xDiT/xfuser/model_executor/pipelines/base_pipeline.py", line 313, in _convert_transformer_backbone
[rank0]: optimized_transformer_forward = od_compile(
[rank0]: ^^^^^^^^^^^
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/onediff/src/onediff/infer_compiler/backends/compiler.py", line 16, in compile
[rank0]: model = backend(torch_module, options=options)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/onediff/src/onediff/infer_compiler/backends/oneflow/oneflow.py", line 41, in compile
[rank0]: set_oneflow_env_vars(options)
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/onediff/src/onediff/infer_compiler/backends/env_var.py", line 95, in set_oneflow_env_vars
[rank0]: _set_env_vars(field2env_var, options)
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/tmp/code/onediff/src/onediff/infer_compiler/backends/env_var.py", line 51, in _set_env_vars
[rank0]: for field in dataclasses.fields(options):
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/nfs/prd-llm-15/data1/root/anaconda3/envs/cogvideox/lib/python3.11/dataclasses.py", line 1246, in fields
[rank0]: raise TypeError('must be called with a dataclass type or instance') from None
[rank0]: TypeError: must be called with a dataclass type or instance
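The last frames show `_set_env_vars` in OneDiff's `env_var.py` calling `dataclasses.fields(options)` on an `options` object that is not a dataclass. The traceback does not say what `options` actually was; a plain `dict` or `None` coming from xDiT's `od_compile(...)` call is a plausible guess, but that is an assumption. The sketch below reproduces the same `TypeError` with only the standard library and shows one possible guard that coerces `None` or a mapping into a dataclass first. `HypotheticalOneflowOptions`, its fields, `iterate_option_fields` and `normalize_options` are all hypothetical names for illustration, not OneDiff's real API.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Mapping


# Hypothetical stand-in for OneDiff's OneFlow options class; the real class,
# its name and its fields may differ.
@dataclass
class HypotheticalOneflowOptions:
    use_graph: bool = True
    debug_level: int = -1


def iterate_option_fields(options: Any) -> list:
    """Mirror the loop in _set_env_vars: dataclasses.fields() needs a dataclass."""
    return [f.name for f in fields(options)]


def normalize_options(options: Any, cls=HypotheticalOneflowOptions):
    """Hypothetical guard: coerce None or a mapping into a dataclass instance."""
    if options is None:
        return cls()
    if is_dataclass(options) and not isinstance(options, type):
        return options  # already a dataclass instance
    if isinstance(options, Mapping):
        known = {f.name for f in fields(cls)}
        # Silently drop keys the dataclass does not declare.
        return cls(**{k: v for k, v in options.items() if k in known})
    raise TypeError(f"unsupported options type: {type(options).__name__}")


if __name__ == "__main__":
    # A plain dict triggers the same error as in the traceback above.
    try:
        iterate_option_fields({"use_graph": True})
    except TypeError as exc:
        print("TypeError:", exc)

    # After normalization the same loop succeeds.
    print(iterate_option_fields(normalize_options({"use_graph": False})))
```

Whether the right fix belongs on the xDiT side (passing a proper options object to `od_compile`) or in OneDiff (accepting a dict) depends on the intended API contract, which this report does not establish.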