🐛 Describe the bug
The following code:
import torch
from torch.profiler import ProfilerActivity, profile, record_function, tensorboard_trace_handler

DEVICE = "cuda:1"


def main():
    t = torch.rand(10, 10).to(DEVICE)
    for _ in range(100):
        t = t @ t


trace_handler = tensorboard_trace_handler("pytorch_traces", use_gzip=True)
profiler = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,
    with_stack=True,
    on_trace_ready=trace_handler,
)

# profile the main function
profiler.start()
main()
profiler.stop()
fails with:
Traceback (most recent call last):
  File "/import/bc_workspaces/biocomp/tboyer/sources/GaussianProxy/error_repro.py", line 25, in <module>
    profiler.stop()
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 722, in stop
    self._transit_action(self.current_action, None)
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 751, in _transit_action
    action()
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 745, in _trace_ready
    self.on_trace_ready(self)
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 444, in handler_fn
    prof.export_chrome_trace(os.path.join(dir_name, file_name))
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 220, in export_chrome_trace
    fout.writelines(fin)
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 5237: invalid start byte
The offending byte and position vary from run to run (for example 0xf8 at position 5248), and the message is sometimes "invalid start byte" and sometimes "invalid continuation byte".
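The last frames of the traceback suggest that the exported trace is read back in text mode before being gzipped, so any non-UTF-8 byte in the file triggers the decode error. The sketch below reproduces that failure mode without PyTorch and shows that a byte-for-byte copy avoids it. The helper names (`gzip_copy_binary`, `demo`) and the sample payload are hypothetical and only for illustration; this is not the profiler's own code.

```python
import gzip
import os
import shutil
import tempfile


def gzip_copy_binary(src_path: str, dst_path: str) -> None:
    """Compress src_path into dst_path byte for byte, without decoding."""
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def demo() -> tuple[bool, bool]:
    # Hypothetical stand-in for a trace file that contains a non-UTF-8 byte.
    payload = b'{"name": "kernel\x9d"}\n'
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "trace.json")
        dst = os.path.join(tmp, "trace.json.gz")
        with open(src, "wb") as f:
            f.write(payload)

        # Text-mode read with UTF-8 decoding, similar to the failing frame.
        try:
            with open(src, encoding="utf-8") as fin:
                fin.read()
            text_mode_failed = False
        except UnicodeDecodeError:
            text_mode_failed = True

        # Binary copy never decodes, so the stray byte is preserved as-is.
        gzip_copy_binary(src, dst)
        with gzip.open(dst, "rb") as f:
            round_trip_ok = f.read() == payload
    return text_mode_failed, round_trip_ok
```

As a possible workaround (untested on this setup), a custom `on_trace_ready` callable could call `prof.export_chrome_trace` with a plain `.json` path and then compress the result with a helper like the one above, instead of passing `use_gzip=True`. Whether the resulting trace still loads in a viewer would depend on what the stray bytes are.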
Versions
Environment information
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-63-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L40S
GPU 1: NVIDIA L40S
GPU 2: NVIDIA L40S
GPU 3: NVIDIA L40S
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 64
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Gold 6426Y
Stepping: 8
CPU MHz: 2500.000
BogoMIPS: 5000.00
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 2 MiB
L1i cache: 2 MiB
L2 cache: 256 MiB
L3 cache: 1 GiB
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid cldemote movdiri movdir64b fsrm md_clear arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.4.0
[pip3] torchinfo==1.8.0
[pip3] torchvision==0.19.0
[pip3] triton==3.0.0
[conda] Could not collect
cc @robieta @chaekit @aaronenyeshi @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise