Hello,
I have been using exrmetrics as a poor man's conversion tool to try different codecs / parameters as supported by OpenEXR.
After converting a few hundred gigs of data, I realized that certain files were actually corrupted but I was unable to repro reliably.
To repro, I created a small Python script that calls exrmetrics twice per iteration:
- Once to convert it to htj2k32 or zip
- Once to read to make sure it's correct
The command line is:
exrmetrics --convert infile -o output_path -z htj2k32 -t 10
I attached the script if it saves you some time to repro
run.py
This file trips it fairly easily: https://openexr.com/en/latest/test_images/DisplayWindow/t05.html
I repeat this loop up to 5000 times (the corruption shows up anywhere between 1 and 3000 iterations). Running with -t 1 seems to complete a few thousand iterations successfully.
I have recompiled OpenEXR with thread sanitizer enabled and got the following error when converting to zip:
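For readers without the attachment, here is a minimal sketch of the convert-then-read loop described above. It is not the attached run.py. The helper names `build_convert_cmd` and `first_failure` are hypothetical, the convert command mirrors the command line quoted above, and the read-back command is left to the caller because its exact invocation is not given here. It also assumes that a nonzero exit code is how a failed conversion or read shows up.

```python
import subprocess


def build_convert_cmd(infile, outfile, codec="htj2k32", threads=10, tool="exrmetrics"):
    """Build the convert command quoted in the report (tool name is overridable)."""
    return [tool, "--convert", infile, "-o", outfile, "-z", codec, "-t", str(threads)]


def first_failure(convert_cmd, verify_cmd, iterations=5000, run=subprocess.run):
    """Run convert then verify up to `iterations` times.

    Returns (iteration, step, stderr) for the first command that exits
    nonzero, or None if every iteration succeeds. Treating a nonzero exit
    code as "corrupted" is an assumption of this sketch.
    """
    for i in range(1, iterations + 1):
        for step, cmd in (("convert", convert_cmd), ("verify", verify_cmd)):
            result = run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                return i, step, result.stderr
    return None
```

To use it, pass the result of `build_convert_cmd(infile, outfile)` together with whatever command reads the output file back, then print the returned tuple. Because the failure is intermittent, reporting the first failing iteration and its stderr makes it easier to compare runs with -t 10 against runs with -t 1.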
-------------------- STDERR --------------------
==================
WARNING: ThreadSanitizer: data race (pid=86644)
Write of size 8 at 0x00016dbde5b0 by main thread:
#0 std::__1::vector<char, std::__1::allocator<char>>::__base_destruct_at_end[abi:ne200100](char*) vector.h:750 (exrmetrics:arm64+0x1000557b8)
#1 std::__1::vector<char, std::__1::allocator<char>>::clear[abi:ne200100]() vector.h:531 (exrmetrics:arm64+0x100055488)
#2 std::__1::vector<char, std::__1::allocator<char>>::__destroy_vector::operator()[abi:ne200100]() vector.h:248 (exrmetrics:arm64+0x1000552b0)
#3 std::__1::vector<char, std::__1::allocator<char>>::~vector[abi:ne200100]() vector.h:259 (exrmetrics:arm64+0x100056ea8)
#4 std::__1::vector<char, std::__1::allocator<char>>::~vector[abi:ne200100]() vector.h:259 (exrmetrics:arm64+0x100056ddc)
#5 MemOStream::~MemOStream() exrmetrics.cpp:876 (exrmetrics:arm64+0x100056d4c)
#6 MemOStream::~MemOStream() exrmetrics.cpp:876 (exrmetrics:arm64+0x100050128)
#7 exrmetrics(char const*, char const*, int, Imf_3_4::Compression, float, int, bool, bool, PixelMode, bool) exrmetrics.cpp:1140 (exrmetrics:arm64+0x10004d378)
#8 exrmetrics(char const*, char const*, int, Imf_3_4::Compression, float, int, bool, bool, PixelMode, bool) exrmetrics.cpp:972 (exrmetrics:arm64+0x10004b564)
#9 <null> <null> (0x00018825ab98)
Previous read of size 8 at 0x00016dbde5b0 by thread T6 (mutexes: write M0):
#0 std::__1::vector<char, std::__1::allocator<char>>::size[abi:ne200100]() const vector.h:385 (exrmetrics:arm64+0x100055618)
#1 MemIStream::readMemoryMapped(int) exrmetrics.cpp:912 (exrmetrics:arm64+0x100056320)
#2 Imf_3_4::istream_nonparallel_read(_priv_exr_context_t const*, void*, void*, unsigned long long, unsigned long long, int (*)(_priv_exr_context_t const*, int, char const*, ...)) <null> (libOpenEXR-3_4.33.3.4.3.dylib:arm64+0x14478)
Location is stack of main thread.
Mutex M0 (0x000106d010e0) created at:
#0 pthread_mutex_lock <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x31494)
#1 std::__1::mutex::lock() <null> (libc++.1.dylib:arm64e+0x1f3d8)
#2 exrmetrics(char const*, char const*, int, Imf_3_4::Compression, float, int, bool, bool, PixelMode, bool) exrmetrics.cpp:972 (exrmetrics:arm64+0x10004b564)
#3 <null> <null> (0x00018825ab98)
Thread T6 (tid=147741413, running) created by main thread at:
#0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2f708)
#1 IlmThread_3_4::(anonymous namespace)::DefaultThreadPoolProvider::setNumThreads(int) <null> (libIlmThread-3_4.33.3.4.3.dylib:arm64+0x27ec)
#2 <null> <null> (0x00018825ab98)
SUMMARY: ThreadSanitizer: data race vector.h:750 in std::__1::vector<char, std::__1::allocator<char>>::__base_destruct_at_end[abi:ne200100](char*)
==================
ThreadSanitizer: reported 1 warnings
Strangely enough, I don't get the same error when converting to HTJ2K, but I get a more generic one:
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37355 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37361 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37362 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37548 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
error from exrmetrics: Unable to run decoder
=============
OS: Alma9.6 and OSX
Corruption occurs on both; the thread sanitizer output is from OSX
OpenEXR: 741ecb8 (latest RB-3.4) (have also had the corruption with 3.4.1)
OpenJPH: 0.25.3
Compiler: Clang 17.0.0 on OSX, GCC 11.5 on Alma
Breaks with both Debug and Release configs
================
Connected with #2157?
However, I got it to happen (unreliably) with all kinds of builds, including on Linux.
Also, the issue did not go away after compiling OpenJPH with SIMD off (OJPH_DISABLE_SIMD).