Data race condition when using exrmetrics #2207

Hello,

I have been using **exrmetrics** as a poor man's conversion tool to try different codecs / parameters as supported by OpenEXR.
After converting a few hundred gigabytes of data, I realized that certain files were actually corrupted, but I was unable to repro it reliably.

To repro, I created a small Python script that calls exrmetrics twice:
* Once to convert the file to htj2k32 or zip
* Once to read it back to make sure it's correct

The command line is:
`exrmetrics --convert infile -o output_path -z htj2k32 -t 10`

I attached the script in case it saves you some time to repro:
[run.py](https://github.com/user-attachments/files/23730603/run.py)
This file trips the problem fairly easily: https://openexr.com/en/latest/test_images/DisplayWindow/t05.html

I repeat this loop up to 5000 times (I manage to get it to corrupt after anywhere from 1 to 3000 iterations). Running with `-t 1` seems to complete a few thousand iterations successfully.
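For readers who do not want to open the attachment, the loop described above can be sketched roughly as follows. This is not the attached run.py, only a minimal approximation of it. The helper names `convert_cmd` and `first_failure` are hypothetical, and the read-back command is deliberately left to the caller as `verify_cmd`, because the exact invocation used for the verification step is not quoted in this report. Only the conversion flags are taken from the command line above.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def convert_cmd(infile: str, outfile: str,
                codec: str = "htj2k32", threads: int = 10) -> List[str]:
    """Build the conversion command line quoted in this report."""
    return ["exrmetrics", "--convert", infile, "-o", outfile,
            "-z", codec, "-t", str(threads)]


def first_failure(infile: str, outfile: str, verify_cmd: Sequence[str],
                  codec: str = "htj2k32", threads: int = 10,
                  iterations: int = 5000,
                  run: Callable = subprocess.run) -> Optional[int]:
    """Convert, then read back, up to `iterations` times.

    Returns the 1-based iteration at which either step exited with a
    non-zero status, or None if every iteration succeeded.
    `verify_cmd` is an assumption: pass whatever argv you use to read
    the converted file back.
    """
    for i in range(1, iterations + 1):
        for cmd in (convert_cmd(infile, outfile, codec, threads), list(verify_cmd)):
            result = run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                # Stop at the first failing step so the bad file is kept on disk.
                return i
    return None
```

Calling `first_failure` once with `threads=10` and once with `threads=1` would give a quick comparison of the multi-threaded and single-threaded behavior described above. The `run` parameter exists only so the loop can be exercised without the real binary.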


I recompiled OpenEXR with ThreadSanitizer enabled and got the following error when converting to **zip**:

```
-------------------- STDERR --------------------
==================
WARNING: ThreadSanitizer: data race (pid=86644)
  Write of size 8 at 0x00016dbde5b0 by main thread:
    #0 std::__1::vector<char, std::__1::allocator<char>>::__base_destruct_at_end[abi:ne200100](char*) vector.h:750 (exrmetrics:arm64+0x1000557b8)
    #1 std::__1::vector<char, std::__1::allocator<char>>::clear[abi:ne200100]() vector.h:531 (exrmetrics:arm64+0x100055488)
    #2 std::__1::vector<char, std::__1::allocator<char>>::__destroy_vector::operator()[abi:ne200100]() vector.h:248 (exrmetrics:arm64+0x1000552b0)
    #3 std::__1::vector<char, std::__1::allocator<char>>::~vector[abi:ne200100]() vector.h:259 (exrmetrics:arm64+0x100056ea8)
    #4 std::__1::vector<char, std::__1::allocator<char>>::~vector[abi:ne200100]() vector.h:259 (exrmetrics:arm64+0x100056ddc)
    #5 MemOStream::~MemOStream() exrmetrics.cpp:876 (exrmetrics:arm64+0x100056d4c)
    #6 MemOStream::~MemOStream() exrmetrics.cpp:876 (exrmetrics:arm64+0x100050128)
    #7 exrmetrics(char const*, char const*, int, Imf_3_4::Compression, float, int, bool, bool, PixelMode, bool) exrmetrics.cpp:1140 (exrmetrics:arm64+0x10004d378)
    #8 exrmetrics(char const*, char const*, int, Imf_3_4::Compression, float, int, bool, bool, PixelMode, bool) exrmetrics.cpp:972 (exrmetrics:arm64+0x10004b564)
    #9 <null> <null> (0x00018825ab98)

  Previous read of size 8 at 0x00016dbde5b0 by thread T6 (mutexes: write M0):
    #0 std::__1::vector<char, std::__1::allocator<char>>::size[abi:ne200100]() const vector.h:385 (exrmetrics:arm64+0x100055618)
    #1 MemIStream::readMemoryMapped(int) exrmetrics.cpp:912 (exrmetrics:arm64+0x100056320)
    #2 Imf_3_4::istream_nonparallel_read(_priv_exr_context_t const*, void*, void*, unsigned long long, unsigned long long, int (*)(_priv_exr_context_t const*, int, char const*, ...)) <null> (libOpenEXR-3_4.33.3.4.3.dylib:arm64+0x14478)

  Location is stack of main thread.

  Mutex M0 (0x000106d010e0) created at:
    #0 pthread_mutex_lock <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x31494)
    #1 std::__1::mutex::lock() <null> (libc++.1.dylib:arm64e+0x1f3d8)
    #2 exrmetrics(char const*, char const*, int, Imf_3_4::Compression, float, int, bool, bool, PixelMode, bool) exrmetrics.cpp:972 (exrmetrics:arm64+0x10004b564)
    #3 <null> <null> (0x00018825ab98)

  Thread T6 (tid=147741413, running) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2f708)
    #1 IlmThread_3_4::(anonymous namespace)::DefaultThreadPoolProvider::setNumThreads(int) <null> (libIlmThread-3_4.33.3.4.3.dylib:arm64+0x27ec)
    #2 <null> <null> (0x00018825ab98)

SUMMARY: ThreadSanitizer: data race vector.h:750 in std::__1::vector<char, std::__1::allocator<char>>::__base_destruct_at_end[abi:ne200100](char*)
==================
ThreadSanitizer: reported 1 warnings
```

Strangely enough, I don't get the same error when converting to HTJ2K, but I get a more generic one:

```
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
ojph error 0x000300A1 at ojph_codeblock.cpp:219: Error decoding a codeblock.
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37355 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37361 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37362 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Unable to decompress w 11 image data 37548 -> 76800, got 0
/tmp/htj2k_t05.exr: (EXR_ERR_CORRUPT_CHUNK) Decode pipeline unable to decompress data
error from exrmetrics: Unable to run decoder
```


=============
* OS: Alma 9.6 and OSX. I experience corruption on both; the ThreadSanitizer output is from OSX.
* OpenEXR: 741ecb82ccdb291ce5b04713fc6c03208753575e (latest RB-3.4). I have also had the corruption with 3.4.1.
* OpenJPH: 0.25.3
* Compiler: Clang 17.0.0 on OSX, GCC 11.5 on Alma
* Breaks with both Debug and Release configs

================

Possibly connected with https://github.com/AcademySoftwareFoundation/openexr/issues/2157 ?
However, I got it to happen (unreliably) with all kinds of builds, and on Linux as well.
Also, the issue did not go away after compiling OpenJPH with SIMD off (`OJPH_DISABLE_SIMD`).
