Replies: 3 comments
-
Hi!
Yeah, this is the normal behavior, and 7-Zip CLI/GUI behaves the same, as far as I know. This operation can be slow for large archives, as it first copies the data from the old archive and then appends the new data from the file being added, which takes time.
There are some sources of additional overhead in your use case. First, you're using an std::istringstream for reading the file to be added to the archive and, like all C++ standard streams, it is usually really slow. Another possible source of overhead is the 7z format itself, as it uses solid compression by default. This means that the uncompressed data is treated as a single block of data to be compressed, and this will slow down the update process (7-Zip needs the whole uncompressed block of data to be able to compress it). Finally, bit7z v4 internally uses standard C++ file streams to write the updated archive to the filesystem, and such streams are painfully slow and not optimised.
-
It looks like there's not much room for optimization then. Would the .zip
file format potentially be faster?
On Mon, Mar 24, 2025 at 6:03 PM Riccardo wrote:
Hi!
I'm working with large archive files that are 100-200GB in size. I'm using
BitArchiveEditor to update a single 10MB text file in the archive. While it
is updating, it seems to be writing out a temp file (Original.filename.tmp)
on disk that is the same size as the archive, and it takes about 10 minutes
to update that single file in the original archive.
Yeah, this is the normal behavior, and 7-Zip CLI/GUI behaves the same, as
far as I know.
The reason for this is that when you edit an archive, you need to keep it
open for reading, and hence cannot write to it at the same time. So bit7z
(and 7-Zip) simply write the updated archive to a tmp file and, when it
finishes, remove the original archive and rename the tmp file.
This operation can be slow for large archives, as it first copies the data
from the old archive and then appends the new data from the file being
added, which takes time.
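The write-to-temp-then-swap pattern described above can be sketched with
plain standard C++. This is only an illustration of the general idea, not
bit7z's or 7-Zip's actual code: the helper name appendViaTempFile and the
".tmp" suffix are made up for the example, and a real archive update
rewrites headers rather than just appending bytes.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of bit7z): copies the existing file to
// "<archive>.tmp", appends newData, then swaps the temp file into place.
void appendViaTempFile(const fs::path& archive, const std::string& newData) {
    fs::path tmp = archive;
    tmp += ".tmp";
    {
        std::ifstream in(archive, std::ios::binary);
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        std::vector<char> chunk(1 << 16);
        // This loop is why the cost grows with the size of the old file,
        // no matter how small the new data is.
        while (in) {
            in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
            out.write(chunk.data(), in.gcount());
        }
        out.write(newData.data(), static_cast<std::streamsize>(newData.size()));
    } // both streams are closed here, before the files are swapped
    fs::remove(archive);
    fs::rename(tmp, archive);
}
```

With a 100-200GB archive, almost all the time in a routine like this is
spent in the copy loop, which matches the temp file growing to the size of
the original archive.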
Extracting a file from the archive to memory is very fast, but updating
the archive with a file from memory is very slow. I'm wondering if I'm doing
this correctly, or if there is a faster way to do this operation?
There are some sources of additional overhead in your use case.
First, you're using an std::istringstream for reading the file to be
added to the archive and, like all C++ standard streams, it *usually* is
really slow.
Personally, I would read the file into a std::vector of bytes instead (by
default, bit7z uses std::vector<bit7z::byte_t> for buffers, where
bit7z::byte_t is an alias for std::byte).
I'm not sure how much performance improvement can be achieved with
buffers, though, as your biggest overhead is probably not updating the
10MB text file, but rather copying the rest of the archive to the
filesystem.
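As a rough sketch of the buffer suggestion above, a file can be read into a
byte vector in one call with standard C++ only. The name readFileToBuffer
and the Byte alias are hypothetical; Byte is an unsigned char stand-in, and
real code would use bit7z::byte_t so that the vector matches bit7z's buffer
type.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

using Byte = unsigned char; // stand-in for bit7z::byte_t (assumption)

// Hypothetical helper: loads the whole file into memory with a single read.
std::vector<Byte> readFileToBuffer(const std::filesystem::path& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path.string());
    }
    const auto size = std::filesystem::file_size(path);
    std::vector<Byte> buffer(static_cast<std::size_t>(size));
    file.read(reinterpret_cast<char*>(buffer.data()),
              static_cast<std::streamsize>(buffer.size()));
    // Keep only the bytes that were actually read.
    buffer.resize(static_cast<std::size_t>(file.gcount()));
    return buffer;
}
```

The resulting vector would then be handed to the editor's buffer-based
update call instead of a stream; check the BitArchiveEditor documentation
for the exact overload.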
Another possible source of overhead is probably the 7z format itself, as
it uses solid compression by default. This means that the uncompressed data
is treated as a single block of data to be compressed, and this will slow
down the update process (7-Zip needs the whole uncompressed block of data
to be able to compress it).
Finally, bit7z v4 internally uses standard C++ file streams to write the
updated archive to the filesystem, and such streams are painfully slow and
not optimised.
I actually managed to rewrite all bit7z file streams using lower-level
APIs, and got a significant performance uplift (now closer to 7-Zip itself).
All changes are still on the develop branch, though, and will be
available in the upcoming v4.1.
-
I'm not sure it would be faster. I was partly wrong about the importance of solid compression in the program overhead. In my tests, the performance was comparable to that of 7-Zip's GUI. Does your program perform as well as 7-Zip while performing an update?
Also, I'm not sure how you test your program, but please note that if you're running it on Windows with an antivirus program active, it can slow your program down significantly, especially while bit7z is overwriting the original archive.
In my tests, with the antivirus running, my program took ~30s to update the archive. After whitelisting the program in my antivirus settings, it now takes ~4s.
-
I'm working with large archive files that are 100-200GB in size. I'm using BitArchiveEditor to update a single 10MB text file in the archive. While it is updating, it seems to be writing out a temp file (Original.filename.tmp) on disk that is the same size as the archive, and it takes about 10 minutes to update that single file in the original archive.
Extracting a file from the archive to memory is very fast, but updating the archive with a file from memory is very slow. I'm wondering if I'm doing this correctly, or if there is a faster way to do this operation?
This is the code that I'm using: