User-defined compression level #661
Changes from all commits
8593122
6ddeef2
089f4bc
a994675
8f1d439
1a52eaf
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,6 +60,18 @@ namespace zim | |
Zstd = 5 | ||
}; | ||
|
||
enum class LZMACompressionLevel: int { | ||
MINIMUM = 0, | ||
MAXIMUM = 9, | ||
DEFAULT = MAXIMUM | ||
}; | ||
|
||
enum class ZSTDCompressionLevel: int { | ||
MINIMUM = -21, | ||
MAXIMUM = 19, | ||
DEFAULT = MAXIMUM | ||
}; | ||
Comment on lines +63 to +73
Yet again, I dislike those enums.
enum class DefaultCompressionLevel: int {
LZMA = 9 | LZMA_PRESET_EXTREME,
ZSTD = 19
};
But we don't want to expose LZMA "symbol" (
@mgautierfr what about something like this (in
int getDefaultCompressionLevel(Compression comp) {
switch(comp) {
case Compression::None:
{
return 0;
break;
}
case Compression::Lzma:
{
return 9 | LZMA_PRESET_EXTREME;
break;
}
case Compression::Zstd:
{
return 19;
break;
}
};
}
And use it in appropriate places and in the
Why put it in zim.h? We should use it internally. Either you pass a compression level (and you know which one) or you don't and libzim picks one internally.
In fact, I'm not sure we want the same thing. Can you be more explicit about how you want to use the default level in zim-tools? Maybe you have a branch somewhere you can share.
zimrecreate [ERROR] Not enough Arguments provided zimrecreate recreates a ZIM file from a existing ZIM. Options: The same for
Because
Do we want
# In the header
Creator& configCompression(Compression compression);
Creator& configCompression(Compression compression, int level);
# In the implementation
Creator& configCompression(Compression compression) {
return configCompression(compression, getDefaultCompressionLevel(compression));
}
As
As I mentioned above,
How would you display
Isn't LZMA deprecated?
Why not?
LZMA is obsolete yes. But we still support it, so we must have a default value for it.
Because the user has to refer to the specific algorithm's documentation to know which levels are valid. I don't want to change our code if zstd adds a new compression level 20. I would agree with a
I'm still not convinced we need to publicly define |
||
|
||
static const char MimeHtmlTemplate[] = "text/x-zim-htmltemplate"; | ||
|
||
enum class IntegrityCheck | ||
|
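The alternative discussed in the thread above (an internal default-level helper plus two `configCompression` overloads) could look roughly like the sketch below. It is only an illustration under assumptions: `Compression` and `Creator` here are simplified stand-ins rather than the real libzim types, the accessors are hypothetical, Zstd is assumed to be the compression used when nothing is configured, and the LZMA default omits the `LZMA_PRESET_EXTREME` flag from the proposal so that the example does not depend on liblzma headers.

```cpp
// Simplified stand-in for zim::Compression (enumerator values are not the real ones).
enum class Compression { None, Lzma, Zstd };

// Internal helper, kept out of the public header in this sketch.
// The proposal in the thread returns 9 | LZMA_PRESET_EXTREME for LZMA;
// the flag is left out here to avoid a liblzma dependency.
inline int getDefaultCompressionLevel(Compression comp) {
  switch (comp) {
    case Compression::None: return 0;
    case Compression::Lzma: return 9;
    case Compression::Zstd: return 19;
  }
  return 0;  // not reached for valid enumerators
}

// Simplified stand-in for the writer-side Creator.
class Creator {
 public:
  // No level given: the library picks one internally.
  Creator& configCompression(Compression compression) {
    return configCompression(compression,
                             getDefaultCompressionLevel(compression));
  }

  // Level given: stored as-is, with no validation or rewriting.
  Creator& configCompression(Compression compression, int level) {
    compression_ = compression;
    level_ = level;
    return *this;
  }

  // Hypothetical accessors, present only so the sketch can be checked.
  Compression compression() const { return compression_; }
  int level() const { return level_; }

 private:
  // Assumed defaults for a Creator on which configCompression is never called.
  Compression compression_ = Compression::Zstd;
  int level_ = getDefaultCompressionLevel(Compression::Zstd);
};
```

With this shape, a caller who never mentions a level gets the library's choice, and a caller who passes one gets exactly that value; no level enum needs to appear in the public header.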
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,22 +26,29 @@ | |
#include <stdexcept> | ||
|
||
const std::string LZMA_INFO::name = "lzma"; | ||
void LZMA_INFO::init_stream_decoder(stream_t* stream, char* raw_data) | ||
|
||
void LZMA_INFO::init_stream_encoder(stream_t* stream, int compression_level, char* raw_data) | ||
{ | ||
*stream = LZMA_STREAM_INIT; | ||
unsigned memsize = zim::envMemSize("ZIM_LZMA_MEMORY_SIZE", LZMA_MEMORY_SIZE * 1024 * 1024); | ||
auto errcode = lzma_stream_decoder(stream, memsize, 0); | ||
int cl = compression_level; | ||
|
||
if (cl == static_cast<int>(zim::LZMACompressionLevel::MAXIMUM)) { | ||
cl |= LZMA_PRESET_EXTREME; | ||
} | ||
Comment on lines +35 to +37
We should not change the compression level passed by the user. It should be a simple pass-through. |
||
|
||
auto errcode = lzma_easy_encoder(stream, cl, LZMA_CHECK_CRC32); | ||
if (errcode != LZMA_OK) { | ||
throw std::runtime_error("Impossible to allocated needed memory to uncompress lzma stream"); | ||
throw std::runtime_error("Cannot initialize lzma_easy_encoder"); | ||
} | ||
} | ||
|
||
void LZMA_INFO::init_stream_encoder(stream_t* stream, char* raw_data) | ||
void LZMA_INFO::init_stream_decoder(stream_t* stream, char* raw_data) | ||
{ | ||
*stream = LZMA_STREAM_INIT; | ||
auto errcode = lzma_easy_encoder(stream, 9 | LZMA_PRESET_EXTREME, LZMA_CHECK_CRC32); | ||
unsigned memsize = zim::envMemSize("ZIM_LZMA_MEMORY_SIZE", LZMA_MEMORY_SIZE * 1024 * 1024); | ||
auto errcode = lzma_stream_decoder(stream, memsize, 0); | ||
if (errcode != LZMA_OK) { | ||
throw std::runtime_error("Cannot initialize lzma_easy_encoder"); | ||
throw std::runtime_error("Impossible to allocated needed memory to uncompress lzma stream"); | ||
} | ||
} | ||
|
||
|
@@ -103,21 +110,21 @@ ZSTD_INFO::stream_t::~stream_t() | |
::ZSTD_freeDStream(decoder_stream); | ||
} | ||
|
||
void ZSTD_INFO::init_stream_decoder(stream_t* stream, char* raw_data) | ||
void ZSTD_INFO::init_stream_encoder(stream_t* stream, int compression_level, char* raw_data) | ||
{ | ||
stream->decoder_stream = ::ZSTD_createDStream(); | ||
auto ret = ::ZSTD_initDStream(stream->decoder_stream); | ||
stream->encoder_stream = ::ZSTD_createCStream(); | ||
auto ret = ::ZSTD_initCStream(stream->encoder_stream, compression_level); | ||
if (::ZSTD_isError(ret)) { | ||
throw std::runtime_error("Failed to initialize Zstd decompression"); | ||
throw std::runtime_error("Failed to initialize Zstd compression"); | ||
} | ||
} | ||
|
||
void ZSTD_INFO::init_stream_encoder(stream_t* stream, char* raw_data) | ||
void ZSTD_INFO::init_stream_decoder(stream_t* stream, char* raw_data) | ||
{ | ||
stream->encoder_stream = ::ZSTD_createCStream(); | ||
auto ret = ::ZSTD_initCStream(stream->encoder_stream, 19); | ||
stream->decoder_stream = ::ZSTD_createDStream(); | ||
auto ret = ::ZSTD_initDStream(stream->decoder_stream); | ||
if (::ZSTD_isError(ret)) { | ||
throw std::runtime_error("Failed to initialize Zstd compression"); | ||
throw std::runtime_error("Failed to initialize Zstd decompression"); | ||
} | ||
} | ||
|
||
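The review comment on the LZMA encoder asks for the level to be passed through unchanged. One way to read that, sketched below under assumptions, is to move the "extreme" flag out of the encoder setup and into whatever computes the default, so an explicit `9` from the user stays `9`. `FakeEncoder`, `initStreamEncoder`, `defaultLzmaPreset` and `kPresetExtreme` are hypothetical names; the constant mirrors liblzma's `LZMA_PRESET_EXTREME` (bit 31 of the preset), and `std::uint32_t` is used because liblzma presets are 32-bit unsigned values, whereas the PR itself passes an `int`.

```cpp
#include <cstdint>

// Mirrors liblzma's LZMA_PRESET_EXTREME (bit 31); redefined here so the
// sketch does not need <lzma.h>.
constexpr std::uint32_t kPresetExtreme = std::uint32_t{1} << 31;

// Stand-in for the real stream: it only records the preset it was given,
// where the real code would call lzma_easy_encoder().
struct FakeEncoder {
  std::uint32_t preset = 0;
};

// Pass-through: the level is forwarded exactly as received.
inline void initStreamEncoder(FakeEncoder& enc, std::uint32_t level) {
  enc.preset = level;
}

// The default is the only place where the extreme flag is added.
constexpr std::uint32_t defaultLzmaPreset() {
  return 9u | kPresetExtreme;
}
```

A caller who wants the previous hard-coded behaviour would pass `defaultLzmaPreset()`, while an explicit level reaches the encoder untouched.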
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ | |
#include "../debug.h" | ||
#include "../compression.h" | ||
|
||
#include <zim/zim.h> | ||
Why do we need this include? |
||
#include <zim/writer/contentProvider.h> | ||
|
||
#include <sstream> | ||
|
@@ -45,8 +46,9 @@ const zim::size_type MAX_WRITE_SIZE(4UL*1024*1024*1024-1); | |
namespace zim { | ||
namespace writer { | ||
|
||
Cluster::Cluster(Compression compression) | ||
Cluster::Cluster(Compression compression, int compression_level) | ||
: compression(compression), | ||
compressionLevel(compression_level), | ||
isExtended(false), | ||
_size(0) | ||
{ | ||
|
@@ -152,7 +154,7 @@ void Cluster::_compress() | |
bool first = true; | ||
auto writer = [&](const Blob& data) -> void { | ||
if (first) { | ||
runner.init((char*)data.data()); | ||
runner.init(compressionLevel, (char*)data.data()); | ||
first = false; | ||
} | ||
runner.feed(data.data(), data.size()); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -44,10 +44,11 @@ class Cluster { | |
|
||
|
||
public: | ||
Cluster(Compression compression); | ||
Cluster(Compression compression, int compression_level); | ||
virtual ~Cluster(); | ||
|
||
void setCompression(Compression c) { compression = c; } | ||
void setCompressionLevel(int cl) { compressionLevel = cl; } | ||
We never use |
||
Compression getCompression() const { return compression; } | ||
|
||
void addContent(std::unique_ptr<ContentProvider> provider); | ||
|
@@ -78,6 +79,7 @@ class Cluster { | |
|
||
protected: | ||
Compression compression; | ||
int compressionLevel; | ||
cluster_index_t index; | ||
bool isExtended; | ||
Offsets blobOffsets; | ||
|
We should use ZSTDCompressionLevel::DEFAULT in case a user never calls configCompression.