- 
                Notifications
    You must be signed in to change notification settings 
- Fork 2.3k
          Put windowFrac in the CCtxParams, not the CParams
          #4305
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Even though the `ZSTD_compressionParameters` struct is part of the unstable API, and therefore we are allowed to modify it, it seems likely that to actually do so would cause widespread chaos and bloodshed. So we are strongly incentivized to avoid changing it. We would, however, like to modify the CParams that are actually passed around internally. In particular, we want to be able to configure non-power-of-2 window sizes. So we need something more expressive than an integral window log. And we want this new field to be next to the window log, rather than hanging out in the CCtxParams, which are not passed to the block compressors. So, in order to support that, this commit: 1. Introduces a new struct, the `ZSTD_CParams`, that (for the moment) mirrors the definition of the public `ZSTD_compressionParameters` struct. 2. Codemods all internal use and storage of `cparams` internally to use the new struct definition. (The exception to this is the `ZSTD_parameters` struct, which is a public definition but is also passed around internally sometimes.) 3. Adds translation functions to convert the public and private struct defs to each other. 4. Uses those translation functions at the user API boundary (and when handling `ZSTD_parameters`).
This commit extends the refactor done in the previous and applies it to the `ZSTD_parameters` public struct, introducing an internal variant called `ZSTD_Params` whose only difference is containing a `ZSTD_CParams` instead of a `ZSTD_compressionParameters`. This commit similarly introduces conversion functions and rewrites internal functions to use the new, private definition. This allows us to clean up some internal conversions of the private `CParams` back into the public `CParams` in order to put them in a `ZSTD_parameters` struct to pass into something.
Store it in the CParams. Don't use it yet.
Using a blocksize of 100KB with a 65KB dictionary just *happens* to work with the current window-sizing logic, because it has to round all the way up to the next power of two. But it doesn't *have* to work. So this pre-emptively fixes up this test to use a size that will work after switching to tighter window sizing, but still uses a block size that is bigger than the dict size in case that matters for some reason?
We want to turn this on by default immediately after this release. And we want to thoroughly exercise the underlying code paths handling fractional windows, because they will be available in this upcoming release via new APIs (the explicit CCtx param and the constrain window for protocol path). And it's scary stuff. We can remove all these flags once we turn this on by default.
I found some slight errors via the exhaustive testing added to the fuzzer. This commit fixes them.
For inputs smaller than `1 << ZSTD_WINDOWLOG_MIN` bytes, the pre-existing code in `ZSTD_adjustCParams_internal()` temporarily calculated an illegally small window log for the purposes of adjusting the hash- and chain-table sizes, before fixing that bad window log to comply with the minimums. The new code clamped the window log earlier and therefore didn't shrink the tables as much. This commit fixes that to mirror the pre-existing behavior. This resolves the remaining regression test differences.
This also means the CDict needs to store and init a CCtxParams object.
…rates on CCtxParams Sometimes this is used to validate user-provided CParams, before they're allowed to affect the CCtxParams. Leave those cases as-is. Otherwise, convert uses of this function to pass in the whole CCtxParams so that, in the next commit, when we pull out the `windowFrac` from the CParams, we can still cross-validate it with the `windowLog` CParam.
Unwind the public/private distinction.
…olution Components
| /** | ||
| * Fractional window sizes can always be picked by the user explicitly | ||
| * setting ZSTD_c_windowFrac. This macro controls whether, when Zstd is | ||
| * picking a window size itself, it is allowed to pick a non-power-of-two | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| MEM_STATIC U32 ZSTD_windowSize(const ZSTD_CCtx_params* params) { | ||
| return (U32)(((8ull + params->windowFrac) << params->cParams.windowLog) >> 3); | ||
| } | ||
| MEM_STATIC U32 ZSTD_windowSizeLDM(const ldmParams_t* ldmParams) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: it seems this function is only useful within zstd_ldm.c.
In which case, is there any benefit in sharing it more widely within zstd_compress_internal.h ?
| * Calculates the minimum legal window log and fraction that contain the | ||
| * provided source size. | ||
| */ | ||
| MEM_STATIC void ZSTD_setMinimalWindowLogAndFrac(ZSTD_CCtx_params* params, const U32 srcSize, const U32 minWindowLog) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here,
ZSTD_setMinimalWindowLogAndFrac() seems only useful within zstd_compress.c.
What benefit is there to share it more widely within zstd_compress_internal.h ?
Note that ZSTD_windowLogAndFracAreMinimal() above seems to be a dependency of this function.
| void const* src) | ||
| { | ||
| /* overflow correction only handles power-of-two index moves. */ | ||
| U32 const roundedMaxDist = 1u << (ZSTD_highbit32(maxDist - 1) + 1); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is this change fixing some kind of potential bug ?
|  | ||
| /*! ZSTD_checkCCtxCParams_internal() : | ||
| * Checks the CParams in the CCtxParams (including related parameters not | ||
| * *actually* stored in the CParams struct). */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe name them ?
It seems to be limited to params->windowFrac at this point.
| const ZSTD_CCtx_params* cctxParams) | ||
| { | ||
| params->windowLog = cParams->windowLog; | ||
| params->windowLog = cctxParams->cParams.windowLog; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor:
you could use the same trick you employed in other parsers,
by defining const ZSTD_compressionParameters* const cParams = &cctxParams->cParams; at the top of the function, and then most of the rest of the function would remain identical.
| Note: If `windowLog` is set to larger than 27, `--long=windowLog` or | ||
| `--memory=windowSize` needs to be passed to the decompressor. | ||
|  | ||
| - `windowFrac`=_wfrac_, `wfrac`=_wfrac_: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| { | ||
| ZSTD_compressionParameters cParams = FUZZ_randomCParams(srcSize, producer); | ||
| set(cctx, ZSTD_c_windowLog, cParams.windowLog); | ||
| setRand(cctx, ZSTD_c_windowFrac, 0, 7, producer); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Great way to naturally bring windowFrac within the fuzzer framework.
| ZSTD_CCtx_params params; /* Storage for matchState.cctxParams pointer. | ||
| * Not all member fields are used. Used fields: | ||
| * cParams, compressionLevel, useRowMatchFinder. */ | ||
| /* A compression level of 0 indicates that advanced API was used to select CDict params */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| cctxParams.ldmParams.enableLdm = ZSTD_resolveEnableLdm(cctxParams.ldmParams.enableLdm, &cParams); | ||
| if (cctxParams.ldmParams.enableLdm == ZSTD_ps_enable) { | ||
| ZSTD_ldm_adjustParameters(&cctxParams.ldmParams, &cParams); | ||
| ZSTD_ldm_adjustParameters(&cctxParams.ldmParams, &cctxParams); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that cctxParams.ldmParams is within cctxParams,
one has to wonder why it is initialised separately.
It looks to me that this is just a part of adjusting cctxParams, and should be bundled there,
thus reducing risk of forgetting this action in other places of the code.
| * @param compressionLevel If params are derived from a compression level then that compression level, otherwise ZSTD_NO_CLEVEL. | ||
| */ | ||
| static void | ||
| ZSTD_CCtxParams_init_internal(ZSTD_CCtx_params* cctxParams, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: API consistency:
above function is named ZSTD_CCtxParams_init_fromCCtxParams(),
so maybe the function below would benefit a rename along the same lines.
| static ZSTD_compressionParameters | ||
| ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar, | ||
| static void | ||
| ZSTD_adjustCParams_internal(ZSTD_CCtx_params* params, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could benefit from a rename: ZSTD_adjustCCtxParams_internal().
In turn, the new name would help in extending the scope of this function beyond the original ZSTD_compressionParameters fields.
| mode); | ||
|  | ||
| /* fills unset fields in `dst` with values from `defaults` */ | ||
| static void ZSTD_fillCParams( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming feels weird.
What is being filled is a ZSTD_CCtx_params* object, not CParams,
albeit admittedly on a reduced internal scope which is somewhat associated with CParams,
but is that actually a good idea to keep the same scope as it was before ?
It looks to me that this operation is about setting parameters that are not yet set,
using a list provided in defaults.
Then a few more parameters must still be set via functions.
But is it meaningful to separate these 2 operations, or should they all be combined in the same one ?
| } | ||
|  | ||
| size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams) | ||
| static size_t ZSTD_estimateCCtxSize_usingCCtxParamsCParams(const ZSTD_CCtx_params* params) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, ZSTD_estimateCCtxSize_usingCCtxParams() feels enough.
Plus, it opens the opportunity to decide CCtx size based on other parameters present in CCtx_params.
For example, a pretty important one would be hasExternalSequenceProducer.
| While reviewing the  However, this substitution introduces some scope-related issues. Specifically,  Unfortunately, addressing these issues requires more than a simple mechanical update from one type to another. | 
This PR is a set of commits stacked on top of #4275.
It refactors much of that PR, by moving where we store the
windowFracparameter from the CParams to instead storing it in the CCtxParams. This resolve the unpleasant part of that PR, which required splitting the CParams (and Params) structs into public and private variants.This version has its own downsides, which requires that we convert many of the places that we handle and pass CParams around to instead pass around CCtxParams. That includes switching the MatchState to point to a CCtxParams rather than hold a CParams. This also requires that the CDict hold a CCtxParams that its MatchState can point to.
A particularly annoying result of this refactor is that where previously there was often a straightforward path of resolution from compression level -> CParams -> CCtxParams, the CCtxParams now have to be used in two different ways. We have to use the CCtxParams where previously we would transport CParams. But in those places, the CCtxParams are incomplete and are different from the real CCtxParams (because only the CParams part is set), and have to be translated/resolved into the real params before they are actually used. But because they're the same type, this distinction is not easy to discover, and it's not clear which kind of CCtxParams a particular instance is at any given point.
One nice corollary though is that this change does allow us in some internal APIs to remove other parameters that are now redundant now that the function has access to the whole CCtxParams.
This PR is a pure refactor and intends to introduce no functional changes.
I continue to believe we should abandon this PR and merge #4275 as-is.