CKKS: Add CCH+23 Noise Model #1685
Draft
The noise model is mainly taken from "On the precision loss in approximate homomorphic encryption". Marking as draft as it needs discussion (or research).
I think the trickiest part is multiplication. Unlike the variance-based noise models for BGV/BFV, where the message itself usually has little effect, in CKKS the message heavily affects how the noise grows.
In short, in CKKS the message is of the form $\Delta m + e$. After multiplication with $\Delta m2 + e2$, the message becomes $\Delta (\Delta m m2 + m2 e + m e2 + ...)$. The $m2 e + m e2$ part is the major part of the noise, and we actually have no average-case way to analyse it.
Past work argues that we can assume the message coefficients are uniform in $[-1, 1]$; however, this is hardly true in real-world applications, so we cannot use an average-case approach here.
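Spelling out the product with the same symbols (no subscripts, to match the notation used here) may make the cross terms easier to see. This is only the plain algebraic expansion, not something taken from the paper:

$$(\Delta m + e)(\Delta m2 + e2) = \Delta \Delta m m2 + \Delta (m e2 + m2 e) + e e2$$

Factoring out one $\Delta$ gives $\Delta (\Delta m m2 + m e2 + m2 e + e e2 / \Delta)$, so under this expansion the elided term is $e e2 / \Delta$, and the cross terms $m e2 + m2 e$ are the only noise terms that scale with the message.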
Other works ask the user to provide the input, which is common in practice (OpenFHE has `EXEC_NOISE_ESTIMATION` and Lattigo has a paper with an estimator that asks for user input). HEIR also provides similar infrastructure through the plaintext backend, but even knowing exactly what $m$ is, it is still hard to understand the behavior of $m e2$ (past papers give little detail on this). In the code, I just use $N |\Delta m|\infty |e2|\infty$, the worst-case bound on the coefficient embedding. The bound is then translated into a variance by $N * N * \Delta * \Delta * B * B * variance$. This approach is still questionable. See the code comments for details.
(Sorry for not using subscripts/superscripts; GitHub has a rendering issue.)
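To illustrate the worst-case bound on the coefficient embedding, here is a minimal Python sketch (not HEIR code; `negacyclic_mul`, `inf_norm` and `worst_case_bound` are hypothetical helper names). It multiplies two polynomials modulo $X^N + 1$ and compares the infinity norm of the product with $N |a|\infty |b|\infty$. The hand-picked pair at the end is one example where the bound is met exactly.

```python
def negacyclic_mul(a, b):
    """Schoolbook product of two coefficient lists modulo X^N + 1."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj  # wrap-around uses X^N = -1
    return out


def inf_norm(p):
    """Largest absolute coefficient."""
    return max(abs(c) for c in p)


def worst_case_bound(a, b):
    """N * |a|_inf * |b|_inf, the bound used for the m * e2 term."""
    return len(a) * inf_norm(a) * inf_norm(b)


# Hand-crafted pair: every term contributing to coefficient 0 has the same sign.
a = [1] * 8
b = [1] + [-1] * 7
prod = negacyclic_mul(a, b)
print(inf_norm(prod), worst_case_bound(a, b))
```

For random coefficients the product usually stays well below the bound, which is why treating it as a variance is pessimistic for typical inputs.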
Note that in https://github.com/bencrts/CKKS_noise/blob/main/heuristics/CLT.py they use $\Delta * \Delta * B * B * variance$ (note the missing $N$), which underestimates the noise in my runs. In the paper, the bound they use is the square of the 2-norm of $m$, so $N$ is still there, but using $N * \Delta * \Delta * B * B * variance$ still underestimates.
Missing parts in this PR: implement inverse canonical encoding in HEIR to actually know the plaintext $m$ from the cleartext value, and integrate the noise model with the plaintext backend.
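The three candidate formulas differ only in the power of $N$, so they are easy to compare in bits. The sketch below is a rough calculator, not the PR's implementation. The helper names are hypothetical, and the parameter values are my reading of the figures quoted in this description: `logN = 13`, a scale of $2^{45}$, fresh noise with standard deviation $2^7$, and $B = 1$ as the bound on the message coefficients.

```python
import math


def mul_noise_variance(n, delta, b, var_e, n_power):
    """Variance assigned to the m * e2 term.

    n_power = 2: N * N * Delta^2 * B^2 * variance (this PR)
    n_power = 1: N * Delta^2 * B^2 * variance (squared 2-norm bound)
    n_power = 0: Delta^2 * B^2 * variance (formula in CLT.py)
    """
    return (n ** n_power) * delta ** 2 * b ** 2 * var_e


def log2_std(variance):
    """Standard deviation in bits."""
    return 0.5 * math.log2(variance)


# Assumed parameters, see the lead-in.
n = 2 ** 13
delta = 2.0 ** 45
b = 1.0
var_e = 2.0 ** 14  # standard deviation 2^7

for n_power in (2, 1, 0):
    bits = log2_std(mul_noise_variance(n, delta, b, var_e, n_power))
    print(f"N^{n_power}: {bits} bits")
```

Under these assumptions each extra factor of $N$ in the variance adds `logN / 2 = 6.5` bits to the standard deviation.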
I would say exact noise estimation for CKKS without knowing the input is an open problem, and I would conjecture that there might be no good average-case estimate without knowing the input. Even knowing the input or the input domain, we might only be able to give a worst-case estimate, as the input distribution might be hand-crafted to launch an attack. After all, we cannot ask the user to provide the input distribution. (Citations to add: the IND-CPA definition, the IND-CPA-D definition, and the recent application-aware security model, where only the circuit and its input domain need to be provided.) (For input distributions, cite differential privacy.)
An experiment to give some sense of the above comment:
For the multiplication of two freshly encrypted ciphertexts, we consider the following cases.
All 0
This time $m=0$, so the noise is only $e e2$, and we can use the average-case approach by setting the new variance to $N \rho \rho 2$.
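The $N \rho \rho 2$ rule can be sanity-checked numerically. The sketch below is a toy simulation under assumptions of mine, not the experiment from this PR: it reads $\rho$ and $\rho 2$ as the variances of $e$ and $e2$, samples both as independent Gaussians, uses a small ring dimension so pure Python stays fast, and uses hypothetical helper names.

```python
import random


def negacyclic_mul(a, b):
    """Schoolbook product of two coefficient lists modulo X^N + 1."""
    n = len(a)
    out = [0.0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj  # wrap-around uses X^N = -1
    return out


def empirical_product_variance(n, sigma1, sigma2, trials, seed=0):
    """Mean square of the coefficients of e * e2 over several samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        e1 = [rng.gauss(0.0, sigma1) for _ in range(n)]
        e2 = [rng.gauss(0.0, sigma2) for _ in range(n)]
        total += sum(c * c for c in negacyclic_mul(e1, e2))
    # The coefficients have zero mean, so the mean square is the variance.
    return total / (trials * n)


n, sigma1, sigma2 = 128, 3.2, 3.2  # toy sizes, chosen only for speed
measured = empirical_product_variance(n, sigma1, sigma2, trials=20)
predicted = n * sigma1 ** 2 * sigma2 ** 2  # N * rho * rho2
print(measured / predicted)
```

The printed ratio should be close to 1. This only covers the $m = 0$ case; it says nothing about the message-dependent terms.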
Note that `7 + 7 + 13/2 = 20.5` where `logN = 13`; this is a rough estimate (ignoring all the error function stuff).

All 1
The interesting part of encoding all 1 is that the encoded message is actually $\Delta + 0 * X + 0 * X * X + ...$, namely only a constant (see #1604 (comment)). Then $m e2$ can be easily understood as a constant multiplication.
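A small inverse canonical embedding shows why the all-1 input encodes to a constant. This is a minimal sketch under stated assumptions, not HEIR's encoder: `encode` is a hypothetical helper that takes the values at all $N$ primitive $2N$-th roots of unity (the slots together with their conjugates), ignores slot ordering, and skips rounding to integers.

```python
import cmath


def encode(values, delta):
    """Inverse canonical embedding scaled by delta.

    values[t] is the desired evaluation at the t-th primitive 2N-th root of
    unity; conjugate-symmetric input yields real coefficients.
    """
    n = len(values)
    roots = [cmath.exp(1j * cmath.pi * t / n) for t in range(1, 2 * n, 2)]
    coeffs = []
    for k in range(n):
        # The evaluation matrix V satisfies V^H V = N * I, so m = V^H z / N.
        acc = sum(z * (r ** k).conjugate() for z, r in zip(values, roots))
        coeffs.append(delta * acc.real / n)
    return coeffs


coeffs = encode([1.0] * 16, delta=2.0 ** 10)
print([round(c, 6) for c in coeffs])
```

Only the constant coefficient is nonzero (equal to the scale, up to floating-point error), so multiplying by this plaintext scales every noise coefficient by $\Delta$ with no factor of $N$.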
Note that `7 + 45 = 52`.

The dot product input in test example
The inputs are now `[0.1, 0.2, ..., 0.8]` and `[0.2, 0.3, ..., 0.9]`.

The prediction by the model
The model does not know the exact values of the input for now.
Note that `7 + 45 + 13 = 65` where `logN = 13`.

I believe that with all slots filled with hand-crafted values (construction clues may be found in previous attacks), such a bound could be reached (even exceeded).