PIR-Eng-Notes/research/Group.2b.Interactive.Hint/rms24_2024/RMS24_2024_notes.md at main · 0xalizk/PIR-Eng-Notes

RMS24 — Engineering Notes

_{1. Lineage}
_{2. Core Idea}
_{3. Variants}
_{4. Novel Primitives / Abstractions}
_{5. Cryptographic Foundation}
_{6. Key Data Structures}
_{7. Protocol Phases}
_{8. Two-Server Protocol Details}
_{9. Correctness Analysis (Option B: Probabilistic)}
_{10. Complexity}

_{11. Performance Benchmarks}
_{12. Comparison with Prior Work}
_{13. Portable Optimizations}
_{14. Implementation Notes}
_{15. Application Scenarios}
_{16. Deployment Considerations}
_{17. Key Tradeoffs & Limitations}
_{18. Open Problems}
_{19. Related Papers in Collection}
_{20. Uncertainties}

Field	Value
Paper	Simple and Practical Amortized Sublinear Private Information Retrieval using Dummy Subsets (2024)
Archetype	Construction
PIR Category	Group 2b — Interactive-Hint
Security model	Semi-honest two-server non-colluding (primary) / Semi-honest single-server (secondary variant)
Additional assumptions	OWF (PRF instantiated with AES-128)
Correctness model	Probabilistic with standard (non-degrading) guarantees -- failure probability bounded by a union bound over queries, does not grow per query ¹
Rounds (online)	1 (client sends two subsets, server returns two parities)
Record-size regime	Small (8-byte and 32-byte entries in benchmarks; 256-byte entries tested for scaling)

Lineage ⤴

Field	Value
Builds on	Corrigan-Gibbs-Kogan [Group 2b] (CK20, first amortized sublinear PIR); TreePIR [Group 2b] (partition-based hints); Corrigan-Gibbs et al. backup hints concept ²
What changed	Prior schemes either required parallel repetition (λ factor blowup on all costs) to achieve negligible correctness failure, or weakened the correctness model to non-adaptive queries (Piano). RMS24 introduces dummy subsets -- a second subset constructed from unrepresented partitions -- which eliminates the information leakage from the queried index's absence, removing the need for parallel repetition while maintaining standard PIR correctness. ³
Superseded by	N/A
Concurrent work	Piano PIR [Group 2b] (Zhou et al., concurrent with updated version achieving O(1) response via singleton entries, but with weaker correctness model requiring non-adaptive queries) ⁴

Core Idea ⤴

RMS24 presents a stateful PIR scheme that achieves amortized sublinear communication and computation for both two-server and single-server settings, while maintaining standard PIR correctness (correctness for arbitrary, adaptively chosen query sequences). ⁵ The database of N entries is divided into sqrt(N) partitions. Each client hint selects sqrt(N)/2 + 1 random partitions and picks one random index from each, storing the XOR parity. The key innovation is the dummy subset: when querying, the client constructs a real subset (the hint's subset minus the queried index) and a dummy subset (one random index from each unrepresented partition), then sends both in random order to the server. ⁶ This eliminates the information leakage that prior schemes suffered -- where the server could learn which partition the queried index belongs to -- without requiring parallel repetition. The result is O(1) online response overhead (4x the insecure baseline for two servers; single-server response overhead is O(sqrt(N)/λ)) with O(sqrt(N)) client storage and computation. ⁷

Variants ⤴

Variant	Key Difference	Servers	Online Response	Hint Replenishment	Amortization Window
Two-server	Offline server generates hints; online server answers queries + helps replenish	2 (non-colluding)	O(1) -- constant, 4x element size	On-the-fly via offline server (Alg. 3)	Unlimited (constant amortized) ⁸
Single-server	Client streams entire DB offline; uses backup hints for replenishment	1	O(sqrt(N)/λ) amortized	Via pre-computed backup hint pairs (Alg. 5)	0.4 * λ * sqrt(N) queries per offline phase ⁹

Novel Primitives / Abstractions ⤴

1. Dummy Subsets

Field	Detail
Name	Dummy subset
Type	Privacy technique / Set distribution
Interface	Given a real query subset S covering sqrt(N)/2 + 1 partitions (with queried index removed), construct S' from one random index per uncovered partition. Send (S, S') or (S', S) with equal probability.
Security definition	Indistinguishability of the bit vector b and offset vector r from random (Lemma 2) ¹⁰
Purpose	Eliminate the information leakage from the queried index's absence in the query subset, allowing the scheme to achieve standard PIR privacy without parallel repetition
Built from	PRF evaluations for the offset vectors; random sampling for dummy indices

2. Partition-Based Hints with Median Cutoff Selection

Field	Detail
Name	Partition-based hint with PRF-derived median cutoff
Type	Data structure
Interface	For hint j: compute V_j = [v_{j,0}, ..., v_{j,sqrt(N)-1}] where v_{j,k} = PRF("select"
Purpose	Enable O(1)-time membership testing (check if partition k is selected by hint j) and compact hint storage (only store cutoff, extra index, and parity)
Built from	PRF with domain separation ("select" and "offset" prefixes)
Standalone complexity	O(1) per membership test; O(sqrt(N)) to reconstruct full subset

Cryptographic Foundation ⤴

Layer	Detail
Hardness assumption	OWF (one-way functions) -- minimal assumption; PRF exists if OWF exists ¹¹
PRF instantiation	AES-128 via CryptoPP's AES-NI implementation ¹²
Key structure	Per-client PRF key shared between client and offline server (two-server) or held by client alone (single-server). The PRF key derives all hint structures deterministically.
Correctness condition	Pr[fail] <= e^{-λ/2} per query, by union bound over queries (Lemma 1 + Section 3.5) ¹³

Key Data Structures ⤴

Database layout: N entries divided into sqrt(N) equal-size partitions, each of size sqrt(N). Entry i belongs to partition floor(i / sqrt(N)). ¹⁴
Main hint (j, v-hat_j, e_j, P_j): Hint ID j, median cutoff v-hat_j (32-bit), extra index e_j (32-bit), parity P_j (w-bit XOR of all selected entries). Each hint covers sqrt(N)/2 + 1 partitions with one index per partition. ¹⁵
Backup hint pair (two-server): Not needed -- replenishment is on-the-fly via offline server.
Backup hint pair (single-server): Pair of hints (P_j, P'_j) sharing the same cutoff v-hat_j, covering complementary halves of the partition space. Stored alongside main hints during streaming offline phase. ¹⁶
Client storage: λsqrt(N) main hints (two-server) or λsqrt(N) main + λ*sqrt(N)/2 backup pairs (single-server). Per hint: w + 64 bits (32-bit cutoff + 32-bit extra index + w-bit parity + 1 comparison bit). ¹⁷

Protocol Phases ⤴

Two-Server Scheme

Phase	Actor	Operation	Communication	When / Frequency
Offline (Alg. 1)	Offline server	Generate M = λ*sqrt(N) hints using shared PRF key: compute cutoffs, select partitions, compute parities	O(λ*sqrt(N)) words down to client ¹⁸	Once at setup
Online Query (Alg. 2)	Client	Find hint j containing index i; construct real subset S (remove i) and dummy subset S'; permute and send to online server	(sqrt(N)/2 + 1) * log(N) bits up (compact encoding via bit vector b and offset vector r) ¹⁹	Per query
Online Answer	Online + Offline servers	Each server computes XOR parity of its received subset	2 * w bits down (two parities P, P')	Per query
Decode	Client	Discard dummy parity; recover DB[i] = P XOR P_j (stored parity of hint j)	--	Per query
Replenish (Alg. 3)	Offline server + Client	Offline server constructs new hint with next ID J, sends (J, v-hat_J, P_J, P'_J); client picks correct half, adds i as extra index	O(1) words	Per query (piggybacked)

Single-Server Scheme

Phase	Actor	Operation	Communication	When / Frequency
Streaming Offline (Alg. 4)	Client	Stream DB partition by partition; build 1.5M hints (M main + 0.5M backup pairs) using PRF key	N words down (full DB stream) ²⁰	Every 0.4λsqrt(N) queries
Online Query (Alg. 2)	Client	Same as two-server: find hint, construct real + dummy subsets, permute and send	(sqrt(N)/2 + 1) * log(N) bits up	Per query
Online Answer	Server	Compute XOR parities of both received subsets	2 * w bits down	Per query
Decode	Client	Discard dummy parity; recover DB[i]	--	Per query
Replenish (Alg. 5)	Client	Find unused backup hint pair that skips the queried partition; promote to main hint with i as extra index	-- (local)	Per query

Two-Server Protocol Details ⤴

Aspect	Offline Server	Online Server
Data held	Full DB copy + shared PRF key	Full DB copy
Role in offline	Generates all hints; sends (j, v-hat_j, e_j, P_j) per hint to client	None
Query received	Replenishment request (after each online query)	Two subsets (S, S') or (S', S) in random order
Computation (online)	O(sqrt(N)) per replenishment -- construct one new hint	O(sqrt(N)) per query -- compute XOR parity of each subset
Security guarantee	Computational (OWF/PRF) -- learns nothing about query sequence	Computational (OWF/PRF) -- cannot distinguish real from dummy subset ²¹
Non-collusion assumption	Required -- if servers collude, the offline server's PRF key reveals all hint structures, and the online server's subsets reveal the query

Correctness Analysis (Option B: Probabilistic) ⤴

Field	Detail
Failure mode	No main hint contains the queried index i
Failure probability	Pr[fail] <= (1 - 1/(2sqrt(N)))^{λsqrt(N)} < e^{-λ/2} ²²
Probability grows over queries?	No -- per-query independent. Hint replenishment preserves the distributional invariant (H' is identically distributed to H). ²³
Probability grows over DB mutations?	N/A -- DB updates not addressed
Key parameters affecting correctness	Number of hints M = λsqrt(N); coverage probability q = (sqrt(N)/2 + 1)/sqrt(N) 1/sqrt(N) per hint per index
Proof technique	Lemma 1 gives per-hint coverage probability >= 1/(2sqrt(N)); then apply (1-p)^M bound with M = λsqrt(N). For subsequent queries, a distributional invariance argument shows replenished hints follow distribution R (identical to original). ²⁴
Amplification	Not needed -- failure is already negligible for λ = 80 (e^{-40})
Adaptive vs non-adaptive	Correctness holds for fully adaptive queries -- this is the key distinction from Piano PIR ²⁵
Query model restrictions	Two-server: unlimited queries. Single-server: bounded by backup hint supply (~0.4λsqrt(N) queries per offline phase).

Complexity ⤴

Core metrics

Metric	Asymptotic	Concrete (2^28 entries x 32 bytes = 8 GB)	Phase
Query size (request)	O(sqrt(N) * log N) bits	34 KB (two-server); 34 KB (single-server)	Online
Response size	O(w) -- constant (two-server); O(w) per query (single-server)	64 bytes (two parities, w=32)	Online
Server computation	O(sqrt(N))	2.7 ms (two-server); 2.7 ms online (single-server) ²⁶	Online
Client computation	O(sqrt(N))	< 1 ms (finding hint + subset construction)	Online
Response overhead	4x insecure baseline (two-server); O(sqrt(N)/λ) (single-server) ²⁷	4x (two-server); O(sqrt(N)/λ) (single-server)	--

Preprocessing metrics

Metric	Two-Server Asymptotic	Two-Server Concrete (8 GB)	Single-Server Asymptotic	Single-Server Concrete (8 GB)
Offline communication	O(λ*sqrt(N)) words	60.16 MB	O(N) words (full DB stream)	8192 MB ²⁸
Offline computation	O(λ*N)	842 s	O(λ*N)	1146 s
Client storage	O(λ*sqrt(N)) words	60.16 MB	O(λ*sqrt(N)) words	100 MB ²⁹
Amortized response overhead	O(1)	2x	O(sqrt(N)/λ)	varies
Amortization window	Unlimited (constant)	--	~0.4λsqrt(N) queries	~0.4 * 80 * 2^14 ~ 524K queries

Preprocessing Characterization

Aspect	Two-Server	Single-Server
Preprocessing model	Server-side computation (offline server generates hints)	Streaming (single-pass over DB) ³⁰
Client peak memory	O(λ*sqrt(N))	O(λ*sqrt(N))
Number of DB passes	1 (offline server)	1 (streaming)
Hint refresh mechanism	Pipelining -- offline server replenishes on-the-fly after each query	Full re-download after ~0.4λsqrt(N) queries

Performance Benchmarks ⤴

Hardware

AWS m5.8xlarge: 3.1 GHz Intel Xeon, 128 GB RAM, Ubuntu 22.04, GCC 11.3, Go 1.18. All experiments single-threaded. ³¹

Two-Server Benchmarks (Table 2)

Database	Client Storage (MB)	Offline Comm. (MB)	Offline Compute (s)	Online Comm. (KB)	Online Compute (ms)
2^20 x 32B (32 MB)	3.76	3.76	2.3	2.26	0.12
2^24 x 32B (512 MB)	15.04	15.04	41	8.64	0.54
2^28 x 8B (2 GB)	30.16	30.16	636	34.0	2.19
2^28 x 32B (8 GB)	60.16	60.16	842	34.1	2.7
2^28 x 256B (64 GB)	340.16	340.16	2242	35.0	5.23

Single-Server Benchmarks (Table 3)

Database	Client Storage (MB)	Offline Comm. (MB)	Offline Compute (s)	Online Comm. (KB)	Online Compute (ms)	Amortized Comm. (KB)	Amortized Compute (ms)
2^20 x 32B (32 MB)	6.25	32	4	2.18	0.14	2.99	0.25
2^24 x 32B (512 MB)	25	512	65	8.56	0.62	11.76	1.0
2^28 x 8B (2 GB)	40	2048	989	34.02	2.4	37.22	3.9
2^28 x 32B (8 GB)	100	8192	1146	34.06	2.7	46.86	4.5
2^28 x 256B (64 GB)	660	65536	2327	34.5	4.2	136.9	7.8

Comparison with Prior Work ⤴

Two-Server (at 2^28 x 32B = 8 GB)

Metric	RMS24	DPF-PIR	Checklist	TreePIR
Client storage (MB)	60.16	--	1119.74	46.14
Offline comm. (MB)	60.16	--	46.14	46.14
Offline compute (s)	842	--	1141	430
Online comm. (KB)	34.1	1.31	0.64	1049.6
Online compute (ms)	2.7	745	1.8	14
DB params	2^28 x 32B	2^28 x 32B	2^28 x 32B	2^28 x 32B

Single-Server (at 2^28 x 32B = 8 GB)

Metric	RMS24	Spiral	SimplePIR	Piano
Client storage (MB)	100	--	352.98	144.75
Amortized comm. (KB)	46.86	35.0	688	90.41
Amortized compute (ms)	4.5	30273	1123	9.6
Correctness model	Standard	Deterministic	Deterministic	Non-adaptive only
DB params	2^28 x 32B	2^28 x 32B	2^28 x 32B	2^28 x 32B

Key takeaway: RMS24 should be preferred when standard PIR correctness is required (adaptive query sequences) and sublinear server computation is needed. In the two-server setting, it achieves the best balance of low communication, low computation, and manageable client storage -- avoiding the 1 GB+ client storage of Checklist and the 1 MB+ online communication of TreePIR. ³² In the single-server setting, it is 9-14x better in communication and hundreds of times faster in computation than SimplePIR, while providing stronger correctness guarantees than Piano. ³³

Portable Optimizations ⤴

Dummy subset technique: Applicable to any partition-based hint PIR scheme to eliminate leakage from the queried index's partition membership. Removes the need for parallel repetition (λ-factor blowup on all costs). Could be applied to TreePIR's hint system. ³⁴
PRF-based median cutoff for partition selection: Using the median of PRF outputs to select exactly half the partitions allows O(1) membership testing per hint without storing the full partition set. Generalizable to any scheme needing pseudorandom subset selection with compact representation.
Compact two-subset encoding: Encoding two subsets as a bit vector b (partition assignment) plus offset vector r (index within partition) halves the request size compared to sending explicit index lists. ³⁵
Introselect-based fast median: Filtering 7/8 of PRF outputs via heuristic bounds before running introselect for the median. Reduces the median-finding bottleneck in the offline phase. ³⁶

Implementation Notes ⤴

Language: C++ (RMS24 scheme); baselines in C++ (DPF-PIR, Spiral) and Go (Checklist, TreePIR, SimplePIR, Piano) ³⁷
PRF: AES-128 via CryptoPP with AES-NI. A single 128-bit AES output is broken into four to eight 32-bit pseudorandom numbers, shared across different hint/partition combinations to save computation. ³⁸
Polynomial arithmetic: N/A (PRF-based, no polynomial operations)
SIMD / vectorization: AES-NI (via CryptoPP)
Parallelism: Single-threaded in all benchmarks
Lines of Code: ~600 lines (two-server); ~500 lines (single-server) ³⁹
Open source: https://github.com/renling/S3PIR/
Security parameter: λ = 80 ⁴⁰
V_j element size: 32-bit fixed-point numbers for PRF-derived partition selection values ⁴¹

Application Scenarios ⤴

Private DNS lookup: Mentioned as a motivating application where adversarial query influence is realistic (Kaminsky attack), making Piano's non-adaptive correctness model insufficient. ⁴²
Private password checking: Mentioned as a general PIR application. ⁴³

Deployment Considerations ⤴

Database updates: Not addressed. Hints become stale if the database changes; full re-preprocessing is required. ⁴⁴
Two-server trust model: Requires two non-colluding servers. The offline server holds the PRF key and must not collude with the online server.
Session model: Persistent client -- client maintains hint state across queries.
Cold start suitability: No -- requires offline preprocessing (two-server: server-side hint generation; single-server: full DB streaming).
Amortization crossover (single-server): Offline cost dominates for the first query. At 8 GB (2^28 x 32B), the offline phase costs 8192 MB communication and 1146 s computation. Amortized per-query cost (46.86 KB, 4.5 ms) is reached after sufficient queries.
Scalability concern: Client storage is O(λ*sqrt(N)), which reaches 660 MB at 64 GB database size (single-server). The paper acknowledges this as a limitation. ⁴⁵

Key Tradeoffs & Limitations ⤴

Client storage: O(λ*sqrt(N)) is sublinear but still large in practice -- 60 MB at 8 GB, 660 MB at 64 GB. Shared limitation with all amortized sublinear PIR schemes.
Request size: O(sqrt(N) * log N) bits is the dominant cost for small databases. At 2^28 entries, this is ~34 KB. The paper notes this as an open problem -- techniques exist to reduce it but sacrifice other aspects. ⁴⁶
Single-server offline cost: Streaming the entire database is required for the single-server variant, which is prohibitive for very large databases and delays the first query.
No database update support: Hints become invalid when the database changes, requiring full re-preprocessing.
Two-server non-collusion requirement: The two-server variant requires a strong trust model (non-colluding servers), which may not be available in all deployment scenarios.

Open Problems ⤴

Reducing request size: Can the O(sqrt(N)) request size be reduced without sacrificing other efficiency metrics? ⁴⁷
Reducing client storage: The O(λ*sqrt(N)) client storage is a limitation shared by all amortized sublinear PIR schemes.
Database updates: How to handle updates to the database without full re-preprocessing.
Keyword PIR: Supporting queries by keywords rather than indices in the stateful PIR setting. ⁴⁸

Related Papers in Collection ⤴

Paper	Group	Relationship
CK20	Group 2b	First amortized sublinear PIR; RMS24 builds on the CK20 framework but eliminates the λ parallel repetition factor and the O(N) client storage / O(N) per-query computation bottleneck
TreePIR	Group 2b	Introduced partition-based hints adopted by RMS24; RMS24 eliminates TreePIR's O(sqrt(N)) response overhead
Piano	Group 2b	Concurrent single-server scheme with similar asymptotic efficiency but weaker correctness model (non-adaptive queries only); RMS24 achieves 2x better communication and 1.7-3.7x better computation
Checklist	Group 2b	Two-server scheme with O(N) client storage or O(N) per-query computation; RMS24 achieves sublinear on both
SimplePIR	Group 2a	Single-server baseline with linear server computation; RMS24 is hundreds of times faster in computation at 8 GB
Spiral	Group 1a	Single-server single-query PIR baseline; RMS24 has higher communication but orders of magnitude lower computation

Uncertainties ⤴

Median corner case: When two or more elements of V_j equal the median, the hint is discarded. The paper states this happens with probability ~6 x 10^{-5} for N = 2^20 and decreases with database size. The pseudocode omits this handling for readability.
Exact amortization window (single-server): The paper states "close to, but fewer than, 0.5λsqrt(N)" queries. The "0.4λsqrt(N)" figure used in Table 1 caption is described as conservative ("say 0.4λsqrt(N)").
Go vs C++ comparison fairness: RMS24 is implemented in C++ while several baselines (Checklist, TreePIR, SimplePIR, Piano) are in Go. The paper does not normalize for language overhead, though the performance gaps are large enough (orders of magnitude) that this is unlikely to change conclusions.

Section 3.5 (p.7): Correctness failure happens when none of the M = λ * sqrt(N) main hints contains the queried index. By Lemma 1, each hint contains a given index with probability at least 1/(2sqrt(N)), so the failure probability is at most (1 - 1/(2sqrt(N)))^{λ*sqrt(N)} < e^{-λ/2}, which is astronomically small for large λ. This is a standard (non-degrading) guarantee: the same bound holds for every subsequent query because hint replenishment preserves the distribution. ↩
Section 3.1 (p.4): "We will describe our techniques on top of the partition-based hints because they offer advantages in compact hint storage and fast membership testing." The partition-based hint structure comes from TreePIR [22]. Backup hints for single-server replenishment originate from Corrigan-Gibbs et al. [9]. ↩
Section 3.1 (p.4): "Our main idea to address this leakage is for the client to additionally send a dummy subset of indices. The dummy subset contains one random index from each of the sqrt(N)/2 partitions that do not appear in the query subset." ↩
Section 5 (p.13): Piano PIR weakens the correctness guarantee and requires non-adversarial query sequences; publishing the permutation key enables an adversary to force queries into the same partition and cause correctness failure (paraphrase of discussion on p.13). ↩
Section 2 (p.3): The standard stateful PIR definition requires correctness for any adaptively chosen query sequence and privacy against a server that chooses both candidate query sequences. ↩
Section 3.1 (p.4): "The dummy subset contains one random index from each of the sqrt(N)/2 partitions that do not appear in the query subset. The client also randomly swaps the two subsets." ↩
Table 1 (p.2): Two-server: O(sqrt(N)) request, O(1) response, O(λsqrt(N)) client storage, O(sqrt(N)) client computation, O(sqrt(N)) server computation. Single-server: O(sqrt(N)) request, O(sqrt(N)/λ) response amortized, O(λsqrt(N)) client storage. ↩
Section 3.6 (p.8): "The amortized cost of our two-server scheme only depends on the online phase and the hint replenishment step." The offline phase runs once. Each query costs O(1) response and O(sqrt(N)) server computation. ↩
Section 3.4 (p.6): The client retrieves λsqrt(N) main hints and λsqrt(N) backup hints. Backup hints come in pairs; with pairs, the client can make up to λsqrt(N)/2 queries before needing to re-run the offline phase. The 0.4λsqrt(N) figure used in Table 1 is a conservative estimate (the paired strategy achieves close to but fewer than 0.5λ*sqrt(N)). ↩
Section 3.5 (p.7): Lemma 2 proves Pr(b | i) = Pr(b | i') for any two query indices i and i', where b is the bit vector encoding which partitions go to which subset. The proof shows each event has probability tau/2 where tau = C(sqrt(N)-1, sqrt(N)/2)^{-1}, independent of the queried index. ↩
Section 2 (p.3): "PRF is one of the most common cryptographic primitives and can be instantiated from any one-way function" (quote continues with mentions of AES and SHA as practical instantiations). ↩
Section 4.1 (p.8): "We use AES as the pseudorandom function. We use CryptoPP's implementation of AES, which leverages Intel's AES-NI instructions." ↩
Section 3.5 (p.7): Lemma 1 proves each hint contains any particular index with probability at least 1/(2sqrt(N)). With M = λsqrt(N) hints, the probability that none contains the queried index is at most (1 - 1/(2sqrt(N)))^{λsqrt(N)} < e^{-λ/2}. ↩
Section 3.1 (p.4): "A database of size N is divided into sqrt(N) partitions each of size sqrt(N)." ↩
Section 3.2 (p.5): "Hint storage. Each hint is stored as a tuple (j, v-hat_j, e_j, P_j) where j is a unique hint ID, v-hat_j is the cutoff median value, e_j is the extra index, and P_j is the parity." ↩
Section 3.4 (p.6): "A more clever strategy is to have backup hints in pairs, similar in spirit to the two-server hint replenishment algorithm." Each backup hint pair stores parities for both halves. ↩
Section 4.1 (p.8): "We use 32-bit numbers for elements in V_j to save client storage and computation." Hint IDs use 32-bit integers in single-server (reset periodically) and 64-bit in two-server. ↩
Section 3.6 (p.8): "The offline phase costs O(λsqrt(N)) communication and O(λN) computation at the offline server." ↩
Section 3.2 (p.5-6): The compact encoding using b and r costs (sqrt(N)/2 + 1) * log(N) bits, reducing the request size compared to sending explicit subsets at sqrt(N) * log(N) bits. ↩
Section 3.6 (p.8): "The streaming offline phase costs N communication and O(λN) computation, and needs to be run every 0.5λsqrt(N) online queries." The single-server response overhead is O(sqrt(N)/λ) amortized because the O(N) offline communication is amortized over 0.5λ*sqrt(N) queries. ↩
Section 3.5 (p.7): Privacy is proven by showing that the bit vector b (encoding the partition assignment) has probability Pr(b | i) = tau/2 for any query index i (Lemma 2), so observing b gives the server no advantage in distinguishing between any two candidate queries. ↩
Section 3.5 (p.7): Lemma 1 and the paragraph following it. ↩
Section 3.5 (p.8): "Thus, every hint j in H' follows the same distribution as the hint j in H. This shows that the main hints after a query are identically distributed as they were before the query." The proof uses a matrix representation H where each row is a hint and shows the distribution R_i (conditioned on containing i) and R_{-i} (conditioned on not containing i) are preserved. ↩
Section 3.5 (p.7-8): The full distributional argument constructs random variable J (the consumed hint index) with geometric-like distribution Pr(J = j) = (1-q)^j * q, then shows that for any hint j in H', its distribution is a mixture of R (sampled from R with probability 1-(1-q)^j) and R_{-i} (with the remaining probability), which equals R. ↩
Section 2 (p.3): "Piano PIR [34], however, does not satisfy the above definition (even for statically constructed queries) because it requires the client query sequence to have no adversarial influence." ↩
Table 2 (p.10): At 2^28 x 32-byte entries (8 GB), the two-server scheme achieves 60.16 MB offline communication and 842 s offline computation, with 34.1 KB online communication and 2.7 ms online computation. ↩
Section 3.6 (p.8): "The online response overhead is O(1), or 4x to be precise, since the online server and the offline server both send back two parities." This 4x figure is for the two-server scheme. For single-server: "The only difference from the two-server case is that the response overhead is O(sqrt(N)/λ) because the O(N) offline communication is amortized over 0.5λsqrt(N) online queries." The abstract's "twice that of simply fetching" is an approximation. ↩
Table 3 (p.10): At 2^28 x 32-byte entries, the single-server scheme requires 8192 MB offline communication (full DB stream) and 1146 s offline computation. Client storage is 100 MB. ↩
Table 3 (p.10): Client storage for single-server at 8 GB is 100 MB (1.5x the two-server's 60 MB because of backup hint pairs). At 2^28 x 256-byte entries (64 GB), client storage is 660 MB. ↩
Section 3.4 (p.6-7): Algorithm 4 streams the database one partition at a time. For each partition k, it downloads DB[k*sqrt(N) : (k+1)sqrt(N) - 1] and processes all 1.5M hints to update their parities. This requires only O(λsqrt(N)) client memory. ↩
Section 4.2 (p.9): "We run all experiments on an AWS m5.8xlarge instance equipped with a 3.1 GHz Intel Xeon processor and 128 GB RAM." ↩
Section 4.3 (p.9): "Our scheme achieves a balance of low client storage, low communication, and low computation for all database parameters, by avoiding major bottlenecks in previous schemes such as linear client storage, linear server computation, or high communication." ↩
Section 4.3 (p.10): "Compared with the latest version of Piano PIR, which is concurrent with our work, our communication is about 2x better, and our amortized computation is 1.7 -- 3.7x better. Moreover, we achieve these improvements while providing a stronger correctness guarantee." ↩
Section 3.1 (p.4): "Our techniques can be applied to the original sublinear scheme of Corrigan-Gibbs and Kogan [10] or the partition-based hints of TreePIR [22]." ↩
Section 3.2 (p.6): The compact encoding costs (sqrt(N)/2 + 1) * log(N) bits vs. sqrt(N) * log(N) bits for sending explicit subsets directly. ↩
Section 4.1 (p.9): "We can filter out elements that are too large or too small... In expectation, this filters 7/8 of the elements." The probability of filtering a median element is 6 x 10^{-5} for N = 2^20. ↩
Section 4.1 (p.8): "We implemented our scheme in C++. The implementation is available at https://github.com/renling/S3PIR/." ↩
Section 4.1 (p.8-9): "We break up a single 128-bit AES output into four to eight pseudorandom numbers (i.e., v_{j,k} and r_{j,k} in the algorithms) across different hints or partitions to save computation." ↩
Section 4.1 (p.8): "the two-server version of our implementation comprises about 600 lines of code and the single-server version comprises about 500 lines of code." ↩
Section 4.1 (p.8): "We set the parameter λ to 80." ↩
Section 4.1 (p.8-9): "We use 32-bit numbers for elements in V_j to save client storage and computation." The paper notes a corner case where two or more elements equal the median; such hints are discarded with very small probability. ↩
Section 2 (p.3): "Consider DNS lookup, which is a primary application that Piano PIR targets. The threat model of DNS typically assumes that the client may visit a malicious webpage that can trigger DNS queries of the adversary's choosing, e.g., as in the Kaminsky attack." ↩
Section 1 (p.1): "An efficient PIR scheme enables many privacy-preserving applications, such as password check [1]." ↩
Section 6 (p.14): "Other general challenges involving stateful PIR include how to handle updates to the database." ↩
Section 6 (p.14): "A limitation shared by all existing amortized sublinear schemes is that the O(λ*sqrt(N)) client storage, while sublinear, is still quite large in practice. An indirect consequence is that the single-server offline phase cannot do much better than streaming the whole database when the client needs so many hints." ↩
Section 6 (p.14): "An obvious one is the Omega(sqrt(N)) request size. There exist techniques to reduce the request size, but the challenge is to do so without sacrificing other aspects of the algorithm." ↩
Section 6 (p.14). ↩
Section 6 (p.14): "Other general challenges involving stateful PIR include how to handle updates to the database and how to support queries by keywords, and recent works have made some progress in these directions [19, 24]." ↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RMS24 — Engineering Notes

Lineage ⤴

Core Idea ⤴

Variants ⤴

Novel Primitives / Abstractions ⤴

1. Dummy Subsets

2. Partition-Based Hints with Median Cutoff Selection

Cryptographic Foundation ⤴

Key Data Structures ⤴

Protocol Phases ⤴

Two-Server Scheme

Single-Server Scheme

Two-Server Protocol Details ⤴

Correctness Analysis (Option B: Probabilistic) ⤴

Complexity ⤴

Core metrics

Preprocessing metrics

Preprocessing Characterization

Performance Benchmarks ⤴

Hardware

Two-Server Benchmarks (Table 2)

Single-Server Benchmarks (Table 3)

Comparison with Prior Work ⤴

Two-Server (at 2^28 x 32B = 8 GB)

Single-Server (at 2^28 x 32B = 8 GB)

Portable Optimizations ⤴

Implementation Notes ⤴

Application Scenarios ⤴

Deployment Considerations ⤴

Key Tradeoffs & Limitations ⤴

Open Problems ⤴

Related Papers in Collection ⤴

Uncertainties ⤴

FilesExpand file tree

RMS24_2024_notes.md

Latest commit

History

RMS24_2024_notes.md

File metadata and controls

RMS24 — Engineering Notes

Lineage ⤴

Core Idea ⤴

Variants ⤴

Novel Primitives / Abstractions ⤴

1. Dummy Subsets

2. Partition-Based Hints with Median Cutoff Selection

Cryptographic Foundation ⤴

Key Data Structures ⤴

Protocol Phases ⤴

Two-Server Scheme

Single-Server Scheme

Two-Server Protocol Details ⤴

Correctness Analysis (Option B: Probabilistic) ⤴

Complexity ⤴

Core metrics

Preprocessing metrics

Preprocessing Characterization

Performance Benchmarks ⤴

Hardware

Two-Server Benchmarks (Table 2)

Single-Server Benchmarks (Table 3)

Comparison with Prior Work ⤴

Two-Server (at 2^28 x 32B = 8 GB)

Single-Server (at 2^28 x 32B = 8 GB)

Portable Optimizations ⤴

Implementation Notes ⤴

Application Scenarios ⤴

Deployment Considerations ⤴

Key Tradeoffs & Limitations ⤴

Open Problems ⤴

Related Papers in Collection ⤴

Uncertainties ⤴

Footnotes