
Commit 2c9c693

committed by therealalephclaude
fix: v1.9.16 — Full mode 50 MiB batch-response truncation (#863)
Apps Script's response body cap is ~50 MiB. tunnel-node had a `TCP_DRAIN_MAX_BYTES = 16 MiB` per-session cap to stay under it, but multiple sessions in the same batch each contributed up to 16 MiB raw, summing past 50 MiB on a busy VPS: with N≥4 concurrent sessions, 4 × 16 MiB gives ≥64 MiB raw, which is ≥85 MiB after base64. Steam updates and other large CDN-served downloads hit this exactly: the client fails with `EOF while parsing a string at line 1 column 52428630` and the session aborts mid-stream.

Fix: a new `BATCH_RESPONSE_BUDGET = 32 MiB` total-batch cap. The drain loop tracks the remaining budget across sessions and stops one session short of the cliff. `drain_now()` now takes `max_bytes`; the effective cap is `min(budget, TCP_DRAIN_MAX_BYTES)`. Sessions deferred this batch keep their buffered data (no data loss); they drain on the next poll. Single-op-path callers and existing tests pass `usize::MAX`, which adds no extra constraint; the original `TCP_DRAIN_MAX_BYTES` is still enforced.

New regression test `drain_now_respects_caller_budget_below_per_session_cap` covers the new behavior. Tests: 197 lib + 36 tunnel-node (was 35), all green. UI release build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: 82a8cbf

4 files changed: 85 additions & 21 deletions

File tree

Cargo.lock

Lines changed: 1 addition & 1 deletion
(Generated file; the diff is not rendered.)

Cargo.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [package]
 name = "mhrv-rs"
-version = "1.9.15"
+version = "1.9.16"
 edition = "2021"
 description = "Rust port of MasterHttpRelayVPN -- DPI bypass via Google Apps Script relay with domain fronting"
 license = "MIT"
```

docs/changelog/v1.9.16.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+<!-- see docs/changelog/v1.1.0.md for the file format: Persian, then `---`, then English. -->
+• Fix Full mode large-download truncation at exactly 50 MiB ([#863](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/863)). Apps Script's response body cap is ~50 MiB; tunnel-node had a `TCP_DRAIN_MAX_BYTES = 16 MiB` per-session cap to stay under it, but **multiple sessions in the same batch** each contributed up to 16 MiB raw, summing past 50 MiB on busy VPS (Steam/CDN downloads with N≥4 concurrent sessions). Symptom: `batch JSON parse error: EOF while parsing a string at line 1 column 52428630 (body_len=52428630)` followed by session abort + download restart from 0. Fix: new `BATCH_RESPONSE_BUDGET = 32 MiB` total-batch cap; the drain loop tracks remaining budget across sessions and stops one short of the cliff. Sessions deferred this batch keep their buffered data and drain on the next poll (no data loss). New regression test `drain_now_respects_caller_budget_below_per_session_cap`. 36 tunnel-node tests (was 35) all pass + 197 lib tests all pass.
+---
+• Fix Full mode large-download truncation at exactly 50 MiB ([#863](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/863)). Apps Script's response body cap is ~50 MiB; tunnel-node had a `TCP_DRAIN_MAX_BYTES = 16 MiB` per-session cap to stay under it, but **multiple sessions in the same batch** each contributed up to 16 MiB raw, summing past 50 MiB on busy VPS (Steam / CDN downloads with N≥4 concurrent sessions). Symptom: `batch JSON parse error: EOF while parsing a string at line 1 column 52428630 (body_len=52428630)` followed by session abort + download restart from 0. Fix: new `BATCH_RESPONSE_BUDGET = 32 MiB` total-batch cap; the drain loop tracks remaining budget across sessions and stops one short of the cliff. Sessions deferred this batch keep their buffered data and drain on the next poll (no data loss). New regression test `drain_now_respects_caller_budget_below_per_session_cap`. **36 tunnel-node tests** (was 35) + **197 lib tests** all green.
```

tunnel-node/src/main.rs

Lines changed: 79 additions & 19 deletions
```diff
@@ -99,6 +99,19 @@ const UDP_RECV_BUF_BYTES: usize = 65536;
 /// under the cap and let throughput recover across batches.
 const TCP_DRAIN_MAX_BYTES: usize = 16 * 1024 * 1024;
 
+/// Hard cap on the total raw bytes drained across **all sessions** in a
+/// single batch response. The per-session cap (`TCP_DRAIN_MAX_BYTES`)
+/// alone isn't enough — N concurrent sessions can each contribute up to
+/// 16 MiB raw; with N≥4, the summed batch body exceeds Apps Script's
+/// 50 MiB ceiling and the client fails JSON parse mid-stream (#863).
+///
+/// 32 MiB raw → ~43 MiB base64 + per-session JSON envelope overhead
+/// (~80 bytes × ≤50 ops cap) → comfortably under 50 MiB total. Any
+/// further sessions in the same batch are deferred to the next poll
+/// (their data stays in their per-session `read_buf`, so no data loss
+/// — they just settle one batch later).
+const BATCH_RESPONSE_BUDGET: usize = 32 * 1024 * 1024;
+
 /// First queue-drop on a session always logs at warn level; subsequent
 /// drops log at debug only every Nth occurrence so a single congested
 /// session can't flood the operator's log.
@@ -340,27 +353,32 @@ async fn udp_reader_task(socket: Arc<UdpSocket>, session: Arc<UdpSessionInner>)
     }
 }
 
-/// Drain up to `TCP_DRAIN_MAX_BYTES` from the per-session read buffer —
-/// no waiting. Used by batch mode where we poll frequently.
+/// Drain up to `min(TCP_DRAIN_MAX_BYTES, max_bytes)` from the per-session
+/// read buffer — no waiting. Used by batch mode where we poll frequently.
+///
+/// `max_bytes` is the caller-supplied budget for this drain (typically the
+/// remaining batch-response budget after summing previous drains in the
+/// same batch). It allows the batch loop to stop one session short of
+/// blowing past Apps Script's 50 MiB ceiling on the wire (#863). Pass
+/// `usize::MAX` if there's no extra budget constraint (e.g. single-op
+/// path outside the batch loop).
 ///
-/// If the buffer is larger than the cap, we return a prefix of the
-/// data and leave the remainder in the buffer for the next poll. The
-/// cap exists to keep batch responses under Apps Script's ~50 MiB body
-/// ceiling on high-bandwidth VPS — see `TCP_DRAIN_MAX_BYTES` for the
-/// underlying issue (#460).
+/// If the buffer is larger than the effective cap, we return a prefix of
+/// the data and leave the remainder in the buffer for the next poll.
 ///
-/// `eof` is reported as true only when the buffer has been fully
-/// drained AND upstream has signaled EOF — otherwise a partial drain
-/// would prematurely tear the session down on the client side.
-async fn drain_now(session: &SessionInner) -> (Vec<u8>, bool) {
+/// `eof` is reported as true only when the buffer has been fully drained
+/// AND upstream has signaled EOF — otherwise a partial drain would
+/// prematurely tear the session down on the client side.
+async fn drain_now(session: &SessionInner, max_bytes: usize) -> (Vec<u8>, bool) {
     let mut buf = session.read_buf.lock().await;
     let raw_eof = session.eof.load(Ordering::Acquire);
-    if buf.len() <= TCP_DRAIN_MAX_BYTES {
+    let cap = max_bytes.min(TCP_DRAIN_MAX_BYTES);
+    if buf.len() <= cap {
         let data = std::mem::take(&mut *buf);
         (data, raw_eof)
     } else {
         // Take the prefix; leave the tail in the buffer.
-        let tail = buf.split_off(TCP_DRAIN_MAX_BYTES);
+        let tail = buf.split_off(cap);
         let head = std::mem::replace(&mut *buf, tail);
         // Don't propagate eof yet — buffer still has data even if upstream
         // has closed. The client will get eof on the drain that returns
@@ -1062,12 +1080,25 @@ async fn handle_batch(
         // session and abort the reader_task with the tail still
         // buffered, dropping those bytes.
         let mut tcp_eof_sids: Vec<String> = Vec::new();
+        // Track remaining batch-response budget across all session drains
+        // (#863). Per-session `TCP_DRAIN_MAX_BYTES` alone wasn't enough —
+        // several concurrent sessions each contributing 16 MiB summed past
+        // Apps Script's 50 MiB response ceiling. This cap stops one session
+        // short of the cliff; deferred sessions drain on the next poll.
+        let mut remaining_budget: usize = BATCH_RESPONSE_BUDGET;
         for (i, sid, inner) in &tcp_drains {
-            let (data, eof) = drain_now(inner).await;
+            let (data, eof) = drain_now(inner, remaining_budget).await;
+            let drained = data.len();
             if eof {
                 tcp_eof_sids.push(sid.clone());
             }
             results.push((*i, tcp_drain_response(sid.clone(), data, eof)));
+            remaining_budget = remaining_budget.saturating_sub(drained);
+            if remaining_budget == 0 {
+                // Budget exhausted; remaining sessions in `tcp_drains` keep
+                // their buffered data and pick up next batch.
+                break;
+            }
         }
         if !tcp_eof_sids.is_empty() {
             let mut sessions = state.sessions.lock().await;
@@ -1718,24 +1749,53 @@ mod tests {
         let oversized = TCP_DRAIN_MAX_BYTES + 4096;
         inner.read_buf.lock().await.resize(oversized, 0xab);
 
-        let (first, eof) = drain_now(&inner).await;
+        let (first, eof) = drain_now(&inner, usize::MAX).await;
         assert_eq!(first.len(), TCP_DRAIN_MAX_BYTES);
         assert!(!eof, "shouldn't propagate eof while buffer still has data");
 
         // Tail remains for the next poll.
         assert_eq!(inner.read_buf.lock().await.len(), 4096);
 
-        let (second, _) = drain_now(&inner).await;
+        let (second, _) = drain_now(&inner, usize::MAX).await;
         assert_eq!(second.len(), 4096);
         assert!(inner.read_buf.lock().await.is_empty());
     }
 
+    #[tokio::test]
+    async fn drain_now_respects_caller_budget_below_per_session_cap() {
+        // Issue #863: per-session TCP_DRAIN_MAX_BYTES alone wasn't enough
+        // because N sessions × 16 MiB summed past Apps Script's 50 MiB
+        // response ceiling. The batch loop now passes a remaining-budget
+        // cap; drain_now must honor `min(budget, TCP_DRAIN_MAX_BYTES)`,
+        // leaving the tail for the next poll exactly like the per-session
+        // cap path does.
+        let inner = fake_inner().await;
+        // 1 MiB buffered, but caller only has 256 KiB budget left.
+        inner
+            .read_buf
+            .lock()
+            .await
+            .resize(1024 * 1024, 0xcd);
+
+        let (drained, eof) = drain_now(&inner, 256 * 1024).await;
+        assert_eq!(drained.len(), 256 * 1024);
+        assert!(!eof, "tail still buffered, eof must wait");
+
+        // The remaining 768 KiB stays put for the next poll.
+        assert_eq!(inner.read_buf.lock().await.len(), 768 * 1024);
+
+        // Next call with full budget drains the rest.
+        let (rest, _) = drain_now(&inner, usize::MAX).await;
+        assert_eq!(rest.len(), 768 * 1024);
+        assert!(inner.read_buf.lock().await.is_empty());
+    }
+
     #[tokio::test]
     async fn drain_now_passes_through_when_under_cap() {
         let inner = fake_inner().await;
         inner.read_buf.lock().await.extend_from_slice(b"hello world");
 
-        let (data, eof) = drain_now(&inner).await;
+        let (data, eof) = drain_now(&inner, usize::MAX).await;
         assert_eq!(data, b"hello world");
         assert!(!eof);
         assert!(inner.read_buf.lock().await.is_empty());
@@ -1754,11 +1814,11 @@ mod tests {
             .await
             .resize(TCP_DRAIN_MAX_BYTES + 100, 0);
 
-        let (head, head_eof) = drain_now(&inner).await;
+        let (head, head_eof) = drain_now(&inner, usize::MAX).await;
         assert_eq!(head.len(), TCP_DRAIN_MAX_BYTES);
         assert!(!head_eof, "premature eof would tear the session");
 
-        let (tail, tail_eof) = drain_now(&inner).await;
+        let (tail, tail_eof) = drain_now(&inner, usize::MAX).await;
         assert_eq!(tail.len(), 100);
         assert!(tail_eof, "eof finally flips when buffer is drained");
     }
```
