Commit e7ebff0

Add Base64DecodingStream (#8226)
## Summary of changes

Adds `Base64DecodingStream` helper class

## Reason for change

In a few places (particularly Remote Config and Newtonsoft.JSON) we have code like this:

```csharp
string someValue = //
var contentDecode = Convert.FromBase64String(someValue);
using var stream = new MemoryStream(contentDecode);
```

This takes an existing `string`, decodes it from base64 to a `byte[]`, then feeds that `byte[]` into a `MemoryStream`. For big strings, that's a potentially large extra array allocation we can avoid.

Instead, `Base64DecodingStream` acts effectively as a `MemoryStream` over the `string`, doing the decode either on the fly (.NET Core), or by using a pooled buffer to decode the input string in chunks.

## Implementation details

Basically asked 🤖 Claude to do this, then reviewed and tidied up and iterated. The general idea is:

- Keep track of where in the string we are
- **.NET Framework/.NET Standard only**
  - Decode the next section of the string into an array-pool rented buffer, using the vendored `Base64.DecodeFromUtf8InPlace`
  - Copy the buffer into the destination `byte[]`
- **.NET Core only**
  - Decode the next section of the string directly into the destination `Span<T>`

It didn't seem worth trying to unify these two paths, given the lack of `Convert.TryFromBase64Chars` in .NET Framework. We _could_ use `Base64.DecodeFromUtf8` to write directly to the destination span instead, but we would still need to do the narrowing, and doing that in the destination span is a little risky, as it would mean we write a bunch of bytes which are then leftover junk. It's _probably_ still fine, but I don't know that it's worth the complexity/risk.

> [!WARNING]
> Rather than handle the case where you're passed a destination buffer that's <3 bytes, this implementation currently just throws. Otherwise we have to decode into a stackallocated buffer, and hang onto the "overflow" bytes to avoid returning 0 when we're not EOF, which is a bit of a pain.
> For the places we currently use it, I don't think this will be a problem, but if others disagree, we can handle the edge case too.

## Test coverage

Added a variety of unit tests for the implementation

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-940

This will be part of a Remote config stack, but for now kept it agnostic
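
The .NET Core path described under "Implementation details" can be illustrated with a small standalone sketch. This is not code from the commit: the payload text, the 6-byte destination size, and the variable names are assumptions chosen for the demo. It only uses `Convert.TryFromBase64Chars` to show how decoding whole 4-character quanta per step keeps each step within the destination buffer.

```csharp
using System;
using System.Text;

// Hypothetical demo (not part of the commit): decode a base64 string in
// destination-sized steps, similar in spirit to the .NET Core path.
var payload = Convert.ToBase64String(Encoding.UTF8.GetBytes("hello, base64 stream"));
var destination = new byte[6]; // assumed size: room for two 3-byte quanta per step
var position = 0;
var decoded = new StringBuilder();

while (position < payload.Length)
{
    // Each 4-character quantum decodes to at most 3 bytes, so
    // (destination.Length / 3) * 4 characters always fit in the destination.
    var chars = Math.Min((destination.Length / 3) * 4, payload.Length - position);

    if (!Convert.TryFromBase64Chars(payload.AsSpan(position, chars), destination, out var written))
    {
        throw new FormatException("The input is not a valid base64 string.");
    }

    // The demo payload is ASCII, so decoding each step's bytes separately is safe here.
    decoded.Append(Encoding.UTF8.GetString(destination, 0, written));
    position += chars;
}

Console.WriteLine(decoded.ToString());
```

Only the final step sees the padding character, so every earlier step yields exactly 3 bytes per quantum. A destination shorter than 3 bytes would make the step size zero, which is the case the warning above says the stream rejects by throwing.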
1 parent 76f3785 commit e7ebff0

2 files changed

+579
-0
lines changed

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
// <copyright file="Base64DecodingStream.cs" company="Datadog">
// Unless explicitly stated otherwise all files in this repository are licensed under the Apache 2 License.
// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc.
// </copyright>

#nullable enable

using System;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

namespace Datadog.Trace.Util.Streams;

/// <summary>
/// A read-only, forward-only <see cref="Stream"/> that decodes a base64-encoded <see cref="string"/>
/// into its raw bytes on-the-fly, without allocating the full decoded byte array.
/// </summary>
/// <remarks>
/// This stream processes base64 input in chunks, decoding only as much as
/// the caller requests via <see cref="Read(byte[], int, int)"/>.
/// The input must be valid base64 with standard padding (no embedded whitespace).
/// On .NET Core, decodes directly into the caller's buffer with no intermediate allocation.
/// On .NET Framework, uses <see cref="ArrayPool{T}"/> internally to avoid GC pressure.
/// </remarks>
internal sealed class Base64DecodingStream : Stream
{
#if !NETCOREAPP
    /// <summary>
    /// Maximum number of base64 characters to process per decode operation.
    /// Must be a multiple of 4 (the base64 quantum size).
    /// The internal buffer also holds the narrowed ASCII bytes before
    /// in-place decode, so the buffer size equals this value.
    /// </summary>
    private const int CharsPerChunk = 4096;
#endif

    private readonly string _base64;
    private int _charPosition;

#if !NETCOREAPP
    private byte[] _buffer;
    private int _bufferOffset;
    private int _bufferCount;
#endif

    public Base64DecodingStream(string base64)
    {
        _base64 = base64 ?? throw new ArgumentNullException(nameof(base64));
#if !NETCOREAPP
        _buffer = ArrayPool<byte>.Shared.Rent(CharsPerChunk);
#endif
    }

    public override bool CanRead => true;

    public override bool CanSeek => false;

    public override bool CanWrite => false;

    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

#if NETCOREAPP
    public override int Read(byte[] buffer, int offset, int count)
        => Read(buffer.AsSpan(offset, count));

    public override int Read(Span<byte> destination)
    {
        if (destination.Length == 0 || _charPosition >= _base64.Length)
        {
            return 0;
        }

        var remainingChars = _base64.Length - _charPosition;
        var maxQuanta = destination.Length / 3;

        if (maxQuanta == 0)
        {
            ThrowDestinationTooSmall();
        }

        var charsToProcess = maxQuanta * 4;
        if (charsToProcess >= remainingChars)
        {
            charsToProcess = remainingChars;
        }

        if (!Convert.TryFromBase64Chars(
                _base64.AsSpan(_charPosition, charsToProcess),
                destination,
                out var bytesWritten))
        {
            ThrowFormatException();
        }

        _charPosition += charsToProcess;
        return bytesWritten;
    }

    public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
    {
        return cancellationToken.IsCancellationRequested
                   ? new ValueTask<int>(Task.FromCanceled<int>(cancellationToken))
                   : new ValueTask<int>(Read(buffer.Span));
    }
#else
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0)
        {
            return 0;
        }

        // Added for consistency with .NET Core case
        if (count < 3)
        {
            ThrowDestinationTooSmall();
        }

        var totalRead = 0;

        while (count > 0)
        {
            // Drain any previously decoded bytes still in the internal buffer
            var available = _bufferCount - _bufferOffset;
            if (available > 0)
            {
                var toCopy = Math.Min(count, available);
                Buffer.BlockCopy(_buffer, _bufferOffset, buffer, offset, toCopy);
                _bufferOffset += toCopy;
                offset += toCopy;
                count -= toCopy;
                totalRead += toCopy;
                continue;
            }

            // No buffered bytes remain; decode another chunk from the input string
            if (_charPosition >= _base64.Length)
            {
                break;
            }

            DecodeNextChunk();
        }

        return totalRead;
    }
#endif

    public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        return cancellationToken.IsCancellationRequested
                   ? Task.FromCanceled<int>(cancellationToken)
                   : Task.FromResult(Read(buffer, offset, count));
    }

    public override void Flush()
    {
    }

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
#if !NETCOREAPP
        if (_buffer is { } buffer)
        {
            _buffer = null!;
            ArrayPool<byte>.Shared.Return(buffer);
        }
#endif

        base.Dispose(disposing);
    }

    [DoesNotReturn]
    private static void ThrowFormatException() => throw new FormatException("The input is not a valid base64 string.");

    [DoesNotReturn]
    private static void ThrowDestinationTooSmall() => throw new ArgumentException("Destination buffer must be at least 3 bytes to hold a decoded base64 quantum.");

#if !NETCOREAPP
    private void DecodeNextChunk()
    {
        var remainingChars = _base64.Length - _charPosition;
        var charsToProcess = Math.Min(remainingChars, CharsPerChunk);

        // Round down to a multiple of 4 for non-final chunks.
        // Base64 decoding operates on 4-character quanta; the final chunk
        // may include padding characters and is always a valid quantum boundary.
        if (charsToProcess < remainingChars)
        {
            charsToProcess &= ~3;
        }

        if (charsToProcess == 0)
        {
            // Remaining chars < 4 and this is not the final chunk; shouldn't happen
            // for valid base64, but guard against infinite loops.
            _charPosition = _base64.Length;
            return;
        }

        // All valid base64 characters are in the ASCII range (0–127),
        // so we can safely narrow each char to a byte for the UTF-8 decoder.
        // OR-accumulate all chars to detect non-ASCII input without branching per character.
        var buf = _buffer;
        var str = _base64;
        var offset = _charPosition;
        var nonAscii = 0;

        for (var i = 0; i < charsToProcess; i++)
        {
            var c = str[offset + i];
            nonAscii |= c;
            buf[i] = (byte)c;
        }

        if (nonAscii > 127)
        {
            ThrowFormatException();
        }

        // Decode the UTF-8 base64 bytes in-place. The decoded output is always
        // shorter than the input (3 bytes per 4 input bytes), so in-place is safe.
        var status = Base64.DecodeFromUtf8InPlace(buf.AsSpan(0, charsToProcess), out var bytesWritten);
        if (status != OperationStatus.Done)
        {
            ThrowFormatException();
        }

        _bufferOffset = 0;
        _bufferCount = bytesWritten;
        _charPosition += charsToProcess;
    }
#endif
}
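
The narrow-then-decode-in-place step used by `DecodeNextChunk` can also be tried in isolation. The sketch below is an illustration only, not code from the commit: it assumes a runtime where the BCL `System.Buffers.Text.Base64` type is available (the commit itself uses a vendored copy on .NET Framework), and the input text and variable names are made up for the demo.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

// Hypothetical demo (not part of the commit). Uses the BCL Base64 type,
// whereas the commit relies on a vendored equivalent on .NET Framework.
var input = Convert.ToBase64String(Encoding.UTF8.GetBytes("narrow, then decode in place"));
var rented = ArrayPool<byte>.Shared.Rent(input.Length);
try
{
    // Narrow each UTF-16 char to a byte, OR-ing the chars together so a
    // single check after the loop catches any non-ASCII character.
    var seen = 0;
    for (var i = 0; i < input.Length; i++)
    {
        seen |= input[i];
        rented[i] = (byte)input[i];
    }

    if (seen > 127)
    {
        throw new FormatException("The input is not a valid base64 string.");
    }

    // The decoded output is shorter than the input, so it can overwrite it.
    var status = Base64.DecodeFromUtf8InPlace(rented.AsSpan(0, input.Length), out var written);
    if (status != OperationStatus.Done)
    {
        throw new FormatException("The input is not a valid base64 string.");
    }

    Console.WriteLine(Encoding.UTF8.GetString(rented, 0, written));
}
finally
{
    // Always hand the rented array back to the pool.
    ArrayPool<byte>.Shared.Return(rented);
}
```

The span is sliced to `input.Length` because a rented array may be larger than requested, and the in-place decoder expects exactly the encoded bytes.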
