[PERFORMANCE]: Optimize stream parser buffer management (O(n²) → O(n)) #1613

## Summary

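The stream parsers in `mcpgateway/wrapper.py` grow their buffers with string concatenation (`+=`) and truncate them by slicing, both inside tight loops. Each of these operations copies the buffer, so total work can grow as O(n²) in the amount of buffered data. For large streaming responses with many chunks, this causes significant performance degradation.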

## Impact

- **CPU Overhead**: Each `buffer += text` creates a new string object, copying all existing content
- **Memory Churn**: Frequent allocations and deallocations increase GC pressure
- **Latency**: Large SSE/NDJSON streams become progressively slower as buffer grows

| Operation | Current | Optimized |
|-----------|---------|-----------|
| Append chunk | O(n) | O(1) amortized |
| Slice buffer | O(n) | O(1) |
| **Total per chunk** | **O(n)** | **O(1) amortized** |
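
To make the quadratic growth concrete, the sketch below counts how many characters are copied under each strategy. It is a cost model, not a benchmark: `chars_copied_concat` and `chars_copied_join` are hypothetical helper names, and the model assumes the worst case where no newline arrives and the buffer is never truncated. CPython can sometimes resize a string in place on `+=`, so measured cost may be lower than this model suggests.

```python
def chars_copied_concat(chunk_sizes: list[int]) -> int:
    """Characters copied when each chunk is appended with `buffer += text`.

    Worst-case model: every append copies the existing buffer plus the new chunk.
    """
    copied = 0
    buffered = 0
    for size in chunk_sizes:
        buffered += size
        copied += buffered  # the new string holds old content + new chunk
    return copied


def chars_copied_join(chunk_sizes: list[int]) -> int:
    """Characters copied when chunks are collected and joined once."""
    return sum(chunk_sizes)  # each character is copied a single time


sizes = [1024] * 1000  # 1000 chunks of 1 KiB each
print(chars_copied_concat(sizes))  # grows with the square of the chunk count
print(chars_copied_join(sizes))    # grows linearly with the chunk count
```

Doubling the number of chunks roughly quadruples the first figure and only doubles the second.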

## Affected Code

**File**: `mcpgateway/wrapper.py`

### `ndjson_lines()` (Lines 269-303)
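```python
# Abridged excerpt; `decoder` is the incremental text decoder used by the parser
buffer = ""
async for chunk in resp.aiter_bytes():
    text = decoder.decode(chunk)
    buffer += text  # Line 291: O(n) - creates new string!
    while True:
        nl_idx = buffer.find("\n")  # O(n) search
        if nl_idx == -1:
            break  # no complete line yet; wait for the next chunk
        line = buffer[:nl_idx]
        buffer = buffer[nl_idx + 1:]  # Line 297: O(n) - creates new string!
```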

### `sse_events()` (Lines 306-364)
Same pattern at lines 324, 330, 348.

## Proposed Fix

Use `io.StringIO` or `bytearray` for O(1) amortized append:

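```python
import codecs
import io
from typing import AsyncIterator

import httpx

async def ndjson_lines(resp: httpx.Response) -> AsyncIterator[str]:
    # Assumption: UTF-8 incremental decoding; the real code may configure its decoder differently
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = io.StringIO()
    async for chunk in resp.aiter_bytes():
        buffer.write(decoder.decode(chunk))  # O(1) amortized
        # Process complete lines...
```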
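
The `bytearray` variant could look roughly like the sketch below. `LineBuffer` and `feed` are hypothetical names, not part of `mcpgateway/wrapper.py`. The sketch assumes UTF-8 input, splits on the raw newline byte (which never occurs inside a multi-byte UTF-8 sequence), and drops consumed bytes once per chunk instead of once per line.

```python
class LineBuffer:
    """Hypothetical helper: accumulates raw bytes and returns complete lines."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[str]:
        # Bytes already in the buffer were searched on earlier calls and
        # contain no newline, so searching can start at the new data.
        scan_from = len(self._buf)
        self._buf.extend(chunk)  # amortized O(len(chunk)) append

        lines: list[str] = []
        start = 0
        nl = self._buf.find(b"\n", scan_from)
        while nl != -1:
            # Decode only the completed line, never a partial one
            line = self._buf[start:nl].decode("utf-8", errors="replace").strip()
            if line:
                lines.append(line)
            start = nl + 1
            nl = self._buf.find(b"\n", start)

        if start:
            del self._buf[:start]  # drop consumed bytes once per chunk
        return lines


lb = LineBuffer()
print(lb.feed(b'{"a": 1}\n{"b"'))  # first line is complete, second is pending
print(lb.feed(b": 2}\n"))          # second line completes here
```

Decoding per line also removes the need for an incremental decoder, since a line is only decoded after all of its bytes have arrived.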

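Or use a split-based approach:
```python
async def ndjson_lines(resp: httpx.Response) -> AsyncIterator[str]:
    # Assumption: UTF-8 incremental decoding (requires `import codecs`)
    decoder = codecs.getincrementaldecoder("utf-8")()
    partial_line = ""
    async for chunk in resp.aiter_bytes():
        lines = (partial_line + decoder.decode(chunk)).split("\n")
        partial_line = lines.pop()  # Keep incomplete line
        for line in lines:
            stripped = line.strip()
            if stripped:
                yield stripped
    # Assumption: a final line without a trailing newline should still be emitted
    tail = (partial_line + decoder.decode(b"", final=True)).strip()
    if tail:
        yield tail
```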
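
The same split-based idea might carry over to `sse_events()`. The sketch below is an assumption about how that could look, not the existing implementation: `split_sse_events` is a hypothetical helper, it treats a blank line as the event delimiter, handles `\n` and `\r\n` line endings only, and keeps just the `data:` fields of each event.

```python
def split_sse_events(partial: str, text: str) -> tuple[list[str], str]:
    """Return (complete event payloads, leftover partial event).

    Hypothetical helper: `partial` is the leftover from the previous call,
    `text` is the newly decoded chunk.
    """
    # Normalizing after concatenation also handles a "\r\n" pair that was
    # split across two chunks, because the trailing "\r" stays in `partial`.
    combined = (partial + text).replace("\r\n", "\n")
    blocks = combined.split("\n\n")
    leftover = blocks.pop()  # last block has not been terminated yet

    events: list[str] = []
    for block in blocks:
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                if value.startswith(" "):
                    value = value[1:]  # drop the single optional space after the colon
                data_lines.append(value)
        if data_lines:
            events.append("\n".join(data_lines))
    return events, leftover


events, rest = split_sse_events("", "data: hello\n\ndata: wor")
print(events, repr(rest))
events, rest = split_sse_events(rest, "ld\n\n")
print(events, repr(rest))
```

The only concatenation left is `partial + text`, whose cost is bounded by the size of one pending event plus one chunk rather than by the whole stream.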

## Acceptance Criteria

- [ ] No string concatenation (`+=`) in buffer loops
- [ ] No string slicing for buffer truncation
- [ ] Memory usage bounded by the largest pending line or event, regardless of total stream size
- [ ] Existing SSE/NDJSON tests pass
- [ ] Performance improvement measurable for large streams
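
A chunk-boundary regression check for these criteria could be sketched as follows. `FakeResponse` and `collect` are hypothetical test helpers, and `ndjson_lines` here is a local copy of the split-based proposal (including its assumed end-of-stream flush), not the function in `mcpgateway/wrapper.py`.

```python
import asyncio
import codecs
from typing import AsyncIterator


class FakeResponse:
    """Hypothetical stand-in for httpx.Response that replays fixed chunks."""

    def __init__(self, chunks: list[bytes]) -> None:
        self._chunks = chunks

    async def aiter_bytes(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            yield chunk


async def ndjson_lines(resp: FakeResponse) -> AsyncIterator[str]:
    """Local copy of the split-based proposal, for testing only."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    partial_line = ""
    async for chunk in resp.aiter_bytes():
        lines = (partial_line + decoder.decode(chunk)).split("\n")
        partial_line = lines.pop()  # keep the incomplete line for the next chunk
        for line in lines:
            stripped = line.strip()
            if stripped:
                yield stripped
    # Assumed behaviour: emit a final line that lacks a trailing newline
    tail = (partial_line + decoder.decode(b"", final=True)).strip()
    if tail:
        yield tail


async def collect(chunks: list[bytes]) -> list[str]:
    return [line async for line in ndjson_lines(FakeResponse(chunks))]


# A record split across chunks, a blank line, and an unterminated last record
result = asyncio.run(collect([b'{"id": 1}\n{"id"', b": 2}\n", b'\n{"id": 3}']))
print(result)
```

Timing the same harness with a few thousand chunks before and after the change would give a rough measure for the last criterion.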

## References

- Detailed analysis: `todo/newperf/optimize-stream-parser-buffers.md`
- [Python io.StringIO](https://docs.python.org/3/library/io.html#io.StringIO)
