Skip to content

✨ feat(load-test): add load test CLI with Typer, handler registry, and Rich output#70

Merged
itisnotyourenv merged 9 commits into
mainfrom
test-bot-layer
Feb 22, 2026
Merged

✨ feat(load-test): add load test CLI with Typer, handler registry, and Rich output#70
itisnotyourenv merged 9 commits into
mainfrom
test-bot-layer

Conversation

@itisnotyourenv

@itisnotyourenv itisnotyourenv commented Feb 22, 2026

Copy link
Copy Markdown
Owner

Summary

Create a CLI app at src/presentation/load_test/ which allows load testing for a bot

  • Typer CLI with interactive prompts and flag overrides (python -m src.presentation.load_test)
  • Handler registry with auto-discovery — add a new handler by creating a file in handlers/ with @register_handler, nothing else to touch
  • Rich progress bar during test execution (aiogram INFO logs suppressed for clean display)
  • Rich report panels — color-coded configuration, results, latency percentiles, and error details
  • Plain text reports saved to reports/ for archival

Also includes minor bot layer cleanup:

  • Simplified Dispatcher and DI initialization in bot/main.py
  • Fixed duplicate word typo in AdminFilter docstring

Changes

Area What changed
src/presentation/load_test/ New package: CLI, registry, handlers, session, metrics, report, runner
src/presentation/load_test/README.md Comprehensive docs with usage examples and handler guide
src/presentation/bot/main.py Pass config to Dispatcher, simplify DBProvider and start_polling
src/presentation/bot/filters/admin.py Fix "the the" → "the" typo
pyproject.toml Add typer dep, ruff ignores for load_test, coverage omit
.gitignore Ignore src/presentation/load_test/reports/

Usage

# Interactive (prompts for all params)
python -m src.presentation.load_test

# Non-interactive
python -m src.presentation.load_test --total 10000 --concurrency 100 --handler start -n "my_test"

Checklist

  • Ruff lint + format pass
  • Smoke tested with live database
  • Progress bar and Rich report verified
  • README with usage examples and handler registration guide
  • Plain text report file saving works
  • .gitignore updated for reports directory
  • Unit tests (load test CLI excluded from coverage — tool runs against real DB)

🤖 Generated with Claude Code

itisnotyourenv and others added 8 commits February 21, 2026 06:33
Replace per-update patch.object mocking with a NoOpSession (BaseSession
subclass) that intercepts all Telegram API calls at the session level,
returning correct types (Message for send/edit, True for callbacks).

- Add NoOpSession with typed responses for SendMessage, EditMessageText,
  AnswerCallbackQuery, DeleteMessage, EditMessageReplyMarkup
- Uncomment make_callback_update factory for callback_language handler
- Add first error traceback capture to LoadTestMetrics
- Add RPS metric to the load test report
- Default total-updates bumped to 1000

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gistry

Refactor monolithic bot-load-test.py into src/presentation/load_test/
with proper separation of concerns:

- Typer CLI with interactive prompts and flag overrides
- Registry pattern for extensible handler discovery
- Extracted NoOpSession, metrics, report, and runner modules
- Comprehensive README with usage examples and handler guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass config directly to Dispatcher constructor, remove explicit
postgres config from DBProvider, and clean up start_polling call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass config to Dispatcher constructor and remove explicit postgres
config from DBProvider, matching the bot main.py pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace plain text console output with Rich panels and tables:
- Configuration panel (blue) with handler, concurrency, throughput
- Results panel (green) with color-coded success/error counts
- Latency panel (cyan) with highlighted p95/p99 percentiles
- Error panel (red) with traceback when errors occur
- Suppress aiogram.event INFO logs during test for clean progress bar

Plain text format preserved for file reports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Feb 22, 2026

Copy link
Copy Markdown

PR Review: ✨ feat(load-test): add load test CLI with Typer, handler registry, and Rich output

Overall, this is a well-structured addition. The separation of concerns across cli.py, runner.py, metrics.py, report.py, session.py, and the handler registry is clean, the NoOpSession is a solid implementation, and the README is comprehensive. Below are the findings.


🔴 Critical Issues

1. DI container is never closed (runner.py)

setup_dispatcher creates an async Dishka container but there is no await container.close() call after the test finishes. This leaves database connections open and may exhaust the pool on repeat runs or large tests.

# runner.py – run_load_test
config = load_config()
dp, bot = await setup_dispatcher(config)   # container created inside
...
await asyncio.gather(*tasks)
# ← container.close() never called

setup_dispatcher should return the container alongside (dp, bot) so run_load_test can close it in a finally block:

dp, bot, container = await setup_dispatcher(config)
try:
    ...
    await asyncio.gather(*tasks)
finally:
    await container.close()

2. import statement inside function body (runner.py, line ~1092)

# At the very end of run_load_test:
from rich.console import Console
Console().print(f"\nReport saved: [bold green]{filepath}[/bold green]")

This import belongs at the top of the file. A module-level console object is already established in report.py – either reuse the one from there or move the import to the top of runner.py. Late imports hide dependencies and make the code harder to audit.


🟡 Suggestions

3. Memory: all updates pre-allocated as a list

updates = [factory(update_id=i+1, ...) for i in range(total_updates)]

For large runs (100K+ updates) every Update object (with its nested Message/CallbackQuery) is materialised in RAM before a single one is processed. A generator piped through a bounded queue or via asyncio.Semaphore directly would cap peak memory to concurrency objects:

async def _worker(sem, i):
    update = factory(update_id=i+1, user_id=base_user_id + (i % user_pool_size))
    async with sem:
        ...

tasks = [_worker(semaphore, i) for i in range(total_updates)]

This matches the existing semaphore pattern without the up-front allocation.

4. message_id=update_id - 1 produces message_id=0 for the first update (callback_language.py)

message = Message(
    message_id=update_id - 1,   # update_id=1 → message_id=0
    ...
)

Telegram message IDs start at 1; ID 0 is not a valid value. With NoOpSession this won't surface an error today, but if aiogram or a middleware ever validates the ID this factory will break unexpectedly. Using update_id directly or abs(update_id - 1) or 1 avoids the edge case.

5. No error handling around auto-discovery imports (handlers/__init__.py)

for _f in Path(__file__).parent.glob("*.py"):
    if _f.name not in ("__init__.py", "registry.py"):
        importlib.import_module(f".{_f.stem}", __package__)

If any handler file has a syntax error or a failed import, the entire handlers module raises at import time with a traceback that points to __init__.py rather than the offending file. Wrapping each import in try/except ImportError and logging the failure would give a much clearer diagnostic.

6. Latency list grows without bound

metrics.latencies stores one float per update. At 1M updates × 8 bytes ≈ 8 MB — manageable but worth being aware of. For tests that will regularly run at this scale, a reservoir-sampling or t-digest approach would bound memory while preserving percentile accuracy.


🔵 Minor / Polish

7. pyproject.toml: typer in production dependencies

typer (+ rich, shellingham, annotated-doc) is added to the main [project.dependencies] section. Since the load test CLI is a development/testing tool, these should live in the [project.optional-dependencies] dev group alongside pytest and friends, preventing typer from being installed in production containers.

8. Commit message inconsistency

One commit (e770652) is titled simply add progress bar without an emoji, conventional-commit type, or scope — inconsistent with every other commit in the PR which follow ♻️ type(scope): description format. Minor, but worth squashing or amending for a clean history.

9. Duplicate config=config refactoring spread across two commits

Commits ca5dbc3 (refactor bot main.py) and 37c7e9f (align load-test runner) make the same conceptual change to two files. These would read more clearly as a single commit, or alternatively the load-test runner change should have been part of the initial load-test commit.


✅ What's done well

  • NoOpSession correctly extends BaseSession and returns typed objects (not bare None) so aiogram deserialization doesn't fail — this is the right approach.
  • The @register_handler decorator with duplicate-name detection is clean and the Protocol-based UpdateFactory type hint is idiomatic.
  • Auto-discovery via importlib in __init__.py means adding a handler requires zero registry edits.
  • Rich panels / progress bar integration is well-composed and the plain-text fallback for saved reports is a nice touch.
  • Excluding the load-test package from both coverage and strict lint rules is appropriate and correctly scoped.
  • The README is detailed enough to onboard a new contributor without reading source code.

Summary

Category Finding
🔴 Critical DI container never closed → DB connection leak
🔴 Critical import inside function at end of runner.py
🟡 Suggestion Pre-allocating all updates in memory
🟡 Suggestion message_id=0 edge case in callback_language factory
🟡 Suggestion No error handling in handler auto-discovery
🔵 Minor typer in prod deps instead of dev
🔵 Minor One non-conventional commit message

Overall assessment: Request Changes — the container leak in particular should be addressed before merge, as repeated or large load test runs will leave stale connections. The rest are suggestions and can be addressed at discretion.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Feb 22, 2026

Copy link
Copy Markdown

PR Review: feat(load-test): add load test CLI with Typer, handler registry, and Rich output

Overall this is a solid, well-thought-out addition. The architecture is clean, the README is comprehensive, and the NoOpSession approach for intercepting Telegram API calls is the right design. Below is my structured feedback.

Critical Issues

1. DI container never closed - resource leak (runner.py)

The async container holds open database connection pool resources. After the test completes the container is never closed, so connections are leaked until the process exits. In run_load_test, make_async_container(...) is called but await container.close() is never invoked. Add cleanup:

try:
    await asyncio.gather(*tasks)
finally:
    await container.close()

2. typer added to main dependencies, not dev (pyproject.toml)

typer is a developer/load-testing tool that has no business being in production. It should go under [project.optional-dependencies.dev] (or a new load-test extras group), similar to pytest, ruff, etc. As-is, every production deployment pulls in typer, shellingham, annotated-doc, and click unnecessarily.

Important Suggestions

3. Module-level logging side effect in runner.py

This mutates a global logger at import time. Any code that imports runner (directly or transitively) silently suppresses aiogram event logs. Move this inside run_load_test(), restoring the original level afterward:

original_level = aiogram_logger.level
aiogram_logger.setLevel(logging.WARNING)
try:
    ...
finally:
    aiogram_logger.setLevel(original_level)

4. No input validation for total_updates and concurrency

Passing --concurrency 0 creates asyncio.Semaphore(0) which blocks forever. Passing --total 0 silently produces a "No data." report. Add min=1 constraints to the Typer options.

5. All updates materialised upfront - potential OOM for large tests

For 100,000+ updates, updates = [factory(...) for i in range(total_updates)] allocates every Update object in memory before processing begins. The README already flags this. A simple improvement: generate each update lazily inside _process_single_update by passing an index instead.

Minor Issues

6. Console() instantiated twice

report.py defines a module-level console = Console(), but runner.py creates a new one at the end just for a single print. Import and reuse the instance from report.py.

7. _compute_stats called twice with the same data

Both print_report and format_report independently call _compute_stats, each sorting the full latency list. Compute stats once in run_load_test and pass the result to both functions.

8. message_id=update_id - 1 can be 0 in callback_language.py

When update_id=1, message_id=0. Telegram message IDs start at 1. Use max(1, update_id - 1) or a fixed constant like message_id=1.

9. wall_elapsed includes progress bar teardown time

wall_elapsed = time.perf_counter() - wall_start is computed after the with progress: block exits. Move the measurement inside the block, immediately after await asyncio.gather(*tasks), for accuracy.

10. Missing newline at end of README.md

The diff shows \ No newline at end of file.

What's Done Well

  • NoOpSession design is exactly right - intercepting at the session level avoids deserialization errors and is far cleaner than per-method patching.
  • Handler auto-discovery via importlib glob in __init__.py is elegant; no registry file modifications needed to add a new handler.
  • Concurrency control via asyncio.Semaphore is the correct primitive.
  • README is thorough: usage examples, architecture, troubleshooting, and a step-by-step handler registration guide.
  • bot/main.py refactor - passing config to the Dispatcher constructor instead of start_polling is a cleaner pattern and the runner correctly mirrors it.
  • Rich report output is well structured with appropriate colour coding (p95/p99 highlighted).

Overall Assessment

Request changes. Two issues should be fixed before merge:

  1. Close the DI container after the test run (connection pool leak).
  2. Move typer to dev/optional dependencies (avoid polluting production installs).

Issues 3 (logging side effect) and 4 (input validation) are also worth addressing before merge. The remaining items are polish.

@itisnotyourenv itisnotyourenv merged commit f9566fe into main Feb 22, 2026
8 of 9 checks passed
@itisnotyourenv itisnotyourenv deleted the test-bot-layer branch February 22, 2026 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant