Skip to content

feat: implement container pooling and fix transport error handling#551

Merged
jleni merged 7 commits intodevfrom
feat/pooling
Jul 20, 2025
Merged

feat: implement container pooling and fix transport error handling#551
jleni merged 7 commits intodevfrom
feat/pooling

Conversation

@jleni
Copy link
Copy Markdown
Member

@jleni jleni commented Jul 20, 2025

Summary

This PR implements container pooling for Zemu to improve performance and fixes transport error handling to fail fast instead of timing out.

Key Features

  1. Container Pooling (feat: Add container pooling for 60-80% faster test execution #549)

    • Pre-initialize Docker containers for each device model to eliminate 3-5 second startup delays
    • Containers are reused across test runs with proper cleanup between uses
    • Configurable pool sizes via environment variables
    • Backward compatible - pooling can be disabled via ZEMU_DISABLE_POOL=true
  2. Transport Error Handling

    • Fix issue where errors like 0x6984 (INVALID_DATA) or 0x6E00 (CLA_NOT_SUPPORTED) would cause timeouts
    • Implement fast-fail mechanism that detects critical transport errors immediately
    • Add comprehensive error tracking and propagation throughout the codebase

Changes

Container Pooling Implementation

  • src/containerPool.ts: New singleton class managing pooled containers
  • Pre-allocates containers for each device model (nanos, nanox, nanosp, stax, flex)
  • Implements container lifecycle management with acquire/release pattern
  • Handles container reset and ELF reloading between uses

Transport Error Handling

  • src/errors.ts: Define APDU status codes and error utilities
  • src/Zemu.ts:
    • Add transport proxy to intercept and track errors
    • Update waitUntilScreenIs(), waitUntilScreenIsNot(), and waitForText() to check for critical errors
    • Modify getEvents() to propagate transport errors instead of silently returning empty array

Configuration

  • Environment variables for pool configuration:
    • ZEMU_POOL_NANOS: Number of Nano S containers (default: 2)
    • ZEMU_POOL_NANOX: Number of Nano X containers (default: 1)
    • ZEMU_POOL_NANOSP: Number of Nano S+ containers (default: 1)
    • ZEMU_POOL_STAX: Number of Stax containers (default: 1)
    • ZEMU_POOL_FLEX: Number of Flex containers (default: 1)
    • ZEMU_DISABLE_POOL: Disable pooling entirely (default: false)

Tests

  • Comprehensive test coverage for both features
  • Error handling tests demonstrate fast-fail behavior (3-5ms vs multi-second timeouts)
  • All existing tests continue to pass

Performance Impact

  • Before: Each test container startup takes 3-5 seconds
  • After: Near-instant container acquisition from pool
  • Significant improvement for test suites with many individual test cases

Migration Guide

No breaking changes - the pooling feature is enabled by default but can be disabled:

// Disable pooling globally
process.env.ZEMU_DISABLE_POOL = 'true'

// Or per instance
const sim = new Zemu(elfPath, {
  containerPooling: false
})

Testing

All tests pass including new test suites:

  • tests/error-handling.test.ts: Validates fast-fail error handling
  • tests/error-handling-fixed.test.ts: Comprehensive error propagation tests
  • tests/basic.*.test.ts: All existing tests continue to work

Related Issues

  • Addresses performance concerns raised in various issues
  • Fixes timeout issues when transport errors occur

Checklist

  • Code follows project style guidelines
  • Tests added/updated and passing
  • Backward compatibility maintained
  • Documentation updated in code comments
  • No breaking changes

jleni added 6 commits July 20, 2025 11:21
- Fix YAML linting issues in GitHub workflow files
  - Remove trailing spaces in ci-ts.yaml
  - Add missing newlines at end of YAML files
- Fix typo: settingsNavnavLeftButton → settingsNavLeftButton
- Organize imports in containerPool.ts (external packages first)
- Remove unused model variable in cleanup() method
- Apply automated code formatting improvements
- Fix missing parentheses in tests/minapp/index.ts toString() call
- Add error handling to gRPC exchange promise chain
- Remove async modifier from globalsetup.ts (no await needed)
- Add race condition protection for pool initialization
- Document DEV_CERT_PRIVATE_KEY as development-only test key
- Fix trailing spaces in .github/dependabot.yml
- Add missing await to async operations in tests/pullImageKillOld.ts
- Wrap async calls in IIFE to prevent CI hanging on cleanup

This should resolve the hanging CI build and address the remaining
CodeRabbit linting issues.
- Add error tracking in transport layer via proxy interceptor
- Create errors.ts with APDU status codes and error handling utilities
- Propagate critical transport errors in waitUntilScreenIs, waitUntilScreenIsNot, and waitForText
- Update getEvents to check for critical transport errors before making requests
- Add comprehensive tests to verify fast-fail behavior

Previously, when errors like 0x6984 (INVALID_DATA) or 0x6E00 (CLA_NOT_SUPPORTED) occurred,
Zemu would continue waiting for timeouts instead of failing immediately. Now these errors
are detected and cause immediate failure with clear error messages.

Fixes issue where transport errors were absorbed causing unnecessary timeout delays.
The test was expecting 0x6984 (INVALID_DATA) but invalid CLA (0xFF) actually
triggers 0x6E00 (CLA_NOT_SUPPORTED) which is the correct behavior.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 20, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/pooling

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

- Remove unused imports in test files
- Fix import organization in Zemu.ts
- Fix formatting (newlines at end of files)
- Apply Biome linter rules

All tests passing after linter fixes.
@jleni jleni merged commit 006249c into dev Jul 20, 2025
3 checks passed
@jleni jleni deleted the feat/pooling branch July 20, 2025 14:41
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jleni maybe this file is not needed anymore? depends on the failure being copied to the original file or not

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about this one? Seems to be repeated too

Comment thread tests/pullImageKillOld.ts
Comment on lines +3 to +6
;(async () => {
await Zemu.checkAndPullImage()
await Zemu.stopAllEmuContainers()
})()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we add then/catch here, and log if errors?

jleni added a commit that referenced this pull request Jul 28, 2025
)

* feat: modernize Zemu codebase to Zondax standards (#547)

* modernize

* refactor: migrate to biome and standardize code style across project

* chore: update biome lint command and add fix script alias

* style: standardize quotes to single quotes and fix formatting

* refactor: replace forEach loops with for...of and remove unnecessary async keywords

* refactor: replace forEach loops with for...of and remove unnecessary async declarations

* ci: update workflow commands and disable coverage in ci-ts pipeline

* temporarily disable docker

* Modernize codebase (#548)

* activate testing in ci

* ci: remove frozen lockfile in CI workflow

* feat: implement container pooling and fix transport error handling (#551)

* feat: implement container pooling system for device emulation

* fix: address CodeRabbit review comments

- Fix YAML linting issues in GitHub workflow files
  - Remove trailing spaces in ci-ts.yaml
  - Add missing newlines at end of YAML files
- Fix typo: settingsNavnavLeftButton → settingsNavLeftButton
- Organize imports in containerPool.ts (external packages first)
- Remove unused model variable in cleanup() method
- Apply automated code formatting improvements

* fix: address additional CodeRabbit review comments

- Fix missing parentheses in tests/minapp/index.ts toString() call
- Add error handling to gRPC exchange promise chain
- Remove async modifier from globalsetup.ts (no await needed)
- Add race condition protection for pool initialization
- Document DEV_CERT_PRIVATE_KEY as development-only test key

* fix: resolve remaining CI issues and CodeRabbit comments

- Fix trailing spaces in .github/dependabot.yml
- Add missing await to async operations in tests/pullImageKillOld.ts
- Wrap async calls in IIFE to prevent CI hanging on cleanup

This should resolve the hanging CI build and address the remaining
CodeRabbit linting issues.

* fix: handle transport errors to fail fast instead of timing out

- Add error tracking in transport layer via proxy interceptor
- Create errors.ts with APDU status codes and error handling utilities
- Propagate critical transport errors in waitUntilScreenIs, waitUntilScreenIsNot, and waitForText
- Update getEvents to check for critical transport errors before making requests
- Add comprehensive tests to verify fast-fail behavior

Previously, when errors like 0x6984 (INVALID_DATA) or 0x6E00 (CLA_NOT_SUPPORTED) occurred,
Zemu would continue waiting for timeouts instead of failing immediately. Now these errors
are detected and cause immediate failure with clear error messages.

Fixes issue where transport errors were absorbed causing unnecessary timeout delays.

* fix: update error handling test to expect correct error code

The test was expecting 0x6984 (INVALID_DATA) but invalid CLA (0xFF) actually
triggers 0x6E00 (CLA_NOT_SUPPORTED) which is the correct behavior.

* fix: resolve linter issues

- Remove unused imports in test files
- Fix import organization in Zemu.ts
- Fix formatting (newlines at end of files)
- Apply Biome linter rules

All tests passing after linter fixes.

* feat: increase default container pool size to 2 for all devices

Changed default pool configuration from 1 to 2 containers per device type
(except nanos which was already 2). This provides better parallelism for
test suites that run multiple tests concurrently.

* fix: add explicit permissions to GitHub workflows

Address security warnings by adding explicit GITHUB_TOKEN permissions:
- ci-ts.yaml: contents: read (minimal permissions for CI)
- publish.yml: contents: read, packages: write (for npm publishing)

This follows GitHub security best practices to limit token permissions.

* fix: address CodeRabbit review comments

- Add missing newline at EOF in .github/workflows/publish.yml
- Replace console.log with proper error handling in buttons.ts
- Improve error handling in Zemu.ts close method with try-catch-finally
- Add setScrambleKey support to transport proxy for comprehensive error tracking
- Remove debugging console.log statements from tests
- Remove skipped test that was only for exploration
- Replace console.warn with process.stderr.write for cleaner logging

* test: align metadata and comments to CLA_NOT_SUPPORTED (0x6E00)

- Update test title to reference 0x6E00 (CLA_NOT_SUPPORTED) instead of 0x6984
- Update comment before transport.send to reference CLA_NOT_SUPPORTED
- Update comment on thrown error to reference CLA_NOT_SUPPORTED

This aligns test documentation with the actual error code (0x6E00) that
the test assertion validates, removing confusion about which error is
being tested.

* fix: correct property name from containerPooling to disablePool in tests

- Update tests/error-handling-simple.test.ts to use disablePool: true
- Update tests/error-handling.test.ts to use disablePool: true
- Update tests/error-handling-fixed.test.ts to use disablePool: true

The property name now matches the IStartOptions interface definition
which defines disablePool?: boolean, not containerPooling.

* fix: resolve ButtonKind errors for non-touch devices

- Create dummyButton constant to avoid calling getTouchElement for nanos/nanox/nanosp
- Add device type checks before calling getTouchElement in Zemu.ts navigation
- Fix TouchNavigation usage to check isTouchDevice before instantiation
- Non-touch devices now use ActionKind.RightClick with dummy button for navigation
- Remove hardcoded getTouchElement call in scheduleToNavElement function

This resolves 'Unsupported ButtonKind: nanos, 1' and 'Unsupported ButtonKind: nanos, 2' errors
when using ButtonKind.QuitAppButton and ButtonKind.SwipeContinueButton with Nano devices.

* feat: implement automatic container cleanup to prevent port conflicts

- Add cleanupStaleContainers() method to ContainerPool that runs before initialization
- Use Docker API to find and remove stale zemu containers automatically
- Enhanced global test setup with comprehensive process exit handlers
- Added handlers for SIGINT, SIGTERM, beforeExit, uncaughtException, and unhandledRejection
- Prevents port allocation conflicts by cleaning up leftover containers from previous runs
- No more manual container cleanup required - the library manages this automatically

This resolves port conflicts like 'Bind for 0.0.0.0:15000 failed: port is already allocated'
that occurred when containers weren't properly cleaned up from interrupted test runs.

* fix: add explicit support for prerelease tags while maintaining security

- Keep secure numeric-only patterns (rejecting unsafe CodeRabbit suggestion v*.*.*)
- Add explicit patterns for common prerelease formats:
  - v1.2.3-rc1, v1.2.3-rc2 (release candidates)
  - v1.2.3-beta.1, v1.2.3-beta.2 (beta releases)
  - v1.2.3-alpha.1, v1.2.3-alpha.2 (alpha releases)
- Maintain backwards compatibility with existing patterns
- General v[0-9]+.[0-9]+.[0-9]+-* pattern still covers other formats

This ensures -rc1 support while preventing dangerous patterns like v*.*.*
that could match malicious strings like vMALICIOUS.CODE.INJECTION-hack.

* refactor: simplify tag patterns with clear comments

- Use concise patterns with descriptive comments
- Main releases: v1, v1.2, v1.2.3
- Prereleases: v1.2.3-rc1, v1.2.3-beta.1, etc.
- Remove redundant explicit patterns - general v[0-9]+.[0-9]+.[0-9]+-* covers all prerelease formats
- Maintain security by requiring numeric versions only

* fix: improve error handling and code formatting in container pool initialization

* adjust github token permission settings

* feat: add comprehensive APDU error codes to errors module

- Add BUSY (0x9001) for device busy state
- Add EXECUTION_ERROR (0x6400) for execution failures
- Add EMPTY_BUFFER (0x6982) for empty buffer conditions
- Add OUTPUT_BUFFER_TOO_SMALL (0x6983) for buffer size issues
- Add TX_NOT_INITIALIZED (0x6987) for uninitialized transactions
- Add BAD_KEY_HANDLE (0x6A80) for invalid key handles
- Add SIGN_VERIFY_ERROR (0x6F01) for signature verification failures
- Update critical transport errors list with new error codes
- Add human-readable messages for all new error codes

* refactor: consolidate duplicate dummyButton definitions into single export

- Export dummyButton from buttons.ts module
- Remove duplicate definitions from Zemu.ts and actions.ts
- Update imports to use the shared dummyButton export
- Reduces code duplication and improves maintainability

* fix: update GitHub workflow permissions for npm publishing

- Change permissions from packages:write to id-token:write
- Required for npm provenance and OIDC authentication
- Addresses security recommendation from PR review

* docs: update README with latest features and improvements

- Update CI badge to point to new workflow (ci-ts.yaml)
- Add support for Stax and Flex devices in overview
- Update features list with container pooling and TypeScript support
- Add Vitest to supported test frameworks
- Include installation instructions with pnpm support
- Add quick start code example
- Highlight enhanced error handling capabilities
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants