Feat: implement dynamic batch sizing that scales with VRAM per worker #1034

Open
bazzi24 wants to merge 1 commit into datalab-to:master from bazzi24:feat-balance-memory

bazzi24 commented May 5, 2026

Dynamic Batch Sizing Based on VRAM

This PR implements dynamic batch size scaling in marker/utils/batch.py to optimize performance across different GPU configurations. Previously, batch sizes were hardcoded for multi-worker scenarios and not optimized for single-worker high-VRAM setups.

Problem

The original get_batch_sizes_worker_counts() function returned fixed batch sizes when workers > 1, but:

  • High-VRAM single-GPU setups (e.g., 24GB, 48GB, 80GB) got no batch size optimization
  • Multi-worker configurations with different VRAM capacities all used the same batch sizes regardless of actual VRAM per worker
  • No visibility into what configuration was chosen at runtime

Solution

The function now calculates batch sizes dynamically by:

1/ Scaling factor calculation: scale = vram_per_worker / 7.0

  • 7GB is the reference VRAM per worker for which batch sizes were originally calibrated

2/ Proportional scaling: All batch sizes are multiplied by the scale factor

  • layout_batch_size: 12 → scaled
  • detection_batch_size: 8 → scaled
  • recognition_batch_size: 64 → scaled
  • etc.

3/ Safety caps:

  • Minimum scale: 1.0x (never go below baseline)
  • Maximum scale: 4.0x (prevent OOM from aggressive batching)

4/ Single-worker high-VRAM support:

  • If VRAM > 10GB for a single worker, apply scaling
  • Otherwise return empty dict (use existing defaults for backward compatibility)

5/ Minimum batch sizes:

  • Ensures each model has sensible lower bounds (2-4) to avoid degenerate cases

6/ Logging:

  • INFO: Shows VRAM, workers, VRAM/worker, and scale factor
  • DEBUG: Shows full batch size dictionary
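
Steps 1 through 6 can be sketched roughly as follows. This is a minimal illustration, not the code in marker/utils/batch.py: the function name `sketch_batch_sizes`, its signature, the round-to-nearest rule, and the single floor of 2 are assumptions. Only the 7GB reference, the 1.0x and 4.0x caps, the 10GB single-worker threshold, and the three base batch sizes come from the description above.

```python
import logging

logger = logging.getLogger(__name__)

REFERENCE_VRAM_GB = 7.0            # reference VRAM per worker (from the PR)
MIN_SCALE, MAX_SCALE = 1.0, 4.0    # safety caps (from the PR)
SINGLE_WORKER_THRESHOLD_GB = 10.0  # single-worker cutoff (from the PR)
MIN_BATCH_SIZE = 2                 # assumption: the PR mentions per-model floors of 2-4

# Baseline values listed in the PR; the real function covers more models ("etc.").
BASE_BATCH_SIZES = {
    "layout_batch_size": 12,
    "detection_batch_size": 8,
    "recognition_batch_size": 64,
}


def sketch_batch_sizes(total_vram_gb: float, workers: int) -> dict:
    """Hypothetical helper: scale baseline batch sizes by VRAM per worker."""
    workers = max(1, workers)
    vram_per_worker = total_vram_gb / workers

    # Single worker at or below the threshold: keep the builders' own defaults.
    if workers == 1 and vram_per_worker <= SINGLE_WORKER_THRESHOLD_GB:
        return {}

    # Clamp the scale factor into [MIN_SCALE, MAX_SCALE].
    scale = min(MAX_SCALE, max(MIN_SCALE, vram_per_worker / REFERENCE_VRAM_GB))
    logger.info(
        "VRAM=%.1fGB workers=%d VRAM/worker=%.1fGB scale=%.2fx",
        total_vram_gb, workers, vram_per_worker, scale,
    )

    # Rounding to the nearest integer is an assumption, not confirmed by the PR.
    sizes = {
        name: max(MIN_BATCH_SIZE, round(base * scale))
        for name, base in BASE_BATCH_SIZES.items()
    }
    logger.debug("batch sizes: %s", sizes)
    return sizes
```

Under these assumptions, `sketch_batch_sizes(8, 1)` returns an empty dict, while a 2-worker, 14GB setup reproduces the unscaled baseline.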

Example Configurations

| GPU Setup | Workers | VRAM/Worker | Scale | layout_batch_size |
|---|---|---|---|---|
| 14GB (2x7) | 2 | 7.0GB | 1.0x | 12 |
| 16GB (2x8) | 2 | 8.0GB | 1.14x | 14 |
| 24GB (1x24) | 1 | 24.0GB | 3.43x | 41 |
| 48GB (1x48) | 1 | 48.0GB | 4.0x (capped) | 48 |
| 80GB (1x80) | 1 | 80.0GB | 4.0x (capped) | 48 |
| 80GB (11x7.3) | 11 | 7.3GB | 1.04x | 12-13 |
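
The arithmetic behind these rows can be checked with a few lines of Python. The helper names `scale_for` and `layout_batch` are hypothetical, and rounding to the nearest integer is an assumption; under that assumption the 11-worker row comes out at 12, the lower end of the 12-13 range given.

```python
# (total VRAM in GB, worker count) for each example configuration
ROWS = [(14, 2), (16, 2), (24, 1), (48, 1), (80, 1), (80, 11)]


def scale_for(total_vram_gb: float, workers: int) -> float:
    """VRAM per worker divided by the 7GB reference, clamped to [1.0, 4.0]."""
    return min(4.0, max(1.0, (total_vram_gb / workers) / 7.0))


def layout_batch(total_vram_gb: float, workers: int, base: int = 12) -> int:
    """Baseline layout batch size (12) times the scale, rounded (assumed rule)."""
    return round(base * scale_for(total_vram_gb, workers))


for total, workers in ROWS:
    print(
        f"{total}GB / {workers} workers: "
        f"{scale_for(total, workers):.2f}x, layout_batch_size={layout_batch(total, workers)}"
    )
```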

Backward Compatibility

  • Single worker with ≤10GB VRAM returns {} as before (uses existing defaults in each builder)
  • Multi-worker configurations maintain the same baseline (1.0x scale) when VRAM/worker ≈ 7GB
  • All existing configuration options (--config_json overrides) still take precedence
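
As an illustration of the precedence rule in the last bullet, user-supplied values can be layered on top of the computed ones. How marker actually merges --config_json values is not shown in this PR, so the helper `apply_overrides` below is purely hypothetical.

```python
def apply_overrides(computed: dict, user_config: dict) -> dict:
    """Hypothetical merge: any key the user sets explicitly wins over the computed value."""
    explicit = {key: value for key, value in user_config.items() if value is not None}
    # Later entries in a dict merge overwrite earlier ones, so user values take precedence.
    return {**computed, **explicit}


computed = {"layout_batch_size": 41, "detection_batch_size": 27}
merged = apply_overrides(computed, {"layout_batch_size": 16})
print(merged)
```

Because an empty computed dict merges cleanly, the same call also covers the single-worker case where {} is returned and only user values and builder defaults apply.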

Testing

Added comprehensive test suite in tests/utils/test_batch.py:

  • 18 test cases covering single-worker, multi-worker, edge cases
  • Tests for scaling calculations, minimum batch sizes, CPU worker counts
  • Edge cases: zero VRAM, negative VRAM, fractional workers

Performance Impact

  • High-VRAM single GPU: Expected throughput improvement from 3-4x larger batches (up to ~3-4x if throughput scales linearly with batch size)
  • Standard multi-GPU: No change from baseline
  • Mixed VRAM setups: Slight automatic optimization based on actual VRAM per worker

Files Changed

  • marker/utils/batch.py - Core implementation
  • tests/utils/test_batch.py - Test suite (new)

github-actions Bot commented May 5, 2026

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

bazzi24 commented May 5, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 5, 2026