Conversation

@misrasaurabh1
Contributor

📄 338% (3.38x) speedup for _negotiate_grid_size in inference/core/utils/drawing.py

Saurabh's comments: Speeds up creation of tiles, which seems to be a core drawing operation
⏱️ Runtime: 251 microseconds → 57.3 microseconds (best of 338 runs)

📝 Explanation and details

The optimization achieves a 338% speedup by replacing expensive operations with more efficient alternatives and eliminating loops:

Key optimizations:

  1. Replace math.ceil(np.sqrt()) with math.isqrt(): The original code used NumPy's floating-point square root followed by math.ceil(), which is comparatively expensive. The optimized version uses Python's math.isqrt() for an efficient integer square root, adding 1 only when the image count is not a perfect square.

  2. Eliminate the while loop: Instead of iteratively decrementing proposed_rows until the grid fits, the optimized code computes the required rows directly with ceiling division: (images_len + proposed_columns - 1) // proposed_columns. This removes the repeated iterations and condition checks; both integer shortcuts are verified in the snippet after this list.

  3. Cache len(images): Store the length in images_len variable to avoid repeated function calls.
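
A small, self-contained check of the two integer shortcuts described above (the helper names ceil_sqrt and ceil_div are illustrative only, not part of the library):

```python
import math

def ceil_sqrt(n: int) -> int:
    # math.isqrt returns floor(sqrt(n)); add 1 unless n is a perfect square.
    r = math.isqrt(n)
    return r if r * r == n else r + 1

def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division, equivalent to math.ceil(a / b) for b > 0.
    return (a + b - 1) // b

for n in range(1, 10_000):
    assert ceil_sqrt(n) == math.ceil(math.sqrt(n))
    for cols in (1, 2, 3, 7, 32):
        assert ceil_div(n, cols) == math.ceil(n / cols)
```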

Performance impact by test case:

  • Small inputs (≤3 images): Slight overhead (~1-13% slower) due to additional variable assignment, but these cases use the fast single-row path anyway
  • Medium to large inputs (≥4 images): Dramatic speedups of 300-500% because they avoid the expensive np.sqrt() + math.ceil() combination and the iterative loop
  • Perfect squares: Particularly benefit from math.isqrt() efficiency
  • Large datasets (900+ images): Consistent 300-400% improvements due to eliminating loop iterations

The optimization is most effective for the common case where images need to be arranged in a multi-row grid, transforming an O(sqrt(n)) iterative process into O(1) direct calculation.
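
Putting the pieces together, a minimal before/after sketch reconstructed from this description (hypothetical function names; the actual code in inference/core/utils/drawing.py may differ in detail, and the constant value 3 is mirrored from the generated tests below), with a brute-force equivalence check:

```python
import math
from typing import List, Tuple

import numpy as np

MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3  # value taken from the generated tests

def negotiate_grid_size_original(images: List[np.ndarray]) -> Tuple[int, int]:
    # Original approach as described: float sqrt + ceil, then shrink rows in a loop.
    if len(images) <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        return 1, len(images)
    proposed_columns = math.ceil(np.sqrt(len(images)))
    proposed_rows = proposed_columns
    while proposed_columns * (proposed_rows - 1) >= len(images):
        proposed_rows -= 1
    return proposed_rows, proposed_columns

def negotiate_grid_size_optimized(images: List[np.ndarray]) -> Tuple[int, int]:
    # Optimized approach as described: integer sqrt + ceiling division, no loop.
    images_len = len(images)
    if images_len <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        return 1, images_len
    proposed_columns = math.isqrt(images_len)
    if proposed_columns * proposed_columns < images_len:
        proposed_columns += 1
    proposed_rows = (images_len + proposed_columns - 1) // proposed_columns
    return proposed_rows, proposed_columns

if __name__ == "__main__":
    for n in range(2000):
        imgs = [np.zeros((1, 1))] * n
        assert negotiate_grid_size_original(imgs) == negotiate_grid_size_optimized(imgs), n
    print("identical grids for 0..1999 images")
```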

Correctness verification report:

| Test                          | Status        |
|-------------------------------|---------------|
| ⚙️ Existing Unit Tests        | 🔘 None Found |
| 🌀 Generated Regression Tests | 71 Passed     |
| ⏪ Replay Tests               | 🔘 None Found |
| 🔎 Concolic Coverage Tests    | 🔘 None Found |
| 📊 Tests Coverage             | 100.0%        |
🌀 Generated Regression Tests and Runtime
import math
from typing import List, Tuple

import numpy as np
import pytest  # used for our unit tests

from inference.core.utils.drawing import _negotiate_grid_size

MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3

# unit tests

# Helper to create dummy images
def make_imgs(n):
    # Each image is a 1x1 numpy array
    return [np.zeros((1,1)) for _ in range(n)]

# 1. Basic Test Cases

def test_zero_images():
    # No images: should return (1,0)
    imgs = make_imgs(0)
    rows, cols = _negotiate_grid_size(imgs) # 464ns -> 510ns (9.02% slower)

def test_one_image():
    # One image: should return (1,1)
    imgs = make_imgs(1)
    rows, cols = _negotiate_grid_size(imgs) # 458ns -> 501ns (8.58% slower)

def test_two_images():
    # Two images: should return (1,2)
    imgs = make_imgs(2)
    rows, cols = _negotiate_grid_size(imgs) # 430ns -> 495ns (13.1% slower)

def test_three_images():
    # Three images: should return (1,3)
    imgs = make_imgs(3)
    rows, cols = _negotiate_grid_size(imgs) # 454ns -> 476ns (4.62% slower)

def test_four_images():
    # Four images: sqrt(4)=2, so 2x2 grid
    imgs = make_imgs(4)
    rows, cols = _negotiate_grid_size(imgs) # 8.65μs -> 1.25μs (591% faster)

def test_five_images():
    # Five images: ceil(sqrt(5))=3, try 3x3, but 3x2=6>=5, so rows=2, cols=3
    imgs = make_imgs(5)
    rows, cols = _negotiate_grid_size(imgs) # 5.47μs -> 994ns (451% faster)

def test_six_images():
    # Six images: ceil(sqrt(6))=3, try 3x3, 3x2=6>=6, so rows=2, cols=3
    imgs = make_imgs(6)
    rows, cols = _negotiate_grid_size(imgs) # 5.09μs -> 949ns (437% faster)

def test_seven_images():
    # Seven images: ceil(sqrt(7))=3, 3x3=9, 3x2=6<7, so rows=3, cols=3
    imgs = make_imgs(7)
    rows, cols = _negotiate_grid_size(imgs) # 4.73μs -> 959ns (393% faster)

def test_nine_images():
    # Nine images: sqrt(9)=3, so 3x3 grid
    imgs = make_imgs(9)
    rows, cols = _negotiate_grid_size(imgs) # 4.59μs -> 880ns (421% faster)

def test_ten_images():
    # Ten images: ceil(sqrt(10))=4, try 4x4=16, 4x3=12>=10, so rows=3, cols=4
    imgs = make_imgs(10)
    rows, cols = _negotiate_grid_size(imgs) # 4.71μs -> 926ns (408% faster)

# 2. Edge Test Cases

def test_max_columns_for_single_row_grid():
    # Exactly at MAX_COLUMNS_FOR_SINGLE_ROW_GRID
    imgs = make_imgs(MAX_COLUMNS_FOR_SINGLE_ROW_GRID)
    rows, cols = _negotiate_grid_size(imgs) # 470ns -> 477ns (1.47% slower)

def test_one_more_than_max_columns():
    # One more than MAX_COLUMNS_FOR_SINGLE_ROW_GRID
    imgs = make_imgs(MAX_COLUMNS_FOR_SINGLE_ROW_GRID + 1)
    nearest_sqrt = math.ceil(np.sqrt(MAX_COLUMNS_FOR_SINGLE_ROW_GRID + 1))
    # Should not be single row
    rows, cols = _negotiate_grid_size(imgs) # 1.98μs -> 1.02μs (93.9% faster)

def test_perfect_square_images():
    # 16 images: perfect square, should be 4x4
    imgs = make_imgs(16)
    rows, cols = _negotiate_grid_size(imgs) # 4.22μs -> 945ns (346% faster)

def test_just_over_perfect_square():
    # 17 images: ceil(sqrt(17))=5, 5x5=25, 5x4=20>=17, so rows=4, cols=5
    imgs = make_imgs(17)
    rows, cols = _negotiate_grid_size(imgs) # 4.75μs -> 942ns (404% faster)

def test_one_less_than_perfect_square():
    # 15 images: ceil(sqrt(15))=4, 4x4=16, 4x3=12<15, so rows=4, cols=4
    imgs = make_imgs(15)
    rows, cols = _negotiate_grid_size(imgs) # 4.49μs -> 955ns (370% faster)

def test_large_prime_number_images():
    # 997 images: prime, ceil(sqrt(997))=32, 32x32=1024, 32x31=992<997, so rows=32, cols=32
    imgs = make_imgs(997)
    rows, cols = _negotiate_grid_size(imgs) # 5.98μs -> 1.34μs (347% faster)

def test_large_even_number_images():
    # 1000 images: ceil(sqrt(1000))=32, 32x32=1024, 32x31=992<1000, so rows=32, cols=32
    imgs = make_imgs(1000)
    rows, cols = _negotiate_grid_size(imgs) # 6.04μs -> 1.30μs (365% faster)

def test_large_just_over_square():
    # 961 images: sqrt(961)=31, so 31x31 grid
    imgs = make_imgs(961)
    rows, cols = _negotiate_grid_size(imgs) # 6.49μs -> 1.37μs (375% faster)

def test_large_just_under_square():
    # 960 images: ceil(sqrt(960))=31, 31x31=961, 31x30=930<960, so rows=31, cols=31
    imgs = make_imgs(960)
    rows, cols = _negotiate_grid_size(imgs) # 6.61μs -> 1.25μs (431% faster)

# 3. Large Scale Test Cases

def test_large_scale_minimal_grid():
    # 999 images: ceil(sqrt(999))=32, 32x32=1024, 32x31=992<999, so rows=32, cols=32
    imgs = make_imgs(999)
    rows, cols = _negotiate_grid_size(imgs) # 6.21μs -> 1.26μs (391% faster)

def test_large_scale_maximal_grid():
    # 1000 images: ceil(sqrt(1000))=32, 32x32=1024, 32x31=992<1000, so rows=32, cols=32
    imgs = make_imgs(1000)
    rows, cols = _negotiate_grid_size(imgs) # 5.82μs -> 1.32μs (341% faster)

def test_large_scale_lower_bound():
    # 901 images: ceil(sqrt(901))=31, 31x30=930>=901 but 31x29=899<901, so rows=30, cols=31
    imgs = make_imgs(901)
    rows, cols = _negotiate_grid_size(imgs) # 6.57μs -> 1.28μs (412% faster)

def test_large_scale_upper_bound():
    # 999 images: as above, rows=32, cols=32
    imgs = make_imgs(999)
    rows, cols = _negotiate_grid_size(imgs) # 6.13μs -> 1.21μs (407% faster)

def test_large_scale_random_sizes():
    # Try a range of sizes from 950 to 999
    for n in [950, 960, 970, 980, 990, 999]:
        imgs = make_imgs(n)
        rows, cols = _negotiate_grid_size(imgs) # 12.9μs -> 3.13μs (312% faster)

# Additional edge case: test with empty arrays (should behave as with zero images)
def test_empty_arrays():
    imgs = []
    rows, cols = _negotiate_grid_size(imgs) # 504ns -> 562ns (10.3% slower)

# Additional edge case: test with non-square images (should not affect grid size)
def test_non_square_images():
    imgs = [np.zeros((2,3)) for _ in range(7)]
    rows, cols = _negotiate_grid_size(imgs) # 6.95μs -> 1.20μs (477% faster)

# Additional edge case: test with images of different shapes
def test_different_shapes():
    imgs = [np.zeros((i+1, i+2)) for i in range(10)]
    rows, cols = _negotiate_grid_size(imgs) # 5.00μs -> 937ns (434% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import math
from typing import List, Tuple

import numpy as np
import pytest  # used for our unit tests

from inference.core.utils.drawing import _negotiate_grid_size

MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3


# Helper function to create dummy images
def make_images(n):
    # Each image is a 1x1 array; content is irrelevant for grid sizing
    return [np.zeros((1, 1)) for _ in range(n)]

# ---------------- BASIC TEST CASES ----------------

def test_empty_list_returns_1_0():
    # Test with zero images
    images = make_images(0)
    rows, cols = _negotiate_grid_size(images) # 472ns -> 490ns (3.67% slower)

def test_one_image_returns_1_1():
    # Test with one image
    images = make_images(1)
    rows, cols = _negotiate_grid_size(images) # 449ns -> 458ns (1.97% slower)

def test_two_images_returns_1_2():
    # Test with two images
    images = make_images(2)
    rows, cols = _negotiate_grid_size(images) # 469ns -> 478ns (1.88% slower)

def test_three_images_returns_1_3():
    # Test with three images
    images = make_images(3)
    rows, cols = _negotiate_grid_size(images) # 454ns -> 458ns (0.873% slower)

def test_four_images_returns_2_2():
    # Test with four images (should be a 2x2 grid)
    images = make_images(4)
    rows, cols = _negotiate_grid_size(images) # 7.11μs -> 1.18μs (504% faster)

def test_five_images_returns_2_3():
    # Test with five images (should be 2 rows, 3 columns)
    images = make_images(5)
    rows, cols = _negotiate_grid_size(images) # 5.27μs -> 944ns (458% faster)

def test_six_images_returns_2_3():
    # Test with six images (should be 2 rows, 3 columns)
    images = make_images(6)
    rows, cols = _negotiate_grid_size(images) # 4.74μs -> 954ns (397% faster)

def test_seven_images_returns_3_3():
    # Test with seven images (should be 3 rows, 3 columns)
    images = make_images(7)
    rows, cols = _negotiate_grid_size(images) # 4.64μs -> 926ns (401% faster)

def test_eight_images_returns_3_3():
    # Test with eight images (should be 3 rows, 3 columns)
    images = make_images(8)
    rows, cols = _negotiate_grid_size(images) # 4.59μs -> 871ns (427% faster)

def test_nine_images_returns_3_3():
    # Test with nine images (perfect square)
    images = make_images(9)
    rows, cols = _negotiate_grid_size(images) # 4.64μs -> 923ns (403% faster)

# ---------------- EDGE TEST CASES ----------------

def test_max_columns_for_single_row_grid_boundary():
    # Test at the boundary of MAX_COLUMNS_FOR_SINGLE_ROW_GRID
    images = make_images(MAX_COLUMNS_FOR_SINGLE_ROW_GRID)
    rows, cols = _negotiate_grid_size(images) # 458ns -> 470ns (2.55% slower)

def test_just_above_max_columns_for_single_row_grid():
    # Test just above the boundary
    images = make_images(MAX_COLUMNS_FOR_SINGLE_ROW_GRID + 1)
    rows, cols = _negotiate_grid_size(images) # 5.15μs -> 984ns (423% faster)

def test_perfect_square_images():
    # Test with a perfect square number of images
    for n in [4, 9, 16, 25]:
        images = make_images(n)
        rows, cols = _negotiate_grid_size(images) # 8.16μs -> 1.97μs (314% faster)
        sqrt_n = int(math.sqrt(n))

def test_one_less_than_perfect_square():
    # Test with one less than a perfect square
    for n in [3, 8, 15, 24]:
        images = make_images(n)
        sqrt_n = math.ceil(math.sqrt(n))
        rows, cols = _negotiate_grid_size(images) # 6.55μs -> 1.85μs (254% faster)

def test_one_more_than_perfect_square():
    # Test with one more than a perfect square
    for n in [5, 10, 17, 26]:
        images = make_images(n)
        sqrt_n = math.ceil(math.sqrt(n))
        rows, cols = _negotiate_grid_size(images) # 7.84μs -> 1.82μs (331% faster)

def test_large_prime_number_images():
    # Test with a large prime number of images (to check non-square, non-divisible)
    n = 97
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images) # 4.01μs -> 893ns (349% faster)

def test_images_length_is_zero():
    # Edge case: zero images
    images = []
    rows, cols = _negotiate_grid_size(images) # 453ns -> 470ns (3.62% slower)

def test_negative_images_should_raise():
    # A negative number of images cannot be represented as a Python list, so there is
    # no negative-length input to construct; confirm the empty list is handled instead.
    images = []
    rows, cols = _negotiate_grid_size(images) # 439ns -> 465ns (5.59% slower)

def test_non_square_but_even_distribution():
    # Test for 12 images, should be 3x4 grid
    images = make_images(12)
    rows, cols = _negotiate_grid_size(images) # 6.59μs -> 1.07μs (517% faster)

def test_large_gap_between_rows_and_columns():
    # Test for 20 images, should be 4x5 grid
    images = make_images(20)
    rows, cols = _negotiate_grid_size(images) # 4.95μs -> 969ns (410% faster)

# ---------------- LARGE SCALE TEST CASES ----------------

def test_large_number_of_images_perfect_square():
    # Test with a large perfect square number of images
    n = 961  # 31x31
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images) # 5.95μs -> 1.29μs (360% faster)

def test_large_number_of_images_non_square():
    # Test with a large non-square number of images
    n = 999
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images) # 6.27μs -> 1.33μs (373% faster)

def test_large_number_of_images_one_less_than_square():
    # Test with one less than a large perfect square
    n = 960  # 961 - 1
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images) # 6.38μs -> 1.25μs (410% faster)

def test_large_number_of_images_one_more_than_square():
    # Test with one more than a large perfect square
    n = 962  # 961 + 1
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images) # 6.39μs -> 1.29μs (395% faster)

def test_maximum_allowed_images():
    # Test with the maximum allowed images (1000)
    n = 1000
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images) # 6.47μs -> 1.27μs (409% faster)

# ---------------- DETERMINISM AND CONSISTENCY ----------------

def test_determinism_multiple_calls():
    # Ensure multiple calls with same input return same output
    images = make_images(23)
    codeflash_output = _negotiate_grid_size(images); result1 = codeflash_output # 5.81μs -> 1.23μs (374% faster)
    codeflash_output = _negotiate_grid_size(images); result2 = codeflash_output # 1.38μs -> 415ns (232% faster)

def test_different_inputs_give_different_outputs():
    # Exercise two adjacent lengths (note: nearby lengths may still map to the same grid size)
    images1 = make_images(10)
    images2 = make_images(11)
    codeflash_output = _negotiate_grid_size(images1) # 4.41μs -> 905ns (388% faster)
    codeflash_output = _negotiate_grid_size(images2)

# ---------------- SANITY CHECKS ----------------

def test_output_types():
    # Ensure output types are always int
    images = make_images(7)
    rows, cols = _negotiate_grid_size(images) # 4.19μs -> 899ns (366% faster)
    images = make_images(0)
    rows, cols = _negotiate_grid_size(images) # 403ns -> 360ns (11.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-_negotiate_grid_size-mh2mp7zg` and push.

Codeflash
