-
Notifications
You must be signed in to change notification settings - Fork 1.1k
test codeflash #4098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test codeflash #4098
Conversation
| for shape in shapes: | ||
| b = [x * x for x in range(200)] | ||
| # Inefficient: Calculating pi in every iteration | ||
| pi = 3.14159265359 | ||
| if hasattr(shape, "type") and shape.type == "circle": | ||
| total_area += ( | ||
| pi * (shape.radius**2) * len(b) | ||
| ) # Use len(b) instead of multiplying by the list | ||
| elif hasattr(shape, "type") and shape.type == "square": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️Codeflash found 4,897% (48.97x) speedup for calculate_total_area in unstructured/documents/mappings.py
⏱️ Runtime : 65.5 milliseconds → 1.31 milliseconds (best of 238 runs)
📝 Explanation and details
The optimized code achieves a 4897% speedup by eliminating the most expensive operations from the loop:
Key optimizations:
-
Removed list comprehension from loop: The original code created
b = [x * x for x in range(200)]in every iteration, which was the primary bottleneck (78.6% of runtime). The optimized version replaceslen(b)with the constant200, eliminating this O(n) operation entirely. -
Moved pi calculation outside loop: Instead of recalculating
pi = 3.14159265359in every iteration (3.6% of runtime), it's now computed once before the loop starts. -
Optimized attribute access: Replaced two
hasattr(shape, "type")calls andshape.typeaccesses per iteration with a singlegetattr(shape, "type", None)call, reducing attribute lookup overhead.
Performance impact by test case type:
- Single shape tests: 500-625% faster - benefit mainly from eliminating the list comprehension
- Multiple shape tests: 735-1305% faster - compound benefits as the loop runs more iterations
- Large scale tests (1000+ shapes): 4600-5400% faster - massive gains due to eliminating the O(n) list creation that was executed thousands of times
The optimization is most effective for workloads with many shapes, where the eliminated list comprehension was being executed repeatedly. The changes maintain identical functionality while dramatically reducing computational complexity from O(n*m) to O(n), where n is the number of shapes and m is the list size (200).
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 🌀 Generated Regression Tests | ✅ 43 Passed |
| 📊 Tests Coverage | 100.0% |
🔎 Concolic Coverage Tests and Runtime
🌀 Generated Regression Tests and Runtime
from typing import Any, List
# imports
import pytest # used for our unit tests
from unstructured.documents.mappings import calculate_total_area
# Helper classes for shapes
class Circle:
def __init__(self, radius: float):
self.type = "circle"
self.radius = radius
class Square:
def __init__(self, side: float):
self.type = "square"
self.side = side
class Rectangle:
def __init__(self, width: float, height: float):
self.type = "rectangle"
self.width = width
self.height = height
class NoTypeShape:
def __init__(self, radius: float):
self.radius = radius
# ===========================
# Basic Test Cases
# ===========================
def test_empty_list_returns_zero():
# Test with empty list
codeflash_output = calculate_total_area([]) # 611ns -> 622ns (1.77% slower)
def test_none_returns_zero():
# Test with None input
codeflash_output = calculate_total_area(None) # 741ns -> 721ns (2.77% faster)
def test_single_circle_area():
# Test with one circle of radius 1
expected = 3.14159265359 * 1**2 * 200
codeflash_output = pytest.approx(calculate_total_area([Circle(1)]), rel=1e-12) # 10.0μs -> 1.53μs (554% faster)
def test_single_square_area():
# Test with one square of side 2
expected = 2**2
codeflash_output = calculate_total_area([Square(2)]) # 10.0μs -> 1.38μs (625% faster)
def test_multiple_shapes():
# Test with multiple shapes: one circle (r=1), one square (s=2)
expected = 3.14159265359 * 1**2 * 200 + 2**2
codeflash_output = pytest.approx(calculate_total_area([Circle(1), Square(2)]), rel=1e-12) # 18.2μs -> 1.83μs (894% faster)
def test_multiple_circles_and_squares():
# Test with two circles (r=2, r=3) and two squares (s=4, s=5)
expected = (
3.14159265359 * 2**2 * 200 +
3.14159265359 * 3**2 * 200 +
4**2 +
5**2
)
shapes = [Circle(2), Circle(3), Square(4), Square(5)]
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 32.1μs -> 2.29μs (1305% faster)
# ===========================
# Edge Test Cases
# ===========================
def test_circle_with_zero_radius():
# Circle with radius 0 should contribute 0 area
codeflash_output = calculate_total_area([Circle(0)]) # 9.54μs -> 1.25μs (662% faster)
def test_square_with_zero_side():
# Square with side 0 should contribute 0 area
codeflash_output = calculate_total_area([Square(0)]) # 9.75μs -> 1.31μs (642% faster)
def test_negative_radius_circle():
# Circle with negative radius should still calculate area (radius squared)
expected = 3.14159265359 * (-2)**2 * 200
codeflash_output = pytest.approx(calculate_total_area([Circle(-2)]), rel=1e-12) # 9.42μs -> 1.30μs (623% faster)
def test_negative_side_square():
# Square with negative side should still calculate area (side squared)
expected = (-3)**2
codeflash_output = calculate_total_area([Square(-3)]) # 9.63μs -> 1.33μs (623% faster)
def test_shape_without_type_attribute():
# Shape without 'type' attribute should be ignored
shapes = [NoTypeShape(5)]
codeflash_output = calculate_total_area(shapes) # 8.81μs -> 1.00μs (779% faster)
def test_shape_with_unrecognized_type():
# Shape with type not 'circle' or 'square' should be ignored
shapes = [Rectangle(2, 3)]
codeflash_output = calculate_total_area(shapes) # 8.93μs -> 781ns (1043% faster)
def test_shape_with_none_attributes():
# Shape with type, but missing required attributes
class BadCircle:
def __init__(self):
self.type = "circle"
shapes = [BadCircle()]
with pytest.raises(AttributeError):
calculate_total_area(shapes) # 9.23μs -> 2.58μs (257% faster)
def test_shape_is_integer():
# List containing an integer as a shape
shapes = [42]
codeflash_output = calculate_total_area(shapes) # 9.33μs -> 1.17μs (696% faster)
def test_shape_is_dict():
# List containing a dict as a shape
shapes = [{"type": "circle", "radius": 2}]
codeflash_output = calculate_total_area(shapes) # 9.21μs -> 1.04μs (784% faster)
def test_shape_with_extra_attributes():
# Circle with extra attributes should still work
class FancyCircle:
def __init__(self, radius):
self.type = "circle"
self.radius = radius
self.color = "red"
shapes = [FancyCircle(1)]
expected = 3.14159265359 * 1**2 * 200
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 10.1μs -> 1.52μs (562% faster)
# ===========================
# Large Scale Test Cases
# ===========================
def test_many_circles():
# Test with 1000 circles of radius 1
shapes = [Circle(1) for _ in range(1000)]
expected = 1000 * 3.14159265359 * 1**2 * 200
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 6.88ms -> 138μs (4869% faster)
def test_many_squares():
# Test with 1000 squares of side 2
shapes = [Square(2) for _ in range(1000)]
expected = 1000 * 2**2
codeflash_output = calculate_total_area(shapes) # 6.94ms -> 125μs (5452% faster)
def test_mixed_large_shapes():
# 500 circles (r=1) + 500 squares (s=2)
shapes = [Circle(1)] * 500 + [Square(2)] * 500
expected = 500 * 3.14159265359 * 1**2 * 200 + 500 * 2**2
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 6.80ms -> 135μs (4930% faster)
def test_large_varied_shapes():
# 333 circles (r=2), 333 squares (s=3), 334 circles (r=1)
shapes = [Circle(2)] * 333 + [Square(3)] * 333 + [Circle(1)] * 334
expected = (
333 * 3.14159265359 * 2**2 * 200 +
333 * 3**2 +
334 * 3.14159265359 * 1**2 * 200
)
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 6.81ms -> 137μs (4852% faster)
def test_large_list_with_invalid_shapes():
# 995 valid circles, 5 invalid shapes
shapes = [Circle(1)] * 995 + [Rectangle(2, 3)] * 5
expected = 995 * 3.14159265359 * 1**2 * 200
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 6.73ms -> 140μs (4680% faster)
def test_large_list_with_mixed_types():
# 500 circles, 250 squares, 100 rectangles, 100 dicts, 50 NoTypeShape
shapes = (
[Circle(1)] * 500 +
[Square(2)] * 250 +
[Rectangle(2, 3)] * 100 +
[{"type": "circle", "radius": 2}] * 100 +
[NoTypeShape(5)] * 50
)
expected = 500 * 3.14159265359 * 1**2 * 200 + 250 * 2**2
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 6.70ms -> 122μs (5379% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Any, List
# imports
import pytest # used for our unit tests
from unstructured.documents.mappings import calculate_total_area
# Helper classes for test shapes
class Circle:
def __init__(self, radius):
self.type = "circle"
self.radius = radius
class Square:
def __init__(self, side):
self.type = "square"
self.side = side
class WeirdShape:
def __init__(self):
self.type = "triangle"
self.base = 3
self.height = 4
class NoTypeShape:
def __init__(self):
self.radius = 5
# ---------------------------
# Basic Test Cases
# ---------------------------
def test_empty_list_returns_zero():
# Test with empty list
codeflash_output = calculate_total_area([]) # 581ns -> 601ns (3.33% slower)
def test_none_returns_zero():
# Test with None input
codeflash_output = calculate_total_area(None) # 721ns -> 722ns (0.139% slower)
def test_single_square_area():
# Test with a single square
sq = Square(5)
codeflash_output = calculate_total_area([sq]) # 9.92μs -> 1.63μs (507% faster)
def test_single_circle_area():
# Test with a single circle
c = Circle(2)
# Area = pi * r^2 * len(b) where len(b) == 200
expected = 3.14159265359 * (2**2) * 200
codeflash_output = pytest.approx(calculate_total_area([c]), rel=1e-12) # 9.94μs -> 1.66μs (498% faster)
def test_multiple_squares():
# Test with multiple squares
squares = [Square(1), Square(2), Square(3)]
expected = 1*1 + 2*2 + 3*3
codeflash_output = calculate_total_area(squares) # 24.3μs -> 2.09μs (1061% faster)
def test_multiple_circles():
# Test with multiple circles
circles = [Circle(1), Circle(2)]
expected = 3.14159265359 * (1**2) * 200 + 3.14159265359 * (2**2) * 200
codeflash_output = pytest.approx(calculate_total_area(circles), rel=1e-12) # 17.4μs -> 1.94μs (793% faster)
def test_mixed_shapes():
# Test with a mix of circles and squares
shapes = [Square(2), Circle(1)]
expected = 4 + 3.14159265359 * (1**2) * 200
codeflash_output = pytest.approx(calculate_total_area(shapes), rel=1e-12) # 17.4μs -> 2.08μs (735% faster)
# ---------------------------
# Edge Test Cases
# ---------------------------
def test_square_with_zero_side():
# Test with a square of side 0
sq = Square(0)
codeflash_output = calculate_total_area([sq]) # 9.37μs -> 1.43μs (554% faster)
def test_circle_with_zero_radius():
# Test with a circle of radius 0
c = Circle(0)
codeflash_output = calculate_total_area([c]) # 9.61μs -> 1.50μs (539% faster)
def test_negative_square_side():
# Test with a square of negative side
sq = Square(-3)
# Area = (-3)^2 = 9
codeflash_output = calculate_total_area([sq]) # 9.39μs -> 1.35μs (594% faster)
def test_negative_circle_radius():
# Test with a circle of negative radius
c = Circle(-4)
# Area = pi * (-4)^2 * 200
expected = 3.14159265359 * (16) * 200
codeflash_output = pytest.approx(calculate_total_area([c]), rel=1e-12) # 9.59μs -> 1.45μs (560% faster)
def test_shape_with_missing_type():
# Shape without 'type' attribute should be ignored
s = NoTypeShape()
codeflash_output = calculate_total_area([s]) # 8.94μs -> 1.10μs (711% faster)
def test_shape_with_unknown_type():
# Shape with unknown 'type' should be ignored
s = WeirdShape()
codeflash_output = calculate_total_area([s]) # 8.97μs -> 832ns (978% faster)
def test_shape_with_extra_attributes():
# Shape with extra attributes but correct type
class FancySquare(Square):
def __init__(self, side):
super().__init__(side)
self.color = "red"
sq = FancySquare(4)
codeflash_output = calculate_total_area([sq]) # 9.69μs -> 1.41μs (586% faster)
def test_shape_with_missing_attributes():
# Circle missing radius attribute should raise AttributeError
class BadCircle:
def __init__(self):
self.type = "circle"
shapes = [BadCircle()]
with pytest.raises(AttributeError):
calculate_total_area(shapes) # 9.14μs -> 2.52μs (262% faster)
def test_shape_with_non_numeric_attributes():
# Circle with non-numeric radius should raise TypeError
class BadCircle:
def __init__(self):
self.type = "circle"
self.radius = "not_a_number"
shapes = [BadCircle()]
with pytest.raises(TypeError):
calculate_total_area(shapes) # 9.23μs -> 2.44μs (278% faster)
def test_shape_is_not_an_object():
# Passing a dict instead of an object should be ignored
shapes = [{"type": "circle", "radius": 5}]
codeflash_output = calculate_total_area(shapes) # 8.93μs -> 1.06μs (741% faster)
# ---------------------------
# Large Scale Test Cases
# ---------------------------
def test_large_number_of_squares():
# Test with 1000 squares of side 1
squares = [Square(1) for _ in range(1000)]
expected = 1000 * 1
codeflash_output = calculate_total_area(squares) # 6.89ms -> 124μs (5423% faster)
def test_large_number_of_circles():
# Test with 500 circles of radius 2
circles = [Circle(2) for _ in range(500)]
expected = 500 * 3.14159265359 * (2**2) * 200
codeflash_output = pytest.approx(calculate_total_area(circles), rel=1e-12) # 3.49ms -> 71.4μs (4788% faster)
def test_large_mixed_shapes():
# 500 squares (side 2), 500 circles (radius 1)
squares = [Square(2) for _ in range(500)]
circles = [Circle(1) for _ in range(500)]
expected = 500 * 4 + 500 * 3.14159265359 * (1**2) * 200
codeflash_output = pytest.approx(calculate_total_area(squares + circles), rel=1e-12) # 6.92ms -> 133μs (5076% faster)
def test_performance_large_input():
# Test that function runs efficiently for 1000 shapes
import time
shapes = [Square(1) for _ in range(500)] + [Circle(2) for _ in range(500)]
start = time.time()
codeflash_output = calculate_total_area(shapes); result = codeflash_output # 7.03ms -> 132μs (5215% faster)
end = time.time()
# Check correctness
expected = 500 * 1 + 500 * 3.14159265359 * (2**2) * 200
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from unstructured.documents.mappings import calculate_total_area
def test_calculate_total_area():
calculate_total_area(shapes=[0])
def test_calculate_total_area_2():
calculate_total_area(shapes=None)To test or edit this optimization locally git merge codeflash/optimize-pr4098-2025-09-23T20.06.24
| for shape in shapes: | |
| b = [x * x for x in range(200)] | |
| # Inefficient: Calculating pi in every iteration | |
| pi = 3.14159265359 | |
| if hasattr(shape, "type") and shape.type == "circle": | |
| total_area += ( | |
| pi * (shape.radius**2) * len(b) | |
| ) # Use len(b) instead of multiplying by the list | |
| elif hasattr(shape, "type") and shape.type == "square": | |
| # Inefficient: Calculating pi in every iteration | |
| pi = 3.14159265359 | |
| for shape in shapes: | |
| shape_type = getattr(shape, "type", None) | |
| if shape_type == "circle": | |
| total_area += ( | |
| pi * (shape.radius**2) * 200 | |
| ) # Use len(b) instead of multiplying by the list | |
| elif shape_type == "square": |
No description provided.