feat(annotators): enhance label annotators with frame boundary adjust… #1820
base: develop
Conversation
…ments and new base class

- ensures labels are within frame
- may have a few overlaps at edges in very busy frames when smart_pos is enabled, but running smart_pos first yields better results
Hello @hidara2000, thank you for this awesome PR!
I made my initial quick comments about certain changes. Let me test as well.
Co-authored-by: Onuralp SEZER <[email protected]>
Makes sense. Changes ticked off. Cheers for a great tool!
Hi @hidara2000 👋🏻 Huge thanks for deciding to submit a PR to introduce this change! I have a couple of points I'd like to discuss before I dive deeper into the PR review: Wouldn't it be a better approach to keep the
For some time now, I've wanted to add support for multiline labels / label wrapping. Considering you're completely rewriting both label annotators, would you be willing to add support for multiline labels / label wrapping as part of this PR?
📝 Add Multiline Text Support to Label Annotators

🔄 Updates to Previous PR

This extends my previous PR that added frame boundary adjustments by incorporating support for multiline text in label annotators. The implementation now properly handles both newlines in text and automatic text wrapping.

✨ New Features

🛠️ Implementation Details
📊 Before/After Comparison

📚 Usage Example

```python
# Create a label annotator with multiline text support
label_annotator = sv.LabelAnnotator(
    text_padding=10,
    smart_position=True,  # Works with existing smart positioning
    max_line_length=20  # Enable text wrapping at 20 characters
)

# Labels can have manual newlines or will auto-wrap
labels = [
    "Car\nLicense: ABC-123",  # Manual newlines
    "This is a very long label that will be wrapped automatically"  # Auto-wrapped
]

# Use as normal
annotated_image = label_annotator.annotate(
    scene=image,
    detections=detections,
    labels=labels
)
```

🧪 Test Code

Here's the code I used to test the multiline text support:

```python
def process_image_with_supervision(
    image: np.ndarray,
    display_image: bool = True,
    text_position: sv.Position = sv.Position.TOP_LEFT,
    smart_position: bool = False,
    detections: Optional[sv.Detections] = None,
) -> None:
    # 1. Simulate YOLO model output or use provided
    if detections is None:
        bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(
            image.shape
        )
        detections = sv.Detections(
            xyxy=bounding_boxes,
            confidence=confidence_scores,
            class_id=class_labels,
        )

    # 2. Create annotators
    box_annotator = BoxAnnotator(thickness=2)
    class_names = ["This is\na\ncar", "This is a really really really long label"]
    label_annotator_smart = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        text_position=text_position,
        smart_position=True,
        max_line_length=12,  # Enable text wrapping at 12 characters
    )
    label_annotator_not_smart = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        text_position=text_position,
        smart_position=False,
    )

    # 3. Annotate the image with both configurations
    annotated_image_smart = box_annotator.annotate(image.copy(), detections=detections)
    labels_smart = [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_smart = label_annotator_smart.annotate(
        annotated_image_smart, detections=detections, labels=labels_smart
    )

    annotated_image_not_smart = box_annotator.annotate(
        image.copy(), detections=detections
    )
    labels_not_smart = [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_not_smart = label_annotator_not_smart.annotate(
        annotated_image_not_smart, detections=detections, labels=labels_not_smart
    )

    # 4. Create comparison image and save
    # ... (display and saving code omitted for brevity)
```

I tested with various text positions:

```python
positions = [
    sv.Position.TOP_LEFT,
    sv.Position.CENTER_LEFT,
    sv.Position.BOTTOM_RIGHT,
    sv.Position.CENTER_RIGHT,
]

for position in positions:
    process_image_with_supervision(
        image, display_image=False, text_position=position, smart_position=True
    )
```

🔍 Performance Note

The enhanced smart positioning uses a two-phase approach that maintains good performance in most real-world scenarios. For scenes with many labels, the visual improvement in label placement is well worth the minimal additional processing time.

🔄 Compatibility

This change is backward compatible. The
supervision/annotators/core.py (outdated)

```python
        Returns:
            List[str]: A list of text lines after wrapping.
        """
        import textwrap
```
Let’s move this import to the top of the file instead of placing it here.
🤦 oops
supervision/annotators/core.py (outdated)

```python
        else:  # CENTER, CENTER_LEFT, CENTER_RIGHT
            return (y1 + y2) / 2

    def _wrap_text(self, text: str) -> List[str]:
```
I'd prefer this not to be a private class method; let's move it to `supervision/annotators/utils.py` instead.
Done
supervision/annotators/core.py (outdated)

```python
        import textwrap

        if not text:
            return [""]

        if self.max_line_length is None:
            return text.splitlines() or [""]

        # Split the text by existing newlines first
        paragraphs = text.split("\n")
        all_lines = []

        for paragraph in paragraphs:
            if not paragraph:
                # Keep empty lines
                all_lines.append("")
                continue

            # Wrap each paragraph separately
            wrapped = textwrap.wrap(
                paragraph,
                width=self.max_line_length,
                break_long_words=True,
                replace_whitespace=False,
                drop_whitespace=True,
            )

            # Add the wrapped lines for this paragraph
            if wrapped:
                all_lines.extend(wrapped)
            else:
                # If wrap returns an empty list (e.g., for whitespace-only input)
                all_lines.append("")

        return all_lines if all_lines else [""]
```
The logic here seems pretty easy to follow. Let's remove the Python comments here.
Done
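As an aside, once moved to `supervision/annotators/utils.py` and stripped of comments as suggested, the wrapping helper might look roughly like this (the name `wrap_text` and the module-level signature are assumptions, not necessarily what landed in the PR):

```python
import textwrap
from typing import List, Optional


def wrap_text(text: str, max_line_length: Optional[int] = None) -> List[str]:
    if not text:
        return [""]
    if max_line_length is None:
        return text.splitlines() or [""]
    all_lines: List[str] = []
    for paragraph in text.split("\n"):
        if not paragraph:
            all_lines.append("")
            continue
        wrapped = textwrap.wrap(
            paragraph,
            width=max_line_length,
            break_long_words=True,
            replace_whitespace=False,
            drop_whitespace=True,
        )
        all_lines.extend(wrapped or [""])
    return all_lines or [""]


print(wrap_text("Car\nLicense: ABC-123", max_line_length=8))
# ['Car', 'License:', 'ABC-123']
```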
supervision/annotators/core.py (outdated)

```python
        frame_width: int,
        frame_height: int,
```
In the supervision codebase, we usually pass a `resolution_wh` tuple instead of separate frame width and height values.
We have two other functions, `clip_boxes` and `pad_boxes`. I recommend:

- renaming this function to `snap_boxes`
- dropping the part of the logic that flips (we can add it in the future, but I want to keep it out of this PR)
- making it vectorized to process all boxes at once, without looping
- wrapping `frame_width` and `frame_height` into a single `resolution_wh` argument

Here's `clip_boxes` for reference:
````python
def clip_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """
    Clips bounding boxes coordinates to fit within the frame resolution.

    Args:
        xyxy (np.ndarray): A numpy array of shape `(N, 4)` where each
            row corresponds to a bounding box in
            the format `(x_min, y_min, x_max, y_max)`.
        resolution_wh (Tuple[int, int]): A tuple of the form `(width, height)`
            representing the resolution of the frame.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` where each row
            corresponds to a bounding box with coordinates clipped to fit
            within the frame resolution.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [10, 20, 300, 200],
            [15, 25, 350, 450],
            [-10, -20, 30, 40]
        ])

        sv.clip_boxes(xyxy=xyxy, resolution_wh=(320, 240))
        # array([
        #     [ 10,  20, 300, 200],
        #     [ 15,  25, 320, 240],
        #     [  0,   0,  30,  40]
        # ])
        ```
    """
    result = np.copy(xyxy)
    width, height = resolution_wh
    result[:, [0, 2]] = result[:, [0, 2]].clip(0, width)
    result[:, [1, 3]] = result[:, [1, 3]].clip(0, height)
    return result
````
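As a quick sanity check, the docstring example above runs as-is when the function body is copied into a standalone snippet:

```python
import numpy as np
from typing import Tuple


def clip_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    # Clip box coordinates into [0, width] x [0, height]
    result = np.copy(xyxy)
    width, height = resolution_wh
    result[:, [0, 2]] = result[:, [0, 2]].clip(0, width)
    result[:, [1, 3]] = result[:, [1, 3]].clip(0, height)
    return result


xyxy = np.array([[10, 20, 300, 200], [15, 25, 350, 450], [-10, -20, 30, 40]])
print(clip_boxes(xyxy=xyxy, resolution_wh=(320, 240)))
# [[ 10  20 300 200]
#  [ 15  25 320 240]
#  [  0   0  30  40]]
```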
I generated this. We would need to make sure it works:
````python
def snap_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """
    Shifts bounding boxes into the frame so that they are fully contained
    within the given resolution. Unlike `clip_boxes`, this function does not
    crop boxes. It moves them entirely if they exceed the frame boundaries.

    Args:
        xyxy (np.ndarray): A numpy array of shape `(N, 4)` where each
            row corresponds to a bounding box in the format
            `(x_min, y_min, x_max, y_max)`.
        resolution_wh (Tuple[int, int]): A tuple `(width, height)`
            representing the resolution of the frame.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` with boxes shifted into frame.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [-10, 10, 30, 50],
            [310, 200, 350, 250],
            [100, -20, 150, 30],
            [200, 220, 250, 270]
        ])

        sv.snap_boxes(xyxy=xyxy, resolution_wh=(320, 240))
        # array([
        #     [  0,  10,  40,  50],
        #     [280, 190, 320, 240],
        #     [100,   0, 150,  50],
        #     [200, 190, 250, 240]
        # ])
        ```
    """
    result = np.copy(xyxy)
    width, height = resolution_wh
    shift_x1 = np.where(result[:, 0] < 0, -result[:, 0], 0)
    shift_x2 = np.where(result[:, 2] > width, width - result[:, 2], 0)
    shift_x = shift_x1 + shift_x2
    result[:, 0] += shift_x
    result[:, 2] += shift_x
    shift_y1 = np.where(result[:, 1] < 0, -result[:, 1], 0)
    shift_y2 = np.where(result[:, 3] > height, height - result[:, 3], 0)
    shift_y = shift_y1 + shift_y2
    result[:, 1] += shift_y
    result[:, 3] += shift_y
    return result
````
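Since this implementation was generated, a standalone run (with the function copied in so the snippet is self-contained) is a useful check. Note that the second example box crosses both the right and bottom edges, so it is shifted vertically as well as horizontally:

```python
import numpy as np
from typing import Tuple


def snap_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    # Shift boxes fully into the frame without cropping them
    result = np.copy(xyxy)
    width, height = resolution_wh
    shift_x1 = np.where(result[:, 0] < 0, -result[:, 0], 0)
    shift_x2 = np.where(result[:, 2] > width, width - result[:, 2], 0)
    shift_x = shift_x1 + shift_x2
    result[:, 0] += shift_x
    result[:, 2] += shift_x
    shift_y1 = np.where(result[:, 1] < 0, -result[:, 1], 0)
    shift_y2 = np.where(result[:, 3] > height, height - result[:, 3], 0)
    shift_y = shift_y1 + shift_y2
    result[:, 1] += shift_y
    result[:, 3] += shift_y
    return result


xyxy = np.array([
    [-10, 10, 30, 50],     # past the left edge
    [310, 200, 350, 250],  # past the right AND bottom edges
    [100, -20, 150, 30],   # past the top edge
    [200, 220, 250, 270],  # past the bottom edge
])
print(snap_boxes(xyxy=xyxy, resolution_wh=(320, 240)))
# [[  0  10  40  50]
#  [280 190 320 240]
#  [100   0 150  50]
#  [200 190 250 240]]
```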
Done. Might be worth double checking that I understood you properly here
supervision/annotators/core.py (outdated)

```python
        if x1 < 0:
            shift = -x1
            x1 += shift
            x2 += shift
        elif x2 > frame_width:
            shift = frame_width - x2
            x1 += shift
            x2 += shift

        # Adjust y-coordinate to stay within frame
        if y1 < 0:
            shift = -y1
            y1 += shift
            y2 += shift
        elif y2 > frame_height:
            shift = frame_height - y2
            y1 += shift
            y2 += shift
```
It should be possible to vectorize this and run it on all boxes at once, without looping.
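For reference, the branchy per-box shifting above also collapses into a couple of `np.clip` expressions. This is just a sketch (the function name is made up), equivalent in behavior to the `np.where` formulation:

```python
import numpy as np
from typing import Tuple


def snap_boxes_vectorized(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    # clip(-x1, 0, None) gives the positive shift needed past the left/top edge;
    # clip(width - x2, None, 0) gives the negative shift needed past the right/bottom edge.
    result = np.copy(xyxy)
    width, height = resolution_wh
    shift_x = np.clip(-result[:, 0], 0, None) + np.clip(width - result[:, 2], None, 0)
    shift_y = np.clip(-result[:, 1], 0, None) + np.clip(height - result[:, 3], None, 0)
    result[:, [0, 2]] += shift_x[:, np.newaxis]
    result[:, [1, 3]] += shift_y[:, np.newaxis]
    return result


boxes = np.array([[-10, 10, 30, 50], [310, 200, 350, 250]])
print(snap_boxes_vectorized(boxes, (320, 240)))
# [[  0  10  40  50]
#  [280 190 320 240]]
```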
done
supervision/annotators/core.py (outdated)

```python
        # Check if label should be flipped to above the box
        if check_flip_label and text_anchor is not None:
            box_height = y2 - y1

            # Check anchor position to see if we can flip it
            anchor_y = self._get_anchor_y_for_adjustment(
                np.array([y1, y2]), text_anchor
            )

            # If we're at the bottom, try moving to the top
            if anchor_y >= y2 - 5:  # Near bottom edge
                # Check if there's room at the top
                if y1 - box_height >= 0:
                    y2 = y1
                    y1 = y2 - box_height
```
Can we remove that logic from the scope of this PR? I'm not sure I want to add it.
done
supervision/annotators/core.py (outdated)

```python
    @staticmethod
    def _get_anchor_y_for_adjustment(bbox_y: np.ndarray, anchor: Position) -> float:
        """
        Calculates the anchor y-coordinate for label adjustment based on the text
        anchor position.

        Args:
            bbox_y (np.ndarray): An array containing the y1 and y2 coordinates of
                the bounding box.
            anchor (Position): The desired text anchor position.

        Returns:
            float: The anchor y-coordinate.
        """
        y1, y2 = bbox_y
        if anchor in [Position.TOP_LEFT, Position.TOP_CENTER, Position.TOP_RIGHT]:
            return y1
        elif anchor in [
            Position.BOTTOM_LEFT,
            Position.BOTTOM_CENTER,
            Position.BOTTOM_RIGHT,
        ]:
            return y2
        else:  # CENTER, CENTER_LEFT, CENTER_RIGHT
            return (y1 + y2) / 2
```
As mentioned, I'd like to keep this part of the logic out of scope for this PR. We can go ahead and remove this method.
done
supervision/annotators/core.py (outdated)

```python
        self.smart_position = smart_position
        self.max_line_length: Optional[int] = max_line_length

    def _validate_labels(self, labels: Optional[List[str]], detections: Detections):
```
I'd move this method to `supervision/annotators/utils`.
done
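For illustration, a module-level version of the validator could look like this sketch (hypothetical signature: the real helper would take a `Detections` object, but a plain count keeps the snippet self-contained):

```python
from typing import List, Optional


def validate_labels(labels: Optional[List[str]], detections_count: int) -> None:
    # Labels, when provided, must line up one-to-one with detections;
    # fail fast with a descriptive error otherwise.
    if labels is not None and len(labels) != detections_count:
        raise ValueError(
            f"The number of labels ({len(labels)}) does not match the "
            f"number of detections ({detections_count})."
        )


validate_labels(["car", "truck"], 2)  # passes silently
try:
    validate_labels(["car"], 2)
except ValueError as exc:
    print(exc)
```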
supervision/annotators/core.py (outdated)

```python
        )

    @staticmethod
    def _get_labels_text(
```
I'd move this method to `supervision/annotators/utils`.
done
supervision/annotators/core.py (outdated)

```python
        # First, make sure the boxes don't go outside the frame
        for i in range(len(labels)):
            # Adjust box to stay within frame
            adjusted_properties[i, :4] = self._ensure_box_in_frame(
```
Given that we are getting rid of flipping for now, do we need to call `_ensure_box_in_frame` (`snap_boxes`) twice?
fixed
supervision/detection/utils.py (outdated)

```python
    force_scale: float = 10.0,
    consider_size: bool = True,
```
@hidara2000 I'm curious—what was the reason for introducing those two arguments? I'm a bit concerned they might lead to unstable label positions during video processing, where small changes in initial position could cause disproportionately large shifts in the final output.
@SkalskiP
The reason for introducing `force_scale` and `consider_size` was primarily to offer more granular control over how the spreading algorithm resolves overlaps in static images (during testing). `force_scale` allows tuning the overall repulsion strength, and `consider_size` was an attempt to see if factoring in the label box dimensions could lead to a more visually pleasing distribution, especially in complex overlap scenarios. `force_vectors *= 10` was already in the original code, and I was toying with the idea of letting a user set these values to suit their scenario, i.e. less force for videos and more for static busy scenes.
You're absolutely right to be concerned about video stability. Iterative algorithms like spread_out_boxes can be sensitive to small frame-to-frame variations in detection positions. Parameters like force_scale (especially if set high) and consider_size can amplify these small variations into larger, potentially noticeable jumps or jitter in the label positions across consecutive video frames.
Given this valid concern and the potential for these parameters to introduce instability, I've reverted the spread_out_boxes function in the PR back to the original version that doesn't have these parameters.
I'm still interested to know your thoughts though – do you think there's a viable way to use the version of the function (below) with force_scale and consider_size without causing instability (perhaps with very conservative default values)? Or do the added parameters introduce unnecessary complexity that would require users to tune them during class instantiation, which might not be ideal for a general-purpose annotator?
Looking forward to your feedback!
```python
def alternative_spread_out_boxes(
    xyxy: np.ndarray,
    max_iterations: int = 50,  # Moderate default iterations
    force_scale: float = 5.0,  # Moderate default force scale
    consider_size: bool = False,  # Default to False for better video stability
    min_force_magnitude: float = 2.0,  # Make minimum force tunable
) -> np.ndarray:
    """
    Spread out boxes that overlap with each other, optimized for a balance
    between overlap resolution and video stability.

    Args:
        xyxy: Numpy array of shape (N, 4) where N is the number of boxes.
        max_iterations: Maximum number of iterations to run the algorithm for.
            Lower values may improve performance and stability but could leave
            some overlaps unresolved.
        force_scale: Scale factor for the repulsion forces. Lower values result
            in less aggressive spreading, which can improve video stability.
        consider_size: Whether to consider box size when calculating forces.
            Setting to True might yield better static layouts but can increase
            jitter in video due to fluctuating box sizes. Defaults to False for
            better video stability.
        min_force_magnitude: Minimum magnitude for calculated force vectors.
            Ensures slight overlaps still result in movement.

    Returns:
        np.ndarray: A numpy array of shape (N, 4) with adjusted box positions.
    """
    if len(xyxy) == 0:
        return xyxy

    # Add a small padding so boxes that are just touching are considered for overlap
    xyxy_padded = pad_boxes(xyxy, px=1)

    # Calculate box size factors once if we're considering size
    size_factors = np.ones(len(xyxy_padded))
    if consider_size:
        box_areas = (xyxy_padded[:, 2] - xyxy_padded[:, 0]) * (
            xyxy_padded[:, 3] - xyxy_padded[:, 1]
        )
        # Normalize by mean size; guard against empty or zero-mean areas
        if len(box_areas) > 0 and np.mean(np.sqrt(box_areas)) != 0:
            size_factors = np.sqrt(box_areas) / np.mean(np.sqrt(box_areas))
        # Clip to avoid extreme values influencing forces too much
        size_factors = np.clip(size_factors, 0.5, 2.0)

    for _ in range(max_iterations):
        # Calculate IoU between all pairs of boxes (NxN matrix)
        iou = box_iou_batch(xyxy_padded, xyxy_padded)
        np.fill_diagonal(iou, 0)  # Eliminate self-interactions

        # If there are no overlaps, we are done
        if np.all(iou == 0):
            break

        overlap_mask = iou > 0

        # Calculate centers of the boxes (Nx2)
        centers = (xyxy_padded[:, :2] + xyxy_padded[:, 2:]) / 2

        # Vectors pointing from each box center to every other box center (NxNx2)
        delta_centers = centers[:, np.newaxis, :] - centers[np.newaxis, :, :]
        # Only consider deltas for overlapping boxes
        delta_centers *= overlap_mask[:, :, np.newaxis]

        # Sum the delta vectors for each box to get the total push direction (Nx2)
        delta_sum = np.sum(delta_centers, axis=1)

        # Normalize the sum of deltas to get unit direction vectors
        delta_magnitude = np.linalg.norm(delta_sum, axis=1, keepdims=True)
        direction_vectors = np.divide(
            delta_sum,
            delta_magnitude,
            out=np.zeros_like(delta_sum),  # Zeros where magnitude is zero to avoid NaNs
            where=delta_magnitude != 0,
        )

        # Base force magnitude from total overlap (sum of IoUs)
        base_force_magnitude = np.sum(iou, axis=1)
        force_vectors = base_force_magnitude[:, np.newaxis] * direction_vectors

        # Apply size-based scaling if enabled
        if consider_size:
            force_vectors *= size_factors[:, np.newaxis]

        # Apply the general force scale
        force_vectors *= force_scale

        # Ensure a minimum force for small overlaps to guarantee separation
        current_force_magnitudes = np.linalg.norm(force_vectors, axis=1, keepdims=True)
        small_force_mask = (current_force_magnitudes > 0) & (
            current_force_magnitudes < min_force_magnitude
        )
        if np.any(small_force_mask):
            # Rescale small force vectors to have the minimum magnitude
            force_directions_for_small = force_vectors / np.where(
                current_force_magnitudes > 0, current_force_magnitudes, 1
            )
            force_vectors = np.where(
                small_force_mask,
                force_directions_for_small * min_force_magnitude,
                force_vectors,
            )

        # Convert displacement vectors to integers for pixel-based movement
        force_vectors = force_vectors.astype(int)

        # Shift both corners of each box by the same vector
        xyxy_padded[:, [0, 1]] += force_vectors
        xyxy_padded[:, [2, 3]] += force_vectors

    # Remove the padding before returning
    return pad_boxes(xyxy_padded, px=-1)
```
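To make the stability concern concrete, here's a toy one-step calculation (assumed geometry, independent of the library code) showing that the change in the applied push caused by a 1 px detection shift scales linearly with `force_scale`:

```python
def iou(a, b):
    # Plain IoU of two (x1, y1, x2, y2) boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def one_step_push(x_offset, force_scale):
    # One repulsion step for two 40x20 boxes; the second box starts
    # x_offset px to the right of the first, so they overlap.
    box_a = (0, 0, 40, 20)
    box_b = (x_offset, 0, x_offset + 40, 20)
    return iou(box_a, box_b) * force_scale


# A 1 px change in the detected position between consecutive frames...
delta_gentle = one_step_push(31, 5.0) - one_step_push(30, 5.0)
delta_strong = one_step_push(31, 20.0) - one_step_push(30, 20.0)
print(delta_gentle, delta_strong)  # the 4x larger scale makes a 4x larger jump
```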
Hi @hidara2000, sorry it took me a while to get back to you. I'm currently juggling work across 3–4 repositories, so my time is a bit stretched. I've now gone through your PR carefully and you've done an excellent job; really impressive work! Don't be discouraged by the number of comments I left, they're all meant to help polish things up. Once we merge this PR, it'll take Supervision's text annotators to the next level!
I appreciate you going through it, and I agree with all the comments. Changes made as per advice, and results from the test below.
🚀 Enhance label annotators with frame boundary adjustments and new base class
Description
This PR adds the ability to ensure labels stay within frame boundaries through a new `ensure_in_frame` parameter. When enabled, this functionality guarantees that text labels for bounding boxes near image edges remain visible by adjusting their position to fit within the frame.

The key improvements include:

- a new `ensure_in_frame` parameter (defaults to `False` to maintain backward compatibility)
- integration with the existing `smart_position` functionality with complementary behavior

While there may be occasional label overlaps in very busy frames when both `smart_position` and `ensure_in_frame` are enabled, running the smart positioning algorithm first typically yields better results overall.

Type of change
How has this change been tested?

I tested this change with various image scenarios that have bounding boxes positioned near frame edges. The implementation was verified by:

- testing with the `ensure_in_frame` parameter enabled
- testing in combination with `smart_position`

Example test code:
Any specific deployment considerations

No special deployment considerations are needed. This feature is implemented as an optional parameter that defaults to `False`, ensuring backward compatibility with existing code.

Docs
No changes to docs, as the functionality is similar to `smart_position` and the only entry for this in the docs was in the changelog. I can update the documentation to include this new parameter in the appropriate class references if desired; just let me know where and the format.