feat(annotators): enhance label annotators with frame boundary adjust… #1820

Open · hidara2000 wants to merge 14 commits into base: develop

Conversation

@hidara2000 commented Apr 15, 2025

🚀 Enhance label annotators with frame boundary adjustments and new base class

Description

This PR adds the ability to ensure labels stay within frame boundaries through a new ensure_in_frame parameter. When enabled, this functionality guarantees that text labels for bounding boxes near image edges remain visible by adjusting their position to fit within the frame.

The key improvements include:

  • ✅ Text labels near edges now properly positioned within frame boundaries
  • ✅ Implemented as an optional parameter (default: False to maintain backward compatibility)
  • ✅ Works alongside existing smart_position functionality with complementary behavior

While there may be occasional label overlaps in very busy frames when both smart_position and ensure_in_frame are enabled, running the smart positioning algorithm first typically yields better results overall.
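
For quick reference, a minimal usage sketch; ensure_in_frame is the parameter proposed in this PR, the rest is existing supervision API, and the values are purely illustrative:

import numpy as np
import supervision as sv

# dummy frame and a detection hanging past the bottom-right corner
image = np.zeros((480, 640, 3), dtype=np.uint8)
detections = sv.Detections(
    xyxy=np.array([[600.0, 430.0, 700.0, 500.0]]),
    class_id=np.array([0]),
)

label_annotator = sv.LabelAnnotator(
    text_position=sv.Position.BOTTOM_RIGHT,
    smart_position=True,     # existing behaviour: spread overlapping labels
    ensure_in_frame=True,    # proposed here: shift labels back inside the frame
)
annotated_image = label_annotator.annotate(
    scene=image, detections=detections, labels=["car 0.92"]
)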

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested?

I tested this change with various image scenarios that have bounding boxes positioned near frame edges. The implementation was verified by:

  1. Comparing output images with and without the ensure_in_frame parameter enabled
  2. Testing cases with multiple objects near edges to ensure proper positioning
  3. Validating behavior when used in combination with smart_position

Example test code:

import cv2
import numpy as np
import supervision as sv
from supervision.annotators.core import LabelAnnotator, BoxAnnotator
from PIL import Image, ImageDraw
import os
from typing import Optional, Tuple, List


def generate_mock_yolo_output(image_shape: Tuple[int, int, int]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Generates mock bounding box detections, confidence scores, and class predictions
    for a given image shape.  The function creates a set of detections, including
    one that covers the whole image.

    Args:
        image_shape (Tuple[int, int, int]): The shape of the image (height, width, channels).
            This is used to determine the boundaries for the generated bounding boxes.

    Returns:
        Tuple[np.ndarray, np.ndarray, np.ndarray]: A tuple containing:
            - A NumPy array of bounding boxes (N, 4), where N is the number of detections.
              Each box is defined as [xmin, ymin, xmax, ymax].
            - A NumPy array of confidence scores (N,).
            - A NumPy array of class labels (N,).
    """
    image_height, image_width, _ = image_shape
    num_detections = 100
    
    # Generate random bounding boxes
    xmin = np.random.randint(0, image_width, num_detections)
    ymin = np.random.randint(0, image_height, num_detections)
    xmax = np.random.randint(xmin + 20, image_width + 50, num_detections)
    ymax = np.random.randint(ymin + 20, image_height + 50, num_detections)
    bounding_boxes = np.stack([xmin, ymin, xmax, ymax], axis=1).astype(np.float32)

    # Add a box that covers the whole image
    full_image_box = np.array([0, 0, image_width, image_height], dtype=np.float32).reshape(1, 4)
    bounding_boxes = np.concatenate([bounding_boxes, full_image_box], axis=0)
    num_detections += 1

    # Generate random confidence scores
    confidence_scores = np.random.uniform(0.5, 0.95, num_detections).astype(np.float32)
    confidence_scores[-1] = 0.99  # High confidence for the full image box

    # Generate random class labels
    class_labels = np.random.randint(0, 2, num_detections).astype(np.int32)
    class_labels[-1] = 0  # Assign a class to the full image box

    return bounding_boxes, confidence_scores, class_labels



def process_image_with_supervision(
    image: np.ndarray,
    display_image: bool = True,
    text_position: sv.Position = sv.Position.TOP_LEFT,
    smart_position: bool = False,
    detections: Optional[sv.Detections] = None,
) -> None:
    """
    Processes an image by simulating YOLO detection and using Supervision to annotate it.
    The function generates two annotated images (with and without `ensure_in_frame`)
    and stacks them vertically, adding headers and white boundaries for clarity.

    Args:
        image (np.ndarray): The input image as a NumPy array in BGR format.
        display_image (bool, optional): Flag to control whether to display the image.
            If True, it attempts to display the image. If False, it saves the
            image to a file. Defaults to True.
        text_position (sv.Position, optional): The position of the text label
            relative to the bounding box.  Defaults to sv.Position.TOP_LEFT.
        smart_position (bool, optional): Flag to enable smart position adjustment of labels
            to keep them within the image frame. Defaults to False.
        detections (sv.Detections, optional): Pre-calculated detections.
            If provided, the function uses these detections instead of generating new ones.
            Defaults to None.

    Returns:
        None (displays or saves the stacked annotated image).
    """
    # 1. Simulate YOLO model output or use provided
    if detections is None:
        bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(image.shape)
        detections = sv.Detections(
            xyxy=bounding_boxes,
            confidence=confidence_scores,
            class_id=class_labels,
        )

    # 2. Create annotators
    box_annotator = BoxAnnotator(thickness=2)
    class_names = ["car", "person"]

    label_annotator_in_frame = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        ensure_in_frame=True,
        text_position=text_position,
        smart_position=smart_position,
    )
    label_annotator_out_of_frame = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        ensure_in_frame=False,
        text_position=text_position,
        smart_position=smart_position,
    )

    # 3. Annotate the image with the detections using both annotators.
    annotated_image_in_frame = box_annotator.annotate(image.copy(), detections=detections)
    labels_in_frame = [
        f"{class_names[int(class_id)]} {confidence:.2f}"  # Corrected f-string
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_in_frame = label_annotator_in_frame.annotate(
        annotated_image_in_frame, detections=detections, labels=labels_in_frame
    )

    annotated_image_out_of_frame = box_annotator.annotate(image.copy(), detections=detections)
    labels_out_of_frame = [
        f"{class_names[int(class_id)]} {confidence:.2f}"  # Corrected f-string
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_out_of_frame = label_annotator_out_of_frame.annotate(
        annotated_image_out_of_frame, detections=detections, labels=labels_out_of_frame
    )

    # 4. Add white boundaries around the images
    border_width = 3
    annotated_image_in_frame = cv2.copyMakeBorder(
        annotated_image_in_frame,
        border_width,
        border_width,
        border_width,
        border_width,
        cv2.BORDER_CONSTANT,
        value=(255, 255, 255),
    )
    annotated_image_out_of_frame = cv2.copyMakeBorder(
        annotated_image_out_of_frame,
        border_width,
        border_width,
        border_width,
        border_width,
        cv2.BORDER_CONSTANT,
        value=(255, 255, 255),
    )

    # 5. Add headers to each image
    header_height = 30
    header_color = (255, 255, 255)
    text_color = (0, 0, 0)
    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 0.7
    font_thickness = 2

    # Create header images for each annotated image
    header_image_in_frame = np.zeros(
        (header_height, annotated_image_in_frame.shape[1], 3), dtype=np.uint8
    )
    header_image_in_frame[:] = header_color
    text_size_in_frame = cv2.getTextSize("Enabled", font, font_scale, font_thickness)[0]
    text_x_in_frame = annotated_image_in_frame.shape[1] - text_size_in_frame[0] - 10
    text_y_in_frame = (header_height + text_size_in_frame[1]) // 2
    cv2.putText(
        header_image_in_frame,
        "Enabled",
        (text_x_in_frame, text_y_in_frame),
        font,
        font_scale,
        text_color,
        font_thickness,
        cv2.LINE_AA,
    )

    header_image_out_of_frame = np.zeros(
        (header_height, annotated_image_out_of_frame.shape[1], 3), dtype=np.uint8
    )
    header_image_out_of_frame[:] = header_color
    text_size_out_of_frame = cv2.getTextSize("Not Enabled", font, font_scale, font_thickness)[0]
    text_x_out_of_frame = (header_image_out_of_frame.shape[1] - text_size_out_of_frame[0]) // 2
    text_y_out_of_frame = (header_height + text_size_out_of_frame[1]) // 2
    cv2.putText(
        header_image_out_of_frame,
        "Not Enabled",
        (text_x_out_of_frame, text_y_out_of_frame),
        font,
        font_scale,
        text_color,
        font_thickness,
        cv2.LINE_AA,
    )

    # Stack the headers and the images
    annotated_image_in_frame_with_header = np.vstack(
        (header_image_in_frame, annotated_image_in_frame)
    )
    annotated_image_out_of_frame_with_header = np.vstack(
        (header_image_out_of_frame, annotated_image_out_of_frame)
    )

    # 6. Stack the two images vertically
    stacked_image = np.vstack(
        (annotated_image_in_frame_with_header, annotated_image_out_of_frame_with_header)
    )

    # Add position text to the top-left corner
    cv2.putText(
        stacked_image,
        str(text_position) + f", smart_pos={smart_position}",
        (10, 20),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.7,
        (0, 0, 0),
        2,
        cv2.LINE_AA,
    )

    # 7. Display the annotated image.
    if display_image:
        try:
            pil_image = Image.fromarray(cv2.cvtColor(stacked_image, cv2.COLOR_BGR2RGB))
            pil_image.show()
            pil_image.close()
        except OSError as e:
            print(f"Error displaying image: {e}. Saving image instead.")
            cv2.imwrite(f"annotated_image_{text_position}_smart_{smart_position}.jpg", stacked_image)
    else:
        cv2.imwrite(f"annotated_image_{text_position}_smart_{smart_position}.jpg", stacked_image)
        print(f"Annotated image saved to annotated_image_{text_position}_smart_{smart_position}.jpg")



def main(image_path: str = "example.jpg") -> None:
    """
    Main function to run the image processing and annotation with different label positions
    and smart position settings.

    Args:
        image_path (str, optional): Path to the image file. Defaults to "example.jpg".
    """
    # Create a dummy image
    image = np.zeros((600, 800, 3), dtype=np.uint8)
    cv2.imwrite(image_path, image)

    # 1. Generate Detections once - Moved inside process_image_with_supervision
    # mock_bounding_boxes, mock_confidence_scores, mock_class_labels = generate_mock_yolo_output(image.shape)
    # detections = sv.Detections(
    #      xyxy=mock_bounding_boxes,
    #      confidence=confidence_scores,
    #      class_id=mock_class_labels,
    # )

    # 2. Loop through positions with smart_position=False
    positions = [
        sv.Position.TOP_LEFT,
        sv.Position.CENTER_LEFT,
        sv.Position.BOTTOM_RIGHT,
        sv.Position.CENTER_RIGHT,
    ]
    for position in positions:
        print(f"Processing image with text position: {position}, smart_position=False")
        process_image_with_supervision(image, display_image=False, text_position=position, smart_position=False)  # Removed detections

    # 3. Loop through positions with smart_position=True, using the same detections
    for position in positions:
        print(f"Processing image with text position: {position}, smart_position=True")
        process_image_with_supervision(image, display_image=False, text_position=position, smart_position=True)  # Removed detections

    os.remove(image_path)



if __name__ == "__main__":
    main()

[Example output images attached for each tested text position]

Any specific deployment considerations

No special deployment considerations are needed. This feature is implemented as an optional parameter that defaults to False, ensuring backward compatibility with existing code.

Docs

  • Docs updated? What were the changes:
    No changes to docs, as the functionality is similar to smart_position and the only entry for this in the docs was in the changelog. I can update the documentation to include this new parameter in the appropriate class references if desired; just let me know where and in what format.

…ments and new base class

- Ensures labels stay within the frame
- May have a few overlaps at edges in very busy frames when smart_pos is enabled, but running smart_pos first yields better results
@CLAassistant commented Apr 15, 2025

CLA assistant check
All committers have signed the CLA.

@onuralpszr (Collaborator) left a comment

Hello @hidara2000, thank you for this awesome PR.

I made my initial quick comments about certain changes. Let me also test as well.

@hidara2000 (Author)

> Hello @hidara2000, thank you for this awesome PR.
>
> I made my initial quick comments about certain changes. Let me also test as well.

Makes sense. Changes ticked off. Cheers for a great tool!

@hidara2000 mentioned this pull request Apr 15, 2025

@SkalskiP (Collaborator)

Hi @hidara2000 👋🏻 Huge thanks for deciding to submit a PR to introduce this change! I have a couple of points I'd like to discuss before I dive deeper into the PR review:

Wouldn't it be a better approach to keep the smart_position flag and simply add this extra behavior when smart_position=True? I understand that these two features could be seen as separate operations, but I'm still leaning towards maintaining a simple API:

  • smart_position=False - raw, unprocessed label positions
  • smart_position=True - we do everything we can to make them as visible as possible

For some time now, I've wanted to add support for multiline labels / label wrapping. Considering you're completely rewriting both label annotators, would you be willing to add support for multiline labels / label wrapping as part of this PR?

[Screenshot attached]

@hidara2000 (Author)

📝 Add Multiline Text Support to Label Annotators

🔄 Updates to Previous PR

This extends my previous PR that added frame boundary adjustments by incorporating support for multiline text in label annotators. The implementation now properly handles both newlines in text and automatic text wrapping.

✨ New Features

  • 🔤 Multiline Text Support: Labels now properly render text with newlines (\n)
  • 📏 Auto Text Wrapping: New max_line_length parameter controls automatic text wrapping
  • 🧠 Enhanced Smart Positioning: Improved algorithm to prevent overlapping multiline labels
  • 🔄 Two-Phase Spreading: More effective label distribution with size-aware positioning

🛠️ Implementation Details

  • Added max_line_length parameter to existing annotator classes
  • Used Python's textwrap library for robust text wrapping functionality (see the sketch after this list)
  • Enhanced smart positioning to better handle varying text box sizes
  • Properly calculated dimensions for multiline text boxes
  • Implemented size-aware box spreading to reduce overlaps
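
For illustration, a standalone sketch of the wrapping behaviour described above, using only Python's standard-library textwrap; the helper name wrap_label is invented for this example and is not the function added by the PR:

import textwrap
from typing import List, Optional


def wrap_label(text: str, max_line_length: Optional[int] = None) -> List[str]:
    # honour explicit newlines first, then wrap each paragraph to the limit
    if max_line_length is None:
        return text.splitlines() or [""]
    lines: List[str] = []
    for paragraph in text.split("\n"):
        wrapped = textwrap.wrap(paragraph, width=max_line_length, break_long_words=True)
        lines.extend(wrapped or [""])  # keep empty lines
    return lines


print(wrap_label("Car\nLicense: ABC-123", max_line_length=12))
# ['Car', 'License:', 'ABC-123']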

📊 Before/After Comparison

[Before/after comparison images attached]

📚 Usage Example

# Create a label annotator with multiline text support
label_annotator = sv.LabelAnnotator(
    text_padding=10,
    smart_position=True,  # Works with existing smart positioning
    max_line_length=20  # Enable text wrapping at 20 characters
)

# Labels can have manual newlines or will auto-wrap
labels = [
    "Car\nLicense: ABC-123",  # Manual newlines
    "This is a very long label that will be wrapped automatically"  # Auto-wrapped
]

# Use as normal
annotated_image = label_annotator.annotate(
    scene=image,
    detections=detections,
    labels=labels
)

🧪 Test Code

Here's the code I used to test the multiline text support:

def process_image_with_supervision(
    image: np.ndarray,
    display_image: bool = True,
    text_position: sv.Position = sv.Position.TOP_LEFT,
    smart_position: bool = False,
    detections: Optional[sv.Detections] = None,
) -> None:
    # 1. Simulate YOLO model output or use provided
    if detections is None:
        bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(
            image.shape
        )
        detections = sv.Detections(
            xyxy=bounding_boxes,
            confidence=confidence_scores,
            class_id=class_labels,
        )

    # 2. Create annotators
    box_annotator = BoxAnnotator(thickness=2)
    class_names = ["This is\na\ncar", "This is a really really really long label"]

    label_annotator_smart = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        text_position=text_position,
        smart_position=True,
        max_line_length=12,  # Enable text wrapping at 12 characters
    )
    label_annotator_not_smart = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        text_position=text_position,
        smart_position=False,
    )

    # 3. Annotate the image with both configurations
    annotated_image_smart = box_annotator.annotate(image.copy(), detections=detections)
    labels_smart = [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_smart = label_annotator_smart.annotate(
        annotated_image_smart, detections=detections, labels=labels_smart
    )

    annotated_image_not_smart = box_annotator.annotate(
        image.copy(), detections=detections
    )
    labels_not_smart = [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_not_smart = label_annotator_not_smart.annotate(
        annotated_image_not_smart, detections=detections, labels=labels_not_smart
    )

    # 4. Create comparison image and save
    # ... (display and saving code omitted for brevity)

I tested with various text positions:

positions = [
    sv.Position.TOP_LEFT,
    sv.Position.CENTER_LEFT,
    sv.Position.BOTTOM_RIGHT,
    sv.Position.CENTER_RIGHT,
]
for position in positions:
    process_image_with_supervision(
        image, display_image=False, text_position=position, smart_position=True
    )

🔍 Performance Note

The enhanced smart positioning uses a two-phase approach that maintains good performance in most real-world scenarios. For scenes with many labels, the visual improvement in label placement is well worth the minimal additional processing time.

🔄 Compatibility

This change is backward compatible. The max_line_length parameter is optional (default: None), so existing code will continue to work without modification.

Comment on the following lines:
        Returns:
            List[str]: A list of text lines after wrapping.
        """
        import textwrap
Collaborator:

Let’s move this import to the top of the file instead of placing it here.

@hidara2000 (Author) commented Apr 24, 2025:

🤦 oops

Comment on the following lines:
        else:  # CENTER, CENTER_LEFT, CENTER_RIGHT
            return (y1 + y2) / 2

    def _wrap_text(self, text: str) -> List[str]:
Collaborator:

I’d prefer this not to be a private class method—let’s move it to supervision/annotators/utils.py instead.

Author:

Done

Comment on lines 307 to 341
        import textwrap

        if not text:
            return [""]

        if self.max_line_length is None:
            return text.splitlines() or [""]

        # Split the text by existing newlines first
        paragraphs = text.split("\n")
        all_lines = []

        for paragraph in paragraphs:
            if not paragraph:
                # Keep empty lines
                all_lines.append("")
                continue

            # Wrap each paragraph separately
            wrapped = textwrap.wrap(
                paragraph,
                width=self.max_line_length,
                break_long_words=True,
                replace_whitespace=False,
                drop_whitespace=True,
            )

            # Add the wrapped lines for this paragraph
            if wrapped:
                all_lines.extend(wrapped)
            else:
                # If wrap returns an empty list (e.g., for whitespace-only input)
                all_lines.append("")

        return all_lines if all_lines else [""]
Collaborator:

The logic here seems pretty easy to follow. Let's remove python comments here.

Author:

Done

Comment on lines 158 to 159
frame_width: int,
frame_height: int,
Collaborator:

in the supervision codebase, we usually pass a resolution_wh tuple instead of separate frame width and height values.

Collaborator:

we have two other functions clip_boxes and pad_boxes. I recommend:

  • renaming this function to snap_boxes
  • drop part of the logic that flips (we can add it in the future, but I want to keep it out of this PR)
  • make it vectorized to process all boxes at once without looping
  • wrap frame_width and frame_height into single resolution_wh argument.

here's clip_boxes for reference

def clip_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """
    Clips bounding boxes coordinates to fit within the frame resolution.

    Args:
        xyxy (np.ndarray): A numpy array of shape `(N, 4)` where each
            row corresponds to a bounding box in
            the format `(x_min, y_min, x_max, y_max)`.
        resolution_wh (Tuple[int, int]): A tuple of the form `(width, height)`
            representing the resolution of the frame.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` where each row
            corresponds to a bounding box with coordinates clipped to fit
            within the frame resolution.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [10, 20, 300, 200],
            [15, 25, 350, 450],
            [-10, -20, 30, 40]
        ])

        sv.clip_boxes(xyxy=xyxy, resolution_wh=(320, 240))
        # array([
        #     [ 10,  20, 300, 200],
        #     [ 15,  25, 320, 240],
        #     [  0,   0,  30,  40]
        # ])
        ```
    """
    result = np.copy(xyxy)
    width, height = resolution_wh
    result[:, [0, 2]] = result[:, [0, 2]].clip(0, width)
    result[:, [1, 3]] = result[:, [1, 3]].clip(0, height)
    return result

I generated this. We would need to make sure it works:

def snap_boxes(xyxy: np.ndarray, resolution_wh: Tuple[int, int]) -> np.ndarray:
    """
    Shifts bounding boxes into the frame so that they are fully contained
    within the given resolution. Unlike `clip_boxes`, this function does not crop boxes.
    It moves them entirely if they exceed the frame boundaries.

    Args:
        xyxy (np.ndarray): A numpy array of shape `(N, 4)` where each
            row corresponds to a bounding box in the format
            `(x_min, y_min, x_max, y_max)`.
        resolution_wh (Tuple[int, int]): A tuple `(width, height)`
            representing the resolution of the frame.

    Returns:
        np.ndarray: A numpy array of shape `(N, 4)` with boxes shifted into frame.

    Examples:
        ```python
        import numpy as np
        import supervision as sv

        xyxy = np.array([
            [-10, 10, 30, 50],
            [310, 200, 350, 250],
            [100, -20, 150, 30],
            [200, 220, 250, 270]
        ])

        sv.snap_boxes(xyxy=xyxy, resolution_wh=(320, 240))
        # array([
        #     [  0,  10,  40,  50],
        #     [280, 190, 320, 240],
        #     [100,   0, 150,  50],
        #     [200, 190, 250, 240]
        # ])
        ```
    """
    result = np.copy(xyxy)
    width, height = resolution_wh

    box_w = result[:, 2] - result[:, 0]
    box_h = result[:, 3] - result[:, 1]

    shift_x1 = np.where(result[:, 0] < 0, -result[:, 0], 0)
    shift_x2 = np.where(result[:, 2] > width, width - result[:, 2], 0)
    shift_x = shift_x1 + shift_x2

    result[:, 0] += shift_x
    result[:, 2] += shift_x

    shift_y1 = np.where(result[:, 1] < 0, -result[:, 1], 0)
    shift_y2 = np.where(result[:, 3] > height, height - result[:, 3], 0)
    shift_y = shift_y1 + shift_y2

    result[:, 1] += shift_y
    result[:, 3] += shift_y

    return result

Author:

Done. Might be worth double-checking that I understood you properly here.
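
A quick sanity check of the snippet above against its docstring example (this assumes snap_boxes is defined exactly as sketched; the expected values follow from its shifting logic):

import numpy as np

xyxy = np.array([
    [-10, 10, 30, 50],
    [310, 200, 350, 250],
    [100, -20, 150, 30],
    [200, 220, 250, 270],
])
expected = np.array([
    [0, 10, 40, 50],
    [280, 190, 320, 240],
    [100, 0, 150, 50],
    [200, 190, 250, 240],
])
# every box keeps its size and ends up fully inside the 320x240 frame
np.testing.assert_array_equal(snap_boxes(xyxy, resolution_wh=(320, 240)), expected)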

Comment on lines 179 to 196
        if x1 < 0:
            shift = -x1
            x1 += shift
            x2 += shift
        elif x2 > frame_width:
            shift = frame_width - x2
            x1 += shift
            x2 += shift

        # Adjust y-coordinate to stay within frame
        if y1 < 0:
            shift = -y1
            y1 += shift
            y2 += shift
        elif y2 > frame_height:
            shift = frame_height - y2
            y1 += shift
            y2 += shift
Collaborator:

it should be possible to vectorize this and run on all boxes at once. Without looping.

Author:

done

Comment on lines 198 to 212
        # Check if label should be flipped to above the box
        if check_flip_label and text_anchor is not None:
            box_height = y2 - y1

            # Check anchor position to see if we can flip it
            anchor_y = self._get_anchor_y_for_adjustment(
                np.array([y1, y2]), text_anchor
            )

            # If we're at the bottom, try moving to the top
            if anchor_y >= y2 - 5:  # Near bottom edge
                # Check if there's room at the top
                if y1 - box_height >= 0:
                    y2 = y1
                    y1 = y2 - box_height
Collaborator:

can we remove that logic from scope of this PR? I'm not sure I want to add it

Author:

done

Comment on lines 270 to 294
    @staticmethod
    def _get_anchor_y_for_adjustment(bbox_y: np.ndarray, anchor: Position) -> float:
        """
        Calculates the anchor y-coordinate for label adjustment based on the text
        anchor position.

        Args:
            bbox_y (np.ndarray): An array containing the y1 and y2 coordinates of the
                bounding box.
            anchor (Position): The desired text anchor position.

        Returns:
            float: The anchor y-coordinate.
        """
        y1, y2 = bbox_y
        if anchor in [Position.TOP_LEFT, Position.TOP_CENTER, Position.TOP_RIGHT]:
            return y1
        elif anchor in [
            Position.BOTTOM_LEFT,
            Position.BOTTOM_CENTER,
            Position.BOTTOM_RIGHT,
        ]:
            return y2
        else:  # CENTER, CENTER_LEFT, CENTER_RIGHT
            return (y1 + y2) / 2
Collaborator:

As mentioned, I'd like to keep this part of the logic out of scope for this PR. We can go ahead and remove this method.

Author:

done

        self.smart_position = smart_position
        self.max_line_length: Optional[int] = max_line_length

    def _validate_labels(self, labels: Optional[List[str]], detections: Detections):
Collaborator:

I'd move this method to supervision/annotators/utils.

Author:

done

        )

    @staticmethod
    def _get_labels_text(
Collaborator:

I'd move this method to supervision/annotators/utils.

Author:

done

        # First, make sure the boxes don't go outside the frame
        for i in range(len(labels)):
            # Adjust box to stay within frame
            adjusted_properties[i, :4] = self._ensure_box_in_frame(
Collaborator:

Given that we are getting rid of flipping for now, do we need to call _ensure_box_in_frame (snap_boxes) twice?

Author:

fixed

Comment on lines 1280 to 1281
force_scale: float = 10.0,
consider_size: bool = True,
Collaborator:

@hidara2000 I'm curious—what was the reason for introducing those two arguments? I'm a bit concerned they might lead to unstable label positions during video processing, where small changes in initial position could cause disproportionately large shifts in the final output.

@hidara2000 (Author) commented Apr 24, 2025:

@SkalskiP
The reason for introducing force_scale and consider_size was primarily to offer more granular control over how the spreading algorithm resolves overlaps in static images (during testing). force_scale allows tuning the overall repulsion strength, and consider_size was an attempt to see if factoring in the label box dimensions could lead to a more visually pleasing distribution, especially in complex overlap scenarios. force_vectors *= 10 was already in the original code, and I was toying with the idea of letting a user set these values to suit their scenario, i.e. less force for videos and more for static busy scenes.

You're absolutely right to be concerned about video stability. Iterative algorithms like spread_out_boxes can be sensitive to small frame-to-frame variations in detection positions. Parameters like force_scale (especially if set high) and consider_size can amplify these small variations into larger, potentially noticeable jumps or jitter in the label positions across consecutive video frames.

Given this valid concern and the potential for these parameters to introduce instability, I've reverted the spread_out_boxes function in the PR back to the original version that doesn't have these parameters.

I'm still interested to know your thoughts though – do you think there's a viable way to use the version of the function (below) with force_scale and consider_size without causing instability (perhaps with very conservative default values)? Or do the added parameters introduce unnecessary complexity that would require users to tune them during class instantiation, which might not be ideal for a general-purpose annotator?

Looking forward to your feedback!

import numpy as np
from supervision import box_iou_batch, pad_boxes  # helpers used below, assumed importable from supervision's top-level API


def alternative_spread_out_boxes(
    xyxy: np.ndarray,
    max_iterations: int = 50,  # Moderate default iterations
    force_scale: float = 5.0,  # Moderate default force scale
    consider_size: bool = False, # Default to False for better video stability
    min_force_magnitude: float = 2.0 # Make minimum force tunable
) -> np.ndarray:
    """
    Spread out boxes that overlap with each other, optimized for a balance
    between overlap resolution and video stability.

    Args:
        xyxy: Numpy array of shape (N, 4) where N is the number of boxes.
        max_iterations: Maximum number of iterations to run the algorithm for.
                        Lower values may improve performance and stability
                        but could leave some overlaps unresolved.
        force_scale: Scale factor for the repulsion forces. Lower values result
                     in less aggressive spreading, which can improve video stability.
        consider_size: Whether to consider box size when calculating forces.
                       Setting to True might yield better static layouts but can
                       increase jitter in video due to fluctuating box sizes.
                       Defaults to False for better video stability.
        min_force_magnitude: Minimum magnitude for calculated force vectors.
                             Ensures slight overlaps still result in movement.

    Returns:
        np.ndarray: A numpy array of shape (N, 4) with adjusted box positions.
    """
    if len(xyxy) == 0:
        return xyxy

    # Add a small padding to ensure boxes that are just touching are considered for overlap
    xyxy_padded = pad_boxes(xyxy, px=1)

    # Calculate box areas if we're considering size (only done once)
    size_factors = np.ones(len(xyxy_padded))
    if consider_size:
        box_areas = (xyxy_padded[:, 2] - xyxy_padded[:, 0]) * (
            xyxy_padded[:, 3] - xyxy_padded[:, 1]
        )
        # Calculate the size factors (normalize by mean size), handle empty box_areas
        if len(box_areas) > 0 and np.mean(np.sqrt(box_areas)) != 0:
             size_factors = np.sqrt(box_areas) / np.mean(np.sqrt(box_areas))
             # Clip to avoid extreme values influencing forces too much
             size_factors = np.clip(size_factors, 0.5, 2.0)


    for _ in range(max_iterations):
        # Calculate IoU between all pairs of boxes (NxN matrix)
        iou = box_iou_batch(xyxy_padded, xyxy_padded)
        np.fill_diagonal(iou, 0)  # Eliminate self-interactions (a box doesn't overlap with itself)

        # If there are no overlaps, we are done
        if np.all(iou == 0):
            break

        overlap_mask = iou > 0

        # Calculate centers of the boxes (Nx2)
        centers = (xyxy_padded[:, :2] + xyxy_padded[:, 2:]) / 2

        # Calculate vectors pointing from each box center to every other box center (NxNx2)
        delta_centers = centers[:, np.newaxis, :] - centers[np.newaxis, :, :]
        # Only consider deltas for overlapping boxes
        delta_centers *= overlap_mask[:, :, np.newaxis]

        # Sum the delta vectors for each box to get the total push direction (Nx2)
        delta_sum = np.sum(delta_centers, axis=1)

        # Normalize the sum of deltas to get direction vectors (unit vectors)
        delta_magnitude = np.linalg.norm(delta_sum, axis=1, keepdims=True)
        direction_vectors = np.divide(
            delta_sum,
            delta_magnitude,
            out=np.zeros_like(delta_sum), # Use zeros where magnitude is zero to avoid NaNs
            where=delta_magnitude != 0,
        )

        # Calculate the base force magnitude based on total overlap (sum of IoUs)
        base_force_magnitude = np.sum(iou, axis=1)
        force_vectors = base_force_magnitude[:, np.newaxis] * direction_vectors

        # Apply size-based scaling if enabled
        if consider_size:
             force_vectors *= size_factors[:, np.newaxis]

        # Apply the general force scale
        force_vectors *= force_scale

        # Ensure minimum force for small overlaps to guarantee separation
        current_force_magnitudes = np.linalg.norm(force_vectors, axis=1, keepdims=True)
        small_force_mask = (current_force_magnitudes > 0) & (current_force_magnitudes < min_force_magnitude)

        if np.any(small_force_mask):
             # Rescale small force vectors to have the minimum magnitude
             force_directions_for_small = force_vectors / np.where(
                 current_force_magnitudes > 0, current_force_magnitudes, 1
             )
             force_vectors = np.where(
                 small_force_mask, force_directions_for_small * min_force_magnitude, force_vectors
             )

        # Convert displacement vectors to integers for pixel-based movement
        force_vectors = force_vectors.astype(int)

        # Apply forces to update box positions (shift both corners by the same vector)
        xyxy_padded[:, [0, 1]] += force_vectors
        xyxy_padded[:, [2, 3]] += force_vectors

    # Remove the padding before returning
    return pad_boxes(xyxy_padded, px=-1)

@SkalskiP (Collaborator)

Hi @hidara2000, sorry it took me a while to get back to you. I'm currently juggling work across 3–4 repositories, so my time is a bit stretched. I’ve now gone through your PR carefully and you’ve done an excellent job—really impressive work! Don’t be discouraged by the number of comments I left—they’re all meant to help polish things up. Once we merge this PR, it’ll take Supervision’s text annotators to the next level!

@hidara2000 (Author)

> Hi @hidara2000, sorry it took me a while to get back to you. I'm currently juggling work across 3–4 repositories, so my time is a bit stretched. I’ve now gone through your PR carefully and you’ve done an excellent job—really impressive work! Don’t be discouraged by the number of comments I left—they’re all meant to help polish things up. Once we merge this PR, it’ll take Supervision’s text annotators to the next level!

I appreciate you going through it, and I agree with all the comments. Changes made as per advice and results from test below.

[Updated test result images attached]

@hidara2000 requested a review from SkalskiP April 25, 2025 09:27