
refactor: reduce cyclomatic complexity in OMR processing#258

Open
IlyNosov wants to merge 1 commit into Udayraj123:master from IlyNosov:refactor/reduce-omr-complexity

IlyNosov commented Dec 15, 2025

User description

Summary

Refactoring of OMR response processing to reduce cyclomatic complexity
and improve code maintainability without changing behavior.

Motivation

This pull request was created as part of an educational assignment
focused on code quality analysis and working with pull requests.
The repository was analyzed using Radon, which revealed a very high
cyclomatic complexity in the read_omr_response method.

Cyclomatic Complexity (Radon)

Before

  • ImageInstanceOps.read_omr_response: E (39)
  • ImageInstanceOps (class): B (9)
  • Average complexity: B (8.7)
  • Blocks analyzed: 10

After

  • ImageInstanceOps.read_omr_response: B (7)
  • ImageInstanceOps (class): A (4)
  • Average complexity: A (3.92)
  • Blocks analyzed: 25

Improvement

  • read_omr_response: −32 (−82%)
  • Average complexity: −4.78 (−55%)

What was done

  • Split a large, monolithic method into focused private helper methods
  • Reduced nesting and localized decision logic
  • Preserved all original comments for readability and future work
  • Fixed a potential UnboundLocalError caused by variable scope
  • Kept algorithmic behavior and output format unchanged
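An UnboundLocalError of the kind mentioned above typically appears when a variable is assigned only inside one conditional branch and read later under a different condition. The sketch below uses hypothetical functions and flags (plot_before, plot_after, collect_debug, plot_debug) with a variable named after the PR's all_c_box_vals; it shows the general pattern and the fix the PR describes (bind the name to None up front, then check for None before use), not the actual OMRChecker code.

```python
# Hypothetical illustration: names and conditions are invented for this sketch.

def plot_before(collect_debug: bool, plot_debug: bool):
    if collect_debug:
        all_c_box_vals = {"group_a": [], "group_b": []}
    # ... many lines later, guarded by a *different* condition ...
    if plot_debug:
        # Raises UnboundLocalError when collect_debug was False.
        return sorted(all_c_box_vals)
    return None


def plot_after(collect_debug: bool, plot_debug: bool):
    all_c_box_vals = None  # always bound, whichever branch runs
    if collect_debug:
        all_c_box_vals = {"group_a": [], "group_b": []}
    if plot_debug and all_c_box_vals is not None:  # null-safety check before use
        return sorted(all_c_box_vals)
    return None
```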

Notes

  • Refactoring only; no functional or behavioral changes intended
  • No performance claims
  • Complexity changes were measured with Radon before and after refactoring

PR Type

Enhancement


Description

  • Split monolithic read_omr_response method into 15 focused helper methods

  • Reduced cyclomatic complexity from E(39) to B(7) in main method

  • Improved code maintainability and readability through method extraction

  • Fixed potential UnboundLocalError in _init_debug_boxplots and _plot_box_types_if_needed


Diagram Walkthrough

flowchart LR
  A["read_omr_response<br/>monolithic method"] -->|extract| B["_prepare_response_images"]
  A -->|extract| C["_prepare_morph_for_alignment"]
  A -->|extract| D["_auto_align_field_blocks"]
  A -->|extract| E["_collect_q_stats"]
  A -->|extract| F["_compute_thresholds"]
  A -->|extract| G["_detect_marked_bubbles"]
  G -->|extract| H["_compute_qstrip_threshold"]
  G -->|extract| I["_scan_and_draw_bubbles"]
  G -->|extract| J["_apply_detected_bubbles_to_response"]
  A -->|extract| K["_plot_box_types_if_needed"]
  A -->|extract| L["_show_final_align_if_needed"]
  A -->|result| M["Reduced complexity<br/>E39 → B7"]

File Walkthrough

Relevant files
Refactoring: src/core.py
Extract 15 helper methods from monolithic OMR processing

  • Refactored read_omr_response method by extracting 15 private helper
    methods
  • Created _prepare_response_images to handle image copying, resizing,
    and normalization
  • Created _prepare_morph_for_alignment to prepare morphological image
    for alignment
  • Created _init_debug_boxplots to initialize debug visualization
    structures with null-safety
  • Created _auto_align_field_blocks to handle field block alignment logic
  • Created _render_alignment_debug to render alignment visualization
  • Created _collect_q_stats to collect bubble value statistics
  • Created _compute_thresholds to calculate global thresholds
  • Created _detect_marked_bubbles as main bubble detection orchestrator
  • Created _compute_qstrip_threshold to compute per-strip thresholds
  • Created _scan_and_draw_bubbles to detect and draw marked bubbles
  • Created _apply_detected_bubbles_to_response to update response with
    detected values
  • Created _apply_empty_if_none_detected to handle empty field cases
  • Created _collect_c_box_debug to collect debug box plot data
  • Created _plot_box_types_if_needed to render box plots with null-safety
    check
  • Created _show_final_align_if_needed to display alignment visualization
  • Fixed potential UnboundLocalError by initializing all_c_box_vals and
    q_nums to None
  • Preserved all original comments and algorithmic behavior
Lines changed: +511/-346

@qodo-free-for-open-source-projects

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢 No security concerns identified: no security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic variable names: Variables s, d, x, y, f, and k use single-letter or generic names that don't clearly
express their purpose

Referred Code
s, d = field_block.origin, field_block.dimensions

match_col, max_steps, align_stride, thk = map(
    config.alignment_params.get,
    [
        "match_col",
        "max_steps",
        "stride",
        "thickness",
    ],
)
shift, steps = 0, 0
while steps < max_steps:
    left_mean = np.mean(
        morph_v[
            s[1] : s[1] + d[1],
            s[0] + shift - thk : -thk + s[0] + shift + match_col,
        ]
    )
    right_mean = np.mean(
        morph_v[


 ... (clipped 352 lines)

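One way to address the naming finding, sketched under assumptions: wrap the four values unpacked from config.alignment_params in a small dataclass and give the slice arithmetic named coordinates. AlignmentParams and left_window_bounds are hypothetical names; reading s as an (x, y) origin and d as (width, height) dimensions is inferred from how the excerpt indexes rows with s[1]/d[1] and columns with s[0]. Only the left window is reproduced, because the right-window expression is clipped.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class AlignmentParams:
    """Named view of the settings unpacked from config.alignment_params."""
    match_col: int
    max_steps: int
    stride: int
    thickness: int

    @classmethod
    def from_mapping(cls, params: Mapping[str, int]) -> "AlignmentParams":
        # Unlike map(params.get, ...), a missing key raises KeyError instead of yielding None.
        return cls(
            match_col=params["match_col"],
            max_steps=params["max_steps"],
            stride=params["stride"],
            thickness=params["thickness"],
        )


def left_window_bounds(
    origin: Tuple[int, int],
    dimensions: Tuple[int, int],
    shift: int,
    params: AlignmentParams,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Row and column bounds of the left matching window from the excerpt.

    Mirrors morph_v[s[1] : s[1] + d[1], s[0] + shift - thk : -thk + s[0] + shift + match_col]
    with s -> origin (x, y) and d -> dimensions (width, height).
    """
    origin_x, origin_y = origin
    _width, height = dimensions
    rows = (origin_y, origin_y + height)
    cols = (
        origin_x + shift - params.thickness,
        origin_x + shift + params.match_col - params.thickness,
    )
    return rows, cols
```

A caller would then slice with the returned bounds (morph_v[r0:r1, c0:c1]) before taking np.mean. One behavioral difference to weigh: map(config.alignment_params.get, ...) yields None for a missing key, while from_mapping fails immediately with KeyError.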

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing null validation: Method _scan_and_draw_bubbles doesn't validate if field_block_bubbles is empty before
accessing bubble properties, risking IndexError

Referred Code
def _scan_and_draw_bubbles(
        self,
        *,
        field_block,
        field_block_bubbles,
        per_q_strip_threshold,
        all_q_vals,
        total_q_box_no,
        final_marked,
        box_w,
        box_h,
):
    # TODO: get rid of total_q_box_no
    detected_bubbles = []
    for bubble in field_block_bubbles:
        bubble_is_marked = per_q_strip_threshold > all_q_vals[total_q_box_no]
        total_q_box_no += 1

        # Use these in both branches
        x, y, field_value = (
            bubble.x + field_block.shift,


 ... (clipped 39 lines)

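Iterating over an empty field_block_bubbles is itself harmless in Python; the indexing risk in the excerpt is all_q_vals[total_q_box_no] running past the end of all_q_vals, or any later code that assumes the strip has at least one bubble. The sketch below makes both cases explicit. Bubble and scan_strip are hypothetical simplifications (no drawing, no field_block.shift); the marking rule mirrors the excerpt's per_q_strip_threshold > all_q_vals[total_q_box_no] comparison, and returning the next index stands in for the incremented total_q_box_no.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Bubble:
    """Hypothetical minimal stand-in for a template bubble."""
    field_label: str
    field_value: str


def scan_strip(
    bubbles: Sequence[Bubble],
    threshold: float,
    all_q_vals: Sequence[float],
    start_index: int,
) -> Tuple[List[Bubble], int]:
    """Return the marked bubbles of one strip and the next index into all_q_vals.

    A bubble counts as marked when the strip threshold exceeds its mean value.
    """
    if not bubbles:
        # Empty strip: nothing detected; callers must not index bubbles[0] here.
        return [], start_index
    end_index = start_index + len(bubbles)
    if start_index < 0 or end_index > len(all_q_vals):
        raise ValueError(
            f"bubble values {start_index}:{end_index} out of range for {len(all_q_vals)} values"
        )
    strip_vals = all_q_vals[start_index:end_index]
    detected = [bubble for bubble, val in zip(bubbles, strip_vals) if threshold > val]
    return detected, end_index
```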

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing input validation: Methods like _prepare_response_images and _collect_q_stats don't validate input
parameters (template, image) for null or invalid values before processing

Referred Code
def _prepare_response_images(self, template, image):
    config = self.tuning_config
    img = image.copy()
    # origDim = img.shape[:2]
    img = ImageUtils.resize_util(img, template.page_dimensions[0], template.page_dimensions[1])
    if img.max() > img.min():
        img = ImageUtils.normalize_util(img)
    # Processing copies
    transp_layer = img.copy()
    final_marked = img.copy()
    return img, transp_layer, final_marked

def _prepare_morph_for_alignment(self, img, auto_align):
    config = self.tuning_config
    morph = img.copy()
    self.append_save_img(3, morph)

    if auto_align:
        # Note: clahe is good for morphology, bad for thresholding
        morph = CLAHE_HELPER.apply(morph)
        self.append_save_img(3, morph)


 ... (clipped 165 lines)

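A minimal sketch of the kind of up-front check this finding asks for, assuming a hypothetical helper validate_response_inputs that _prepare_response_images would call before resizing. It is duck-typed on .shape and .size, which NumPy arrays (and therefore OpenCV images) provide. A None check is worth having because cv2.imread returns None instead of raising when it cannot read a file.

```python
from typing import Any, Sequence


def validate_response_inputs(image: Any, page_dimensions: Sequence[int]) -> None:
    """Raise ValueError early instead of failing deep inside resize/normalize.

    Hypothetical helper; duck-typed on .shape/.size so NumPy is not imported here.
    """
    if image is None:
        raise ValueError("image is None (was the file read successfully?)")
    shape = getattr(image, "shape", None)
    if shape is None or len(shape) not in (2, 3) or getattr(image, "size", 0) == 0:
        raise ValueError(f"expected a non-empty 2-D or 3-D image array, got shape {shape}")
    # page_dimensions is passed to the resize step as two sizes, so both must be positive.
    if len(page_dimensions) != 2 or any(int(dim) <= 0 for dim in page_dimensions):
        raise ValueError(f"page_dimensions must be two positive sizes, got {page_dimensions!r}")
```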

Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-free-for-open-source-projects

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Introduce a dedicated class for processing

Create a dedicated OMRResponseProcessor class to encapsulate the state and logic
of OMR processing. This will replace long parameter lists in helper methods with
instance attributes, improving state management and reducing coupling.

Examples:

src/core.py [341-354]
    def _detect_marked_bubbles(
            self,
            template,
            img,
            final_marked,
            all_q_vals,
            all_q_strip_arrs,
            all_q_std_vals,
            global_thr,
            global_std_thresh,

 ... (clipped 4 lines)
src/core.py [80-96]
            (
                omr_response,
                multi_marked,
                per_omr_threshold_avg,
            ) = self._detect_marked_bubbles(
                template,
                img,
                final_marked,
                all_q_vals,
                all_q_strip_arrs,

 ... (clipped 7 lines)

Solution Walkthrough:

Before:

class ImageInstanceOps:
    def read_omr_response(self, template, image, name, ...):
        # ... initialization of many local variables ...
        img, transp_layer, final_marked = self._prepare_response_images(...)
        all_q_vals, all_q_strip_arrs, ... = self._collect_q_stats(...)
        global_thr, ... = self._compute_thresholds(all_q_vals, ...)
        
        omr_response, ... = self._detect_marked_bubbles(
            template, img, final_marked, all_q_vals, 
            all_q_strip_arrs, all_q_std_vals, global_thr, 
            global_std_thresh, omr_response, ...
        )
        # ...
    
    def _detect_marked_bubbles(self, template, img, final_marked, all_q_vals, ...):
        # ... uses many parameters
        pass

After:

class OMRResponseProcessor:
    def __init__(self, template, image, name, config, ...):
        self.template = template
        self.image = image
        # ... other attributes for state ...
        self.img = None
        self.final_marked = None
        self.all_q_vals = None
        self.global_thr = None
        # ...

    def process(self):
        self._prepare_response_images()
        self._collect_q_stats()
        self._compute_thresholds()
        self._detect_marked_bubbles()
        return self.omr_response, ...

    def _detect_marked_bubbles(self):
        # Accesses state via self.img, self.all_q_vals, etc.
        pass

class ImageInstanceOps:
    def read_omr_response(self, template, image, name, ...):
        processor = OMRResponseProcessor(template, image, name, self.tuning_config, self)
        return processor.process()
Suggestion importance[1-10]: 9


Why: This is an excellent architectural suggestion that addresses a significant design flaw (tight coupling via long parameter lists) introduced by the refactoring, proposing a solution that would greatly improve state management and code structure.

High
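To make the suggested pattern concrete without guessing at OMRChecker internals, here is a toy version in which intermediate results are stored as attributes, so each step is called without arguments. ToyResponseProcessor and its mean-based threshold are placeholders invented for this sketch; only the "value below threshold means marked" comparison follows the PR's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyResponseProcessor:
    """Hypothetical illustration of the OMRResponseProcessor idea.

    The threshold rule (mean of all values) is a placeholder, not OMRChecker's.
    """
    q_vals: List[float]
    threshold: Optional[float] = None
    marked: List[int] = field(default_factory=list)

    def process(self) -> List[int]:
        # Steps share state through self instead of long parameter lists.
        self._compute_threshold()
        self._detect_marked()
        return self.marked

    def _compute_threshold(self) -> None:
        if not self.q_vals:
            raise ValueError("no bubble values to threshold")
        self.threshold = sum(self.q_vals) / len(self.q_vals)

    def _detect_marked(self) -> None:
        # Darker bubbles have lower values, so below-threshold means marked.
        self.marked = [i for i, val in enumerate(self.q_vals) if val < self.threshold]
```

ToyResponseProcessor([200.0, 50.0, 210.0, 40.0]).process() computes a threshold of 125.0 and returns [1, 3]. The trade-off of this design is an implicit ordering dependency: _detect_marked is only valid after _compute_threshold has run, something a long parameter list made explicit.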
Possible issue
Fix multi_roll never being updated

Fix a bug in _detect_marked_bubbles where multi_roll is never updated. Update
the function to correctly calculate and return the multi_roll status, restoring
logic lost during refactoring.

src/core.py [341-422]

 def _detect_marked_bubbles(
         self,
         template,
         img,
         final_marked,
         all_q_vals,
         all_q_strip_arrs,
         all_q_std_vals,
         global_thr,
         global_std_thresh,
         omr_response,
         all_c_box_vals,
         q_nums,
 ):
     config = self.tuning_config
     per_omr_threshold_avg, total_q_strip_no, total_q_box_no = 0, 0, 0
     multi_marked, multi_roll = 0, 0
 
     for field_block in template.field_blocks:
         block_q_strip_no = 1
         box_w, box_h = field_block.bubble_dimensions
         shift = field_block.shift
         s, d = field_block.origin, field_block.dimensions
         key = field_block.name[:3]
         # cv2.rectangle(final_marked,(s[0]+shift,s[1]),(s[0]+shift+d[0],
         #   s[1]+d[1]),CLR_BLACK,3)
 
         for field_block_bubbles in field_block.traverse_bubbles:
             per_q_strip_threshold = self._compute_qstrip_threshold(
                 key=key,
                 field_block_bubbles=field_block_bubbles,
                 block_q_strip_no=block_q_strip_no,
                 all_q_std_vals=all_q_std_vals,
                 total_q_strip_no=total_q_strip_no,
                 global_std_thresh=global_std_thresh,
                 all_q_strip_arrs=all_q_strip_arrs,
                 global_thr=global_thr,
                 config=config,
             )
             per_omr_threshold_avg += per_q_strip_threshold
 
             detected_bubbles, total_q_box_no = self._scan_and_draw_bubbles(
                 field_block=field_block,
                 field_block_bubbles=field_block_bubbles,
                 per_q_strip_threshold=per_q_strip_threshold,
                 all_q_vals=all_q_vals,
                 total_q_box_no=total_q_box_no,
                 final_marked=final_marked,
                 box_w=box_w,
                 box_h=box_h,
             )
 
-            multi_marked = self._apply_detected_bubbles_to_response(
+            multi_marked, multi_roll = self._apply_detected_bubbles_to_response(
                 detected_bubbles=detected_bubbles,
                 omr_response=omr_response,
                 multi_marked=multi_marked,
+                multi_roll=multi_roll,
             )
 
             self._apply_empty_if_none_detected(
                 detected_bubbles=detected_bubbles,
                 field_block_bubbles=field_block_bubbles,
                 field_block=field_block,
                 omr_response=omr_response,
             )
 
             self._collect_c_box_debug(
                 config=config,
                 key=key,
                 all_c_box_vals=all_c_box_vals,
                 q_nums=q_nums,
                 block_q_strip_no=block_q_strip_no,
                 all_q_strip_arrs=all_q_strip_arrs,
                 total_q_strip_no=total_q_strip_no,
             )
 
             block_q_strip_no += 1
             total_q_strip_no += 1
         # /for field_block
 
     per_omr_threshold_avg /= total_q_strip_no
     per_omr_threshold_avg = round(per_omr_threshold_avg, 2)
-    return omr_response, multi_marked, per_omr_threshold_avg
+    return omr_response, multi_marked, multi_roll, per_omr_threshold_avg
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a critical bug introduced by the refactoring where the multi_roll logic was lost, causing it to always be 0. Restoring this functionality is essential for correctness.

High
Restore missing multi-roll detection logic

Restore the missing multi_roll detection logic in
_apply_detected_bubbles_to_response. The function should accept multi_roll,
update it, and return it to fix a bug introduced during refactoring.

src/core.py [524-541]

-def _apply_detected_bubbles_to_response(self, *, detected_bubbles, omr_response, multi_marked):
+def _apply_detected_bubbles_to_response(self, *, detected_bubbles, omr_response, multi_marked, multi_roll):
     for bubble in detected_bubbles:
         field_label, field_value = (
             bubble.field_label,
             bubble.field_value,
         )
         # Only send rolls multi-marked in the directory
         multi_marked_local = field_label in omr_response
         omr_response[field_label] = (
             (omr_response[field_label] + field_value)
             if multi_marked_local
             else field_value
         )
         # TODO: generalize this into identifier
-        # multi_roll = multi_marked_local and "Roll" in str(q)
+        multi_roll = multi_roll or (multi_marked_local and "Roll" in str(field_label))
         multi_marked = multi_marked or multi_marked_local
 
-    return multi_marked
+    return multi_marked, multi_roll
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies that the multi_roll calculation logic was lost during refactoring. It provides an accurate fix to restore this critical functionality, which is necessary for the overall correctness of the read_omr_response method.

High
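The restored flag logic from the two multi_roll suggestions can be checked in isolation. The function below is a standalone copy of the suggested _apply_detected_bubbles_to_response with self removed; DetectedBubble is a hypothetical stand-in carrying only the field_label and field_value attributes the helper reads.

```python
from collections import namedtuple

# Hypothetical stand-in for a detected bubble: only the two fields the helper reads.
DetectedBubble = namedtuple("DetectedBubble", ["field_label", "field_value"])


def apply_detected_bubbles_to_response(detected_bubbles, omr_response, multi_marked, multi_roll):
    """Standalone copy of the suggested helper, for testing outside the class."""
    for bubble in detected_bubbles:
        multi_marked_local = bubble.field_label in omr_response
        # Concatenate values when the same field is marked more than once.
        omr_response[bubble.field_label] = (
            omr_response[bubble.field_label] + bubble.field_value
            if multi_marked_local
            else bubble.field_value
        )
        # Restored: track multi-marking on "Roll" fields separately.
        multi_roll = multi_roll or (multi_marked_local and "Roll" in str(bubble.field_label))
        multi_marked = multi_marked or multi_marked_local
    return multi_marked, multi_roll
```

Because the first suggestion changes _detect_marked_bubbles to return four values, the call site quoted under the OMRResponseProcessor suggestion, which unpacks omr_response, multi_marked and per_omr_threshold_avg, must also unpack multi_roll; otherwise Python raises ValueError (too many values to unpack).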

@Udayraj123 (Owner)

Hey @IlyNosov, thanks for the refactor PR and welcome to OMRChecker!
If possible, could you please check the dev branch and see if this logic could be refactored there? (I have moved that logic into two parts: detection/interpretation segregation.)
