MG-391: Add y_axis_max to immunohisto data #222

beatrizsaldana · 2025-09-08T20:54:25Z

Problem

The immunohisto transform was missing y-axis maximum values needed for proper visualization of data. Without these values, frontend applications couldn't properly scale figures, leading to poor user experience when viewing biomarker and pathology data.

Solution

This PR adds calculation and rounding of y-axis maximum values to the immunohisto transform:

Key Features Added:

- Y-axis Maximum Calculation: Added logic to calculate the maximum value across all ages for each combination of (name, evidence_type, tissue)
- Smart Rounding Algorithm: Implemented a round_y_axis_max() function with rounding logic that:
  - Returns 10 for zero values
  - Rounds UP to the next "nice" number, i.e. one whose second significant digit is 0 or 5
  - Handles different magnitudes correctly (e.g., 0.0021 → 0.0025, 1094 → 1500)
  - Maintains monotonicity and handles floating-point precision issues
- Comprehensive Error Handling: Handles edge cases like negative values, conversion errors, and missing data
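
A minimal pandas sketch of the per-group maximum described above. It assumes one row per measurement with a numeric column named `value` (a hypothetical name; the transform's real column is not shown here), and the rows are made up for illustration:

```python
import pandas as pd

# Hypothetical input: one row per (name, evidence_type, tissue, age) measurement.
# The "value" column name and all row values are assumptions for illustration.
df = pd.DataFrame(
    {
        "name": ["GENE_A", "GENE_A", "GENE_A", "GENE_B"],
        "evidence_type": ["type_1", "type_1", "type_1", "type_1"],
        "tissue": ["tissue_1", "tissue_1", "tissue_1", "tissue_1"],
        "age": ["4 months", "8 months", "12 months", "4 months"],
        "value": [0.8, 1.9, 1.2, 1094.0],
    }
)

GROUP_KEYS = ["name", "evidence_type", "tissue"]

# Maximum across all ages for each (name, evidence_type, tissue) combination,
# as a lookup dict keyed by (name, evidence_type, tissue) tuples
y_axis_max_map = df.groupby(GROUP_KEYS)["value"].max().to_dict()

# Broadcast the same per-group maximum back onto every age row
df["y_axis_max"] = df.groupby(GROUP_KEYS)["value"].transform("max")
```

`groupby(...).max()` produces a lookup similar in spirit to the `y_axis_max_map` discussed in review, while `transform("max")` keeps the original row index so the maximum can be attached to each age entry directly.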

Rounding Logic Details:
The rounding algorithm follows these design principles:

- Always Round UP: Ensures the y-axis scale accommodates all data points
- "Nice Numbers": Creates visually appealing scales with 0 or 5 as the second significant digit
- Magnitude Preservation: Maintains the order of magnitude while rounding
- Floating-Point Safety: Uses string manipulation to avoid precision issues
- Monotonicity: Larger inputs always produce larger or equal outputs
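
The rules above can be sketched as follows, assuming the gap between consecutive "nice" numbers is half the place value of the leading digit (500 for 1094, 0.0005 for 0.0021), which reproduces the examples listed earlier. This is not the PR's code: it swaps the string manipulation for Python's `decimal` module, and raising `ValueError` on negative input is an assumption, since the PR's exact handling of negatives is not described here.

```python
from decimal import ROUND_CEILING, Decimal


def round_y_axis_max(value: float) -> float:
    """Round value UP to a "nice" number whose second significant digit is 0 or 5.

    Sketch only: uses decimal.Decimal instead of the PR's string manipulation.
    """
    if value == 0:
        return 10.0  # zero gets a default axis maximum of 10
    if value < 0:
        # Assumption: negative inputs are rejected in this sketch
        raise ValueError("y-axis maximum must be non-negative")
    # str() gives the shortest decimal repr, so binary float noise is not carried over
    d = Decimal(str(value))
    # adjusted() is the exponent of the leading digit (3 for 1094, -3 for 0.0021);
    # the step is half of that place value, e.g. 500 for 1094
    step = Decimal(5).scaleb(d.adjusted() - 1)
    # Count whole steps, rounding the count UP, then scale back
    steps = (d / step).to_integral_value(rounding=ROUND_CEILING)
    return float(steps * step)
```

Because the step count is computed in decimal arithmetic, an input that is already a nice number (e.g. 0.0025) stays unchanged instead of being bumped up by float round-off, and larger inputs can never map to smaller outputs.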

Tests

Added comprehensive test coverage including:

- Unit tests for round_y_axis_max() function with 15+ test cases covering:
  - All Jira ticket examples
  - Edge cases (zero, negative values, very small/large numbers)
  - Second digit rounding logic
  - Magnitude handling
  - Floating-point precision issues
  - Monotonicity verification
  - Parametrized tests for main examples
- Integration tests updated to include expected y_axis_max field in all test output files
- Test assets updated with new expected output containing y-axis max values

Files Modified:

- src/agoradatatools/etl/transform/immunohisto_transform.py - Core implementation
- tests/transform/test_immunohisto_transform.py - Comprehensive test suite
- All test output JSON files updated with expected y_axis_max values


beatrizsaldana · 2025-09-23T17:36:42Z

setup.cfg

    PyYAML~=6.0
    pyarrow~=14.0.1
    typer~=0.7.0
+    click<8.3.0


To prevent CI failure

Copilot

Pull Request Overview

This PR adds comprehensive y-axis maximum calculation functionality to the immunohisto transform to support proper data visualization scaling in frontend applications. The key changes include implementing a sophisticated rounding algorithm, calculating y-axis maximums across age groups, and maintaining data completeness.

Added round_y_axis_max() function with smart rounding logic that always rounds UP to "nice numbers"
Implemented y-axis maximum calculation across all ages for each (name, evidence_type, tissue) combination
Refactored transform logic into modular helper functions and updated all test output files with expected y_axis_max values

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

File	Description
src/agoradatatools/etl/transform/immunohisto_transform.py	Core implementation of y-axis max calculation and rounding logic
tests/transform/test_immunohisto_transform.py	Comprehensive test suite with 150+ test cases for new functionality
tests/test_assets//output/.json	Updated expected test outputs to include y_axis_max field
setup.cfg	Added click version constraint for dependency management


tests/transform/test_immunohisto_transform.py

src/agoradatatools/etl/transform/immunohisto_transform.py


jaclynbeck-sage

This is a clever solution to making sure things get rounded correctly. I do think this code can be simplified a lot and I made some suggestions on how to do that with pandas instead of lists of dictionaries.

src/agoradatatools/etl/transform/immunohisto_transform.py

jaclynbeck-sage · 2025-09-30T21:08:34Z

src/agoradatatools/etl/transform/immunohisto_transform.py

+        entry[extra_column_name] = group[extra_columns].to_dict("records")
+        data_rows.append(entry)
+
+    return data_rows


A suggestion for simplifying this function. You might even be able to delete this function and move its body into the main code since it's short:

    # Add nest_fields from agoradatatools.etl.utils to the imports at the top of the file
    grouped = nest_fields(
        dataset.copy(),
        grouping=group_columns,
        new_column="data",
        drop_columns=group_columns,
    )
    # Though as above, I really think the rounding should already be in the lookup table
    grouped["y_axis_max"] = grouped.apply(
        lambda entry: round_y_axis_max(
            y_axis_max_map.get((entry["name"], entry["evidence_type"], entry["tissue"]), 0)
        ),
        axis=1,
    )
    return grouped.to_dict("records")

jaclynbeck-sage · 2025-09-30T21:56:28Z

src/agoradatatools/etl/transform/immunohisto_transform.py

+                    }
+                )
+
+    return data_rows


A suggestion for simplifying this function. If you use my suggestion for _create_data_rows but leave data_rows as a data frame, you can play some games with pandas to fill in missing ages. This removes the need to pass in the original data set or create a y_max lookup table and you can just return the data frame and I don't think you need to call convert_numpy_types either.

    # Every age that appears anywhere in the data
    available_ages = list(data_rows["age"].drop_duplicates())

    # All unique combinations of groups
    fill_df = data_rows[["name", "evidence_type", "tissue", "y_axis_max"]].copy().drop_duplicates()
    # Make an "age" column where each entry is a list of all possible ages
    fill_df["age"] = [available_ages] * fill_df.shape[0]

    # "explode" makes one row per age + group. Then merge back into the data to create new rows for missing ages
    fill_df = fill_df.explode("age").merge(data_rows, how="outer", validate="one_to_one")

    # Fill NA values for units. Can't use fillna to make an empty list so we add an extra line
    fill_df = fill_df.fillna({"units": ""})
    # Missing "data" cells come back as NaN rather than a list, so replace them with []
    fill_df["data"] = fill_df["data"].apply(lambda x: x if isinstance(x, list) else [])


    # Adjust _extract_age_num to work with a single string from a pd.Series, no need to return a tuple
    def _extract_age_num(entry: str) -> float:
        # Keep the existing try/except parsing but return a single float.
        # (Assumes ages start with a number, e.g. "4 months"; unparseable ages sort last.)
        try:
            return float(entry.split()[0])
        except (ValueError, IndexError, AttributeError):
            return float("inf")


    # Sort by age
    fill_df["age_numeric"] = fill_df["age"].apply(_extract_age_num)
    fill_df = fill_df.sort_values(["age_numeric", "age", "evidence_type"]).drop(columns="age_numeric")
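
To illustrate what the explode-and-merge suggestion above does, here is a toy run on hypothetical nested rows in which `GENE_B` has no `8 months` entry (all names and values are made up):

```python
import pandas as pd

# Hypothetical nested rows: GENE_B is missing an "8 months" entry
data_rows = pd.DataFrame(
    {
        "name": ["GENE_A", "GENE_A", "GENE_B"],
        "evidence_type": ["type_1", "type_1", "type_1"],
        "tissue": ["tissue_1", "tissue_1", "tissue_1"],
        "y_axis_max": [2.0, 2.0, 1500.0],
        "age": ["4 months", "8 months", "4 months"],
        "units": ["%", "%", "%"],
        "data": [[{"value": 1.2}], [{"value": 1.9}], [{"value": 1094.0}]],
    }
)

available_ages = list(data_rows["age"].drop_duplicates())
groups = data_rows[["name", "evidence_type", "tissue", "y_axis_max"]].drop_duplicates()

# Give every group the full list of ages, then explode to one row per (group, age)
groups["age"] = [available_ages] * len(groups)
filled = groups.explode("age").merge(data_rows, how="outer", validate="one_to_one")

# Rows created for missing ages get empty units and an empty data list
filled["units"] = filled["units"].fillna("")
filled["data"] = filled["data"].apply(lambda x: x if isinstance(x, list) else [])
```

The outer merge keeps every real row and adds one placeholder row per missing (group, age) pair. `isinstance(x, list)` is used instead of comparing against `np.NaN`, an alias that was removed in NumPy 2.0.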

jaclynbeck-sage · 2025-09-30T22:02:53Z

tests/transform/test_immunohisto_transform.py

 import os

 import pandas as pd
 import pytest


I'll review the tests after you simplify. You might be able to remove some of the helper functions and delete/simplify those tests.


sonarqubecloud · 2025-10-27T21:01:55Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
2.3% Duplication on New Code

See analysis details on SonarQube Cloud

Beatriz Saldana added 5 commits September 8, 2025 13:14

Added y-axis-max to immunohisto transforms

80f0384

Rounding y-axis-max

ace69c5

Preventing unnecessary re-calculation of y_axis_max for entry with missing age

634cefe

Added tests for the round_y_axis_max() function

a521450

Updated immunohisto tests to include expected y_axis_max field

61e11d2

beatrizsaldana requested a review from a team as a code owner September 8, 2025 20:54

Beatriz Saldana and others added 12 commits September 8, 2025 14:01

Updated model_details tests to include y_axis_max

d04128f

Addressing small PR comment

b55298c

Merged dev, still have conflicts

4efe2d6

Tests are passing post merge

e092272

Refactored immunohisto_transform() to reduce its Cognitive Complexity from 19 to the 15 allowed.

a9e73ab

pre-commit

4866502

Tests for all new helper functions in immunohisto_transform()

a96483f

Fixing sonarcloud's complaint that round_y_axis_max() has 4 returns or yields, which is more than the 3 allowed.

31c2f8d

Added more specific type hints as requested by sonarcloud

16ca312

Added type hints to all functions in the test_immunohisto_trasnfrom.py file

44a047d

First attempt at fixing sonarcloud's dislike of equality checks with floating point values

bd03d4d

Trying to pin click to less than 8.3.0 as it worked in a previous CI run (#225)

ec57e1f


beatrizsaldana requested a review from Copilot September 24, 2025 19:51

Copilot AI reviewed Sep 24, 2025

tests/transform/test_immunohisto_transform.py (resolved)

src/agoradatatools/etl/transform/immunohisto_transform.py (outdated, resolved)

src/agoradatatools/etl/transform/immunohisto_transform.py (outdated, resolved)

Beatriz Saldana added 2 commits September 25, 2025 12:22

Moved import statement as suggested by copilot PR review

40aae55

Removed use of hardcoded precision as recommended by copilot in PR review

e12f1b5

jaclynbeck-sage requested changes Sep 30, 2025


beatrizsaldana and others added 6 commits October 15, 2025 12:18

Update src/agoradatatools/etl/transform/immunohisto_transform.py

f6b3ab1

Co-authored-by: jaclynbeck-sage <[email protected]>

Simplified round_y_axis_max function as recommended by Jaclyn

f0705ad

Updated function name and docstring to clarify functionality as recommended by Jaclyn

9dde583

Removed redundant check to see if len(group) is greater than zero

3cf4d36

Merge remote-tracking branch 'origin/dev' into beatrizsaldana/MG-391

742ee80

Tests are now passing

5b5cd88

Beatriz Saldana added 4 commits October 27, 2025 13:49

Implemented Jaclyn's recomemndation to run rounding when creating the lookup table

2b41ead

Updated docstring as suggested by Jaclyn

0572253

Updated docstring for _add_missing_age_entries as recommended by Jaclyn

e3f9cc3

Removed redundant lookup map creation

2dac9ee


3 participants