MG-391: Add y_axis_max to immunohisto data #222
base: dev
… from 19 to the 15 allowed.
…r yields, which is more than the 3 allowed.
…floating point values
setup.cfg:
```diff
 PyYAML~=6.0
 pyarrow~=14.0.1
 typer~=0.7.0
+click<8.3.0
```
To prevent CI failure
Pull Request Overview
This PR adds y-axis maximum calculation to the immunohisto transform so that frontend applications can scale data visualizations correctly. The key changes are a rounding algorithm that rounds values up to "nice" numbers, a y-axis maximum calculated across age groups, and logic that keeps the output data complete.
- Added `round_y_axis_max()` function with rounding logic that always rounds up to "nice numbers"
- Implemented y-axis maximum calculation across all ages for each `(name, evidence_type, tissue)` combination
- Refactored transform logic into modular helper functions and updated all test output files with expected `y_axis_max` values
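The per-group y-axis maximum described above can be sketched with `groupby(...).transform("max")`, which broadcasts each group's maximum back onto every row of that group. This is a minimal illustration under assumptions, not the PR's code: `add_group_max`, the `value` column, and the toy values are hypothetical, and the real transform may compute the maximum from a different field.

```python
import pandas as pd

GROUP_COLUMNS = ["name", "evidence_type", "tissue"]


def add_group_max(df: pd.DataFrame, value_column: str = "value") -> pd.DataFrame:
    """Attach the maximum of `value_column` across all ages to every row of its group.

    `value_column` is a hypothetical name; the real transform's field may differ.
    """
    out = df.copy()
    # transform("max") keeps the original row count, so every age in a
    # (name, evidence_type, tissue) group shares the same y-axis maximum
    out["y_axis_max"] = out.groupby(GROUP_COLUMNS)[value_column].transform("max")
    return out


# Toy values for illustration only
example = pd.DataFrame({
    "name": ["GeneA", "GeneA", "GeneA", "GeneB"],
    "evidence_type": ["count"] * 4,
    "tissue": ["cortex"] * 4,
    "age": ["4 months", "8 months", "12 months", "4 months"],
    "value": [3.2, 7.9, 5.1, 12.0],
})
result = add_group_max(example)
print(result[["name", "age", "y_axis_max"]])
```

Using `transform` rather than `agg` keeps one row per input row, so the maximum can be attached as a column without a separate lookup table.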
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/agoradatatools/etl/transform/immunohisto_transform.py | Core implementation of y-axis max calculation and rounding logic |
| tests/transform/test_immunohisto_transform.py | Comprehensive test suite with 150+ test cases for new functionality |
| tests/test_assets//output/.json | Updated expected test outputs to include y_axis_max field |
| setup.cfg | Added click version constraint for dependency management |
This is a clever solution to making sure things get rounded correctly. I do think this code can be simplified a lot and I made some suggestions on how to do that with pandas instead of lists of dictionaries.
```python
entry[extra_column_name] = group[extra_columns].to_dict("records")
data_rows.append(entry)

return data_rows
```
A suggestion for simplifying this function. You might even be able to delete this function and move its body into the main code, since it's short:
```python
# Add nest_fields from agoradatatools.etl.utils to the imports at the top of the file
grouped = nest_fields(dataset.copy(), grouping=group_columns, new_column="data", drop_columns=group_columns)
grouped["y_axis_max"] = grouped.apply(
    lambda entry: round_y_axis_max(
        y_axis_max_map.get((entry["name"], entry["evidence_type"], entry["tissue"]), 0)
    ),
    axis=1,
)
# Though as above, I really think the rounding should already be in the lookup table
return grouped.to_dict("records")
```
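The suggestion uses `nest_fields` to gather each group's rows into a `data` column. The general pattern, collapsing each group's remaining columns into a list of records, can be sketched in plain pandas. This assumes `nest_fields` behaves similarly; `nest_rows` and the toy columns are hypothetical, and this is not the `agoradatatools` implementation.

```python
import pandas as pd


def nest_rows(df: pd.DataFrame, group_columns: list, new_column: str = "data") -> pd.DataFrame:
    """Collapse each group's non-grouping columns into a list of row dicts (hypothetical helper)."""
    value_columns = [c for c in df.columns if c not in group_columns]
    return (
        df.groupby(group_columns, sort=False)[value_columns]
        # Each group becomes one list of {column: value} records
        .apply(lambda g: g.to_dict("records"))
        .reset_index(name=new_column)
    )


toy = pd.DataFrame({
    "name": ["GeneA", "GeneA", "GeneB"],
    "tissue": ["cortex", "cortex", "cortex"],
    "age": ["4 months", "12 months", "4 months"],
    "value": [3.2, 7.9, 12.0],
})
nested = nest_rows(toy, ["name", "tissue"])
print(nested)
```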
```python
}
)

return data_rows
```
A suggestion for simplifying this function. If you use my suggestion for `_create_data_rows` but leave `data_rows` as a data frame, you can play some games with pandas to fill in missing ages. This removes the need to pass in the original dataset or create a y_max lookup table; you can just return the data frame, and I don't think you need to call `convert_numpy_types` either.
```python
# Add `import re` to the imports at the top of the file
available_ages = list(data_rows["age"].drop_duplicates())
# All unique combinations of groups
fill_df = data_rows[["name", "evidence_type", "tissue", "y_axis_max"]].drop_duplicates().copy()
fill_df["age"] = [available_ages] * fill_df.shape[0]  # Make an "age" column where each entry is a list of all possible ages
# "explode" makes one row per age + group. Then merge back into the data to create new rows for missing ages
fill_df = fill_df.explode("age").merge(data_rows, how="outer", validate="one_to_one")
# Fill NA values for units. fillna can't produce an empty list, so missing "data" entries are replaced separately
# (isinstance check instead of `x is np.NaN`: np.NaN was removed in NumPy 2.0)
fill_df = fill_df.fillna({"units": ""})
fill_df["data"] = fill_df["data"].apply(lambda x: x if isinstance(x, list) else [])


# Sort by age.
# Adjust _extract_age_num to work with a single string from a pd.Series; no need to return a tuple
def _extract_age_num(entry: str) -> float:
    # Hypothetical parsing: use the leading number (e.g. "4 months" -> 4.0); adapt to the real age format.
    # Unparseable ages sort last instead of raising.
    match = re.match(r"\d+(?:\.\d+)?", str(entry))
    return float(match.group()) if match else float("inf")


fill_df["age_numeric"] = fill_df["age"].apply(_extract_age_num)
# Sort
fill_df = fill_df.sort_values(["age_numeric", "age", "evidence_type"]).drop(columns="age_numeric")
```
```python
import os

import pandas as pd
import pytest
```
I'll review the tests after you simplify. You might be able to remove some of the helper functions and delete/simplify those tests.
Co-authored-by: jaclynbeck-sage <[email protected]>
Problem
The immunohisto transform did not output the y-axis maximum values needed to visualize the data. Without these values, frontend applications could not scale figures correctly, which led to a poor user experience when viewing biomarker and pathology data.
Solution
This PR adds calculation and rounding of y-axis maximum values to the immunohisto transform:
Key Features Added:
- Y-axis maximum calculation across all ages for each `(name, evidence_type, tissue)` combination
- `round_y_axis_max()` function with rounding logic that always rounds up to "nice numbers"

Rounding Logic Details:
The rounding algorithm follows a specific pattern with key design principles:
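One common way to round a maximum up to a "nice" value is to scale it into the range [1, 10), snap it up to the next allowed step, and scale back. The sketch below assumes steps of 1, 2, 2.5, 5 and 10 times a power of ten; `nice_ceiling` and `NICE_STEPS` are hypothetical names, and the pattern used by `round_y_axis_max()` may differ.

```python
import math

# Assumed step set; round_y_axis_max() in the PR may use a different pattern
NICE_STEPS = (1, 2, 2.5, 5, 10)


def nice_ceiling(value: float, steps=NICE_STEPS) -> float:
    """Round a positive value UP to the smallest 'nice' number >= value."""
    if value <= 0:
        return 0.0
    # Power of ten that scales value into the range [1, 10)
    magnitude = 10.0 ** math.floor(math.log10(value))
    normalized = value / magnitude
    for step in steps:
        if normalized <= step:
            return step * magnitude
    # Guard against floating-point drift pushing normalized just past 10
    return 10 * magnitude


print([nice_ceiling(v) for v in (3, 7, 11, 230)])
```

Rounding up, never down, guarantees the axis limit is at least the data maximum, so no bar is clipped. Values that sit exactly on a decimal boundary can land one step high under binary floating point; a production version might round `normalized` to a few decimals before comparing.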
Tests
Added comprehensive test coverage including:
- `round_y_axis_max()` function with 15+ test cases
- `y_axis_max` field in all test output files

Files Modified:
- `src/agoradatatools/etl/transform/immunohisto_transform.py` - Core implementation
- `tests/transform/test_immunohisto_transform.py` - Comprehensive test suite