Skip to content

Conversation

@mohamed-laarej
Copy link
Collaborator

@mohamed-laarej mohamed-laarej commented Mar 27, 2025

Fixes #738

Problem

The color parameter in plot_haplotype_network function is defined in plotly_params.py as Annotated[Optional[Union[str, Mapping]], "Name of variable to use to color the markers."], but the function was incorrectly using it as a direct column identifier on the haplotypes dataframe without validation.

Solution

This PR adds proper validation and handling for all possible color parameter formats:

  1. When color is a string:

    • Checks if it directly matches a column name in the dataframe
    • If not, checks if "cohorts_" + string matches a column name
    • Raises a helpful error message if neither exists
  2. When color is a mapping (dictionary) :

    • Properly processes the mapping to create appropriate color assignments
  3. When color is None:

    • Skips color processing (already handled in original code)

The implementation maintains backward compatibility with existing code while adding proper validation and error handling.

Changes Made

  • Added validation to check if the specified color column exists in the dataframe
  • Added support for "cohorts_" prefix when the direct column doesn't exist
  • Added support for mapping/dictionary input
  • Added descriptive error messages to help users understand the issue when an invalid column is specified
  • Maintained backward compatibility with existing code
  • Added tests with assertions for validation.
  • Updated notebook with examples demonstrating all color options (string, cohorts_, mapping, None).

Testing

Tested the fix by verifying that the function correctly identifies available columns and handles the color parameter appropriately.

@jonbrenas
Copy link
Collaborator

Thank you @mohamed-laarej . This doesn't actually solve the problem, however. The issue is slightly more complex than it appears. First, the color can be (if you look at its type definition) either a string or a mapping. If it is a string, it can be either a column name that exists or the postfix of a column name starting with cohorts_ that exists. It also needs to be able to accommodate the mapping option which is going to be a dictionary of label: query. This is to make sure that all our functions treat shared parameters in a homogeneous fashion.

@leehart leehart requested a review from jonbrenas March 28, 2025 12:01
@mohamed-laarej
Copy link
Collaborator Author

Thank you for your feedback, @jonbrenas. I understand now that the color parameter is more complex than I initially thought.

You're right that the parameter can be either a string or a mapping, and when it's a string, it can be either a direct column name or a postfix of a column name starting with "cohorts_". I'll update my PR to handle all these cases properly while maintaining consistency with how other functions treat similar parameters.

I'll submit an updated implementation soon that addresses these requirements.

@jonbrenas
Copy link
Collaborator

This looks great @mohamed-laarej. Have you run some tests to check that it works? Could you modify the notebook in notebook/plot_haplotype_networks.ipynb to give an example and update the tests in tests/integration/test_ag3.py and tests/integration/test_af1.py?

@mohamed-laarej
Copy link
Collaborator Author

Thanks for the feedback, @jonbrenas! Yes, I’ve run a local test script to verify that the new code handles all the cases for the color parameter (direct column name, cohorts_ prefix, mapping, None, and invalid input) before committing the changes. I’m currently working on updating the notebook in notebook/plot_haplotype_networks.ipynb to include examples and modifying the tests in tests/integration/test_ag3.py and tests/integration/test_af1.py as requested. These updates should be completed soon, and I’ll push them to the PR once they’re ready. Let me know if there’s anything specific you’d like me to focus on in the meantime!

…apping case); update tests with assertions; update notebook with examples
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@mohamed-laarej
Copy link
Collaborator Author

@jonbrenas Hi there! I wanted to follow up on this PR where I've implemented the requested changes to the tests and notebook examples. I'm looking forward to any additional feedback you might have when you get a chance. Thank you for your time and guidance so far!

@jonbrenas
Copy link
Collaborator

Thanks @mohamed-laarej. I'll have to create a new PR shadowing this one for the CI to work, I am pretty sure. Your code looks good but I am not sure why you needed to use a new definition of Ag3/Af1 instead of setup_ag3/setup_af1.

@mohamed-laarej
Copy link
Collaborator Author

Thank you for the feedback!
While developing locally, I created the tests in isolation and used separate definitions for Ag3 and Af1 to simplify testing during the early stages. When integrating the tests into the main codebase, I included those definitions. This was an oversight on my part during integration. I’ll update the code to use the appropriate setup functions to ensure consistency.
Thanks again for pointing that out!

@leehart leehart merged commit 683aa1b into malariagen:master Aug 7, 2025
17 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect parameter color in plot_haplotype_network

3 participants