[Feature #289] Integrate PostgreSQL-based marker gene functionality #296
Merged
Add `MarkerGeneDao` to fetch marker gene profiles and counts from PostgreSQL when the `specific` flag in preferences is true. Updated `BaselineExperimentProfilesService` to conditionally use the new DAO for marker genes or the existing Solr-based implementations. Included unit tests for `MarkerGeneDao` and updated service tests for comprehensive coverage.
Introduced new database fixture and cleanup scripts for the `gxa_marker_gene` table to support integration testing. Updated relevant test classes to populate and clean this table during test setup and teardown phases. These changes enable testing scenarios that involve marker gene data.
Update `ExpressionUnit` enums to include a `getDatabaseValue()` method for consistent database representation. Adjust related service and DAO logic to use this method, ensuring accurate parameter handling. Enhance test coverage to validate new functionality.
Extracted reusable SQL fragments and modularized query-building logic for better readability and maintainability. Added helper methods to streamline SQL execution, parameter building, and results processing. Introduced stricter null constraints for method parameters to improve type safety.
…tion. Detailed Javadoc comments were added to DAO and service classes to explain their responsibilities, methods, and parameters. This improves code readability and helps developers understand the purpose and usage of each component more effectively.
Streamlined the test setup by introducing helper methods for initialization. Renamed mock variables for clarity and updated them to reflect the switch from MarkerGeneDao to PostgresDao usage. Improved readability and maintained consistency across test cases.
Replaced all instances of 'TPM' with 'tpms' in the GXA marker gene fixture to ensure consistency with expected data formats. This change aligns the test data with the application's case sensitivity requirements.
Added logic to match assay names with assay groups using column headers, along with supporting utility methods for safer JSON handling and improved logging for unmatched cases.
…selineExperimentProfilesServiceTest for the updated code. This update introduces `mockColumnHeaders` to ensure proper handling of assay group metadata in test cases.
… correct value from DB
Replaced hardcoded strings ("TPM", "FPKM") with `ExpressionUnit.Absolute.Rna.getDatabaseValue()` in test cases. This ensures consistency with database values and reduces potential errors from string mismatches.
Bump the version from 37.1.1 to 37.2.0 in the Gradle build script.
Added filtering by assay names and by optional gene IDs passed in as parameters. Introduced a constant for the maximum number of marker genes and adjusted the logic accordingly.
Refactored SQL queries and parameters to handle gene filtering by both gene IDs and gene names. Introduced additional checks for empty gene queries and adjusted query construction accordingly. Updated expression unit enums to include lowercase variants for consistent handling.
Consolidated logic for fetching marker gene profiles by replacing `fetchSpecificGeneProfiles` with `fetchMarkerGeneProfiles`. Introduced helper methods for cleaner query parameter handling and improved readability. Updated affected tests to align with the refactored method signatures.
Simplified test methods by removing redundant comments, consolidating common logic, and improving naming for better clarity. Introduced helper methods and reused constants to reduce duplication and ensure consistency across tests.
Added a default SemanticQuery for gene queries in `RnaSeqBaselineRequestPreferences` initialization. This ensures consistent handling of gene-related queries during baseline test setups.
Simplified logic in `BaselineExperimentProfilesService` by removing specific gene search handling.
Add a safeguard in MarkerGeneDao to set a minimum marker gene rank limit of 1 when the calculated value is less than 1.
Replaced HashMap with LinkedHashMap to preserve insertion order. Updated SQL query to include ordering by assay names and descending expression levels. Adjusted query parameters to reflect the new ordering logic.
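The point of the `LinkedHashMap` change is that its iteration order follows insertion order, so the order produced by the SQL `ORDER BY` survives the grouping step. A minimal sketch of that pattern, assuming a hypothetical `MarkerGeneRow` record and `groupByAssayGroup` helper (neither is taken from the actual `MarkerGeneDao`):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type for illustration; not the real DAO's result type.
record MarkerGeneRow(String assayGroupId, String geneId, double expressionLevel) {}

final class MarkerGeneGrouping {

    private MarkerGeneGrouping() {}

    /**
     * Groups rows by assay group ID. LinkedHashMap iterates keys in first-insertion
     * order, so the row order returned by the query is kept; a HashMap makes no
     * ordering guarantee.
     */
    static Map<String, List<MarkerGeneRow>> groupByAssayGroup(List<MarkerGeneRow> rows) {
        Map<String, List<MarkerGeneRow>> grouped = new LinkedHashMap<>();
        for (MarkerGeneRow row : rows) {
            grouped.computeIfAbsent(row.assayGroupId(), id -> new ArrayList<>()).add(row);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Rows in the order the query returned them.
        List<MarkerGeneRow> rows = List.of(
                new MarkerGeneRow("g3", "geneA", 120.0),
                new MarkerGeneRow("g3", "geneB", 80.0),
                new MarkerGeneRow("g1", "geneC", 95.0));
        System.out.println(groupByAssayGroup(rows).keySet()); // prints [g3, g1]
    }
}
```

With a `HashMap`, the key order in the same example would be unspecified, which could reorder heatmap columns between requests.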
Enhance SQL query in `MarkerGeneDao` to exclude entries with null `marker_gene_rank` for improved data accuracy.
Replaced assay names with assay group IDs for improved consistency and clarity in query logic and data mapping. Updated SQL query, helper methods, and associated logic to reflect this change. Removed unused code and redundant logging for better maintainability.
Eliminated the redundant "factorValue"/"name" parameter in fetchMarkerGeneProfiles method calls across the codebase. Updated corresponding test cases to align with the method signature changes for consistency and accuracy.
Updated SQL queries to utilize `StringSubstitutor` for dynamic parameter substitution, improving readability and maintainability. Replaced positional parameters with named placeholders and refactored query parameter handling to generate a map of values for substitution. Removed redundant parameters from `executeQuery`, since substitution simplifies query preparation.
Changed RNA expression unit enums from "fpkm" to "fpkms" and "tpm" to "tpms" to reflect pluralization. This ensures consistency and clarity in naming conventions.
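Several commits route expression units through `getDatabaseValue()` because the stored value (`tpms`, `fpkms`) differs from the enum constant name (`TPM`, `FPKM`). A minimal sketch of that pattern, assuming a hypothetical flat `RnaExpressionUnit` enum; the project's real `ExpressionUnit.Absolute.Rna` type may be shaped differently:

```java
// Hypothetical, simplified enum; the project's ExpressionUnit.Absolute.Rna type may differ.
enum RnaExpressionUnit {
    FPKM("fpkms"),
    TPM("tpms");

    // Value as stored in the database; deliberately not derived from name().
    private final String databaseValue;

    RnaExpressionUnit(String databaseValue) {
        this.databaseValue = databaseValue;
    }

    String getDatabaseValue() {
        return databaseValue;
    }

    public static void main(String[] args) {
        // The constant name and the stored value differ, which is why callers
        // should bind getDatabaseValue() rather than name() or a hardcoded "TPM".
        System.out.println(TPM.name() + " -> " + TPM.getDatabaseValue()); // prints TPM -> tpms
    }
}
```

Keeping the database spelling in one place means a rename such as `tpm` to `tpms` touches only the enum, not every query and test fixture.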
Replaced references to `assay names` with `assay_group_id` for consistency with the updated data structure. Adjusted related test logic and method calls accordingly to reflect this change.
Replaced PostgreSQL's `ARRAY_POSITION` function with Java-based sorting to improve flexibility and maintainability. Updated the SQL query to remove `ARRAY_POSITION` and added a utility method to sort results by assay order in Java. This ensures consistent ordering of marker genes based on assay group IDs.
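Moving the ordering out of SQL means the Java side has to reproduce what `ARRAY_POSITION` provided: the position of each row's assay group ID in the requested list. A sketch under assumptions: the `RankedMarkerGene` record and `sortByAssayOrder` helper are hypothetical, and rows within a group are ordered by ascending `marker_gene_rank`, as the task description asks:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result row; field names are assumptions, not the actual DAO types.
record RankedMarkerGene(String assayGroupId, String geneId, int markerGeneRank) {}

final class AssayOrderSorter {

    private AssayOrderSorter() {}

    /**
     * Sorts rows so assay groups appear in the same order as requestedAssayGroupIds,
     * then by ascending marker gene rank within each assay group.
     */
    static List<RankedMarkerGene> sortByAssayOrder(List<RankedMarkerGene> rows,
                                                   List<String> requestedAssayGroupIds) {
        // Precompute each ID's position once (first occurrence wins).
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < requestedAssayGroupIds.size(); i++) {
            position.putIfAbsent(requestedAssayGroupIds.get(i), i);
        }
        List<RankedMarkerGene> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator
                // IDs missing from the requested list (should not occur if the query
                // filters on them) are pushed to the end.
                .comparingInt((RankedMarkerGene r) ->
                        position.getOrDefault(r.assayGroupId(), Integer.MAX_VALUE))
                .thenComparingInt(RankedMarkerGene::markerGeneRank));
        return sorted;
    }
}
```

Precomputing the positions in a map keeps each comparison constant-time; calling `List.indexOf` inside the comparator would rescan the ID list on every comparison.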
Updated the SQL schema to include a new `assay_id` column in the marker gene table and adjusted the test data accordingly. Refactored test constants and methods in `BaselineExperimentProfilesServiceTest` to align with the updated schema.
sandsebi reviewed app/src/main/java/uk/ac/ebi/atlas/experimentpage/baseline/profiles/MarkerGeneDao.java on Sep 9, 2025 (two comment threads, both resolved).
sandsebi (Contributor) reviewed on Sep 9, 2025 and left a comment:
Added my comments.
Replaced JdbcTemplate with NamedParameterJdbcTemplate for better SQL parameter handling and readability. Updated related methods, queries, and tests to accommodate the change. This ensures improved maintainability and reduces the risk of SQL injection.
sandsebi (Contributor) approved these changes on Sep 11, 2025:
Looks good 👍
Create a new DAO to fetch gene expressions from the new gxa_marker_gene table in our PostgreSQL database.
The specific genes in each assay group are clustered. The new table will contain the priority of each gene in the marker_gene_rank column. Within each assay group, the data should be ordered by that column.
The assay group IDs (the assay field in the DB table, also referred to as the samples of the given experiment) are calculated beforehand and passed into this service, where we use them as a query filter. They represent the column names in the heatmap table.
We should also select the top X genes for every assay group, where X is 50 divided by the number of assay groups, rounded down to an integer.
For example:
- 18 assay groups: floor(50 / 18) = 2, so we show 36 rows (2 * 18)
- 9 assay groups: floor(50 / 9) = 5, so we show 45 rows (5 * 9)
Note: later on, this value could come from a slider in the UI.
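The row-count rule above is integer division with a lower bound of 1 (the safeguard from the commit list for experiments with more than 50 assay groups). A sketch, assuming hypothetical `MarkerGeneCandidate`/`MarkerGeneSelection` types, a `MAX_MARKER_GENES` constant of 50, and ranks that start at 1 within each assay group, so that "top X" becomes `marker_gene_rank <= X`; it also drops rows with a null rank, as the final query does:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical row type; markerGeneRank is nullable because unranked rows may exist.
record MarkerGeneCandidate(String assayGroupId, String geneId, Integer markerGeneRank) {}

final class MarkerGeneSelection {

    // Target number of heatmap rows from the task description; the real constant name is unknown.
    static final int MAX_MARKER_GENES = 50;

    private MarkerGeneSelection() {}

    /** floor(50 / assayGroupCount), clamped to at least 1. */
    static int rankLimit(int assayGroupCount) {
        if (assayGroupCount <= 0) {
            throw new IllegalArgumentException("assayGroupCount must be positive");
        }
        // Java int division truncates toward zero, i.e. rounds down for positive operands.
        return Math.max(1, MAX_MARKER_GENES / assayGroupCount);
    }

    /** Keeps ranked rows of the requested assay groups whose rank is within the limit. */
    static List<MarkerGeneCandidate> select(Collection<MarkerGeneCandidate> rows,
                                            List<String> assayGroupIds) {
        Set<String> requested = new HashSet<>(assayGroupIds);
        int limit = rankLimit(requested.size());
        List<MarkerGeneCandidate> selected = new ArrayList<>();
        for (MarkerGeneCandidate row : rows) {
            boolean ranked = row.markerGeneRank() != null; // like "marker_gene_rank IS NOT NULL"
            if (ranked && requested.contains(row.assayGroupId()) && row.markerGeneRank() <= limit) {
                selected.add(row);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(rankLimit(18)); // prints 2
        System.out.println(rankLimit(9));  // prints 5
        System.out.println(rankLimit(60)); // prints 1
    }
}
```

In the DAO this filtering presumably runs in SQL; the Java version only makes the arithmetic and the null handling explicit.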
The JSON output must be in exactly the same format as the previous one, to avoid further changes on the frontend side.
Changes:
- Added `MarkerGeneDao` to fetch marker gene profiles and counts from PostgreSQL when looking for Most Specific genes.
- Updated `BaselineExperimentProfilesService` to conditionally use the new DAO for marker genes or the existing Solr-based implementations.
- Included unit tests for `MarkerGeneDao` and updated service tests for comprehensive coverage.
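The conditional routing in the service amounts to choosing a data source per request. A minimal sketch with a hypothetical `GeneProfileSource` interface and `ProfileSourceSelector` class; profiles are simplified to `List<String>`, and the real `BaselineExperimentProfilesService` signatures are not reproduced here:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical abstraction over the two backends; not the project's real API.
@FunctionalInterface
interface GeneProfileSource {
    List<String> fetchProfiles(List<String> assayGroupIds);
}

final class ProfileSourceSelector {

    private final GeneProfileSource markerGeneSource; // e.g. a PostgreSQL-backed DAO
    private final GeneProfileSource solrSource;       // e.g. the existing Solr-backed lookup

    ProfileSourceSelector(GeneProfileSource markerGeneSource, GeneProfileSource solrSource) {
        this.markerGeneSource = Objects.requireNonNull(markerGeneSource);
        this.solrSource = Objects.requireNonNull(solrSource);
    }

    /** Uses the marker gene table when the "specific" preference is set, Solr otherwise. */
    List<String> fetch(boolean specific, List<String> assayGroupIds) {
        GeneProfileSource source = specific ? markerGeneSource : solrSource;
        return source.fetchProfiles(assayGroupIds);
    }
}
```

Injecting both sources keeps the branch trivial to unit-test with mocks or lambdas, which matches how the service tests switch between the two paths.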