@ke4 commented Aug 11, 2025

Create a new DAO to fetch gene expressions from the new `gxa_marker_gene` table in our PostgreSQL database.

The specific genes of each assay group are clustered together. The new table contains the priority of each gene in the `marker_gene_rank` column, and within each assay group the data should be ordered by that column.

The assay group IDs (the `assay` field in the DB table, also called the samples of the given experiment) are calculated beforehand and passed into this service, which uses them as a query filter. They represent the column names in the heatmap table.
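A minimal in-memory sketch of the filtering and ordering described above. It assumes that a lower `marker_gene_rank` means a higher priority and that the top-X cut is applied as `marker_gene_rank <= X`, which the later "minimum marker gene rank limit" commit hints at. `MarkerGeneRow` and `MarkerGeneSelection` are hypothetical names, and profiles are reduced to a few columns.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row type holding a few gxa_marker_gene columns (not the PR's actual class).
final class MarkerGeneRow {
    final String geneId;
    final String assayGroupId;
    final Integer markerGeneRank; // nullable, as in the table
    final double expressionLevel;

    MarkerGeneRow(String geneId, String assayGroupId, Integer markerGeneRank, double expressionLevel) {
        this.geneId = geneId;
        this.assayGroupId = assayGroupId;
        this.markerGeneRank = markerGeneRank;
        this.expressionLevel = expressionLevel;
    }
}

// Hypothetical helper showing the filter-then-order logic in memory.
final class MarkerGeneSelection {
    private MarkerGeneSelection() {}

    static List<MarkerGeneRow> select(List<MarkerGeneRow> rows, List<String> assayGroupIds, int rankLimit) {
        // Index of each requested assay group = its heatmap column position.
        Map<String, Integer> columnIndex = new HashMap<>();
        for (int i = 0; i < assayGroupIds.size(); i++) {
            columnIndex.putIfAbsent(assayGroupIds.get(i), i);
        }
        return rows.stream()
                // Query filter: only the pre-computed assay groups.
                .filter(r -> columnIndex.containsKey(r.assayGroupId))
                // Drop unranked rows and keep the top rankLimit genes per group.
                // Assumption: rank 1 is the highest-priority marker gene.
                .filter(r -> r.markerGeneRank != null && r.markerGeneRank <= rankLimit)
                // Order by heatmap column first, then by rank within each assay group.
                .sorted(Comparator.<MarkerGeneRow>comparingInt(r -> columnIndex.get(r.assayGroupId))
                        .thenComparingInt(r -> r.markerGeneRank))
                .collect(Collectors.toList());
    }
}
```

The commit messages indicate that the PR keeps the rank and null filters in SQL and does only the assay-group ordering in Java. One commit also orders by descending expression level, and that could be added here as a further `thenComparing` step.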

We should also select the top X genes for every assay group, where X is 50 divided by the number of assay groups, rounded down to an integer.

For example:

  • number of assay groups = 18 -> 50 / 18 = 2 (rounded down)
    In this case we show 36 rows (2 * 18)
  • number of assay groups = 9 -> 50 / 9 = 5 (rounded down)
    In this case we show 45 rows (5 * 9)
Note: later on, this value may come from a slider in the UI.
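The per-group limit is a floor division of 50 by the number of assay groups. This minimal sketch assumes a constant of 50 and the lower bound of 1 that a later commit adds. `MarkerGeneLimit`, `MAX_MARKER_GENES` and `genesPerAssayGroup` are illustrative names and may not match those in `MarkerGeneDao`.

```java
// Hypothetical helper; names are illustrative, not the PR's code.
final class MarkerGeneLimit {
    // Total number of heatmap rows the description aims for.
    static final int MAX_MARKER_GENES = 50;

    private MarkerGeneLimit() {}

    static int genesPerAssayGroup(int assayGroupCount) {
        if (assayGroupCount <= 0) {
            throw new IllegalArgumentException("assayGroupCount must be positive");
        }
        // Integer division already rounds down for positive operands.
        int perGroup = MAX_MARKER_GENES / assayGroupCount;
        // Show at least one gene per group when there are more than 50 groups.
        return Math.max(1, perGroup);
    }
}
```

For 18 assay groups this gives 2 genes per group (36 rows), and for 9 groups it gives 5 per group (45 rows). Keeping the calculation in one method would make it straightforward to replace with a UI-supplied value later.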
The JSON output should have exactly the same format as the previous one, to avoid further changes on the frontend.

Changes:

  • Add MarkerGeneDao to fetch marker gene profiles and counts from PostgreSQL when looking for Most Specific genes.
  • Updated BaselineExperimentProfilesService to conditionally use the new DAO for marker genes or the existing Solr-based implementations.
  • Included unit tests for MarkerGeneDao and updated service tests for comprehensive coverage.
  • Modified some related services / tests

Add `MarkerGeneDao` to fetch marker gene profiles and counts from PostgreSQL when the `specific` flag in preferences is true. Updated `BaselineExperimentProfilesService` to conditionally use the new DAO for marker genes or the existing Solr-based implementations. Included unit tests for `MarkerGeneDao` and updated service tests for comprehensive coverage.
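The conditional routing in `BaselineExperimentProfilesService` can be pictured as a simple strategy choice on the `specific` flag. This is a hypothetical sketch: `ProfileSource` and `ProfileSourceSelector` are invented names, and profiles are reduced to strings. The real service works with Atlas profile types and request preferences.

```java
import java.util.List;

// Hypothetical abstraction over the two back ends; real Atlas types differ.
interface ProfileSource {
    List<String> fetchProfiles(List<String> assayGroupIds);
}

// Hypothetical illustration of the routing added to BaselineExperimentProfilesService.
final class ProfileSourceSelector {
    private final ProfileSource markerGeneDao; // PostgreSQL, gxa_marker_gene table
    private final ProfileSource solrSource;    // existing Solr-based implementation

    ProfileSourceSelector(ProfileSource markerGeneDao, ProfileSource solrSource) {
        this.markerGeneDao = markerGeneDao;
        this.solrSource = solrSource;
    }

    // `specific` mirrors the preferences flag: most specific genes come from PostgreSQL.
    List<String> fetch(boolean specific, List<String> assayGroupIds) {
        ProfileSource source = specific ? markerGeneDao : solrSource;
        return source.fetchProfiles(assayGroupIds);
    }
}
```

Because both back ends sit behind one interface, the JSON produced downstream does not need to know which source was used. That matches the requirement to keep the output format unchanged.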
@ke4 self-assigned this Aug 11, 2025
@ke4 added the enhancement and high priority labels Aug 11, 2025
ke4 added 26 commits August 12, 2025 10:13
Introduced new database fixture and cleanup scripts for the `gxa_marker_gene` table to support integration testing. Updated relevant test classes to populate and clean this table during test setup and teardown phases. These changes enable testing scenarios that involve marker gene data.
Update `ExpressionUnit` enums to include a `getDatabaseValue()` method for consistent database representation. Adjust related service and DAO logic to use this method, ensuring accurate parameter handling. Enhance test coverage to validate new functionality.
Extracted reusable SQL fragments and modularized query-building logic for better readability and maintainability. Added helper methods to streamline SQL execution, parameter building, and results processing. Introduced stricter null constraints for method parameters to improve type safety.
…tion

Detailed javadoc comments were added to DAO and service classes to explain their responsibilities, methods, and parameters. This improves code readability and helps developers understand the purpose and usage of each component more effectively.
Streamlined the test setup by introducing helper methods for initialization. Renamed mock variables for clarity and updated them to reflect the switch from MarkerGeneDao to PostgresDao usage. Improved readability and maintained consistency across test cases.
Replaced all instances of 'TPM' with 'tpms' in the GXA marker gene fixture to ensure consistency with expected data formats. This change aligns the test data with the application's case sensitivity requirements.
Added logic to match assay names with assay groups using column headers, along with supporting utility methods for safer JSON handling and improved logging for unmatched cases.
…selineExperimentProfilesServiceTest for the updated code

This update introduces mockColumnHeaders to ensure proper handling of assay group metadata in test cases.
… correct value from DB

Replaced hardcoded strings ("TPM", "FPKM") with `ExpressionUnit.Absolute.Rna.getDatabaseValue()` in test cases. This ensures consistency with database values and reduces potential errors from string mismatches.
Bump the version from 37.1.1 to 37.2.0 in the Gradle build script. This update likely includes changes or improvements aligning with the new version.
Added filtering by assay names and by optional gene IDs passed in as parameters.
Introduced a constant for maximum marker genes and adjusted logic accordingly.
Refactored SQL queries and parameters to handle gene filtering by both gene IDs and gene names. Introduced additional checks for empty gene queries and adjusted query construction accordingly. Updated expression unit enums to include lowercase variants for consistent handling.
Consolidated logic for fetching marker gene profiles by replacing `fetchSpecificGeneProfiles` with `fetchMarkerGeneProfiles`. Introduced helper methods for cleaner query parameter handling and improved readability. Updated affected tests to align with the refactored method signatures.
Simplified test methods by removing redundant comments, consolidating common logic, and improving naming for better clarity. Introduced helper methods and reused constants to reduce duplication and ensure consistency across tests.
Added a default SemanticQuery for gene queries in `RnaSeqBaselineRequestPreferences` initialization. This ensures consistent handling of gene-related queries during baseline test setups.
Simplified logic in `BaselineExperimentProfilesService` by removing specific gene search handling.
Add a safeguard in MarkerGeneDao to set a minimum marker gene rank limit of 1 when the calculated value is less than 1.
Replaced HashMap with LinkedHashMap to preserve insertion order. Updated SQL query to include ordering by assay names and descending expression levels. Adjusted query parameters to reflect the new ordering logic.
Enhance SQL query in `MarkerGeneDao` to exclude entries with null `marker_gene_rank` for improved data accuracy.
Replaced assay names with assay group IDs for improved consistency and clarity in query logic and data mapping. Updated SQL query, helper methods, and associated logic to reflect this change. Removed unused code and redundant logging for better maintainability.
Eliminated the redundant "factorValue"/"name" parameter in fetchMarkerGeneProfiles method calls across the codebase. Updated corresponding test cases to align with the method signature changes for consistency and accuracy.
Updated SQL queries to use `StringSubstitutor` for dynamic parameter substitution, improving readability and maintainability. Replaced positional parameters with named placeholders and refactored query parameter handling to generate a map of values for substitution. Removed redundant parameters in `executeQuery`, because substitution simplifies query preparation.
Changed RNA expression unit enums from "fpkm" to "fpkms" and "tpm" to "tpms" to reflect pluralization. This ensures consistency and clarity in naming conventions.
Replaced references to `assay names` with `assay_group_id` for consistency with the updated data structure. Adjusted related test logic and method calls accordingly to reflect this change.
Replaced PostgreSQL's `ARRAY_POSITION` function with Java-based sorting to improve flexibility and maintainability. Updated the SQL query to remove `ARRAY_POSITION` and added a utility method to sort results by assay order in Java. This ensures consistent ordering of marker genes based on assay group IDs.
Updated the SQL schema to include a new `assay_id` column in the marker gene table and adjusted the test data accordingly. Refactored test constants and methods in `BaselineExperimentProfilesServiceTest` to align with the updated schema.
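The `getDatabaseValue()` idea from the `ExpressionUnit` commits above can be sketched as an enum that carries its stored string. This is a simplified, hypothetical stand-in for `ExpressionUnit.Absolute.Rna`. The wrapper class, the constant names and the `fromDatabaseValue` lookup are assumptions. Only the `fpkms` / `tpms` strings come from the commit messages.

```java
import java.util.Locale;

final class ExpressionUnitSketch {
    private ExpressionUnitSketch() {}

    // Hypothetical stand-in for ExpressionUnit.Absolute.Rna.
    enum Rna {
        FPKM("fpkms"),
        TPM("tpms");

        private final String databaseValue;

        Rna(String databaseValue) {
            this.databaseValue = databaseValue;
        }

        // String used for this unit in gxa_marker_gene queries and fixtures.
        String getDatabaseValue() {
            return databaseValue;
        }

        // Assumed reverse lookup; matching ignores case so "TPMS" and "tpms" both resolve.
        static Rna fromDatabaseValue(String value) {
            String normalised = value.toLowerCase(Locale.ROOT);
            for (Rna unit : values()) {
                if (unit.databaseValue.equals(normalised)) {
                    return unit;
                }
            }
            throw new IllegalArgumentException("Unknown expression unit: " + value);
        }
    }
}
```

Tests and queries that call `getDatabaseValue()` instead of hard-coding literals such as `"TPM"` stay consistent when the stored spelling changes, as it did from `tpm` to `tpms`.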
@ke4 marked this pull request as ready for review September 8, 2025 16:01
@ke4 requested review from amnonkhen and sandsebi September 8, 2025 16:01

@sandsebi left a comment


Added my comments.

Replaced JdbcTemplate with NamedParameterJdbcTemplate for better SQL parameter handling and readability. Updated related methods, queries, and tests to accommodate the change. This ensures improved maintainability and reduces the risk of SQL injection.

@sandsebi left a comment


Looks good 👍

@ke4 merged commit bae6fef into develop Sep 11, 2025
2 checks passed
@ke4 deleted the feature/integrate_marker_gene_table branch September 11, 2025 13:01

Development

Successfully merging this pull request may close these issues.

Load most-specific gene expressions for bulk baseline experiments from new Postgres table

3 participants