Clickhouse only db #11704
Open
alisman wants to merge 25 commits into master from demo-clickhouse-only-db
…1673)

Replace foreach loops generating multiple prepared statement parameters with a single array parameter using ArrayTypeHandler. This significantly improves performance with ClickHouse JDBC connections by reducing parameter overhead.

- Use CONCAT(study_id, ':', patient_id) with ArrayTypeHandler
- Add SqlUtils.combineStudyAndPatientIds() utility method
- Apply optimization to both patient and sample lookup queries
- Maintain security through proper parameter binding

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <[email protected]>
Co-authored-by: Bryan Lai <[email protected]>
…C performance (#11703)

Apply ArrayTypeHandler optimization strategy to whereSample include, following the same approach used in PatientMapper (commit 2e2ec22). This significantly improves performance with ClickHouse JDBC connections by reducing parameter overhead.

Changes:
- Replace foreach loops in whereSample with ArrayTypeHandler for both single-study and multi-study queries
- Use SqlUtils.combineStudyAndPatientIds() for multi-study scenarios with CONCAT-based unique key matching
- Optimize getClinicalAttributeCountsBySampleIds query performance through updated whereSample include

This reduces prepared statement parameters from potentially thousands to single array parameters, maintaining security through proper parameter binding.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <[email protected]>
…rmance

Apply ArrayTypeHandler optimization strategy across all high-priority MyBatis mappers to dramatically improve ClickHouse JDBC performance by reducing prepared statement parameter overhead.

SqlUtils Enhancements:
- Add listToArray() utility method to convert List<String> to String[] for ArrayTypeHandler
- Extend combineStudyAndPatientIds() usage for multi-study query optimization

Optimized MyBatis Mappers (11 files):
- ClinicalAttributeMapper.xml - clinical attribute count queries with sample IDs
- ClinicalDataMapper.xml - sample and patient clinical data queries
- ClinicalEventMapper.xml - clinical events by sample and patient IDs
- CopyNumberSegmentMapper.xml - copy number segment queries
- DiscreteCopyNumberMapper.xml - discrete copy number queries
- MutationMapper.xml - mutation queries with sample/profile pairs
- NamespaceMapper.xml - sample ID namespace queries
- SampleMapper.xml - sample queries with study/sample and study/patient pairs
- StructuralVariantMapper.xml - structural variant queries
- TreatmentMapper.xml - treatment sample ID queries

Optimization Strategy Applied:
- Single-study queries: Use <bind> + ArrayTypeHandler for direct List→Array conversion
- Multi-study queries: Use SqlUtils.combineStudyAndPatientIds() with CONCAT matching
- Replace foreach loops generating multiple prepared statement parameters
- Use proper <bind> elements to avoid MyBatis parameter binding errors

Performance Impact:
- Reduces prepared statement parameters from potentially thousands to single arrays
- Follows same proven optimization pattern from PatientMapper (commit 2e2ec22)
- Maintains security through proper parameter binding
- Significant improvement for ClickHouse JDBC connections

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
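To illustrate the two helpers named in these commit messages, here is a minimal sketch of what `listToArray()` and `combineStudyAndPatientIds()` might look like. The class name `SqlUtilsSketch`, the parameter types, and the null handling are assumptions for illustration; the actual signatures in the PR are not shown here and may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SqlUtils helpers named in the commit messages;
// the real signatures in the PR may differ.
final class SqlUtilsSketch {
    private SqlUtilsSketch() {}

    // Convert a List<String> to a String[] so the whole list can be bound
    // as one array parameter instead of one parameter per element.
    static String[] listToArray(List<String> values) {
        return values == null ? new String[0] : values.toArray(new String[0]);
    }

    // Pair studyIds[i] with patientIds[i] and join them as "study:patient",
    // mirroring CONCAT(study_id, ':', patient_id) on the SQL side.
    static String[] combineStudyAndPatientIds(List<String> studyIds, List<String> patientIds) {
        if (studyIds.size() != patientIds.size()) {
            throw new IllegalArgumentException("studyIds and patientIds must have the same length");
        }
        List<String> keys = new ArrayList<>(studyIds.size());
        for (int i = 0; i < studyIds.size(); i++) {
            keys.add(studyIds.get(i) + ":" + patientIds.get(i));
        }
        return keys.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] keys = combineStudyAndPatientIds(
            Arrays.asList("study_a", "study_b"),
            Arrays.asList("P-1", "P-2"));
        System.out.println(Arrays.toString(keys)); // prints [study_a:P-1, study_b:P-2]
    }
}
```

One caveat with a delimiter-joined key: if a study or patient ID could itself contain `:`, two different pairs could produce the same key, so this approach presumably relies on IDs never containing the delimiter.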
2e9c7ae to 688a552
Replace individual fetchGenePanelDataByMolecularProfileId calls with a single fetchGenePanelDataByMolecularProfileIds batch call to reduce database round trips and improve performance.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
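The change described above can be sketched as follows. Only the two method names come from the commit message; the interface `GenePanelRepositorySketch`, the `String` return elements, and the counting in-memory repository are hypothetical, added so the difference in round trips is visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical repository; only the two method names come from the commit message.
interface GenePanelRepositorySketch {
    List<String> fetchGenePanelDataByMolecularProfileId(String molecularProfileId);
    List<String> fetchGenePanelDataByMolecularProfileIds(List<String> molecularProfileIds);
}

// In-memory stand-in that counts calls; each call represents one database round trip.
final class CountingRepoSketch implements GenePanelRepositorySketch {
    int calls = 0;

    @Override
    public List<String> fetchGenePanelDataByMolecularProfileId(String molecularProfileId) {
        calls++;
        return Collections.singletonList(molecularProfileId + ":panel");
    }

    @Override
    public List<String> fetchGenePanelDataByMolecularProfileIds(List<String> molecularProfileIds) {
        calls++;
        List<String> out = new ArrayList<>();
        for (String id : molecularProfileIds) {
            out.add(id + ":panel");
        }
        return out;
    }
}

final class GenePanelBatchingSketch {
    private GenePanelBatchingSketch() {}

    // Before: one repository call per molecular profile ID.
    static List<String> fetchOneByOne(GenePanelRepositorySketch repo, List<String> profileIds) {
        List<String> result = new ArrayList<>();
        for (String id : profileIds) {
            result.addAll(repo.fetchGenePanelDataByMolecularProfileId(id));
        }
        return result;
    }

    // After: a single call carrying all molecular profile IDs.
    static List<String> fetchBatched(GenePanelRepositorySketch repo, List<String> profileIds) {
        return repo.fetchGenePanelDataByMolecularProfileIds(profileIds);
    }
}
```

With three profile IDs, `fetchOneByOne` makes three calls on the counting repository while `fetchBatched` makes one, and both return the same rows.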
Made modifications to the use cases and added methods to the repository
Working on the infrastructure layer
Moved the mutation infrastructure to the right folder
Finished up the mapper class for the infrastructure
Added REST endpoint and a record class for all use cases
Refactored the controller layer adhering to clean architecture principles
Started creating the SQL for the ClickHouse mutation repository
Added test for ferchAllMutationsProfileUseCase logic
Finished testing the utility class and use case logic
Cleaned up some classes
Wrote the SQL for the getMetaMutation use case. Starting with the getMutation method
Changed the naming for the mutation controller
Fixed some parameters for the data mapper to control information from the database
Refactored the mapper class method to handle each projection with its separate method
Rough SQL for SUMMARY projection; note: need to cross-check
Refactored repository layer to use molecularProfileCaseIdentifierUtil for grouping profiles and removing duplicates
Updates
Fixed up summary query
Almost done with queries; need to confirm some results
Finished up summary and detailed projection; works now
Created DTOs and mappers for data received from the clinical data mapper
Corrected field variantAllele
Refactored code to make use of projectionType
Trying to adjust the queries to make use of ClickHouse strengths
Put in more comments for description
Trying to optimize the query for ClickHouse by batch sending the whole information to ClickHouse for faster processing and reducing the risk of doing redundant joins
Fixed the query to work when just molecularProfileId is provided
Simplified the method to call mutationMapper.getMutationsInMultipleMolecularProfiles directly instead of iterating through grouped cases and making multiple calls. This reduces database round trips and improves performance.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Simplified the method to call structuralVariantMapper.fetchStructuralVariants directly with all molecular profile IDs instead of iterating through grouped cases and making multiple calls. This reduces database round trips and improves performance.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
This commit optimizes the discrete copy number queries in DiscreteCopyNumberMapper by reordering the FROM clause to start with sample_cna_event instead of cna_event, and using subqueries to filter by genetic_profile_id instead of joining the full genetic_profile table.

Changes:
- Reordered FROM clause to start with sample_cna_event table
- Replaced genetic_profile.stable_id joins with subquery lookups
- Filters on genetic_profile_id directly in sample_cna_event table
- Improves query performance by reducing join complexity

This optimization should improve query performance for discrete copy number alteration lookups, especially for large datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
This commit introduces a more efficient studyExists() method for validating study existence without fetching full study objects, and fixes a ClickHouse SQL compatibility issue in the StudyMapper.

Changes:

1. Added StudyService.studyExists() method
   - Fetches only study IDs instead of full objects
   - Throws StudyNotFoundException if study doesn't exist
   - More efficient than getStudy() when only validation is needed

2. Replaced getStudy() with studyExists() across codebase
   - Updated 29 call sites where return value wasn't used
   - Affected services: Clinical, Sample, Patient, MolecularProfile, etc.

3. Fixed ClickHouse SQL error in StudyMapper.xml
   - Added cancer_study_identifier to GROUP BY clause
   - Resolves: "Column is not under aggregate function and not in GROUP BY keys"
   - Required for ORDER BY cancer_study_identifier to work with ClickHouse

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
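The existence check described in item 1 might look roughly like the sketch below. Everything except the method name `studyExists()` is an assumption: the class names carry a `Sketch` suffix to mark them as hypothetical, the `void` return type is a guess, the exception is made unchecked here for brevity, and the in-memory `Set` stands in for whatever ID-only query the real mapper runs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for StudyNotFoundException (unchecked here for brevity).
class StudyNotFoundExceptionSketch extends RuntimeException {
    StudyNotFoundExceptionSketch(String studyId) {
        super("Study not found: " + studyId);
    }
}

// Hypothetical service; the Set stands in for a query that returns only study IDs.
final class StudyServiceSketch {
    private final Set<String> knownStudyIds;

    StudyServiceSketch(Set<String> knownStudyIds) {
        this.knownStudyIds = knownStudyIds;
    }

    // Validates that the study exists without loading a full study object.
    // Returns normally when found, throws when the ID is unknown.
    void studyExists(String studyId) {
        if (!knownStudyIds.contains(studyId)) {
            throw new StudyNotFoundExceptionSketch(studyId);
        }
    }

    public static void main(String[] args) {
        StudyServiceSketch service =
            new StudyServiceSketch(new HashSet<>(Arrays.asList("study_a")));
        service.studyExists("study_a"); // returns normally
        try {
            service.studyExists("missing");
        } catch (StudyNotFoundExceptionSketch e) {
            System.out.println(e.getMessage()); // prints Study not found: missing
        }
    }
}
```

A call site that previously invoked `getStudy()` only for its not-found side effect would call `studyExists()` instead and ignore the absence of a return value.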
* Fix zero value handling in molecular data truncation

  Add special case handling for zero values in the molecular data value truncation logic to prevent them from being converted incorrectly.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Update src/main/resources/org/cbioportal/legacy/persistence/mybatis/MolecularDataMapper.xml

  Co-authored-by: Onur Sumer <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Onur Sumer <[email protected]>
* ♻️ Refactor Cancer Study PermissionEvaluator
* 🔒 Add Support for CancerStudyMetadata obj
* Update field typeOfCancer to cancerType
* Fix sample list query for clickhouse
* Change cancer study details query in sample list query
* Clean up SampleListMapper.xml by removing unused prefix properties
No description provided.