Variant clickhouse tables #5260

Draft
hanars wants to merge 80 commits into search-test-setup from variant-clickhouse-tables

@hanars hanars commented Jan 30, 2026

No description provided.

    def _annotate_filtered_transcripts(self, results, consequence_field, transcript_filters, *args, require_mane_canonical=False, **kwargs):
        if require_mane_canonical:
            filtered_expr = ArrayFilter(consequence_field, conditions=[{'canonical': (0, '{field} > {value}')}])
            if 'isManeSelect' in self.sorted_transcript_consequence_fields:

This is the only new behavior added in this PR: it adds the require_mane_canonical flag, which uses the new isManeSelect field so that restricting to the primary consequence is done at query time instead of in post-processing.
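As a rough illustration of what filtering at query time means here, the following is a minimal pure-Python sketch of the filter semantics, not the ClickHouse implementation in this PR. The helper names `is_primary` and `filter_transcripts`, the `consequenceTerms` and `transcriptId` keys, and the rule for combining `isManeSelect` with `canonical` are all assumptions, since the diff excerpt is cut off before it shows how `isManeSelect` is applied.

```python
def is_primary(transcript):
    """Assumed rule: use isManeSelect when the field exists, else canonical > 0."""
    if 'isManeSelect' in transcript:
        return bool(transcript['isManeSelect'])
    return transcript.get('canonical', 0) > 0


def filter_transcripts(transcripts, allowed_consequences, require_mane_canonical=False):
    """Keep transcripts with an allowed consequence, optionally only primary ones."""
    return [
        t for t in transcripts
        if set(t['consequenceTerms']) & allowed_consequences
        and (not require_mane_canonical or is_primary(t))
    ]


# Hypothetical transcripts for one variant: only the non-primary one is stop_gained.
transcripts = [
    {'transcriptId': 'T1', 'canonical': 1, 'isManeSelect': True,
     'consequenceTerms': ['missense_variant']},
    {'transcriptId': 'T2', 'canonical': 0, 'isManeSelect': False,
     'consequenceTerms': ['stop_gained']},
]
allowed = {'stop_gained'}

without_flag = filter_transcripts(transcripts, allowed)
with_flag = filter_transcripts(transcripts, allowed, require_mane_canonical=True)
```

In this sketch the variant passes without the flag (via `T2`) but has no passing transcript with the flag set, so it would be excluded by the filter itself rather than dropped in a later post-processing step.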

return lookup


def get_variant_main_transcripts_by_key(genome_version, dataset_type, selected_transcripts_by_key, include_clinvar=False, additional_values=None):

Slightly different versions of this logic were implemented in 3 different places in the code; this moves the logic into a single shared helper.

('_overwrite_base_manager', django.db.models.manager.Manager()),
],
),
migrations.RunSQL(

This is a no-op at the db level, as foreign keys are not actually enforced in ClickHouse; it just tells Django that they have been updated.

@@ -5,12 +5,8 @@
from django.db.models import Q, F
from django.db.models.functions import JSONObject
@hanars hanars Feb 5, 2026

The logic for the command changed quite a bit. Previously we had to get the full results back from search and then post-process them to filter on whether they pass their transcript filters with a valid MANE transcript. Since MANE Select is now available on the main variants table, this filtering is done at search time and no post-processing is needed. Therefore, rather than getting back the formatted results from search, we get back just the minimal genotypes and keys. We then use the shared utility function for bulk saving variants to get and format the fields needed to actually save a variant model, which encapsulates that functionality in one place instead of having multiple different implementations. One restriction is that we need to call the bulk update method once per dataset type instead of once for everything, but that should not have a real performance impact.
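The per-dataset-type restriction can be pictured with a small sketch. This is only an illustration under assumptions: `group_by_dataset_type`, `save_all`, the `bulk_save` callable, and the shape of the result dicts are hypothetical stand-ins, not the shared utility from this PR, and the dataset type strings are example values.

```python
from collections import defaultdict


def group_by_dataset_type(results):
    """Group minimal search results (key plus genotypes) by dataset type."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result['dataset_type']].append(result)
    return dict(grouped)


def save_all(results, bulk_save):
    """Call the (hypothetical) bulk-save helper once per dataset type."""
    saved = {}
    for dataset_type, group in group_by_dataset_type(results).items():
        saved[dataset_type] = bulk_save(dataset_type, group)
    return saved


# Hypothetical minimal results as they might come back from search.
results = [
    {'key': 1, 'dataset_type': 'SNV_INDEL', 'genotypes': {}},
    {'key': 2, 'dataset_type': 'SV_WGS', 'genotypes': {}},
    {'key': 3, 'dataset_type': 'SNV_INDEL', 'genotypes': {}},
]

# Stand-in for the real helper: just report which keys each call received.
calls = save_all(results, lambda dataset_type, group: [r['key'] for r in group])
```

The number of calls grows with the number of dataset types rather than the number of variants, which is consistent with the expectation that there is no real performance impact.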

        super().setUpClass()

    @classmethod
    def _clean_up_clickhouse_db(cls):
@hanars hanars Feb 5, 2026

To be clear, this exact TRUNCATE SQL is run as part of every test class teardown; it just does not always propagate, and this was the only way I could reliably get tests to pass.
