Add unrestricted_use_only and surveillance_use_only constructor params #724
base: master
…_releases property.
Opening this for a WIP review, to potentially avoid going too far down the wrong path. @alimanfoo @cclarkson @ahernank @jonbrenas I'm trying to identify other functions that we need to filter results for, according to:
Those seem like the main ones, which some other functions use, but I want to make sure we plug all potential leaks.
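As a rough illustration of the behaviour being aimed for, here is a minimal sketch of two constructor flags filtering sample metadata at a single choke point, so that anything built on top of it inherits the filtering. The class name `FilteredSamples` and the record fields `unrestricted_use` and `is_surveillance` are assumptions made for this example only; they are not taken from the actual implementation.

```python
class FilteredSamples:
    """Hypothetical stand-in for a data-access class with two filter flags."""

    def __init__(self, records, unrestricted_use_only=False, surveillance_use_only=False):
        self._records = list(records)
        self._unrestricted_use_only = unrestricted_use_only
        self._surveillance_use_only = surveillance_use_only

    def _is_allowed(self, record):
        # Each flag, when set, excludes records that lack the matching property.
        if self._unrestricted_use_only and not record["unrestricted_use"]:
            return False
        if self._surveillance_use_only and not record["is_surveillance"]:
            return False
        return True

    def sample_metadata(self):
        # Single choke point: downstream accessors should call this
        # rather than reading self._records directly.
        return [r for r in self._records if self._is_allowed(r)]

    def sample_ids(self):
        return [r["sample_id"] for r in self.sample_metadata()]


# Made-up records, for illustration only.
records = [
    {"sample_id": "S1", "unrestricted_use": True, "is_surveillance": True},
    {"sample_id": "S2", "unrestricted_use": True, "is_surveillance": False},
    {"sample_id": "S3", "unrestricted_use": False, "is_surveillance": True},
]

print(FilteredSamples(records, surveillance_use_only=True).sample_ids())
# ['S1', 'S3']
```

Any function that reads the underlying records without going through the filtered accessor is a potential leak, which is the kind of path this review is trying to find.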
It looks like Ag3 currently has about 137 public methods... 🤔
It also looks like about 119 of Ag3's public methods cannot be called without specifying params (they cannot rely on defaults), which makes testing these constructor params somewhat difficult to automate.
This also smells a lot like the "god object" anti-pattern: https://en.wikipedia.org/wiki/God_object
We should probably consider re-organising all of those methods, despite the inconvenience, but that will probably have to wait. In the meantime, I hope to figure out which functions are vulnerable to leaking unfiltered data, in relation to either the surveillance-only or the unrestricted-use-only flag.
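For reference, counts like these can be approximated with the standard `inspect` module. The sketch below is not the script used to get the figures above; `Demo` is a made-up class standing in for `Ag3`, and the helper names are hypothetical. It assumes ordinary instance methods, so properties are not counted and static methods would need separate handling.

```python
import inspect


def public_methods(cls):
    """Return {name: function} for public methods defined on or inherited by cls."""
    return {
        name: member
        for name, member in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    }


def requires_params(func):
    """True if func has a parameter, other than self, that has no default."""
    params = list(inspect.signature(func).parameters.values())[1:]  # skip self
    return any(
        p.default is inspect.Parameter.empty
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        for p in params
    )


class Demo:
    """Made-up class used only to exercise the helpers."""

    def no_args(self): ...

    def with_default(self, region="3L"): ...

    def needs_region(self, region): ...

    def _private(self, x): ...


methods = public_methods(Demo)
needing = sorted(name for name, func in methods.items() if requires_params(func))
print(len(methods), needing)
# 3 ['needs_region']
```

Methods flagged by `requires_params` are the ones that cannot be exercised generically in a test loop without supplying per-function arguments.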
@ahernank @jonbrenas A checkmark indicates that the function gets its data from an upstream public function, a file or a param, or otherwise looks covered. Unchecked functions are in some doubt and need further investigation, discussion or coding.
Now investigating unexpected test failures after merging with master branch. Local pytest:
CI pytest:
Thanks @leehart. For IGV to work, it needed to have access to a 'public' version of the reference genomes, e.g., at
Following discussion, the plan is now to reduce the scope of honouring these constructor params to only the currently "documented" public functions (as per
List of "documented" public functions/properties to check:
Note: I have crossed off functions/properties that should be OK once their sub-functions are OK, to help us focus on the root issues. This doesn't imply that they are themselves "safe", only that any potential issues appear to be confined to their sub-functions.
Basic data access
Reference genome data access
Sample metadata access
SNP data access
Haplotype data access
AIM data access
CNV data access
Similar to the issue with
Again, as with
Again, this might be resolved by adding filtering for
Since the function takes a
This function has the same kind of issues as
Integrative genomics viewer (IGV)
The end user (or other code) could try to pass a sample id to this function that was incompatible with the
In this case, the incompatible sample id would be provided to
SNP and CNV frequency analysis
It looks like this function should be resolved when
Principal components analysis (PCA)
This function should be partly resolved when
Since the
Genetic distance and neighbour-joining trees (NJT)
This function should be partly resolved when
This function should be partly resolved when
I propose that we look at changing the
By the way, I suspect that we will probably need a smarter mechanism for
Heterozygosity analysis
It looks like this function should be resolved when
This function can take
It looks like this function should be resolved when
It looks like this function should be resolved when
Diversity analysis
It looks like this function should be resolved when
This function should be partly resolved when
Genome-wide selection scans
It looks like this function should be resolved when
This function should be partly resolved when
Haplotype clustering and network analysis
Diplotype clustering
It looks like this function should be resolved when
It looks like this function should be resolved when
Fst analysis
Inversion karyotypes
In summary, the following publicly documented functions (about 20, out of about 100) need further investigation, discussion, decision or resolution with regard to compliance with the two new constructor params:
When
Since the
Many of the remaining functions (about 80) depend on the functions above, so they will only behave according to plan once those are updated.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #724      +/-   ##
==========================================
- Coverage   96.13%   96.06%   -0.08%
==========================================
  Files          47       47
  Lines        4683     4749      +66
==========================================
+ Hits         4502     4562      +60
- Misses        181      187       +6
Currently trying to resolve a recursion error relating to the
Basically, for example,
@ahernank @cclarkson I suspect there might be something wrong with the surveillance data, or something I don't understand:
Shouldn't there be a one-to-one correspondence between these two files?
I suspect the surveillance flags data includes all samples, instead of just the samples we released after QC.
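A quick way to test for a one-to-one correspondence is to compare the sample ids in the two files as sets and also look for duplicates. The sketch below assumes both files have already been reduced to plain lists of sample ids; the function name `compare_sample_ids` and the example ids are hypothetical.

```python
from collections import Counter


def compare_sample_ids(metadata_ids, flag_ids):
    """Report how two lists of sample ids differ.

    A one-to-one correspondence means all three result lists are empty
    and metadata_ids itself contains no duplicates.
    """
    metadata_set, flag_set = set(metadata_ids), set(flag_ids)
    flag_counts = Counter(flag_ids)
    return {
        "only_in_metadata": sorted(metadata_set - flag_set),
        "only_in_flags": sorted(flag_set - metadata_set),
        "duplicated_in_flags": sorted(s for s, n in flag_counts.items() if n > 1),
    }


# Made-up ids: the flags list contains one sample absent from the metadata,
# which is the pattern expected if the flags were staged without QC filtering.
released = ["A", "B"]
flags = ["A", "B", "C"]
print(compare_sample_ids(released, flags))
# {'only_in_metadata': [], 'only_in_flags': ['C'], 'duplicated_in_flags': []}
```

A non-empty `only_in_flags` list alongside an empty `only_in_metadata` list would be consistent with the flags data covering all samples rather than only the released ones.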
Thanks @leehart. Yup, absolutely, these were staged directly without the QC filtering. This ties in with other bits that it would be good to address to move forward, so I've opened https://github.com/malariagen/vector-ops/issues/2485. Should we tackle this over there?
Re: issue #716