a smaller test dataset #33

Our current test dataset comprises all of chr1 in two different samples: the Jurkat sample and the MOLT4 cell line. It takes about an hour to run the entire pipeline with this dataset.

Ideally, we would have a dataset that runs in under 10 minutes or so. It could then be incorporated into a GitHub CI pipeline that runs automatically on each major and minor version increment, so that we know when a change we've made to the code leads to a change in the results.
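One way to detect "a change in the results" is to compare the files produced by a fresh run against a stored copy of the expected outputs. The following is only a minimal sketch of that idea, not what the pipeline or Snakemake actually does: the function names are hypothetical, and it assumes the outputs are deterministic at the byte level.

```python
import hashlib
from pathlib import Path


def checksum_outputs(directory):
    """Map each file's path (relative to `directory`) to its SHA-256 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_outputs(expected_dir, actual_dir):
    """Return the sorted relative paths that differ or exist on only one side."""
    expected = checksum_outputs(expected_dir)
    actual = checksum_outputs(actual_dir)
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

A test could then assert that `changed_outputs(expected_dir, actual_dir)` returns an empty list. Byte-level comparison may flag files that embed timestamps or command lines in their headers, so some outputs might need a looser comparison.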

- [x] find SNVs and indels supported by all callers
- [x] choose just one or two peaks that overlap those variants from each of the two samples
- [x] subset the example dataset to reads that only overlap those peaks
- [x] also try to subset the reference genome that is packaged with the example data, since the reference genome currently appears to be the largest file
- [x] rerun the pipeline with the smaller dataset and tweak the dataset as necessary to make it run quickly
- [ ] use `snakemake --generate-unit-tests` to create a bunch of tests that can be executed using `pytest`
    - I'm running into issues with this. It doesn't work for outputs marked as `pipe` and there are some problems with other directories (see snakemake/snakemake#1104)
    - [ ] fix issues and ensure test coverage is appropriate
    - [ ] remove any unnecessary tests to ensure the test directory is small and can be properly included in version history (_edit_: this won't be possible after all, because the test directory has to include the outputs of each rule, ugh)
- [ ] (optionally) create a GitHub Action like [this one](https://github.com/snakemake/snakemake-github-action) to execute `pytest` upon each major or minor version increment and confirm the tests pass successfully
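The subsetting steps in the checklist above (pick a few peaks that overlap the shared variants, then keep only the reads that overlap those peaks) can be sketched as plain interval logic. This is an illustration only: the `Interval` type, the function names, and all coordinates are made up, and the issue does not say which tools were actually used. In practice this would more likely be done with something like bedtools or samtools.

```python
from collections import namedtuple

# Hypothetical record type; coordinates are 0-based and half-open.
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def peaks_with_variants(peaks, variants, limit=2):
    """Return up to `limit` peaks that overlap at least one variant."""
    selected = [p for p in peaks if any(overlaps(p, v) for v in variants)]
    return selected[:limit]


def reads_in_peaks(reads, peaks):
    """Keep only the reads that overlap one of the chosen peaks."""
    return [r for r in reads if any(overlaps(r, p) for p in peaks)]


# Toy data with invented coordinates, for illustration only.
peaks = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr1", 900, 1000),
]
variants = [Interval("chr1", 150, 151), Interval("chr1", 950, 952)]
reads = [
    Interval("chr1", 90, 140),
    Interval("chr1", 300, 350),
    Interval("chr1", 990, 1040),
]

selected = peaks_with_variants(peaks, variants)
kept = reads_in_peaks(reads, selected)
```

With this toy input, `selected` holds the first and third peaks and `kept` holds the first and third reads. The quadratic scan is fine for one or two peaks but would not scale to a full chromosome.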
