Using TrajGWAS for large-scale datasets: how to improve performance? #48

I would like to run TrajGWAS on some large-scale longitudinal phenotypes. Specifically, I have ``100,000`` observations, ``48`` covariates (plus an intercept), and ``100`` phenotypes. I would also like effect size estimates, so I am running the Wald test.

As an example, I ran ``TrajGWAS`` for one phenotype. I start ``Julia`` with ``julia --threads 64`` and then run:

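```
using TrajGWAS

# The middle covariate terms are elided ("...") here, as in my actual call.
# Note: the exported function is lowercase `trajgwas`; Julia is case-sensitive.
trajgwas(@formula(y ~ 1 + X1 + X2 + ... + X48),  # mean model
         @formula(y ~ 1),                         # random effects
         @formula(y ~ 1 + X1 + X2 + ... + X48),  # within-subject variance model
         :id,                                     # subject ID column
         path_to_csv_file,
         path_to_plink_file,
         pvalfile = p_output_name,
         nullfile = null_output_name,
         covrowinds = covrowmask,
         geneticrowinds = geneticrowmask,         # keyword has no space
         parallel = true,
         test = :wald)
```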

I am running this as a Slurm job with ``--cpus-per-task=64`` and ``--mem-per-cpu=7G``. Julia version: ``1.10.0``.

However, after about 22 hours, only about 700 SNPs have been written to the output file. This is quite slow, and I wonder whether there are any suggestions for making it more efficient. Perhaps I am not specifying parallelisation correctly?
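
In case it helps with diagnosing the parallelisation, here is a minimal sketch of a check that the Julia session actually sees the threads requested from Slurm. ``thread_report`` is a hypothetical helper written for this post, not part of TrajGWAS; it relies only on base Julia and the ``LinearAlgebra`` standard library, and it assumes that Slurm sets the ``SLURM_CPUS_PER_TASK`` environment variable inside the job.

```julia
using LinearAlgebra  # provides BLAS.get_num_threads

# Hypothetical helper: collect the thread-related settings visible to this session.
function thread_report()
    return (
        julia_threads = Threads.nthreads(),       # set by `julia --threads N`
        blas_threads  = BLAS.get_num_threads(),   # threads used by the BLAS library
        cpu_threads   = Sys.CPU_THREADS,          # logical cores Julia detects on the node
        slurm_cpus    = get(ENV, "SLURM_CPUS_PER_TASK", "unset"),  # from --cpus-per-task
    )
end

report = thread_report()
for (name, value) in pairs(report)
    println(rpad(name, 14), " = ", value)
end
```

If ``julia_threads`` is ``1`` inside the job, the ``--threads 64`` flag is not reaching the Julia process. If ``blas_threads`` is also large, the Julia threads and BLAS threads may be competing for the same cores.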
