-
Notifications
You must be signed in to change notification settings - Fork 8
Description
Hi, I'm trying to run a toy example with GEPSi, but am running into an error at the first step, to convert my genotypes to h5 format the way GEPSi expects. Here is my toy vcf generated from an msprime simulation:
##fileformat=VCFv4.2
##source=tskit 0.4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tsk_0indv tsk_1indv tsk_2indv tsk_3indv tsk_4indv tsk_5indv tsk_6indv tsk_7indv tsk_8indv tsk_9indv
1 409 . T A . PASS . GT 0|1 0|0 0|1 0|0 0|0 0|0 0|0 0|0 1|0 0|0
1 752 . C T . PASS . GT 0|0 1|1 0|0 1|1 1|1 1|0 1|1 1|0 0|0 0|1
1 1043 . C A . PASS . GT 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
1 1207 . A C . PASS . GT 0|0 1|0 0|0 0|1 0|0 1|0 0|0 0|0 0|0 0|0
1 1952 . C A . PASS . GT 1|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0
1 2521 . T C . PASS . GT 0|0 0|0 0|0 0|1 0|0 1|0 0|0 0|0 0|0 0|0
1 2542 . C T . PASS . GT 1|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0
1 2707 . C T . PASS . GT 0|0 0|1 0|0 1|0 1|1 0|0 1|1 1|0 0|0 0|1
1 4352 . C T . PASS . GT 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|1 0|1 1|0
1 5416 . T A . PASS . GT 0|0 1|1 0|0 1|1 1|1 1|0 1|1 1|0 0|0 0|1
1 6656 . A G . PASS . GT 1|1 0|0 1|1 0|0 0|0 0|1 0|0 0|1 1|1 1|0
1 7850 . T A . PASS . GT 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|1 0|1 1|0
1 9263 . A T . PASS . GT 0|0 1|1 0|0 1|1 1|1 1|0 1|1 1|0 0|0 0|1
I converted this vcf to .raw and .bim files using both, plink1.9 and plink2 with these commands:
plink --vcf output.vcf --recode A --out output
plink --vcf output.vcf --make-just-bim --out output
I try to convert this to the h5/matrix_header format using the following command:
gepsi genotype --data_path ./ --matrix_name output.raw --snplist_name output.bim --ignore_gene_map --features "NONE" --annotation_name NONE
However, I'm getting errors at this step. They are different errors for plink1.9 and plink2.
The error while trying to convert raw/bim files generated from plink1.9 is
OverflowError: can't convert negative value to hsize_t
Exception ignored in: 'tables.utilsextension.malloc_dims'
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
self._v_objectid = self._create_carray(self._v_new_title)
OverflowError: can't convert negative value to hsize_t
OverflowError: can't convert negative value to hsize_t
Exception ignored in: 'tables.utilsextension.malloc_dims'
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
self._v_objectid = self._create_carray(self._v_new_title)
OverflowError: can't convert negative value to hsize_t
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/bin/gepsi", line 8, in <module>
sys.exit(main())
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/main.py", line 22, in main
data = genotype_sim.simulate_genotype()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 36, in simulate_genotype
return self.generate_genotype_file()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 87, in generate_genotype_file
sf, risk_alleles = self.script_result_clean()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 136, in script_result_clean
array_c = f.create_earray(f.root, 'data', tables.IntCol(), (0,num_feat), expectedrows=self.num_patients,filters=tables.Filters(complib='zlib', complevel=1))
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/file.py", line 1380, in create_earray
ptobj = EArray(parentnode, name,
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 158, in __init__
super(EArray, self).__init__(parentnode, name, atom, shape, title,
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 219, in __init__
super(Array, self).__init__(parentnode, name, new, filters,
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/leaf.py", line 286, in __init__
super(Leaf, self).__init__(parentnode, name, _log)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/node.py", line 264, in __init__
self._v_objectid = self._g_create()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 182, in _g_create
return self._g_create_common(self._v_expectedrows)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
self._v_objectid = self._create_carray(self._v_new_title)
File "tables/hdf5extension.pyx", line 1366, in tables.hdf5extension.Array._create_carray
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5S.c", line 1507, in H5Screate_simple
invalid dataspace information
End of HDF5 error back trace
Problems creating the EArray.
Closing remaining open files:./genotype_100k_NONE.h5...done
and the error while trying to convert raw/bim files generated from plink2 is:
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/bin/gepsi", line 8, in <module>
sys.exit(main())
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/main.py", line 22, in main
data = genotype_sim.simulate_genotype()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 36, in simulate_genotype
return self.generate_genotype_file()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 87, in generate_genotype_file
sf, risk_alleles = self.script_result_clean()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 139, in script_result_clean
self.h5_append(df,f)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 183, in h5_append
f.root.data.append(a)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 219, in append
self._check_shape_append(nparr)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 196, in _check_shape_append
raise ValueError(("the shapes of the appended object and the "
ValueError: the shapes of the appended object and the ``/data`` EArray differ in non-enlargeable dimension 1
Closing remaining open files:./genotype_100k_NONE.h5...done
Which version of plink should I use, and how should I fix the corresponding error to get the h5/matrix_header files?
Thank you!