- Up to v1.0.3, `pairtools sort` allows the header line to list the column names `chr1` and `chr2` (as indicated in the official 4DN specs).
- Starting with v1.1.0, `pairtools sort` expects the header line indicating column names to list `chrom1` and `chrom2`, and breaks if the header line is `#columns: readID chr1 pos1 chr2 pos2 strand1 strand2`.
- It also seems to require `pair_type` to be present in the `#columns` header line, as well as in a column.

I understand that the `chr1`/`chr2` problem can be circumvented by specifying the `-c1` and `-c2` fields in the CLI, but now, if a `pair_type` column is not included, `pairtools sort` cannot work. Is this intended behavior? Sorry if I missed something or if this issue has already been raised.
Reproducible example
- Here is an unsorted pairs file I created by hand, with `chr1`/`chr2` in the header:
echo -e "## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp.pairs
This works:
pip install pairtools==1.0.3
pairtools sort tmp.pairs
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
NS500150:497:HWH2WBGXC:4:23605:21900:3336 NODE_1404 461 NODE_1404 246 --
NS500150:497:HWH2WBGXC:4:23606:10802:17906 NODE_1404 1441 NODE_1814 4433 --
NS500150:497:HWH2WBGXC:4:23603:4102:4882 NODE_522 6855 NODE_1404 1035 --
This fails:
pip install pairtools==1.1.1 ## pairtools 1.1.0 errors with `circular import`
pairtools sort tmp.pairs
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
Traceback (most recent call last):
File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
sys.exit(cli())
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
return func(*args, **kwargs)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
sort_py(
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'chrom1' is not in list
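The last frame of the traceback suggests the failure is a plain list lookup of each sort key in the parsed `#columns` names. Here is a minimal sketch of that lookup, outside of pairtools. The list of default sort keys is an assumption inferred from the two error messages in this issue, not taken from the pairtools source, and `resolve` is a hypothetical helper name:

```python
# Column names as they would be parsed from the "#columns:" line of tmp.pairs.
header_line = "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2"
column_names = header_line.split()[1:]

# ASSUMPTION: default sort keys, guessed from the ValueError messages in this issue.
assumed_defaults = ["chrom1", "chrom2", "pos1", "pos2", "pair_type"]


def resolve(col, column_names):
    """Same expression as the failing line in sort.py: returns a 1-based column index."""
    return int(col) if col.isnumeric() else column_names.index(col) + 1


# Collect every assumed default key that the header does not declare.
missing = []
for col in assumed_defaults:
    try:
        resolve(col, column_names)
    except ValueError:
        missing.append(col)

print(missing)
```

With this header, `chrom1`, `chrom2` and `pair_type` are all reported as missing, which would be consistent with the errors appearing one after the other as the header is fixed step by step. Note that a numeric key skips the name lookup entirely.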
- Now, changing `chr1`/`chr2` to `chrom1`/`chrom2` in the header:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp2.pairs
This works:
pip install pairtools==1.0.3
pairtools sort tmp2.pairs
This fails:
pip install pairtools==1.1.1
pairtools sort tmp2.pairs
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
Traceback (most recent call last):
File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
sys.exit(cli())
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
return func(*args, **kwargs)
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
sort_py(
File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'pair_type' is not in list
- Now, adding `pair_type` to the header:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp3.pairs
This works:
pip install pairtools==1.0.3
pairtools sort tmp3.pairs
This works:
pip install pairtools==1.1.1
pairtools sort tmp3.pairs
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
NS500150:497:HWH2WBGXC:4:23605:21900:3336 NODE_1404 461 NODE_1404 246 --
NS500150:497:HWH2WBGXC:4:23606:10802:17906 NODE_1404 1441 NODE_1814 4433 --
NS500150:497:HWH2WBGXC:4:23603:4102:4882 NODE_522 6855 NODE_1404 1035 --
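For reference, the two manual header edits made in this example (renaming `chr1`/`chr2` and declaring `pair_type`) can be scripted. This is only a sketch of what I did by hand: `patch_columns_line` is a hypothetical helper, it touches the `#columns:` line only, and, as in `tmp3.pairs`, it declares a `pair_type` column that the data rows do not actually contain. Whether that should be needed at all is the question of this issue.

```python
def patch_columns_line(line):
    """Rename chr1/chr2 and append pair_type on a '#columns:' line.

    Any other line is returned unchanged. Expects lines without a trailing newline.
    """
    if not line.startswith("#columns:"):
        return line
    rename = {"chr1": "chrom1", "chr2": "chrom2"}
    # Drop the "#columns:" token, then rename the two chromosome columns.
    names = [rename.get(name, name) for name in line.split()[1:]]
    # ASSUMPTION: declaring pair_type in the header alone is enough, as observed with tmp3.pairs.
    if "pair_type" not in names:
        names.append("pair_type")
    return "#columns: " + " ".join(names)


patched = patch_columns_line("#columns: readID chr1 pos1 chr2 pos2 strand1 strand2")
print(patched)
```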