I have a bunch of shell scripts that operate on TSV files using cut, paste, comm, awk, some custom Python filters, etc. While looking to add a new feature today, I first came across tsv-tools, and an issue there mentioned qsv. qsv will solve what I'm trying to do pretty easily, so I'm going to try adding it to my tool belt.
However, I ran into some trouble using it in pipes: qsv seems to always output CSV, and it doesn't seem to have a uniform way to specify formats across its commands. My data has commas in fields but never tabs, so using tab separators has worked well for my bash-scriptery.
The qsv fmt command provides -t to set the output delimiter, but -t doesn't seem to be available in other commands, for example qsv select. qsv select can switch output formats based on the extension when using -o file.ext, but that doesn't help in a pipeline (you can't add extensions to file descriptors, etc.) or when you want to override the default behavior. Frankly, I expected input/output format selection to be universal across the commands, and it's a little surprising that it is not.
Another surprise was that qsv doesn't use the input separator as the output separator by default. That is, setting -d (or having it detected from the input filename's extension) should also set the output format. This is how the -d parameter of cut, etc. behaves.
Likewise, loading a TSV file should result in TSV output. If there are multiple input files in different formats, I'm not sure what the best behavior is, but I would expect to have to specify -t in that case.
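For comparison, a minimal illustration of the cut behavior mentioned above: cut joins the selected fields with the same delimiter it was given via -d, so TSV in gives TSV out, and commas inside fields pass through untouched. The sample row is made up for illustration.

```bash
# A made-up TSV row whose middle field contains a comma.
row=$'alpha\tbeta,gamma\tdelta'

# cut splits on the -d delimiter and re-joins the selected fields with that
# same delimiter, so the output stays tab-separated.
printf '%s\n' "$row" | cut -d $'\t' -f 1,3
printf '%s\n' "$row" | cut -d $'\t' -f 2,3
```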
Describe the solutions you'd like
qsv -d [sep] should also set the output separator unless the output format has been specified separately (i.e. by -o filename.ext detection or explicitly by -t [sep]).
qsv -d [sep] -t [sep] should be generally available in all subcommands with identical semantics (I'm new to qsv, so I'm unsure whether this makes sense for all of the commands).
qsv -d [sep] should accept the known file extensions as shorthand for readability, i.e. something like
... | qsv -d csv -t tsv select Column1,Column7,Column3 | ...
might be more legible than
... | qsv -d , -t $'\t' select Column1,Column7,Column3 | ...
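Until something like this exists in qsv itself, the name-to-separator translation can be done on the shell side. The sketch below uses a hypothetical helper, sep_from_name (not part of qsv); the table of names (csv, tsv, tab, ssv) is an assumption based on common extensions, and any other value is passed through unchanged so literal separators keep working.

```bash
# Hypothetical helper (not part of qsv): map a format name to its separator.
# The name table is an assumption; unknown values pass through unchanged.
sep_from_name() {
  case "$1" in
    csv)     printf ',' ;;
    tsv|tab) printf '\t' ;;
    ssv)     printf ';' ;;
    *)       printf '%s' "$1" ;;
  esac
}
```

Command substitution only strips trailing newlines, so "$(sep_from_name tsv)" yields a literal tab that can be handed to any -d or -t option.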
Proposed priority for selecting the output format (first match wins):
- Explicitly set -t
- Successful -o extension detection
- Explicitly set -d
- Successful detection of the first input file's format
- CSV default
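The precedence above can be sketched as a small bash function. Everything here is hypothetical: resolve_out_sep and ext_sep are illustrative names, not qsv code, the extension table is an assumption, and "detection" of the first input file is reduced to looking at its extension (qsv may sniff content instead).

```bash
# Hypothetical: map a filename's extension to a separator; fail if unknown.
ext_sep() {
  case "${1##*.}" in
    csv)     printf ',' ;;
    tsv|tab) printf '\t' ;;
    ssv)     printf ';' ;;
    *)       return 1 ;;   # unknown or missing extension: detection fails
  esac
}

# Hypothetical sketch of the proposed output-separator precedence.
# Args: explicit -t value, -o filename, explicit -d value, first input file.
# An empty argument means "not given".
resolve_out_sep() {
  local t="$1" out="$2" d="$3" first_input="$4"
  if [ -n "$t" ]; then printf '%s' "$t"; return; fi                   # 1. explicit -t
  if [ -n "$out" ] && ext_sep "$out"; then return; fi                  # 2. -o extension
  if [ -n "$d" ]; then printf '%s' "$d"; return; fi                    # 3. explicit -d
  if [ -n "$first_input" ] && ext_sep "$first_input"; then return; fi  # 4. first input file
  printf ','                                                           # 5. CSV default
}
```

For example, resolve_out_sep '' out.txt ';' in.csv yields ";" because the .txt extension is not recognized, so the explicit -d wins.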
Describe alternatives you've considered
I can do these sorts of things everywhere:
qsv select -d .... | qsv fmt -t $'\t' | ...
Personal opinion: having to do that feels disappointing and unsatisfying.