Specifying output formats on stdout #2676

@jstorrs

Description

I have a bunch of shell scripts that operate on tsv files using cut, paste, comm, awk, some custom python filters, etc. I'm looking to add a new feature today. I first came across tsv-tools, and an issue there mentioned qsv. qsv will solve what I'm trying to do pretty easily, so I'm going to try adding it to my tool belt.

But I ran into some challenges when trying to use it in pipes: qsv seems to always output csv, and it doesn't seem to have a uniform format specification across the commands. My data has commas in fields but never tabs, so using tab separators has worked well for my bash-scriptery.

The qsv fmt command provides -t to set the output format, but -t doesn't seem to be available in other commands, for example qsv select. qsv select can switch output formats based on the file extension when using -o file.ext, but that doesn't help in a pipeline (you can't add extensions to fds etc.) or when you want to override the default behavior. Frankly, I expected input/output format selection to be universal across the commands, and it's a little surprising that it is not.

Another surprise was that qsv doesn't just apply the input separator as the output separator by default, i.e. setting -d (or detecting it from the extension of the input filename) should also set the output format. This is how the -d parameter in cut and similar tools behaves.

Or loading a tsv file should result in tsv output. If there are multiple input files in different formats, I'm not sure what the best behavior is, but I would expect to have to specify -t in that case.

Describe the solutions you'd like

  1. qsv -d [sep] should also set the output separator unless the output format has been specified separately (i.e. by -o filename.ext detection or explicitly by -t [sep])
  2. qsv -d [sep] -t [sep] should be generally available in all sub commands with identical semantics (new to qsv so unsure whether this makes sense for all of the commands)
  3. qsv -d [sep] should accept the known file extensions as a shorthand for readability, i.e. something like
    ... | qsv -d csv -t tsv select Column1,Column7,Column3 | ...
    might be more legible than
    ... | qsv -d , -t $'\t' select Column1,Column7,Column3 | ...
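
To make item 3 concrete, here is a minimal Python sketch of how a separator argument could accept either a literal character or a format name. Both `parse_separator` and the `NAMED_SEPARATORS` table are hypothetical names made up for illustration; nothing here is taken from qsv's source, and the set of names is an assumption.

```python
# Hypothetical table of format names; an assumption, not qsv's actual list.
NAMED_SEPARATORS = {"csv": ",", "tsv": "\t", "tab": "\t", "ssv": ";"}


def parse_separator(arg: str) -> str:
    """Return a one-character separator for a literal or a known format name."""
    # A single character is always taken literally, so "," and "\t" keep working.
    if len(arg) == 1:
        return arg
    try:
        return NAMED_SEPARATORS[arg.lower()]
    except KeyError:
        raise ValueError(f"unknown separator or format name: {arg!r}") from None
```

Because every format name is longer than one character, the names can never collide with a literal separator, so existing invocations would keep their meaning.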

Proposed priority for selecting output format (first wins):

  1. Explicitly set -t
  2. Successful -o extension detection
  3. Explicitly set -d
  4. Successful detection of first input file format
  5. csv default
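
As a rough sketch of that priority order (not how qsv is implemented), the resolution could look like the Python below. The function names, the parameter names, and the extension table are all hypothetical, and "detection" is simplified here to a lookup on the file extension.

```python
import os
from typing import Optional, Sequence

# Hypothetical extension table; simplified stand-in for real format detection.
EXTENSION_SEPARATORS = {".csv": ",", ".tsv": "\t", ".tab": "\t", ".ssv": ";"}


def separator_from_path(path: Optional[str]) -> Optional[str]:
    """Return the separator implied by a file extension, or None if unknown."""
    if not path:
        return None
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_SEPARATORS.get(ext)


def resolve_output_separator(
    t_flag: Optional[str] = None,
    output_path: Optional[str] = None,
    d_flag: Optional[str] = None,
    input_paths: Sequence[str] = (),
) -> str:
    """Pick the output separator; the first rule that yields a value wins."""
    candidates = (
        t_flag,                                  # 1. explicit -t
        separator_from_path(output_path),        # 2. -o extension detection
        d_flag,                                  # 3. explicit -d
        separator_from_path(input_paths[0])      # 4. first input file
        if input_paths else None,
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return ","                                   # 5. csv default
```

For example, `resolve_output_separator(d_flag="\t")` would yield a tab, while adding `output_path="out.csv"` to the same call would yield a comma, because a successful -o detection outranks -d.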

Describe alternatives you've considered

I can do these sorts of things everywhere:

qsv select -d .... | qsv fmt -d $'\t' -t $'\t' | ...

Personal opinion: having to do that feels disappointing and unsatisfying.
