Specifying output formats on stdout #2682

jstorrs · 2025-04-10T18:13:35Z

jstorrs
Apr 10, 2025

I have a bunch of shell scripts that operate on TSV files using cut, paste, comm, awk, some custom Python filters, etc. I'm looking to add a new feature today. I first looked at tsv-utils, and an issue there mentioned qsv. qsv will solve what I'm trying to do pretty easily, so I'm going to try adding it to my tool belt.

But I ran into trouble using it in pipes: qsv seems to always output CSV, and format specification doesn't seem to be uniform across the commands. My data has commas in fields but never tabs, so using tab separators has worked well for my bash-scriptery.

The qsv fmt command provides -t to set the output format, but -t doesn't seem to be available in other commands, for example qsv select. qsv select can pick the output format from the extension when using -o file.ext, but that doesn't help in a pipeline (you can't add extensions to file descriptors) or if you want to override the default behavior. Frankly, I expected input/output format selection to be uniform across the commands, and it's a little surprising that it is not.

Another surprise was that qsv didn't just apply the input separator as the output separator by default, i.e. setting -d (or detecting it from the extension of the input filename) should also set the output format. This is the way the -d parameter in cut etc. behaves.

Or loading a TSV file should result in TSV output. If there are multiple input files in different formats, I'm not sure what the best behavior is, but I would expect to have to specify -t in that case.

Describe the solutions you'd like

qsv -d [sep] should also set the output separator unless the output format has been specified separately (i.e. by -o filename.ext detection or explicitly by -t [sep]).
qsv -d [sep] and -t [sep] should be generally available in all subcommands with identical semantics (I'm new to qsv, so I'm unsure whether this makes sense for all of the commands).
qsv -d [sep] should accept the known file extensions as shorthand for readability, i.e. something like
... | qsv -d csv -t tsv select Column1,Column7,Column3 | ...
might be more legible than
... | qsv -d , -t $'\t' select Column1,Column7,Column3 | ...

Proposed priority for selecting output format (first wins):

1. Explicitly set -t
2. Successful -o extension detection
3. Explicitly set -d
4. Successful detection of first input file format
5. csv default
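
To make the proposed order concrete, here is a minimal sketch in POSIX shell. It is not how qsv works today: `delim_from_ext` and `resolve_out_delim` are hypothetical helper names, and the extension table is an assumption made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the delimiter implied by a file extension.
# The extension table is an assumption, not qsv's actual list.
# Prints nothing and returns 1 when the extension is not recognised.
delim_from_ext() {
    case "$1" in
        *.csv) printf ',' ;;
        *.tsv|*.tab) printf '\t' ;;
        *) return 1 ;;
    esac
}

# Hypothetical resolver implementing the proposed priority (first wins).
# Usage: resolve_out_delim T_FLAG O_FILE D_FLAG FIRST_INPUT
# Pass an empty string for anything that was not given.
resolve_out_delim() {
    # 1. explicitly set -t
    if [ -n "$1" ]; then printf '%s' "$1"; return; fi
    # 2. successful -o extension detection
    if [ -n "$2" ] && delim_from_ext "$2"; then return; fi
    # 3. explicitly set -d
    if [ -n "$3" ]; then printf '%s' "$3"; return; fi
    # 4. successful detection of the first input file's format
    if [ -n "$4" ] && delim_from_ext "$4"; then return; fi
    # 5. csv default
    printf ','
}

# -o extension beats -d: prints a tab
resolve_out_delim "" out.tsv "," in.csv
# nothing specified at all: prints a comma
resolve_out_delim "" "" "" ""
```

An unrecognised -o extension simply falls through to the next rule, which is what "successful" detection implies in the list above.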

Describe alternatives you've considered

I can do these sorts of things everywhere:

qsv select -d .... | qsv fmt -d $'\t'  -t $'\t' | ...

Personal opinion: having to do that feels disappointing and unsatisfying.

Answered by jqnatividad

jqnatividad · 2025-04-10T19:14:58Z

jqnatividad
Apr 10, 2025
Maintainer

@jstorrs, have you tried setting QSV_DEFAULT_DELIMITER?

https://github.com/dathere/qsv/blob/master/docs/ENVIRONMENT_VARIABLES.md

If that works, you can make it the default for your qsv installation by setting it in the dotenv file.

https://github.com/dathere/qsv/blob/master/docs/ENVIRONMENT_VARIABLES.md#env-file-support

As with most things open-source, qsv's defaults follow our own requirements and the itches we're scratching. We get data in all shapes, sizes, and quality, and qsv's output defaults to the CSV dialect we use in our pipelines.

I'm actually weighing changing the default delimiter to \t (tab) for our pipelines, so do let me know if QSV_DEFAULT_DELIMITER does the trick for you.

0 replies

jstorrs · 2025-04-11T20:55:21Z

jstorrs
Apr 11, 2025
Author

Thank you! Setting that environment variable seems to work for me and it's easy enough to add it to a script.

I don't really like the global envrc idea because it makes the command's behavior depend on system configuration, but that's admittedly nothing more than a preference. It seems likely to lead to more complexity when trying to debug problems. For me it's easy enough to rely on export QSV_DEFAULT_DELIMITER='\t' in scripts that use qsv.
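
For anyone who wants the same effect without exporting the variable into the whole script, a small wrapper is one option. This is only a sketch: `qsv_tsv` is a made-up name, and it relies on nothing beyond the QSV_DEFAULT_DELIMITER variable already mentioned in this thread.

```sh
#!/bin/sh
# qsv_tsv is a hypothetical wrapper name, not part of qsv.
# It runs a single command with QSV_DEFAULT_DELIMITER set to '\t'
# (the same value as in the export line above) and leaves the
# calling shell's environment untouched.
qsv_tsv() {
    env QSV_DEFAULT_DELIMITER='\t' "$@"
}

# Usage inside a pipeline (commented out so the sketch runs without qsv):
# ... | qsv_tsv qsv select Column1,Column7,Column3 | ...
```

Because the variable is passed through `env`, it only exists for that one process, so other tools in the same pipeline never see it.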

0 replies

ondohotola · 2025-04-11T21:00:05Z

ondohotola
Apr 11, 2025

But you can easily write the .env file in the working directory from your script at runtime and remove it afterwards.

0 replies

jstorrs · 2025-04-11T21:15:53Z

jstorrs
Apr 11, 2025
Author

I know you can do that. To me, .env files are for storing paths, parameters, or secrets, and that's probably fine for a script being built ad hoc for a single project. But once you have to generalize that script for use in multiple projects, having to set up .env files and audit them is just going to be a headache. The suggestions I made in the initial comment are just to document things that I find strange about the CLI. I'm perfectly fine throwing environment variables into scripts. The whole .env file thing is just one way to manage environment variables.

The reason I provided suggestions in the original comment is that the goals in the README include

qsv is designed to be composable, with a focus on interoperability with other common CLI tools like 'awk', 'xargs', 'ripgrep', 'sed', etc.

and I find it to be lacking in that regard. But that's just my opinion and I'm just providing feedback.

0 replies

ondohotola · 2025-04-11T21:28:40Z

ondohotola
Apr 11, 2025

I like having the .env file written by my scripts (and not removed) so I can manually test steps in the script without having to remember to set environment variables. But then there are different ways of achieving the same goal.

0 replies

jstorrs · 2025-04-11T21:41:08Z

jstorrs
Apr 11, 2025
Author

I'm not interested in being lectured about using .env files, and frankly I'm now re-evaluating adopting qsv vs installing a D compiler and using tsv-utils. Thank you.

0 replies

ondohotola · 2025-04-12T05:39:53Z

ondohotola
Apr 12, 2025

Wow!!

0 replies

jstorrs · 2025-04-12T11:37:40Z

jstorrs
Apr 12, 2025
Author

Advocating for one ugly hack vs another for working around limitations of the qsv cli doesn't make the failings of the qsv cli disappear.

And frankly, I find it odd that qsv even reads .env files directly.

There seems to be some confusion about the purpose of .env files vs configuration files and it's beyond the scope of the issue I opened. For configuration files it would be better for qsv to use XDG_CONFIG_HOME and XDG_CONFIG_DIRS etc (from the XDG Base Directory Specification).

0 replies

jqnatividad · 2025-04-12T12:58:01Z

jqnatividad
Apr 12, 2025
Maintainer

Thanks @jstorrs for the feedback.

I'm glad to know the QSV_DEFAULT_DELIMITER env var is working out for you.

The choice of .env files for persisting configuration was really not deeply considered when I introduced it. We were using .env files for another project where we embed qsv and it was convenient for me to use it at the time.

Will revisit using XDG configuration files in the future.

Converting this issue to a discussion so other folks with a similar issue can easily find it.

0 replies

jstorrs · 2025-04-12T13:21:12Z

jstorrs
Apr 12, 2025
Author

I don't mind that .env file functionality exists, but it would be better if searching for .env files were optional and needed to be explicitly enabled when desired.

At the very least there should be some way to turn .env file use off completely if we want to ensure predictable behavior. Looking at utils.rs, it looks like one option is export QSV_DOTENV_PATH=/dev/null (at least on POSIX systems).

5 replies

jqnatividad Apr 12, 2025
Maintainer

@jstorrs I'll adjust QSV_DOTENV_PATH so if you specify a sentinel value "<NONE>" it will disable .env support altogether.

WDYT?

jstorrs Apr 12, 2025
Author

"<NONE>" would be great!

jqnatividad Apr 12, 2025
Maintainer

Implemented with #2684
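
In a script, that could look like the fragment below. It assumes a qsv build that includes the change from #2684, and it shows only the shell side of the setting.

```sh
#!/bin/sh
# Assumes a qsv build that includes #2684.
# The single quotes are required: unquoted, < and > would be treated
# as redirection operators by the shell instead of literal characters.
export QSV_DOTENV_PATH='<NONE>'
```

Every qsv invocation later in the same script inherits the exported value.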

jstorrs Apr 14, 2025
Author

Another idea: this project seems to be very comfortable with different flavors of executable with different names (qsv vs qsvlite vs qsvdp etc.). It may be worth considering a very stripped-down version that has no network use, has more rigid behavior out of the box, and is easier to evaluate for use in a high-security environment. For example, suppose you're working at a hospital or bank and want to make sure nothing is sent externally or pulled in from outside. Things like cut and paste don't reach out and are easy to reason about. That version could have different defaults for .env files and be far more rigid in its behavior, so that people accustomed to the friendlier ad hoc behavior of qsv aren't affected.

jqnatividad Apr 14, 2025
Maintainer

That's a good idea @jstorrs !

Can you create a new issue with the enhancement request?

Replies: 10 comments · 5 replies
