This repository was archived by the owner on Oct 29, 2024. It is now read-only.

sfischer13/datautils

datautils

The best toolbox for processing textual data.

Introduction

The Data Utilities are a collection of handy text manipulation tools, intended to make a data wrangler’s life on the command line easier.

Much of the functionality can be replicated with standard command-line tools (awk, sed, cut, sort, uniq, …), but doing so often becomes tedious. Zealots of the Unix philosophy will probably skip these tools and build a set of sophisticated aliases instead.

On the other hand, some of the tools fix actual problems. They use UTF-8 by default, so one does not have to deal with the quirks of sort and uniq with respect to non-ASCII input.

Installation

go get -v github.com/sfischer13/datautils/...

Tools

These tools are part of the collection:

  • count
  • norm
  • rows
  • text
  • trim

Usage

count

$ echo "a\na\na\nb\nb\nc"
a
a
a
b
b
c
$ echo "a\na\na\nb\nb\nc" | count --keys
3	a
2	b
1	c
$ echo "a\na\na\nb\nb\nc" | count --counts
1	c
2	b
3	a
$ echo "a\na\na\nb\nb\nc" | count --flip
a	3
b	2
c	1
$ echo "a\na\na\nb\nb\nc" | count --threshold 2
3	a
2	b
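
The transcript above can be approximated in a few lines of Go. This is only a sketch of the general technique (tally lines in a map, then print them sorted by key), not the tool's actual source. The names countLines and formatByKey are hypothetical, and the reading of --threshold as "keep lines seen at least N times" is an assumption inferred from the sample output.

```go
package main

import (
	"fmt"
	"sort"
)

// countLines tallies how often each distinct line occurs.
// (Hypothetical helper, not taken from the datautils source.)
func countLines(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		counts[line]++
	}
	return counts
}

// formatByKey renders "count<TAB>line" rows sorted by line (byte order),
// dropping lines that occur fewer than threshold times.
// Assumption: this mirrors what --keys and --threshold appear to do.
func formatByKey(counts map[string]int, threshold int) []string {
	keys := make([]string, 0, len(counts))
	for key, n := range counts {
		if n >= threshold {
			keys = append(keys, key)
		}
	}
	sort.Strings(keys)

	rows := make([]string, 0, len(keys))
	for _, key := range keys {
		rows = append(rows, fmt.Sprintf("%d\t%s", counts[key], key))
	}
	return rows
}

func main() {
	// Same sample input as in the transcript above.
	input := []string{"a", "a", "a", "b", "b", "c"}
	for _, row := range formatByKey(countLines(input), 1) {
		fmt.Println(row)
	}
}
```

Note that sort.Strings orders keys by their UTF-8 bytes, which is stable across locales but is not a linguistic collation.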

norm

$ echo "¹²³" | norm --nfc
¹²³
$ echo "¹²³" | norm --nfkc
123

rows

$ echo "a\nb\nc\nd\ne" | rows --rows 2:4
b
c
d
$ echo "a\nb\nc\nd\ne" | rows --rows 1,5
a
e

text

$ echo abca | text chars
a
b
c
a
$ echo "This is a test." | text words
This
is
a
test.

trim

$ echo "   abc" | trim --left
abc
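
A left trim that is aware of Unicode whitespace is a one-liner in Go. This is a sketch under the assumption that the tool strips every leading character that Unicode classifies as whitespace; the README only demonstrates ASCII spaces, and trimLeft is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimLeft removes leading Unicode whitespace from line and leaves
// trailing whitespace untouched.
// Assumption: "whitespace" means unicode.IsSpace, which also covers
// characters such as the no-break space (U+00A0).
func trimLeft(line string) string {
	return strings.TrimLeftFunc(line, unicode.IsSpace)
}

func main() {
	fmt.Printf("%q\n", trimLeft("   abc"))
}
```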

Credits

This project is authored and maintained by Stefan Fischer.
The source code is available under the MIT License.
See LICENSE for further details.
