A ChRIS ds plugin for find-and-replace operations on text files using regular expressions.
singularity exec docker://docker.io/fnndsc/pl-re-sub:latest resub \
--expression ... --replacement ... \
--inputPathFilter ... incoming/ outgoing/Files inside incoming/ matching the glob given by --inputPathFilder
are processed and written to outgoing/.
A string representing a regular expression with matching groups to search for.
Uses Python re syntax.
A string which may include matching groups which should be used to replace
occurrences of what is matched by the value given to --expression.
Multiple operations can be chained one after the other using the --ifs option.
The value passed to --ifs (default ||) is a delimiter between multiple regexes.
For example, if you wanted to first replace all B with A and then do a second
pass through the line replacing all :( with :):
singularity exec docker://docker.io/fnndsc/pl-re-sub:latest resub \
--expression 'B :\(' --replacement 'A :\)' --ifs ' ' \
--inputPathFilter 'report_card.txt' incoming/ outgoing/Chained operation support can be disabled by passing --ifs ''.
For every *.csv files in the directory incoming/,
change dates from MM/DD/YYYY format to YYYY.MM.DD format,
and saving the results into a new directory outgoing/:
# set up date to be found in the incoming/ directory
mkdir incoming/ outgoing/
mv ./data.csv incoming/data.csv
# convert date formats in all files and write results into outgoing/
singularity exec docker://docker.io/fnndsc/pl-re-sub:latest resub \
--expression '(\d\d)/(\d\d)/(\d\d\d\d)' \
--replacement '\3.\1.\2' \
--inputPathFilter '*.csv' incoming/ outgoing/Processing is serial using a single-thread.
When you have a large number of files, you can do parallel processing
using external tools such as ChRIS, sbatch, or GNU parallel.
Examples: TODO
-
evalsupport for dynamic replacement text generation