- Replace Windows line terminators in the raw Redis output with Unix EOLs
$ awk -f scripts/win2unixeol.awk /path/to/raw/redis.dump > /path/to/unix'd/eol/output.dump

- Strip out all lines that are not valid JSON (we don't try to fix partial transcription errors)
$ ruby scripts/strip_blanks.rb /path/to/unix'd/eol/output.dump /path/to/valid/tweet/per/line.txt

- Create valid Tre-View capable JSON
$ ruby scripts/tre_view_raw_tweets.rb /path/to/valid/tweet/per/line.txt /path/to/tre_viewed/tweets.json

- (Optional) Tar it up
$ tar -jvcf [search_term].tar.bz2 /path/to/valid/tweet/per/line.txt /path/to/tre_viewed/tweets.json

Once step 2 (stripping invalid lines) is complete, the data is technically ready for a full batch load: each line is a valid Tweet encoded as JSON. Step 3 is only needed for Tre-View purposes and is completely optional. Step 4 is also optional, but recommended due to the overall size of the output.
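
The scripts themselves are not reproduced here, so the following is only a rough sketch of what the first two steps amount to, assuming "valid" means that a line parses as a JSON object. The helper names `valid_tweet_line?` and `clean_dump` are hypothetical and are not taken from `scripts/`; the real scripts may behave differently.

```ruby
require 'json'

# Hypothetical helper: true when the line parses as a JSON object.
# Blank lines and truncated records fail to parse and are rejected.
def valid_tweet_line?(line)
  JSON.parse(line).is_a?(Hash)
rescue JSON::ParserError
  false
end

# Hypothetical helper: drops Windows/Unix line terminators from each line,
# then keeps only the lines that hold a complete JSON object.
def clean_dump(raw)
  raw.each_line
     .map { |line| line.chomp }                  # chomp removes "\r\n", "\n", or "\r"
     .select { |line| valid_tweet_line?(line) }
end

# Small made-up sample: one good record, one blank line, one truncated record.
sample = "{\"id\":1,\"text\":\"hello\"}\r\n\r\n{\"id\":2,\"text\":\"tru\r\n"
puts clean_dump(sample)
# prints {"id":1,"text":"hello"}
```

No repair is attempted on the truncated record; it is simply dropped, which matches the "we don't try to fix" note in step 2.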