Skip to content

Latest commit

 

History

History
73 lines (49 loc) · 1.92 KB

0192._Word_Frequency.md

File metadata and controls

73 lines (49 loc) · 1.92 KB

192. Word Frequency

难度: Medium

刷题内容

原题连接

内容描述

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity sake, you may assume:

words.txt contains only lowercase characters and space ' ' characters.
Each word must consist of lowercase characters only.
Words are separated by one or more whitespace characters.
Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1
Note:

Don't worry about handling ties, it is guaranteed that each word's frequency count is unique.
Could you write it in one-line using Unix pipes?

解题方案

思路 1

参考fabrizio3

cat word.txt :
output the text in the file

tr '\n' ' ' :
substitute endlines with single space

sed "s/\s\s/ /g"* :
substitute multiple spaces with single space

awk -v RS=' ' '{print $0}' :
output one word per line by changing Record Separator in AWK to single space. $0 is the entire record (1 word).

sort :
sort alphabetically the list of words (with repetitions) to prepare it for uniq command.

uniq -c :
print the list of unique words with their count. Before uniq you need to sort the list of words.

sort -nr -k1 :
sort the list of unique words by their count (-nr numerical reverse sorting) (-k1 sort by the first field that is the count of repetitions for the current word)

awk '{print $2" "$1}' :
for each line print before the second field $2, that is the word, and then the first field that is the count of repetitions for the word.
cat words.txt | tr '\n' ' ' | sed "s/\s\s*/ /g" | awk -v RS=' ' '{print $0}' | sort | uniq -c | sort -nr -k1 | awk '{print $2" "$1}'