├── README.md
├── run.sh
├── src
│ ├── average_degree.py
│ └── tweets_cleaned.py
├── tweet_input
│ └── tweets.txt
└── tweet_output
  ├── ft1.txt
  └── ft2.txt
- `import json` for parsing JSON
- `import time` for parsing the timestamp
- `from datetime import datetime` for calculating the time difference between two timestamps
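As a rough illustration of how `time` and `datetime` can work together here, the sketch below parses two timestamps and returns the gap in seconds. The format string assumes Twitter's usual `created_at` layout (for example `Thu Oct 29 17:51:01 +0000 2015`) with a fixed `+0000` offset, and the helper names `parse_timestamp` and `seconds_between` are hypothetical, not taken from the scripts in `src`.

```python
import time
from datetime import datetime

# Assumed layout of the created_at field, e.g. "Thu Oct 29 17:51:01 +0000 2015".
# The "+0000" is matched literally, i.e. timestamps are assumed to be UTC.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def parse_timestamp(raw):
    """Turn a created_at string into a datetime (hypothetical helper)."""
    parsed = time.strptime(raw, TIMESTAMP_FORMAT)
    # The first six struct_time fields are year, month, day, hour, minute, second.
    return datetime(*parsed[:6])


def seconds_between(earlier, later):
    """Return the gap between two created_at strings, in seconds."""
    return (parse_timestamp(later) - parse_timestamp(earlier)).total_seconds()
```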
- Run the `run.sh` script with `./run.sh` (may need to run `chmod +x run.sh`)
- Results will appear in the `tweet_output` directory
  - `ft1.txt` is part 1, cleaned tweets
  - `ft2.txt` is part 2, average degree calculations for the tweets
| Arguments | Description |
|---|---|
| `input file` | The location of the input file. |
| `output file` | The location of the output file. |
- Part 1: `python tweets_cleaned.py <input file> <output file>`
- Part 2: `python average_degree.py <input file> <output file>`
python src/tweets_cleaned.py tweet_input/tweets.txt tweet_output/ft1.txt
- Go through all the tweets in `tweet_input/tweets.txt`
- Parse the JSON for `text` and `created_at`
- Clean the unicode and remove `\n` and `\r`
- For each tweet that contains unicode, increment the counter
- Write the cleaned tweet with its timestamp in the format `tweet (timestamp)\n` into the file `tweet_output/ft1.txt`
- At the end, write `X tweets contained unicode.`, where `X` is the number of tweets that contain unicode
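The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/tweets_cleaned.py`: it assumes that "cleaning the unicode" means dropping every non-ASCII character, that the input holds one JSON object per line, and that records missing `text` or `created_at` are skipped. The function names `clean_text` and `clean_tweets` are hypothetical.

```python
import json


def clean_text(text):
    """Return (cleaned_text, had_unicode) for one tweet's text."""
    # Assumption: "cleaning" means dropping every non-ASCII character.
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    had_unicode = ascii_only != text
    cleaned = ascii_only.replace("\n", "").replace("\r", "")
    return cleaned, had_unicode


def clean_tweets(lines):
    """Yield the output lines for an iterable of JSON-encoded tweets."""
    unicode_count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: records without both fields are not tweets and are skipped.
        if "text" not in tweet or "created_at" not in tweet:
            continue
        cleaned, had_unicode = clean_text(tweet["text"])
        if had_unicode:
            unicode_count += 1
        yield "{} ({})\n".format(cleaned, tweet["created_at"])
    # Summary line written once every tweet has been processed.
    yield "{} tweets contained unicode.\n".format(unicode_count)
```

With the paths from this README, the generator could be used by opening `tweet_input/tweets.txt` for reading and passing the result of `clean_tweets(infile)` to `writelines` on `tweet_output/ft1.txt`.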
- Use the file `tweet_output/ft1.txt` created from part 1 to calculate the average degree
- Go through each tweet, and parse the hashtags and the timestamp
- Insert into the `DegreeGraphClass`, which handles the insertion of hashtags, evicts hashtags that are not in the 60 second window, and does the average degree calculation for each tweet added
- Write the average degree for each tweet to the file `tweet_output/ft2.txt`
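One possible shape for the windowed graph is sketched below. It is an illustration under several assumptions rather than the class in `src/average_degree.py`: hashtags are matched with `#\w+` and lowercased, every pair of distinct hashtags in a tweet forms an edge, tweets arrive in timestamp order, a tweet is evicted once it is more than 60 seconds older than the newest one, and the average degree is twice the edge count divided by the number of connected hashtags. `HashtagWindow` and `parse_line` are hypothetical names.

```python
import re
import time
from collections import deque
from datetime import datetime
from itertools import combinations

WINDOW_SECONDS = 60
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"  # assumed created_at layout


def parse_line(line):
    """Split 'tweet (timestamp)' into (set of hashtags, datetime)."""
    text, _, stamp = line.rstrip("\n").rpartition(" (")
    when = datetime(*time.strptime(stamp.rstrip(")"), TIMESTAMP_FORMAT)[:6])
    # Assumption: hashtags are case-insensitive runs of word characters.
    return {tag.lower() for tag in re.findall(r"#\w+", text)}, when


class HashtagWindow:
    """Hypothetical stand-in for the graph class described above."""

    def __init__(self):
        self.tweets = deque()  # (timestamp, edges) in arrival order
        self.edge_counts = {}  # edge -> number of in-window tweets producing it

    def add(self, hashtags, when):
        """Insert one tweet, evict stale ones, return the new average degree."""
        edges = list(combinations(sorted(hashtags), 2))
        self.tweets.append((when, edges))
        for edge in edges:
            self.edge_counts[edge] = self.edge_counts.get(edge, 0) + 1
        self._evict(when)
        return self.average_degree()

    def _evict(self, newest):
        # Drop tweets from the front while they fall outside the window.
        while (newest - self.tweets[0][0]).total_seconds() > WINDOW_SECONDS:
            _, old_edges = self.tweets.popleft()
            for edge in old_edges:
                self.edge_counts[edge] -= 1
                if self.edge_counts[edge] == 0:
                    del self.edge_counts[edge]

    def average_degree(self):
        nodes = {tag for edge in self.edge_counts for tag in edge}
        if not nodes:
            return 0.0
        # Each edge contributes one degree to each of its two endpoints.
        return 2.0 * len(self.edge_counts) / len(nodes)
```

Counting edges per tweet, rather than storing a plain set, keeps an edge alive while any in-window tweet still produces it. Note that `parse_line` expects a `tweet (timestamp)` line, so the final `X tweets contained unicode.` line of `ft1.txt` would need to be skipped before parsing.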