update readme

matth2k · matth2k · commit e1d15ead894d · 2023-05-17T21:02:12.000-04:00
diff --git a/README.md b/README.md
@@ -1,7 +1,67 @@
 # CS Word Cloud
 
-TODO: Explain methods
+# Requirements
 
-TLDR: All this word to make this word cloud:
+* Python 3.x
+* numpy, wordcloud, and any other misc pip/conda packages
+* Golang
+* GNU make
+* GNU coreutils
+* Bash or Zsh
+
+# How To Build
+
+0. Install the prerequesites
+1. Setup your environment by setting the variables in `Makefile`. The main variables to set are `APIKEY`, `INITMATCH`, and `MATCH_COUNT`
+
+For example, before:
+```
+APIKEY?=TODO # Add server api key here from https://developers.faceit.com/
+INITMATCH?=TODO # Add any recent faceit match ID here 
+MATCH_COUNT=1000 # Number of demos to download
+SHELL=/bin/bash # Need this just so I can use pipefail :/
+```
+
+After:
+```
+APIKEY?=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx # Add server api key here from https://developers.faceit.com/
+INITMATCH?=1-a993a412-8987-4d11-a682-dbe2fae3a761 # Add any recent faceit match ID here 
+MATCH_COUNT=5 # Let's only do 5 demos for a short test
+SHELL=/bin/zsh # Say I have a macbook lets use zsh
+```
+
+2. Run `make all`
+3. Get a cool word cloud like this:
 
 ![Drag Racing](rev0.png)
+
+# How It Works
+
+## Step 1: Traverse through FACEIT API for some matches
+
+The basic traversal goes like this: given some initial match id, choose a random player in that match. Then choose a random match in their recent match history. And so on. This gave me a decently "random" sample of demos from a variety of regions and skill levels.
+
+## Step 2: Download the demos
+
+I'm sure their are other ways to do this. However, I was able to download all 1000 demos in my dataset through these 3 URLS in `cdns.txt`:
+
+```
+https://storage.googleapis.com/demos-us-central1.faceit-cdn.net
+https://storage.googleapis.com/demos-europe-west1.faceit-cdn.net
+https://storage.googleapis.com/demos-europe-west2.faceit-cdn.net
+https://storage.googleapis.com/demos-asia-southeast1.faceit-cdn.net
+```
+
+The `download.sh` script already handles the demo request automatically given `cdns.txt` is there.
+
+## Step 3: Parse the words
+
+All I had to do was write a small method in Go using the API provided from https://github.com/markus-wa/demoinfocs-golang. It dumps all the chat text to `stdout`. Then I just `cat` them together for the word cloud generator.
+
+## Step 4: Generate the word cloud
+
+I mainly followed this example here https://github.com/amueller/word_cloud/blob/main/examples/masked.py. I made my own stencil with GIMP and played around with the parameters.
+
+# Using this work to download large collections of demos
+
+The scripts in this repo may be of interest for those doing data science / statistics on CSGO games on the general population. Just use `scrapeGames.py` and `download.sh` scripts, and you should be able to get pretty large datasets in no time. I was able to get 1000 demos using 150GB and only a handful of hours.