|
1 | 1 | # CS Word Cloud |
2 | 2 |
|
3 | | -TODO: Explain methods |
| 3 | +# Requirements |
4 | 4 |
|
5 | | -TLDR: All this word to make this word cloud: |
| 5 | +* Python 3.x |
| 6 | +* numpy, wordcloud, and any other misc pip/conda packages |
| 7 | +* Golang |
| 8 | +* GNU make |
| 9 | +* GNU coreutils |
| 10 | +* Bash or Zsh |
| 11 | + |
| 12 | +# How To Build |
| 13 | + |
| 14 | +0. Install the prerequisites |
| 15 | +1. Set up your environment by editing the variables in `Makefile`. The main variables to set are `APIKEY`, `INITMATCH`, and `MATCH_COUNT` |
| 16 | + |
| 17 | +For example, before: |
| 18 | +``` |
| 19 | +APIKEY?=TODO # Add server api key here from https://developers.faceit.com/ |
| 20 | +INITMATCH?=TODO # Add any recent faceit match ID here |
| 21 | +MATCH_COUNT=1000 # Number of demos to download |
| 22 | +SHELL=/bin/bash # Need this just so I can use pipefail :/ |
| 23 | +``` |
| 24 | + |
| 25 | +After: |
| 26 | +``` |
| 27 | +APIKEY?=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx # Add server api key here from https://developers.faceit.com/ |
| 28 | +INITMATCH?=1-a993a412-8987-4d11-a682-dbe2fae3a761 # Add any recent faceit match ID here |
| 29 | +MATCH_COUNT=5 # Let's only do 5 demos for a short test |
| 30 | +SHELL=/bin/zsh # Say I have a MacBook, so let's use zsh |
| 31 | +``` |
| 32 | + |
| 33 | +2. Run `make all` |
| 34 | +3. Get a cool word cloud like this: |
6 | 35 |
|
7 | 36 |  |
| 37 | + |
| 38 | +# How It Works |
| 39 | + |
| 40 | +## Step 1: Traverse the FACEIT API for some matches |
| 41 | + |
| 42 | +The basic traversal goes like this: given some initial match ID, choose a random player in that match, then choose a random match from that player's recent match history, and repeat. This gave me a decently "random" sample of demos from a variety of regions and skill levels. |
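
The traversal described above can be sketched as a short random walk. This is only an illustration, not the code in `scrapeGames.py`: the two lookup callables (`get_players`, `get_recent_matches`) are hypothetical stand-ins for whatever FACEIT API requests return a match's players and a player's recent matches, and the hop cap is an assumption added so a dead end cannot loop forever.

```python
import random


def random_walk(init_match, match_count, get_players, get_recent_matches, rng=random):
    """Collect up to match_count distinct match IDs by hopping match -> player -> match.

    get_players(match_id) and get_recent_matches(player_id) are hypothetical
    callables that wrap the real API requests and return lists of IDs.
    """
    seen = [init_match]
    current = init_match
    # Assumed safety cap: give up after a bounded number of hops.
    for _ in range(match_count * 20):
        if len(seen) >= match_count:
            break
        players = get_players(current) or []
        history = get_recent_matches(rng.choice(players)) if players else []
        if not history:
            # Dead end: restart from a match we already know about.
            current = rng.choice(seen)
            continue
        current = rng.choice(history)
        if current not in seen:
            seen.append(current)
    return seen
```

Passing the lookups in as arguments keeps the walk itself independent of the HTTP layer, so it can be tried against a small in-memory graph before spending API quota.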
| 43 | + |
| 44 | +## Step 2: Download the demos |
| 45 | + |
| 46 | +I'm sure there are other ways to do this. However, I was able to download all 1000 demos in my dataset through these 4 URLs in `cdns.txt`: |
| 47 | + |
| 48 | +``` |
| 49 | +https://storage.googleapis.com/demos-us-central1.faceit-cdn.net |
| 50 | +https://storage.googleapis.com/demos-europe-west1.faceit-cdn.net |
| 51 | +https://storage.googleapis.com/demos-europe-west2.faceit-cdn.net |
| 52 | +https://storage.googleapis.com/demos-asia-southeast1.faceit-cdn.net |
| 53 | +``` |
| 54 | + |
| 55 | +The `download.sh` script handles the demo requests automatically, as long as `cdns.txt` is present. |
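
For readers who would rather not use the shell script, the idea of trying each CDN in turn can be sketched in Python. This is an assumption about the general approach, not a port of `download.sh`: the relative demo path passed in is hypothetical, and the real object path on the CDN is not documented here.

```python
import urllib.error
import urllib.request


def load_cdns(path="cdns.txt"):
    """Read one CDN base URL per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def fetch_from_any(cdns, demo_path, fetch=None):
    """Return (url, data) from the first CDN that serves demo_path, else None.

    demo_path is a hypothetical relative path to the demo file; fetch is an
    optional callable(url) -> bytes, mainly so the logic can be tested offline.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
    for base in cdns:
        url = base.rstrip("/") + "/" + demo_path.lstrip("/")
        try:
            return url, fetch(url)
        except (urllib.error.URLError, OSError):
            # Not on this CDN (or unreachable); try the next one.
            continue
    return None
```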
| 56 | + |
| 57 | +## Step 3: Parse the words |
| 58 | + |
| 59 | +All I had to do was write a small function in Go using the API provided by https://github.com/markus-wa/demoinfocs-golang. It dumps all the chat text from a demo to `stdout`. Then I just `cat` the outputs together for the word cloud generator. |
| 60 | + |
| 61 | +## Step 4: Generate the word cloud |
| 62 | + |
| 63 | +I mainly followed this example: https://github.com/amueller/word_cloud/blob/main/examples/masked.py. I made my own stencil with GIMP and played around with the parameters. |
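
A minimal sketch of the masked word cloud step, along the lines of the linked `masked.py` example. The parameter values here are placeholders rather than the ones used for the image above, and the file names in the usage note are hypothetical.

```python
def render_cloud(text, mask_path, out_path, max_words=2000):
    """Render text into the shape given by a stencil image and save it."""
    # Imported lazily so this module still loads where the packages are missing.
    import numpy as np
    from PIL import Image  # Pillow is installed as a dependency of wordcloud
    from wordcloud import WordCloud

    # White areas of the stencil are masked out; words fill the rest.
    mask = np.array(Image.open(mask_path))
    wc = WordCloud(background_color="white", max_words=max_words, mask=mask)
    wc.generate(text)
    wc.to_file(out_path)
    return out_path
```

Assuming the concatenated chat text is in `chat.txt` and the stencil is `stencil.png`, a call would look like `render_cloud(open("chat.txt").read(), "stencil.png", "cloud.png")`.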
| 64 | + |
| 65 | +# Using this work to download large collections of demos |
| 66 | + |
| 67 | +The scripts in this repo may be of interest to those doing data science or statistics on CS:GO games from the general population. Just use the `scrapeGames.py` and `download.sh` scripts, and you should be able to get pretty large datasets in no time. I was able to get 1000 demos using 150GB in only a handful of hours. |