Classifying turns by Boydstun-labeled topics - topicinfo.py, mallet.py
----------------------------------------------------------------------
Dependencies: Python 2.7, SciPy, Mallet (http://mallet.cs.umass.edu/)
topicinfo.py builds the feature and label files that mallet.py uses for classification, based on the Boydstun-labeled topics. Run mallet.py after topicinfo.py; it classifies the output of topicinfo.py with Mallet and prints the results. A sample run looks like this:
$ python topicinfo.py
$ python mallet.py
...
(mallet.py output)
...
mallet.py requires that Mallet be installed. At the top of mallet.py, there is a line:
self.mallet_directory = "../mallet-2.0.7/"
Change this to the directory where Mallet is installed.
topicinfo.py requires the debate reactions corpus and the transcript corpus. It assumes that the path to the debate reactions corpus is "resources/data/reactions_oct3_4project.csv" and that the path to the transcript is "resources/corpora/oct3_coded_transcript_sync.csv".
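Before running the scripts, it can help to confirm that these paths exist. The following is a minimal sketch and is not part of the repository: `missing_paths` is a hypothetical helper, and the paths listed are simply the defaults quoted in this README.

```python
import os

# Default locations quoted in this README; adjust them if yours differ.
EXPECTED_PATHS = [
    "../mallet-2.0.7/",
    "resources/data/reactions_oct3_4project.csv",
    "resources/corpora/oct3_coded_transcript_sync.csv",
]


def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Report every expected file or directory that cannot be found.
    for path in missing_paths(EXPECTED_PATHS):
        print("Missing: " + path)
```

Run it from the same directory you run topicinfo.py from, since the corpus paths are relative.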
In the main() method of topicinfo.py, there is a line:
labels = LABEL_by_spin_dodge(turn_info, reactions_Romney)
LABEL_by_spin_dodge() creates labels based on the number of spin and dodge reactions. There are two other labeling methods: LABEL_by_agreement(), which creates labels based on the number of reactions agreeing and disagreeing with the current speaker, and LABEL_by_count(), which creates labels based on the total number of reactions.
The second argument, reactions_Romney, is the set of reactions by users who support Romney. To use the reactions of users who support Obama instead, replace it with reactions_Obama.
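To illustrate what count-based labeling can look like, here is a minimal sketch. It is not the code in topicinfo.py: the function name `label_turns_by_majority`, the input shape (pairs of turn id and reaction kind), and the majority rule with its tie-breaking are all assumptions made for illustration. See topicinfo.py for the actual rules.

```python
from collections import Counter


def label_turns_by_majority(reactions, kinds=("spin", "dodge")):
    """Label each turn with whichever of `kinds` received more reactions.

    `reactions` is an iterable of (turn_id, reaction_kind) pairs. Both this
    input shape and the majority rule are assumptions, not the repository's
    actual behaviour.
    """
    counts = {}
    for turn_id, kind in reactions:
        if kind in kinds:
            counts.setdefault(turn_id, Counter())[kind] += 1

    labels = {}
    for turn_id, counter in counts.items():
        # max() returns the first maximal item, so ties go to kinds[0]
        # (an arbitrary choice for this sketch).
        labels[turn_id] = max(kinds, key=lambda k: counter[k])
    return labels
```

Turns that received none of the listed reaction kinds get no label in this sketch.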
Classifying individual users' reactions - topicinfo2.py, mallet.py
----------------------------------------------------------------------
As before, topicinfo2.py generates files of features and labels for mallet.py to use, and assumes that the debate corpora are in the same locations. A sample run would look like this:
$ python topicinfo2.py
Skipped 2499 data points with missing information.
Got 190787 useable data points.
$ python mallet.py
...
(mallet output)
...
The output of mallet.py will be the results of 12-way classification, for each possible individual reaction.
Classifying turns by N-Gram features - baseline_ngrams.py
----------------------------------------------------------------------
Dependencies: Python 2.7, NumPy 1.6.1, Pandas 0.10.1, NLTK 2.0.4
This has been tested with Enthought Python Distribution (EPD) Free 7.3-2. Other dependencies may
exist that are not satisfied if a different Python distribution is used to run this software.
To run, do the following:
$ python baseline_ngrams.py <path to reactions CSV file> <path to coded transcript CSV file> <unigram/bigram>
Example:
$ python baseline_ngrams.py data/reactions_oct3_4project.csv corpora/oct3_coded_transcript_sync.csv unigram
The program prints the results of the n-gram evaluation (mean and standard deviation of the accuracies) to the console.
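For readers unfamiliar with the terms, the sketch below shows what unigram and bigram counts are and how a mean and standard deviation of accuracies can be computed. It is an illustration only, not the code in baseline_ngrams.py: both function names are hypothetical, and the use of the population standard deviation (the NumPy default) is an assumption.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mean_and_std(values):
    """Return the mean and population standard deviation of `values`."""
    mean = sum(values) / float(len(values))
    variance = sum((v - mean) ** 2 for v in values) / float(len(values))
    return mean, variance ** 0.5


if __name__ == "__main__":
    tokens = "we will cut taxes and we will".split()
    print(ngram_counts(tokens, 1))  # unigram counts
    print(ngram_counts(tokens, 2))  # bigram counts
```

A token list shorter than n yields an empty count, so very short turns contribute no bigram features in this sketch.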
Classifying turns by LDA Topic Features
------------------------------------------------
## Running (dependencies: numpy, scipy, sqlite3, Mallet)
Mallet can be obtained [here](http://mallet.cs.umass.edu/download.php).
The other dependencies can be installed with pip or by installing the [Enthought Python Distribution](https://www.enthought.com/canopy-express/).
1. Edit input_example.json so that the file paths for the October 3rd debate and the presidential debate corpus are correct for your computer.
2. Preprocess the data:
`python format.py debates input_example.json`
3. Use Mallet to run LDA on the preprocessed data.
`<path/to>/bin/mallet import-dir debates/ --output topic-input.mallet --keep-sequence --remove-stopwords`
`<path/to>/bin/mallet train-topics --input topic-input.mallet --num-topics 19 --output-state topic-state.gz --output-doc-topics debates.doc.topics --output-topic-keys debate.keys --optimize-interval 10`
4. Edit database.py so that the variable REACTIONS_FNAME points to the file with the ReactLabs reactions on your computer.
5. Create the reactions database.
`python database.py create`
6. Generate the SVMLight-style inputs for Mallet.
`python svmlitegen.py`
7. Run Mallet on the output of the previous step. For each file generated by svmlitegen.py (task1obama.train, etc.):
`<path/to>/bin/mallet import-svmlight --input task1obama.train --output train.mallet`
`<path/to>/bin/mallet train-classifier --input train.mallet --output-classifier naivebayes.classifier --trainer DecisionTree --trainer NaiveBayes --trainer MaxEnt --training-portion 0.9 --num-trials 10 --cross-validation 10`
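Because step 7 is repeated for every file that svmlitegen.py writes, it may be convenient to build the commands in a script. The sketch below is not part of the repository: `mallet_commands` is a hypothetical helper, and the per-file output names (for example task1obama.mallet and task1obama.classifier) are an assumption chosen so that one file's results do not overwrite another's. The trainer and evaluation options are copied from step 7.

```python
import os


def mallet_commands(mallet_bin, train_file):
    """Build the import and train-classifier commands for one input file.

    Each command is returned as an argument list suitable for
    subprocess.check_call().
    """
    stem = os.path.splitext(os.path.basename(train_file))[0]
    instances = stem + ".mallet"  # hypothetical per-file output name

    # `import-svmlight` is the name Mallet's launcher script uses for its
    # SVMLight-format importer.
    import_cmd = [mallet_bin, "import-svmlight",
                  "--input", train_file, "--output", instances]
    train_cmd = [mallet_bin, "train-classifier",
                 "--input", instances,
                 "--output-classifier", stem + ".classifier",
                 "--trainer", "DecisionTree",
                 "--trainer", "NaiveBayes",
                 "--trainer", "MaxEnt",
                 "--training-portion", "0.9",
                 "--num-trials", "10",
                 "--cross-validation", "10"]
    return [import_cmd, train_cmd]
```

To execute the commands, pass each returned list to `subprocess.check_call`, with `mallet_bin` set to the path of your own bin/mallet script.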