This repository hosts the data used in my paper, in lieu of an appendix. Below, I describe the contents of each file and which model it was used in.
- acceptability_master_PWN lists every cluster, a column marked with an 'x' if the cluster does NOT occur in the PWN dictionary (or with a note if its use is very limited), and the acceptability scores from the survey. Note that, in the survey, 0 means that a cluster was unacceptable for its given ending (endings have been removed).
- pol_bigrams collates all bigrams in acceptability_master_PWN, with the boundary symbol # added at the beginning and end of each cluster.
- illegal_clusters lists the average acceptability score for each cluster in acceptability_master_PWN, along with the number of illegal bigrams in that cluster.
- The folder MaxEntInput contains the input files for the Maximum Entropy learner; further details are in that folder.
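
As a rough illustration of how the bigrams in pol_bigrams and the illegal-bigram counts in illegal_clusters relate to a cluster, here is a minimal Python sketch. It is not the script used for the paper. The function names `cluster_bigrams` and `count_illegal_bigrams` are hypothetical, each character is assumed to be one segment (which may not match how digraphs are treated in the data), and "illegal" is assumed to mean "not in a supplied set of legal bigrams".

```python
BOUNDARY = "#"  # boundary symbol described for pol_bigrams


def cluster_bigrams(cluster):
    """Return the bigrams of a cluster padded with the boundary symbol.

    Assumption: every character counts as a single segment.
    """
    padded = BOUNDARY + cluster + BOUNDARY
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def count_illegal_bigrams(cluster, legal_bigrams):
    """Count bigrams of the cluster that are absent from legal_bigrams.

    Assumption: a bigram is 'illegal' when it is not in the given set.
    """
    return sum(1 for bg in cluster_bigrams(cluster) if bg not in legal_bigrams)


if __name__ == "__main__":
    # Made-up example set of legal bigrams, for illustration only.
    legal = {"#s", "st", "r#"}
    print(cluster_bigrams("str"))
    print(count_illegal_bigrams("str", legal))
```

With the made-up set above, the cluster "str" yields four bigrams, one of which ("tr") is missing from the set and is therefore counted as illegal.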