diff --git a/README.md b/README.md
index 2a31949..0e91747 100644
--- a/README.md
+++ b/README.md
@@ -26,13 +26,7 @@ This repository contains [R](http://www.r-project.org/) code to build and plot 1
 
 The data included in this repository are briefly presented in [this note](http://f.briatte.org/research/parlnet-note.pdf) and are fully documented in [this appendix](http://f.briatte.org/research/parlnet-appendix.pdf), which explains how they were collected and how the networks were constructed.
 
-The repository contains two additional documentation files:
-
-- The [HOWTO](HOWTO.md) file contains detailed instructions on how to get and run the code.
-- The [TODO](TODO.md) file contains a list of known limitations and possible improvements.
-- The wiki contains [links](https://github.com/briatte/parlnet/wiki) to parliamentary open data services and third parties.
-
-The `README` file of each country-specific repository contains further details on code execution and data collection, as well as additional thanks to people who provided help.
+The repository contains a [HOWTO](HOWTO.md) file with detailed instructions on how to get and run the code, and the `README` file of each country-specific repository contains further details on code execution and data collection, as well as additional thanks to people who provided help. Further ideas and links are available in the [repository wiki](https://github.com/briatte/parlnet/wiki).
 
 The raw data collected by [release 2.3](https://github.com/briatte/parlnet/releases) (October 2015) is available online as a 1.5 GB archive of HTML, JSON and XML files.
 It can be accessed at the following DOI handle:
diff --git a/TODO.md b/TODO.md
deleted file mode 100644
index 40c9229..0000000
--- a/TODO.md
+++ /dev/null
@@ -1,23 +0,0 @@
-The code prepared for this project has gone through several rounds of debugging and updates, but the scrapers still require some knowledge of R and some attention to country-specific particularities in order to run properly. The notes below suggest how the code might be improved, in big or in small ways.
-
-# IMPROVEMENTS
-
-1. __Most calls to `read.csv` and `write.csv` might be handled through the [`readr`](https://cran.r-project.org/web/packages/readr/) package__, in order to speed up input/output during code execution. The current version of `readr`, however, can garble dates, and row names (which are used during network construction) are explicitly forbidden by the `data_frame` class.
-2. __Most calls to the `XML` package might be handled through (or rewritten for) the [`rvest`](https://cran.r-project.org/web/packages/rvest/) package__, in order to make the code easier to read. In my experience, this is possible but highly tricky to do.
-3. __The networks could be first built as bill-sponsor bipartite graphs__, and then collapsed to one-mode cosponsorship networks. This would require using [sparse matrixes](http://solomonmessing.wordpress.com/2012/09/30/working-with-bipartiteaffiliation-network-data-in-r/) and only slightly different visualisation code. The bills, however, have very little attributes of their own, because very few chambers provide legislative keywords and/or outcomes.
-4. __The code could be greatly accelerated by switching to Python or Ruby scrapers and SQL databases__, although that would obviously require starting again from scratch. Similarly, adjacency matrixes are probably faster than edge lists as network constructors, but are less practical for inspection and debugging purposes.
-5. __Some aspects of the code could be further standardised__, in particular: the organisation of the `raw` data folders, links to constituencies, sponsor profiles and photos, and network attributes that describe the country, chamber and legislature. These are all neatly organised in the current code, but not perfectly standardised.
-6. __Additional official open data portals could be put to use__ to retrieve bill or sponsor details. These portals are already used in the code for the [French upper chamber](https://github.com/briatte/parlement), [Norway](https://github.com/briatte/stortinget), [Sweden](https://github.com/briatte/riksdag) and [Switzerland](https://github.com/briatte/swparl), but similar services for [Austria](https://www.data.gv.at/), the [French lower chamber](http://data.assemblee-nationale.fr/), the [Italian lower chamber](http://data.camera.it/) and the [Italian upper chamber](http://dati.senato.it/) are not.
-
-# LIMITATIONS
-
-1. __There is no self-updating mechanism: the data have to be refreshed manually__, because self-updating the code would probably require recoding all scrapers in a language supported by scraping platforms like [Morph](https://morph.io/) or [ScraperWiki](https://scraperwiki.com/). Data collection would only improve for ongoing legislatures, and website redesigns would still require manual updates.
-2. __Some repositories rely on manual inputs during data collection__. The code for [Finland](https://github.com/briatte/eduskunta) requires editing a URL parameter (which does not even work at the moment, since the entire website has been [redesigned](https://github.com/briatte/eduskunta/issues/1)), and the code for [Hungary](https://github.com/briatte/orszaggyules) requires downloading a few bill indexes by hand.
-3. __Network errors in the download loops require to rerun some of the scripts:__ rerunning the `data.r` scripts two or three times is therefore highly recommended. Some (but not all) scripts contain exception lists to skip over the little amount of errors that might have occurred, but some errors are permanent HTTP 404's and cannot be solved.
-4. __Some variables are based on manual or semi-manual imputations:__ the `sex` variable is often based on imputation from first names, family names, or both; and the `party` variable is often based on manual recodings or on the "longest affiliation throughout legislature" rule. These limitations are fully documented in the `README` files of each repository.
-5. __Some variables have many missing counts:__ this issue affects the `born` variable, which occasionally has high missing counts in upper houses and is completely missing in [Hungary](https://github.com/briatte/orszaggyules), and the `constituency` variable, which is occasionally missing in [Austria](https://github.com/briatte/nationalrat) and often missing in pre-redistricted [Sweden](https://github.com/briatte/riksdag).
-6. __Some variables are imperfectly standardised across countries:__ the `committee` variable varies considerably because of differing parliamentary practices in committee formation (some networks have many committee co-memberships, others have almost none), and the `nyears` variable is not always a perfectly continuous measure.
-
----
-
-All improvements and limitations that do not require switching to a different programming language are under consideration for future releases of the repository, as is further integration with the data provided by [Every Politician](https://github.com/everypolitician/everypolitician-data/), [ParlGov](http://www.parlgov.org/) and [Wikidata](https://www.wikidata.org/).