This is a PostgreSQL schema loader for the data provided by the Texas Ethics Commission (TEC).
We provide a utility to
- extract the schema from a PDSERF/Plus format export by reading and parsing the ReadMe files to determine the schema, types, keys, and constraints
- create the tables needed to load the 1295 certs; these are hand-written from the PDF documentation provided by TEC
- load the data from CSV format into the tables we create
Internally, each line from a PDSERF readme is one of
- Table Description rows
- Column Description-cont'd rows
- Column rows
- Start-rows for a table (starting with "Record #:")
- End-rows for a table (containing just a -)
Column lines are either
- Indented as part of a group (array) that is repeated a set number of times
- Derived from a "single line"
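The line classification described above can be illustrated with a small Python sketch. This is not the project's own parser, only a hedged illustration: the `TableBlock` class, the `split_tables` and `is_group_member` helpers, and the sample lines are all hypothetical. The only detail taken from the text is that start rows begin with "Record #:". The end-row token is passed in as a parameter rather than hard-coded, and treating leading whitespace as "indented" is an assumption about the readme layout.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

START_PREFIX = "Record #:"  # start-row marker named in the text above


@dataclass
class TableBlock:
    """Hypothetical container for the lines belonging to one table."""
    start_row: str
    body: List[str] = field(default_factory=list)


def split_tables(lines: Iterable[str], end_marker: str) -> List[TableBlock]:
    """Group readme lines into one block per table.

    end_marker is an assumption: pass whatever single token the real
    readme uses on its end rows.
    """
    blocks: List[TableBlock] = []
    current: Optional[TableBlock] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if stripped.startswith(START_PREFIX):
            # A start row opens a new table block.
            current = TableBlock(start_row=stripped)
            blocks.append(current)
        elif current is not None and stripped == end_marker:
            # An end row closes the current table block.
            current = None
        elif current is not None and stripped:
            # Keep the original indentation so group members stay detectable.
            current.body.append(line)
    return blocks


def is_group_member(line: str) -> bool:
    """Assumption: a leading space or tab means 'indented as part of a group'."""
    return line[:1] in (" ", "\t")


# Made-up sample input; the real readme layout may differ.
sample = [
    "Record #: 1",
    "id",
    "  item_name",
    "END",
    "stray line outside any table",
    "Record #: 2",
    "amount",
    "END",
]
tables = split_tables(sample, end_marker="END")
```

Keeping the raw indentation in `body` lets a later pass decide whether a column line belongs to a repeated group or stands alone.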
All data is loaded into PostgreSQL, including the Descriptions, which we pull down as COMMENTS.
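As a rough sketch of how a parsed description could become a PostgreSQL comment, the Python below builds `COMMENT ON TABLE` and `COMMENT ON COLUMN` statements. The helper names, the column name, and the description strings are hypothetical, and this is not necessarily how the project generates its SQL. The quoting assumes PostgreSQL's default `standard_conforming_strings = on`, where doubling a single quote is enough to escape a string literal.

```python
def quote_literal(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def comment_on_table(schema: str, table: str, description: str) -> str:
    """Build a COMMENT ON TABLE statement for one table description."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    return f"COMMENT ON TABLE {target} IS {quote_literal(description)};"


def comment_on_column(schema: str, table: str, column: str, description: str) -> str:
    """Build a COMMENT ON COLUMN statement for one column description."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}.{quote_ident(column)}"
    return f"COMMENT ON COLUMN {target} IS {quote_literal(description)};"


# The description text and the column name are made up for illustration.
table_sql = comment_on_table("tec", "c_filerdata", "Filer's index")
column_sql = comment_on_column("tec", "c_filerdata", "example_column", "Example text")
```

Once comments are stored this way, they show up in psql with `\d+` on the table.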
You can find the readmes from the Texas Ethics Commission added in this project here:
This module has full coverage of the TEC metadata and data:
- Lobby Reports (tables l_*)
  - tec.l_awardmementodata
  - tec.l_foodbeveragedata
  - tec.l_coversheetladata
  - tec.l_giftdata
  - tec.l_docketdata
  - tec.l_individualreportingdata
  - tec.l_entertainmentdata
  - tec.l_subjectmatterdata
  - tec.l_eventdata
  - tec.l_transportationdata
- Campaign Finance Reports (tables c_*)
  - tec.c_assetdata
  - tec.c_creditdata
  - tec.c_finaldata
  - tec.c_candidatedata
  - tec.c_debtdata
  - tec.c_loandata
  - tec.c_contributiondata
  - tec.c_expendcategory
  - tec.c_pledgedata
  - tec.c_coversheet1data
  - tec.c_expenddata
  - tec.c_spacdata
  - tec.c_coversheet2data
  - tec.c_expendrepayment
  - tec.c_traveldata
  - tec.c_coversheet3data
  - tec.c_filerdata
- 1295 Certs
  - tec.form1295_box123
  - tec.form1295_interested_party
Requirements: PostgreSQL, git, curl
Repo download and database setup (example in bash):
$ git clone https://github.com/EvanCarroll/db-Texas-Ethics-Commission.git
$ cd ./db-Texas-Ethics-Commission
$ make
$ createdb mydb
$ psql -d mydb -f ./runme.sql 2>&1 | tee out.log
$ make clean
Created at Houston Hackathon 2018 as the sole work of Evan Carroll.
If you use this, open source all (100%) of your stuff, or I'll litigate. The GPL is not the AGPL. Please read, and be advised:
GNU Affero General Public License v3, see included LICENSE.md
Contact Evan Carroll 281.901.0011 for a quote on development.