jwbowers edited this page Sep 19, 2011 · 9 revisions

Interactive map request cycle

The proposed plan for the self-hosted survey involves several moving parts and vendors. This section documents the responsibilities of each component of the system.

  1. Google App Engine: GAE provides a hosting service for web apps written in either Java or Python. Because Clojure compiles to JVM bytecode, a compiled Clojure app is indistinguishable from Java and can also run on GAE; the libraries that support Clojure on GAE are on GitHub. GAE will handle randomization, create the HTML version of questions, store respondent answers, and provide basic security for the data (e.g., preventing users from filling out the survey twice). GAE will also handle communication with the other services listed below.

  2. Amazon EC2: EC2 is a scalable computing platform on which one rents time on Amazon's servers. Whereas GAE is fairly restrictive about what can execute in its environment, EC2 runs any software the user installs. In our case, this will be PostgreSQL with the PostGIS extensions, which lets us query the respondent's location and community polygons against administrative units in Canada. For example, we can intersect the respondent's community with census districts and compute the average percentage of visible minorities in the area to present to the respondent. We can also export polygons directly as KML for display on the user's map, either generated on the fly or pre-rendered into .kml files in advance.
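The GAE responsibilities in item 1 (randomization, rendering questions as HTML, storing answers, refusing a second submission) can be sketched in a few functions. This is a minimal, platform-neutral sketch in plain Python 3 with no App Engine APIs; the project plans to use Clojure, and Python is used here only because GAE also supports it. Hash-based assignment is one possible randomization technique, not necessarily the one the project will use, and `assign_condition`, `render_question`, and `AnswerStore` are hypothetical names; `AnswerStore` is an in-memory stand-in for the GAE datastore.

```python
import hashlib
import html


def assign_condition(respondent_id: str, conditions: list[str]) -> str:
    """Deterministically randomize a respondent into one condition.

    Hashing the respondent ID (instead of drawing a fresh random number on
    every request) keeps the assignment stable if the page is reloaded.
    """
    digest = hashlib.sha256(respondent_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(conditions)
    return conditions[index]


def render_question(name: str, prompt: str, options: list[str]) -> str:
    """Render one multiple-choice question as an HTML fieldset of radio buttons."""
    parts = [f"<fieldset><legend>{html.escape(prompt)}</legend>"]
    for option in options:
        value = html.escape(option)
        parts.append(
            f'<label><input type="radio" name="{html.escape(name)}" '
            f'value="{value}"> {value}</label>'
        )
    parts.append("</fieldset>")
    return "\n".join(parts)


class AnswerStore:
    """Hypothetical in-memory stand-in for the GAE datastore."""

    def __init__(self) -> None:
        self._answers: dict[str, dict[str, str]] = {}

    def submit(self, respondent_id: str, answers: dict[str, str]) -> None:
        """Store answers, refusing a second submission from the same respondent."""
        if respondent_id in self._answers:
            raise ValueError(f"respondent {respondent_id!r} has already submitted")
        self._answers[respondent_id] = dict(answers)

    def get(self, respondent_id: str) -> dict[str, str]:
        """Return a copy of the stored answers for one respondent."""
        return dict(self._answers[respondent_id])
```

For example, `AnswerStore().submit("r-001", {"q1": "Yes"})` stores one set of answers, and a second `submit` with the same ID raises `ValueError`. In a real deployment the duplicate check would have to be enforced in the datastore itself (e.g. by using the respondent ID as the entity key) so that it holds across multiple GAE instances.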

This diagram shows the request-response cycle for a user getting a survey page, which includes a dynamically generated district shape file that will be displayed on his/her map.
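On the EC2 side of this cycle, PostGIS answers the intersection query and the result is sent back as KML. The sketch below assumes a hypothetical schema (`census_districts` with `geom` and `pct_visible_minority`, `communities` with `id` and `geom`) and an area-weighted average, which is only one way to define "the average percentage in the area". `ST_Intersects`, `ST_Intersection`, and `ST_Area` are standard PostGIS functions; `%s` is a psycopg2-style parameter placeholder. The helper names `area_weighted_mean` and `polygon_to_kml` are illustrative, not part of the project.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML elements without a prefix

# Hypothetical query: each district overlapping the community, with the
# area of the overlap used as a weight.
OVERLAP_SQL = """
SELECT d.pct_visible_minority,
       ST_Area(ST_Intersection(d.geom, c.geom)) AS overlap_area
FROM census_districts AS d
JOIN communities AS c ON ST_Intersects(d.geom, c.geom)
WHERE c.id = %s;
"""


def area_weighted_mean(rows: list[tuple[float, float]]) -> float:
    """Average district percentages, weighting each by its overlap area."""
    total_area = sum(area for _, area in rows)
    if total_area <= 0:
        raise ValueError("community does not overlap any district")
    return sum(pct * area for pct, area in rows) / total_area


def _tag(name: str) -> str:
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def polygon_to_kml(name: str, ring: list[tuple[float, float]]) -> str:
    """Wrap one polygon ring of (lon, lat) pairs in a minimal KML document."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # KML LinearRings must be closed
    kml = ET.Element(_tag("kml"))
    placemark = ET.SubElement(kml, _tag("Placemark"))
    ET.SubElement(placemark, _tag("name")).text = name
    polygon = ET.SubElement(placemark, _tag("Polygon"))
    outer = ET.SubElement(polygon, _tag("outerBoundaryIs"))
    linear_ring = ET.SubElement(outer, _tag("LinearRing"))
    coords = ET.SubElement(linear_ring, _tag("coordinates"))
    # KML coordinate tuples are "lon,lat" separated by whitespace.
    coords.text = " ".join(f"{lon},{lat}" for lon, lat in ring)
    return ET.tostring(kml, encoding="unicode")
```

For instance, `area_weighted_mean([(10.0, 1.0), (30.0, 3.0)])` gives 25.0. If the geometries are stored as longitude/latitude, `ST_Area` returns square degrees; casting to geography (`ST_Area(...::geography)`) gives square metres. PostGIS can also emit KML directly with `ST_AsKML(geom)`, which may be simpler when the polygons come straight from the database or are pre-rendered into .kml files.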

Amazon EC2

The first thing one must select when setting up an Amazon EC2 instance is the base operating system. Several images of the Ubuntu Linux distribution are available; the most recent stable release appears to be Natty Narwhal (11.04). On top of it we will need to install the following (if not installed by default):

  • Apache: for handling web requests from GAE
  • PHP: probably the best tool for reading the requests, generating the queries, and returning the results
  • PostgreSQL: the database engine
  • PostGIS (and related libraries): adds spatial indexes and functions to PostgreSQL

Images

The Amazon service requires a machine image (AMI). The US East Natty Narwhal image is ami-e2af508b.

Connecting

You must enable SSH access in the security group step of setting up the instance (I missed this on the first few attempts). You will also need to use the SSH key provided in the setup.

AutoScaling

Both Amazon and Google are supposed to support autoscaling, and we want to use it. We are not yet sure how to test it.

On Amazon: we think that we need to load instances from S3. We don't plan to save data on Amazon, so we don't need to worry about Elastic Block Store (EBS).

On Google: we'll set a daily budget and adjust it over time, setting it very high on the first day and lowering it thereafter. One of us should keep an eye on the dashboards for both Google and Amazon.
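The "high on day one, lower thereafter" budget plan can be written down as a simple schedule to agree on in advance. This is a hypothetical planning aid only: the geometric decay, the floor, and the dollar figures are assumptions, and the budget itself would still be entered by hand through Google's dashboard.

```python
def daily_budget(day: int, first_day: float, floor: float, decay: float = 0.5) -> float:
    """Budget for a given day of fielding (day 0 = launch day).

    Starts at `first_day`, shrinks geometrically by `decay` each day,
    and never drops below `floor`.
    """
    if day < 0:
        raise ValueError("day must be non-negative")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    return max(floor, first_day * decay ** day)


# Hypothetical figures: $100 on launch day, never less than $10.
schedule = [daily_budget(d, 100.0, 10.0) for d in range(5)]
print(schedule)  # [100.0, 50.0, 25.0, 12.5, 10.0]
```

The floor keeps the app from running out of quota if traffic stays higher than expected after launch; the decay rate would be adjusted after watching actual usage on the dashboards.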

We do not yet have a domain name but will need one.

Setting up

The amazon-ec2 branch of this repository holds the scripts for setting up the instance. To get them, the instance must be able to connect to the GitHub repository over SSH: add a new public key to your GitHub account, then copy the matching private key to ~/.ssh/ on the instance. Then:

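$ sudo apt-get install git        # install git on the instance
$ ssh-agent bash                  # start a subshell running an SSH agent
$ ssh-add ~/.ssh/KEYNAME          # load the private key whose public half is on GitHub
$ git clone -b amazon-ec2 git@github.com:bowers-illinois-edu/community-maps.git   # clone the setup branch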