FSUCourseScraper

This is the scraper component of Qcumber, a project by students at Queen's University. It scrapes data from FSU's my.fsu.edu, parses it, and generates structured data that a site can then display.

Setup Guide

This guide has been verified for Ubuntu 11.10 and 12.10.
Setup on Mac OS X should be quite similar; it will be verified soon.
It works on Windows, but installation there is left as an exercise for the reader.

Installing the Prerequisites

Make sure you have all the needed permissions to install.
For most users, this means prepending each install command with sudo.
Ex: sudo apt-get install ...

Python and Libraries

This project has been designed to work with Python versions 2.7.x and 3.3.x. You can try other versions, but no promises.

Python 3.3.x is recommended.
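
To confirm which interpreter you are about to use, a short check like the one below can help. It is a minimal sketch, not part of the scraper: the name is_supported and the SUPPORTED set are made up for illustration, and the set only encodes the two version lines named above.

```python
import sys

# (major, minor) pairs this guide says the project was designed for.
SUPPORTED = {(2, 7), (3, 3)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is one the guide names."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    status = "supported" if is_supported() else "untested"
    print("Python %d.%d is %s" % (sys.version_info[0], sys.version_info[1], status))
```

An "untested" result does not mean the scraper will fail, only that the guide makes no promises for that version.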

Install a compatible version of Python. Use a package manager (Ex: apt-get install python3 python3-dev), or get the source from http://www.python.org/download/ if your distribution doesn't have the correct version of Python available.
Make sure to also install the development libraries (packages python3-dev or python2-dev). If you compile from source, these are already included.
Install extra libraries needed for compiling the lxml module:
- Most Debian-based distros: apt-get install libxml2-dev libxslt1-dev
- Red Hat/Fedora: yum install libxml2-devel libxslt-devel
- Arch: pacman -S libxml2 libxslt
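
Once the Python packages have been installed with pip (see "Install Required Packages"), one way to confirm that lxml compiled against these libraries is simply to try importing it. The helper below is a sketch for that purpose; module_available is a hypothetical name, not something the scraper provides.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported in this environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Covers both a missing package and a build that failed to link.
        return False
    return True


if __name__ == "__main__":
    print("lxml importable: %s" % module_available("lxml"))
```

If this reports False after pip has run, the libxml2 and libxslt development packages are the first thing to recheck.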

Git and a Github account

Go to https://github.com/ and follow the instructions to register an account.
Run apt-get install git to install Git.
Follow the guide at https://help.github.com/articles/set-up-git to set up Git.

Pip and a Virtual Environment

Pip is used to install extra Python modules that aren't included by default. A virtual environment is an isolated Python environment. It allows for per-program environment configuration.

Install Pip by running apt-get install python3-pip (or python-pip for 2.7.x users)
Once Pip is installed, run pip install virtualenv
The virtual environment will be configured later.

Fork the Repository

Click the "Fork" button at the top-right of https://github.com/Queens-Hacks/qcumber-scraper
You now have your own copy of qcumber-scraper that you can safely mess around with!

Clone it to your computer

Copy the git@github.com:[yourusername]/qcumber-scraper.git link on the page.
Open up a terminal window.
Navigate to the folder in which you want to store your local copy of the scraper.
Clone the repository: git clone [repository], where [repository] is the URL you copied.
You should now have a qcumber-scraper folder.

Create and Activate a Virtual Environment

Navigate into the folder you just cloned.
Create a new virtual environment: virtualenv venv
If you have multiple versions of Python on your system, make sure to specify the correct one with a -p switch (Ex: virtualenv -p /usr/bin/python3 venv)
Activate the new environment: source venv/bin/activate
NOTE: you will need to activate the virtual environment every time you want to run the local project.
To deactivate the virtual environment: deactivate
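
Because forgetting to activate the environment is an easy mistake, it can be useful to check from Python whether one is active. The sketch below relies on the attributes virtualenv and venv are known to set on the sys module; the function name in_virtualenv is an assumption for illustration and is not part of the scraper.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active: " + sys.prefix)
    else:
        print("No virtual environment active; run: source venv/bin/activate")
```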

Install Required Packages

Make sure you have activated your virtual environment (see above) before running this command!

pip install -r requirements.txt
If this command reports an error, check the log to see if you have all the dependencies required.

Running a Scrape

Make sure your virtual environment is activated.
Make sure you have created a config.py.
To do a my.fsu.edu web scrape, run python main.py
To do a textbook scrape, run python textbooks.py
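
Since a scrape depends on config.py being present, a small pre-flight check can save a failed run. This is a sketch only: missing_files and REQUIRED_FILES are hypothetical names, and the only details taken from this guide are the file names config.py and main.py. The repository also ships a sample_config.py, which is presumably the intended starting point for your own config.py.

```python
import os

# Files this guide says are needed to start a my.fsu.edu scrape.
REQUIRED_FILES = ("config.py", "main.py")


def missing_files(project_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in project_dir."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(project_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_files(os.getcwd())
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Ready to run: python main.py")
```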

Better Logging

For better logging and easier debugging later, redirect the output to log files, for example: python main.py >logs/debug.log 2>logs/error.log
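
The redirect above is only useful if routine messages go to standard output and problems go to standard error. How the scraper itself splits its output is not described here, so the following is an illustrative sketch using Python's standard logging module; build_logger and MaxLevelFilter are hypothetical names.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records at or below a given level."""

    def __init__(self, max_level):
        logging.Filter.__init__(self)
        self.max_level = max_level

    def filter(self, record):
        return record.levelno <= self.max_level


def build_logger(name="scrape_demo", out=None, err=None):
    """Send DEBUG/INFO to `out` (stdout) and WARNING and above to `err` (stderr)."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers = []      # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records away from the root logger

    out_handler = logging.StreamHandler(out)
    out_handler.setLevel(logging.DEBUG)
    out_handler.addFilter(MaxLevelFilter(logging.INFO))

    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.WARNING)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.debug("ends up in logs/debug.log when stdout is redirected")
    log.error("ends up in logs/error.log when stderr is redirected")
```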

To watch the logs as they happen, first open two other terminals and run tail -f logs/debug.log in one and tail -f logs/error.log in the other (tailf also works where it is still installed). Then start the main scrape command as above.
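
Where no tail command is available (for example on Windows), the same effect can be approximated in Python by polling the log file for newly appended text. This is a rough sketch under that assumption; read_new_lines and follow are hypothetical helpers and not part of the scraper.

```python
import time


def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for content appended to path since offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    lines = data.decode("utf-8", "replace").splitlines()
    return lines, offset + len(data)


def follow(path, interval=1.0):
    """Print lines as they are appended to path. Runs until interrupted."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(interval)
```

Calling follow("logs/debug.log") in one terminal and follow("logs/error.log") in another mirrors the two-terminal setup described above. A line that is only partly written when the file is polled will be printed in two pieces.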
