Tutorial to run OPSD scripts
- Install Anaconda (choose Python 3.x). Anaconda is a standard Python distribution that includes the packages required for scientific work with Python. It also includes Jupyter Notebook (formerly known as IPython Notebook), which OPSD uses to run and document the scripts.
- This Beginners Guide is a brief step-by-step tutorial on installing and running Jupyter (IPython) notebooks for new users who have no familiarity with Python.
- On the OPSD site on GitHub, choose the data package you are interested in and click on it.
- You then see all files contained in the data package. Click the green button at the top right ("Clone or download") to download all files to your computer.
- You can download the package as a ZIP file. You then have only the latest version on your computer.
You can also use Git, which lets you retrieve older versions of data packages and is recommended if you want to retrieve updated versions easily.
- Install the version control software Git
- Register a free account on GitHub, which hosts collaborative open source projects
- Generate an SSH key for authentication on GitHub.
- Log into GitHub with your account and add the SSH key to your profile
- Clone the repository from GitHub by typing in your terminal
-
> cd <Folder where you want to create it>
(go into the folder where the repository should be created)
-
> git clone <name repository>
(insert the SSH address shown when you click the green "Clone or download" button on GitHub, e.g. git@github.com:Open-Power-System-Data/datapackage_renewable_power_plants.git)
-
> cd <name repository>
(go into the repository you have just cloned)
-
- The most important git commands are summarized in this git cheat sheet (PDF)
-
Optional: Create a virtual environment. A Virtual Environment is a tool to keep the dependencies required by different projects in separate places by creating virtual Python environments for them.
-
> conda create -n OPSD python=3.5 jupyter pandas
(creates the virtual environment OPSD with Python version 3.5 and the packages jupyter and pandas)
- It can be activated by
> source activate OPSD
- and closed by
> source deactivate OPSD
-
-
Each Data Package has a file called requirements.yml, which lists all packages (and their versions) required by the scripts in the respective Data Package. To install them, run the following in your terminal (this assumes requirements.yml is a conda environment file)
> conda env create -f requirements.yml
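To confirm that the installation worked, a small check like the following can be run in the Python interpreter of the environment. This is only a sketch: the function name check_packages is made up for illustration, and the package names passed to it are taken from the conda command earlier in this tutorial (import names can differ from conda package names, so adjust the list to what requirements.yml actually contains).

```python
import importlib


def check_packages(names):
    """Try to import each package and report its version.

    Returns a dict mapping each name to its version string, to "unknown"
    if the package imports but exposes no __version__, or to None if it
    cannot be imported at all.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Assumed package list; replace with the packages your Data Package needs.
for name, version in check_packages(["pandas", "jupyter"]).items():
    print(name, "MISSING" if version is None else version)
```

A package reported as MISSING usually means that the environment is not activated or that the installation step did not complete.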
There are two possible ways to start Jupyter Notebook:
- From the terminal: on Windows, search for "cmd" and open "cmd.exe"; Linux users probably know where to find the terminal
- Type
> jupyter notebook
and press Enter. (If it does not work, try
> ipython notebook
)
- From the start menu: go to the Anaconda folder in the start menu and click on the IPython Notebook link.
Whichever way you choose, the following should happen:
- A new window in your web browser opens and shows the Notebook Dashboard. Firefox works best, Internet Explorer does not work (well).
- Now you can choose the first notebook of the Data Package
- It is recommended to start with the main.ipynb notebook because it explains the structure and aim of the scripts in the respective Data Package.
If this explanation for Jupyter Notebooks was too short, maybe this Jupyter Notebook beginner guide is helpful.
- Start with main.ipynb, which can be found in each Data Package and explains the Data Package and scripts
- Lengthy routine code (e.g. download routines) is sometimes moved into separate Python scripts, which are also included in the data package on your computer. The notebooks run them automatically where necessary. Code that works directly with the original data, such as verification, processing and corrections, is in the notebooks.
- Generally, the scripts start with downloading and continue with reading and processing; some also include graphs to illustrate the data.
- Downloaded files are put in the input/original_data folder on your computer
- If you run the scripts again, these already downloaded files are reused instead of being downloaded again
- Output files are stored in the folder output, which is created in the Data Package folder on your computer once you run the script
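The reuse of already downloaded files can be pictured with a minimal sketch like the one below. It is not the code used in the OPSD notebooks: the function name download_if_missing and its downloader parameter are assumptions made for illustration, and only the folder input/original_data comes from the description above.

```python
import os
import urllib.request


def download_if_missing(url,
                        folder=os.path.join("input", "original_data"),
                        downloader=urllib.request.urlretrieve):
    """Return the local path for url, downloading only if the file is absent."""
    # Create the target folder on first use.
    os.makedirs(folder, exist_ok=True)
    # Use the last part of the URL as the local file name.
    filename = url.rstrip("/").split("/")[-1]
    local_path = os.path.join(folder, filename)
    # Skip the download when a previous run already stored the file.
    if not os.path.exists(local_path):
        downloader(url, local_path)
    return local_path
```

Calling the function twice with the same URL triggers only one download; the second call simply returns the path of the existing file. Deleting a file from input/original_data forces a fresh download on the next run.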
- In the browser window of the notebook you want to close, click File -> Close and halt
- Alternatively, on the main Jupyter Notebook browser window (the dashboard), click on the tab Running and shut down all notebooks you want to close
- Closing only the browser tab or window does not shut down the kernel
- Creating a new notebook fails with "Creating Notebook Failed [...] Errno 13": the link to open IPython Notebook (see the Anaconda folder in the start menu) has to be copied to the folder that contains the notebook files (.ipynb).
- Versions of the Data Packages are organized by tags, which are named with the date of the version. We use version dates rather than version names
- On the respective Data Package page on GitHub, click on branches; there you can choose tags
- There you have the choice between different dates
- The original_data on which the version of the script is based is stored in a folder with the same date. This can be accessed via view_original_data on the Data Package site on the OPSD page
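Because the tags are dates, picking a version comes down to comparing dates. The sketch below shows one way to select the newest tag, or the newest tag up to a given day. The tag format YYYY-MM-DD, the example tag names and the function names are assumptions for illustration only; check the actual tags on the Data Package page.

```python
from datetime import date, datetime


def parse_tag(tag):
    """Return the date encoded in a tag like '2017-02-16', or None."""
    try:
        return datetime.strptime(tag, "%Y-%m-%d").date()
    except ValueError:
        return None


def latest_version(tags, on_or_before=None):
    """Pick the newest date-named tag, optionally not later than a given date."""
    dated = [(parse_tag(t), t) for t in tags]
    # Ignore tags that are not dates.
    dated = [(d, t) for d, t in dated if d is not None]
    if on_or_before is not None:
        dated = [(d, t) for d, t in dated if d <= on_or_before]
    return max(dated)[1] if dated else None


# Hypothetical tag names, not taken from a real Data Package.
tags = ["2016-10-21", "2017-02-16", "latest", "2016-03-08"]
print(latest_version(tags))                                   # 2017-02-16
print(latest_version(tags, on_or_before=date(2016, 12, 31)))  # 2016-10-21
```

In a cloned repository, the chosen version can then be checked out with git checkout followed by the tag name.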