|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": { |
| 6 | + "deletable": true, |
| 7 | + "editable": true |
| 8 | + }, |
| 9 | + "source": [ |
| 10 | + "# HDF5 and pandas" |
| 11 | + ] |
| 12 | + }, |
| 13 | + { |
| 14 | + "cell_type": "markdown", |
| 15 | + "metadata": { |
| 16 | + "deletable": true, |
| 17 | + "editable": true |
| 18 | + }, |
| 19 | + "source": [ |
| 20 | + "HDF5 is both a data container and a library designed to store and retrieve large amounts of data conveniently. It is used extensively in science, engineering, finance and many other fields. Two major Python packages wrap HDF5:\n", |
| 21 | + "\n", |
| 22 | + "1. h5py\n", |
| 23 | + "2. PyTables\n", |
| 24 | + "\n", |
| 25 | + "pandas also uses one of them (PyTables) to store and retrieve dataframes efficiently.\n", |
| 26 | + "\n", |
| 27 | + "In this tutorial you will learn how to create and read HDF5 datasets using both h5py and PyTables. You will also be introduced to data chunking and how it can be used to compress data efficiently, followed by a gentle description of the querying capabilities of PyTables. Finally, we will see how HDF5 and pandas interact, not only to serialize pandas dataframes, but also to query them efficiently on disk (i.e. with no need to load the data into memory)." |
| 28 | + ] |
| 29 | + }, |
| 30 | + { |
| 31 | + "cell_type": "markdown", |
| 32 | + "metadata": { |
| 33 | + "deletable": true, |
| 34 | + "editable": true |
| 35 | + }, |
| 36 | + "source": [ |
| 37 | + "## Caveats for following the tutorial:\n", |
| 38 | + "\n", |
| 39 | + "1. These notebooks were created and tested mainly on Jupyter notebook 4.4 and Python 3.6. Python 2.7 should work equally well, except for a few seldom-used features.\n", |
| 40 | + "\n", |
| 41 | + "2. You can follow the tutorial by running the [provided notebooks](https://github.com/FrancescAlted/PyData-BCN/releases). If you have problems with the Wi-Fi, USB drives are available.\n", |
| 42 | + "\n", |
| 43 | + "3. **In case** you cannot reproduce the expected results on your own laptop, do not worry too much; my advice is to concentrate on the tutor's explanations and ask whenever something is not clear enough." |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": { |
| 49 | + "deletable": true, |
| 50 | + "editable": true |
| 51 | + }, |
| 52 | + "source": [ |
| 53 | + "## Requirements\n", |
| 54 | + "\n", |
| 55 | + "* Jupyter notebook\n", |
| 56 | + "* numpy\n", |
| 57 | + "* h5py\n", |
| 58 | + "* tables (pytables)\n", |
| 59 | + "* pandas\n", |
| 60 | + "* matplotlib\n", |
| 61 | + "* cartopy\n", |
| 62 | + "\n", |
| 63 | + "All of these should be available in Anaconda or on PyPI. The only exception might be `cartopy`, which may not exist in the default conda channel; to install it, try the `conda-forge` channel instead:\n", |
| 64 | + "\n", |
| 65 | + "```\n", |
| 66 | + "$ conda install -c conda-forge cartopy\n", |
| 67 | + "```" |
| 68 | + ] |
| 69 | + }, |
| 70 | + { |
| 71 | + "cell_type": "markdown", |
| 72 | + "metadata": { |
| 73 | + "deletable": true, |
| 74 | + "editable": true |
| 75 | + }, |
| 76 | + "source": [ |
| 77 | + "## Contents" |
| 78 | + ] |
| 79 | + }, |
| 80 | + { |
| 81 | + "cell_type": "markdown", |
| 82 | + "metadata": { |
| 83 | + "deletable": true, |
| 84 | + "editable": true |
| 85 | + }, |
| 86 | + "source": [ |
| 87 | + "\n", |
| 88 | + "1. [Basic Datatypes](1-Basic-Datatypes.ipynb)\n", |
| 89 | + "\n", |
| 90 | + "1. [Chunking](2-Chunking.ipynb)\n", |
| 91 | + "\n", |
| 92 | + "1. [Using Compression](3-Using-Compression.ipynb)\n", |
| 93 | + "\n", |
| 94 | + "1. [Structuring Datasets](4-Structuring-Datasets.ipynb)\n", |
| 95 | + "\n", |
| 96 | + "1. [Querying with PyTables](5-Querying-With-PyTables.ipynb)\n", |
| 97 | + "\n", |
| 98 | + "1. [Integration with pandas](6-Integration-With-Pandas.ipynb)" |
| 99 | + ] |
| 100 | + }, |
| 101 | + { |
| 102 | + "cell_type": "code", |
| 103 | + "execution_count": null, |
| 104 | + "metadata": { |
| 105 | + "collapsed": true, |
| 106 | + "deletable": true, |
| 107 | + "editable": true |
| 108 | + }, |
| 109 | + "outputs": [], |
| 110 | + "source": [] |
| 111 | + } |
| 112 | + ], |
| 113 | + "metadata": { |
| 114 | + "kernelspec": { |
| 115 | + "display_name": "Python 3", |
| 116 | + "language": "python", |
| 117 | + "name": "python3" |
| 118 | + }, |
| 119 | + "language_info": { |
| 120 | + "codemirror_mode": { |
| 121 | + "name": "ipython", |
| 122 | + "version": 3 |
| 123 | + }, |
| 124 | + "file_extension": ".py", |
| 125 | + "mimetype": "text/x-python", |
| 126 | + "name": "python", |
| 127 | + "nbconvert_exporter": "python", |
| 128 | + "pygments_lexer": "ipython3", |
| 129 | + "version": "3.6.1" |
| 130 | + } |
| 131 | + }, |
| 132 | + "nbformat": 4, |
| 133 | + "nbformat_minor": 0 |
| 134 | +} |