
Commit

Refactor install docs [dc install, notebooks, ui] and ingestion docs with John's revisions.
Otto Wagner committed Jul 9, 2018
1 parent d65dfd1 commit 56eeab4
Showing 4 changed files with 125 additions and 98 deletions.
25 changes: 15 additions & 10 deletions docs/datacube_install.md
@@ -119,18 +119,17 @@ pip install netcdf4
```

Please note that the installed gdal version should be as close to your system gdal version as possible.
At the time of this writing, the `gdalinfo` command below outputs 1.11.3, which means that version 1.11.2 is the closest version that satisfies our requirements.
We try to install a non-existent version (99999999999) so that pip prints all available versions.

```
gdalinfo --version
pip install gdal==99999999999
```

At the time this is being written, the above command outputs 1.11.3, which means that version 1.11.2 is the closest version that satisfies our requirements.
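For example, if `gdalinfo --version` reports 1.11.3 and the failed install above lists 1.11.2 as the nearest available release, you would pin that version (the exact number will differ on your system):

```
pip install gdal==1.11.2
```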

Now that all requirements have been satisfied, run the setup.py script in the agdc-v2 directory:

**It has come to our attention that the setup.py script fails the first time it is run due to some NetCDF/Cython issues. Run the script a second time to install if this occurs.**
**It has come to our attention that the setup.py script can fail the first time it is run due to some NetCDF/Cython issues. Run the script a second time to install if this occurs.**
```
cd ~/Datacube/agdc-v2
python setup.py develop
@@ -156,7 +155,7 @@ Open this file in your editor of choice and find the line that starts with 'time
timezone = 'UTC'
```

This will ensure that all of the datetime fields in the database are stored in UTC. Next, open the pg_hba.conf file found at:
This will ensure that all of the datetime fields in the database are stored in UTC. Next, open the `pg_hba.conf` file found at:

```
/etc/postgresql/9.5/main/pg_hba.conf
@@ -184,7 +183,7 @@ sudo service postgresql restart

Data Cube Configuration file
---------------
The Data Cube requires a configuration file that points to the correct database and provides credentials. The file's contents looks like below should be named '.datacube.conf':
The Data Cube requires a configuration file that points to the correct database and provides credentials. The contents of the `.datacube.conf` file should appear as follows:

```
[datacube]
@@ -208,9 +207,9 @@ gedit ~/Datacube/data_cube_ui/config/.datacube.conf
cp ~/Datacube/data_cube_ui/config/.datacube.conf ~/.datacube.conf
```

This will move the required .datacube.conf file to the home directory. The user's home directory is the default location for the configuration file and will be used for all command line based Data Cube operations. The next step is to create the database specified in the configuration file.
This will copy the required `.datacube.conf` file to the home directory. The user's home directory is the default location for the configuration file and will be used for all command-line-based Data Cube operations. The next step is to create the database specified in the configuration file.

To create the database use the following:
To create the database, run the following commands:

```
sudo -u postgres createuser --superuser dc_user
@@ -244,9 +243,15 @@ Done.

If you have PGAdmin3 installed, you can view the default schemas and relationships by connecting to the database named 'datacube' and viewing the tables, views, and indexes in the schema 'agdc'.

Alternatively, you can do the same from the command line. First, log in with the command `psql -U dc_user datacube`.
To view schemas, run `\dn` at the `psql` prompt.
View the full documentation of the `psql` command [here](https://www.postgresql.org/docs/9.5/static/app-psql.html).
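As a quick sketch of such a session (once connected, the prompt changes to `datacube=#`; `\dt agdc.*` lists the tables in the 'agdc' schema and `\q` exits):

```
psql -U dc_user datacube
\dn
\dt agdc.*
\q
```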

<a name="next_steps"></a> Next Steps
========
Now that the Data Cube system is installed and initialized, the next step is to ingest some sample data. Our focus is on ARD (Analysis Ready Data) - the best introduction to the ingestion/indexing process is to use a single Landsat 7 or Landsat 8 SR product. Download a sample dataset from [Earth Explorer](https://earthexplorer.usgs.gov/) and proceed to the next document in this series, [The ingestion process](ingestion.md). Please ensure that the dataset you download is an SR product - the L\*.tar.gz should contain .tif files with the file pattern `L**_sr_band*.tif` This will correspond to datasets labeled "Collection 1 Higher-Level".
Now that the Data Cube system is installed and initialized, the next step is to ingest some sample data. Our focus is on ARD (Analysis Ready Data) - the best introduction to the ingestion/indexing process is to use a single Landsat 7 or Landsat 8 SR product.
There is a sample ingestion file provided in [the ingestion documentation](ingestion.md) in the "Prerequisites" section.
More generally, download a sample dataset from [Earth Explorer](https://earthexplorer.usgs.gov/) and proceed to the next document in this series, [the ingestion process](ingestion.md). Please ensure that the dataset you download is an SR product - the L\*.tar.gz should contain .tif files with the file pattern `L**_sr_band*.tif`. This corresponds to datasets labeled "Collection 1 Higher-Level".
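One way to confirm that a downloaded archive is an SR product is to list its contents and check for the expected band files (the archive name below is only an example):

```
tar -tzf LE07*.tar.gz | grep sr_band
```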


<a name="faqs"></a> Common problems/FAQs
@@ -281,7 +286,7 @@ Q:
>Can the Data Cube be accessed from R/C++/IDL/etc.?
A:
>This is not currently directly supported, the Data Cube is a Python based API. The base technology managing data access PostgreSQL, so theoretically the functionality can be ported to any language that can interact with the database. An additional option is just shelling out from those languages, accessing data using the Python API, then passing the result back to the other program/language.
>This is not currently directly supported. The Data Cube is a Python-based API. The technology managing data access is PostgreSQL, so theoretically the functionality can be ported to any language that can interact with the database. An additional option is just shelling out from those languages, accessing data using the Python API, then passing the result back to the other program/language.
---

@@ -297,7 +302,7 @@ Q:
>I want to store more metadata that isn't mentioned in the documentation. Is this possible?
A:
>This entire process is completely customizable. Users can configure exactly what metadata they want to capture for each dataset - we use the default for simplicities sake.
>This entire process is completely customizable. Users can configure exactly what metadata they want to capture for each dataset - we use the default for simplicity's sake.
---

3 changes: 2 additions & 1 deletion docs/ingestion.md
@@ -41,7 +41,7 @@ To index and ingest data into the Data Cube, the following prerequisites must be

Note that the ingestion file hyperlinked above as "our AWS site" can be downloaded with the command:<br>
```
wget http://ec2-52-201-154-0.compute-1.amazonaws.com/datacube/data/LE071950542015121201T1-SC20170427222707.tar.gz
wget -P /datacube/original_data http://ec2-52-201-154-0.compute-1.amazonaws.com/datacube/data/LE071950542015121201T1-SC20170427222707.tar.gz
```
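The prerequisites also expect the archive to be uncompressed; a minimal sketch, assuming the scene was downloaded into `/datacube/original_data`, is:

```
cd /datacube/original_data
tar -xzf LE071950542015121201T1-SC20170427222707.tar.gz
```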

If you have not yet completed our Data Cube Installation Guide, please do so before continuing.
@@ -720,6 +720,7 @@ Q:
A:
> If your dataset is already in an optimized format and you don't desire any projection or resampling changes, then you can simply index the data and then begin to use the Data Cube.
You will have to specify CRS when loading indexed data, since the ingestion process - which informs the Data Cube about the metadata - has not occurred.
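As a sketch of what loading indexed-only data looks like with the Python API (the product name, extents, and resolution below are illustrative, not values from this guide):

```
import datacube

dc = datacube.Datacube(config='~/.datacube.conf')

# Ingested products carry a target CRS and resolution in their definitions;
# indexed-only data does not, so both must be supplied to load().
data = dc.load(
    product='ls7_collections_sr_scene',  # hypothetical product name
    latitude=(12.0, 12.5),
    longitude=(54.0, 54.5),
    output_crs='EPSG:4326',
    resolution=(-0.000269, 0.000269),
)
print(data)
```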

---

59 changes: 33 additions & 26 deletions docs/notebook_install.md
@@ -23,14 +23,18 @@ Jupyter notebooks are extremely useful as a learning tool and as an introductory

To run our Jupyter notebook examples, the following prerequisites must be met:

* The full Data Cube Installation Guide must have been followed and completed. This includes:
* You have a local user that is used to run the Data Cube commands/applications
* You have a database user that is used to connect to your 'datacube' database
* The Data Cube is installed and you have successfully run 'datacube system init'
* All code is checked out and you have a virtual environment in the correct directories: `~/Datacube/{data_cube_ui, data_cube_notebooks, datacube_env, agdc-v2}`
* The full Ingestion guide must have been followed and completed. This includes:
* A sample Landsat 7 scene was downloaded and uncompressed in your `/datacube/original_data` directory
* The ingestion process was completed for that sample Landsat 7 scene
The full Data Cube Installation Guide must have been followed and completed before proceeding. This includes:
* You have a local user that is used to run the Data Cube commands/applications
* You have a database user that is used to connect to your 'datacube' database
* The Data Cube is installed and you have successfully run 'datacube system init'
* All code is checked out and you have a virtual environment in the correct directories: `~/Datacube/{data_cube_ui, data_cube_notebooks, datacube_env, agdc-v2}`

If these requirements are not met, please see the associated documentation.

You can view the notebooks without ingesting any data, but to run the notebooks against the sample ingested data,
the ingestion guide must have been followed and completed (a quick verification is sketched after this list). The steps include:
* A sample Landsat 7 scene was downloaded and uncompressed in your `/datacube/original_data` directory
* The ingestion process was completed for that sample Landsat 7 scene
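To confirm that the sample data is visible to the Data Cube before opening any notebooks, one quick check (a sketch; run inside the virtual environment, and note that the exact product name depends on your ingestion configuration) is to list the known products:

```
source ~/Datacube/datacube_env/bin/activate
python -c "import datacube; print(datacube.Datacube().list_products())"
```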

<a name="installation_process"></a> Installation Process
========
@@ -44,36 +48,32 @@ source ~/Datacube/datacube_env/bin/activate
Now install the following Python packages:

```
pip install jupyter
pip install matplotlib
pip install scipy
pip install sklearn
pip install lcmap-pyccd
pip install folium
pip install jupyter matplotlib scipy sklearn lcmap-pyccd folium
```
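A quick import check (a sketch) can confirm that the main scientific packages installed into the virtual environment correctly:

```
python -c "import matplotlib, scipy, sklearn, folium; print('notebook dependencies OK')"
```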

<a name="configuration"></a> Configuration
========

The first step is to generate a notebook configuration file. Run the following commands:

The first step is to generate a notebook configuration file.
Ensure that you're in the virtual environment. If not, activate with `source ~/Datacube/datacube_env/bin/activate`.
Then run the following commands:
```
#ensure that you're in the virtual environment. If not, activate with 'source ~/Datacube/datacube_env/bin/activate'
cd ~/Datacube/data_cube_notebooks
jupyter notebook --generate-config
jupyter nbextension enable --py --sys-prefix widgetsnbextension
```

Jupyter will create a configuration file in `~/.jupyter/jupyter_notebook_config.py`. Now set the password and edit the server details:
Jupyter will create a configuration file in `~/.jupyter/jupyter_notebook_config.py`.
Now set the password and edit the server details. Remember this password for future reference.

```
#enter a password - remember this for future reference.
jupyter notebook password
gedit ~/.jupyter/jupyter_notebook_config.py
```

Edit the generated configuration file to include relevant details - You'll need to find the relevant entries in the file:
Now edit the Jupyter notebook configuration file `~/.jupyter/jupyter_notebook_config.py` with your favorite text editor.

Edit the generated configuration file to include relevant details.
You'll need to set the relevant entries in the file:

```
c.NotebookApp.ip = '*'
@@ -90,16 +90,19 @@ cd ~/Datacube/data_cube_notebooks
jupyter notebook
```

Open a web browser and go to localhost:8888 if you're on the server, or use 'ifconfig' to list your ip address and go to {ip}:8888. You should be greeted with a password field - enter the password from the previous step.
Open a web browser and navigate to the notebook URL. If you are running your browser from the same machine that is
hosting the notebooks, you can use `localhost:{jupyter_port_num}` as the URL, where `jupyter_port_num` is the port number set for `c.NotebookApp.port` in the configuration file.
If you are connecting from another machine, you will need to enter the public IP address of the server in the URL (which can be determined by running the `ifconfig` command on the server) in place of `localhost`.
You should be greeted with a password field. Enter the password from the previous step.

<a name="using_notebooks"></a> Using the Notebooks
========

Now that your notebook server is running and the Data Cube is set up, you can run any of our examples.

Open the notebook titled 'Data_Cube_API_Demo' and run through all of the cells using either the button on the toolbar or CTRL+Enter.
Open the notebook titled 'Data_Cube_Test' and run through all of the cells using either the "Run" button on the toolbar or `Shift+Enter`.

You'll see that a connection to the Data Cube is established, some metadata is listed, and some data is loaded and plotted. Further down the page, you'll see that we are also demonstrating our API that includes getting acquisition dates, scene metadata, and data.
You'll see that a connection to the Data Cube is established, some metadata is queried, and some data is loaded and plotted.

<a name="next_steps"></a> Next Steps
========
Expand All @@ -114,6 +117,10 @@ Q:
>I’m having trouble connecting to my notebook server from another computer.
A:
> There can be a variety of problems that can cause this issue. Check your notebook configuration file, your network settings, and your firewall settings.
> A variety of problems can cause this issue.<br><br>
First check the IP and port number in your notebook configuration file.
Be sure you are connecting to `localhost:<port>` if your browser is running on the same
machine as the Jupyter server, and `<IP>:<port>` otherwise.
Also check that your firewall is not blocking the port that it is running on.
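For example, on Ubuntu with `ufw` enabled (an assumption - your server may use a different firewall), you could check the firewall state and open the notebook port, where 8888 is only the default, like this:

```
sudo ufw status
sudo ufw allow 8888/tcp
```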

---