Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
186 changes: 96 additions & 90 deletions examples/r-load-data/r-load-data-kaggle.Rmd
Original file line number Diff line number Diff line change
@@ -1,90 +1,96 @@
---
title: "Load Data from Kaggle"
description: "Load Kaggle datasets into Saturn Cloud"
output: html_notebook
---

![Kaggle Logo](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-logo.png)

## Overview

[Kaggle](https://www.kaggle.com/), in addition to its competitions and other offerings, has an expansive offering of curated and community submitted datasets. The datasets span numerous domains, sizes, and file types. This tutorial will give you the foundational information to load data from Kaggle directly into Saturn Cloud, quickly and easily!

Before starting this, you should create a RStudio server resource. See our [quickstart](https://saturncloud.io/docs/start_in_ten/) if you don't know how to do this yet.

## Process
### Create Kaggle Credentials

The first step for accessing data from Kaggle is to create an API token.

Access the account page of your Kaggle account by signing in and clicking on your username and picture in the top right. Click on the **Account** tab:

![Kaggle account menu with arrow pointing to the Account tab](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-account-header-arrow.jpeg "doc-image")

Then scroll down to the API section, and click **Create New API Token**:

![Kaggle account page with arrow pointing to Create New API Token](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-create-token-arrow.jpeg "doc-image")

This will download a file named "kaggle.json." This file contains your username and API key. Save it in a safe place!

Open the "kaggle.json" file in your favorite text editor and you will see your Kaggle username and key.

### Add Kaggle Credentials to Saturn Cloud
Sign in to your Saturn Cloud account and select **Credentials** from the menu on the left.

<img src="https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/saturn-credentials-arrow.jpeg" style="width:200px;" alt="Saturn Cloud left menu with arrow pointing to Credentials tab" class="doc-image">

This is where you will add your Kaggle API key information. *This is a secure storage location, and it will not be available to the public or other users without your consent.*

At the top right corner of this page, you will find the **New** button. Click here, and you will be taken to the Credentials Creation form.

![Screenshot of Saturn Cloud Create Credentials form](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/credentials.jpg "doc-image")

You will be adding two credentials items: your Kaggle username and API key. Complete the form one time for each item.

| Credential | Type | Name| Variable Name |
|---|---|---|---|
| Kaggle Username | Environment Variable | `kaggle-username` | `KAGGLE_USERNAME`
| Kaggle API Key | Environment Variable | `kaggle-api-key` | `KAGGLE_KEY`

Copy the values from your "kaggle.json" file into the *Value* section of the credential creation form. The credential names are recommendations; feel free to change them as needed for your workflow. You must, however, use the provided *Variable Names* for Kaggle to connect correctly.

With this complete, your Kaggle credentials will be accessible by Saturn Cloud resources! You will need to restart any Jupyter Server or Dask Clusters for the credentials to populate to those resources.

### Setting Up Your Resource
Kaggle is not installed by default in Saturn images, so you will need to install it onto your resource. This is already done in this example recipe, but if you are using a custom resource you will need to `pip install kaggle`. Check out our page on [installing packages](https://saturncloud.io/docs/using-saturn-cloud/install-packages/) to see the various methods for achieving this!

### Download a Dataset
Now that you have set up the credentials for Kaggle and installed kaggle, downloading Kaggle data is really straightforward!

In Kaggle, find the dataset you want to download.

On the dataset page, click on the three dots to the right and select **Copy API Command**.

![Kaggle dataset page with arrow pointing to Copy API command](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-dataset-arrow.jpeg "doc-image")

Now, in Saturn Cloud, open the terminal, then paste the API command. For example:


```{bash}
kaggle datasets download -d deepcontractor/swarm-behaviour-classification
```

That's it! Your dataset will download to your current path, and you will be able to use it for calculations!

### Download a Competition Dataset
Downloading a competition dataset is similarly straightforward, but it is a slightly different process.

In Kaggle, find the competition you want to download the dataset for.

Click on **Data** in the top menu and then copy the command displayed.

![Kaggle competition dataset page with arrows pointing to the Data tab and the API command](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-competition-dataset-arrow.jpeg "doc-image")

Now, in Saturn Cloud, open the terminal, then paste the API command. For example:

```{bash}
kaggle competitions download -c titanic
```

That's it! Your dataset will download to your current path, and you will be able to use it for calculations!
---
title: "Load Data from Kaggle"
description: "Load Kaggle datasets into Saturn Cloud"
output: html_notebook
---

![Kaggle Logo](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-logo.png)

## Overview

[Kaggle](https://www.kaggle.com/), in addition to its competitions and other offerings, has an expansive offering of curated and community submitted datasets. The datasets span numerous domains, sizes, and file types. This tutorial will give you the foundational information to load data from Kaggle directly into Saturn Cloud, quickly and easily!

Before starting this, you should create a RStudio server resource. See our [quickstart](https://saturncloud.io/docs/start_in_ten/) if you don't know how to do this yet.

## Process
### Create Kaggle Credentials

The first step for accessing data from Kaggle is to create an API token.

Access the account page of your Kaggle account by signing in and clicking on your username and picture in the top right. Click on the **Account** tab:

![Kaggle account menu with arrow pointing to the Account tab](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-account-header-arrow.jpeg "doc-image")

Then scroll down to the API section, and click **Create New API Token**:

![Kaggle account page with arrow pointing to Create New API Token](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-create-token-arrow.jpeg "doc-image")

This will download a file named "kaggle.json." This file contains your username and API key. Save it in a safe place!

Open the "kaggle.json" file in your favorite text editor and you will see your Kaggle username and key.

### Add Kaggle Credentials to Saturn Cloud
Sign in to your Saturn Cloud account and select **Secrets** from the menu on the left.

<img src="https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/saturn-secrets-outline.png" style="width:200px;" alt="Saturn Cloud left menu with arrow pointing to Credentials tab" class="doc-image">

This is where you will add your Kaggle API key information. *This is a secure storage location, and it will not be available to the public or other users without your consent.*

At the top right corner of this page, you will find the **New** button. Click here, and you will be taken to the Secrets Creation form.

![Screenshot of Saturn Cloud Create Credentials form](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/create-new-secret.png "doc-image")

You will be adding two credentials items: your Kaggle username and API key. Complete the form one time for each item.

| Credential | Name |
|---|---|
| Kaggle Username | `kaggle-username`
| Kaggle API Key | `kaggle-key`

Copy the values from your "kaggle.json" file into the *Value* section of the secret creation form. You must use the provided variable names above for Kaggle to connect correctly.

With this complete, your Kaggle credentials will be accessible by Saturn Cloud resources!

### Setting Up Your Resource
First, you will need to attach your Kaggle credentials to your resource. On the resource's page, navigate to the *Secrets* tab, and then click on the *Attach Secret Environment Variable* button.

[Screenshot of Saturn Cloud Attach Secret Environment Variable button](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/attach-secret-variable-arrow.png "doc-image")

Next, add the `kaggle-username` and `kaggle-key` variables one at a time by selecting them from the dropdown list. The correct *Environment Variable Name* field will be generated by default. Once your two secrets are attached, you're ready to connect to Kaggle from your resource!

Kaggle is not installed by default in Saturn images, so you will need to install it onto your resource. This is already done in this example recipe, but if you are using a custom resource you will need to `pip install kaggle`. Check out our page on [installing packages](https://saturncloud.io/docs/using-saturn-cloud/install-packages/) to see the various methods for achieving this!

### Download a Dataset
Now that you have set up the credentials for Kaggle and installed kaggle, downloading Kaggle data is really straightforward!

In Kaggle, find the dataset you want to download.

On the dataset page, click on the three dots to the right and select **Copy API Command**.

![Kaggle dataset page with arrow pointing to Copy API command](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-dataset-arrow.jpeg "doc-image")

Now, in your Saturn Cloud resource, open the terminal then paste the API command. For example:


```{bash}
kaggle datasets download -d deepcontractor/swarm-behaviour-classification
```

That's it! Your dataset will download to your current path, and you will be able to use it for calculations!

### Download a Competition Dataset
Downloading a competition dataset is similarly straightforward, but it is a slightly different process.

In Kaggle, find the competition you want to download the dataset for.

Click on **Data** in the top menu and then copy the command displayed.

![Kaggle competition dataset page with arrows pointing to the Data tab and the API command](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/kaggle-competition-dataset-arrow.jpeg "doc-image")

Now, in Saturn Cloud, open the terminal, then paste the API command. For example:

```{bash}
kaggle competitions download -c titanic
```

That's it! Your dataset will download to your current path, and you will be able to use it for calculations!