4 changes: 3 additions & 1 deletion blog/2026-04-22-spark-vs-fusion-compaction.mdx
Expand Up @@ -260,7 +260,9 @@ We used the following parameters for the compaction:

- `target-file-size-bytes`: 512 MB (the desired output file size after compaction)

- `max-task-size-bytes`: 512 MB (the maximum amount of data processed in one compaction task)
:::note
By default, `max-task-size-bytes` is set to the same value as `target-file-size-bytes`.
:::
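
The default described in the note can be sketched as follows. This is a minimal illustration only: the helper `resolve_compaction_sizes` and its dictionary layout are hypothetical and not part of any compaction engine's API. It assumes only the behaviour stated above, that `max-task-size-bytes` falls back to `target-file-size-bytes` when it is not set.

```python
from typing import Dict, Optional

MB = 1024 * 1024  # bytes per mebibyte


def resolve_compaction_sizes(
    target_file_size_bytes: int,
    max_task_size_bytes: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical helper: return the effective compaction size settings.

    When max_task_size_bytes is omitted, it takes the same value as
    target_file_size_bytes, mirroring the default described in the note.
    """
    if target_file_size_bytes <= 0:
        raise ValueError("target_file_size_bytes must be positive")
    if max_task_size_bytes is None:
        # Fall back to the target file size when no task size is given.
        max_task_size_bytes = target_file_size_bytes
    return {
        "target-file-size-bytes": target_file_size_bytes,
        "max-task-size-bytes": max_task_size_bytes,
    }


# Only the target size is supplied, so the task size follows it.
settings = resolve_compaction_sizes(512 * MB)
```

With a 512 MB target and no explicit task size, both entries in `settings` hold the same byte count.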

### 2. Spark Configuration

68 changes: 41 additions & 27 deletions docs/getting-started/creating-first-pipeline.mdx
Expand Up @@ -32,8 +32,8 @@ Start from the **Jobs** page and set up everything in one flow.

1. Go to **Jobs** in the left menu and click **Create Job**.
1. Configure **job name & schedule**
1. Configure the **source**.
1. Configure the **destination**.
1. If the **source** is already configured, select the source connector and the existing source. Otherwise, configure a new **source**.
1. If the **destination** is already configured, select the destination connector and the existing destination. Otherwise, configure a new **destination**.
1. Configure streams and save.

#### 2. Resource-first workflow:
Expand All @@ -55,32 +55,20 @@ Choose **Resource-first** if your source and destination are already configured,

## Tutorial: Creating a Job

In this guide, we'll use the **Job-first workflow** to set up a job from configuring the source and destination to running it. If you prefer video, check out our [video walkthrough](#video-tutorial).
In this guide, we'll use the **Resource-first workflow** to set up a job from configuring the source and destination to running it.

First things first, every job needs a source and a destination before it can run.
For this demonstration, we'll use [**Postgres**](/docs/connectors/postgres) as the source and [**Apache Iceberg**](/iceberg/why-iceberg) with [**Glue Catalog**](/docs/writers/iceberg/catalog/glue/) as the destination.

Let's get started!

### 1. Create a New Job
### 1. Create a New Source

Navigate to **Jobs** section and select **+ Create Job** button in the top right corner. This opens the Job creation wizard, starting with the **Configure Job Name & Schedule** step.
Navigate to the **Sources** section and select the **+ Create Source** button in the top-right corner.

<div className='mx-auto w-full lg:w-[80%]'>
![OLake jobs dashboard with the Jobs tab, Create Job button, and Create your first Job button highlighted](../../static/img/docs/getting-started/create-your-first-job/job-create.webp)
</div>

### 2. Configure Job Name & Schedule

Give your job a descriptive name. For this guide, set the **Frequency** dropdown to **Every Day** and choose **12:00 AM** as the **Time**.

<div className='mx-auto w-full lg:w-[80%]'>
![OLake Create Job page showing step 1, with job name, frequency dropdown (Every Day highlighted), and job start time settings](../../static/img/docs/getting-started/create-your-first-job/job-schedule.webp)
</div>

### 3. Configure Source
![Olake create source](../../static/img/docs/getting-started/create-your-first-job/create-source.webp)

Since we're following the **Job-first workflow**, select the **Set up a new source** option.
### 2. Configure Source

For this guide, choose **Postgres** from the connector dropdown, and keep the **OLake version** set to the latest stable version.

Expand All @@ -96,7 +84,7 @@ Give your source a descriptive name, then fill in the required Postgres connecti
creation](../../static/img/docs/getting-started/create-your-first-job/job-source-config.webp)
</div>

Once the test connection succeeds, OLake shows a success message and takes you to the destination configuration step.
Once the test connection succeeds, OLake shows a success message. Click the destinations button to go to the destination configuration step.

You can find the configuration and troubleshooting guides for all supported source connectors below.

Expand All @@ -106,19 +94,29 @@ You can find the configuration and troubleshooting guides for all supported sour
| Postgres | [Config](/docs/connectors/postgres#configuration) |
| MongoDB | [Config](/docs/connectors/mongodb#configuration) |
| Oracle | [Config](/docs/connectors/oracle#configuration) |
| MSSQL | [Config](/docs/connectors/mssql#configuration) |
| Kafka | [Config](/docs/connectors/kafka#configuration) |
| DB2 LUW | [Config](/docs/connectors/db2#configuration) |
| S3 | [Config](/docs/connectors/s3#configuration) |

:::note
If you plan to enable CDC (Change Data Capture), make sure a replication slot already exists on your Postgres database.
You can learn how to check or create one in our [Replication Slot Guide](/docs/connectors/postgres/setup/generic).
:::

### 3. Create a New Destination

Navigate to the **Destination** section and select the **+ Create Destination** button in the top-right corner.

![Olake create destination](../../static/img/docs/getting-started/create-your-first-job/create-dest.webp)

### 4. Configure Destination

Similarly, here we'll be using **Iceberg** with **AWS Glue Catalog** as the destination.

For this guide, select **Apache Iceberg** from the connector dropdown, and keep the **OLake version** set to the latest stable version.

<div className='mx-auto w-full lg:w-[80%]'>
<div>
![Job destination
creation](../../static/img/docs/getting-started/create-your-first-job/job-dest-connector.webp)
</div>
Expand All @@ -127,17 +125,17 @@ Choose the catalog as **AWS Glue** from the Catalog Type dropdown.

<div className='mx-auto w-full lg:w-[80%]'>
![Job destination
catalog](../../static/img/docs/getting-started/create-your-first-job/configure_dest.png)
catalog](../../static/img/docs/getting-started/create-your-first-job/job-dest-catalog.webp)
</div>

Give your destination a descriptive name, then fill in the required connection details in the Endpoint Config form.

<div className='mx-auto w-full lg:w-[80%]'>
![Job destination
config](../../static/img/docs/getting-started/create-your-first-job/dest_config.png)
config](../../static/img/docs/getting-started/create-your-first-job/job-dest-config.webp)
</div>

Once the test connection succeeds, OLake shows a success message and takes you to the streams configuration step.
Once the test connection succeeds, OLake shows a success message. Click the create job button to go to the job configuration step.

You can find the configuration and troubleshooting guides for all supported destination connectors below.

Expand All @@ -159,9 +157,25 @@ You can find the configuration and troubleshooting guides for all supported dest
| Polaris | [Config](/docs/writers/iceberg/catalog/rest?rest-catalog=polaris#configuration) |
| Unity | [Config](/docs/writers/iceberg/catalog/rest?rest-catalog=unity#configuration) |

### 5. Configure Streams
### 5. Create a New Job

Navigate to the **Jobs** section and select the **+ Create Job** button in the top-right corner.

![Olake create Job](../../static/img/docs/getting-started/create-your-first-job/create-job.webp)

### 6. Configure Job

Give your job a descriptive name. For this guide, set the **Frequency** dropdown to **Every Minute**.

Next, select the source and destination created in the previous steps. First select the **source connector** from the dropdown, then select the **source**. Similarly, for the destination, select the **destination connector** from the dropdown, then select the **destination**.

Once you have selected the source and destination, click on the **Next** button to continue. At this stage, the system validates both configurations. You can proceed to the Streams section only after both validations succeed and a success status is displayed.

![Olake Job Configuration](../../static/img/docs/getting-started/create-your-first-job/select-source-dest.webp)

### 7. Configure Streams

The **Streams** page is where you select which streams to replicate to the destination.
The **Streams** page is where you select which streams to replicate to the destination and configure stream-level properties for each selected stream. For more details, see [Stream Properties](/docs/understanding/terminologies/olake/#streams-properties).
Here, you can choose your preferred [sync mode](/docs/understanding/terminologies/olake#2-sync-modes) and configure [partitioning](/docs/writers/parquet/partitioning), the [Destination Database](/docs/understanding/terminologies/olake#7-tablecolumn-normalization--destination-database-creation), and other stream-level settings.

<div className='mx-auto w-full lg:w-[80%]'>
Expand Down Expand Up @@ -249,7 +263,7 @@ Yay! The sync is complete, and our data has been replicated to Iceberg exactly a

<br />

### 6. Manage Your Job
### 8. Manage Your Job

Once your job is created, you can manage it from the **Jobs** page using the **Actions** menu **(⋮)**

4 changes: 1 addition & 3 deletions docs/getting-started/quickstart.mdx
Expand Up @@ -57,9 +57,7 @@ The default credentials are:

![Login page with fields for admin username and password](/img/docs/jobs/olake-job-0.webp)

<div style={{textAlign: 'center'}}>
<img src="/img/docs/jobs/olake-job.webp" alt="OLake UI Jobs" width="800" loading="lazy" decoding="async" />
</div>
![After login](/img/docs/getting-started/olake-job.webp)

### Updating OLake UI

Binary file added static/img/docs/getting-started/olake-job.webp