diff --git a/blog/2025-09-04-creating-job-olake-docker-cli.mdx b/blog/2025-09-04-creating-job-olake-docker-cli.mdx new file mode 100644 index 00000000..0c4d8ab8 --- /dev/null +++ b/blog/2025-09-04-creating-job-olake-docker-cli.mdx @@ -0,0 +1,381 @@ +--- +title: "From Postgres to Iceberg: Creating OLake Jobs with Docker CLI and UI" +description: "A friendly, step-by-step walkthrough to configure replication from Postgres to Apache Iceberg (Glue catalog) using the OLake UI or the Docker CLI." +slug: creating-job-olake-docker-cli +authors: [akshay] +tags: [docker,apache-iceberg,replication] +image: /img/blog/cover/pipeline-on-olake.png +--- + +# From Postgres to Iceberg: Creating OLake Jobs with Docker CLI and UI + +Data replication has become one of the most essential building blocks in modern data engineering. Whether it's keeping your analytics warehouse in sync with operational databases or feeding real-time pipelines for machine learning, companies rely on tools to move data quickly and reliably. + +Today, there's no shortage of options—platforms like Fivetran, Airbyte, Debezium, and even custom-built Flink or Spark pipelines are widely used to handle replication. But each of these comes with trade-offs: infrastructure complexity, cost, or lack of flexibility when you want to adapt replication to your specific needs. + +That's where OLake comes in. Instead of forcing you into one way of working, OLake focuses on making replication into Apache Iceberg (and other destinations) straightforward, fast, and adaptable. You can choose between a guided UI experience for simplicity or a Docker CLI flow for automation and DevOps-style control. + +In this blog, we'll walk through how to set up a replication job in OLake, step by step. We'll start with the UI wizard for those who prefer a visual setup, then move on to the CLI-based workflow for teams that like to keep things in code. 
By the end, you'll have a job that continuously replicates from Postgres → Apache Iceberg (Glue catalog) with CDC, normalization, filters, partitioning, and scheduling—all running seamlessly. + +## Two Setup Styles (pick what fits you) + +### Option A — UI "Job-first" (guided, all-in-one) +Perfect if you want a clear wizard and visual guardrails. + +### Option B — CLI (Docker) +Great if you prefer terminal, versioned JSON, or automation. + +Both produce the **same result**. Choose the path that matches your workflow today. + +## Option A — OLake UI (Guided) + +We'll take the "job-first" approach. It's straightforward and keeps you in one flow. + +### 1) Create a Job + +From the left nav, go to **Jobs → Create Job**. +You'll land on a wizard that starts with the **source**. + +![Job page](/img/docs/getting-started/create-your-first-job/job-create.png) + +### 2) Configure the Source (Postgres) + +Choose **Set up a new source** → select **Postgres** → keep OLake version at the latest stable. +Name it clearly, fill the Postgres endpoint config, and hit **Test Connection**. + +![Job source connector](/img/docs/getting-started/create-your-first-job/job-source-connector.png) + +![Job source config](/img/docs/getting-started/create-your-first-job/job-source-config.png) + +> 📝 **Planning for CDC?** +> Make sure a **replication slot** exists in Postgres. +> See: [Replication Slot Guide](/docs/connectors/postgres/setup/generic). + +### 3) Configure the Destination (Iceberg + Glue) + +Now we set where the data will land. +Pick **Apache Iceberg** as the destination, and **AWS Glue** as the catalog. + +![Job dest connector](/img/docs/getting-started/create-your-first-job/job-dest-connector.png) + +![Job dest catalog](/img/docs/getting-started/create-your-first-job/job-dest-catalog.png) + +Provide the connection details and **Test Connection**. 
+ +![Job dest config](/img/docs/getting-started/create-your-first-job/job-dest-config.png) + +### 4) Configure Streams + +This is where we dial in *what* to replicate and *how*. +For this walkthrough, we'll: + +- Include stream `fivehundred` +- **Sync mode:** **Full Refresh + CDC** +- **Normalization:** **On** +- **Filter:** `dropoff_datetime >= "2010-01-01 00:00:00"` +- **Partitioning:** by **year** extracted from `dropoff_datetime` +- **Schedule:** every day at **12:00 AM** + +![Job streams page](/img/docs/getting-started/create-your-first-job/job-streams.png) + +Select the checkbox for `fivehundred`, then click the stream name to open stream settings. +Pick the sync mode and toggle **Normalization**. + +![Select stream](/img/docs/getting-started/create-your-first-job/job-stream-select.png) + +Let's make the destination query-friendly. Open **Partitioning** → choose `dropoff_datetime` → **year**. +Want more? Read the [Partitioning Guide](/docs/writers/parquet/partitioning). + +![Stream partitioning](/img/docs/getting-started/create-your-first-job/job-stream-partition.png) + +Add the **Data Filter** so we only move rows from 2010 onward. + +![Stream filter](/img/docs/getting-started/create-your-first-job/job-data-filter.png) + +Click **Next** to continue. + +### 5) Schedule the Job + +Give the job a clear name, set **Every Day @ 12:00 AM**, and hit **Create Job**. + +![Job schedule](/img/docs/getting-started/create-your-first-job/job-schedule.png) + +You're set! 🎉 + +![Job created](/img/docs/getting-started/create-your-first-job/job-creation-success.png) + +Want results right away? Start a run immediately with **Jobs → (⋮) → Sync Now**. + +![Sync now](/img/docs/getting-started/create-your-first-job/job-sync-now.png) + +You'll see status badges on the right (**Running / Failed / Completed**). +For more details, open **Job Logs & History**. 
+ +- Running + ![Job running](/img/docs/getting-started/create-your-first-job/job-running.png) + +- Completed + ![Job success](/img/docs/getting-started/create-your-first-job/job-success.png) + +Finally, verify that data landed in S3/Iceberg as configured: + +![S3 data](/img/docs/getting-started/create-your-first-job/job-data-s3.png) + +### 6) Manage Your Job (from the Jobs page) + +**Sync Now** — Trigger a run without waiting. + +**Edit Streams** — Change which streams are included and tweak replication settings. +Use the stepper to jump between **Source** and **Destination**. + +![Edit streams](/img/docs/getting-started/create-your-first-job/job-edit-streams-page.png) + +> By default, source/destination editing is locked. Click **Edit** to unlock. + +![Edit destination](/img/docs/getting-started/create-your-first-job/job-edit-destination.png) + +> 🔄 **Need to change Partitioning / Filter / Normalization for an existing stream?** +> Unselect the stream → **Save** → reopen **Edit Streams** → re-add it with new settings. + +**Pause Job** — Temporarily stop runs. You'll find paused jobs under **Inactive Jobs**, where you can **Resume** any time. + +![Pause/Resume](/img/docs/getting-started/create-your-first-job/job-resume.png) + +**Job Logs & History** — See all runs. Use **View Logs** for per-run details. + +![Job logs list](/img/docs/getting-started/create-your-first-job/view-logs.png) + +![Logs page](/img/docs/getting-started/create-your-first-job/logs-page.png) + +**Job Settings** — Rename, change frequency, pause, or delete. +Deleting a job moves its source/destination to **inactive** (if not used elsewhere). + +![Job settings](/img/docs/getting-started/create-your-first-job/job-settings.png) + +## Option B — OLake CLI (Docker) + +Prefer terminals, PR reviews, and repeatable runs? Let's do the same pipeline via Docker. + +### Prerequisites + +- **Docker** installed and running +- OLake images: **Docker Hub → `olakego/*`** + +### How the CLI flow works + +1. 
**Configure source & destination** (JSON files) +2. **Discover streams** → writes a `streams.json` +3. **Edit stream configuration** (normalization, filters, partitions, sync mode) +4. **Run the sync** +5. **Monitor with `stats.json`** + +### What we'll build + +- Source: **Postgres** +- Destination: **Apache Iceberg** (Glue catalog) +- Table: `fivehundred` +- **CDC** mode + **Normalization** +- Filter: `dropoff_datetime >= "2010-01-01 00:00:00"` +- Partition by **year** from `dropoff_datetime` + +### 1) Create Config Files + +We'll put everything under `/path/to/config/`. + +**Source: `source.json`** + +```json title="source.json" +{ + "host": "dz-stag.postgres.database.azure.com", + "port": 5432, + "database": "postgres", + "username": "postgres", + "password": "XXX", + "jdbc_url_params": {}, + "ssl": { "mode": "require" }, + "update_method": { + "replication_slot": "replication_slot", + "initial_wait_time": 120 + }, + "default_mode": "cdc", + "max_threads": 6 +} +``` + +> 📝 If you plan to run CDC, ensure a Postgres **replication slot** exists. +> See: [Replication Slot Guide](/docs/connectors/postgres/setup/generic). + +**Destination: `destination.json`** + +```json title="destination.json" +{ + "type": "ICEBERG", + "writer": { + "iceberg_s3_path": "s3://vz-testing-olake/olake_cli_demo", + "aws_region": "XXX", + "aws_access_key": "XXX", + "aws_secret_key": "XXX", + "iceberg_db": "olake_cli_demo", + "grpc_port": 50051, + "sink_rpc_server_host": "localhost" + } +} +``` + +### 2) Discover Streams + +This pulls available tables and writes `streams.json`. 
+ +```bash +docker run --pull=always \ + -v "/path/to/config:/mnt/config" \ + olakego/source-postgres:latest \ + discover \ + --config /mnt/config/source.json +``` + +*Start logs* +![Discover start](/img/docs/getting-started/create-your-first-job/cli-discover-logs-start.jpeg) + +*Completion* +![Discover end](/img/docs/getting-started/create-your-first-job/cli-discover-logs-end.jpeg) + +> ℹ️ Logs are also written to: +> `/path/to/config/logs/sync_[YYYY-MM-DD]_[HH-MM-SS]/olake.log` + +### 3) Edit `streams.json` + +Select exactly what to move and how. + +* **Select streams** → keep only `fivehundred` under `"selected_streams"`. +* **Normalization** → `"normalization": true` +* **Filter** → `"filter": "dropoff_datetime >= \"2010-01-01 00:00:00\""` +* **Partitioning** → `"partition_regex": "/{dropoff_datetime, year}"` +* **Sync mode** → set the stream's `"sync_mode"` to `"cdc"` + +**Minimal selection block** + +```json title="streams.json (selection)" +{ + "selected_streams": { + "public": [ + { + "partition_regex": "/{dropoff_datetime, year}", + "stream_name": "fivehundred", + "normalization": true, + "filter": "dropoff_datetime >= \"2010-01-01 00:00:00\"" + } + ] + } +} +``` + +**Full stream entry (showing supported modes)** + +```json title="streams.json (stream detail)" +{ + "streams": [ + { + "stream": { + "name": "fivehundred", + "namespace": "public", + "type_schema": { + "properties": { + "dropoff_datetime": { "type": ["timestamp", "null"] } + } + }, + "supported_sync_modes": [ + "strict_cdc", + "full_refresh", + "incremental", + "cdc" + ], + "source_defined_primary_key": [], + "available_cursor_fields": ["id", "pickup_datetime", "rate_code_id"], + "sync_mode": "cdc" + } + } + ] +} +``` + +> 📚 Need a refresher on how modes differ? +> See: [Sync Modes](/docs/understanding/olake-terminologies/stream-properties#sync-modes). 
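
If you'd rather script these edits than make them by hand, the same changes can be applied programmatically. The sketch below is one way to do it in Python, assuming the discovered file has the `selected_streams` / `streams` shape shown above. `configure_stream` and the `other_table` entry are hypothetical names used only for illustration; they are not part of OLake.

```python
import copy
import json


def configure_stream(catalog, namespace, stream_name, *, sync_mode,
                     normalization, filter_expr, partition_regex):
    """Return a copy of a discovered catalog with a single stream selected and configured."""
    updated = copy.deepcopy(catalog)

    # Keep only the requested stream under "selected_streams".
    updated["selected_streams"] = {
        namespace: [
            {
                "partition_regex": partition_regex,
                "stream_name": stream_name,
                "normalization": normalization,
                "filter": filter_expr,
            }
        ]
    }

    # Set the sync mode on the matching entry under "streams".
    matched = False
    for entry in updated.get("streams", []):
        stream = entry["stream"]
        if stream["name"] == stream_name and stream["namespace"] == namespace:
            supported = stream.get("supported_sync_modes")
            if supported is not None and sync_mode not in supported:
                raise ValueError(f"{sync_mode!r} is not supported by {stream_name!r}")
            stream["sync_mode"] = sync_mode
            matched = True
    if not matched:
        raise ValueError(f"stream {namespace}.{stream_name} not found in catalog")
    return updated


# A tiny stand-in for a discovered streams.json (the second table is hypothetical).
discovered = {
    "selected_streams": {
        "public": [{"stream_name": "fivehundred"}, {"stream_name": "other_table"}]
    },
    "streams": [
        {"stream": {"name": "fivehundred", "namespace": "public",
                    "supported_sync_modes": ["strict_cdc", "full_refresh", "incremental", "cdc"],
                    "sync_mode": "full_refresh"}},
        {"stream": {"name": "other_table", "namespace": "public",
                    "supported_sync_modes": ["full_refresh"],
                    "sync_mode": "full_refresh"}},
    ],
}

configured = configure_stream(
    discovered, "public", "fivehundred",
    sync_mode="cdc",
    normalization=True,
    filter_expr='dropoff_datetime >= "2010-01-01 00:00:00"',
    partition_regex="/{dropoff_datetime, year}",
)
print(json.dumps(configured["selected_streams"], indent=2))
```

To apply this to a real file, load `/path/to/config/streams.json` with `json.load`, pass it through `configure_stream`, and write the result back with `json.dump` before running the sync. Returning a copy rather than editing in place keeps the original discovery output around in case you want to compare the two.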
+ +### 4) Run the Sync + +Kick off replication. Pass the source config as `--config` and the edited `streams.json` as `--catalog`: + +```bash +docker run --pull=always \ + -v "/path/to/config:/mnt/config" \ + olakego/source-postgres:latest \ + sync \ + --config /mnt/config/source.json \ + --catalog /mnt/config/streams.json \ + --destination /mnt/config/destination.json +``` + +*Sync start* +![Sync start](/img/docs/getting-started/create-your-first-job/cli-sync-logs-start.jpeg) + +*Sync completed* +![Sync completed](/img/docs/getting-started/create-your-first-job/cli-sync-logs-end.jpeg) + +### 5) Monitor Progress with `stats.json` + +A `stats.json` appears next to your configs: + +```json title="stats.json" +{ + "Estimated Remaining Time": "0.00 s", + "Memory": "367 mb", + "Running Threads": 0, + "Seconds Elapsed": "34.01", + "Speed": "14.70 rps", + "Synced Records": 500 +} +``` + +Confirm the data in your destination (S3 / Iceberg): + +![Data in Iceberg](/img/docs/getting-started/create-your-first-job/cli-s3-data.png) + +### 6) About the `state.json` (Resumable & CDC-friendly) + +When a sync starts, OLake writes a `state.json` that tracks progress and CDC offsets (e.g., Postgres LSN). +This lets you **resume without duplicates** and continue CDC seamlessly. + +To resume / keep streaming: + +```bash +docker run --pull=always \ + -v "/path/to/config:/mnt/config" \ + olakego/source-postgres:latest \ + sync \ + --config /mnt/config/source.json \ + --catalog /mnt/config/streams.json \ + --destination /mnt/config/destination.json \ + --state /mnt/config/state.json +``` + +More details: [State File (Postgres)](/docs/connectors/postgres/config#statejson-configuration) + +--- + +## Quick Q&A + +**UI or CLI: how should I choose?** +If you're new to OLake or prefer a guided setup, start with **UI**. +If you're automating, versioning configs, or scripting in CI, use **CLI**. + +**Why "Full Refresh + CDC"?** +You get a baseline snapshot *and* continuous changes, which is ideal for keeping downstream analytics fresh. 
+ +**Can I change partitioning later?** + +* **UI**: unselect the stream → save → re-add with updated partitioning/filter/normalization. +* **CLI**: edit `streams.json` and re-run. + +--- + diff --git a/src/components/SlidesCarousel.tsx b/src/components/SlidesCarousel.tsx index 1b29a977..dbf10919 100644 --- a/src/components/SlidesCarousel.tsx +++ b/src/components/SlidesCarousel.tsx @@ -34,6 +34,7 @@ const deriveThumbnail = (url: string): string | null => { if (!match) return null; const fileId = match[1]; // Google Drive thumbnail endpoint. Size w320‑h240 keeps 4:3 ratio + // Note: This may not work due to CORS restrictions return `https://drive.google.com/thumbnail?id=${fileId}&sz=w640`; }; @@ -78,6 +79,8 @@ export const SlidesCarousel: React.FC = ({ > {slides.map((slide) => { const thumb = slide.thumbnail ?? deriveThumbnail(slide.url); + const [imageError, setImageError] = React.useState(false); + return ( = ({ rel="noopener noreferrer" className="group w-64 shrink-0 snap-start rounded-xl border border-gray-200 dark:border-gray-700 hover:shadow-lg transition-shadow bg-white dark:bg-gray-900" > - {thumb ? ( - {slide.title} + {thumb && !imageError ? ( + {slide.title} setImageError(true)} + /> ) : (
diff --git a/src/components/webinars/EnhancedWebinarCard.tsx b/src/components/webinars/EnhancedWebinarCard.tsx index 70ad4f2c..58d3a9c2 100644 --- a/src/components/webinars/EnhancedWebinarCard.tsx +++ b/src/components/webinars/EnhancedWebinarCard.tsx @@ -162,8 +162,8 @@ const EnhancedWebinarCard: React.FC = ({

{title} diff --git a/src/components/webinars/WebinarGrid.tsx b/src/components/webinars/WebinarGrid.tsx index a897c6d4..9e81e8fe 100644 --- a/src/components/webinars/WebinarGrid.tsx +++ b/src/components/webinars/WebinarGrid.tsx @@ -80,7 +80,7 @@ const WebinarGrid: React.FC = ({ webinars }) => {

{/* Title - Enhanced typography */} -

+

{webinar.title}

diff --git a/src/components/webinars/WebinarTitle.tsx b/src/components/webinars/WebinarTitle.tsx index cd9eab81..617194a3 100644 --- a/src/components/webinars/WebinarTitle.tsx +++ b/src/components/webinars/WebinarTitle.tsx @@ -71,7 +71,7 @@ const WebinarTitle: React.FC = ({ title, tag }) => { {tag} )} -

+

{title}

diff --git a/src/data/meetup/8th-meetup.json b/src/data/meetup/8th-meetup.json new file mode 100644 index 00000000..9372e43d --- /dev/null +++ b/src/data/meetup/8th-meetup.json @@ -0,0 +1,38 @@ +{ + "summary": "The eighth OLake community meetup showcased significant new features including Helm deployment capabilities, incremental sync functionality, advanced filtering options, and the new Oracle connector. The team demonstrated how these features address enterprise needs for easier deployment, cost optimization, and broader database support. Akshay Kumar Sharma introduced the Oracle connector as a major addition to OLake's source connectors, while Schitiz Sharma conducted an end-to-end demo showing Oracle CDC with full refresh + incremental sync, followed by Helm deployment to Kubernetes, demonstrating how data flows seamlessly to Iceberg format in S3.", + "chaptersAndTopics": [ + { + "title": "Introduction and New Features Overview", + "details": "Akshay Kumar Sharma opened the eighth community meetup by introducing the latest OLake features designed to address enterprise deployment challenges. He highlighted four key areas: Helm deployment for simplified Kubernetes orchestration, incremental sync for cost optimization, advanced filtering capabilities, and the new Oracle connector for broader database support." + }, + { + "title": "Oracle Connector Introduction", + "details": "Akshay described the new Oracle connector as a significant addition to OLake's source connector ecosystem. While OLake previously supported databases like PostgreSQL, MySQL, and MongoDB, Oracle's widespread adoption in enterprise environments made it a critical addition. He explained how this connector enables organizations to seamlessly integrate their Oracle databases into modern lakehouse architectures." + }, + { + "title": "Incremental Sync Capabilities", + "details": "Akshay explained the evolution of OLake's sync capabilities. 
Previously, users could configure full refresh and CDC (Change Data Capture) separately. Now, OLake supports full refresh + incremental sync as a unified configuration option, allowing organizations to backfill historical data while maintaining real-time incremental updates. This hybrid approach significantly reduces compute costs by avoiding unnecessary full data reprocessing." + }, + { + "title": "Advanced Filtering Features", + "details": "The team discussed new filtering capabilities that provide organizations with granular control over data ingestion. These filtering options allow users to specify exactly which data should flow through their pipelines, ensuring cleaner datasets and reducing storage costs by excluding unnecessary data from the lakehouse." + }, + { + "title": "Helm Deployment for Enterprise Scale", + "details": "Akshay emphasized the importance of Helm deployment for enterprise adoption. This feature addresses the need for easier deployment and scalability in organizational environments. Helm charts simplify the Kubernetes deployment process, making it easier for DevOps teams to manage OLake installations at scale and integrate with existing CI/CD pipelines." + }, + { + "title": "End-to-End Oracle Demo", + "details": "Schitiz Sharma conducted a comprehensive live demonstration using Oracle as the source connector. He showed the complete workflow from Oracle CDC configuration to data landing in Iceberg format in S3. The demo included setting up full refresh + incremental sync, configuring the Oracle connector, and demonstrating real-time data flow with automatic schema evolution and Iceberg table creation." + }, + { + "title": "Kubernetes Deployment with Helm", + "details": "Schitiz concluded the demo by showcasing the Helm deployment process. He demonstrated how to deploy OLake to a Kubernetes cluster using Helm charts, highlighting the simplified setup process and how it integrates with existing Kubernetes infrastructure. 
The deployment showed how organizations can easily scale their data pipelines using familiar Kubernetes orchestration tools." + } + ], + "actionItems": [ + "Akshay Kumar Sharma will publish documentation for the new Oracle connector and provide setup guides for enterprise deployments.", + "Schitiz Sharma will create Helm chart documentation and deployment examples for different Kubernetes environments.", + "The team will continue developing additional filtering options and advanced configuration capabilities for enterprise use cases." + ] +} diff --git a/src/pages/community/6th-community-meetup.tsx b/src/pages/community/6th-community-meetup.tsx index 4387d274..1a5afae9 100644 --- a/src/pages/community/6th-community-meetup.tsx +++ b/src/pages/community/6th-community-meetup.tsx @@ -8,7 +8,7 @@ import React from "react"; import Layout from '@theme/Layout'; import Hr from '../../components/Hr'; import MeetupNotes from '../../components/MeetupNotes'; -import meetupData from '../../data/meetup/6th-meetup.json' +import meetupData from '../../data/meetup/6th-meetup.json'; import YouTubeEmbed from '../../components/webinars/YouTubeEmbed'; import SlidesCarousel, { Slide } from '../../components/SlidesCarousel'; diff --git a/src/pages/community/8th-community-meetup.tsx b/src/pages/community/8th-community-meetup.tsx new file mode 100644 index 00000000..9c086404 --- /dev/null +++ b/src/pages/community/8th-community-meetup.tsx @@ -0,0 +1,96 @@ +import WebinarTitle from '../../components/webinars/WebinarTitle'; +import WebinarHosts from '../../components/webinars/WebinarHosts'; +import WebinarCTA from '../../components/webinars/WebinarCTA'; +import WebinarOverview from '../../components/webinars/WebinarOverview'; +import React from "react"; +import Layout from '@theme/Layout'; +import Hr from '../../components/Hr'; +import MeetupNotes from '../../components/MeetupNotes'; +import meetupData from '../../data/meetup/8th-meetup.json'; +import YouTubeEmbed from 
'../../components/webinars/YouTubeEmbed'; +import SlidesCarousel, { Slide } from '../../components/SlidesCarousel'; + +const hosts = [ + { + name: "Akshay Kumar Sharma", + role: "DevRel @ OLake", + bio: "OLake DevRel and community advocate, passionate about open-source data engineering and lakehouse architectures.", + image: "/img/authors/akshay.jpg", + linkedin: "https://www.linkedin.com/in/akshay-kumar-sharma-devvoyager", + }, + { + name: "Schitiz Sharma", + role: "DevOps Engineer @ OLake", + bio: "OLake Maintainer and DevOps engineer with expertise in Kubernetes deployments, Helm charts, and infrastructure automation for data engineering platforms.", + image: "/img/authors/schitiz.jpg", + linkedin: "https://www.linkedin.com/in/schitizsharma", + }, +]; + +const decks: Slide[] = [ + { title: '8th OLake Community Meetup', url: 'https://docs.google.com/presentation/d/1QvuMbxpsklrmvE2NVacowKcRDnZEZOmc-9FJGprKHgw/edit?usp=sharing' }, +]; + +const CommunityPage = () => { + const communityData = { + title: 'OLake 8th Community Meetup', + summary: 'In this community meetup we showcased an end-to-end demo of OLake\'s latest features, including Oracle CDC, filtering capabilities, incremental sync, and Helm deployment within the OLake UI.', + }; + + return ( + +
+ + + + +
+ +
+ +
+
+ + + +
+
+ + + +
+
+ + + + + + + +
+
+ ); +}; + +export default CommunityPage; diff --git a/src/pages/webinar/index.tsx b/src/pages/webinar/index.tsx index fbb4e198..04d4616a 100644 --- a/src/pages/webinar/index.tsx +++ b/src/pages/webinar/index.tsx @@ -5,9 +5,21 @@ import { FaFileVideo, FaVideo, FaPlay, FaUsers, FaCalendarAlt, FaBroadcastTower const WebinarsPage = () => { const communityMeets = [ + { + title: 'OLake 8th Community Meetup', + subtitle: 'Join us for an end-to-end demo of OLake\'s latest features, showcasing Oracle CDC, filtering capabilities, incremental sync, and Helm deployment within the OLake UI.', + route: '/community/8th-community-meetup', + img: `/img/community/8th-olake-community-call.png`, + alt: 'OLake 8th Community Meetup', + status: 'archived', + button: 'secondary', + CTA: 'Watch Now', + date: '29 August 2025', + icon: FaVideo + }, { title: 'OLake 6th Community Meetup', - subtitle: 'Join us ', + subtitle: 'Join us for a real-world production story from PhysicsWallah showcasing their migration from Redshift to Iceberg-based lakehouse, and explore OLake\'s roadmap including Golang architecture, upcoming UI, and SMT transformations.', route: '/community/6th-community-meetup', img: `/img/community/6th-community-meetup-cover.png`, alt: 'OLake 6th Community Meetup', @@ -19,7 +31,7 @@ const WebinarsPage = () => { }, { title: 'OLake 5th Community Meetup', - subtitle: 'Join us ', + subtitle: 'Join us for a showcase of new features including Apache Iceberg as a destination for AWS S3 and local setups, MongoDB to Iceberg sync capabilities, upcoming MySQL and Postgres sync features, and performance improvements with 2-3x faster syncs.', route: '/community/5th-community-meetup', img: `/img/community/5th-community-meetup-cover.png`, alt: 'OLake 5th Community Meetup', @@ -31,7 +43,7 @@ const WebinarsPage = () => { }, { title: 'OLake 4th Community Meetup', - subtitle: 'Join us ', + subtitle: 'Join us for updates on recent developments including faster target writer for normalization, new 
stats file for performance metrics, Docker Compose for MongoDB replica sets, Split Vector Strategy, and Iceberg Writer development with schema evolution.', route: '/community/4th-community-meetup', img: `/img/community/4th-community-meetup-cover.png`, alt: 'OLake 4th Community Meetup', @@ -43,7 +55,7 @@ const WebinarsPage = () => { }, { title: 'OLake 3rd Community Meetup', - subtitle: 'Join us ', + subtitle: 'Join us for updates on new features including parquet writer, MongoDB 2.0 connector, Apache Iceberg Writer integration, Postgres Writer development, and a comprehensive demo of OLake\'s CLI functionality with MongoDB to S3 syncing.', route: '/community/3rd-community-meetup', img: `/img/community/3rd-community-meetup-cover.png`, alt: 'OLake 3rd Community Meetup', @@ -56,6 +68,18 @@ const WebinarsPage = () => { ] // Define webinars data directly const webinars = [ + { + title: 'ClickHouse Iceberg Workshop: Unified Lakehouse Architectures', + subtitle: 'Join us for a comprehensive technical workshop exploring ClickHouse\'s experimental Iceberg support and how open table formats are revolutionizing data engineering workflows.', + route: '/webinar/w-9-clickhouse-iceberg-workshop', + img: `/img/webinars/w-9-clickhouse-iceberg-write.png`, + alt: 'ClickHouse Iceberg Workshop: Unified Lakehouse Architectures', + status: 'archived', + button: 'secondary', + CTA: 'Watch Now', + date: '28 August 2025', + icon: FaVideo + }, { title: 'Fastest Apache Iceberg Native CDC: Introducing OLake', subtitle: 'Introducing OLake v0.', diff --git a/src/pages/webinar/w-9-clickhouse-iceberg-workshop.tsx b/src/pages/webinar/w-9-clickhouse-iceberg-workshop.tsx new file mode 100644 index 00000000..2d8a4246 --- /dev/null +++ b/src/pages/webinar/w-9-clickhouse-iceberg-workshop.tsx @@ -0,0 +1,92 @@ +import WebinarTitle from '../../components/webinars/WebinarTitle'; +import WebinarHosts from '../../components/webinars/WebinarHosts'; +import WebinarCTA from '../../components/webinars/WebinarCTA'; 
+import WebinarOverview from '../../components/webinars/WebinarOverview'; +import WebinarCoverImage from '../../components/webinars/WebinarCoverImage'; + +import CTAButton from '../../components/webinars/CTAButton'; +import YouTubeEmbed from '../../components/webinars/YouTubeEmbed'; + +import Layout from '@theme/Layout'; +import React from "react"; +import Hr from '../../components/Hr'; + +const hosts = [ + { + name: "Shivji Kumar Jha", + role: "Staff Engineer @ Nutanix", + bio: "Founding engineer of Nutanix's data platform team with 30+ conference talks at Apache and CNCF events. Apache Pulsar committer with deep expertise in ClickHouse, distributed systems, and open-source database technologies. Contributed to ClickHouse, MySQL, Apache Pulsar. ", + image: "/img/authors/shivji.jpeg", + linkedin: "https://www.linkedin.com/in/shivjijha/", + }, + { + name: "Saurabh Kumar Ojha", + role: "Software Engineer @ Nutanix", + bio: "Database internals expert with hands-on experience in lakehouse integrations. Open-source enthusiast with contributions to ClickHouse server and ecosystem, Nats, Transferia and more.", + image: "/img/authors/saurabh.jpeg", + linkedin: "https://www.linkedin.com/in/ojhasaurabh2099/", + }, + { + name: "Akshay Kumar Sharma", + role: "DevRel @ OLake", + bio: "OLake DevRel and community advocate, passionate about open-source data engineering and lakehouse architectures.", + image: "/img/authors/akshay.jpg", + linkedin: "https://www.linkedin.com/in/akshay-kumar-sharma-devvoyager", + }, +]; + +const WebinarPage = () => { + const webinarData = { + title: 'ClickHouse Iceberg Workshop: Read & Write to Lakehouse', + summary: 'In this seminar we discussed ClickHouse\'s experimental Iceberg support and how open table formats are revolutionizing data engineering workflows. This hands-on session demonstrated unified lakehouse architectures with cross-engine compatibility.', + }; + + return ( + +
+ + + +
+
+ +
+ +
+ + + +
+
+ + + + + +
+
+ ); +}; + +export default WebinarPage; diff --git a/static/img/authors/saurabh.jpeg b/static/img/authors/saurabh.jpeg new file mode 100644 index 00000000..963b1703 Binary files /dev/null and b/static/img/authors/saurabh.jpeg differ diff --git a/static/img/authors/shivji.jpeg b/static/img/authors/shivji.jpeg new file mode 100644 index 00000000..ca4b1286 Binary files /dev/null and b/static/img/authors/shivji.jpeg differ diff --git a/static/img/blog/cover/pipeline-on-olake.png b/static/img/blog/cover/pipeline-on-olake.png new file mode 100644 index 00000000..0dc8aaec Binary files /dev/null and b/static/img/blog/cover/pipeline-on-olake.png differ diff --git a/static/img/community/8th-olake-community-call.png b/static/img/community/8th-olake-community-call.png new file mode 100644 index 00000000..081ef45c Binary files /dev/null and b/static/img/community/8th-olake-community-call.png differ diff --git a/static/img/webinars/w-9-clickhouse-iceberg-write.png b/static/img/webinars/w-9-clickhouse-iceberg-write.png new file mode 100644 index 00000000..c48e19d1 Binary files /dev/null and b/static/img/webinars/w-9-clickhouse-iceberg-write.png differ