<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied. See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# Getting Started with Apache Polaris and Ceph

## Overview

This guide describes how to spin up a **single-node Ceph cluster** with **RADOS Gateway (RGW)** for S3-compatible storage and configure it for use by **Polaris**.

This example cluster is configured for basic access-key authentication only.
It does not include STS (Security Token Service) or temporary credentials.
All access to Ceph RGW and the Polaris integration uses static S3-style credentials (created via `radosgw-admin user create`).
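
For reference, such a user might be created manually along these lines. The container name and user ID below are assumptions (the example's compose setup performs the real user creation), while the keys match the ones used later in this guide:

```shell
# Create an RGW user with static S3 keys (illustrative; see docker-compose.yml for the real setup).
docker exec --interactive --tty ceph-rgw1-1 radosgw-admin user create \
  --uid="polaris" \
  --display-name="Polaris" \
  --access-key="POLARIS123ACCESS" \
  --secret-key="POLARIS456SECRET"
```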

Spark is used as a query engine. This example assumes a local Spark installation.
See the [Spark Notebooks Example](../spark/README.md) for a more advanced Spark setup.

## Starting the Example

Before starting the Ceph + Polaris stack, you’ll need to configure environment variables that define network settings, credentials, and cluster IDs.
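
The actual variable names are defined in `.env.example`; the sketch below is purely illustrative (every name here is hypothetical), showing the kinds of settings involved:

```shell
# Hypothetical variable names -- consult .env.example for the real ones.
CEPH_CLUSTER_ID=b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e  # cluster fsid
CEPH_PUBLIC_NETWORK=172.20.0.0/16                     # container network CIDR
MON_IP=172.20.0.10                                    # monitor address
RGW_ACCESS_KEY=POLARIS123ACCESS                       # S3 access key used by Polaris
RGW_SECRET_KEY=POLARIS456SECRET                       # S3 secret key used by Polaris
```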

The services are started **in sequence**:
1. Monitor + Manager
2. OSD
3. RGW
4. Polaris

Note: this example pulls the `apache/polaris:latest` image, but assumes version `1.2.0-incubating` or later.

### 1. Copy the example environment file
```shell
cp .env.example .env
```

### 2. Start monitor and manager
```shell
docker compose up -d mon1 mgr
```

### 3. Start OSD
```shell
docker compose up -d osd1
```

### 4. Start RGW
```shell
docker compose up -d rgw1
```

#### Check status
```shell
docker exec --interactive --tty ceph-mon1-1 ceph -s
```

You should see something like:
```yaml
cluster:
  id: b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
  health: HEALTH_WARN
          mon is allowing insecure global_id reclaim
          1 monitors have not enabled msgr2
          6 pool(s) have no replicas configured

services:
  mon: 1 daemons, quorum mon1 (age 49m)
  mgr: mgr(active, since 94m)
  osd: 1 osds: 1 up (since 36m), 1 in (since 93m)
  rgw: 1 daemon active (1 hosts, 1 zones)
```

The `HEALTH_WARN` state is expected for this single-node demo: with one monitor, one OSD, and pools that have no replicas configured, Ceph cannot satisfy its default redundancy checks.

### 5. Create bucket for Polaris storage
```shell
docker compose up -d setup_bucket
```
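
You can optionally confirm that the bucket was created. The RGW container name below is an assumption based on this compose file's naming:

```shell
docker compose logs setup_bucket   # the job should report success and exit
docker exec --interactive --tty ceph-rgw1-1 radosgw-admin bucket list
```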

### 6. Run Polaris service
```shell
docker compose up -d polaris
```
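
To verify that Polaris came up cleanly before continuing:

```shell
docker compose ps polaris              # the container should be up (and healthy, if a healthcheck is defined)
docker compose logs --follow polaris   # watch the startup logs; Ctrl-C to stop following
```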

### 7. Set up the Polaris catalog
```shell
docker compose up -d polaris-setup
```
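
Once the setup job finishes, you can sanity-check the service by requesting a token with the root credentials; the token endpoint path below assumes the Polaris defaults:

```shell
curl -s http://localhost:8181/api/catalog/v1/oauth/tokens \
  --data-urlencode 'grant_type=client_credentials' \
  --data-urlencode 'client_id=root' \
  --data-urlencode 'client_secret=s3cr3t' \
  --data-urlencode 'scope=PRINCIPAL_ROLE:ALL'
```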

## 8. Connecting From Spark

```shell
bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.token-refresh-enabled=true \
  --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation="" \
  --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
  --conf spark.sql.catalog.polaris.client.region=irrelevant \
  --conf spark.sql.catalog.polaris.s3.access-key-id=POLARIS123ACCESS \
  --conf spark.sql.catalog.polaris.s3.secret-access-key=POLARIS456SECRET
```

Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.

Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example,
since Ceph does not require a specific region.

## 9. Running Queries

Run inside the Spark SQL shell:

```
spark-sql (default)> use polaris;
Time taken: 0.837 seconds

spark-sql ()> create namespace ns;
Time taken: 0.374 seconds

spark-sql ()> create table ns.t1 as select 'abc';
Time taken: 2.192 seconds

spark-sql ()> select * from ns.t1;
abc
Time taken: 0.579 seconds, Fetched 1 row(s)
```
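
To confirm that the table data actually landed in Ceph, you can inspect the buckets through RGW. The container name is an assumption based on this compose file's naming, and the bucket name will be whatever `setup_bucket` created:

```shell
docker exec --interactive --tty ceph-rgw1-1 radosgw-admin bucket list
# Then inspect the bucket that appears in the list, for example:
docker exec --interactive --tty ceph-rgw1-1 radosgw-admin bucket stats --bucket=<bucket-name>
```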

## Lack of Credential Vending

Notice that the Spark configuration sets the `X-Iceberg-Access-Delegation` header to an empty value, so no access delegation (credential vending) is requested.
This is because the example cluster does not include STS (Security Token Service) or temporary credentials.

The lack of an STS API is reflected in the catalog's storage configuration by the
`stsUnavailable=true` property.
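
For reference, here is a rough sketch of where this property sits when a catalog is created through the Polaris management API. The endpoint is the standard one, but the bucket name and the exact payload shape are assumptions (the `polaris-setup` step issues the real call):

```shell
# Hedged sketch: create a catalog whose S3 storage config marks STS as unavailable.
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "catalog": {
          "name": "quickstart_catalog",
          "type": "INTERNAL",
          "properties": { "default-base-location": "s3://<bucket-name>/" },
          "storageConfigInfo": {
            "storageType": "S3",
            "allowedLocations": ["s3://<bucket-name>/"],
            "stsUnavailable": true
          }
        }
      }'
```

`${TOKEN}` is a bearer token such as the one requested in step 7 above.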