Commit 31c7784

Add Docker-based Ceph + Polaris cluster setup (#3022)

Authored by sharas2050 and sarunas.svegzda
Co-authored-by: sarunas.svegzda <[email protected]>

1 parent bf2261d

File tree: 4 files changed (+430, -0 lines)

getting-started/ceph/.env.example

Lines changed: 32 additions & 0 deletions
```shell
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

LANG=en_US.utf8                            # Default system locale used inside containers
TZ=UTC                                     # Timezone used inside containers
DASHBOARD_PORT=8443                        # Port for Ceph Dashboard
INTERNAL_DASHBOARD_PORT=8443               # Internal port for Ceph Dashboard
RGW_PORT=8080                              # Port for Rados Gateway
MON_IP=$(hostname -i)                      # IP address of the monitor
RGW_ACCESS_KEY=POLARIS123ACCESS            # Access key for Polaris S3 user
RGW_SECRET_KEY=POLARIS456SECRET            # Secret key for Polaris S3 user
FSID=b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e  # Unique cluster identifier (use `uuidgen` to regenerate)
OSD_UUID_1=80505106-0d32-4777-bac9-3dfc901b1273  # Unique OSD identifier (use `uuidgen` to regenerate)
S3_ENDPOINT_URL=http://rgw1:7480           # Internal endpoint for S3-compatible RGW service
S3_REGION=us-east-1                        # S3 region name
S3_POLARIS_BUCKET=polaris-storage          # Default S3 bucket name for Polaris storage
```

getting-started/ceph/README.md

Lines changed: 148 additions & 0 deletions
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied. See the License for the
  specific language governing permissions and limitations
  under the License.
-->
# Getting Started with Apache Polaris and Ceph

## Overview

This guide describes how to spin up a **single-node Ceph cluster** with **RADOS Gateway (RGW)** for S3-compatible storage and configure it for use by **Polaris**.

This example cluster is configured for basic access-key authentication only.
It does not include STS (Security Token Service) or temporary credentials.
All access to the Ceph RGW and the Polaris integration uses static S3-style credentials, as created via `radosgw-admin user create`.

Spark is used as the query engine. This example assumes a local Spark installation.
See the [Spark Notebooks Example](../spark/README.md) for a more advanced Spark setup.
## Starting the Example

Before starting the Ceph + Polaris stack, you’ll need to configure environment variables that define network settings, credentials, and cluster IDs.

The services are started **in sequence**:

1. Monitor + Manager
2. OSD
3. RGW
4. Polaris

Note: this example pulls the `apache/polaris:latest` image, but assumes the image is version `1.2.0-incubating` or later.

### 1. Copy the example environment file

```shell
cp .env.example .env
```
### 2. Start monitor and manager

```shell
docker compose up -d mon1 mgr
```

### 3. Start OSD

```shell
docker compose up -d osd1
```

### 4. Start RGW

```shell
docker compose up -d rgw1
```

#### Check status

```shell
docker exec --interactive --tty ceph-mon1-1 ceph -s
```

You should see something like:

```yaml
  cluster:
    id:     b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
    health: HEALTH_WARN
            mon is allowing insecure global_id reclaim
            1 monitors have not enabled msgr2
            6 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum mon1 (age 49m)
    mgr: mgr(active, since 94m)
    osd: 1 osds: 1 up (since 36m), 1 in (since 93m)
    rgw: 1 daemon active (1 hosts, 1 zones)
```
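If you want to script this check rather than read the output by hand, the health field can be extracted with `awk`. A sketch using a captured sample of the output above (on a live cluster you would populate `status` from the `docker exec ... ceph -s` command instead):

```shell
# Sample `ceph -s` output; on a live cluster use:
#   status="$(docker exec ceph-mon1-1 ceph -s)"
status='  cluster:
    id:     b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
    health: HEALTH_WARN'

# Print the value of the "health:" field (HEALTH_OK / HEALTH_WARN / HEALTH_ERR).
health="$(printf '%s\n' "$status" | awk '$1 == "health:" {print $2}')"
echo "$health"   # prints HEALTH_WARN for the sample above
```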
### 5. Create the bucket for Polaris storage

```shell
docker compose up -d setup_bucket
```

### 6. Run the Polaris service

```shell
docker compose up -d polaris
```

### 7. Set up the Polaris catalog

```shell
docker compose up -d polaris-setup
```
99+
## 8. Connecting From Spark
100+
101+
```shell
102+
bin/spark-sql \
103+
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
104+
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
105+
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
106+
--conf spark.sql.catalog.polaris.type=rest \
107+
--conf spark.sql.catalog.polaris.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
108+
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
109+
--conf spark.sql.catalog.polaris.token-refresh-enabled=true \
110+
--conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
111+
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
112+
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation="" \
113+
--conf spark.sql.catalog.polaris.credential=root:s3cr3t \
114+
--conf spark.sql.catalog.polaris.client.region=irrelevant \
115+
--conf spark.sql.catalog.polaris.s3.access-key-id=POLARIS123ACCESS \
116+
--conf spark.sql.catalog.polaris.s3.secret-access-key=POLARIS456SECRET
117+
```
118+
119+
Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
120+
121+
Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
122+
since Ceph does not require a specific region.
123+
124+
## 9. Running Queries

Run inside the Spark SQL shell:

```
spark-sql (default)> use polaris;
Time taken: 0.837 seconds

spark-sql ()> create namespace ns;
Time taken: 0.374 seconds

spark-sql ()> create table ns.t1 as select 'abc';
Time taken: 2.192 seconds

spark-sql ()> select * from ns.t1;
abc
Time taken: 0.579 seconds, Fetched 1 row(s)
```
## Lack of Credential Vending

Notice that the Spark configuration sets the `X-Iceberg-Access-Delegation` header to an empty value,
so no access delegation (credential vending) is requested from Polaris.
This is because the example cluster does not include STS (Security Token Service) or temporary credentials.

The lack of an STS API is represented in the catalog storage configuration by the
`stsUnavailable=true` property.
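For orientation, this is roughly the shape of a catalog creation payload with STS disabled. Everything below is an illustrative sketch based on Polaris's catalog management API, not the exact payload used by the `polaris-setup` service; check the field names and values against that service's script in the compose file:

```json
{
  "name": "quickstart_catalog",
  "type": "INTERNAL",
  "properties": {
    "default-base-location": "s3://polaris-storage"
  },
  "storageConfigInfo": {
    "storageType": "S3",
    "allowedLocations": ["s3://polaris-storage"],
    "endpoint": "http://rgw1:7480",
    "pathStyleAccess": true,
    "stsUnavailable": true
  }
}
```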
Lines changed: 41 additions & 0 deletions
```ini
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

[global]
fsid = b2f59c4b-5f14-4f8c-a9b7-3b7998c76a0e
mon_initial_members = mon1
mon_host = mon1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_size = 1
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 333
osd_crush_chooseleaf_type = 1
mon_allow_pool_size_one = true

[mon.mon1]
mon_data = /var/lib/ceph/mon/ceph-mon1
mon_rocksdb_min_wal_logs = 1
mon_rocksdb_max_total_wal_size = 64M
mon_rocksdb_options = max_background_compactions=4;max_background_flushes=2

[client.rgw1]
host = ceph-rgw1
rgw_frontends = civetweb port=7480
```
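A version caveat on the `rgw_frontends` line: civetweb was deprecated in favor of the beast frontend (beast has been the RGW default since Ceph Nautilus, and civetweb was removed in later releases), so with a newer Ceph image the equivalent line would use beast instead. Illustrative only; the image pinned in the compose file determines which frontend is valid:

```ini
[client.rgw1]
host = ceph-rgw1
rgw_frontends = beast port=7480
```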
