Commit 886939c

[python] Introduce Python SQL to PyPaimon
1 parent bb00b63 commit 886939c

13 files changed, +885 -1 lines changed

.github/workflows/paimon-python-checks.yml

Lines changed: 13 additions & 1 deletion
@@ -71,6 +71,8 @@ jobs:
               build-essential \
               git \
               curl \
+              pkg-config \
+              libssl-dev \
             && apt-get clean \
             && rm -rf /var/lib/apt/lists/*
 
@@ -139,12 +141,22 @@ jobs:
         if: matrix.python-version != '3.6.15'
         shell: bash
         run: |
-          pip install maturin
+          pip install maturin[patchelf]
           git clone -b support_directory https://github.com/JingsongLi/tantivy-py.git /tmp/tantivy-py
           cd /tmp/tantivy-py
           maturin build --release
           pip install target/wheels/tantivy-*.whl
 
+      - name: Build and install pypaimon-rust from source
+        if: matrix.python-version != '3.6.15'
+        shell: bash
+        run: |
+          git clone https://github.com/apache/paimon-rust.git /tmp/paimon-rust
+          cd /tmp/paimon-rust/bindings/python
+          maturin build --release -o dist
+          pip install dist/pypaimon_rust-*.whl
+          pip install 'datafusion>=52'
+
       - name: Run lint-python.sh
         shell: bash
         run: |
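
Review note: the new workflow step builds pypaimon-rust and installs datafusion only when the Python version is not 3.6.15, so some environments will lack the SQL dependencies. A minimal sketch of a pre-flight check follows. The importable module names `pypaimon_rust` and `datafusion` are assumptions inferred from the wheel and package names, and `missing_modules` is a hypothetical helper, not part of this commit.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names for the optional SQL dependencies.
SQL_MODULES = ("pypaimon_rust", "datafusion")

if __name__ == "__main__":
    missing = missing_modules(SQL_MODULES)
    if missing:
        print("SQL support unavailable, missing: " + ", ".join(missing))
    else:
        print("SQL dependencies found")
```

A check like this could let SQL tests skip cleanly on interpreters where the Rust bindings were not built.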

docs/content/pypaimon/cli.md

Lines changed: 102 additions & 0 deletions
@@ -621,3 +621,105 @@ default
 mydb
 analytics
 ```
+
+## SQL Command
+
+Execute SQL queries on Paimon tables directly from the command line. This feature is powered by pypaimon-rust and DataFusion.
+
+**Prerequisites:**
+
+```shell
+pip install pypaimon[sql]
+```
+
+### One-Shot Query
+
+Execute a single SQL query and display the result:
+
+```shell
+paimon sql "SELECT * FROM users LIMIT 10"
+```
+
+Output:
+```
+id  name     age  city
+1   Alice    25   Beijing
+2   Bob      30   Shanghai
+3   Charlie  35   Guangzhou
+```
+
+**Options:**
+
+- `--format, -f`: Output format: `table` (default) or `json`
+
+**Examples:**
+
+```shell
+# Direct table name (uses default catalog and database)
+paimon sql "SELECT * FROM users"
+
+# Two-part: database.table
+paimon sql "SELECT * FROM mydb.users"
+
+# Query with aggregation and ordering
+paimon sql "SELECT city, COUNT(*) AS cnt FROM users GROUP BY city ORDER BY cnt DESC"
+
+# Output as JSON
+paimon sql "SELECT * FROM users LIMIT 5" --format json
+```
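
Review note: the `--format` option switches between a tabular and a JSON rendering of the same result. The real renderer presumably lives in `pypaimon.cli.cli_sql`, which is not shown in this part of the diff. The following is only a sketch of the idea: `format_rows` is a hypothetical helper, and plain Python lists stand in for the query result.

```python
import json
from typing import Any, List, Sequence


def format_rows(columns: Sequence[str], rows: Sequence[Sequence[Any]], fmt: str = "table") -> str:
    """Render rows either as a padded text table or as a JSON array of objects."""
    if fmt == "json":
        return json.dumps([dict(zip(columns, row)) for row in rows])
    if fmt != "table":
        raise ValueError(f"unsupported format: {fmt!r}")
    # Header line first, then one line per row, all cells as strings.
    cells: List[List[str]] = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(line[i]) for line in cells) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in cells
    )


columns = ["id", "name", "age", "city"]
rows = [[1, "Alice", 25, "Beijing"], [2, "Bob", 30, "Shanghai"]]
print(format_rows(columns, rows))
print(format_rows(columns, rows, fmt="json"))
```

The JSON branch emits one object per row keyed by column name. Whether the actual CLI uses that shape is not confirmed by the documentation above.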
+
+### Interactive REPL
+
+Start an interactive SQL session by running `paimon sql` without a query argument. The REPL supports arrow keys for line editing, and command history is persisted across sessions in `~/.paimon_history`.
+
+```shell
+paimon sql
+```
+
+Output:
+```
+    ____        _
+   / __ \____ _(_)___ ___  ____  ____
+  / /_/ / __ `/ / __ `__ \/ __ \/ __ \
+ / ____/ /_/ / / / / / / / /_/ / / / /
+/_/    \__,_/_/_/ /_/ /_/\____/_/ /_/
+
+Powered by pypaimon-rust + DataFusion
+Type 'help' for usage, 'exit' to quit.
+
+paimon> SHOW DATABASES;
+default
+mydb
+
+paimon> USE mydb;
+Using database 'mydb'.
+
+paimon> SHOW TABLES;
+orders
+users
+
+paimon> SELECT count(*) AS cnt
+     > FROM users
+     > WHERE age > 18;
+cnt
+42
+(1 row in 0.05s)
+
+paimon> exit
+Bye!
+```
+
+SQL statements end with `;` and can span multiple lines. The continuation prompt ` >` indicates that more input is expected.
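
Review note: the multi-line behaviour described above can be sketched as a small line buffer. This is an illustration under stated assumptions, not the REPL code from this commit. `StatementBuffer` is a hypothetical class, the exact continuation prompt string is a guess, and the end-of-statement test is deliberately naive: it treats any line ending in `;` as terminating, even inside a string literal.

```python
from typing import List, Optional

PROMPT = "paimon> "
CONTINUATION = "     > "  # assumed width; the docs only show a right-aligned '>'
BARE_COMMANDS = {"help", "exit", "quit"}


class StatementBuffer:
    """Collects input lines until a statement is complete."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    @property
    def prompt(self) -> str:
        # Show the continuation prompt while a statement is still open.
        return CONTINUATION if self._lines else PROMPT

    def feed(self, line: str) -> Optional[str]:
        """Add one line; return the full statement once complete, else None."""
        stripped = line.strip()
        # Bare commands need no ';' but only count at the start of a statement.
        if not self._lines and stripped.lower() in BARE_COMMANDS:
            return stripped.lower()
        if not stripped and not self._lines:
            return None
        self._lines.append(stripped)
        if stripped.endswith(";"):
            statement = " ".join(part for part in self._lines if part)
            self._lines = []
            return statement
        return None


buf = StatementBuffer()
for line in ["SELECT count(*) AS cnt", "FROM users", "WHERE age > 18;"]:
    result = buf.feed(line)
print(result)
```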
+
+**REPL Commands:**
+
+| Command | Description |
+|---|---|
+| `USE <database>;` | Switch the default database |
+| `SHOW DATABASES;` | List all databases |
+| `SHOW TABLES;` | List tables in the current database |
+| `SELECT ...;` | Execute a SQL query |
+| `help` | Show usage information |
+| `exit` / `quit` | Exit the REPL |
+
+For more details on SQL syntax and the Python API, see [SQL Query]({{< ref "pypaimon/sql" >}}).
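
Review note: the command table suggests that the REPL distinguishes a handful of built-in commands from general SQL. One way to do that, sketched here with a hypothetical `classify` function, is to match the built-ins first and hand everything else to the engine. Whether the real implementation intercepts `SHOW` itself or forwards it to DataFusion is not visible in this part of the diff.

```python
import re
from typing import Optional, Tuple

_USE = re.compile(r"^USE\s+([A-Za-z_][A-Za-z0-9_]*)\s*;$", re.IGNORECASE)
_SHOW = re.compile(r"^SHOW\s+(DATABASES|TABLES)\s*;$", re.IGNORECASE)


def classify(statement: str) -> Tuple[str, Optional[str]]:
    """Map a complete REPL input to a (kind, argument) pair."""
    text = statement.strip()
    if text.lower() in ("exit", "quit"):
        return ("exit", None)
    if text.lower() == "help":
        return ("help", None)
    match = _USE.match(text)
    if match:
        return ("use", match.group(1))
    match = _SHOW.match(text)
    if match:
        return ("show_" + match.group(1).lower(), None)
    # Anything else is handed to the SQL engine unchanged.
    return ("sql", text)


for text in ["USE mydb;", "SHOW TABLES;", "SELECT 1;", "quit"]:
    print(classify(text))
```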

docs/content/pypaimon/sql.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
+---
+title: "SQL Query"
+weight: 8
+type: docs
+aliases:
+- /pypaimon/sql.html
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# SQL Query
+
+PyPaimon supports executing SQL queries on Paimon tables, powered by [pypaimon-rust](https://github.com/apache/paimon-rust/tree/main/bindings/python) and [DataFusion](https://datafusion.apache.org/python/).
+
+## Installation
+
+SQL query support requires additional dependencies. Install them with:
+
+```shell
+pip install pypaimon[sql]
+```
+
+This will install `pypaimon-rust` and `datafusion`.
+
+## Usage
+
+Create a `SQLContext`, register one or more catalogs, and run SQL queries.
+
+### Basic Query
+
+```python
+from pypaimon import CatalogFactory, SQLContext
+
+catalog = CatalogFactory.create({"warehouse": "/path/to/warehouse"})
+
+ctx = SQLContext()
+ctx.register_catalog("paimon", catalog)
+ctx.set_current_catalog("paimon")
+ctx.set_current_database("default")
+
+# Execute SQL and get PyArrow Table
+table = ctx.sql("SELECT * FROM my_table")
+print(table)
+
+# Convert to Pandas DataFrame
+df = table.to_pandas()
+print(df)
+```
+
+### Table Reference Format
+
+The default catalog and default database can be configured via `set_current_catalog()` and `set_current_database()`, so you can reference tables in two ways:
+
+```python
+# Direct table name (uses default database)
+ctx.sql("SELECT * FROM my_table")
+
+# Two-part: database.table
+ctx.sql("SELECT * FROM mydb.my_table")
+```
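
Review note: the name resolution described here can be pictured as filling in missing parts from the current catalog and database. The sketch below is an illustration only, since the actual resolution is done inside the SQL engine. `resolve_table_ref` is a hypothetical function, and the three-part form is the one used in the Multi-Catalog Query section of this file.

```python
from typing import Tuple


def resolve_table_ref(ref: str, current_catalog: str, current_database: str) -> Tuple[str, str, str]:
    """Expand a one-, two-, or three-part name into (catalog, database, table)."""
    parts = ref.split(".")
    if not all(parts):
        raise ValueError(f"empty name part in {ref!r}")
    if len(parts) == 1:
        return (current_catalog, current_database, parts[0])
    if len(parts) == 2:
        return (current_catalog, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"too many name parts in {ref!r}")


print(resolve_table_ref("my_table", "paimon", "default"))
print(resolve_table_ref("mydb.my_table", "paimon", "default"))
```

The sketch ignores quoted identifiers, which a real SQL parser would have to handle.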
+
+### Filtering
+
+```python
+table = ctx.sql("""
+    SELECT id, name, age
+    FROM users
+    WHERE age > 18 AND city = 'Beijing'
+""")
+```
+
+### Aggregation
+
+```python
+table = ctx.sql("""
+    SELECT city, COUNT(*) AS cnt, AVG(age) AS avg_age
+    FROM users
+    GROUP BY city
+    ORDER BY cnt DESC
+""")
+```
+
+### Join
+
+```python
+table = ctx.sql("""
+    SELECT u.name, o.order_id, o.amount
+    FROM users u
+    JOIN orders o ON u.id = o.user_id
+    WHERE o.amount > 100
+""")
+```
+
+### Subquery
+
+```python
+table = ctx.sql("""
+    SELECT * FROM users
+    WHERE id IN (
+        SELECT user_id FROM orders
+        WHERE amount > 1000
+    )
+""")
+```
+
+### Cross-Database Query
+
+```python
+# Query a table in another database using two-part syntax
+table = ctx.sql("""
+    SELECT u.name, o.amount
+    FROM default.users u
+    JOIN analytics.orders o ON u.id = o.user_id
+""")
+```
+
+### Multi-Catalog Query
+
+`SQLContext` supports registering multiple catalogs for cross-catalog queries:
+
+```python
+from pypaimon import CatalogFactory, SQLContext
+
+catalog_a = CatalogFactory.create({"warehouse": "/path/to/warehouse_a"})
+catalog_b = CatalogFactory.create({
+    "metastore": "rest",
+    "uri": "http://localhost:8080",
+    "warehouse": "warehouse_b",
+})
+
+ctx = SQLContext()
+ctx.register_catalog("a", catalog_a)
+ctx.register_catalog("b", catalog_b)
+ctx.set_current_catalog("a")
+ctx.set_current_database("default")
+
+# Cross-catalog join
+table = ctx.sql("""
+    SELECT a_users.name, b_orders.amount
+    FROM a.default.users AS a_users
+    JOIN b.default.orders AS b_orders ON a_users.id = b_orders.user_id
+""")
+```
+
+## Supported SQL Syntax
+
+The SQL engine is powered by Apache DataFusion, which supports a rich set of SQL syntax including:
+
+- `SELECT`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, `LIMIT`
+- `JOIN` (INNER, LEFT, RIGHT, FULL, CROSS)
+- Subqueries and CTEs (`WITH`)
+- Aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, etc.)
+- Window functions (`ROW_NUMBER`, `RANK`, `LAG`, `LEAD`, etc.)
+- `UNION`, `INTERSECT`, `EXCEPT`
+
+For the full SQL reference, see the [DataFusion SQL documentation](https://datafusion.apache.org/user-guide/sql/index.html).
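
Review note: the new page lists CTEs and window functions but shows no query that uses them. The sketch below combines both in one statement. So that it runs without a Paimon warehouse, it uses the standard-library `sqlite3` module as a stand-in engine, which is not what PyPaimon uses. With PyPaimon the same query string would presumably be passed to `ctx.sql(...)`, and dialect differences between SQLite and DataFusion may apply. The sample rows are invented for the illustration.

```python
import sqlite3

# CTE (WITH) feeding a window function (RANK ... OVER).
QUERY = """
WITH city_counts AS (
    SELECT city, COUNT(*) AS cnt
    FROM users
    GROUP BY city
)
SELECT city, cnt, RANK() OVER (ORDER BY cnt DESC) AS city_rank
FROM city_counts
ORDER BY city_rank, city
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "Alice", 25, "Beijing"), (2, "Bob", 30, "Shanghai"), (3, "Charlie", 35, "Beijing")],
)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)
```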

paimon-python/pypaimon/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -35,4 +35,12 @@
     "Schema",
     "Tag",
     "TagManager",
+    "SQLContext",
 ]
+
+
+def __getattr__(name):
+    if name == "SQLContext":
+        from pypaimon.sql.sql_context import SQLContext
+        return SQLContext
+    raise AttributeError(f"module 'pypaimon' has no attribute {name}")
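
Review note: the hunk above uses a module-level `__getattr__` (PEP 562), presumably so that `import pypaimon` keeps working when the optional SQL dependencies are not installed and the import cost is paid only on first use of `SQLContext`. A self-contained sketch of the mechanism, using a throwaway module object and `decimal.Decimal` as stand-ins (both are illustrative choices, not part of the commit):

```python
import types

demo = types.ModuleType("lazy_demo")
import_log = []


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "Decimal":
        import_log.append(name)
        from decimal import Decimal  # deferred import
        return Decimal
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


# A function stored under '__getattr__' in the module namespace acts as the hook.
demo.__getattr__ = _lazy_getattr

Decimal = demo.Decimal  # triggers _lazy_getattr
print(Decimal("1.5") + 1)
print(import_log)
```

Because the hook does not cache its result on the module, it runs again on every access. In the commit's version that cost is a repeated `from ... import`, which is a cheap cache lookup after the first call.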

paimon-python/pypaimon/catalog/catalog.py

Lines changed: 6 additions & 0 deletions
@@ -293,3 +293,9 @@ def list_branches(self, identifier: Identifier) -> List[str]:
         raise NotImplementedError(
             "list_branches is not supported by this catalog."
         )
+
+    def _get_catalog_options_map(self) -> Dict[str, str]:
+        """Return catalog options as a string-to-string dict for pypaimon-rust."""
+        raise NotImplementedError(
+            "SQL query is not supported by this catalog."
+        )

paimon-python/pypaimon/catalog/filesystem_catalog.py

Lines changed: 3 additions & 0 deletions
@@ -48,6 +48,9 @@ def __init__(self, catalog_options: Options):
         self.catalog_options = catalog_options
         self.file_io = FileIO.get(self.warehouse, self.catalog_options)
 
+    def _get_catalog_options_map(self) -> dict:
+        return {str(k): str(v) for k, v in self.catalog_options.to_map().items()}
+
     def list_databases(self) -> list:
         statuses = self.file_io.list_status(self.warehouse)
         database_names = []
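
Review note: the base `Catalog` raises `NotImplementedError` and each concrete catalog overrides `_get_catalog_options_map` to return a string-to-string map. The pattern can be sketched in isolation as below. `Options`, `BaseCatalog` and `DemoCatalog` are hypothetical stand-ins, and only the `to_map()` call and the stringifying comprehension are taken from the diff.

```python
from typing import Any, Dict


class Options:
    """Hypothetical stand-in for pypaimon's Options; only to_map() is modelled."""

    def __init__(self, data: Dict[Any, Any]) -> None:
        self._data = dict(data)

    def to_map(self) -> Dict[Any, Any]:
        return dict(self._data)


class BaseCatalog:
    def _get_catalog_options_map(self) -> Dict[str, str]:
        # Catalogs that do not override this cannot be used for SQL.
        raise NotImplementedError("SQL query is not supported by this catalog.")


class DemoCatalog(BaseCatalog):
    def __init__(self, options: Options) -> None:
        self.catalog_options = options

    def _get_catalog_options_map(self) -> Dict[str, str]:
        # Stringify keys and values so the consumer sees a uniform str-to-str map.
        return {str(k): str(v) for k, v in self.catalog_options.to_map().items()}


demo_catalog = DemoCatalog(Options({"warehouse": "/path/to/warehouse", "retries": 3}))
print(demo_catalog._get_catalog_options_map())
```

Note that the overrides in this commit are annotated `-> dict` while the base method declares `Dict[str, str]`. Aligning the annotations would be a small consistency improvement.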

paimon-python/pypaimon/catalog/rest/rest_catalog.py

Lines changed: 3 additions & 0 deletions
@@ -72,6 +72,9 @@ def __init__(self, context: CatalogContext, config_required: Optional[bool] = Tr
         from pypaimon.catalog.rest.fuse_support import FusePathResolver
         self._fuse_resolver = FusePathResolver(self.context.options, self.rest_api)
 
+    def _get_catalog_options_map(self) -> dict:
+        return {str(k): str(v) for k, v in self.context.options.to_map().items()}
+
     def catalog_loader(self):
         """
         Create and return a RESTCatalogLoader for this catalog.

paimon-python/pypaimon/cli/cli.py

Lines changed: 4 additions & 0 deletions
@@ -121,6 +121,10 @@ def main():
     from pypaimon.cli.cli_catalog import add_catalog_subcommands
     add_catalog_subcommands(catalog_parser)
 
+    # SQL command
+    from pypaimon.cli.cli_sql import add_sql_subcommand
+    add_sql_subcommand(subparsers)
+
     args = parser.parse_args()
 
     if args.command is None:
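
Review note: `add_sql_subcommand` is imported from `pypaimon.cli.cli_sql`, whose body is not shown in this part of the diff. Based only on the documented behaviour (an optional query argument and `--format, -f` with `table` or `json`), a registration function could look roughly like the sketch below. The argument names, help texts and `dest` value are assumptions.

```python
import argparse


def add_sql_subcommand(subparsers) -> None:
    """Register a 'sql' subcommand (illustrative; not the real cli_sql code)."""
    sql_parser = subparsers.add_parser("sql", help="Run a SQL query or start the REPL")
    # An omitted query means "start the interactive REPL".
    sql_parser.add_argument("query", nargs="?", default=None, help="SQL text to execute")
    sql_parser.add_argument("--format", "-f", choices=["table", "json"], default="table")


parser = argparse.ArgumentParser(prog="paimon")
subparsers = parser.add_subparsers(dest="command")
add_sql_subcommand(subparsers)

args = parser.parse_args(["sql", "SELECT * FROM users LIMIT 5", "--format", "json"])
print(args.command, args.format, args.query)
```

Importing `cli_sql` inside `main()`, as the commit does, keeps the import local to the CLI entry point in the same style as the existing `cli_catalog` import.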
