Skip to content

Commit f2b2c4f

Browse files
committed
[python] Introduce Python SQL to PyPaimon
1 parent bb00b63 commit f2b2c4f

File tree

13 files changed

+923
-1
lines changed

13 files changed

+923
-1
lines changed

.github/workflows/paimon-python-checks.yml

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,8 @@ jobs:
7171
build-essential \
7272
git \
7373
curl \
74+
pkg-config \
75+
libssl-dev \
7476
&& apt-get clean \
7577
&& rm -rf /var/lib/apt/lists/*
7678
@@ -139,12 +141,22 @@ jobs:
139141
if: matrix.python-version != '3.6.15'
140142
shell: bash
141143
run: |
142-
pip install maturin
144+
pip install maturin[patchelf]
143145
git clone -b support_directory https://github.com/JingsongLi/tantivy-py.git /tmp/tantivy-py
144146
cd /tmp/tantivy-py
145147
maturin build --release
146148
pip install target/wheels/tantivy-*.whl
147149
150+
- name: Build and install pypaimon-rust from source
151+
if: matrix.python-version != '3.6.15'
152+
shell: bash
153+
run: |
154+
git clone https://github.com/apache/paimon-rust.git /tmp/paimon-rust
155+
cd /tmp/paimon-rust/bindings/python
156+
maturin build --release -o dist
157+
pip install dist/pypaimon_rust-*.whl
158+
pip install 'datafusion>=52'
159+
148160
- name: Run lint-python.sh
149161
shell: bash
150162
run: |

docs/content/pypaimon/cli.md

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -621,3 +621,105 @@ default
621621
mydb
622622
analytics
623623
```
624+
625+
## SQL Command
626+
627+
Execute SQL queries on Paimon tables directly from the command line. This feature is powered by pypaimon-rust and DataFusion.
628+
629+
**Prerequisites:**
630+
631+
```shell
632+
pip install pypaimon[sql]
633+
```
634+
635+
### One-Shot Query
636+
637+
Execute a single SQL query and display the result:
638+
639+
```shell
640+
paimon sql "SELECT * FROM users LIMIT 10"
641+
```
642+
643+
Output:
644+
```
645+
id name age city
646+
1 Alice 25 Beijing
647+
2 Bob 30 Shanghai
648+
3 Charlie 35 Guangzhou
649+
```
650+
651+
**Options:**
652+
653+
- `--format, -f`: Output format: `table` (default) or `json`
654+
655+
**Examples:**
656+
657+
```shell
658+
# Direct table name (uses default catalog and database)
659+
paimon sql "SELECT * FROM users"
660+
661+
# Two-part: database.table
662+
paimon sql "SELECT * FROM mydb.users"
663+
664+
# Query with filter and aggregation
665+
paimon sql "SELECT city, COUNT(*) AS cnt FROM users GROUP BY city ORDER BY cnt DESC"
666+
667+
# Output as JSON
668+
paimon sql "SELECT * FROM users LIMIT 5" --format json
669+
```
670+
671+
### Interactive REPL
672+
673+
Start an interactive SQL session by running `paimon sql` without a query argument. The REPL supports arrow keys for line editing, and command history is persisted across sessions in `~/.paimon_history`.
674+
675+
```shell
676+
paimon sql
677+
```
678+
679+
Output:
680+
```
681+
____ _
682+
/ __ \____ _(_)___ ___ ____ ____
683+
/ /_/ / __ `/ / __ `__ \/ __ \/ __ \
684+
/ ____/ /_/ / / / / / / / /_/ / / / /
685+
/_/ \__,_/_/_/ /_/ /_/\____/_/ /_/
686+
687+
Powered by pypaimon-rust + DataFusion
688+
Type 'help' for usage, 'exit' to quit.
689+
690+
paimon> SHOW DATABASES;
691+
default
692+
mydb
693+
694+
paimon> USE mydb;
695+
Using database 'mydb'.
696+
697+
paimon> SHOW TABLES;
698+
orders
699+
users
700+
701+
paimon> SELECT count(*) AS cnt
702+
> FROM users
703+
> WHERE age > 18;
704+
cnt
705+
42
706+
(1 row in 0.05s)
707+
708+
paimon> exit
709+
Bye!
710+
```
711+
712+
SQL statements end with `;` and can span multiple lines. The continuation prompt ` >` indicates that more input is expected.
713+
714+
**REPL Commands:**
715+
716+
| Command | Description |
717+
|---|---|
718+
| `USE <database>;` | Switch the default database |
719+
| `SHOW DATABASES;` | List all databases |
720+
| `SHOW TABLES;` | List tables in the current database |
721+
| `SELECT ...;` | Execute a SQL query |
722+
| `help` | Show usage information |
723+
| `exit` / `quit` | Exit the REPL |
724+
725+
For more details on SQL syntax and the Python API, see [SQL Query]({{< ref "pypaimon/sql" >}}).

docs/content/pypaimon/sql.md

Lines changed: 156 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,156 @@
1+
---
2+
title: "SQL Query"
3+
weight: 8
4+
type: docs
5+
aliases:
6+
- /pypaimon/sql.html
7+
---
8+
9+
<!--
10+
Licensed to the Apache Software Foundation (ASF) under one
11+
or more contributor license agreements. See the NOTICE file
12+
distributed with this work for additional information
13+
regarding copyright ownership. The ASF licenses this file
14+
to you under the Apache License, Version 2.0 (the
15+
"License"); you may not use this file except in compliance
16+
with the License. You may obtain a copy of the License at
17+
18+
http://www.apache.org/licenses/LICENSE-2.0
19+
20+
Unless required by applicable law or agreed to in writing,
21+
software distributed under the License is distributed on an
22+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
23+
KIND, either express or implied. See the License for the
24+
specific language governing permissions and limitations
25+
under the License.
26+
-->
27+
28+
# SQL Query
29+
30+
PyPaimon supports executing SQL queries on Paimon tables, powered by [pypaimon-rust](https://github.com/apache/paimon-rust/tree/main/bindings/python) and [DataFusion](https://datafusion.apache.org/python/).
31+
32+
## Installation
33+
34+
SQL query support requires additional dependencies. Install them with:
35+
36+
```shell
37+
pip install pypaimon[sql]
38+
```
39+
40+
This will install `pypaimon-rust` and `datafusion`.
41+
42+
## Usage
43+
44+
SQL query capability is available directly on the `Catalog` object via the `sql()` and `sql_to_pandas()` methods.
45+
46+
### Basic Query
47+
48+
```python
49+
from pypaimon import CatalogFactory
50+
51+
catalog = CatalogFactory.create({"warehouse": "/path/to/warehouse"})
52+
53+
# Execute SQL and get a PyArrow Table
54+
result = catalog.sql("SELECT * FROM my_table")
55+
print(result)
56+
57+
# Execute SQL and get a Pandas DataFrame
58+
df = catalog.sql_to_pandas("SELECT * FROM my_table")
59+
print(df)
60+
```
61+
62+
### Table Reference Format
63+
64+
The default catalog and default database (`default`) are pre-configured, so you can reference tables in two ways:
65+
66+
```python
67+
# Direct table name (uses default database)
68+
catalog.sql("SELECT * FROM my_table")
69+
70+
# Two-part: database.table
71+
catalog.sql("SELECT * FROM mydb.my_table")
72+
```
73+
74+
### Filtering
75+
76+
```python
77+
result = catalog.sql("""
78+
SELECT id, name, age
79+
FROM users
80+
WHERE age > 18 AND city = 'Beijing'
81+
""")
82+
```
83+
84+
### Aggregation
85+
86+
```python
87+
result = catalog.sql("""
88+
SELECT city, COUNT(*) AS cnt, AVG(age) AS avg_age
89+
FROM users
90+
GROUP BY city
91+
ORDER BY cnt DESC
92+
""")
93+
```
94+
95+
### Join
96+
97+
```python
98+
result = catalog.sql("""
99+
SELECT u.name, o.order_id, o.amount
100+
FROM users u
101+
JOIN orders o ON u.id = o.user_id
102+
WHERE o.amount > 100
103+
""")
104+
```
105+
106+
### Subquery
107+
108+
```python
109+
result = catalog.sql("""
110+
SELECT * FROM users
111+
WHERE id IN (
112+
SELECT user_id FROM orders
113+
WHERE amount > 1000
114+
)
115+
""")
116+
```
117+
118+
### Cross-Database Query
119+
120+
```python
121+
# Query a table in another database using two-part syntax
122+
result = catalog.sql("""
123+
SELECT u.name, o.amount
124+
FROM default.users u
125+
JOIN analytics.orders o ON u.id = o.user_id
126+
""")
127+
```
128+
129+
## REST Catalog
130+
131+
SQL query also works with REST catalogs:
132+
133+
```python
134+
from pypaimon import CatalogFactory
135+
136+
catalog = CatalogFactory.create({
137+
"metastore": "rest",
138+
"uri": "http://localhost:8080",
139+
"warehouse": "my_warehouse",
140+
})
141+
142+
result = catalog.sql("SELECT * FROM my_table LIMIT 10")
143+
```
144+
145+
## Supported SQL Syntax
146+
147+
The SQL engine is powered by Apache DataFusion, which supports a rich set of SQL syntax including:
148+
149+
- `SELECT`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, `LIMIT`
150+
- `JOIN` (INNER, LEFT, RIGHT, FULL, CROSS)
151+
- Subqueries and CTEs (`WITH`)
152+
- Aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, etc.)
153+
- Window functions (`ROW_NUMBER`, `RANK`, `LAG`, `LEAD`, etc.)
154+
- `UNION`, `INTERSECT`, `EXCEPT`
155+
156+
For the full SQL reference, see the [DataFusion SQL documentation](https://datafusion.apache.org/user-guide/sql/index.html).

paimon-python/pypaimon/__init__.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,4 +35,12 @@
3535
"Schema",
3636
"Tag",
3737
"TagManager",
38+
"SQLContext",
3839
]
40+
41+
42+
def __getattr__(name):
43+
if name == "SQLContext":
44+
from pypaimon.sql.sql_context import SQLContext
45+
return SQLContext
46+
raise AttributeError(f"module 'pypaimon' has no attribute {name}")

paimon-python/pypaimon/catalog/catalog.py

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -293,3 +293,37 @@ def list_branches(self, identifier: Identifier) -> List[str]:
293293
raise NotImplementedError(
294294
"list_branches is not supported by this catalog."
295295
)
296+
297+
@abstractmethod
298+
def _get_catalog_options_map(self) -> Dict[str, str]:
299+
"""Return catalog options as a string-to-string dict for pypaimon-rust."""
300+
301+
def _get_sql_context(self):
302+
if self._sql_context is None:
303+
from pypaimon.sql.sql_context import SQLContext
304+
self._sql_context = SQLContext(self._get_catalog_options_map())
305+
return self._sql_context
306+
307+
def sql(self, query: str):
308+
"""Execute a SQL query and return the result as a PyArrow Table.
309+
310+
Requires pypaimon-rust and datafusion. Install with: pip install pypaimon[sql]
311+
312+
Tables can be referenced by name directly or with two-part (database.table) syntax.
313+
314+
Example::
315+
316+
catalog = CatalogFactory.create({"warehouse": "/path/to/warehouse"})
317+
# Direct table name (uses default database)
318+
catalog.sql("SELECT * FROM my_table WHERE id > 10")
319+
# Two-part: database.table
320+
catalog.sql("SELECT * FROM mydb.my_table")
321+
"""
322+
return self._get_sql_context().sql(query)
323+
324+
def sql_to_pandas(self, query: str):
325+
"""Execute a SQL query and return the result as a Pandas DataFrame.
326+
327+
Requires pypaimon-rust and datafusion. Install with: pip install pypaimon[sql]
328+
"""
329+
return self._get_sql_context().sql_to_pandas(query)

paimon-python/pypaimon/catalog/filesystem_catalog.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,10 @@ def __init__(self, catalog_options: Options):
4747
self.warehouse = catalog_options.get(CatalogOptions.WAREHOUSE)
4848
self.catalog_options = catalog_options
4949
self.file_io = FileIO.get(self.warehouse, self.catalog_options)
50+
self._sql_context = None
51+
52+
def _get_catalog_options_map(self) -> dict:
53+
return {str(k): str(v) for k, v in self.catalog_options.to_map().items()}
5054

5155
def list_databases(self) -> list:
5256
statuses = self.file_io.list_status(self.warehouse)

paimon-python/pypaimon/catalog/rest/rest_catalog.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,10 +68,14 @@ def __init__(self, context: CatalogContext, config_required: Optional[bool] = Tr
6868
# FUSE support (lazy import only when enabled)
6969
self.fuse_enabled = self.context.options.get(FuseOptions.FUSE_ENABLED, False)
7070
self._fuse_resolver = None
71+
self._sql_context = None
7172
if self.fuse_enabled:
7273
from pypaimon.catalog.rest.fuse_support import FusePathResolver
7374
self._fuse_resolver = FusePathResolver(self.context.options, self.rest_api)
7475

76+
def _get_catalog_options_map(self) -> dict:
77+
return {str(k): str(v) for k, v in self.context.options.to_map().items()}
78+
7579
def catalog_loader(self):
7680
"""
7781
Create and return a RESTCatalogLoader for this catalog.

paimon-python/pypaimon/cli/cli.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -121,6 +121,10 @@ def main():
121121
from pypaimon.cli.cli_catalog import add_catalog_subcommands
122122
add_catalog_subcommands(catalog_parser)
123123

124+
# SQL command
125+
from pypaimon.cli.cli_sql import add_sql_subcommand
126+
add_sql_subcommand(subparsers)
127+
124128
args = parser.parse_args()
125129

126130
if args.command is None:

0 commit comments

Comments
 (0)