Commit d227603

[python] Introduce where for Python CLI table read (apache#7389)
Read data from a Paimon table and display it in a tabular format.

```shell
paimon table read mydb.users
```

**Options:**

- `--select, -s`: Select specific columns to read (comma-separated)
- `--where, -w`: Filter condition in SQL-like syntax
- `--limit, -l`: Maximum number of results to display (default: 100)

**Examples:**

```shell
# Read with limit
paimon table read mydb.users -l 50

# Read specific columns
paimon table read mydb.users -s id,name,age

# Filter with WHERE clause
paimon table read mydb.users --where "age > 18"

# Combine select, where, and limit
paimon table read mydb.users -s id,name -w "age >= 20 AND city = 'Beijing'" -l 50
```

**WHERE Operators**

The `--where` option supports SQL-like filter expressions:

| Operator | Example |
|---|---|
| `=`, `!=`, `<>` | `name = 'Alice'` |
| `<`, `<=`, `>`, `>=` | `age > 18` |
| `IS NULL`, `IS NOT NULL` | `deleted_at IS NULL` |
| `IN (...)`, `NOT IN (...)` | `status IN ('active', 'pending')` |
| `BETWEEN ... AND ...` | `age BETWEEN 20 AND 30` |
| `LIKE` | `name LIKE 'A%'` |

Multiple conditions can be combined with `AND` and `OR` (AND has higher precedence). Parentheses are supported for grouping:

```shell
# AND condition
paimon table read mydb.users -w "age >= 20 AND age <= 30"

# OR condition
paimon table read mydb.users -w "city = 'Beijing' OR city = 'Shanghai'"

# Parenthesized grouping
paimon table read mydb.users -w "(age > 18 OR name = 'Bob') AND city = 'Beijing'"

# IN list
paimon table read mydb.users -w "city IN ('Beijing', 'Shanghai', 'Hangzhou')"

# BETWEEN
paimon table read mydb.users -w "age BETWEEN 25 AND 35"

# LIKE pattern
paimon table read mydb.users -w "name LIKE 'A%'"

# IS NULL / IS NOT NULL
paimon table read mydb.users -w "email IS NOT NULL"
```

Literal values are automatically cast to the appropriate Python type based on the table schema (e.g., `INT` fields cast to `int`, `DOUBLE` to `float`).

Output:

```
id  name     age  city
1   Alice    25   Beijing
2   Bob      30   Shanghai
3   Charlie  35   Guangzhou
4   David    28   Shenzhen
5   Eve      32   Hangzhou
```
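
The precedence rule above (`AND` binds tighter than `OR`) can be checked with a small sketch in plain Python. It does not call the CLI or pypaimon. It only mirrors the filter semantics on the first three sample rows from the output above, and the helper `ids` is a hypothetical name introduced for illustration.

```python
# First three sample rows from the example output.
rows = [
    {"id": 1, "name": "Alice", "age": 25, "city": "Beijing"},
    {"id": 2, "name": "Bob", "age": 30, "city": "Shanghai"},
    {"id": 3, "name": "Charlie", "age": 35, "city": "Guangzhou"},
]


def ids(predicate):
    """Return the ids of the rows for which the predicate holds (hypothetical helper)."""
    return [row["id"] for row in rows if predicate(row)]


# age > 18 OR name = 'Bob' AND city = 'Beijing'
# AND binds tighter, so this reads as: age > 18 OR (name = 'Bob' AND city = 'Beijing')
without_parens = ids(
    lambda r: r["age"] > 18 or (r["name"] == "Bob" and r["city"] == "Beijing")
)

# (age > 18 OR name = 'Bob') AND city = 'Beijing'
with_parens = ids(
    lambda r: (r["age"] > 18 or r["name"] == "Bob") and r["city"] == "Beijing"
)

print(without_parens)  # [1, 2, 3]
print(with_parens)     # [1]
```

Without parentheses every sample row passes because each has `age > 18`. With parentheses the `city` condition applies to the whole group, so only the Beijing row remains.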
1 parent 12e1b60 commit d227603

5 files changed: +1003 -10 lines changed

docs/content/pypaimon/cli.md

Lines changed: 46 additions & 2 deletions
@@ -83,6 +83,7 @@ paimon table read mydb.users
 **Options:**
 
 - `--select, -s`: Select specific columns to read (comma-separated)
+- `--where, -w`: Filter condition in SQL-like syntax
 - `--limit, -l`: Maximum number of results to display (default: 100)
 
 **Examples:**
@@ -94,10 +95,53 @@ paimon table read mydb.users -l 50
 # Read specific columns
 paimon table read mydb.users -s id,name,age
 
-# Combine select and limit
-paimon table read mydb.users -s id,name -l 50
+# Filter with WHERE clause
+paimon table read mydb.users --where "age > 18"
+
+# Combine select, where, and limit
+paimon table read mydb.users -s id,name -w "age >= 20 AND city = 'Beijing'" -l 50
 ```
 
+**WHERE Operators**
+
+The `--where` option supports SQL-like filter expressions:
+
+| Operator | Example |
+|---|---|
+| `=`, `!=`, `<>` | `name = 'Alice'` |
+| `<`, `<=`, `>`, `>=` | `age > 18` |
+| `IS NULL`, `IS NOT NULL` | `deleted_at IS NULL` |
+| `IN (...)`, `NOT IN (...)` | `status IN ('active', 'pending')` |
+| `BETWEEN ... AND ...` | `age BETWEEN 20 AND 30` |
+| `LIKE` | `name LIKE 'A%'` |
+
+Multiple conditions can be combined with `AND` and `OR` (AND has higher precedence). Parentheses are supported for grouping:
+
+```shell
+# AND condition
+paimon table read mydb.users -w "age >= 20 AND age <= 30"
+
+# OR condition
+paimon table read mydb.users -w "city = 'Beijing' OR city = 'Shanghai'"
+
+# Parenthesized grouping
+paimon table read mydb.users -w "(age > 18 OR name = 'Bob') AND city = 'Beijing'"
+
+# IN list
+paimon table read mydb.users -w "city IN ('Beijing', 'Shanghai', 'Hangzhou')"
+
+# BETWEEN
+paimon table read mydb.users -w "age BETWEEN 25 AND 35"
+
+# LIKE pattern
+paimon table read mydb.users -w "name LIKE 'A%'"
+
+# IS NULL / IS NOT NULL
+paimon table read mydb.users -w "email IS NOT NULL"
+```
+
+Literal values are automatically cast to the appropriate Python type based on the table schema (e.g., `INT` fields cast to `int`, `DOUBLE` to `float`).
+
 Output:
 ```
 id name age city

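The documentation change above says that literal values are cast to a Python type based on the table schema. A minimal sketch of that idea follows. The helper `cast_literal` is hypothetical and is not the parser added by this commit. Only the `INT` to `int` and `DOUBLE` to `float` mappings come from the text; the other type names and the string fallback are assumptions.

```python
def cast_literal(type_name: str, raw: str):
    """Hypothetical helper: convert a raw WHERE literal according to a column type name."""
    # Drop any length/precision suffix, e.g. "VARCHAR(20)" -> "VARCHAR".
    base = type_name.upper().split("(")[0].strip()
    if base in {"INT", "BIGINT"}:  # BIGINT is an assumed extra mapping
        return int(raw)
    if base in {"DOUBLE", "FLOAT"}:  # FLOAT is an assumed extra mapping
        return float(raw)
    # Assumed fallback: treat the literal as a string and strip one pair
    # of surrounding single quotes if present.
    if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        return raw[1:-1]
    return raw


age_literal = cast_literal("INT", "18")
score_literal = cast_literal("DOUBLE", "1.5")
name_literal = cast_literal("STRING", "'Alice'")
print(age_literal, score_literal, name_literal)
```

Casting against the schema lets a comparison such as `age > 18` use an integer rather than the string `"18"`.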
paimon-python/pypaimon/cli/cli_table.py

Lines changed: 45 additions & 8 deletions
@@ -63,21 +63,46 @@ def cmd_table_read(args):
     # Build read pipeline
     read_builder = table.new_read_builder()
 
-    # Apply projection (select columns) if specified
+    available_fields = set(field.name for field in table.table_schema.fields)
+
+    # Parse select and where options
     select_columns = args.select
+    where_clause = args.where
+    user_columns = None
+    extra_where_columns = []
+
     if select_columns:
         # Parse column names (comma-separated)
-        columns = [col.strip() for col in select_columns.split(',')]
-
+        user_columns = [col.strip() for col in select_columns.split(',')]
+
         # Validate that all columns exist in the table schema
-        available_fields = set(field.name for field in table.table_schema.fields)
-        invalid_columns = [col for col in columns if col not in available_fields]
-
+        invalid_columns = [col for col in user_columns if col not in available_fields]
         if invalid_columns:
             print(f"Error: Column(s) {invalid_columns} do not exist in table '{table_identifier}'.", file=sys.stderr)
             sys.exit(1)
-
-        read_builder = read_builder.with_projection(columns)
+
+    # When both select and where are specified, ensure where-referenced fields
+    # are included in the projection so the filter can work correctly.
+    if user_columns and where_clause:
+        from pypaimon.cli.where_parser import extract_fields_from_where
+        where_fields = extract_fields_from_where(where_clause, available_fields)
+        user_column_set = set(user_columns)
+        extra_where_columns = [f for f in where_fields if f not in user_column_set]
+        projection_columns = user_columns + extra_where_columns
+        read_builder = read_builder.with_projection(projection_columns)
+    elif user_columns:
+        read_builder = read_builder.with_projection(user_columns)
+
+    # Apply where filter if specified
+    if where_clause:
+        from pypaimon.cli.where_parser import parse_where_clause
+        try:
+            predicate = parse_where_clause(where_clause, table.table_schema.fields)
+            if predicate:
+                read_builder = read_builder.with_filter(predicate)
+        except ValueError as e:
+            print(f"Error: Invalid WHERE clause: {e}", file=sys.stderr)
+            sys.exit(1)
 
     # Apply limit if specified
     limit = args.limit
@@ -95,6 +120,11 @@ def cmd_table_read(args):
     df = read.to_pandas(splits)
     if limit and len(df) > limit:
         df = df.head(limit)
+
+    # Drop extra columns that were added only for where-clause filtering
+    if extra_where_columns:
+        df = df.drop(columns=extra_where_columns, errors='ignore')
+
     print(df.to_string(index=False))
 
 
@@ -550,6 +580,13 @@ def add_table_subcommands(table_parser):
         default=None,
         help='Select specific columns to read (comma-separated, e.g., "id,name,age")'
     )
+    read_parser.add_argument(
+        '--where', '-w',
+        type=str,
+        default=None,
+        help=('Filter condition in SQL-like syntax '
+              '(e.g., "age > 18", "name = \'Alice\' AND status IN (\'active\', \'pending\')")')
+    )
     read_parser.add_argument(
         '--limit', '-l',
         type=int,

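The projection handling in `cmd_table_read` above widens the projection with any columns the WHERE clause references, then drops them again before printing. The sketch below reproduces only that bookkeeping in plain Python. `naive_where_fields` is a hypothetical regex-based stand-in for `extract_fields_from_where` (whose implementation is not shown in this diff), and `plan_projection` is likewise an illustrative name.

```python
import re


def naive_where_fields(where_clause, available_fields):
    """Hypothetical stand-in for a WHERE field extractor (not pypaimon's parser)."""
    # Remove single-quoted string literals so their contents are not
    # mistaken for column names.
    without_strings = re.sub(r"'[^']*'", " ", where_clause)
    fields = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_strings):
        # Keep identifiers that name a real column, in first-seen order.
        if token in available_fields and token not in fields:
            fields.append(token)
    return fields


def plan_projection(user_columns, where_clause, available_fields):
    """Return (columns to read, extra columns to drop before display)."""
    where_fields = naive_where_fields(where_clause, available_fields)
    user_column_set = set(user_columns)
    extra = [f for f in where_fields if f not in user_column_set]
    return user_columns + extra, extra


available = {"id", "name", "age", "city"}
projection, extra = plan_projection(
    ["id", "name"], "age >= 20 AND city = 'Beijing'", available
)
print(projection)  # ['id', 'name', 'age', 'city']
print(extra)       # ['age', 'city']
```

In the diff the extra columns are removed with `df.drop(columns=extra_where_columns, errors='ignore')`, so the user sees only the columns requested via `--select`.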