Commit ff91833

Add TCB 68 about Python user-defined functions
1 parent 87b36d4 commit ff91833

3 files changed

+196
-4
lines changed

_episodes/68.md

+196
@@ -0,0 +1,196 @@
1+
---
2+
layout: episode
3+
title: "68: Year of the Snake - Python UDFs"
4+
date: 2025-01-16
5+
tags: trino python udf user-defined function
6+
youtube_id: "IjncfSBqhbY"
7+
wistia_id: "m6vd7uxd1z"
8+
sections:
9+
- time: 0:00
10+
title: Introduction with Cole, Manfred, and David
11+
- time: 1:10
12+
title: Releases 465-468
13+
- time: 10:47
14+
title: Trino 469 progress and Trino Gateway
15+
- time: 13:00
16+
title: Trino Summit recap
17+
- time: 16:45
18+
title: Python UDF history, architecture, and details
19+
- time: 26:00
20+
title: UDFs in Trino - SQL towards Python
21+
- time: 30:20
22+
title: Python UDF syntax details, performance, and lower level details
23+
- time: 48:22
24+
title: Question about table function UDFs
25+
- time: 51:58
26+
title: How to test and provide feedback
27+
- time: 54:15
28+
title: Trino in 2024 and user survey
29+
- time: 56:00
30+
title: Rounding out
31+
introduction: |
32+
Manfred and Cole are joined by David Phillips to talk about the new support of
33+
user-defined functions written in Python. We discuss motivation, development
34+
history, dive into implementation details, and explore some examples.
35+
---
36+
37+
## Hosts
38+
39+
* [Manfred Moser](https://www.linkedin.com/in/manfredmoser), Director/Open
40+
Source Engineering and Trino maintainer at
41+
[Starburst]({{site.url}}/users.html#starburst) -
42+
[@simpligility](https://x.com/simpligility)
43+
* [Cole Bowden](https://www.linkedin.com/in/cole-m-bowden), Developer Advocate
44+
at [Firebolt](https://www.firebolt.io/)
45+
46+
## Guests
47+
48+
* [David Phillips](https://github.com/electrum), Trino co-creator and maintainer
49+
50+
## Releases
51+
52+
The following are some highlights of the Trino releases since episode 67:
53+
54+
[Trino 465]({{site.baseurl}}/docs/current/release/release-465.html)
55+
56+
* Add support for customer-provided SSE key in S3 file system relevant for Hive,
57+
Iceberg, Delta Lake and Hudi connectors.
58+
* Deterministic data, locale support, and `random_string` function for the Faker
59+
connector.
60+
* Add support for `extra_properties` in the Iceberg connector.
61+
* Add support for the `geometry` type in the PostgreSQL connector.
62+
63+
[Trino 466]({{site.baseurl}}/docs/current/release/release-466.html)
64+
65+
* Remove Python requirement for Trino by replacing the `launcher` script.
66+
* Improve client protocol throughput by introducing the spooling protocol and
67+
ship it with documentation, including implementation in the JDBC driver and
68+
the CLI.
69+
* Add support for data access control with Apache Ranger, including support for
70+
column masking, row filtering, and audit logging.
71+
72+
[Trino 467]({{site.baseurl}}/docs/current/release/release-467.html)
73+
74+
* Change default for internal communication to HTTP/1.1.
75+
* Add support for OpenTelemetry tracing to the HTTP, Kafka, and MySQL event
76+
listeners.
77+
* Remove the `microdnf` package manager from the Docker image.
78+
* Add the `$all_manifests` metadata tables in the Iceberg connector.
79+
* Add the `$transactions` metadata table in the Delta Lake connector.
80+
81+
[Trino 468]({{site.baseurl}}/docs/current/release/release-468.html)
82+
83+
* Add [Python user-defined functions]({{site.baseurl}}/docs/current/udf/python.html).
84+
* Rename SQL routines to SQL user-defined functions.
85+
* Add cluster overview to the Preview Web UI.
86+
* Improve bucket execution for Hive and Iceberg.
87+
* Add support for non-transactional `MERGE` statements for PostgreSQL.
88+
89+
As always, numerous performance improvements, bug fixes, and other features were
90+
added as well.
91+
92+
## Other news
93+
94+
* [Trino Gateway 13](https://trinodb.github.io/trino-gateway/release-notes/#13)
95+
* [Trino Summit recap]({% post_url 2024-12-18-trino-summit-2024-quick-recap %})
96+
* [Trino in 2024 and beyond]({% post_url 2025-01-07-2024-and-beyond %}), answer
97+
our survey!
98+
* December 2024 Trino maintainer and contributor calls took place virtually.
99+
* Trino Python client 0.332.0 includes support for spooling mode of client
100+
protocol.
101+
102+
## User-defined functions in Trino
103+
104+
First there were [custom plugins with user-defined
105+
functions]({{site.baseurl}}/docs/current/develop/functions.html), and for a long
106+
time, that was all there was.
107+
108+
In 2023, David contributed SQL user-defined functions, also known as SQL
109+
routines, and we ran a [competition for examples]({% post_url
110+
2023-11-09-routines %}). Manfred wrote the docs and did a [training session with
111+
Dain and Martin]({% post_url 2023-11-29-sql-training-4 %}). And even back then,
112+
David had plans to add other languages, and started working on Python.
113+
114+
At [Trino Summit in 2024]({% post_url 2024-12-18-trino-summit-2024-quick-recap
115+
%}) Martin Traverso announced the new upcoming feature in the keynote, and with
116+
[Trino 468]({{site.baseurl}}/docs/current/release/release-468.html) we shipped
117+
support for [Python user-defined functions]({{site.baseurl}}/docs/current/udf/python.html).
118+
119+
## Motivation
120+
121+
Why support Python for user-defined functions, as compared to just SQL? Simply
122+
put, more is better, and Python is everywhere. We chat with David about the
123+
details.
124+
125+
## Development history and collaboration
126+
127+
David tells us more about figuring out how to make it work at all. He touches
128+
on topics such as security, performance, deployment, monitoring, and
129+
collaboration with other projects. We also talk about why other approaches like
130+
using local CPython were discarded.
131+
132+
## Architecture and consequences
133+
134+
In this discussion we try to cover the following topics:
135+
136+
* How does it all work?
137+
* What are some restrictions?
138+
* What performance can users expect?
139+
140+
Let's chat about this nesting:
141+
142+
<img src="{{site.baseurl}}/assets/episode/tcb68-python-udf-architecture.png">
143+
144+
## Examples and demo
145+
146+
A simple example from the documentation:
147+
148+
```sql
FUNCTION python_udf_name(input_parameter data_type)
  RETURNS result_data_type
  LANGUAGE PYTHON
  WITH (handler = 'python_function')
  AS $$
  ...
  def python_function(input):
      return ...
  ...
  $$
```
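
To make the template above more concrete, here is a minimal sketch of one way it could be filled in. The handler is ordinary Python, so its logic can be tried locally before it is embedded in a UDF declaration. The names `twice`, `doubleup`, and `INLINE_UDF_QUERY` are illustrative assumptions, not taken from the episode, and the snippet does not connect to a Trino cluster.

```python
# Handler logic as ordinary Python; `twice` is an illustrative name.
def twice(a):
    return a * 2


# The same handler embedded in an inline Trino UDF declaration.
# `doubleup` is a hypothetical UDF name. The string is only printed here;
# it is not sent to a Trino cluster.
INLINE_UDF_QUERY = """
WITH
  FUNCTION doubleup(x integer)
    RETURNS integer
    LANGUAGE PYTHON
    WITH (handler = 'twice')
    AS $$
    def twice(a):
        return a * 2
    $$
SELECT doubleup(21)
"""

if __name__ == "__main__":
    # Check the handler locally, then show the query text to paste into a client.
    print(twice(21))
    print(INLINE_UDF_QUERY)
```

Pasting the printed query into a client connected to Trino 468 or later should exercise the same handler inside the engine. Keeping the handler testable as plain Python is a convenience for development, not a requirement of the feature.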
160+
161+
David shows us more, and we talk about the details.
162+
163+
## Feedback and future work
164+
165+
We are looking for feedback:
166+
167+
* More examples for the documentation for our users
168+
* Use cases and experience testing the feature
169+
* Production deployment experiences
170+
171+
Future work depends on the feedback but definitely includes the following:
172+
173+
* Performance improvements
174+
* Fine-tuning of available Python packages
175+
176+
## Resources
177+
178+
* [Python](https://www.python.org/)
179+
* [WebAssembly (Wasm)](https://webassembly.org/)
180+
* [Chicory](https://chicory.dev/)
181+
* [Trino user-defined functions overview]({{site.baseurl}}/docs/current/udf.html)
182+
* [Python user-defined functions]({{site.baseurl}}/docs/current/udf/python.html)
183+
* [trino-wasm-python](https://github.com/trinodb/trino-wasm-python)
184+
185+
## Rounding out
186+
187+
* You are all invited to chat with us about development at the Trino contributor
188+
call on the 23rd of January.
189+
* Join us on the 30th of January with Mateusz Gajewski to learn about client
190+
protocol improvements.
191+
192+
If you want to learn more about Trino, check out the definitive guide from
193+
O'Reilly. You can get [the free PDF from
194+
Starburst](https://www.starburst.io/info/oreilly-trino-guide/) or buy the
195+
[English, Polish, Chinese, or Japanese
196+
edition]({{site.url}}/trino-the-definitive-guide.html).

broadcast/index.md

-4
@@ -20,10 +20,6 @@ interesting developments in the ecosystem around Trino.
2020
## Upcoming episodes
2121

2222
<dl>
23-
<dt>16 Jan 2024: Trino Community Broadcast 68 - Year of the Snake</dt>
24-
<dd>David Phillips joins us to talk about the new support of user-defined
25-
functions written in Python. We discuss motivation, development history, dive
26-
into implementation details, and run some demos.</dd>
2723
<dt>30 Jan 2024: Trino Community Broadcast 69 - Client performance upgrade</dt>
2824
<dd>Mateusz Gajewski discusses the development of the new spooling mode for the
2925
Trino client protocol. We look at cluster configuration, client drivers, and run
