| 1 | +--- |
| 2 | +layout: episode |
| 3 | +title: "68: Year of the Snake - Python UDFs" |
| 4 | +date: 2025-01-16 |
| 5 | +tags: trino python udf user-defined function |
| 6 | +youtube_id: "IjncfSBqhbY" |
| 7 | +wistia_id: "m6vd7uxd1z" |
| 8 | +sections: |
| 9 | +- time: 0:00 |
| 10 | + title: Introduction with Cole, Manfred, and David |
| 11 | +- time: 1:10 |
| 12 | + title: Releases 465-468 |
| 13 | +- time: 10:47 |
| 14 | + title: Trino 469 progress and Trino Gateway |
| 15 | +- time: 13:00 |
| 16 | + title: Trino Summit recap |
| 17 | +- time: 16:45 |
| 18 | + title: Python UDF history, architecture, and details |
| 19 | +- time: 26:00 |
| 20 | + title: UDFs in Trino - SQL towards Python |
| 21 | +- time: 30:20 |
| 22 | + title: Python UDF syntax details, performance, and lower level details |
| 23 | +- time: 48:22 |
| 24 | + title: Question about table function UDFs |
| 25 | +- time: 51:58 |
| 26 | + title: How to test and provide feedback |
| 27 | +- time: 54:15 |
| 28 | + title: Trino in 2024 and user survey |
| 29 | +- time: 56:00 |
| 30 | + title: Rounding out |
| 31 | +introduction: | |
| 32 | + Manfred and Cole are joined by David Phillips to talk about the new support for |
| 33 | + user-defined functions written in Python. We discuss motivation, development |
| 34 | + history, dive into implementation details, and explore some examples. |
| 35 | +--- |
| 36 | + |
| 37 | +## Hosts |
| 38 | + |
| 39 | +* [Manfred Moser](https://www.linkedin.com/in/manfredmoser), Director/Open |
| 40 | + Source Engineering and Trino maintainer at |
| 41 | + [Starburst]({{site.url}}/users.html#starburst) - |
| 42 | + [@simpligility](https://x.com/simpligility) |
| 43 | +* [Cole Bowden](https://www.linkedin.com/in/cole-m-bowden), Developer Advocate |
| 44 | + at [Firebolt](https://www.firebolt.io/) |
| 45 | + |
| 46 | +## Guests |
| 47 | + |
| 48 | +* [David Phillips](https://github.com/electrum), Trino co-creator and maintainer |
| 49 | + |
| 50 | +## Releases |
| 51 | + |
| 52 | +Following are some highlights of the Trino releases since episode 67: |
| 53 | + |
| 54 | +[Trino 465]({{site.baseurl}}/docs/current/release/release-465.html) |
| 55 | + |
| 56 | +* Add support for customer-provided SSE key in S3 file system relevant for Hive, |
| 57 | + Iceberg, Delta Lake and Hudi connectors. |
| 58 | +* Deterministic data, locale support, and `random_string` function for the Faker |
| 59 | + connector. |
| 60 | +* Add support for `extra_properties` in the Iceberg connector. |
| 61 | +* Add support for the `geometry` type in the PostgreSQL connector. |
| 62 | + |
| 63 | +[Trino 466]({{site.baseurl}}/docs/current/release/release-466.html) |
| 64 | + |
| 65 | +* Remove Python requirement for Trino by replacing the `launcher` script. |
| 66 | +* Improve client protocol throughput by introducing the spooling protocol and |
| 67 | + ship it with documentation, including implementation in the JDBC driver and |
| 68 | + the CLI. |
| 69 | +* Add support for data access control with Apache Ranger, including support for |
| 70 | + column masking, row filtering, and audit logging. |
| 71 | + |
| 72 | +[Trino 467]({{site.baseurl}}/docs/current/release/release-467.html) |
| 73 | + |
| 74 | +* Change default for internal communication to HTTP/1.1. |
| 75 | +* Add support for OpenTelemetry tracing to the HTTP, Kafka, and MySQL event |
| 76 | + listeners. |
| 77 | +* Remove the `microdnf` package manager from the Docker image. |
| 78 | +* Add the `$all_manifests` metadata table in the Iceberg connector. |
| 79 | +* Add the `$transactions` metadata table in the Delta Lake connector. |
| 80 | + |
| 81 | +[Trino 468]({{site.baseurl}}/docs/current/release/release-468.html) |
| 82 | + |
| 83 | +* Add [Python user-defined functions]({{site.baseurl}}/docs/current/udf/python.html). |
| 84 | +* Rename SQL routines to SQL user-defined functions. |
| 85 | +* Add cluster overview to the Preview Web UI. |
| 86 | +* Improve bucket execution for Hive and Iceberg. |
| 87 | +* Add support for non-transactional `MERGE` statements for PostgreSQL. |
| 88 | + |
| 89 | +As always, numerous performance improvements, bug fixes, and other features were |
| 90 | +added as well. |
| 91 | + |
| 92 | +## Other news |
| 93 | + |
| 94 | +* [Trino Gateway 13](https://trinodb.github.io/trino-gateway/release-notes/#13) |
| 95 | +* [Trino Summit recap]({% post_url 2024-12-18-trino-summit-2024-quick-recap %}) |
| 96 | +* [Trino in 2024 and beyond]({% post_url 2025-01-07-2024-and-beyond %}), answer |
| 97 | + our survey! |
| 98 | +* December 2024 Trino maintainer and contributor calls took place virtually. |
| 99 | +* Trino Python client 0.332.0 includes support for spooling mode of client |
| 100 | + protocol. |
| 101 | + |
| 102 | +## User-defined functions in Trino |
| 103 | + |
| 104 | +First there were [custom plugins with user-defined |
| 105 | +functions]({{site.baseurl}}/docs/current/develop/functions.html), and for a long |
| 106 | +time, that was all there was. |
| 107 | + |
| 108 | +In 2023, David contributed SQL user-defined functions, also known as SQL |
| 109 | +routines, and we ran a [competition for examples]({% post_url |
| 110 | +2023-11-09-routines %}). Manfred wrote the docs and did a [training session with |
| 111 | +Dain and Martin]({% post_url 2023-11-29-sql-training-4 %}). And even back then, |
| 112 | +David had plans to add other languages, and started working on Python. |
| 113 | + |
| 114 | +At [Trino Summit in 2024]({% post_url 2024-12-18-trino-summit-2024-quick-recap |
| 115 | +%}) Martin Traverso announced the upcoming feature in the keynote, and with |
| 116 | +[Trino 468]({{site.baseurl}}/docs/current/release/release-468.html) we shipped |
| 117 | +support for [Python user-defined functions]({{site.baseurl}}/docs/current/udf/python.html). |
| 118 | + |
| 119 | +## Motivation |
| 120 | + |
| 121 | +Why support Python for user-defined functions, as compared to just SQL? Simply |
| 122 | +put, more is better, and Python is everywhere. We chat with David about the |
| 123 | +details. |
| 124 | + |
| 125 | +## Development history and collaboration |
| 126 | + |
| 127 | +David tells us more about figuring out how to make it work at all. He touches |
| 128 | +on topics such as security, performance, deployment, monitoring, and |
| 129 | +collaboration with other projects. We also talk about why other approaches like |
| 130 | +using local CPython were discarded. |
| 131 | + |
| 132 | +## Architecture and consequences |
| 133 | + |
| 134 | +In this discussion we try to cover the following topics: |
| 135 | + |
| 136 | +* How does it all work? |
| 137 | +* What are some restrictions? |
| 138 | +* What performance can users expect? |
| 139 | + |
| 140 | +Let's chat about this nesting: |
| 141 | + |
| 142 | +<img src="{{site.baseurl}}/assets/episode/tcb68-python-udf-architecture.png"> |
| 143 | + |
| 144 | +## Examples and demo |
| 145 | + |
| 146 | +A simple example from the documentation: |
| 147 | + |
| 148 | +```sql |
| 149 | +FUNCTION python_udf_name(input_parameter data_type) |
| 150 | + RETURNS result_data_type |
| 151 | + LANGUAGE PYTHON |
| 152 | + WITH (handler = 'python_function') |
| 153 | + AS $$ |
| 154 | + ... |
| 155 | + def python_function(input): |
| 156 | + return ... |
| 157 | + ... |
| 158 | + $$ |
| 159 | +``` |
| 160 | + |
| 161 | +David shows us more, and we talk about the details. |
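
To make the template above more concrete, here is a minimal sketch of a handler that could sit between the `$$` markers. The handler is an ordinary Python function, so it can be tried in a local interpreter before it is wrapped in a UDF declaration. The name `reverse_words` and its logic are hypothetical and not taken from the episode; a `varchar` parameter and return type are assumed, and the declaration would reference the function with `handler = 'reverse_words'`.

```python
# Hypothetical handler for illustration only. In a Trino Python UDF this
# function body would be placed between the $$ markers and referenced with
# WITH (handler = 'reverse_words'). A varchar input is assumed to arrive
# as a Python str.
def reverse_words(text):
    """Return the whitespace-separated words of `text` in reverse order."""
    words = text.split()
    return " ".join(reversed(words))


if __name__ == "__main__":
    # Local smoke test outside of Trino.
    print(reverse_words("year of the snake"))
```

The sketch uses only built-in string operations, so it does not depend on which Python packages are available in the Trino runtime.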
| 162 | + |
| 163 | +## Feedback and future work |
| 164 | + |
| 165 | +We are looking for feedback: |
| 166 | + |
| 167 | +* More examples for the documentation for our users |
| 168 | +* Use cases and experience testing the feature |
| 169 | +* Production deployment experiences |
| 170 | + |
| 171 | +Future work depends on the feedback but definitely includes the following: |
| 172 | + |
| 173 | +* Performance improvements |
| 174 | +* Fine-tuning of available Python packages |
| 175 | + |
| 176 | +## Resources |
| 177 | + |
| 178 | +* [Python](https://www.python.org/) |
| 179 | +* [WebAssembly (Wasm)](https://webassembly.org/) |
| 180 | +* [Chicory](https://chicory.dev/) |
| 181 | +* [Trino user-defined functions overview]({{site.baseurl}}/docs/current/udf.html) |
| 182 | +* [Python user-defined functions]({{site.baseurl}}/docs/current/udf/python.html) |
| 183 | +* [trino-wasm-python](https://github.com/trinodb/trino-wasm-python) |
| 184 | + |
| 185 | +## Rounding out |
| 186 | + |
| 187 | +* You are all invited to chat with us about development at the Trino contributor |
| 188 | + call on the 23rd of January. |
| 189 | +* Join us on the 30th of January with Mateusz Gajewski to learn about client |
| 190 | + protocol improvements. |
| 191 | + |
| 192 | +If you want to learn more about Trino, check out the definitive guide from |
| 193 | +O'Reilly. You can get [the free PDF from |
| 194 | +Starburst](https://www.starburst.io/info/oreilly-trino-guide/) or buy the |
| 195 | +[English, Polish, Chinese, or Japanese |
| 196 | +edition]({{site.url}}/trino-the-definitive-guide.html). |