Releases: pathwaycom/pathway
Releases · pathwaycom/pathway
Release list
v0.31.1
Added
pw.io.elasticsearch.readreads an Elasticsearch index into Pathway. Since Elasticsearch has no change-data-capture API, the connector ingests by polling and reconciling the overlap between consecutive queries, so no row is missed or delivered twice. It is configured withtimestamp_column(a numeric column it watermarks and orders by),id_column(a unique, sortable identifier used to deduplicate the overlap and as the Pathway row key), andmax_transaction_duration(how late a row may still become visible).mode="streaming"(default) keeps polling atpoll_interval;mode="static"reads the index once. The index is read in bounded pages ofread_batch_sizedocuments (default 10 000), each becoming one minibatch, and an idle index is detected and skipped without re-reading the overlap window. With persistence enabled, the connector resumes from the saved watermark, delivering only rows added since the last checkpoint. At startup it warns iftimestamp_columnorid_columnis mapped in a way it cannot poll on (e.g. anid_columnmapped astext, which Elasticsearch cannot sort by).pw.io.clickhouse.writewrites a Pathway table to a ClickHouse table over the native protocol. Two output formats are available viaoutput_table_type: the default"stream_of_changes"appends every change withtime/diffcolumns, while"snapshot"maintains the current state in aReplacingMergeTreekeyed by the requiredprimary_key(query it withSELECT ... FINAL). Theinit_modeparameter ("default","create_if_not_exists","replace") controls whether the connector creates the destination table, and the destination is validated at start-up so a missing table, column, or incompatible type is reported immediately. Most scalar,Optional,list,tuple, and 1-Dnp.ndarraycolumn types are supported (see the connector documentation for the full type mapping).pw.io.iceberg.readnow decodes every Iceberg primitive type. The new arms are:datematerializes asDateTimeNaiveat midnight on the calendar day (Pathway has no date-only type);timematerializes asDurationrepresenting microseconds since midnight (same convention as the PostgresTIMEmapping);uuidmaterializes as the canonical 8-4-4-4-12 hex string when the column is declared asstrin the Pathway schema (or as 16 rawbyteswhen declared asbytes);fixed(N)materializes asbytes;decimal(p, s)materializes as eitherfloat(lossy, with a one-shot startup warning naming each affected column) orstr(lossless decimal text — opt in by declaring the column asstr).pw.io.iceberg.writenow reconciles Pathway types against the destination table's existing schema and writes the narrower / alternatively-encoded representation when the target column already declares one: Pathwayintinto an existingint(32-bit) column with overflow detection, Pathwayfloatinto an existingfloat(32-bit) column (cast to 32-bit float; precision beyond ~7 significant decimal digits is lost), Pathwaystrinto an existingdecimal(p, s)column (parsed as decimal text) oruuidcolumn (parsed as canonical UUID hex), Pathwaybytesinto an existingfixed(N)column (length-checked), and PathwayDurationinto an existingtimecolumn (microseconds since midnight). When Pathway creates the destination table from a Pathway schema, the connector continues to emit the wide representations (longfrom Pathwayint/Duration,doublefromfloat,stringfromstr,binaryfrombytes); choosing a narrow / specialized type at create-time isn't exposed yet. Icebergdateis not supported on write at all — neither at create-time (Pathway has no date-only type to derive from) nor as an existing-column override. Icebergmap<K, V>remains unsupported on both sides.pw.io.iceberg.readandpw.io.iceberg.writenow support Icebergstruct<…>columns through Pathway's positionaltuple[…]type. Tuples are written with synthesized field names[0], [1], …(same conventionpw.io.deltalake.writealready uses). Reads ignore struct field names and bind tuple positions to struct field positions in the destination order; the mapping composes transitively, solist[tuple[…]]works as well. When writing into an existing table whose target column declares a struct with arbitrary field names, the writer adopts the destination's field names automatically, so the user'stuple[…]declaration only needs to align with the destination struct's field order — Pathway has no named-record type that would let a tuple bind to struct fields by name, so reordering the destination struct's fields out-of-band would silently misalign a Pathway pipeline declaring the column astuple[…].pw.io.mongodb.readnow accepts four additional BSON types that were previously dropped at parse time.ObjectIdandDecimal128map tostr(the canonical 24-character hex form and the canonical decimal string respectively);RegularExpressionmaps tostrformatted as"/<pattern>/<options>";Timestampmaps tointcarrying the seconds-since-epochtimecomponent (the companionincrementfield is dropped). When such a value is written back to MongoDB it is stored as an ordinary string (or integer forTimestamp) field rather than under its original BSON type, so a write-then-read round-trip preserves the value but not the original BSON type of the column.pw.io.postgres.writenow accepts aschema_nameparameter for writing to tables in non-default PostgreSQL schemas.pw.io.postgres.writenow supports pre-existingINET,CIDR,MACADDR, andMACADDR8columns from astrPathway column, matching the reader round-trip.pw.io.postgres.readandpw.io.postgres.writenow run extensive preflight validation that surfaces misconfigurations (PostgreSQL types that are not yet supported, array element type mismatches, nullability mismatches,REPLICA IDENTITY NOTHINGon non-append-only streaming tables, etc.) as clear pipeline-start errors instead of silent row drops or opaque worker panics.pw.io.mysql.readreads a MySQL table into Pathway. Inmode="streaming"(the default) it performs Change Data Capture by reading the MySQL binary log: it takes an initial snapshot and then continuously delivers inserts, updates, and deletes (requireslog_binon,binlog_format=ROW,binlog_row_image=FULL, and theREPLICATION SLAVE/REPLICATION CLIENTprivileges). Inmode="static"it reads the table once and terminates. The schema must declare at least one primary-key column. Every type produced bypw.io.mysql.writeround-trips back, and common native MySQL types (DECIMAL,DATE, integer and text families,JSON, …) are parsed as well. Unlike PostgreSQL logical replication, the connector leaves no server-side state behind — there is no replication slot to retain logs and fill the disk; binary-log retention is governed solely by the server's own settings. With persistence enabled, the streaming connector saves the binary-log coordinates and resumes from them on restart, raising a clear error if the needed binary logs have already been purged by the server's normal expiry.
Changed
pw.io.iceberg.readandpw.io.iceberg.writenow retry transient catalog errors automatically (e.g. concurrent-commit conflicts on write, transient REST/Glue catalog failures on read).pw.io.postgres.writenow retries transient PostgreSQL errors automatically — SQLSTATE class 08 (connection exceptions), class 57 (admin / crash shutdown,cannot_connect_now),serialization_failure(40001),deadlock_detected(40P01), and any closed connection are retried up to three times with exponential backoff before the writer surfaces the error. Permanent failures (syntax errors, missing tables, constraint violations) still propagate on the first attempt.pw.io.postgres.read(streaming mode) no longer requiresuser,password, orhostinpostgres_settings. Missing components are omitted from the connection string and resolved by PostgreSQL's standard client defaults (OS user,~/.pgpass, UNIX socket), matching how static mode has always behaved. This unblocks deployments authenticated viatrust,peer,cert, or other passwordlesspg_hba.confmodes.pw.io.postgresconnections now tag themselves in PostgreSQL asapplication_name=pathway[:<name>](where<name>comes from the connector'snameparameter), so operators can identify Pathway sessions inpg_stat_activity,pg_stat_replication, and server logs. The value is sanitized to printable ASCII and truncated to 63 bytes to match PostgreSQL'sNAMEDATALEN. A user-suppliedapplication_nameinpostgres_settingsis left untouched.pw.io.postgresconnections now default to TCP keepalives tuned for roughly five-minute dead-peer detection (keepalives_idle=300,keepalives_interval=30,keepalives_count=3, plustcp_user_timeout=300000), so a SIGKILL'd Pathway process releases its temporary replication slot in minutes rather than the OS-inherited ~2 hour timeline. Each value is only applied when the user has not already set it inpostgres_settings.pw.io.mssql.readandpw.io.mssql.writenow validate configuration and schemas at call/init time, producing clear errors for cases that previously surfaced as opaque SQL Server failures partway through the run: invalidprimary_key(passed instream_of_changesmode, with duplicates, referring to a different table, or withOptionaldtype), schema columns colliding with the auto-appendedtime/diffcolumns or differing only in letter case, non-existent source tables or columns, missing or incompatible destination columns (non-existent, IDENTITY, computed, or requiredNOT NULLcolumns absent from the Pathway schema),Optional[T]fields mapped toNOT NULLdestination columns, and empty or NUL-containingtable_name/ `schem...
v0.31.0
Added
pw.io.sqlite.writeconnector, which writes a Pathway table into a SQLite database file. Supports two modes:stream_of_changes(default) appends each event alongsidetime/diffmetadata columns, whilesnapshotmaintains the current state of the table viaINSERT ... ON CONFLICT DO UPDATEon insertions andDELETEon retractions, keyed on theprimary_keyparameter. Values are encoded using the same storage-class mapping thatpw.io.sqlite.readaccepts, sowrite/readround-trips every supported Pathway type losslessly.init_modecontrols whether the destination table is left as-is, auto-created, or replaced on start-up.pw.io.deltalake.readnow accepts Deltadecimal(p, s)columns. The Pathway type declared in the schema chooses the projection:floatconverts each value through f64 (lossy in general — both because f64 is binary and because its mantissa carries only ~15–17 significant decimal digits) and emits a one-time warning at startup naming each affected column;strformats the unscaled integer with the column's scale and passes the resulting decimal text through unchanged, lossless for the full Delta precision range (up to 38 digits).pw.io.deltalake.writeaccepts a Pathwaystrcolumn when writing into an existing Deltadecimal(p, s)column: each row's text is parsed as decimal and stored as the column's fixed-point value. Combined with the losslessdecimal → strread path, a Deltadecimalcolumn can round-trip through a Pathway pipeline with no precision loss. A string that can't be parsed as a decimal of the column's shape fails the write with an error message naming the offending value, the column's precision and scale, and the specific constraint it violated. Tables that don't contain adecimalcolumn (or that are being created fresh by Pathway) are unaffected.pw.io.deltalake.readnow accepts Deltadatecolumns (mapped ontoDateTimeNaive/DateTimeUtcat midnight on the calendar day, since Pathway has no nativeDatetype) andtimestamp_milliscolumns (mapped onto the same Pathway types with millisecond precision preserved).- The panel widget for table visualization now accepts
page_sizeandtable_heightparameters.
Changed
- BREAKING:
pw.io.iceberg.writeto a Glue catalog no longer acceptsDateTimeUtccolumns. Glue's metastore has no timezone-aware timestamp type, so previous versions silently dropped the timezone on read-back; writes now fail with an explicit error instead of corrupting the zone. To store UTC timestamps in Glue, convert toDateTimeNaivewith UTC-normalized values, or write through the REST catalog, which preserves the timezone. pw.io.sqlite.readnow parses every PathwayValuevariant. In addition toint,float,str,bytes,pw.Json, and theirOptionalforms, the reader now acceptsbool,pw.DateTimeNaive,pw.DateTimeUtc,pw.Duration,pw.Pointer,pw.PyObjectWrapper, homogeneoustuple/list, andnp.ndarray. Composite types are stored asTEXTusing the same JSON encoding thatpw.io.jsonlines.writeemits. Booleans additionally accept PostgreSQL-style textual literals (true/false,yes/no,on/off,t/f,y/n; case-insensitive, whitespace-trimmed), andfloatcolumns tolerate values stored withINTEGERstorage class.pw.io.mssql.readandpw.io.mssql.writenow retry transient SQL Server errors automatically.
Fixed
pw.io.http.rest_connectorno longer raisesTypeError: Cannot instantiate typing.Anywhen a request column has the inferred default schema type (Any). The cast step now skips columns typed asAnyinstead of attempting to call the type as a constructor.pw.io.deltalake.readnow accepts Delta tables whose integer columns use any of the standard Parquet integer widths (INT_8,INT_16,INT_32, unsigned variants), and whose floating-point columns useFLOAT(32-bit) orFLOAT16. Previously the row-level reader only matchedINT_64andDOUBLE, so tables produced by Spark / DuckDB / pandas with explicit narrower casts read back as zero rows with per-row conversion errors.pw.io.deltalake.writepartition columns of typepw.Pointer,pw.Duration, andpw.Jsonnow round-trip correctly throughpw.io.deltalake.read. Previously the values were correctly placed in the partition path on write, but the reader had no decoder for those types and produced a conversion error for every row.
v0.30.1
Added
pw.io.rabbitmq.readandpw.io.rabbitmq.writeconnectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row);start_fromparameter ("beginning","end", or"timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.pw.io.mssql.readconnector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).pw.io.mssql.writeconnector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.pw.io.milvus.writeconnector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.pathway spawnnow supports the--addressesand--process-idflags for multi-machine deployments. Pass a comma-separated list ofhost:portaddresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.pw.xpacks.llm.parsers.AudioParser, audio transcription parser based on OpenAI Whisper API. Accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.pw.io.leann.writeconnector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.pw.iteratenow supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
v0.30.0
Added
pw.io.mongodb.readconnector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.pw.io.postgres.readconnector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).pw.io.postgres.writeandpw.io.postgres.readnow support serialization/deserialization ofnp.ndarray(int/floatelements), homogeneoustupleandlist(via PostgresARRAY; multidimensional rectangular arrays supported).pw.io.airbyte.readnow accepts adependency_overridesparameter, allowing users to pin specific versions of transitive dependencies (e.g.airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.
Changed
- BREAKING:
pw.io.mongodb.writeandpw.io.mongodb.readnow serialize and deserializenp.ndarraycolumns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected. - BREAKING: The dependencies for
pw.io.pyfilesystem.readare no longer included in the default package installation. To install them, please usepip install pathway[pyfilesystem]. - Asynchronous callback for
pw.io.python.writeis now available aspw.io.OnChangeCallbackAsync. pw.runandpw.run_allnow have theevent_loopparameter to support reusing async state across multiple graph runs.
Fixed
pathway web-dashboardnow waits for the metrics database to be created instead of terminating instantly.
v0.29.1
Added
pw.io.kafka.readandpw.io.kafka.writeconnectors now support OAUTHBEARER authentication.pw.io.mongodb.writeconnector now supports anoutput_table_typeparameter with two modes:stream_of_changes(default) andsnapshot. Insnapshotmode, the connector maintains the current state of the Pathway table in MongoDB using the_idfield as the primary key, whilestream_of_changespreserves the existing behavior by writing all events withtimeanddiffflags to reflect transactional minibatches and the nature of each change.- Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via
worker_scaling_enabledandworkload_tracking_window_msinpw.persistence.Config. Please refer to the tutorial for more details. pw.io.postgres.writenow properly supports TLS configuration viasslmodeandsslrootcertconnection string parameters.
Changed
pw.xpacks.connectors.readnow retries initial connection requests.
v0.29.0
Added
- Pathway Web Dashboard providing user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
pw.io.kafka.readnow includes message headers in the parsed metadata. The headers are available at the top level of the metadata in theheadersarray. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.pw.xpacks.llm.llms.BedrockChat- Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.pw.xpacks.llm.embedders.BedrockEmbedder- Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
Changed
- Most Python dependencies are now imported only if the related capabilities are used by a program.
- BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The
Nonevalue is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string"None".
v0.28.0
Added
pw.io.kafka.readandpw.io.redpanda.readnow allow each schema field to be specified as coming from either the message key or the message value.- Connector groups now support the specification of an idle duration. When this is set, if a source does not provide any data for the specified period of time, it will be excluded from the group until it produces data again.
- It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
- Connector groups can now be used in the multiprocess runs.
Changed
- BREAKING: The
__str__anddumpsmethods inpw.Jsonno longer enforce the result to be an ASCII string. This way, the behavior ofpw.debug.compute_and_printis now consistent with other output connectors. - The window functions now internally use deterministic UDFs, where possible.
v0.27.1
[0.27.1] - 2025-12-08
Added
pw.Table.filter_out_results_of_forgettingmethod, allowing to revert the effects of forgetting at a later stage.
Changed
- The MCP server
toolmethod now allows to pass an optionaldescription, default value being kept as the handler's docstring. pw.io.kafka.readandpw.io.redpanda.readnow create akeycolumn storing the contents of the message keys.
v0.27.0
Added
- JetStream extension is now supported in both NATS read and write connectors.
- The Iceberg connectors now support Glue as a catalog backend.
- New
Table.add_update_timestamp_utcfunction for tracking update time of rows in the table
Changed
- BREAKING The API for the Iceberg connectors has changed. The
catalogparameter is now required in bothpw.io.iceberg.readandpw.io.iceberg.write. This parameter can be either of typepw.io.iceberg.RestCatalogorpw.io.iceberg.GlueCatalog, and it must contain the connection parameters. - BREAKING
paddlepaddleis no longer a dependency of the Pathway package. The reason is that choosing a specific version for the hardware it will be run on is advantageous from the performance point of view. To installpaddlepaddlefollow instructions on https://www.paddlepaddle.org.cn/en/install/quick. pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerernow supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.
Fixed
- Endpoints created by
pw.io.http.rest_connectornow accept requests both with and without a trailing slash. For example,/endpoint/and/endpointare now treated equivalently. - Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
- Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
- Fixed unique name autogeneration for the Python connectors.
v0.26.4
Added
- New external integration with Qdrant.
pw.io.mysql.writemethod for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.
Changed
pw.io.deltalake.readnow accepts thestart_from_timestamp_msparameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.- Asynchronous UDFs for connecting to API based llm and embedding models now have by default retry strategy set to
pw.udfs.ExponentialRetryStrategy() pw.io.postgres.writemethod now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with theoutput_table_typeparameter.pw.io.postgres.write_snapshotmethod has been deprecated.