Releases: pathwaycom/pathway

v0.31.1

@github-actions github-actions released this 12 Jun 11:25

Added

  • pw.io.elasticsearch.read reads an Elasticsearch index into Pathway. Since Elasticsearch has no change-data-capture API, the connector ingests by polling and reconciling the overlap between consecutive queries, so no row is missed or delivered twice. It is configured with timestamp_column (a numeric column it watermarks and orders by), id_column (a unique, sortable identifier used to deduplicate the overlap and as the Pathway row key), and max_transaction_duration (how late a row may still become visible). mode="streaming" (default) keeps polling at poll_interval; mode="static" reads the index once. The index is read in bounded pages of read_batch_size documents (default 10 000), each becoming one minibatch, and an idle index is detected and skipped without re-reading the overlap window. With persistence enabled, the connector resumes from the saved watermark, delivering only rows added since the last checkpoint. At startup it warns if timestamp_column or id_column is mapped in a way it cannot poll on (e.g. an id_column mapped as text, which Elasticsearch cannot sort by).
  • pw.io.clickhouse.write writes a Pathway table to a ClickHouse table over the native protocol. Two output formats are available via output_table_type: the default "stream_of_changes" appends every change with time/diff columns, while "snapshot" maintains the current state in a ReplacingMergeTree keyed by the required primary_key (query it with SELECT ... FINAL). The init_mode parameter ("default", "create_if_not_exists", "replace") controls whether the connector creates the destination table, and the destination is validated at start-up so a missing table, column, or incompatible type is reported immediately. Most scalar, Optional, list, tuple, and 1-D np.ndarray column types are supported (see the connector documentation for the full type mapping).
  • pw.io.iceberg.read now decodes every Iceberg primitive type. The newly supported types are: date materializes as DateTimeNaive at midnight on the calendar day (Pathway has no date-only type); time materializes as Duration representing microseconds since midnight (same convention as the Postgres TIME mapping); uuid materializes as the canonical 8-4-4-4-12 hex string when the column is declared as str in the Pathway schema (or as 16 raw bytes when declared as bytes); fixed(N) materializes as bytes; decimal(p, s) materializes as either float (lossy, with a one-shot startup warning naming each affected column) or str (lossless decimal text; opt in by declaring the column as str).
  • pw.io.iceberg.write now reconciles Pathway types against the destination table's existing schema and writes the narrower / alternatively-encoded representation when the target column already declares one: Pathway int into an existing int (32-bit) column with overflow detection, Pathway float into an existing float (32-bit) column (cast to 32-bit float; precision beyond ~7 significant decimal digits is lost), Pathway str into an existing decimal(p, s) column (parsed as decimal text) or uuid column (parsed as canonical UUID hex), Pathway bytes into an existing fixed(N) column (length-checked), and Pathway Duration into an existing time column (microseconds since midnight). When Pathway creates the destination table from a Pathway schema, the connector continues to emit the wide representations (long from Pathway int / Duration, double from float, string from str, binary from bytes); choosing a narrow / specialized type at create-time isn't exposed yet. Iceberg date is not supported on write at all — neither at create-time (Pathway has no date-only type to derive from) nor as an existing-column override. Iceberg map<K, V> remains unsupported on both sides.
  • pw.io.iceberg.read and pw.io.iceberg.write now support Iceberg struct<…> columns through Pathway's positional tuple[…] type. Tuples are written with synthesized field names [0], [1], … (same convention pw.io.deltalake.write already uses). Reads ignore struct field names and bind tuple positions to struct field positions in the destination order; the mapping composes transitively, so list[tuple[…]] works as well. When writing into an existing table whose target column declares a struct with arbitrary field names, the writer adopts the destination's field names automatically, so the user's tuple[…] declaration only needs to align with the destination struct's field order — Pathway has no named-record type that would let a tuple bind to struct fields by name, so reordering the destination struct's fields out-of-band would silently misalign a Pathway pipeline declaring the column as tuple[…].
  • pw.io.mongodb.read now accepts four additional BSON types that were previously dropped at parse time. ObjectId and Decimal128 map to str (the canonical 24-character hex form and the canonical decimal string respectively); RegularExpression maps to str formatted as "/<pattern>/<options>"; Timestamp maps to int carrying the seconds-since-epoch time component (the companion increment field is dropped). When such a value is written back to MongoDB it is stored as an ordinary string (or integer for Timestamp) field rather than under its original BSON type, so a write-then-read round-trip preserves the value but not the original BSON type of the column.
  • pw.io.postgres.write now accepts a schema_name parameter for writing to tables in non-default PostgreSQL schemas.
  • pw.io.postgres.write now supports pre-existing INET, CIDR, MACADDR, and MACADDR8 columns from a str Pathway column, matching the reader round-trip.
  • pw.io.postgres.read and pw.io.postgres.write now run extensive preflight validation that surfaces misconfigurations (PostgreSQL types that are not yet supported, array element type mismatches, nullability mismatches, REPLICA IDENTITY NOTHING on non-append-only streaming tables, etc.) as clear pipeline-start errors instead of silent row drops or opaque worker panics.
  • pw.io.mysql.read reads a MySQL table into Pathway. In mode="streaming" (the default) it performs Change Data Capture by reading the MySQL binary log: it takes an initial snapshot and then continuously delivers inserts, updates, and deletes (requires log_bin on, binlog_format=ROW, binlog_row_image=FULL, and the REPLICATION SLAVE / REPLICATION CLIENT privileges). In mode="static" it reads the table once and terminates. The schema must declare at least one primary-key column. Every type produced by pw.io.mysql.write round-trips back, and common native MySQL types (DECIMAL, DATE, integer and text families, JSON, …) are parsed as well. Unlike PostgreSQL logical replication, the connector leaves no server-side state behind — there is no replication slot to retain logs and fill the disk; binary-log retention is governed solely by the server's own settings. With persistence enabled, the streaming connector saves the binary-log coordinates and resumes from them on restart, raising a clear error if the needed binary logs have already been purged by the server's normal expiry.
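
The overlap reconciliation described for pw.io.elasticsearch.read can be pictured with a small pure-Python model. This is only a sketch of the idea, not Pathway's implementation: poll_once, the in-memory index list, and the ts / id fields are hypothetical stand-ins for the index, timestamp_column, and id_column, and the rule for pruning remembered ids is an assumption.

```python
def poll_once(index, watermark, seen_ids, max_transaction_duration):
    """One polling round over an in-memory stand-in for an index.

    Returns (fresh_rows, new_watermark, new_seen_ids).
    """
    # Re-read everything that may still have become visible late.
    lower_bound = watermark - max_transaction_duration
    window = sorted(
        (row for row in index if row["ts"] >= lower_bound),
        key=lambda row: (row["ts"], row["id"]),
    )
    # Rows already delivered in an earlier round are dropped by id.
    fresh = [row for row in window if row["id"] not in seen_ids]
    new_watermark = max([watermark] + [row["ts"] for row in window])
    # Only ids that can reappear in the next overlap window are remembered.
    new_seen = {
        row["id"]
        for row in window
        if row["ts"] >= new_watermark - max_transaction_duration
    }
    return fresh, new_watermark, new_seen


index = [{"id": "a", "ts": 10}, {"id": "b", "ts": 12}]
first, watermark, seen = poll_once(index, float("-inf"), set(), 5)

# A row with an older timestamp becomes visible late, plus one new row.
index += [{"id": "c", "ts": 9}, {"id": "d", "ts": 15}]
second, watermark, seen = poll_once(index, watermark, seen, 5)
third, watermark, seen = poll_once(index, watermark, seen, 5)

print([row["id"] for row in first])   # ['a', 'b']
print([row["id"] for row in second])  # ['c', 'd']
print([row["id"] for row in third])   # []
```

The late row "c" is picked up because the query window reaches back by max_transaction_duration, while "a" and "b" are filtered out by id, so the second round delivers nothing twice.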

Changed

  • pw.io.iceberg.read and pw.io.iceberg.write now retry transient catalog errors automatically (e.g. concurrent-commit conflicts on write, transient REST/Glue catalog failures on read).
  • pw.io.postgres.write now retries transient PostgreSQL errors automatically — SQLSTATE class 08 (connection exceptions), class 57 (admin / crash shutdown, cannot_connect_now), serialization_failure (40001), deadlock_detected (40P01), and any closed connection are retried up to three times with exponential backoff before the writer surfaces the error. Permanent failures (syntax errors, missing tables, constraint violations) still propagate on the first attempt.
  • pw.io.postgres.read (streaming mode) no longer requires user, password, or host in postgres_settings. Missing components are omitted from the connection string and resolved by PostgreSQL's standard client defaults (OS user, ~/.pgpass, UNIX socket), matching how static mode has always behaved. This unblocks deployments authenticated via trust, peer, cert, or other passwordless pg_hba.conf modes.
  • pw.io.postgres connections now tag themselves in PostgreSQL as application_name=pathway[:<name>] (where <name> comes from the connector's name parameter), so operators can identify Pathway sessions in pg_stat_activity, pg_stat_replication, and server logs. The value is sanitized to printable ASCII and truncated to 63 bytes to match PostgreSQL's NAMEDATALEN. A user-supplied application_name in postgres_settings is left untouched.
  • pw.io.postgres connections now default to TCP keepalives tuned for roughly five-minute dead-peer detection (keepalives_idle=300, keepalives_interval=30, keepalives_count=3, plus tcp_user_timeout=300000), so a SIGKILL'd Pathway process releases its temporary replication slot in minutes rather than after the OS-inherited ~2 hour timeout. Each value is only applied when the user has not already set it in postgres_settings.
  • pw.io.mssql.read and pw.io.mssql.write now validate configuration and schemas at call/init time, producing clear errors for cases that previously surfaced as opaque SQL Server failures partway through the run: invalid primary_key (passed in stream_of_changes mode, with duplicates, referring to a different table, or with Optional dtype), schema columns colliding with the auto-appended time/diff columns or differing only in letter case, non-existent source tables or columns, missing or incompatible destination columns (non-existent, IDENTITY, computed, or required NOT NULL columns absent from the Pathway schema), Optional[T] fields mapped to NOT NULL destination columns, and empty or NUL-containing table_name / `schem...
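
The retry policy described for pw.io.postgres.write can be sketched as a classification by SQLSTATE plus a bounded backoff loop. This is a rough model under stated assumptions, not the connector's code: FakeDbError, is_transient, write_with_retry, and the base delay are hypothetical, "up to three times" is read here as three retries after the first attempt, and the closed-connection case is not modeled.

```python
import time

RETRYABLE_CLASSES = {"08", "57"}      # connection exceptions; admin / crash shutdown
RETRYABLE_CODES = {"40001", "40P01"}  # serialization_failure, deadlock_detected


class FakeDbError(Exception):
    """Hypothetical error type carrying a five-character SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def is_transient(sqlstate):
    return sqlstate[:2] in RETRYABLE_CLASSES or sqlstate in RETRYABLE_CODES


def write_with_retry(operation, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except FakeDbError as err:
            # Permanent errors and exhausted retries propagate to the caller.
            if not is_transient(err.sqlstate) or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


calls = []
delays = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise FakeDbError("40P01")  # two deadlocks, then success
    return "ok"


result = write_with_retry(flaky, sleep=delays.append)
print(result, delays)  # ok [0.1, 0.2]
```

Passing sleep as a parameter keeps the sketch testable without real waiting; a syntax error such as SQLSTATE 42601 would be re-raised on the first attempt.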

v0.31.0

@github-actions github-actions released this 25 May 15:37

Added

  • pw.io.sqlite.write connector, which writes a Pathway table into a SQLite database file. Supports two modes: stream_of_changes (default) appends each event alongside time/diff metadata columns, while snapshot maintains the current state of the table via INSERT ... ON CONFLICT DO UPDATE on insertions and DELETE on retractions, keyed on the primary_key parameter. Values are encoded using the same storage-class mapping that pw.io.sqlite.read accepts, so write / read round-trips every supported Pathway type losslessly. init_mode controls whether the destination table is left as-is, auto-created, or replaced on start-up.
  • pw.io.deltalake.read now accepts Delta decimal(p, s) columns. The Pathway type declared in the schema chooses the projection: float converts each value through f64 (lossy in general — both because f64 is binary and because its mantissa carries only ~15–17 significant decimal digits) and emits a one-time warning at startup naming each affected column; str formats the unscaled integer with the column's scale and passes the resulting decimal text through unchanged, lossless for the full Delta precision range (up to 38 digits).
  • pw.io.deltalake.write accepts a Pathway str column when writing into an existing Delta decimal(p, s) column: each row's text is parsed as decimal and stored as the column's fixed-point value. Combined with the lossless decimal → str read path, a Delta decimal column can round-trip through a Pathway pipeline with no precision loss. A string that can't be parsed as a decimal of the column's shape fails the write with an error message naming the offending value, the column's precision and scale, and the specific constraint it violated. Tables that don't contain a decimal column (or that are being created fresh by Pathway) are unaffected.
  • pw.io.deltalake.read now accepts Delta date columns (mapped onto DateTimeNaive / DateTimeUtc at midnight on the calendar day, since Pathway has no native Date type) and timestamp_millis columns (mapped onto the same Pathway types with millisecond precision preserved).
  • The panel widget for table visualization now accepts page_size and table_height parameters.
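
The difference between the float and str projections of a Delta decimal(p, s) column can be illustrated with the standard library alone. The helper format_unscaled is hypothetical and only mirrors the description above (unscaled integer formatted with the column's scale); it is not taken from Pathway.

```python
from decimal import Decimal


def format_unscaled(unscaled: int, scale: int) -> str:
    """Render a fixed-point value (unscaled integer plus scale) as decimal text."""
    sign = "-" if unscaled < 0 else ""
    # Pad so that at least one digit remains in front of the decimal point.
    digits = str(abs(unscaled)).rjust(scale + 1, "0")
    if scale == 0:
        return sign + digits
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"


# A 38-digit value, the upper end of the Delta precision range.
unscaled = 10**38 - 1
text = format_unscaled(unscaled, 10)

# The str projection keeps every digit; going through a float does not.
exact = Decimal(text)
via_float = Decimal(float(text))

print(text)
print(exact == via_float)  # False
```

Declaring the column as str is therefore the choice to make when the digits themselves matter, for example for monetary amounts.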

Changed

  • BREAKING: pw.io.iceberg.write to a Glue catalog no longer accepts DateTimeUtc columns. Glue's metastore has no timezone-aware timestamp type, so previous versions silently dropped the timezone on read-back; writes now fail with an explicit error instead of corrupting the zone. To store UTC timestamps in Glue, convert to DateTimeNaive with UTC-normalized values, or write through the REST catalog, which preserves the timezone.
  • pw.io.sqlite.read now parses every Pathway Value variant. In addition to int, float, str, bytes, pw.Json, and their Optional forms, the reader now accepts bool, pw.DateTimeNaive, pw.DateTimeUtc, pw.Duration, pw.Pointer, pw.PyObjectWrapper, homogeneous tuple / list, and np.ndarray. Composite types are stored as TEXT using the same JSON encoding that pw.io.jsonlines.write emits. Booleans additionally accept PostgreSQL-style textual literals (true/false, yes/no, on/off, t/f, y/n; case-insensitive, whitespace-trimmed), and float columns tolerate values stored with INTEGER storage class.
  • pw.io.mssql.read and pw.io.mssql.write now retry transient SQL Server errors automatically.
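
The textual boolean literals listed for pw.io.sqlite.read follow a simple normalization rule, sketched below. parse_bool_literal is a hypothetical helper, and raising ValueError for anything else is an assumption; how the reader itself reports an unrecognized value is not stated above.

```python
TRUE_LITERALS = {"true", "yes", "on", "t", "y"}
FALSE_LITERALS = {"false", "no", "off", "f", "n"}


def parse_bool_literal(text: str) -> bool:
    """Parse a PostgreSQL-style boolean literal, ignoring case and outer whitespace."""
    normalized = text.strip().lower()
    if normalized in TRUE_LITERALS:
        return True
    if normalized in FALSE_LITERALS:
        return False
    raise ValueError(f"not a boolean literal: {text!r}")


print([parse_bool_literal(s) for s in ["TRUE", " off ", "y", "No"]])
# [True, False, True, False]
```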

Fixed

  • pw.io.http.rest_connector no longer raises TypeError: Cannot instantiate typing.Any when a request column has the inferred default schema type (Any). The cast step now skips columns typed as Any instead of attempting to call the type as a constructor.
  • pw.io.deltalake.read now accepts Delta tables whose integer columns use any of the standard Parquet integer widths (INT_8, INT_16, INT_32, unsigned variants), and whose floating-point columns use FLOAT (32-bit) or FLOAT16. Previously the row-level reader only matched INT_64 and DOUBLE, so tables produced by Spark / DuckDB / pandas with explicit narrower casts read back as zero rows with per-row conversion errors.
  • pw.io.deltalake.write partition columns of type pw.Pointer, pw.Duration, and pw.Json now round-trip correctly through pw.io.deltalake.read. Previously the values were correctly placed in the partition path on write, but the reader had no decoder for those types and produced a conversion error for every row.
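
The pw.io.http.rest_connector fix comes down to not calling typing.Any as if it were a constructor. A minimal sketch of such a cast step follows; cast_row and the dict-based schema are hypothetical and stand in for Pathway's internal logic.

```python
from typing import Any


def cast_row(row: dict, schema: dict) -> dict:
    """Cast each value to its declared type, leaving Any-typed columns untouched."""
    result = {}
    for name, value in row.items():
        target = schema.get(name, Any)
        # Calling Any(value) would raise TypeError, so Any means "no cast".
        result[name] = value if target is Any else target(value)
    return result


row = cast_row({"n": "3", "payload": {"a": 1}}, {"n": int, "payload": Any})
print(row)  # {'n': 3, 'payload': {'a': 1}}
```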

v0.30.1

@github-actions github-actions released this 23 Apr 08:05

Added

  • pw.io.rabbitmq.read and pw.io.rabbitmq.write connectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row); start_from parameter ("beginning", "end", or "timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.
  • pw.io.mssql.read connector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).
  • pw.io.mssql.write connector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.
  • pw.io.milvus.write connector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.
  • pathway spawn now supports the --addresses and --process-id flags for multi-machine deployments. Pass a comma-separated list of host:port addresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.
  • pw.xpacks.llm.parsers.AudioParser, an audio transcription parser based on the OpenAI Whisper API. It accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.
  • pw.io.leann.write connector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.
  • pw.iterate now supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.

v0.30.0

@github-actions github-actions released this 24 Mar 19:09

Added

  • pw.io.mongodb.read connector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.
  • pw.io.postgres.read connector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).
  • pw.io.postgres.write and pw.io.postgres.read now support serialization/deserialization of np.ndarray (int/float elements), homogeneous tuple and list (via Postgres ARRAY; multidimensional rectangular arrays supported).
  • pw.io.airbyte.read now accepts a dependency_overrides parameter, allowing users to pin specific versions of transitive dependencies (e.g. airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.

Changed

  • BREAKING: pw.io.mongodb.write and pw.io.mongodb.read now serialize and deserialize np.ndarray columns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected.
  • BREAKING: The dependencies for pw.io.pyfilesystem.read are no longer included in the default package installation. To install them, please use pip install pathway[pyfilesystem].
  • Asynchronous callback for pw.io.python.write is now available as pw.io.OnChangeCallbackAsync.
  • pw.run and pw.run_all now have the event_loop parameter to support reusing async state across multiple graph runs.
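
Why the old flattened MongoDB encoding could not restore an array's shape can be seen with plain nested lists. The sketch below uses Python lists as a stand-in for both np.ndarray values and BSON arrays; flatten and shape are hypothetical helpers.

```python
def flatten(nested):
    """Collapse an arbitrarily nested list into one flat list (the old encoding)."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def shape(nested):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


two_by_two = [[1, 2], [3, 4]]
one_by_four = [[1, 2, 3, 4]]
vector = [1, 2, 3]

print(flatten(two_by_two) == flatten(one_by_four))  # True
print(shape(two_by_two), shape(one_by_four))        # (2, 2) (1, 4)
print(flatten(vector) == vector)                    # True
```

Two arrays of different shapes collapse to the same flat list, while the nested form keeps them distinct; a 1-D array looks the same either way, which matches the note that only multi-dimensional arrays are affected.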

Fixed

  • pathway web-dashboard now waits for the metrics database to be created instead of terminating instantly.

v0.29.1

@github-actions github-actions released this 16 Feb 13:48

Added

  • pw.io.kafka.read and pw.io.kafka.write connectors now support OAUTHBEARER authentication.
  • pw.io.mongodb.write connector now supports an output_table_type parameter with two modes: stream_of_changes (default) and snapshot. In snapshot mode, the connector maintains the current state of the Pathway table in MongoDB using the _id field as the primary key, while stream_of_changes preserves the existing behavior by writing all events with time and diff flags to reflect transactional minibatches and the nature of each change.
  • Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via worker_scaling_enabled and workload_tracking_window_ms in pw.persistence.Config. Please refer to the tutorial for more details.
  • pw.io.postgres.write now properly supports TLS configuration via sslmode and sslrootcert connection string parameters.
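
The two output_table_type modes of pw.io.mongodb.write differ in what ends up stored for the same sequence of events. The following model is a sketch only: the event dictionaries, the value field, and both functions are hypothetical, and the way same-time retractions and insertions are ordered is an assumption.

```python
def as_stream_of_changes(events):
    """Keep every event, including its time and diff columns."""
    return [dict(event) for event in events]


def as_snapshot(events):
    """Fold events into the current state, keyed by _id."""
    state = {}
    # Within one minibatch, apply retractions (diff -1) before insertions (diff +1).
    for event in sorted(events, key=lambda e: (e["time"], e["diff"])):
        if event["diff"] > 0:
            state[event["_id"]] = event["value"]
        else:
            state.pop(event["_id"], None)
    return state


events = [
    {"_id": 1, "value": "a", "time": 2, "diff": 1},
    {"_id": 2, "value": "b", "time": 2, "diff": 1},
    {"_id": 1, "value": "a", "time": 4, "diff": -1},   # update of _id 1 ...
    {"_id": 1, "value": "a2", "time": 4, "diff": 1},   # ... in one minibatch
    {"_id": 2, "value": "b", "time": 6, "diff": -1},   # deletion of _id 2
]

print(len(as_stream_of_changes(events)))  # 5
print(as_snapshot(events))                # {1: 'a2'}
```

The stream keeps the full history of five events, whereas the snapshot holds only the one row that still exists after all changes are applied.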

Changed

  • pw.xpacks.connectors.read now retries initial connection requests.

v0.29.0

@github-actions github-actions released this 22 Jan 07:18

Added

  • Pathway Web Dashboard, providing a user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
  • pw.io.kafka.read now includes message headers in the parsed metadata. The headers are available at the top level of the metadata in the headers array. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.
  • pw.xpacks.llm.llms.BedrockChat - Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.
  • pw.xpacks.llm.embedders.BedrockEmbedder - Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
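
Since pw.io.kafka.read exposes header values base64-encoded, consumers need one decoding step to get the raw bytes back. The sketch below assumes the metadata has already been loaded into a Python dict of the shape described above; decode_headers and the sample metadata are hypothetical.

```python
import base64


def decode_headers(metadata: dict) -> list:
    """Turn [name, base64-value] pairs into (name, bytes-or-None) tuples."""
    decoded = []
    for name, value in metadata.get("headers", []):
        decoded.append((name, None if value is None else base64.b64decode(value)))
    return decoded


# Illustrative metadata: one header with a value, one null header.
metadata = {"headers": [["trace-id", "YWJjMTIz"], ["empty", None]]}
print(decode_headers(metadata))  # [('trace-id', b'abc123'), ('empty', None)]
```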

Changed

  • Most Python dependencies are now imported only if the related capabilities are used by a program.
  • BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The None value is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string "None".

v0.28.0

@github-actions github-actions released this 08 Jan 08:06

Added

  • pw.io.kafka.read and pw.io.redpanda.read now allow each schema field to be specified as coming from either the message key or the message value.
  • Connector groups now support the specification of an idle duration. When this is set, if a source does not provide any data for the specified period of time, it will be excluded from the group until it produces data again.
  • It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
  • Connector groups can now be used in multiprocess runs.

Changed

  • BREAKING: The __str__ and dumps methods in pw.Json no longer enforce the result to be an ASCII string. This way, the behavior of pw.debug.compute_and_print is now consistent with other output connectors.
  • The window functions now internally use deterministic UDFs, where possible.
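
The effect of the pw.Json change is analogous to the ensure_ascii switch of Python's standard json module. The snippet below uses only the standard library as an analogy and does not call pw.Json; the sample record is made up.

```python
import json

record = {"city": "Kraków"}

# ASCII-only output escapes non-ASCII characters (comparable to the old behavior).
escaped = json.dumps(record, ensure_ascii=True)
# Non-ASCII characters are kept as-is (comparable to the new behavior).
verbatim = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"city": "Krak\u00f3w"}
print(verbatim)  # {"city": "Kraków"}
```

Both strings parse back to the same value, so only code that compares or stores the serialized text byte for byte is affected.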

v0.27.1

@github-actions github-actions released this 08 Dec 13:01

[0.27.1] - 2025-12-08

Added

  • pw.Table.filter_out_results_of_forgetting method, which makes it possible to revert the effects of forgetting at a later stage.

Changed

  • The MCP server tool method now accepts an optional description; when it is omitted, the handler's docstring is used as before.
  • pw.io.kafka.read and pw.io.redpanda.read now create a key column storing the contents of the message keys.

v0.27.0

@github-actions github-actions released this 13 Nov 08:44

Added

  • JetStream extension is now supported in both NATS read and write connectors.
  • The Iceberg connectors now support Glue as a catalog backend.
  • New Table.add_update_timestamp_utc function for tracking the update time of rows in a table.

Changed

  • BREAKING: The API for the Iceberg connectors has changed. The catalog parameter is now required in both pw.io.iceberg.read and pw.io.iceberg.write. This parameter can be either of type pw.io.iceberg.RestCatalog or pw.io.iceberg.GlueCatalog, and it must contain the connection parameters.
  • BREAKING: paddlepaddle is no longer a dependency of the Pathway package, because choosing a version suited to the hardware it will run on is advantageous for performance. To install paddlepaddle, follow the instructions at https://www.paddlepaddle.org.cn/en/install/quick.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.
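
Two-stage retrieval as described for BaseRAGQuestionAnswerer can be sketched with toy scoring functions. Nothing below is Pathway API: overlap stands in for the vector similarity search, density stands in for a reranker model, and all names and documents are hypothetical.

```python
def overlap(query: str, doc: str) -> int:
    """Cheap first-stage score: number of words shared with the query."""
    return len(set(query.split()) & set(doc.split()))


def density(query: str, doc: str) -> float:
    """Toy reranker score: shared words relative to document length."""
    return overlap(query, doc) / len(doc.split())


def retrieve_then_rerank(query, docs, k_retrieve, k_final):
    # Stage 1: keep the k_retrieve best candidates by the cheap score.
    candidates = sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:k_retrieve]
    # Stage 2: reorder only those candidates with the second score.
    reranked = sorted(candidates, key=lambda d: density(query, d), reverse=True)
    return reranked[:k_final]


docs = [
    "pathway is a python framework for streaming data pipelines and more",
    "streaming pipelines",
    "streaming music",
    "cooking recipes",
]
result = retrieve_then_rerank("streaming pipelines", docs, k_retrieve=3, k_final=2)
print(result)  # ['streaming pipelines', 'streaming music']
```

The first stage narrows four documents to three, and the second stage changes their order, which is the relevance-ordering effect the changelog entry refers to.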

Fixed

  • Endpoints created by pw.io.http.rest_connector now accept requests both with and without a trailing slash. For example, /endpoint/ and /endpoint are now treated equivalently.
  • Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
  • Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
  • Fixed unique name autogeneration for the Python connectors.

v0.26.4

@github-actions github-actions released this 16 Oct 07:20

Added

  • New external integration with Qdrant.
  • pw.io.mysql.write method for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.

Changed

  • pw.io.deltalake.read now accepts the start_from_timestamp_ms parameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.
  • Asynchronous UDFs for connecting to API-based LLM and embedding models now use pw.udfs.ExponentialRetryStrategy() as their default retry strategy.
  • pw.io.postgres.write method now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with the output_table_type parameter.
  • pw.io.postgres.write_snapshot method has been deprecated.