Releases: pathwaycom/pathway

v0.27.0

13 Nov 08:44

Added

  • JetStream extension is now supported in both NATS read and write connectors.
  • The Iceberg connectors now support Glue as a catalog backend.
  • New Table.add_update_timestamp_utc function for tracking the update time of rows in a table.

Changed

  • BREAKING The API for the Iceberg connectors has changed. The catalog parameter is now required in both pw.io.iceberg.read and pw.io.iceberg.write. This parameter can be either of type pw.io.iceberg.RestCatalog or pw.io.iceberg.GlueCatalog, and it must contain the connection parameters.
  • BREAKING paddlepaddle is no longer a dependency of the Pathway package, because picking the paddlepaddle version that matches the target hardware gives better performance. To install paddlepaddle, follow the instructions at https://www.paddlepaddle.org.cn/en/install/quick.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.
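
The two-stage retrieval mentioned in the reranking entry can be pictured with a small, library-free sketch. Nothing here is Pathway's API: the function names, the two-dimensional embeddings and the keyword-overlap scorer are hypothetical stand-ins, chosen only to show the data flow (a cheap vector search produces a short list, then a more careful scorer reorders it).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, k):
    """Stage 1: keep the k documents whose embeddings are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def rerank(query_text, candidates, scorer):
    """Stage 2: reorder the short list with a (usually costlier) relevance scorer."""
    return sorted(candidates, key=lambda d: scorer(query_text, d["text"]), reverse=True)


def keyword_overlap(query_text, doc_text):
    """Hypothetical stand-in for a real reranker model: count of shared words."""
    return len(set(query_text.lower().split()) & set(doc_text.lower().split()))


# Toy corpus with made-up embeddings (assumption: real embeddings come from a model).
docs = [
    {"text": "streaming joins in python", "vec": [0.9, 0.1]},
    {"text": "python streaming etl with live data", "vec": [0.8, 0.2]},
    {"text": "cooking pasta at home", "vec": [0.0, 1.0]},
]

candidates = retrieve([1.0, 0.0], docs, k=2)
final = rerank("live data etl", candidates, keyword_overlap)
```

In this toy run the vector search ranks the "streaming joins" document first, while the second stage moves the "live data" document to the top because it shares more words with the query text.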

Fixed

  • Endpoints created by pw.io.http.rest_connector now accept requests both with and without a trailing slash. For example, /endpoint/ and /endpoint are now treated equivalently.
  • Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
  • Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
  • Fixed unique name autogeneration for the Python connectors.
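
As an illustration of the trailing-slash fix, one common way to treat /endpoint/ and /endpoint as the same route is to normalize paths before lookup. This is only a sketch of the idea, not how pw.io.http.rest_connector implements it; normalize_route and the routes dict are hypothetical names.

```python
def normalize_route(path: str) -> str:
    """Strip trailing slashes so '/endpoint/' and '/endpoint' compare equal."""
    stripped = path.rstrip("/")
    # Keep the root route as "/" instead of collapsing it to an empty string.
    return stripped or "/"


# Register the handler under the normalized form, then look up either spelling.
routes = {normalize_route("/endpoint"): "handler"}
found = routes[normalize_route("/endpoint/")]
```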

v0.26.4

16 Oct 07:20

Added

  • New external integration with Qdrant.
  • pw.io.mysql.write method for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.
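
The difference between the two output table types can be sketched without a database. The encoding below is an assumption made for illustration: each change is a (key, value, diff) tuple where diff is +1 for an insertion and -1 for a retraction. A "stream of changes" table would keep every row of the list, while a snapshot keeps only the state that results from applying them.

```python
def to_snapshot(changes):
    """Fold a stream of (key, value, diff) changes into the current state."""
    snapshot = {}
    for key, value, diff in changes:
        if diff > 0:
            snapshot[key] = value  # insertion, or the new half of an update
        elif snapshot.get(key) == value:
            del snapshot[key]  # retraction of the value currently stored
    return snapshot


changes = [
    ("alice", 10, +1),
    ("bob", 5, +1),
    ("alice", 10, -1),  # an update arrives as a retraction...
    ("alice", 12, +1),  # ...followed by an insertion of the new value
    ("bob", 5, -1),     # plain deletion
]
snapshot = to_snapshot(changes)
```

Here the change stream has five rows, whereas the snapshot ends up with a single row for alice.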

Changed

  • pw.io.deltalake.read now accepts the start_from_timestamp_ms parameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.
  • Asynchronous UDFs for connecting to API-based LLM and embedding models now use pw.udfs.ExponentialRetryStrategy() as their default retry strategy.
  • pw.io.postgres.write method now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with the output_table_type parameter.
  • pw.io.postgres.write_snapshot method has been deprecated.
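
For readers unfamiliar with exponential retries, the general idea behind a strategy such as pw.udfs.ExponentialRetryStrategy can be sketched in plain asyncio. The helper below is not Pathway's implementation: retry_with_backoff, its parameters and their default values are hypothetical, and jitter is left out to keep the example deterministic.

```python
import asyncio


async def retry_with_backoff(fn, max_retries=3, initial_delay=0.01, factor=2.0):
    """Call the async function fn, sleeping longer after each failed attempt."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delay)
            delay *= factor  # exponential growth: 0.01, 0.02, 0.04, ...


calls = {"count": 0}


async def flaky_api():
    """Hypothetical API call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky_api))
```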

v0.26.3

03 Oct 09:26

Added

  • New parser pathway.xpacks.llm.parsers.PaddleOCRParser supporting parsing of PDF, PPTX and images.

v0.26.2

01 Oct 13:48

Added

  • pw.io.gdrive.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading object contents.
  • Detailed metrics can now be exported to SQLite. Enable this feature using the environment variable PATHWAY_DETAILED_METRICS_DIR or via pw.set_monitoring_config().
  • pw.io.kinesis.read and pw.io.kinesis.write methods for reading from and writing to AWS Kinesis.

Fixed

  • A bug leading to potentially unbounded memory consumption that could occur in Table.forget and Table.sort operators during multi-worker runs has been fixed.
  • Improved memory efficiency during cold starts by compacting intermediary structures and reducing retained memory after backfilling.

Changed

  • The frequency of background operator snapshot compression in data persistence is limited to the greater of the user-defined snapshot_interval or 30 minutes when S3 or Azure is used as the backend, in order to avoid frequent calls to potentially expensive operations.
  • The Google Drive input connector performance has been improved, especially when handling directories with many nested subdirectories.
  • The MCP server tool method now accepts the optional title, output_schema, annotations and meta data, which are passed on to inform the LLM client.
  • Relaxed boto3 dependency to <2.0.0.
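
The rule for background snapshot compression frequency ("the greater of the user-defined snapshot_interval or 30 minutes") amounts to a single max. The sketch below is a toy model of that rule only: the function name and the backend strings are hypothetical, and leaving other backends unchanged is an assumption, since the note describes S3 and Azure only.

```python
from datetime import timedelta

MIN_CLOUD_COMPRESSION_INTERVAL = timedelta(minutes=30)


def compression_interval(snapshot_interval, backend):
    """Return how often background snapshot compression may run (toy model)."""
    if backend in ("s3", "azure"):
        return max(snapshot_interval, MIN_CLOUD_COMPRESSION_INTERVAL)
    # Assumption: backends not mentioned in the note keep the user's interval.
    return snapshot_interval


short = compression_interval(timedelta(minutes=5), "s3")    # raised to 30 minutes
long_ = compression_interval(timedelta(hours=1), "azure")   # already above the floor
```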

v0.26.1

28 Aug 07:58

Added

  • pw.Table.forget to remove old (in terms of event time) entries from the pipeline.
  • pw.Table.buffer, a stateful buffering operator that delays entries until the condition time_column <= max(time_column) - threshold is met.
  • pw.Table.ignore_late to filter out old (in terms of event time) entries.
  • Row batching for async UDFs. It can be enabled with the max_batch_size parameter.
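
The release condition quoted for pw.Table.buffer, time_column <= max(time_column) - threshold, can be traced with a toy model. ToyBuffer below is a hypothetical class written for this note, not Pathway code; it only shows which entries the condition lets through as the largest observed time advances.

```python
class ToyBuffer:
    """Holds entries until time <= max(time seen so far) - threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.max_time = None
        self.pending = []

    def push(self, time, value):
        """Add one entry and return the entries released by this arrival."""
        self.pending.append((time, value))
        if self.max_time is None or time > self.max_time:
            self.max_time = time
        cutoff = self.max_time - self.threshold
        released = [e for e in self.pending if e[0] <= cutoff]
        self.pending = [e for e in self.pending if e[0] > cutoff]
        return released


buf = ToyBuffer(threshold=5)
out1 = buf.push(1, "a")   # max=1, cutoff=-4: nothing released
out2 = buf.push(3, "b")   # max=3, cutoff=-2: nothing released
out3 = buf.push(7, "c")   # max=7, cutoff=2: the entry at time 1 is released
out4 = buf.push(12, "d")  # max=12, cutoff=7: entries at times 3 and 7 are released
```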

Changed

  • pw.io.subscribe and pw.io.python.write now work with async callbacks.
  • The diff column in tables automatically created by pw.io.postgres.write and pw.io.postgres.write_snapshot in replace and create_if_not_exists initialization modes now uses the smallint type.
  • The optimize_transaction_log option has been removed from pw.io.deltalake.TableOptimizer.

Fixed

  • pw.io.postgres.write and pw.io.postgres.write_snapshot now respect the type optionality defined in the Pathway table schema when creating a new PostgreSQL table. This applies to the replace and create_if_not_exists initialization modes.

v0.26.0

14 Aug 08:20

Added

  • path_filter parameter in pw.io.s3.read and pw.io.minio.read functions. It enables post-filtering of object paths using a wildcard pattern (*, ?), allowing exclusion of paths that pass the main path filter but do not match path_filter.
  • Input connectors now support backpressure control via max_backlog_size, which limits the number of read events being processed per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
  • pw.reducers.count_distinct and pw.reducers.count_distinct_approximate to count the number of distinct elements in a table. pw.reducers.count_distinct_approximate saves memory at the cost of accuracy; the precision parameter controls this tradeoff.
  • pw.Table.join (and its variants) now has two additional parameters: left_exactly_once and right_exactly_once. If the elements from one side of a join should be joined exactly once, set that side's *_exactly_once parameter to True. An entry is then removed from the join state as soon as it gets a match, which reduces memory consumption.
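
The wildcard post-filtering described for path_filter can be approximated with Python's standard fnmatch module. This is a sketch under assumptions: Pathway's matcher may differ in details (for example, fnmatch also treats [...] specially and lets * cross "/" boundaries), and post_filter and the object paths are made up for the example.

```python
from fnmatch import fnmatchcase


def post_filter(paths, path_filter):
    """Keep only the paths matching the wildcard pattern (* and ?)."""
    return [p for p in paths if fnmatchcase(p, path_filter)]


# Paths that already passed a hypothetical main prefix filter "logs/".
listed = [
    "logs/2024/app-1.csv",
    "logs/2024/app-2.csv",
    "logs/2024/app-10.csv",
    "logs/2024/readme.txt",
]
kept = post_filter(listed, "logs/*/app-?.csv")
```

With this pattern, ? matches exactly one character, so app-10.csv is excluded along with the non-CSV file.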

Changed

  • Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
  • Improved initialization speed of pw.io.s3.read and pw.io.minio.read.
  • pw.io.s3.read and pw.io.minio.read now limit the number and the total size of objects to be predownloaded.
  • BREAKING Optimized the implementation of the pw.reducers.min, pw.reducers.max, pw.reducers.argmin, pw.reducers.argmax and pw.reducers.any reducers for append-only tables. This is a breaking change for programs using operator persistence: the persisted state will have to be recomputed.
  • BREAKING Optimized the implementation of the pw.reducers.sum reducer on float and np.ndarray columns. This is a breaking change for programs using operator persistence: the persisted state will have to be recomputed.
  • BREAKING The implementation of data persistence has been optimized for the case of many small objects in filesystem and S3 connectors. This is a breaking change for programs using data persistence: the persisted state will have to be recomputed.
  • BREAKING The data snapshot logic in persistence has been optimized for the case of big input snapshots. This is a breaking change for programs using data persistence: the persisted state will have to be recomputed.
  • Improved precision of pw.reducers.sum on float columns by introducing Neumaier summation.
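
To see why compensated summation improves float precision, here is a minimal pure-Python sketch of the Neumaier variant of Kahan summation. It illustrates the technique named in the note only; it is not the code used inside pw.reducers.sum. A hand-written naive loop is used for comparison because recent Python versions already apply compensation inside the built-in sum.

```python
def naive_sum(values):
    """Plain left-to-right float addition, with no error compensation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Neumaier summation: track the rounding error lost at each addition."""
    total = 0.0
    compensation = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order bits of x were lost when it was added to total.
            compensation += (total - t) + x
        else:
            # Low-order bits of total were lost instead.
            compensation += (x - t) + total
        total = t
    return total + compensation


data = [1.0, 1e100, 1.0, -1e100]
plain = naive_sum(data)         # the two 1.0 terms are swallowed by 1e100
compensated = neumaier_sum(data)
```

On this input the naive loop returns 0.0, while the compensated sum recovers the exact result 2.0.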

v0.25.1

24 Jul 12:09

Added

  • pw.xpacks.llm.mcp_server.PathwayMcp that allows serving pw.xpacks.llm.document_store.DocumentStore and pw.xpacks.llm.question_answering endpoints as MCP (Model Context Protocol) tools.
  • pw.io.dynamodb.write method for writing to DynamoDB.

v0.25.0

17 Jul 17:44

Added

  • pw.io.questdb.write method for writing to QuestDB.
  • pw.io.fs.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading file contents.

Changed

  • BREAKING The Elasticsearch and BigQuery connectors have been moved to the Scale license tier. You can obtain the Scale tier license for free at https://pathway.com/get-license.
  • BREAKING pw.io.fs.read no longer accepts format="raw". Use format="binary" to read binary objects, format="plaintext_by_file" to read plaintext objects per file, or format="plaintext" to read plaintext objects split into lines.
  • BREAKING The pw.io.s3_csv.read connector has been removed. Please use pw.io.s3.read with format="csv" instead.

Fixed

  • pw.io.s3.read and pw.io.s3.write now also check the AWS_PROFILE environment variable for AWS credentials if none are explicitly provided.

v0.24.1

17 Jul 17:44

Added

  • Confluent Schema Registry support in Kafka and Redpanda input and output connectors.

Changed

  • pw.io.airbyte.read will now retry the pip install command if it fails during the installation of a connector. It only applies when using the PyPI version of the connector, not the Docker one.

v0.24.0

17 Jul 17:44

Added

  • pw.io.mqtt.read and pw.io.mqtt.write methods for reading from and writing to MQTT.

Changed

  • pw.xpacks.llm.embedders.SentenceTransformerEmbedder and pw.xpacks.llm.llms.HFPipelineChat are now computed in batches. The maximum size of a single batch can be set in the constructor with the argument max_batch_size.
  • BREAKING Arguments api_key and base_url for pw.xpacks.llm.llms.OpenAIChat can no longer be set in the __call__ method, and instead, if needed, should be set in the constructor.
  • BREAKING Argument api_key for pw.xpacks.llm.llms.OpenAIEmbedder can no longer be set in the __call__ method, and instead, if needed, should be set in the constructor.
  • pw.io.postgres.write now accepts arbitrary types for the values of the postgres_settings dict. A value that is not a string is converted with Python's built-in str().

Removed

  • pw.io.kafka.read_from_upstash has been removed, as the managed Kafka service in Upstash has been deprecated.