Releases · pathwaycom/pathway · GitHub

12 Jun 08:22

v0.23.0

Changed

BREAKING: To use pw.sql you now have to install pathway[sql].

Fixed

pw.io.deltalake.read now correctly reads data from partitioned tables in all cases.
Added retries for all cloud-based persistence backend operations to improve reliability.

05 Jun 10:48

v0.22.0

Added

Data persistence can now be configured to use Azure Blob Storage as a backend. An Azure backend instance can be created using pw.persistence.Backend.azure and included in the persistence config.
Added batching to UDFs. UDFs can now operate on batches of data instead of single rows. To enable this, set the max_batch_size argument.
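
To illustrate what a batch size limit means for a function that receives many rows at once, here is a minimal plain-Python sketch. It does not use Pathway and is not how Pathway implements batching; apply_in_batches and double_all are hypothetical names introduced only for this example.

```python
from typing import Callable, Iterable, Iterator, List


def apply_in_batches(
    fn: Callable[[List[int]], List[int]],
    rows: Iterable[int],
    max_batch_size: int,
) -> Iterator[int]:
    """Call fn on chunks of at most max_batch_size rows, yielding one result per row."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[int] = []
    for row in rows:
        batch.append(row)
        if len(batch) == max_batch_size:
            yield from fn(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield from fn(batch)


calls: List[int] = []  # records the size of every batch the function saw


def double_all(batch: List[int]) -> List[int]:
    calls.append(len(batch))
    return [2 * x for x in batch]


result = list(apply_in_batches(double_all, range(5), max_batch_size=2))
```

The batched function takes a list and returns a list of the same length, so per-call overhead is paid once per batch rather than once per row.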

Changed

BREAKING: when creating pw.DateTimeUtc it is now obligatory to pass the time zone information.
BREAKING: when creating pw.DateTimeNaive passing time zone information is not allowed.
BREAKING: expressions are now evaluated in batches. Generally, it speeds up the computations but might increase the memory usage if the intermediate state in the expressions is large.
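
The two date-time rules above follow the usual distinction between time-zone-aware and naive timestamps. The sketch below shows that distinction with Python's standard datetime module only; require_aware and require_naive are hypothetical helpers that mimic the rule, not Pathway functions, and the dates are arbitrary.

```python
from datetime import datetime, timezone


def is_aware(dt: datetime) -> bool:
    """A datetime is aware when it carries a usable UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


def require_aware(dt: datetime) -> datetime:
    """Mimics the rule for UTC timestamps: time zone information is mandatory."""
    if not is_aware(dt):
        raise ValueError("time zone information is required")
    return dt.astimezone(timezone.utc)


def require_naive(dt: datetime) -> datetime:
    """Mimics the rule for naive timestamps: time zone information is rejected."""
    if is_aware(dt):
        raise ValueError("time zone information is not allowed")
    return dt


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

checked_aware = require_aware(aware)
checked_naive = require_naive(naive)
```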

Fixed

Synchronization groups now correctly handle cases where the source file-like object is updated during the reading process.

29 May 07:49

v0.21.6

Added

Added a sort_by method to pw.BaseCustomAccumulator that sorts rows within a single batch. When sort_by is defined, the rows are reduced in the order it specifies. It can be used, for example, to process entries in order of event time.
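
Why ordering inside a batch matters for an order-sensitive reduction can be shown without Pathway. The sketch below is plain Python and makes no claim about the actual sort_by signature; reduce_in_order, the row layout, and the event_time field are assumptions made for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, object]

# One batch whose rows arrived out of event-time order.
batch: List[Row] = [
    {"event_time": 3, "value": "c"},
    {"event_time": 1, "value": "a"},
    {"event_time": 2, "value": "b"},
]


def reduce_in_order(
    rows: List[Row],
    sort_key: Callable[[Row], object],
    step: Callable[[str, Row], str],
    initial: str,
) -> str:
    """Sort the batch by sort_key, then fold the rows in that order."""
    return reduce(step, sorted(rows, key=sort_key), initial)


def concat(acc: str, row: Row) -> str:
    return acc + str(row["value"])


ordered = reduce_in_order(batch, lambda row: row["event_time"], concat, "")
unordered = reduce(concat, batch, "")
```

Here ordered reflects event time, while unordered reflects arrival order; any reducer whose result depends on the sequence of updates benefits from the explicit ordering.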

Changed

pw.Table.debug now prints a whole row in a single line instead of printing each cell separately.
Calling functions without arguments in YAML configuration files is now deprecated in pw.load_yaml. To call a function, pass a mapping, e.g. the empty mapping {}. In the future, the ! syntax without any mapping will be used to pass function objects without calling them.
The license check error message now provides a more detailed explanation of the failure.
When code is run using pathway spawn with multiple processes, if one process terminates with an error, all other processes will also be terminated.
pw.xpacks.llm.vector_store.VectorStoreServer is being deprecated and is now a subclass of pw.xpacks.llm.document_store.DocumentStore. The public API stays the same, but users are encouraged to switch to DocumentStore from now on.
pw.xpacks.llm.vector_store.VectorStoreClient is being deprecated in favor of pw.xpacks.llm.document_store.DocumentStoreClient.
pw.io.deltalake.write can now maintain the target table's snapshot on the output.

09 May 07:50

v0.21.5

Changed

pw.io.deltalake.read now processes Delta table version updates atomically, applying all changes together in a single minibatch.
The panel widget for table visualization now has a horizontal scroll bar for large tables.
pw.reducers.argmax and pw.reducers.argmin can now return the value from any column, not only id.
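
The difference between returning the id of the extreme row and returning another column of that row can be sketched in plain Python. This is only a conceptual illustration; argmax_value and the row layout are hypothetical and do not reflect the pw.reducers.argmax signature.

```python
from typing import Dict, List

rows: List[Dict[str, object]] = [
    {"id": 101, "name": "alpha", "score": 3},
    {"id": 102, "name": "beta", "score": 7},
    {"id": 103, "name": "gamma", "score": 5},
]


def argmax_value(rows: List[Dict[str, object]], by: str, returning: str = "id") -> object:
    """Find the row with the largest `by` value and return its `returning` column."""
    best = max(rows, key=lambda row: row[by])
    return best[returning]


best_id = argmax_value(rows, by="score")                      # previous behaviour: id only
best_name = argmax_value(rows, by="score", returning="name")  # any column of the same row
```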

Fixed

pw.reducers.argmax and pw.reducers.argmin work correctly with the result of pw.Table.windowby.

24 Apr 14:37

v0.21.4

Added

pw.io.kafka.read and pw.io.redpanda.read now support static mode.

Changed

The inactivity_detection function is now a method for append-only tables. It no longer relies on an event timestamp column; instead, it uses table processing times to detect inactivity periods.

24 Apr 14:37

v0.21.3

Fixed

The performance of input connectors is optimized in certain cases.
The panel widget for table visualization now formats timestamps and missing values better. The pagination was also updated to better fit the widget, and the default sorters in snapshot mode have been fixed.

10 Apr 07:28

v0.21.2

Added

Added a synchronization group mechanism to align multiple data sources based on selected columns. It can be accessed with pw.io.register_input_synchronization_group.
pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.DateTimeDuration, and int.

Changed

Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.

Fixed

Fixed a problem with list_documents() when no documents are present in the store.
The append-only property of tables created by pw.io.kafka.read is now set correctly.

28 Mar 11:39

v0.21.1

Changed

Input connectors now throttle parsing error messages if their share is more than 10% of the parsing attempts.
Added a return_status flag to the inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the indexing status of each file.
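
The 10% rule for parsing error messages can be pictured as a running ratio check. The sketch below is a plain-Python assumption of how such a throttle might behave: it suppresses a message whenever errors exceed the configured share of attempts so far. ParseErrorThrottle is a hypothetical class, and Pathway's actual throttling policy may differ.

```python
class ParseErrorThrottle:
    """Tracks parsing attempts and decides whether an error message should be emitted."""

    def __init__(self, max_error_share: float = 0.10) -> None:
        self.max_error_share = max_error_share
        self.attempts = 0
        self.errors = 0

    def record(self, ok: bool) -> bool:
        """Record one attempt; return True if an error message should be logged for it."""
        self.attempts += 1
        if ok:
            return False
        self.errors += 1
        # Log only while errors stay within the allowed share of all attempts.
        return self.errors / self.attempts <= self.max_error_share


throttle = ParseErrorThrottle()
for _ in range(19):
    throttle.record(ok=True)  # 19 successful parses

decisions = [throttle.record(ok=False) for _ in range(3)]  # then 3 failures in a row
```

After 19 successes, the first two failures keep the error share under 10% (1/20 and 2/21), so they are logged; the third pushes it to 3/22 and is suppressed.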

19 Mar 13:46

v0.21.0

Added

All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
pw.io.csv.read now parses null-values in data when it can be done unambiguously.

Changed

BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
- Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
- New: /v2/list_documents, /v2/answer
RAG methods under pw.xpacks.llm.question_answering.RAGClient are renamed and now use the new endpoints. The old methods are deprecated and will be removed in the future.
- pw_ai_summary -> summarize
- pw_ai_answer -> answer
- pw_list_documents -> list_documents
When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and the schema was not specified by the user, it is read from the table metadata.
pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
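
For codebases that still reference the deprecated endpoint paths or RAGClient method names listed in this release, a small lookup table can help with migration. The sketch below is plain Python; the pairing of each /v1 endpoint with its /v2 counterpart is inferred from the names, and migrate_endpoint and migrate_method are hypothetical helpers, not part of Pathway.

```python
# Old endpoint path -> new endpoint path (pairing inferred from the names).
ENDPOINT_RENAMES = {
    "/v1/pw_list_documents": "/v2/list_documents",
    "/v1/pw_ai_answer": "/v2/answer",
}

# Old RAGClient method name -> new method name, as listed in the release notes.
METHOD_RENAMES = {
    "pw_ai_summary": "summarize",
    "pw_ai_answer": "answer",
    "pw_list_documents": "list_documents",
}


def migrate_endpoint(path: str) -> str:
    """Return the new endpoint for a deprecated path; unknown paths pass through."""
    return ENDPOINT_RENAMES.get(path, path)


def migrate_method(name: str) -> str:
    """Return the new method name for a deprecated one; unknown names pass through."""
    return METHOD_RENAMES.get(name, name)


new_path = migrate_endpoint("/v1/pw_ai_answer")
new_method = migrate_method("pw_ai_summary")
```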

Fixed

pw.io.csv.write now correctly escapes quote characters.
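
The CSV conventions described in this release (base64 for bytes, nanoseconds for durations, JSON for tuples, and escaped quote characters) can be sketched with the Python standard library. This is an illustration of the encodings only, not Pathway's writer; encode_cell is a hypothetical helper, and the exact text Pathway emits (for example JSON spacing) may differ.

```python
import base64
import csv
import io
import json
from datetime import timedelta


def encode_cell(value: object) -> str:
    """Encode one value the way the release notes describe for the CSV format."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, timedelta):
        # timedelta stores days, seconds and microseconds; convert to whole nanoseconds.
        whole_seconds = value.days * 86_400 + value.seconds
        return str(whole_seconds * 10**9 + value.microseconds * 1_000)
    if isinstance(value, tuple):
        return json.dumps(list(value))
    return str(value)


values = (b"hi", timedelta(seconds=1.5), (1, "a"), 'say "hi"')

buffer = io.StringIO()
csv.writer(buffer).writerow([encode_cell(v) for v in values])
line = buffer.getvalue()

# Reading the line back shows that quotes and commas inside cells survive the round trip.
row = next(csv.reader(io.StringIO(line)))
```

The csv module doubles embedded quote characters and wraps the affected cells in quotes, which is the standard escaping a CSV reader expects.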

07 Mar 08:18

v0.20.1

Added

Added RecursiveSplitter
pw.io.deltalake.write now checks that the schema of the target Delta table corresponds to the schema of the Pathway table that is sent to the output. If the schemas differ, a human-readable error message is produced.
