Release Version 0.10.0 · sandialabs/talkpipe

0.10.0

New Features

Added splitText segment to split strings either by length or by delimiter. Now available as an entry point.
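
The plain-Python sketch below illustrates the two splitting modes: fixed-length chunks or pieces separated by a delimiter. The function name split_text and its length/delimiter parameters are hypothetical. They are not taken from the talkpipe API, so check the splitText documentation for the real parameter names.

```python
from typing import Iterator, Optional

def split_text(text: str, length: Optional[int] = None,
               delimiter: Optional[str] = None) -> Iterator[str]:
    """Yield pieces of text, split either by fixed length or by a delimiter (hypothetical helper)."""
    # Exactly one splitting mode must be chosen.
    if (length is None) == (delimiter is None):
        raise ValueError("specify exactly one of length or delimiter")
    if length is not None:
        if length <= 0:
            raise ValueError("length must be positive")
        # Fixed-size chunks; the last chunk may be shorter.
        for start in range(0, len(text), length):
            yield text[start:start + length]
    else:
        yield from text.split(delimiter)
```

For example, list(split_text("abcdefg", length=3)) gives ["abc", "def", "g"].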
Added shingleText segment for creating overlapping n-grams (shingles) from text with support for key-based grouping and configurable overlap. Now available as an entry point.
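
Shingling with overlap works like a sliding window whose step is the window size minus the overlap. The sketch below rests on several assumptions. It builds windows over words, although the release note does not say whether shingleText works on words, characters, or incoming items. It groups fragments by a key before shingling. The names shingles and shingle_by_key are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def shingles(words: List[str], size: int, overlap: int) -> List[str]:
    """Overlapping windows of `size` words; consecutive windows share `overlap` words (hypothetical)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    out = []
    for start in range(0, len(words), step):
        out.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return out

def shingle_by_key(items: Iterable[Tuple[str, str]], size: int,
                   overlap: int) -> Dict[str, List[str]]:
    """Group (key, text) fragments by key, then shingle each group's combined words."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, text in items:
        grouped[key].extend(text.split())
    return {key: shingles(words, size, overlap) for key, words in grouped.items()}
```

With size=3 and overlap=1, the words "a b c d e" become ["a b c", "c d e"].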
Added extractProperty segment that extracts properties using the same methodology as the
property designator in a field list.
Added makeVectorDatabase segment to create vector databases in LanceDB by embedding documents
and storing them with their metadata. Supports custom embedding models, table names, document IDs,
and overwrite options.
Added searchVectorDatabase segment to search vector databases in LanceDB. Accepts either string
queries or dictionary inputs with a query field. Search results can be yielded directly or attached
to the input item. Supports custom embedding models, result limits, and read consistency intervals.
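
The toy in-memory store below illustrates the search behavior described above. A query can be a string or a dictionary with a query field, and results can either be returned directly or attached to the input item. It is not LanceDB and not the talkpipe API. A letter-frequency vector stands in for a real embedding model, and the names ToyVectorStore, embed, query_field, and set_as are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Union

def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: letter-frequency vector over a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector table: rows hold id, vector, text, and metadata."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def add(self, doc_id: str, text: str, **metadata) -> None:
        self.rows.append({"id": doc_id, "vector": embed(text), "text": text, **metadata})

    def search(self, query: Union[str, Dict], limit: int = 3,
               query_field: str = "query", set_as: Optional[str] = None):
        """Accept a string or a dict with a query field; return hits or the annotated dict."""
        text = query if isinstance(query, str) else query[query_field]
        qvec = embed(text)
        scored = sorted(((cosine(qvec, r["vector"]), r) for r in self.rows),
                        key=lambda pair: pair[0], reverse=True)
        hits = [{"id": r["id"], "text": r["text"], "score": s} for s, r in scored[:limit]]
        if set_as is None:
            return hits  # results yielded directly
        if isinstance(query, str):
            raise ValueError("set_as requires a dict input")
        return {**query, set_as: hits}  # results attached to a copy of the input item
```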
Added ragToText, an end-to-end RAG convenience segment that combines vector database search,
prompt construction, and LLM completion into a single pipeline. Automatically retrieves relevant
documents from a vector database, constructs a RAG prompt with background context, and generates
completions using the specified LLM. Supports configurable prompt directives, result limits, and
field assignments.
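
The prompt-construction step of a RAG pipeline can be sketched as follows: a directive comes first, then the retrieved documents as background context, then the question. The exact template ragToText uses is not given here. The function name build_rag_prompt, the default directive wording, and the numbered layout are all assumptions.

```python
from typing import List

DEFAULT_DIRECTIVE = "Answer the question using only the background information."

def build_rag_prompt(question: str, documents: List[str],
                     directive: str = DEFAULT_DIRECTIVE) -> str:
    """Assemble a RAG prompt: directive, numbered background passages, then the question (hypothetical)."""
    background = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        f"{directive}\n\n"
        f"Background:\n{background}\n\n"
        f"Question: {question}\n"
    )
```

In a full pipeline, the documents would come from the vector search and the resulting string would be sent to the LLM for completion.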
Added ragToBinaryAnswer, a RAG pipeline segment that outputs structured binary (yes/no) answers
with explanations from an LLM. Similar to ragToText but uses guided generation to ensure responses
follow a boolean answer format with justification.
Added ragToScore, a RAG pipeline segment that outputs structured integer scores with explanations
from an LLM. Useful for relevance scoring, quality assessment, and other evaluation tasks where
numeric ratings are needed along with reasoning.
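
The sketch below shows the kind of output shapes these two segments produce: a boolean answer with an explanation, and an integer score with an explanation. Guided generation constrains the model while it generates. This sketch uses a different technique: it validates a JSON reply after the fact, purely to show the structures. The class names and the field names answer, score, and explanation are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass
class BinaryAnswer:
    answer: bool
    explanation: str

@dataclass
class Score:
    score: int
    explanation: str

def parse_binary(raw: str) -> BinaryAnswer:
    """Validate a JSON reply shaped like {"answer": bool, "explanation": str}."""
    data = json.loads(raw)
    if not isinstance(data.get("answer"), bool):
        raise ValueError("answer must be a JSON boolean")
    return BinaryAnswer(answer=data["answer"], explanation=str(data.get("explanation", "")))

def parse_score(raw: str) -> Score:
    """Validate a JSON reply shaped like {"score": int, "explanation": str}."""
    data = json.loads(raw)
    score = data.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(score, int) or isinstance(score, bool):
        raise ValueError("score must be a JSON integer")
    return Score(score=score, explanation=str(data.get("explanation", "")))
```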
Added comprehensive unit tests for RAGToBinaryAnswer and RAGToScore segments, covering basic
functionality, field operations, multiple queries, and edge cases.

Breaking Changes

Removed deprecated simplevectordb module. Users should migrate to LanceDBDocumentStore from
talkpipe.search.lancedb, which provides equivalent functionality with improved performance.
Removed deprecated reduceUMAP segment and umap-learn dependency. Users should migrate to
reduceTSNE or other dimensionality reduction methods.
Removed numba dependency as it was only required by the removed umap-learn package.

Improvements

Updated README.md Example 5 (RAG Pipeline with Vector Database) to use standalone data instead
of relying on external txt files. The example now defines documents as inline strings and uses
toDict to convert them to dictionaries, making it fully self-contained and easier to run.
Added an option n to the sleep segment so that it can be configured to sleep once every n items
rather than once per item.
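
A generator-based sketch of sleeping once every n items follows. Items pass through unchanged, and the pause happens after each n-th item. Whether talkpipe's sleep segment pauses before or after yielding is not stated, so that detail is an assumption. The injectable sleep argument is also hypothetical and exists only to make the sketch testable.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def sleep_every(items: Iterable[T], seconds: float, n: int = 1,
                sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Pass items through unchanged, pausing for `seconds` after every n-th item (hypothetical)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for count, item in enumerate(items, start=1):
        yield item
        if count % n == 0:
            sleep(seconds)
```

With n=1 this reduces to the previous behavior of sleeping after every item.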
Extended AbstractFieldSegment and the field_segment decorator to support segments that return
more than one item per input item. If set_as is specified, the outer object is shallow copied
for each output and the output is set appropriately.
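
The one-to-many field behavior can be sketched in plain Python: a function applied to one field may produce several outputs, and with set_as each output lands in its own shallow copy of the input item. This is not the AbstractFieldSegment interface. The name apply_field_segment and its signature are hypothetical.

```python
import copy
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

def apply_field_segment(items: Iterable[Dict[str, Any]], field: str,
                        fn: Callable[[Any], Iterable[Any]],
                        set_as: Optional[str] = None) -> Iterator[Any]:
    """Apply fn to item[field]; fn may produce several outputs per input (hypothetical)."""
    for item in items:
        for output in fn(item[field]):
            if set_as is None:
                yield output  # emit the raw outputs
            else:
                # Shallow copy: each output gets its own outer dict,
                # but nested values are still shared with the input.
                new_item = copy.copy(item)
                new_item[set_as] = output
                yield new_item
```

Because the copy is shallow, mutating a nested list or dict in one output would affect the others and the original item. Copying only the outer object keeps the fan-out cheap.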
Added a new registry system with configurable lazy loading via the LAZY_IMPORT setting. When enabled,
it reduces load time about 18-fold (from 2.9s to 0.16s in testing) by deferring module
imports until needed. The default behavior remains unchanged for compatibility. Added comprehensive
documentation for the lazy loading feature in docs/api-reference/lazy-loading.md,
covering configuration, performance characteristics, usage examples, and best practices for module
organization and plugin development.
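
Lazy loading in a registry generally means storing where an object lives and importing it only on first lookup. The sketch below uses importlib.import_module to show that idea, with an eager mode that imports at registration time. How the LAZY_IMPORT setting is actually wired up is not described here. The LazyRegistry class and the "module:attribute" target format are assumptions.

```python
import importlib
from typing import Any, Dict, Tuple

class LazyRegistry:
    """Map names to 'module:attribute' targets; import only on first lookup when lazy (hypothetical)."""

    def __init__(self, lazy: bool = True) -> None:
        self.lazy = lazy
        self._targets: Dict[str, Tuple[str, str]] = {}
        self._loaded: Dict[str, Any] = {}

    def register(self, name: str, target: str) -> None:
        module_name, _, attr = target.partition(":")
        self._targets[name] = (module_name, attr)
        if not self.lazy:
            self._load(name)  # eager mode: import immediately

    def _load(self, name: str) -> Any:
        module_name, attr = self._targets[name]
        obj = getattr(importlib.import_module(module_name), attr)
        self._loaded[name] = obj  # cache so later lookups skip the import
        return obj

    def get(self, name: str) -> Any:
        if name in self._loaded:
            return self._loaded[name]
        return self._load(name)

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded
```

The sketch only tracks registry-level loading. A standard-library module such as json may already be in sys.modules for unrelated reasons.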
Fixed a security issue in src/talkpipe/chatterlang/registry.py where exceptions were being silently
ignored (Bandit B110). Replaced except Exception: pass blocks with proper logging statements
to aid in debugging and follow security best practices.
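
The before/after shape of a Bandit B110 fix is sketched below. The swallowed exception is logged with its traceback instead of being discarded, and the caller can still continue. The actual code in registry.py is not shown in these notes. The function name load_plugin and the choice of the warning log level are assumptions.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

def load_plugin(loader: Callable[[], Any]) -> Optional[Any]:
    """Call loader(); log failures instead of silently discarding them (hypothetical)."""
    try:
        return loader()
    except Exception:
        # Before the fix, the pattern was `except Exception: pass`, which Bandit
        # reports as B110 (try_except_pass). Logging with exc_info keeps the
        # traceback visible while still letting the caller continue.
        logger.warning("Failed to load plugin via %r", loader, exc_info=True)
        return None
```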
Added requires_package_installed pytest fixture to gracefully skip entry point tests when the
package is not installed (e.g., before running pip install -e .). Tests now provide clear skip
messages instead of failing with confusing errors. This makes the test suite more developer-friendly
for new contributors and CI environments.
Updated README.md section 2 (ChatterLang External DSL) to demonstrate how to register custom
components using @registry.register_segment(). The example shows how to take the uppercase
segment from section 1 (Pipe API) and make it available in ChatterLang scripts, with working
code that demonstrates compilation and execution.
Added documentation links to all built-in application commands in README.md "Key Applications"
and "Built-in Applications" sections, making it easier for users to find detailed documentation
for each command-line tool.
Enhanced extract_property to support _ as a passthrough/no-op in dotted property paths. The _
character now acts as an identity operation when used within a path. For example, "X._" is
equivalent to "X", "X._.1" is equivalent to "X.1", and "_.1" is equivalent to "1". This
makes it easier to construct dynamic property paths where some components may need to reference the
current object without changing the navigation level.
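
The passthrough rule can be illustrated with a small dotted-path resolver in which _ is an identity step. This is not talkpipe's extract_property. The supported container types, the rule that numeric components index lists and tuples, and the fallback to attribute access are all assumptions made for the sketch.

```python
from typing import Any

def extract_property(obj: Any, path: str) -> Any:
    """Follow a dotted path through dicts, sequences, and attributes; '_' is a no-op step (sketch)."""
    for part in path.split("."):
        if part == "_":
            continue  # identity: stay on the current object
        if isinstance(obj, dict):
            obj = obj[part]
        elif isinstance(obj, (list, tuple)) and part.isdigit():
            obj = obj[int(part)]
        else:
            obj = getattr(obj, part)
    return obj
```

With data = {"X": [10, 20, 30]}, the paths "X._.1" and "X.1" both resolve to 20, and "X._" resolves to the whole list.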

Full Changelog: v0.9.4...v0.10.0
