Version 0.10.0

@tlbauer2 tlbauer2 released this 13 Nov 11:37
· 162 commits to main since this release

0.10.0

New Features

  • Added splitText segment to split strings either by length or by delimiter. Now available as an entry point.
  • Added shingleText segment for creating overlapping n-grams (shingles) from text with support for key-based grouping and configurable overlap. Now available as an entry point.
  • Added extractProperty segment that extracts properties using the same methodology as the
    property designator in a field list.
  • Added makeVectorDatabase segment to create vector databases in LanceDB by embedding documents
    and storing them with their metadata. Supports custom embedding models, table names, document IDs,
    and overwrite options.
  • Added searchVectorDatabase segment to search vector databases in LanceDB. Accepts either string
    queries or dictionary inputs with a query field. Search results can be yielded directly or attached
    to the input item. Supports custom embedding models, result limits, and read consistency intervals.
  • Added ragToText, an end-to-end RAG convenience segment that combines vector database search,
    prompt construction, and LLM completion into a single pipeline. It retrieves relevant
    documents from a vector database, constructs a RAG prompt with background context, and generates
    completions using the specified LLM. Supports configurable prompt directives, result limits, and
    field assignments.
  • Added ragToBinaryAnswer RAG pipeline segment that outputs structured binary (yes/no) answers
    with explanations from an LLM. Similar to ragToText but uses guided generation to ensure responses
    follow a boolean answer format with justification.
  • Added ragToScore RAG pipeline segment that outputs structured integer scores with explanations
    from an LLM. Useful for relevance scoring, quality assessment, and other evaluation tasks where
    numeric ratings are needed along with reasoning.
  • Added comprehensive unit tests for RAGToBinaryAnswer and RAGToScore segments, covering basic
    functionality, field operations, multiple queries, and edge cases.
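
To make the shingleText entry above concrete, here is a minimal sketch of overlapping shingles in plain Python. It is not TalkPipe's implementation: the function name `shingle`, its parameters `size` and `overlap`, and the handling of short or trailing input are assumptions chosen for illustration, and it omits the key-based grouping the segment supports.

```python
from typing import Iterator, List, Sequence


def shingle(tokens: Sequence[str], size: int, overlap: int) -> Iterator[List[str]]:
    """Yield windows of `size` tokens, each sharing `overlap` tokens with the previous one.

    Hypothetical helper for illustration only; the real shingleText segment
    may treat edge cases differently.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return
    step = size - overlap  # how far the window advances each time
    # Input shorter than `size` yields one short window; trailing tokens
    # that do not fill a complete window are dropped (an assumption).
    last_start = max(len(tokens) - size + 1, 1)
    for start in range(0, last_start, step):
        yield list(tokens[start:start + size])


words = "the quick brown fox jumps".split()
for window in shingle(words, size=3, overlap=2):
    print(" ".join(window))
```

With `size=3` and `overlap=2` the window advances one token at a time, so five words produce three shingles. Lowering `overlap` to 1 advances two tokens at a time and produces two.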

Breaking Changes

  • Removed deprecated simplevectordb module. Users should migrate to LanceDBDocumentStore from
    talkpipe.search.lancedb, which provides equivalent functionality with improved performance.
  • Removed deprecated reduceUMAP segment and umap-learn dependency. Users should migrate to
    reduceTSNE or other dimensionality reduction methods.
  • Removed numba dependency as it was only required by the removed umap-learn package.

Improvements

  • Updated README.md Example 5 (RAG Pipeline with Vector Database) to use standalone data instead
    of relying on external txt files. The example now defines documents as inline strings and uses
    toDict to convert them to dictionaries, making it fully self-contained and easier to run.
  • Added an option n to sleep so that it can be configured to sleep every n items rather than
    every item.
  • Extended AbstractFieldSegment and the field_segment decorator to support segments that return
    more than one item per input item. If set_as is specified, the outer object is shallow copied
    for each output and the output is set appropriately.
  • Added a new registry system with configurable lazy loading via the LAZY_IMPORT setting. When
    enabled, it provides an 18-fold performance improvement (from 2.9s to 0.16s in testing) by
    deferring module imports until needed. The default behavior remains unchanged for compatibility.
    Added comprehensive documentation for the lazy loading feature in
    docs/api-reference/lazy-loading.md, covering configuration, performance characteristics, usage
    examples, and best practices for module organization and plugin development.
  • Fixed a security issue in src/talkpipe/chatterlang/registry.py where exceptions were being
    silently ignored (Bandit B110). Replaced bare except Exception: pass blocks with proper logging
    statements to aid in debugging and follow security best practices.
  • Added requires_package_installed pytest fixture to gracefully skip entry point tests when the
    package is not installed (e.g., before running pip install -e .). Tests now provide clear skip
    messages instead of failing with confusing errors. This makes the test suite more developer-friendly
    for new contributors and CI environments.
  • Updated README.md section 2 (ChatterLang External DSL) to demonstrate how to register custom
    components using @registry.register_segment(). The example shows how to take the uppercase
    segment from section 1 (Pipe API) and make it available in ChatterLang scripts, with working
    code that demonstrates compilation and execution.
  • Added documentation links to all built-in application commands in README.md "Key Applications"
    and "Built-in Applications" sections, making it easier for users to find detailed documentation
    for each command-line tool.
  • Enhanced extract_property to support _ as a passthrough/no-op in dotted property paths. The _
    character now acts as an identity operation when used within a path. For example, "X._" is
    equivalent to "X", "X._.1" is equivalent to "X.1", and "_.1" is equivalent to "1". This
    makes it easier to construct dynamic property paths where some components may need to reference the
    current object without changing the navigation level.
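
The `_` passthrough rule described in the last item can be sketched in a few lines of plain Python. This is an illustration of the stated equivalences only, not the library's `extract_property` code: the function name `resolve_path` and its lookup rules (dictionary keys, integer indexes for lists and tuples, attributes otherwise) are assumptions.

```python
from typing import Any


def resolve_path(obj: Any, path: str) -> Any:
    """Walk a dotted path through `obj`, treating "_" as an identity step.

    Hypothetical helper for illustration only; the real extract_property
    may support more kinds of path components.
    """
    current = obj
    for part in path.split("."):
        if part == "_":
            # Passthrough: stay on the current object.
            continue
        if isinstance(current, dict):
            current = current[part]
        elif isinstance(current, (list, tuple)):
            current = current[int(part)]
        else:
            current = getattr(current, part)
    return current


item = {"X": ["a", "b"]}
print(resolve_path(item, "X._") == resolve_path(item, "X"))      # "X._" behaves like "X"
print(resolve_path(item, "X._.1") == resolve_path(item, "X.1"))  # "X._.1" behaves like "X.1"
print(resolve_path(["a", "b"], "_.1") == resolve_path(["a", "b"], "1"))  # "_.1" behaves like "1"
```

Because `_` is skipped rather than looked up, a path assembled from parts can use it as a placeholder for a component that should leave the navigation level unchanged.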

Full Changelog: v0.9.4...v0.10.0