Commit dcad05e

Merge branch 'main' into develop
2 parents 696abe0 + cac9dfc commit dcad05e

12 files changed

Lines changed: 212 additions & 152 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -14,6 +14,14 @@
1414
- Added skill for checking documentation examples.
1515
- Added skill for updating pyproject.toml when a new source or segment is added to the repository.
1616
- Updated AGENTS.md, consolidating different coding assistant configuration files.
17+
- Documentation: fixed internal link and config overview ordering; developer handbook grammar, `$var_name`
18+
explanation, and `default_embedding_model_source` key; pip install examples; RAG/config notes on
19+
`DEFAULT_*` vs `default_*` keys for `serverag` vs segments; minor README and workbench markdown fixes.
20+
- README: stronger opening (audience, fit, differentiation), badges and Python version note, shorter
21+
architecture section, documentation map table, tightened Quick Start and config pointer, RAG “at a
22+
glance” under Quick Start, per-example problem/result blurbs, and status note on what is stable;
23+
contributor glossary/conventions moved to [docs/contributing/developer-handbook.md](docs/contributing/developer-handbook.md)
24+
with links from the README and [docs/README.md](docs/README.md).
1725

1826
## 0.11.6
1927
- Fixed chatterlang_serve stream UI duplicate output: response items are no longer added to the

README.md

Lines changed: 72 additions & 140 deletions
Large diffs are not rendered by default.

docs/README.md

Lines changed: 6 additions & 3 deletions
@@ -1,6 +1,6 @@
11
# TalkPipe Documentation
22

3-
Welcome to the TalkPipe documentation! This directory contains comprehensive, API references, and examples for using TalkPipe effectively.
3+
Welcome to the TalkPipe documentation! This directory contains guides, API references, and examples for using TalkPipe effectively.
44

55
## Quick Navigation
66

@@ -19,10 +19,10 @@ Real-world usage examples and tutorials:
1919

2020
### 📚 [API Reference](api-reference/)
2121
Complete documentation for all TalkPipe commands and components:
22-
- [chatterlang_serve](api-reference/chatterlang-server.md) - Run a script as an API endpoint and with a customizable web interface
22+
- [chatterlang_serve](api-reference/chatterlang-server.md) - Run a script as an API endpoint with a customizable web interface
2323
- [ChatterLang Workbench](api-reference/chatterlang-workbench.md) - Interactive web interface for writing and testing scripts
2424
- [ChatterLang Script Runner](api-reference/chatterlang-script.md) - Run a script from the command line
25-
- [Documentation Generator](api-reference/talkpipe-ref.md) - Generate reference documentations for Segments and Sources
25+
- [Documentation Generator](api-reference/talkpipe-ref.md) - Generate reference documentation for Segments and Sources
2626
- [Plugin Manager](api-reference/talkpipe-plugin-manager.md) - Manage and inspect TalkPipe plugins
2727

2828
### 🏗️ [Architecture](architecture/)
@@ -32,6 +32,9 @@ Deep technical documentation:
3232
- [Extending TalkPipe](architecture/extending-talkpipe.md) - Creating custom components and plugins
3333
- [Configuration](architecture/configuration.md) - Configuring variables for scripts
3434

35+
### 🧩 [Contributing / developer handbook](contributing/developer-handbook.md)
36+
Glossary, repository conventions, shared parameter semantics (`set_as`, `field`, `field_list`, …), and standard `~/.talkpipe.toml` keys—reference material for contributors and advanced users.
37+
3538

3639
## Contributing to Documentation
3740

docs/api-reference/chatterlang-workbench.md

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ chatterlang_workbench --reload
132132

133133
---
134134

135-
*For conceptual information about ChatterLang, see [ChatterLang Architecture](../architecture/chatterlang.md).
135+
*For conceptual information about ChatterLang, see [ChatterLang Architecture](../architecture/chatterlang.md).*
136136

137137
---
138138
Last Reviewed: 20250814

docs/api-reference/lazy-loading.md

Lines changed: 1 addition & 1 deletion
@@ -574,6 +574,6 @@ print(f'Total: {len(seen)} unique names')
574574

575575
**See Also:**
576576
- [Plugin Manager](talkpipe-plugin-manager.md) for managing external plugins
577-
- [ChatterLang Compiler](../architecture/compiler.md) for how components are resolved during compilation
577+
- [ChatterLang compiler layer](../architecture/chatterlang.md#2-compiler-layer) for how components are resolved during compilation
578578

579579
Last Reviewed: 20251025

docs/architecture/configuration.md

Lines changed: 11 additions & 2 deletions
@@ -6,9 +6,9 @@ This document describes how TalkPipe manages configuration across different envi
66

77
TalkPipe uses a layered configuration system that supports multiple sources with clear precedence rules. Configuration can come from:
88

9-
3. **Command-line arguments** (including unknown arguments that get added to the configuration)
9+
1. **Command-line arguments** (including unknown arguments that get added to the configuration)
1010
2. **Environment variables** (with `TALKPIPE_` prefix)
11-
1. **Configuration files** (TOML format)
11+
3. **Configuration files** (TOML format)
1212
4. **Default values** (hard-coded in the application)
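
The precedence order above can be sketched as a layered merge, applied from lowest to highest priority. This is a minimal illustration, not TalkPipe's implementation: `merge_config`, `APP_DEFAULTS`, and the sample values are hypothetical, and only the `TALKPIPE_` prefix and the ordering come from this document.

```python
# Hypothetical hard-coded defaults; the real application defaults are not shown here.
APP_DEFAULTS = {"default_model_source": "ollama"}


def merge_config(file_values, environ, cli_values, prefix="TALKPIPE_"):
    """Merge configuration layers so that higher-precedence layers win."""
    merged = dict(APP_DEFAULTS)            # 4. default values (lowest precedence)
    merged.update(file_values)             # 3. configuration file (TOML)
    for name, value in environ.items():    # 2. environment variables
        if name.startswith(prefix):
            # The key is whatever follows the prefix, with its case preserved.
            merged[name[len(prefix):]] = value
    merged.update(cli_values)              # 1. command-line arguments (highest)
    return merged


config = merge_config(
    file_values={"default_model_name": "llama3.2", "smtp_port": 587},
    environ={"TALKPIPE_default_model_name": "mistral", "HOME": "/home/user"},
    cli_values={"smtp_port": 2525},
)
print(config)
```

In this sketch the environment variable overrides the file value for `default_model_name`, the command-line value overrides the file value for `smtp_port`, and unprefixed variables such as `HOME` are ignored.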
1313

1414
## Configuration File Formats
@@ -146,6 +146,15 @@ For application settings (logging, server ports, etc.):
146146
3. **Configuration file values** (`~/.talkpipe.toml`)
147147
4. **Application defaults** (hard-coded values)
148148

149+
### Embedding and LLM defaults: `serverag` vs segment keys
150+
151+
Several components resolve embedding and chat model defaults from `get_config()`. Two naming patterns appear in configuration:
152+
153+
- **Segment defaults**: `LLMEmbed` and `LLMPrompt` fall back to these keys when `model` / `source` arguments are omitted: `default_embedding_model_name`, `default_embedding_model_source`, `default_model_name`, and `default_model_source` (see `talkpipe.util.constants`).
154+
- **`serverag` defaults** — When you omit `--embedding_model`, `--embedding_source`, `--completion_model`, and `--completion_source`, `serverag` reads `DEFAULT_EMBEDDING_MODEL`, `DEFAULT_EMBEDDING_SOURCE`, `DEFAULT_LLM_MODEL`, and `DEFAULT_LLM_SOURCE` from the merged config. If those keys are unset, `serverag` passes `None` into the RAG pipeline, and the segments above apply their `default_*` fallbacks.
155+
156+
Use **`default_*`** in `~/.talkpipe.toml` for one consistent set of defaults across pipelines. Use **`DEFAULT_*`** when you want values applied at the `serverag` CLI layer (still overridable per invocation). Environment variables use the usual `TALKPIPE_` prefix and map to the exact key name after the prefix (for example, `TALKPIPE_default_model_name` or `TALKPIPE_DEFAULT_LLM_MODEL`).
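
The fallback chain described above might look like the following sketch. It assumes a plain dictionary standing in for the merged config; `resolve_model` is a hypothetical helper written for illustration, not a function from `serverag` or TalkPipe, and the model names are sample values.

```python
def resolve_model(cli_value, config, serverag_key, segment_key):
    """Return (value, layer) for the first layer that supplies a value."""
    if cli_value is not None:
        return cli_value, "cli"                    # explicit per-invocation argument
    if config.get(serverag_key) is not None:
        return config[serverag_key], "serverag"    # DEFAULT_* key read at the CLI layer
    # Here serverag would pass None, and the segment applies its own default_* key.
    return config.get(segment_key), "segment"


shared_only = {"default_model_name": "llama3.2"}
with_serverag = {"default_model_name": "llama3.2", "DEFAULT_LLM_MODEL": "llama3.1"}

print(resolve_model(None, shared_only, "DEFAULT_LLM_MODEL", "default_model_name"))
print(resolve_model(None, with_serverag, "DEFAULT_LLM_MODEL", "default_model_name"))
print(resolve_model("mistral", with_serverag, "DEFAULT_LLM_MODEL", "default_model_name"))
```

The three calls show the segment-level fallback, the `serverag`-level default, and a command-line override, in that order.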
157+
149158
### ChatterLang Script Variable Access
150159

151160
**Configuration Variables** (accessed with `$key` syntax in scripts):

docs/contributing/developer-handbook.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
# TalkPipe developer handbook
2+
3+
Contributor-oriented reference: glossary, repository conventions, parameter semantics, and standard configuration keys.
4+
5+
## Glossary
6+
7+
* **Unit** - A component in a pipeline that either produces or processes data. There are two types of units: Sources and Segments.
8+
* **Segment** - A unit that reads from another Unit and may or may not yield data of its own. All units that
9+
are not at the start of a pipeline are Segments.
10+
* **Source** - A unit that takes nothing as input and yields data items. These Units are used in the
11+
"INPUT FROM..." portion of a pipeline.
12+
13+
## Conventions
14+
15+
### Versioning
16+
17+
This codebase uses [semantic versioning](https://semver.org/), with one additional convention during 0.x.y development: releases within a MINOR series will mostly maintain backward compatibility, and PATCH releases may include substantial new capability. For example, every 0.2.x version will be mostly backward compatible, but 0.3.0 might contain code reorganization.
18+
19+
### Codebase Structure
20+
21+
The codebase is divided into the following main packages. Treat these divisions as firm but not strict; sometimes a source could fit within either operations or data, for example.
22+
23+
* **talkpipe.app** - Contains the primary runnable applications.
24+
* Example: chatterlang_script
25+
* **talkpipe.operations** - Contains general algorithm implementations. Associated segments and sources can be included next to the algorithm implementations, but the algorithms themselves should also work stand-alone.
26+
* Example: bloom filters
27+
* **talkpipe.data** - Contains components having to do with complex, type-specific data manipulation.
28+
* Example: extracting text from files.
29+
* **talkpipe.llm** - Contains the abstract classes and implementations for accessing LLMs, both code for accessing specific LLMs and code for doing prompting.
30+
* Example: Code for talking with Ollama or OpenAI
31+
* **talkpipe.pipe** - Code that implements the core classes and decorators for the pipe API, as well as miscellaneous helper segments and sources.
32+
* Example: echo and the definition of the @segment decorator
33+
* **talkpipe.chatterlang** - The definition, parsers, and compiler for the ChatterLang language, as well as any ChatterLang-specific segments and sources.
34+
* Example: the chatterlang compiler and the variable segment
35+
36+
### Source/Segment Names
37+
38+
- **For your own Units, do whatever you want!** These conventions are for authors writing units intended for broader reuse.
39+
- **Classes that implement Units** are named in CamelCase with the initial letter in uppercase.
40+
- **Units defined using `@segment` and `@source` decorators** should be named in camelCase with an initial lowercase letter.
41+
- In **ChatterLang**, sources and segments also use camelCase with an initial lowercase letter.
42+
- Except for the **`cast`** segment, segments that convert data into a specific format—whether they process items one-by-one or drain the entire input—should be named using the form `[tT]oX`, where **X** is the output data type (e.g., `toDataFrame` outputs a pandas DataFrame).
43+
- **Segments that write files** use the form `[Ww]riteX`, where **X** is the file type (e.g., `writeExcel` writes an Excel file, `writePickle` writes a pickle file).
44+
- **Segments that read files** use the form `[Rr]eadX`, where **X** is the file type (e.g., `readExcel` should read an Excel file).
45+
- **Parameter names in segments** should be in all lower case, with words separated by an underscore (`_`).
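
As a rough aid, the naming conventions above can be expressed as regular expressions. The patterns and the `classify` helper below are one reading of the conventions, written for illustration; they are not an official TalkPipe linter.

```python
import re

# Assumed patterns derived from the conventions above (hypothetical, not part of TalkPipe).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")             # CamelCase, initial uppercase
DECORATED_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")         # camelCase, initial lowercase
PARAM_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")   # lower case with underscores

ROLE_PATTERNS = {
    "converter": re.compile(r"^[tT]o[A-Z]\w*$"),     # [tT]oX
    "writer": re.compile(r"^[Ww]rite[A-Z]\w*$"),     # [Ww]riteX
    "reader": re.compile(r"^[Rr]ead[A-Z]\w*$"),      # [Rr]eadX
}


def classify(name):
    """Return the role implied by a unit name, or None if no pattern applies."""
    for role, pattern in ROLE_PATTERNS.items():
        if pattern.match(name):
            return role
    return None


for unit_name in ["toDataFrame", "writeExcel", "readExcel", "echo"]:
    print(unit_name, classify(unit_name))
```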
46+
47+
### Parameter Names
48+
49+
These parameter names should behave consistently across all units:
50+
51+
- **item** should be used in field_segment, referring to the item passed to the function. It will not
52+
be a parameter to the segment in ChatterLang.
53+
54+
- **items** should be used in segment definitions, referring to the iterable over all the pieces of data in the stream.
55+
It will not be used as a parameter anywhere in ChatterLang.
56+
57+
- **set_as**
58+
If used, any processed output is attached to the original data using bracket notation. The original item is then emitted.
59+
60+
- **fail_on_error**
61+
If True, the exception should be raised, likely aborting the pipeline. If False, the operation should continue
62+
and either None should be yielded or nothing, depending on the segment or source. A warning message should be logged.
63+
64+
- **field**
65+
Specifies that the unit should operate on data accessed via “field syntax.” This syntax can include indices, properties, or parameter-free methods, separated by periods.
66+
- For example, given `{"X": ["a", "b", ["c", "d"]]}`, the field `"X.2.0"` refers to `"c"`.
67+
68+
- **field_list**
69+
Specifies that a list of fields can or should be provided, with each field separated
70+
by a comma. In some cases, each field needs to be mapped to some other name. In
71+
those cases, the field and name should be separated by a colon. In field_lists,
72+
the underscore (_) refers to the item as a whole.
73+
- For example, "X.2.0:SomeName,X.1:SomeOtherName". If no "name" is provided,
74+
the field name itself is used. Where only a list of fields is needed and no names,
75+
the names can still be provided but have no effect.
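
The `field`, `field_list`, and `set_as` semantics above can be sketched in plain Python. The helpers below (`extract_field`, `parse_field_list`, `upper_field`) are hypothetical names written for this illustration and make simplifying assumptions; TalkPipe's own implementation may differ in details such as error handling.

```python
def extract_field(item, field):
    """Resolve a dotted field string against an item (illustrative only)."""
    if field == "_":
        return item                       # underscore refers to the item as a whole
    value = item
    for part in field.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]           # dictionary key
        elif isinstance(value, (list, tuple)) and part.isdigit():
            value = value[int(part)]      # sequence index
        else:
            value = getattr(value, part)  # property or method
            if callable(value):
                value = value()           # parameter-free method
    return value


def parse_field_list(field_list):
    """Split 'field:name,field' into (field, name) pairs; name defaults to field."""
    pairs = []
    for entry in field_list.split(","):
        field, _, name = entry.strip().partition(":")
        pairs.append((field, name or field))
    return pairs


def upper_field(items, field, set_as=None):
    """Toy segment: upper-case one field, optionally attaching it via set_as."""
    for item in items:
        result = str(extract_field(item, field)).upper()
        if set_as:
            item[set_as] = result         # attach with bracket notation
            yield item                    # emit the original item
        else:
            yield result


record = {"X": ["a", "b", ["c", "d"]]}
print(extract_field(record, "X.2.0"))
print(parse_field_list("X.2.0:SomeName,X.1"))
print(list(upper_field([record], "X.2.0", set_as="Y")))
```

With `set_as` the toy segment emits the original dictionary with a new key; without it, only the processed value is emitted.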
76+
77+
### General Behavior Principles
78+
79+
* Units that have side effects (e.g. writing data to a disk) should generally also pass
80+
on their data.
81+
82+
### Source and Segment Reference
83+
84+
The chatterlang_workbench command starts a web service designed for experimentation. It also contains links to HTML and text versions
85+
of all the sources and segments included in TalkPipe.
86+
87+
After TalkPipe is installed, the `chatterlang_reference_browser` script provides interactive command-line search and exploration of sources and segments. The `chatterlang_reference_generator` command generates single-page HTML and text versions of all the source and segment documentation.
88+
89+
### Standard Configuration File Items
90+
91+
Configuration constants can be defined either in ~/.talkpipe.toml or in environment variables. Any constant defined in an environment variable must be prefixed with TALKPIPE_; for example, email_password stored in an environment variable must be named TALKPIPE_email_password. In ChatterLang, any key defined in ~/.talkpipe.toml or set via a TALKPIPE_* environment variable can be referenced in scripts as a parameter using $var_name. That reference resolves to the environment variable TALKPIPE_var_name or to var_name in ~/.talkpipe.toml.
92+
93+
* **default_embedding_model_source** - The default source (e.g. ollama) to be used for creating sentence embeddings.
94+
* **default_embedding_model_name** - The name of the model to be used for creating sentence embeddings.
95+
* **default_model_name** - The default name of an LLM to be used in chat
96+
* **default_model_source** - The default source (e.g. ollama) to be used in chat
97+
* **email_password** - Password for the SMTP server
98+
* **logger_files** - Files to store logs, in the form logger1:fname1,logger2:fname2,...
99+
* **logger_levels** - Logger levels in the form logger1:level1,logger2:level2
100+
* **recipient_email** - Who should receive a sent email
101+
* **rss_url** - The default URL used by the rss segment
102+
* **sender_email** - Who the sender of an email should be
103+
* **smtp_port** - SMTP server port
104+
* **smtp_server** - SMTP server hostname
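
The `logger1:level1,logger2:level2` form used by **logger_levels** could be parsed along these lines. This is a sketch under assumptions: `parse_logger_levels` and `apply_logger_levels` are hypothetical helpers, the logger names are sample values, and TalkPipe's own parsing may differ.

```python
import logging


def parse_logger_levels(spec):
    """Turn 'logger1:level1,logger2:level2' into {logger: LEVEL}."""
    levels = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue                      # tolerate trailing commas
        name, _, level = entry.partition(":")
        levels[name.strip()] = level.strip().upper()
    return levels


def apply_logger_levels(spec):
    """Set each named logger to its requested level."""
    for name, level in parse_logger_levels(spec).items():
        # Logger.setLevel accepts standard level names such as "DEBUG".
        logging.getLogger(name).setLevel(level)


apply_logger_levels("talkpipe:debug, urllib3:WARNING")
print(parse_logger_levels("talkpipe:debug, urllib3:WARNING"))
```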
105+
106+
---
107+
108+
*For the main project overview, see the [project README](../../README.md).*

docs/guides/makevectordatabase-and-serverag.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ Together they form a minimal path from raw documents to a queryable RAG interfac
1919

2020
## Prerequisites
2121

22-
- **TalkPipe** with LLM support: `pip install talkpipe[ollama]` or `talkpipe[all]`
22+
- **TalkPipe** with LLM support: `pip install talkpipe[ollama]` or `pip install talkpipe[all]`
2323
- **Embedding model**: Ollama with an embedding model (e.g. `ollama pull mxbai-embed-large`)
2424
- **Completion model** (for serverag): Ollama with an LLM (e.g. `ollama pull llama3.2`)
2525
- **Configuration**: Set `DEFAULT_EMBEDDING_MODEL`, `DEFAULT_EMBEDDING_SOURCE`, `DEFAULT_LLM_MODEL`, and `DEFAULT_LLM_SOURCE` in `~/.talkpipe.toml` or pass them on the command line

docs/tutorials/README.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Each tutorial builds on the previous. Tutorial 1's index feeds Tutorial 2; Tutor
1616

1717
## Get Started in 5 Minutes
1818

19-
1. **Install**: See [Getting Started](../quickstart.md) for installation. For tutorials: `pip install talkpipe[ollama]` or `talkpipe[all]`
19+
1. **Install**: See [Getting Started](../quickstart.md) for installation. For tutorials: `pip install talkpipe[ollama]` or `pip install talkpipe[all]`
2020
2. **Run Tutorial 1** (from `docs/tutorials/Tutorial_1-Document_Indexing`):
2121
- `./Step_1_CreateSyntheticData.sh` — creates `stories.json` (~5–10 min)
2222
- `./Step_2_IndexStories.sh` — builds search index (~5 sec)

docs/tutorials/Tutorial_1-Document_Indexing/README.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ TalkPipe lets you prototype searchable document systems without external databas
2727

2828
## Prerequisites
2929

30-
- **TalkPipe** installed: See [Getting Started](../../quickstart.md) for installation. For this tutorial: `pip install talkpipe[ollama]` or `talkpipe[all]`
30+
- **TalkPipe** installed: See [Getting Started](../../quickstart.md) for installation. For this tutorial: `pip install talkpipe[ollama]` or `pip install talkpipe[all]`
3131
- **Step 1 only**: Ollama installed locally with the `llama3.2` model (or adjust the script to use another model)
3232

3333
> **Tip:** If you skip Step 1, you can use the included `stories.json` and go straight to Step 2.
