TalkPipe's lazy loading feature provides significant performance improvements by loading components on-demand rather than importing all sources and segments at startup.
The hybrid registry system supports two modes:
- Eager mode (default): Components are loaded when first accessed, maintaining backward compatibility
- Lazy mode: Entry point discovery is deferred until components are explicitly requested, providing up to 18x faster startup times (from ~2.9s to ~0.16s)
This is particularly beneficial for command-line tools, documentation generators, and applications that only use a subset of TalkPipe's 72+ built-in components.
- Faster startup: Dramatically reduced import time when lazy loading is enabled
- Reduced memory footprint: Only load components that are actually used
- Backward compatible: Existing code works without changes
- Flexible configuration: Enable via environment variable, config file, or programmatically
- Graceful degradation: Handles missing components cleanly
Set the TALKPIPE_LAZY_IMPORT environment variable:
export TALKPIPE_LAZY_IMPORT=true
python your_script.py
Accepted values: '1', 'true', 'yes' (case-insensitive)
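The accepted values suggest a simple case-insensitive string check. The following is a minimal sketch of how such a flag might be interpreted. `parse_lazy_flag` is a hypothetical helper, not part of TalkPipe's API, and ignoring surrounding whitespace is an assumption:

```python
import os

# Values the documentation lists as enabling lazy loading.
TRUTHY_VALUES = {"1", "true", "yes"}


def parse_lazy_flag(raw):
    """Return True if raw is one of the accepted truthy strings (case-insensitive)."""
    if raw is None:
        return False
    # Assumption: surrounding whitespace is not significant.
    return raw.strip().lower() in TRUTHY_VALUES


# Read the variable the way a startup hook might (illustrative only).
lazy_requested = parse_lazy_flag(os.environ.get("TALKPIPE_LAZY_IMPORT"))
print(f"Lazy loading requested: {lazy_requested}")
```

Any value outside the accepted set, including an unset variable, is treated as "not requested" in this sketch.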
Add to ~/.talkpipe.toml:
LAZY_IMPORT = true
Control lazy loading from within your Python code:
from talkpipe.chatterlang.registry import enable_lazy_imports, disable_lazy_imports
# Enable lazy loading
enable_lazy_imports()
# Your TalkPipe code here
from talkpipe import compile
pipeline = compile("INPUT FROM range[lower=0, upper=100] | print")
# Disable lazy loading if needed
disable_lazy_imports()
TalkPipe uses a hybrid registry that combines decorator-based registration with entry point discovery:
- Decorator registration: Components decorated with @register_source() or @register_segment() are registered immediately when their module is imported
- Entry point discovery: Components declared in pyproject.toml entry points are discovered without importing
- On-demand loading: When a component is requested, its module is imported (triggering all decorators in that module), registering all components from that module at once
When you request a component by name:
from talkpipe.chatterlang.registry import segment_registry
segment_registry.get("print")
The registry follows this process:
- Check if already registered via decorators (fast path)
- Check if this component failed to load before (avoid retry loops)
- Try loading from entry points if available
- Raise KeyError with a list of available components if not found
Important: When a component's module is imported to load that component, all components in that module are registered at once. This is because importing the module executes all decorators in the file. For example, requesting the print segment will load and register all segments defined in the talkpipe/pipe/io.py module.
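The lookup steps and the module-level side effect can be illustrated with a toy registry. This is a sketch under stated assumptions, not TalkPipe's implementation: `HybridRegistrySketch`, `add_entry_point`, and the `log` segment are hypothetical names, and a plain function stands in for importing a real module.

```python
class HybridRegistrySketch:
    """Toy registry: decorator registration plus deferred 'entry point' loading."""

    def __init__(self):
        self._registered = {}    # name -> component, filled by decorators
        self._entry_points = {}  # name -> callable that "imports" a module
        self._failed = set()     # names whose load already failed

    def register(self, name):
        """Decorator that registers a component under `name`."""
        def decorator(cls):
            self._registered[name] = cls
            return cls
        return decorator

    def add_entry_point(self, name, import_module):
        """Declare a component without importing its module."""
        self._entry_points[name] = import_module

    def get(self, name):
        # 1. Fast path: already registered by a decorator.
        if name in self._registered:
            return self._registered[name]
        # 2. Skip names that failed before, so we do not retry endlessly.
        if name not in self._failed and name in self._entry_points:
            # 3. "Import" the module; its decorators run as a side effect.
            try:
                self._entry_points[name]()
            except ImportError:
                self._failed.add(name)
            if name in self._registered:
                return self._registered[name]
            self._failed.add(name)
        # 4. Not found: report what is available.
        available = sorted(set(self._registered) | set(self._entry_points))
        raise KeyError(f"{name!r} not found; available: {available}")


registry = HybridRegistrySketch()


def import_io_module():
    """Stands in for importing one module that defines two segments."""
    @registry.register("print")
    class Print:
        pass

    @registry.register("log")  # hypothetical second segment in the same module
    class Log:
        pass


registry.add_entry_point("print", import_io_module)
registry.add_entry_point("log", import_io_module)

registry.get("print")
# Requesting "print" also registered "log", because both live in one module.
print(sorted(registry._registered))  # prints ['log', 'print']
```

The second component becomes available without ever being requested, which is the behavior the note above describes for modules such as talkpipe/pipe/io.py.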
When you access all components:
from talkpipe.chatterlang.registry import segment_registry
all_segments = segment_registry.all
In lazy mode, this triggers one-time loading of all entry points. Subsequent accesses use cached results.
Lazy mode:
- Initial import: ~0.16 seconds
- First .all access: ~3 seconds (loads all 72 components)
- Subsequent accesses: Very fast (cached)
- Individual component access: Loads the component's module (registering all components in that module)
Eager mode:
- Initial import: Slower than lazy mode (up to ~2.9 seconds versus ~0.16 seconds)
- Component access: Same on-demand loading behavior
- Backward compatible with existing code
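The difference between the first `.all` access and later ones comes down to a one-time load followed by a cache. A minimal sketch of that pattern, assuming a registry that keeps a "loaded" flag; `LazyAllSketch` and its `load_calls` counter are hypothetical and only illustrate the idea:

```python
class LazyAllSketch:
    """Toy registry whose `all` property loads every entry point once."""

    def __init__(self, entry_points):
        self._entry_points = entry_points  # name -> zero-argument loader
        self._registered = {}
        self._all_loaded = False
        self.load_calls = 0                # counts loader invocations

    @property
    def all(self):
        if not self._all_loaded:
            # First access: load every entry point that is not yet registered.
            for name, loader in self._entry_points.items():
                if name not in self._registered:
                    self._registered[name] = loader()
                    self.load_calls += 1
            self._all_loaded = True
        # Later accesses skip the loop and return the cached components.
        return dict(self._registered)


lazy_registry = LazyAllSketch(
    {"print": lambda: "PrintSegment", "scale": lambda: "ScaleSegment"}
)
first = lazy_registry.all   # triggers loading of both entry points
second = lazy_registry.all  # served from the cache, no further loading
print(lazy_registry.load_calls)  # prints 2
```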
Lazy loading works automatically with ChatterLang scripts:
from talkpipe import compile
# With lazy loading enabled, this starts fast
pipeline = compile("""
INPUT FROM range[lower=0, upper=100]
| scale[multiplier=2]
| print
""")
# Only the modules containing 'range', 'scale', and 'print' are loaded
# All components in those modules are registered when their module is imported
from talkpipe.chatterlang.registry import segment_registry, input_registry
# Get a specific segment (loads that component's module in lazy mode)
# Note: All components in the same module are registered when the module is imported
PrintSegment = segment_registry.get("print")
# List available components without loading them
entry_points = segment_registry.list_entry_points()
print(f"Available segments: {', '.join(entry_points)}")
# Get all segments (triggers loading of all modules in lazy mode)
all_segments = segment_registry.all
Monitor what's loaded in the registry:
from talkpipe.chatterlang.registry import segment_registry
stats = segment_registry.stats()
print(stats)
# Output: {'registered': 10, 'entry_points': 71, 'loaded_modules': 5, 'failed_loads': 0}
from talkpipe.chatterlang.registry import get_registry_stats
stats = get_registry_stats()
print(stats)
# Output: {
# 'sources': {'registered': 5, 'entry_points': 8, ...},
# 'segments': {'registered': 35, 'entry_points': 71, ...},
# 'lazy_mode': True
# }
Enable lazy loading mode for all registries.
from talkpipe.chatterlang.registry import enable_lazy_imports
enable_lazy_imports()
Disable lazy loading mode, returning to eager loading.
from talkpipe.chatterlang.registry import disable_lazy_imports
disable_lazy_imports()
Get statistics for all registries including lazy mode status.
Returns: Dictionary with keys 'sources', 'segments', and 'lazy_mode'
from talkpipe.chatterlang.registry import get_registry_stats
stats = get_registry_stats()
Get a component by name, loading from entry points if needed.
Raises: KeyError if component is not found
from talkpipe.chatterlang.registry import segment_registry
component = segment_registry.get("print")
Get all registered components. In lazy mode, this triggers loading of all entry points.
from talkpipe.chatterlang.registry import segment_registry
all_components = segment_registry.all
List names of all components available via entry points without loading them.
from talkpipe.chatterlang.registry import segment_registry
names = segment_registry.list_entry_points()
Get statistics about the registry state.
Returns: Dictionary with keys:
- 'registered': Number of currently loaded components
- 'entry_points': Total number of available entry points
- 'loaded_modules': Number of modules that have been imported
- 'failed_loads': Number of components that failed to load
from talkpipe.chatterlang.registry import segment_registry
stats = segment_registry.stats()
Global boolean flag indicating whether lazy loading is enabled.
from talkpipe.chatterlang.registry import LAZY_IMPORT_MODE
print(f"Lazy loading enabled: {LAZY_IMPORT_MODE}")
Lazy loading is fully supported for all TalkPipe components:
- 8 built-in sources: chatterlangServer, echo, exec, prompt, randomInts, range, readEmail, rss
- 71 built-in segments: Including data I/O, transformations, filtering, LLM operations, search, web processing, and more
All components are declared as entry points in pyproject.toml and use decorator-based registration for seamless lazy loading.
As a developer building TalkPipe applications or creating custom sources and segments, you can structure your code to maximize the benefits of lazy loading.
Since lazy loading works at the module level (importing one component loads all components in that module), organize your code strategically:
Place components that are often used together in the same module:
# Good: data_io.py - components commonly used together
from talkpipe.chatterlang.registry import register_segment
from talkpipe.pipe.core import AbstractSegment
@register_segment("readJson")
class ReadJson(AbstractSegment):
    """Read JSON files"""
    pass

@register_segment("writeJson")
class WriteJson(AbstractSegment):
    """Write JSON files"""
    pass
Structure components by domain or functionality:
# Project structure
src/myproject/
    sources/
        database.py    # All database sources
        web.py         # All web-related sources
    segments/
        transform.py   # Data transformation
        llm.py         # LLM-related (may have heavy deps)
Declare your components as entry points in pyproject.toml:
[project.entry-points."talkpipe.sources"]
myDatabaseSource = "myproject.sources.database"
[project.entry-points."talkpipe.segments"]
myTransform = "myproject.segments.transform"
This allows TalkPipe to discover and load components on-demand without importing them.
Important: Component names must be unique across all installed packages. TalkPipe detects name collisions and raises detailed errors. Use unique prefixes for plugin components (e.g., myplugin_transform) to avoid conflicts. See the Extending TalkPipe documentation for details on plugin architecture and collision handling.
Enable lazy loading in these scenarios:
- Command-line tools: Fast startup is critical for good UX
    export TALKPIPE_LAZY_IMPORT=true
    python my_cli_tool.py
- Documentation generators: Need to discover all components but may not use them all
    from talkpipe.chatterlang.registry import segment_registry, enable_lazy_imports
    enable_lazy_imports()
    all_segments = segment_registry.list_entry_points()  # Fast discovery
- Development and testing: Faster feedback loops
    export TALKPIPE_LAZY_IMPORT=true
    pytest tests/
- Large libraries: Projects with many components where users typically use a small subset
    # In your library's __init__.py
    from talkpipe.chatterlang.registry import enable_lazy_imports
    enable_lazy_imports()
Lazy loading offers less benefit in these scenarios:
- Production services: Where startup time happens once but runtime performance matters
- Small applications: With few components where lazy loading overhead isn't worth it
- Using most components: If your pipeline uses most available components anyway
When developing TalkPipe plugins, follow these guidelines:
Offer both bundled and granular entry points:
# In your plugin's pyproject.toml
[project.entry-points."talkpipe.segments"]
# All-in-one for convenience
all_ml_segments = "myplugin.ml"
# Individual components for lazy loading
mlPredict = "myplugin.ml.predict"
mlTrain = "myplugin.ml.train"
mlEvaluate = "myplugin.ml.evaluate"
Make it clear which components have special requirements:
from talkpipe.chatterlang.registry import register_segment
from talkpipe.pipe.core import AbstractSegment
@register_segment("csvProcess")
class CsvProcess(AbstractSegment):
    """
    Process CSV data using the csv module.
    Requires: csv (stdlib)
    For large datasets, consider: pip install pandas
    """
    pass
Monitor the effectiveness of lazy loading:
import time
from talkpipe.chatterlang.registry import segment_registry, enable_lazy_imports
# Measure startup time
start = time.time()
enable_lazy_imports()
from talkpipe import compile
startup_time = time.time() - start
print(f"Startup time: {startup_time:.3f}s")
# Check what's actually loaded
stats = segment_registry.stats()
print(f"Registered: {stats['registered']}/{stats['entry_points']} segments")
print(f"Modules loaded: {stats['loaded_modules']}")
# Run your pipeline
pipeline = compile("INPUT FROM range[lower=0, upper=10] | print")
# Check what got loaded
stats_after = segment_registry.stats()
print(f"After pipeline: {stats_after['registered']}/{stats_after['entry_points']} segments")
print(f"Modules loaded: {stats_after['loaded_modules']}")
Here's a real-world example of restructuring for lazy loading:
Before: All components in one file
# components.py (slow to load)
from talkpipe.chatterlang.registry import register_segment
from talkpipe.pipe.core import AbstractSegment
import csv
import sqlite3
import xml.etree.ElementTree as ET
import decimal
# 50+ segments all in one file
@register_segment("csvFilter")
class CsvFilter(AbstractSegment): pass
@register_segment("sqliteQuery")
class SqliteQuery(AbstractSegment): pass
# ... 48 more segments
After: Organized by functionality and dependencies
# segments/basic.py (fast to load, no heavy deps)
from talkpipe.chatterlang.registry import register_segment
from talkpipe.pipe.core import AbstractSegment
@register_segment("filter")
class Filter(AbstractSegment): pass
@register_segment("map")
class Map(AbstractSegment): pass
# segments/data_io.py (moderate deps)
from talkpipe.chatterlang.registry import register_segment
from talkpipe.pipe.core import AbstractSegment
@register_segment("csvFilter")
class CsvFilter(AbstractSegment):
    def transform(self, input_iter):
        import csv  # Lazy import
        # Use csv here
# segments/database.py (stdlib deps)
from talkpipe.chatterlang.registry import register_segment
from talkpipe.pipe.core import AbstractSegment
@register_segment("sqliteQuery")
class SqliteQuery(AbstractSegment):
    def transform(self, input_iter):
        import sqlite3  # Lazy import
        # Use sqlite3 here
Result: Users who only need basic filtering get 10x faster startup because csv and sqlite3 are never imported. The same pattern applies to heavy ML libraries like TensorFlow or PyTorch.
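The function-level import used in the restructured modules can be tried in isolation. A minimal, self-contained sketch of the pattern follows; `run_query` is a hypothetical function for illustration and is not a TalkPipe segment:

```python
def run_query(sql):
    """Run a query against an in-memory database, importing sqlite3 on first use."""
    import sqlite3  # deferred: the import cost is paid only when this function runs

    connection = sqlite3.connect(":memory:")
    try:
        return connection.execute(sql).fetchall()
    finally:
        connection.close()


# Defining run_query did not import sqlite3; calling it does.
print(run_query("SELECT 1 + 1"))  # prints [(2,)]
```

Python caches imported modules, so repeated calls pay the import cost only once. The trade-off is that a missing dependency surfaces at call time rather than at module import.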
If a component fails to load, check:
- Is the component name correct? Use registry.list_entry_points() to see available names
- Are dependencies installed? Some components require optional dependencies
- Check registry.stats() for 'failed_loads' count
If you don't see performance improvements:
- Verify lazy loading is enabled: check LAZY_IMPORT_MODE value
- Ensure environment variable is set before importing TalkPipe
- Consider if your use case actually benefits (e.g., loading all components negates the benefit)
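The first two checks can be scripted with the standard library alone. The sketch below is a hypothetical diagnostic helper (`diagnose_lazy_env` is not part of TalkPipe). It assumes the accepted values listed earlier and only inspects the environment and `sys.modules`:

```python
import os
import sys


def diagnose_lazy_env(env=None, modules=None):
    """Report whether the flag is set and whether talkpipe was already imported."""
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules
    value = env.get("TALKPIPE_LAZY_IMPORT", "")
    return {
        # Accepted values per the configuration notes: '1', 'true', 'yes'.
        "flag_set": value.strip().lower() in {"1", "true", "yes"},
        # If talkpipe is already imported, setting the variable now is too late.
        "talkpipe_already_imported": "talkpipe" in modules,
    }


print(diagnose_lazy_env())
```

Running this at the very top of a script, before any TalkPipe import, should show whether the variable reached the process at all.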
The hybrid registry is designed to avoid circular imports by:
- Not loading entry points at __init__ time
- Loading decorators only when components are requested
- Providing fallback to entry point discovery
If you encounter circular import issues, they're likely from other parts of your code.
Automatic Detection: As of the current version, TalkPipe automatically detects and reports entry point name collisions when packages are loaded. If you see a ValueError about name collisions, it means multiple installed packages are trying to use the same component name.
What to do when you see a collision error:
- Read the error message carefully - it lists all conflicting packages and component names:
    ValueError: Entry point name collision detected in group 'talkpipe.segments'.
    Multiple packages are trying to register components with the same name:
      - Component 'transform' defined by:
        • package1.segments:Transform (from package 'myplugin1')
        • package2.segments:Transform (from package 'myplugin2')
- Choose your resolution strategy:
  - Uninstall one of the conflicting packages if you don't need both
  - Contact plugin authors to request they use unique prefixes
  - Temporarily use a workaround by forking one plugin and renaming its components
- For plugin developers - prevent conflicts by using unique prefixes:
    [project.entry-points."talkpipe.segments"]
    # Good: unique prefix
    myplugin_transform = "myplugin.segments.transform"
    # Risky: generic name
    transform = "myplugin.segments.transform"
Manual collision checking (if you want to check before installing a package):
# Show all packages defining talkpipe.segments
python -c "
from importlib.metadata import entry_points
eps = entry_points().select(group='talkpipe.segments')
seen = {}
for ep in eps:
    if ep.name in seen:
        print(f'CONFLICT: {ep.name} defined by both {seen[ep.name]} and {ep.value}')
    else:
        seen[ep.name] = ep.value
print(f'Total: {len(seen)} unique names')
"
See Also:
- Plugin Manager for managing external plugins
- ChatterLang compiler layer for how components are resolved during compilation
Last Reviewed: 20251025