Building Natural Language and LLM Pipelines, First Edition

This is the code repository for Building Natural Language and LLM Pipelines, First Edition, published by Packt.

Build production-grade RAG, tool contracts, and context engineering with Haystack and LangGraph

Laura Funderburk

About the book

Building Natural Language and LLM Pipelines, First Edition

Develop production-ready NLP and LLM pipelines with Haystack using strict tool contracts, and orchestrate context-aware, observable RAG and agentic workflows with LangGraph for scalable, customizable AI applications.

Modern LLM applications often break in production due to brittle pipelines, loose tool definitions, and noisy context. This book shows you how to build production-ready, context-aware systems using Haystack and LangGraph. You’ll learn to design deterministic pipelines with strict tool contracts and deploy them as microservices. Through structured context engineering, you’ll orchestrate reliable agent workflows and move beyond simple prompt-based interactions. You'll start by understanding LLM behavior—tokens, embeddings, and transformer models—and see how prompt engineering has evolved into a full context engineering discipline. Then, you'll build retrieval-augmented generation (RAG) pipelines with retrievers, rankers, and custom components using Haystack’s graph-based architecture. You’ll also create knowledge graphs, synthesize unstructured data, and evaluate system behavior using Ragas and Weights & Biases. In LangGraph, you’ll orchestrate agents with supervisor-worker patterns, typed state machines, retries, fallbacks, and safety guardrails. By the end of the book, you’ll have the skills to design scalable, testable LLM pipelines and multi-agent systems that remain robust as the AI ecosystem evolves.

Key Learnings

Build structured retrieval pipelines with Haystack
Apply context engineering to improve agent performance
Design and implement robust end-to-end NLP and LLM pipelines using Haystack
Track cost and quality with Ragas and Weights & Biases
Implement retries, circuit breakers, and observability
Deploy REST APIs using FastAPI and Hayhooks
Serve pipelines as LangGraph-compatible microservices
Use LangGraph to orchestrate multi-agent workflows
Implement real-world NLP projects that use NER, text classification and sentiment analysis as agentic tools
Design sovereign agents for high-volume local execution

Requirements for this book

Clone the repository

git clone https://github.com/PacktPublishing/Building-Natural-Language-Pipelines.git

cd Building-Natural-Language-Pipelines/

Each chapter contains a pyproject.toml file with the folder's dependencies. (Recommended) Open each folder in a new VS Code window.

Install uv:
```
pip install uv
```
Change directories into the folder
Install dependencies:
```
uv sync
```
Activate the virtual environment:
```
source .venv/bin/activate
```
Select the virtual environment as the Jupyter kernel:
- Open any notebook.
- Click the kernel picker (top right) and select the .venv environment.

Chapters

Chapters	Colab	Kaggle	Gradient	Studio Lab
Chapter 1: Introduction to Natural Language Processing (NLP) pipelines
Chapter 2: Foundational concepts in NLP pipelines
01_prompt-ollama-model.ipynb
02_create-simple-agent.ipynb
03_document-qa-langchain.ipynb
02_middleware-tutorial.ipynb
03_multi-agent-workflow.ipynb
04_understanding-state-graph.ipynb
05_graph-based-agent-with-tools.ipynb
06_multi-agent-systems-middleware.ipynb
Chapter 3: Introduction to Haystack by Deepset
components.ipynb
supercomponents.ipynb
your-first-custom-component.ipynb
your-first-pipeline.ipynb
Chapter 4: Bringing components together: Haystack pipelines for different use cases
async_hybrid_pipeline.ipynb
hybrid_pipeline.ipynb
indexing_pipeline.ipynb
1_image_embeddings_with_clip.ipynb
2_multimodal_indexing_clip.ipynb
3_multimodal_indexing_llm.ipynb
4_multimodal_rag_vision_llm.ipynb
5_audio_transcription_whisper.ipynb
conditional_router.ipynb
metadata_router.ipynb
text_language_router.ipynb
semantic_search_pipeline.ipynb
supercomponents_and_agentic_rag.ipynb
Chapter 5: Haystack pipeline development with custom components
advanced_branching_pipeline.ipynb
pdf_knowledge_graph_pipeline.ipynb
prefixed_custom-component.ipynb
warmup_component.ipynb
web_knowledge_graph_pipeline.ipynb
Chapter 6: Setting up a reproducible project: Q&A pipeline
add_observability_with_wandb.ipynb
get_started_rag_evaluation_with_ragas.ipynb
ragas_evaluation_with_custom_components.ipynb
Chapter 7: Deploying Haystack-based applications
Chapter 8: Hands-on Projects
tool-calling.ipynb
ner-with-haystack-search-pipeline.ipynb
classification-with-haystack-search-pipeline.ipynb
classification-ner-agent-exercise.ipynb
sentiment_analysis.ipynb
text-classification.ipynb
haystack_agent_with_tools.ipynb
haystack_looping_supervisor.ipynb
langgraph_multiagent_supervisor.ipynb
pipeline_chaining_guide.ipynb
Chapter 9: Future Trends and Beyond

Get to know Authors

Laura Funderburk Laura Funderburk is Developer Relations and Community Lead at AI Makerspace, where she specializes in production-grade AI systems using large language models, RAG pipelines, and agentic workflows. With a background in AI operations, machine learning engineering, and developer advocacy, she has built scalable NLP tools across academia, non-profit, and private sectors. A frequent speaker at PyCon US and AI By the Bay, Laura is also a skilled Python engineer and technical author. She holds a Bachelor’s in Mathematics from Simon Fraser University and received the prestigious Terry Fox Gold Medal for resilience and community leadership. She remains active in open-source, mentorship, and outreach, helping engineers build reliable LLM applications through writing, teaching, and hands-on projects.

Other Related Books

Note From Author end:

What You'll Learn to Build
Setting Up
Chapter Breakdown
- Chapter 1: Introduction to natural language processing pipelines (no required code exercises)
- Chapter 2: Diving Deep into Large Language Models
- Chapter 3: Introduction to Haystack
- Chapter 4: Bringing components together: Haystack pipelines for different use cases
- Chapter 5: Haystack pipeline development with custom components
- Chapter 6: Setting up a reproducible project: naive vs hybrid RAG with reranking and evaluation
- Chapter 7: Production deployment strategies
- Chapter 8: Hands-on projects
- Chapter 9: Future trends and beyond (no required code exercises)
- Optional: Advanced multi-agent architecture for production

What You'll Learn to Build

This book guides you through building advanced Retrieval-Augmented Generation (RAG) systems and multi-agent applications using the Haystack 2.0, Ragas and LangGraph frameworks. Beginning with state-based agent development using LangGraph, you'll learn to build intelligent agents with tool integration, middleware patterns, and multi-agent coordination. You'll then master Haystack's component architecture, progressing through creating intelligent search systems with semantic and hybrid retrieval, building custom components for specialized tasks, and implementing comprehensive evaluation frameworks. The journey advances through production deployment strategies with Docker and REST APIs, culminating in hands-on projects including named entity recognition systems, zero-shot text classification pipelines, sentiment analysis tools, and sophisticated multi-agent orchestration systems that coordinate multiple specialized Haystack pipelines through supervisor-worker patterns with LangGraph.

Chapter 2: Single agents and multi agents with LangChain and LangGraph

This chapter contains optional LangGraph demonstrations that introduce state-based agents at a conceptual level. These examples are previews intended to build intuition. The full, practical use of LangGraph for multi-agent orchestration appears later in Chapter 8 and the epilogue, once the Haystack tool layer has been fully developed.


Agent with one tool	Agent calling supervisor

Chapter 3: Building robust agent tools with Haystack


Supercomponents and pipeline	Prompt template pipeline

Chapter 4: RAG pipelines: indexing and retrieval for text-based and multimodal pipelines (image and audio)


Indexing pipeline	Hybrid RAG pipeline

Chapter 5: Build custom components: synthetic data generation with Ragas


Knowledge graph and synthetic data generation (SDG) pipeline	SDG applied to websites and PDFS

Chapter 6: Reproducible evaluation of hybrid and naive RAG with Ragas and Weights and Biases

Chapter 7: Deploy pipelines as an API with FastAPI and Hayhooks

Chapter 8 and Optional Advanced Modules: Capstone and Agentic Patterns for Production

Microservice architecture Multi-agent system using microservices

📝 Sovereign-Friendly & Local Execution: The majority of exercises throughout this book are written so you can choose between OpenAI APIs or local models via Ollama (such as Mistral Nemo, GPT-OSS, or Deepseek-R1 and Qwen3), with the exception of the cost tracking exercises in Chapter 6 which specifically demonstrate OpenAI API usage monitoring. Each notebook provides specific model recommendations to help you choose the most suitable option for that particular exercise. The frameworks explored are extensible and models from other providers can be used to substitute OpenAI or local models. No US cloud, external APIs, or proprietary services are required for the majority of the book, making it easy to run in EU-regulated or air-gapped environments. The epilogue folder includes an optional prototype-to-production multi-agent implementation with LangGraph using LangSmith Studio. These exercises require a free LangSmith Studio API key, all exercises can also be run entirely locally and you can disable the tracer export LANGCHAIN_TRACING_V2="false". Scripts are provided so you can run the agent on your terminal - you simply won’t see the studio traces or visualize the agent if you choose not to use LangSmith studio.

Setting up

Clone the repository

git clone https://github.com/PacktPublishing/Building-Natural-Language-Pipelines.git cd Building-Natural-Language-Pipelines/

Each chapter contains a pyproject.toml file with the folder's dependencies. (Recommended) Open each folder in a new VS Code window.

Install uv:
pip install uv

Change directories into the folder

Install dependencies:
uv sync

Activate the virtual environment:
source .venv/bin/activate

Select the virtual environment as the Jupyter kernel:

Open any notebook.

Click the kernel picker (top right) and select the .venv environment.

Chapter breakdown

Chapter 2: Diving Deep into Large Language Models

Agent Foundations & State Management

LangGraph Fundamentals: Understanding state-based agent frameworks and graph architecture

Building Simple Agents: Creating agents with state management using MessagesState and reducers

Tool Integration: Connecting agents with external tools (search APIs, databases, custom functions)

Multi-Agent Systems: Designing and coordinating multiple specialized agents in workflows

Middleware Patterns: Implementing logging, authentication, and monitoring layers for agent systems

Local vs Cloud LLMs: Running agents with OpenAI APIs or locally with Ollama (Qwen2, Llama, Mistral)

Chapter 3: Introduction to Haystack

Core Concepts & Foundation

Component Architecture: Understanding Haystack's modular design patterns

Pipeline Construction: Building linear and branching data flow pipelines

Document Processing: Text extraction, cleaning, and preprocessing workflows

Prompting LLMs: Learn to build prompt templates and guide how an LLM responds

Package pipelines as Supercomponents: Abstract a pipeline as a Haystack component

Chapter 4: Bringing components together: Haystack pipelines for different use cases

Scaling & Optimization

Indexing Pipelines: Automated document ingestion and preprocessing workflows

Naive RAG: Semantic search using sentence transformers and embedding models

Hybrid RAG: Combining keyword (BM25) and semantic (vector) search strategies

Reranking: Advanced retrieval techniques using ranker models

Multimodal Pipelines: Processing and analyzing images alongside text data

Chapter 5: Haystack pipeline development with custom components

Extensibility & Testing

Component SDK: Creating custom Haystack components with proper interfaces

Knowledge Graph Integration: Building components for structured knowledge representation

Synthetic Data Generation: Automated test data creation for pipeline validation

Quality Control Systems: Implementing automated evaluation and monitoring components

Unit Testing Frameworks: Comprehensive testing strategies for custom components

Chapter 6: Setting up a reproducible project: naive vs hybrid RAG with reranking and evaluation

Reproducible Workflows & Evaluation

Reproducible Workflow Building Blocks: Setting up consistent environments with Docker, Elasticsearch, and dependency management

Naive RAG Implementation: Building basic retrieval-augmented generation with semantic search

Hybrid RAG with Reranking: Advanced retrieval combining keyword (BM25) and semantic search with rank fusion strategies

Evaluation with RAGAS: Using the RAGAS framework to assess and compare naive vs hybrid RAG system quality across multiple dimensions

Observability with Weights and Biases: Implementing monitoring and tracking for RAG system performance comparison and experiment management

Performance Optimization through Feedback Loops: Creating iterative improvement cycles using evaluation results to enhance retrieval and generation performance

Chapter 7: Production deployment strategies

Deployment & Scaling

Deploying a Retriever Pipeline as FastAPI App

FastAPI REST API: Building production-ready APIs with clean documentation and error handling

Docker Containerization: Full containerization with Docker Compose for scalable deployments

Qdrant Integration: Production-grade document storage and hybrid search capabilities

Local Development Workflows: Script-based development environment setup and testing

Deploying Multiple Pipelines with Hayhooks

Hayhooks Framework: Multi-pipeline deployment using Haystack's native REST API framework

Pipeline Orchestration: Managing multiple RAG pipelines (indexing + querying) as microservices

Service Discovery: Automated API endpoint generation and pipeline management

Chapter 8: Hands-on projects

Real-World Applications & Multi-Agent Systems

Hands-on projects that progress from beginner to advanced complexity, focusing on Named Entity Recognition, Text Classification, and Multi-Agent Systems. Projects includes complete notebooks with custom component definition, pipeline definition, and pipeline serialization.

Named Entity Recognition (NER) - Beginner

Haystack Pipeline Fundamentals: Building basic pipelines for entity extraction workflows

Pre-trained NER Models: Using transformer models to identify people, organizations, and locations

Custom Component Creation: Developing reusable components for text processing

Web Content Processing: Building pipelines that extract entities from web search results

SuperComponents and Agents: Wrapping pipelines as tools and building agents for natural language interaction

Text Classification & Sentiment Analysis - Intermediate

Zero-Shot Classification: Categorizing content without training data using LLMs

External API Integration: Connecting Haystack pipelines with the Yelp API

Model Performance Evaluation: Assessing classification accuracy on labeled datasets

Sentiment Analysis Pipelines: Building custom components for analyzing review sentiment

Haystack Agent Mini Project: Hands-on exercise combining NER and classification pipelines with agent orchestration and Hayhooks deployment

Yelp Navigator - Multi-Agent System - Advanced

Modular Pipeline Architecture: Creating 3 specialized pipelines (business search, details, sentiment) with NER and text classification

Hayhooks Deployment: Deploying pipelines as REST API endpoints for agent consumption

Pipeline Chaining: Connecting multiple specialized Haystack pipelines into complex workflows

LangGraph Multi-Agent Orchestration: Building intelligent supervisor systems that coordinate specialized agents

Case study: Can we achieve the same fluid dynamic reasoning with Haystack primitives

Optional - Advanced LangGraph Supervisor Patterns for Production

This folder contains an extended, production-grade implementation of the agentic supervisor described in Chapter 8.

Three Agent Architectures: Progressive implementations from learning (V1 monolithic) to production-ready (V3 with checkpointing)

State Management Patterns: Understanding how architectural decisions impact token usage and cost (16-50% reduction)

Monolithic vs Supervisor Patterns: Comparing design approaches with automated token measurement tools

Production Features: Error handling with retry policies, conversation persistence with checkpointing, and graceful degradation

Guardrails: Input validation with prompt injection detection and PII sanitization for secure agent interactions

Checkpointing Systems: Thread-based session management with both in-memory (development) and SQLite (production) persistence options

Name		Name	Last commit message	Last commit date
Latest commit History 730 Commits
.github/workflows		.github/workflows
assets		assets
ch2		ch2
ch3		ch3
ch4		ch4
ch5		ch5
ch6		ch6
ch7-hayhooks		ch7-hayhooks
ch7		ch7
ch8		ch8
epilogue		epilogue
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Building Natural Language and LLM Pipelines, First Edition

Build production-grade RAG, tool contracts, and context engineering with Haystack and LangGraph

About the book

Key Learnings

Requirements for this book

Chapters

Get to know Authors

Other Related Books

Note From Author end:

Table of Contents

What You'll Learn to Build

Setting up

Chapter breakdown

Chapter 2: Diving Deep into Large Language Models

Chapter 3: Introduction to Haystack

Chapter 4: Bringing components together: Haystack pipelines for different use cases

Chapter 5: Haystack pipeline development with custom components

Chapter 6: Setting up a reproducible project: naive vs hybrid RAG with reranking and evaluation

Chapter 7: Production deployment strategies

Deploying a Retriever Pipeline as FastAPI App

Deploying Multiple Pipelines with Hayhooks

Chapter 8: Hands-on projects

Named Entity Recognition (NER) - Beginner

Text Classification & Sentiment Analysis - Intermediate

Yelp Navigator - Multi-Agent System - Advanced

Optional - Advanced LangGraph Supervisor Patterns for Production

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 5

Uh oh!

Languages


Microservice architecture	Multi-agent system using microservices

License

PacktPublishing/Building-Natural-Language-and-LLM-Pipelines

Folders and files

Latest commit

History

Repository files navigation

Building Natural Language and LLM Pipelines, First Edition

Build production-grade RAG, tool contracts, and context engineering with Haystack and LangGraph

About the book

Key Learnings

Requirements for this book

Chapters

Get to know Authors

Other Related Books

Note From Author end:

Table of Contents

What You'll Learn to Build

Setting up

Chapter breakdown

Chapter 7: Production deployment strategies

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Uh oh!

Languages