Skip to content

PacktPublishing/Building-Natural-Language-and-LLM-Pipelines

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Building Natural Language and LLM Pipelines, First Edition

This is the code repository for Building Natural Language and LLM Pipelines, First Edition, published by Packt.

Build production-grade RAG, tool contracts, and context engineering with Haystack and LangGraph

Laura Funderburk

      Free PDF       Graphic Bundle       Amazon      

About the book

Building Natural Language and LLM Pipelines, First Edition

Develop production-ready NLP and LLM pipelines with Haystack using strict tool contracts, and orchestrate context-aware, observable RAG and agentic workflows with LangGraph for scalable, customizable AI applications.

Modern LLM applications often break in production due to brittle pipelines, loose tool definitions, and noisy context. This book shows you how to build production-ready, context-aware systems using Haystack and LangGraph. You’ll learn to design deterministic pipelines with strict tool contracts and deploy them as microservices. Through structured context engineering, you’ll orchestrate reliable agent workflows and move beyond simple prompt-based interactions. You'll start by understanding LLM behavior—tokens, embeddings, and transformer models—and see how prompt engineering has evolved into a full context engineering discipline. Then, you'll build retrieval-augmented generation (RAG) pipelines with retrievers, rankers, and custom components using Haystack’s graph-based architecture. You’ll also create knowledge graphs, synthesize unstructured data, and evaluate system behavior using Ragas and Weights & Biases. In LangGraph, you’ll orchestrate agents with supervisor-worker patterns, typed state machines, retries, fallbacks, and safety guardrails. By the end of the book, you’ll have the skills to design scalable, testable LLM pipelines and multi-agent systems that remain robust as the AI ecosystem evolves.

Key Learnings

  • Build structured retrieval pipelines with Haystack
  • Apply context engineering to improve agent performance
  • Design and implement robust end-to-end NLP and LLM pipelines using Haystack
  • Track cost and quality with Ragas and Weights & Biases
  • Implement retries, circuit breakers, and observability
  • Deploy REST APIs using FastAPI and Hayhooks
  • Serve pipelines as LangGraph-compatible microservices
  • Use LangGraph to orchestrate multi-agent workflows
  • Implement real-world NLP projects that use NER, text classification and sentiment analysis as agentic tools
  • Design sovereign agents for high-volume local execution

Requirements for this book

Clone the repository

git clone https://github.com/PacktPublishing/Building-Natural-Language-Pipelines.git

cd Building-Natural-Language-Pipelines/

Each chapter contains a pyproject.toml file with the folder's dependencies. (Recommended) Open each folder in a new VS Code window.

  1. Install uv:
    pip install uv
  2. Change directories into the folder
  3. Install dependencies:
    uv sync
  4. Activate the virtual environment:
    source .venv/bin/activate
  5. Select the virtual environment as the Jupyter kernel:
    • Open any notebook.
    • Click the kernel picker (top right) and select the .venv environment.

Chapters

Chapters Colab Kaggle Gradient Studio Lab
Chapter 1: Introduction to Natural Language Processing (NLP) pipelines
Chapter 2: Foundational concepts in NLP pipelines
  • 01_prompt-ollama-model.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 02_create-simple-agent.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 03_document-qa-langchain.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 02_middleware-tutorial.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 03_multi-agent-workflow.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 04_understanding-state-graph.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 05_graph-based-agent-with-tools.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 06_multi-agent-systems-middleware.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
Chapter 3: Introduction to Haystack by Deepset
  • components.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • supercomponents.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • your-first-custom-component.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • your-first-pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
Chapter 4: Bringing components together: Haystack pipelines for different use cases
  • async_hybrid_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • hybrid_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • indexing_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 1_image_embeddings_with_clip.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 2_multimodal_indexing_clip.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 3_multimodal_indexing_llm.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 4_multimodal_rag_vision_llm.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • 5_audio_transcription_whisper.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • conditional_router.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • metadata_router.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • text_language_router.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • semantic_search_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • supercomponents_and_agentic_rag.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
Chapter 5: Haystack pipeline development with custom components
  • advanced_branching_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • pdf_knowledge_graph_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • prefixed_custom-component.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • warmup_component.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • web_knowledge_graph_pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
Chapter 6: Setting up a reproducible project: Q&A pipeline
  • add_observability_with_wandb.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • get_started_rag_evaluation_with_ragas.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • ragas_evaluation_with_custom_components.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
Chapter 7: Deploying Haystack-based applications
Chapter 8: Hands-on Projects
  • tool-calling.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • ner-with-haystack-search-pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • classification-with-haystack-search-pipeline.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • classification-ner-agent-exercise.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • sentiment_analysis.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • text-classification.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • haystack_agent_with_tools.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • haystack_looping_supervisor.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • langgraph_multiagent_supervisor.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
  • pipeline_chaining_guide.ipynb
Open In Colab
Open In Kaggle
Open In Gradient
Open In Studio Lab
Chapter 9: Future Trends and Beyond

Get to know Authors

Laura Funderburk Laura Funderburk is Developer Relations and Community Lead at AI Makerspace, where she specializes in production-grade AI systems using large language models, RAG pipelines, and agentic workflows. With a background in AI operations, machine learning engineering, and developer advocacy, she has built scalable NLP tools across academia, non-profit, and private sectors. A frequent speaker at PyCon US and AI By the Bay, Laura is also a skilled Python engineer and technical author. She holds a Bachelor’s in Mathematics from Simon Fraser University and received the prestigious Terry Fox Gold Medal for resilience and community leadership. She remains active in open-source, mentorship, and outreach, helping engineers build reliable LLM applications through writing, teaching, and hands-on projects.

Other Related Books

Note From Author end:

    Table of Contents

    What You'll Learn to Build

    This book guides you through building advanced Retrieval-Augmented Generation (RAG) systems and multi-agent applications using the Haystack 2.0, Ragas and LangGraph frameworks. Beginning with state-based agent development using LangGraph, you'll learn to build intelligent agents with tool integration, middleware patterns, and multi-agent coordination. You'll then master Haystack's component architecture, progressing through creating intelligent search systems with semantic and hybrid retrieval, building custom components for specialized tasks, and implementing comprehensive evaluation frameworks. The journey advances through production deployment strategies with Docker and REST APIs, culminating in hands-on projects including named entity recognition systems, zero-shot text classification pipelines, sentiment analysis tools, and sophisticated multi-agent orchestration systems that coordinate multiple specialized Haystack pipelines through supervisor-worker patterns with LangGraph.

    Chapter 2: Single agents and multi agents with LangChain and LangGraph

    This chapter contains optional LangGraph demonstrations that introduce state-based agents at a conceptual level. These examples are previews intended to build intuition. The full, practical use of LangGraph for multi-agent orchestration appears later in Chapter 8 and the epilogue, once the Haystack tool layer has been fully developed.

    Agent with one tool Agent calling supervisor
    Chapter 3: Building robust agent tools with Haystack
    Supercomponents and pipeline Prompt template pipeline
    Chapter 4: RAG pipelines: indexing and retrieval for text-based and multimodal pipelines (image and audio)
    Indexing pipeline Hybrid RAG pipeline
    Chapter 5: Build custom components: synthetic data generation with Ragas
    Knowledge graph and synthetic data generation (SDG) pipeline SDG applied to websites and PDFS

    Chapter 6: Reproducible evaluation of hybrid and naive RAG with Ragas and Weights and Biases

    Chapter 7: Deploy pipelines as an API with FastAPI and Hayhooks

    Chapter 8 and Optional Advanced Modules: Capstone and Agentic Patterns for Production

    Microservice architecture Multi-agent system using microservices

    📝 Sovereign-Friendly & Local Execution: The majority of exercises throughout this book are written so you can choose between OpenAI APIs or local models via Ollama (such as Mistral Nemo, GPT-OSS, or Deepseek-R1 and Qwen3), with the exception of the cost tracking exercises in Chapter 6 which specifically demonstrate OpenAI API usage monitoring. Each notebook provides specific model recommendations to help you choose the most suitable option for that particular exercise. The frameworks explored are extensible and models from other providers can be used to substitute OpenAI or local models. No US cloud, external APIs, or proprietary services are required for the majority of the book, making it easy to run in EU-regulated or air-gapped environments. The epilogue folder includes an optional prototype-to-production multi-agent implementation with LangGraph using LangSmith Studio. These exercises require a free LangSmith Studio API key, all exercises can also be run entirely locally and you can disable the tracer export LANGCHAIN_TRACING_V2="false". Scripts are provided so you can run the agent on your terminal - you simply won’t see the studio traces or visualize the agent if you choose not to use LangSmith studio.

    Setting up

    Clone the repository

    git clone https://github.com/PacktPublishing/Building-Natural-Language-Pipelines.git
    
    cd Building-Natural-Language-Pipelines/
    

    Each chapter contains a pyproject.toml file with the folder's dependencies. (Recommended) Open each folder in a new VS Code window.

    1. Install uv:
      pip install uv
    2. Change directories into the folder
    3. Install dependencies:
      uv sync
    4. Activate the virtual environment:
      source .venv/bin/activate
    5. Select the virtual environment as the Jupyter kernel:
      • Open any notebook.
      • Click the kernel picker (top right) and select the .venv environment.

    Chapter breakdown

    Agent Foundations & State Management

    • LangGraph Fundamentals: Understanding state-based agent frameworks and graph architecture
    • Building Simple Agents: Creating agents with state management using MessagesState and reducers
    • Tool Integration: Connecting agents with external tools (search APIs, databases, custom functions)
    • Multi-Agent Systems: Designing and coordinating multiple specialized agents in workflows
    • Middleware Patterns: Implementing logging, authentication, and monitoring layers for agent systems
    • Local vs Cloud LLMs: Running agents with OpenAI APIs or locally with Ollama (Qwen2, Llama, Mistral)

    Core Concepts & Foundation

    • Component Architecture: Understanding Haystack's modular design patterns
    • Pipeline Construction: Building linear and branching data flow pipelines
    • Document Processing: Text extraction, cleaning, and preprocessing workflows
    • Prompting LLMs: Learn to build prompt templates and guide how an LLM responds
    • Package pipelines as Supercomponents: Abstract a pipeline as a Haystack component

    Scaling & Optimization

    • Indexing Pipelines: Automated document ingestion and preprocessing workflows
    • Naive RAG: Semantic search using sentence transformers and embedding models
    • Hybrid RAG: Combining keyword (BM25) and semantic (vector) search strategies
    • Reranking: Advanced retrieval techniques using ranker models
    • Multimodal Pipelines: Processing and analyzing images alongside text data

    Extensibility & Testing

    • Component SDK: Creating custom Haystack components with proper interfaces
    • Knowledge Graph Integration: Building components for structured knowledge representation
    • Synthetic Data Generation: Automated test data creation for pipeline validation
    • Quality Control Systems: Implementing automated evaluation and monitoring components
    • Unit Testing Frameworks: Comprehensive testing strategies for custom components

    Reproducible Workflows & Evaluation

    • Reproducible Workflow Building Blocks: Setting up consistent environments with Docker, Elasticsearch, and dependency management
    • Naive RAG Implementation: Building basic retrieval-augmented generation with semantic search
    • Hybrid RAG with Reranking: Advanced retrieval combining keyword (BM25) and semantic search with rank fusion strategies
    • Evaluation with RAGAS: Using the RAGAS framework to assess and compare naive vs hybrid RAG system quality across multiple dimensions
    • Observability with Weights and Biases: Implementing monitoring and tracking for RAG system performance comparison and experiment management
    • Performance Optimization through Feedback Loops: Creating iterative improvement cycles using evaluation results to enhance retrieval and generation performance

    Chapter 7: Production deployment strategies

    Deployment & Scaling

    • FastAPI REST API: Building production-ready APIs with clean documentation and error handling
    • Docker Containerization: Full containerization with Docker Compose for scalable deployments
    • Qdrant Integration: Production-grade document storage and hybrid search capabilities
    • Local Development Workflows: Script-based development environment setup and testing
    • Hayhooks Framework: Multi-pipeline deployment using Haystack's native REST API framework
    • Pipeline Orchestration: Managing multiple RAG pipelines (indexing + querying) as microservices
    • Service Discovery: Automated API endpoint generation and pipeline management

    Real-World Applications & Multi-Agent Systems

    Hands-on projects that progress from beginner to advanced complexity, focusing on Named Entity Recognition, Text Classification, and Multi-Agent Systems. Projects includes complete notebooks with custom component definition, pipeline definition, and pipeline serialization.

    • Haystack Pipeline Fundamentals: Building basic pipelines for entity extraction workflows
    • Pre-trained NER Models: Using transformer models to identify people, organizations, and locations
    • Custom Component Creation: Developing reusable components for text processing
    • Web Content Processing: Building pipelines that extract entities from web search results
    • SuperComponents and Agents: Wrapping pipelines as tools and building agents for natural language interaction
    • Zero-Shot Classification: Categorizing content without training data using LLMs
    • External API Integration: Connecting Haystack pipelines with the Yelp API
    • Model Performance Evaluation: Assessing classification accuracy on labeled datasets
    • Sentiment Analysis Pipelines: Building custom components for analyzing review sentiment
    • Haystack Agent Mini Project: Hands-on exercise combining NER and classification pipelines with agent orchestration and Hayhooks deployment
    • Modular Pipeline Architecture: Creating 3 specialized pipelines (business search, details, sentiment) with NER and text classification
    • Hayhooks Deployment: Deploying pipelines as REST API endpoints for agent consumption
    • Pipeline Chaining: Connecting multiple specialized Haystack pipelines into complex workflows
    • LangGraph Multi-Agent Orchestration: Building intelligent supervisor systems that coordinate specialized agents
    • Case study: Can we achieve the same fluid dynamic reasoning with Haystack primitives

    This folder contains an extended, production-grade implementation of the agentic supervisor described in Chapter 8.

    • Three Agent Architectures: Progressive implementations from learning (V1 monolithic) to production-ready (V3 with checkpointing)
    • State Management Patterns: Understanding how architectural decisions impact token usage and cost (16-50% reduction)
    • Monolithic vs Supervisor Patterns: Comparing design approaches with automated token measurement tools
    • Production Features: Error handling with retry policies, conversation persistence with checkpointing, and graceful degradation
    • Guardrails: Input validation with prompt injection detection and PII sanitization for secure agent interactions
    • Checkpointing Systems: Thread-based session management with both in-memory (development) and SQLite (production) persistence options

Releases

No releases published

Packages

No packages published

Contributors 5