
Building Agentic AI Book - Code Repository

Buy the Book on Amazon: https://a.co/d/eaTeURV

This repository contains code examples, experiments, and implementations for the Building Agentic AI book by Sinan Ozdemir. The book offers a practical, durable foundation for understanding how modern AI systems are built, why they behave the way they do, and how to push them to their limits.

About the Book

Building Agentic AI is a practical guide for builders. If you're a developer deploying your first model, a data scientist making sense of embeddings and agents, or a founder exploring how AI workflows can reshape your product, this repository provides the code and examples to accompany your learning journey.

The book is organized in three acts:

  • Act I: Foundations — LLMs, embeddings, retrieval, and workflows for reliable, cost-effective, scalable systems
  • Act II: Agents — Designing, deploying, and evaluating systems that don't just respond, but act
  • Act III: Optimization — Fine-tuning, quantization, distillation, and tools to push performance while maintaining efficiency

Prerequisites

Python Version

  • Python 3.8 or higher recommended
  • Python 3.10+ for optimal compatibility with all libraries

Common Dependencies

Most case studies use these core libraries:

  • langchain and langgraph - Agent frameworks
  • langchain-openai - OpenAI integration
  • openai - OpenAI API client
  • chromadb - Vector database
  • pydantic - Structured outputs and data validation
  • fastapi / flask - Web frameworks
  • streamlit - Interactive UIs
  • playwright - Browser automation
  • pandas / numpy - Data manipulation

Environment Variables

IMPORTANT: Never commit your actual API keys to the repository. Always use environment variables or .env files (which are gitignored).

You'll need API keys for various services. We've provided .env.example files in key directories as templates:

  1. Copy the example file to create your own .env file:

    # Root directory
    cp .env.example .env
    
    # Or for specific case studies
    cp sdr_multi_agent/.env.example sdr_multi_agent/.env
    cp text_to_sql/.env.example text_to_sql/.env
    cp codeact_browser/.env.example codeact_browser/.env
  2. Edit the .env file and add your actual API keys:

    # Required for most case studies
    OPENAI_API_KEY=your_actual_openai_api_key_here
    OPENROUTER_API_KEY=your_actual_openrouter_api_key_here
    
    # For specific case studies
    GROQ_API_KEY=your_actual_groq_api_key  # Case Study 16: Voice Bot
    SERPAPI_API_KEY=your_actual_serpapi_key  # Case Study 9: Deep Research
    FIRECRAWL_API_KEY=your_actual_firecrawl_key  # Case Study 9: Deep Research
    RESEND_API_KEY=your_actual_resend_key  # Case Study 7: AI SDR
    TWILIO_API_KEY=your_actual_twilio_key  # Case Study 16: Voice Bot
    ANTHROPIC_API_KEY=your_actual_anthropic_key  # For Claude models
    LANGSMITH_API_KEY=your_actual_langsmith_key  # Optional: for tracing
  3. For Jupyter notebooks: Some notebooks use %env magic commands. You can either:

    • Set environment variables before starting Jupyter: export OPENROUTER_API_KEY=your_key
    • Or update the notebook's first cell to use your actual key (but remember not to commit it!); a python-dotenv sketch for loading keys from .env is shown below
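A minimal sketch of that python-dotenv approach, assuming python-dotenv is installed (pip install python-dotenv) and your keys live in a .env file as described above; individual notebooks may use %env cells instead:

    from dotenv import load_dotenv
    import os

    load_dotenv()  # read key=value pairs from the nearest .env file into the process environment
    api_key = os.environ["OPENROUTER_API_KEY"]  # raises KeyError if the key is missing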

Security Note: All .env files are automatically ignored by git. Never commit actual API keys to the repository.

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd building-agentic-ai

  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies for specific case studies (see individual case study sections)

Case Studies

Case Study 1: Text to SQL Workflow

Description: Build a system that converts natural language questions to SQL queries using RAG (Retrieval-Augmented Generation). This implementation achieves 30% better SQL accuracy than raw LLMs with half the token cost.

Key Concepts: LangGraph workflows, RAG pipelines, ChromaDB vector storage, document retrieval, SQL generation

Code Location: text_to_sql/

Setup:

cd text_to_sql/prototype
pip install -r requirements.txt
# Set OPENAI_API_KEY in .env file
python app.py

Files:

Dependencies: See text_to_sql/prototype/requirements.txt

Key Features:

  • LangGraph-based RAG workflow
  • ChromaDB for vector storage
  • Multiple database support (Formula 1, Superhero, Financial, etc.)
  • Interactive Flask web interface
  • Evidence retrieval with similarity scores
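For orientation, here is a minimal sketch of the retrieval step such a workflow relies on, using chromadb directly. The path, collection name, documents, and query are illustrative placeholders, not the ones used in text_to_sql/:

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")  # hypothetical local path
collection = client.get_or_create_collection("schema_docs")  # placeholder collection name

# Index a few schema/evidence documents (the real pipeline indexes database schemas and evidence)
collection.add(
    documents=["Table drivers(driver_id, forename, surname, nationality)"],
    ids=["drivers_schema"],
)

# Retrieve the most similar documents for a natural-language question
results = collection.query(query_texts=["Which drivers are from Germany?"], n_results=3)
print(results["documents"][0])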

Case Study 2: LLM Evaluation

Description: An exploration of how well the SQL system from Case Study 1 actually works. This includes comprehensive evaluation metrics, accuracy measurements, and performance analysis.

Key Concepts: Evaluation methodologies, accuracy metrics, performance benchmarking, SQL correctness validation

Code Location: text_to_sql/src/choosing_generator/

Setup:

cd text_to_sql/src/choosing_generator
# Install dependencies from parent directory
python run_batch_evaluation.py
python visualize_model_performance.py

Files:

Key Features:

  • SQL accuracy evaluation across multiple models
  • Performance metrics and visualization
  • Batch processing for large-scale evaluation
  • Model comparison and analysis
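One common way to score SQL correctness, and a reasonable mental model for this case study (the repo's evaluation code is the source of truth), is execution accuracy: run the predicted and gold queries against the same database and compare result sets. A minimal sqlite3 sketch:

import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries produce the same rows (order-insensitive)."""
    with sqlite3.connect(db_path) as conn:
        try:
            predicted_rows = conn.execute(predicted_sql).fetchall()
        except sqlite3.Error:
            return False  # un-runnable SQL counts as incorrect
        gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))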

Case Study 3: LLM Experimentation

Description: Run systematic tests on prompts, models, and embedding models. Learn how even the smallest changes can 2x your performance when you experiment efficiently.

Key Concepts: Systematic experimentation, prompt optimization, embedding model comparison, A/B testing, performance optimization

Code Location: text_to_sql/src/choosing_generator/ and prompting/

Setup:

cd text_to_sql/src/choosing_generator
# Run prompt engineering experiments
jupyter notebook prompt_engineering_generator.ipynb

# Or run multi-model evaluation
python run_multi_model_evaluation.py

Files:

Key Features:

  • Systematic testing framework
  • Prompt optimization techniques
  • Embedding model experiments
  • Performance comparison across models
  • Cost and latency analysis
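The core experimentation loop is simple: hold the evaluation set fixed and vary one thing at a time (prompt, model, or embedding model). A hedged sketch of that idea; the prompts, questions, and model names below are placeholders, not the repo's actual experiment configuration:

from langchain_openai import ChatOpenAI

# Placeholder prompts and questions; swap in the repo's actual prompts and eval set
prompts = {
    "baseline": "Translate the user's question into SQLite SQL.",
    "with_rules": "Translate the user's question into SQLite SQL. Return only the query, no prose.",
}
questions = ["How many drivers are from Germany?"]

for model_name in ["gpt-4o-mini", "gpt-4o"]:
    llm = ChatOpenAI(model=model_name, temperature=0)
    for prompt_name, system_prompt in prompts.items():
        for question in questions:
            reply = llm.invoke([("system", system_prompt), ("human", question)])
            print(model_name, prompt_name, repr(reply.content[:80]))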

Case Study 4: "Simple" Summary Prompt

Description: Discover why LLMs favor content at the start and end of prompts. This positional bias is breaking your RAG systems and chatbots, and you may not even know it.

Key Concepts: Positional bias, embedding similarity, prompt engineering, RAG system optimization

Code Location: prompting/

Setup:

cd prompting
jupyter notebook summary_positional_bias.ipynb

Files:

Key Features:

  • Embedding similarity analysis
  • Positional bias discovery
  • Impact on RAG system performance
  • Visualization of bias patterns
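The notebook's core measurement is straightforward: embed a summary and the source text, then watch similarity change as key content moves around the prompt. A minimal sketch of the similarity computation using OpenAI embeddings (the embedding model name is an assumption, not necessarily what the notebook uses):

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

source = "..."   # the document being summarized
summary = "..."  # an LLM-generated summary
print(cosine_similarity(embed(source), embed(summary)))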

Case Study 5: From RAG to Agents

Description: Convert a workflow into an agent that makes its own decisions using tools. Agents handle the weird edge cases your workflow never imagined, but at what cost (literally)?

Key Concepts: ReAct agents, LangGraph, tool usage, agent decision-making, cost analysis

Code Location: text_to_sql/agent/

Setup:

cd text_to_sql/agent
jupyter notebook react_agent_sql.ipynb

Files:

Key Features:

  • LangGraph ReAct agent implementation
  • Tool-based decision making
  • Edge case handling
  • Long-term memory experiments
  • Cost and performance comparison with workflows
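The tools themselves are ordinary functions exposed to the agent. A hedged sketch of what a SQL execution tool might look like; the function name, docstring, and database path are illustrative, not the repo's actual tool:

import sqlite3
from langchain_core.tools import tool

@tool
def run_sql(query: str) -> str:
    """Execute a SQL query against the Formula 1 SQLite database and return the rows."""
    with sqlite3.connect("formula_1.sqlite") as conn:  # hypothetical database path
        try:
            rows = conn.execute(query).fetchall()
        except sqlite3.Error as exc:
            return f"SQL error: {exc}"  # let the agent see the error and retry
    return str(rows[:20])  # truncate so long results don't blow up the context window

tools = [run_sql]  # passed to the agent, e.g. via the pattern in "Common Patterns" below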

Case Study 6: AI Rubrics for Grading

Description: Create scoring systems to evaluate AI outputs consistently and with mitigated bias. Less arguing about quality—more clear, measurable criteria.

Key Concepts: Structured outputs, Pydantic models, evaluation rubrics, bias mitigation, consistent scoring

Code Location: policy_bot/

Setup:

cd policy_bot
pip install -r requirements.txt
# Set OPENROUTER_API_KEY in environment
python -c "from ai.rubric import get_structured_scorer; scorer = get_structured_scorer()"

Files:

Key Features:

  • Pydantic structured outputs for consistent scoring
  • 0-3 scoring scale with detailed reasoning
  • Bias mitigation in evaluation
  • Automated rubric-based grading
  • Integration with policy agents

Case Study 7: AI SDR with MCP

Description: Build multiple agents that research contacts and send emails. Your outreach can finally sound human at scale.

Key Concepts: Multi-agent systems, MCP (Model Context Protocol), Flask applications, Celery async tasks, agent orchestration

Code Location: sdr_multi_agent/

Setup:

cd sdr_multi_agent
# Start Docker services (RabbitMQ, etc.)
docker-compose up -d

# Install dependencies
cd flask_app
pip install -r requirements.txt

# Run Flask app
python app.py

Files:

Dependencies: See sdr_multi_agent/flask_app/requirements.txt

Key Features:

  • Multi-agent system architecture
  • MCP server integration
  • Celery for async task processing
  • Configurable agent system with JSON configs
  • Persistent conversation memory
  • Research and email automation

Case Study 8: Prompt Engineering Agents

Description: Create agents that follow company policies using synthetic test data as a measuring stick. See how one single sentence in a prompt can move accuracy from 44% to 70%.

Key Concepts: Prompt engineering, policy compliance, accuracy optimization, synthetic test data, agent evaluation

Code Location: policy_bot/

Setup:

cd policy_bot
pip install -r requirements.txt
jupyter notebook agent_prompting_test.ipynb

Files:

Dependencies: See policy_bot/requirements.txt

Key Features:

  • Prompt optimization techniques
  • Accuracy improvements (44% to 70%)
  • Synthetic test data generation
  • Policy compliance evaluation
  • Tool call tracking and analysis

Case Study 9: Deep Research + Agentic Workflows

Description: Combine structured workflows with agent flexibility for research tasks. Get reliability without sacrificing adaptability.

Key Concepts: LangGraph workflows, planning and replanning, step execution, structured workflows, agent flexibility

Code Location: deep_research/

Setup:

cd deep_research
pip install -r requirements.txt

# For Streamlit UI
pip install -r streamlit_requirements.txt
streamlit run streamlit_app.py

# Or run notebook
jupyter notebook deep_research_langgraph_demo.ipynb

Files:

Dependencies: See deep_research/requirements.txt and deep_research/streamlit_requirements.txt

Key Features:

  • Structured workflows with agent flexibility
  • Planning and replanning system
  • Step-by-step research execution
  • Real-time streaming events
  • Performance analytics
  • Web scraping and search integration

Environment Variables Needed:

  • OPENROUTER_API_KEY - For LLM access
  • SERPAPI_API_KEY - For Google search
  • FIRECRAWL_API_KEY - For web scraping
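A useful way to picture the planner is as a structured-output call that produces a list of steps, which the executor works through and the replanner can revise. A minimal sketch under that assumption; the model name and schema are illustrative, not the repo's actual implementation:

from typing import List
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Plan(BaseModel):
    steps: List[str] = Field(description="Ordered research steps to answer the question")

planner = ChatOpenAI(model="gpt-4o", temperature=0).with_structured_output(Plan)
plan = planner.invoke("Plan the research needed to answer: What changed in EU AI regulation in 2024?")

for i, step in enumerate(plan.steps, start=1):
    print(i, step)  # each step is then executed by search/scrape tools and can trigger a replan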

Case Study 10: Agentic Tool Selection Performance

Description: Test how well different LLMs choose the right tools. Tool order in prompts can shift accuracy by 40%.

Key Concepts: Tool selection, positional bias, MCP servers, agent performance evaluation, accuracy analysis

Code Location: agent_positional_bias/

Setup:

cd agent_positional_bias
jupyter notebook "LangGraph_React - MCP + Tool Selection.ipynb"

Files:

Key Features:

  • Tool selection accuracy testing
  • Positional bias analysis (40% accuracy shifts)
  • Comparison across multiple LLMs
  • Reasoning vs non-reasoning model comparison
  • Visualization of tool selection patterns
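The experiment boils down to keeping the question and the tools fixed while permuting the order in which the tools are presented, then measuring how often the right tool is chosen. A sketch of such a shuffling harness; make_agent, cases, and first_tool_called are hypothetical stand-ins for the notebook's own agent construction and trace inspection:

import random

def tool_choice_accuracy(make_agent, tools, cases, trials=5):
    """cases: list of (question, expected_tool_name); make_agent(tools) returns a runnable agent."""
    correct = total = 0
    for _ in range(trials):
        shuffled = tools[:]
        random.shuffle(shuffled)  # only the presentation order changes
        agent = make_agent(shuffled)
        for question, expected_tool in cases:
            chosen = first_tool_called(agent, question)  # hypothetical helper: inspect the run trace
            correct += int(chosen == expected_tool)
            total += 1
    return correct / total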

Case Study 11: Benchmarking Reasoning Models

Description: Compare reasoning models like o1 and Claude against standard LLMs. They may even lose to cheaper models on real tasks—we'll see!

Key Concepts: Reasoning models, chain-of-thought, model benchmarking, cost/performance analysis, o1, Claude reasoning

Code Location: reasoning_llms/

Setup:

cd reasoning_llms
jupyter notebook benchmarking_reasoning_models.ipynb

Files:

Key Features:

  • Reasoning model comparison
  • Cost and performance analysis
  • Tree of Thoughts implementation
  • Real-world task evaluation
  • Performance visualization

Case Study 12: Computer Use

Description: Build agents that control browsers and applications through screenshots. Your agent can finally use software you can't API into.

Key Concepts: Screenshot-based automation, GUI control, browser automation, computer vision, agent control

Code Location: reasoning_llms/computer_use/

Setup:

cd reasoning_llms/computer_use
jupyter notebook using_computer_use.ipynb

Files:

Key Features:

  • Screenshot-based UI automation
  • Browser and application control
  • GUI interaction through vision
  • Benchmarking and evaluation
  • Real-world application control

Case Study 13: Classification vs Multiple Choice

Description: Compare fine-tuning against multiple choice prompting for classification. The winner might depend on whether you have 100 or 10,000 examples.

Key Concepts: Fine-tuning, classification, multiple choice prompting, model comparison, data efficiency

Code Location: finetuning/app_review_clf/ and root clf.ipynb

Colab Notebooks:

Setup:

# For fine-tuning approach
cd finetuning/app_review_clf
jupyter notebook openai_app_review_ft.ipynb

# For classification comparison
cd ../..
jupyter notebook clf.ipynb

Files:

Key Features:

  • Fine-tuning vs prompting comparison
  • Classification accuracy analysis
  • Data efficiency evaluation
  • Cost and performance trade-offs
  • App review sentiment classification
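For reference, the fine-tuning side of the comparison follows the standard OpenAI flow: upload a JSONL file of chat-formatted examples, then start a fine-tuning job. A minimal sketch; the file name and base model are assumptions, so see the notebook for the actual configuration:

from openai import OpenAI

client = OpenAI()

# train.jsonl: one {"messages": [system, user, assistant]} chat example per line
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base model; the notebook may use a different one
)
print(job.id, job.status)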

Case Study 14: Domain Adaptation

Description: Fine-tune Qwen on domain-specific documents. Generic models become experts in your exact business rules.

Key Concepts: Domain adaptation, fine-tuning, Qwen model, policy compliance, business rule learning

Colab Notebook:

Setup:

cd policy_bot
jupyter notebook rubric_grade_domain_adapt.ipynb

Files:

Key Features:

  • Qwen model fine-tuning
  • Airbnb policy domain adaptation
  • Domain-specific expertise
  • Before/after performance comparison
  • Business rule learning

Case Study 15: Speculative Decoding

Description: Speed up inference by having a small model draft for a large model. Same exact outputs, 2-3x faster, sometimes.

Key Concepts: Speculative decoding, inference acceleration, draft models, performance optimization
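In Hugging Face transformers, the technique is exposed as assisted generation: pass a small draft model via assistant_model and the large model only verifies its proposals. A minimal sketch; the model pair is an assumption, and the Colab notebook is the reference implementation:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name, draft_name = "Qwen/Qwen2.5-7B-Instruct", "Qwen/Qwen2.5-0.5B-Instruct"  # assumed pair sharing a tokenizer
tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(target_name, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_name, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain speculative decoding in one sentence.", return_tensors="pt").to(target.device)
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)  # draft proposes, target verifies
print(tokenizer.decode(outputs[0], skip_special_tokens=True))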

Colab Notebook:


Case Study 16: Voice Bot - Need for Speed

Description: Build a real-time voice bot with streaming audio. Even as newer voice-to-voice models mature, this pipeline approach still performs well, delivering sub-500ms responses that make conversations feel natural.

Key Concepts: Real-time voice streaming, WebSockets, Twilio integration, Groq API, low-latency responses

Code Location: multimodal/twilio/

Setup:

cd multimodal/twilio

# Option 1: Docker (recommended)
docker-compose up --build

# Option 2: Local setup
pip install -r requirements.txt
# Create .env file with GROQ_API_KEY, NGROK_AUTHTOKEN
python twilio_app.py

Files:

Dependencies: See multimodal/twilio/requirements.txt

Key Features:

  • Real-time voice streaming via WebSockets
  • Sub-500ms response times
  • Twilio voice call integration
  • Groq API for fast inference
  • Audio recording and storage
  • Docker support with ngrok integration

Environment Variables Needed:

  • GROQ_API_KEY - For Groq API access
  • NGROK_AUTHTOKEN - For public tunnel (optional)
  • PORT - Server port (default: 5015)

Case Study 17: Fine-Tuning Matryoshka Embeddings

Description: Train embeddings that work at multiple dimensions. Dynamically trade speed for accuracy based on each query's needs.

Key Concepts: Matryoshka embeddings, multi-dimensional embeddings, dynamic dimension selection, speed/accuracy trade-offs
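The property that makes Matryoshka embeddings useful at query time is that you can truncate a full-size vector to its first k dimensions and renormalize, trading accuracy for speed and storage. A minimal numpy sketch of that truncation; the dimensions and vector are illustrative:

import numpy as np

def truncate_embedding(vector: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions of a Matryoshka embedding and re-normalize to unit length."""
    truncated = vector[:dim]
    return truncated / np.linalg.norm(truncated)

full = np.random.randn(768)         # stand-in for a 768-d Matryoshka embedding
full = full / np.linalg.norm(full)

for dim in (64, 128, 256, 768):
    small = truncate_embedding(full, dim)
    print(dim, small.shape)         # smaller vectors mean faster search, usually a modest accuracy drop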

Colab Notebook:


Common Patterns

LangGraph Workflow Pattern

Most case studies use LangGraph for building agent workflows:

from typing import Annotated, List

from pydantic import BaseModel
from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class WorkflowState(BaseModel):
    messages: Annotated[List[BaseMessage], add_messages]
    # ... other state fields

def process_node(state: WorkflowState) -> dict:
    """Each node receives the current state and returns a dict of field updates."""
    return {"messages": []}

workflow = StateGraph(WorkflowState)
workflow.add_node("process", process_node)
workflow.add_edge(START, "process")   # the graph needs an entry point before it can compile
workflow.add_edge("process", END)
app = workflow.compile()

MCP Server Setup

For case studies using MCP (Model Context Protocol):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Server Name")

@mcp.tool()
def my_tool(param: str) -> str:
    """Tool description (shown to the model when it decides which tool to call)."""
    result = f"Processed: {param}"  # replace with the tool's real logic
    return result

if __name__ == "__main__":
    mcp.run(transport="stdio")

Agent Creation Pattern

Creating ReAct agents with LangGraph:

from langchain.agents import create_agent
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5.1")
tools = [tool1, tool2, tool3]   # tool functions defined elsewhere (e.g. with the @tool decorator)
checkpointer = MemorySaver()    # in-memory checkpointer so the agent keeps conversation state

agent = create_agent(
    llm,
    tools,
    prompt=system_prompt,       # system prompt string defined elsewhere
    checkpointer=checkpointer
)

Evaluation Pattern

Structured evaluation using Pydantic:

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ScoreResponse(BaseModel):
    reasoning: str = Field(description="Evaluation reasoning")
    score: int = Field(description="Score from 0-3")

llm = ChatOpenAI(model="gpt-4")
structured_llm = llm.with_structured_output(ScoreResponse)
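A call to the structured model then returns a validated ScoreResponse instance rather than free text, for example:

grade = structured_llm.invoke("Grade this answer against the rubric: ...")
print(grade.score, grade.reasoning)  # fields are parsed and validated by Pydantic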

Troubleshooting

Common Issues

API Key Errors:

  • Ensure all required API keys are set in your .env file
  • Check that environment variables are loaded correctly
  • Verify API key format and permissions

Import Errors:

  • Install dependencies from the case study's requirements.txt
  • Ensure you're using the correct Python version (3.8+)
  • Try reinstalling packages: pip install --upgrade -r requirements.txt

ChromaDB Issues:

  • Clear the ChromaDB directory if you encounter corruption
  • Ensure write permissions for the database directory
  • Check disk space availability

Docker Issues:

  • Ensure Docker is running: docker ps
  • Check Docker Compose version: docker-compose --version
  • Review logs: docker-compose logs

Notebook Issues:

  • Restart kernel if cells hang
  • Clear output and re-run cells in order
  • Check that all required files are in the correct directories

Getting Help

  1. Check the specific case study's directory for additional README files
  2. Review error messages carefully—they often point to missing dependencies
  3. Ensure all environment variables are set correctly
  4. Verify Python version compatibility

Contributing

Code Style

  • Follow PEP 8 for Python code
  • Use type hints where possible
  • Include docstrings for functions and classes
  • Keep notebooks organized with clear markdown cells

Adding New Case Studies

  1. Create a new directory following the naming convention
  2. Include a requirements.txt file
  3. Add a README.md in the directory with setup instructions
  4. Update this main README.md with the new case study
  5. Include example code and test cases

Testing

  • Run notebooks from top to bottom to ensure they work
  • Test with different API keys/models where applicable
  • Verify that results match expected outputs
  • Include error handling in production code

About the Author

Sinan Ozdemir is an AI entrepreneur, educator, and advisor. He holds a master's degree in pure mathematics and has founded startups, written multiple textbooks on AI, and guided venture-backed companies through deploying AI at scale. He currently serves as CTO at LoopGenius, where he leads teams building AI-driven automation systems, and continues to teach, write, and share knowledge on applied AI.


License

This repository contains code examples and implementations for the Building Agentic AI book. Please refer to the book for detailed explanations and context.


Last updated: 2025
