Galileo Golden Demo

A multi-turn agentic system that showcases Galileo across multiple domains and agent frameworks, designed to be used for product demos. The code itself is reusable and configurable for a variety of use cases.

What This Repo Is

A multi-turn agentic system showcasing Galileo's observability capabilities with configurable domains and RAG integration. Built to be reusable for product demos with minimal setup time.

What This Repo Isn't

Not a production reference architecture or replacement for customer-specific POCs requiring heavy customization.

Getting Started

Prerequisites

Python 3.8+
OpenAI API key
Galileo API key
Pinecone API keys (for both local and hosted environments)

Setup

Clone the repository

git clone <repository-url>
cd galileo-golden-demo

Set up virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install requirements
```
pip install -r requirements.txt
```

Configure secrets

Copy the secrets template and add your API keys:

cp .streamlit/secrets.toml.template .streamlit/secrets.toml

Edit .streamlit/secrets.toml with your actual API keys:

# API Keys
openai_api_key = "your_openai_api_key_here"
galileo_api_key = "your_galileo_api_key_here"

# Galileo Configuration
galileo_console_url = "https://console.galileo.ai"  # or your custom URL

# Pinecone Configuration
pinecone_api_key_local = "your_local_project_api_key"
pinecone_api_key_hosted = "your_hosted_project_api_key"

# Environment: "local" for development, "hosted" for production
environment = "local"

Note: Galileo project names are configured per-domain in domains/{domain}/config.yaml

Run the Streamlit app
```
streamlit run app.py
```

The app will be available at http://localhost:8501

Model Selection

You can change the LLM used for chat and experiments from the sidebar (Model → LLM dropdown). Each domain's config.yaml defines a default model and additional models (OpenAI family). The selected model applies to both the Chat tab and the Experiments tab, and you can change it mid-session without losing conversation history.
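As a rough illustration of how the sidebar list could be derived from a domain's `model` section, the sketch below puts the default model first and drops repeats. The function name `model_options` is hypothetical and is not taken from the app's code; only the `default_model` and `additional_models` keys come from config.yaml.

```python
from typing import Any, Dict, List


def model_options(model_cfg: Dict[str, Any]) -> List[str]:
    """Return the default model first, then additional models, without duplicates."""
    ordered = [model_cfg["default_model"], *model_cfg.get("additional_models", [])]
    # dict.fromkeys keeps first-seen order while dropping repeated names
    return list(dict.fromkeys(ordered))


# Values copied from the sample config.yaml in this README
cfg = {
    "default_model": "gpt-4.1",
    "additional_models": ["gpt-4o", "gpt-4o-mini", "gpt-4.1"],
}
print(model_options(cfg))  # ['gpt-4.1', 'gpt-4o', 'gpt-4o-mini']
```

Deduplicating matters here because the sample config lists the default model again under `additional_models`.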

Multi-Domain Support

This demo supports multiple domains with automatic routing and separate Galileo projects per domain. The app automatically discovers all domains in the domains/ directory and creates navigation pages for each.
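A minimal sketch of what directory-based discovery might look like, assuming a folder counts as a domain when it contains a `config.yaml`. The helper name `discover_domains` and that detection rule are assumptions for illustration; the app's actual logic lives in its own domain management code.

```python
from pathlib import Path
from typing import List


def discover_domains(domains_root: str = "domains") -> List[str]:
    """List sub-directories of domains_root that contain a config.yaml."""
    root = Path(domains_root)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "config.yaml").is_file()
    )
```

Calling `discover_domains()` from the repository root would return the folder names under `domains/` that carry a config file.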

Each domain automatically gets its own Galileo project using the convention: galileo-demo-{domain_name} (e.g., galileo-demo-finance). You can optionally override this in the domain's config.yaml.
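The naming convention and its optional override can be sketched as follows. `resolve_project_name` is a hypothetical helper, and `config` stands for the already-parsed contents of a domain's config.yaml; only the `galileo.project` key and the `galileo-demo-{domain_name}` default come from this README.

```python
from typing import Any, Dict, Optional


def resolve_project_name(domain_name: str, config: Optional[Dict[str, Any]] = None) -> str:
    """Use galileo.project from the config if set, else galileo-demo-{domain_name}."""
    galileo_cfg = (config or {}).get("galileo") or {}
    return galileo_cfg.get("project") or f"galileo-demo-{domain_name}"


print(resolve_project_name("finance"))  # galileo-demo-finance
print(resolve_project_name("finance", {"galileo": {"project": "custom-project-name"}}))  # custom-project-name
```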

📖 For detailed multi-domain setup instructions, see documentation/MULTI_DOMAIN_SETUP.md

How to Add a New Domain

This demo code is designed to be easily extended to new domains, so SEs can spend less time writing code and more time showing Galileo in the best light.

Adding a new domain is straightforward. Copy the existing finance domain structure and customize the components:

1. Create Domain Directory Structure

mkdir domains/your_domain_name
cd domains/your_domain_name

Create the following structure:

your_domain_name/
├── config.yaml          # Domain configuration
├── system_prompt.json   # System prompt for the agent
├── dataset.csv          # Evaluation dataset (optional)
├── docs/               # RAG documents
│   ├── document1.pdf
│   └── document2.csv
└── tools/              # Domain-specific tools
    ├── schema.json     # Tool definitions (OpenAI format)
    └── logic.py        # Tool implementation
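Before moving on, it can help to check that a new domain folder has the expected layout. The sketch below is a hypothetical pre-flight check, not part of the repo; it assumes every entry in the tree above except `dataset.csv` is required.

```python
from pathlib import Path
from typing import List

# Assumed required entries, taken from the tree above (dataset.csv is optional)
REQUIRED_PATHS = [
    "config.yaml",
    "system_prompt.json",
    "docs",
    "tools/schema.json",
    "tools/logic.py",
]


def missing_domain_paths(domain_dir: str) -> List[str]:
    """Return the required files or folders that do not exist under domain_dir."""
    base = Path(domain_dir)
    return [rel for rel in REQUIRED_PATHS if not (base / rel).exists()]
```

For example, `missing_domain_paths("domains/your_domain_name")` returns an empty list once all required files and folders are in place.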

2. Configure Domain Settings

config.yaml - Main configuration file:

domain:
  name: "your_domain"
  description: "Your domain description"

# Galileo Configuration (OPTIONAL)
# If not specified, defaults to: "galileo-demo-{domain_name}"
# galileo:
#   project: "custom-project-name"      # Override default project name
#   log_stream: "custom-stream"         # Override default log stream

ui:
  app_title: "Your Domain Assistant"
  icon: "🤖"  # Icon for navigation (optional, defaults to 🤖)
  example_queries:
    - "Example query 1"
    - "Example query 2"

# Model configuration (OpenAI family)
# default_model: used by default; additional_models: list shown in the sidebar selector
model:
  default_model: "gpt-4.1"
  temperature: 0.1
  additional_models:
    - "gpt-4o"
    - "gpt-4o-mini"
    - "gpt-4.1"

rag:
  enabled: true
  chunk_size: 1000
  chunk_overlap: 200
  top_k: 5

tools:
  - "your_tool_name"

vectorstore:
  embedding_model: "text-embedding-3-large"

# Optional: Add Galileo Protect (see Protect section below)
# protect:
#   metrics:
#     - name: "prompt_injection"
#       operator: "any"
#       target_values: ["impersonation", "obfuscation"]
#   messages:
#     - "I cannot process that request."

# Optional: Hallucination demo examples (see Hallucination Demo section below)
# demo_hallucinations:
#   - question: "Sample question"
#     hallucinated_answer: "Wrong answer"
#     context:
#       - "Real context"

system_prompt.json - Define the agent's behavior:

{
  "system_prompt": "You are a helpful assistant for [your domain]. Your role is to..."
}

tools/schema.json - Define available tools in OpenAI function format:

[
  {
    "name": "your_tool_name",
    "description": "What your tool does",
    "parameters": {
      "type": "object",
      "properties": {
        "param1": {
          "type": "string",
          "description": "Parameter description"
        }
      },
      "required": ["param1"]
    }
  }
]

tools/logic.py - Implement tool functionality:

def your_tool_name(param1: str) -> str:
    """
    Tool implementation
    """
    # Your logic here
    return "Tool result"

TOOLS = [your_tool_name]

Make sure you export your tools by defining a TOOLS list at the end of this file.
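One way to confirm that schema.json and logic.py agree is to match schema names against the exported functions. This README does not describe how the app consumes `TOOLS`, so the name-based lookup below is an assumption, and `build_registry` and `unimplemented_tools` are hypothetical helpers.

```python
import json
from typing import Any, Callable, Dict, List


def your_tool_name(param1: str) -> str:
    """Placeholder tool from the template above."""
    return f"Tool result for {param1}"


TOOLS = [your_tool_name]

# Inline stand-in for tools/schema.json
SCHEMA_JSON = """
[{"name": "your_tool_name",
  "description": "What your tool does",
  "parameters": {"type": "object",
                 "properties": {"param1": {"type": "string"}},
                 "required": ["param1"]}}]
"""


def build_registry(tools: List[Callable[..., Any]]) -> Dict[str, Callable[..., Any]]:
    """Map each exported function's name to the function itself."""
    return {fn.__name__: fn for fn in tools}


def unimplemented_tools(schema: List[Dict[str, Any]], registry: Dict[str, Callable[..., Any]]) -> List[str]:
    """Return schema tool names that have no matching function in the registry."""
    return [entry["name"] for entry in schema if entry["name"] not in registry]


registry = build_registry(TOOLS)
schema = json.loads(SCHEMA_JSON)
print(unimplemented_tools(schema, registry))  # []
print(registry["your_tool_name"](param1="example"))  # Tool result for example
```

A non-empty result from `unimplemented_tools` usually means a function name in logic.py does not match the `name` field in schema.json.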

3. Add Domain Documents

Place your RAG documents in the docs/ directory:

PDFs, text files, and CSVs are all supported
Documents will be automatically chunked and embedded
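To make the `chunk_size` and `chunk_overlap` settings concrete, here is a simple character-based sliding window. It only illustrates what the two numbers mean; the splitter the app actually uses is not described in this README and may split on different boundaries. `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [1000, 1000, 900]
```

With the sample config values (1000 and 200), each chunk starts 800 characters after the previous one, so neighbouring chunks repeat 200 characters of context.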

4. Set Up Vector Database

The app uses Pinecone for vector storage. This is a one-time setup per domain and environment:

# For local demos
python helpers/setup_vectordb.py your_domain_name local

# For hosted demos  
python helpers/setup_vectordb.py your_domain_name hosted

Important Notes:

You need both project API keys to create indexes using the setup scripts
Once indexes are created, you only need the environment and matching API key in your secrets file
The setup script processes documents from the domains/your_domain_name/docs/ directory
It creates Pinecone indexes that persist in the cloud and do not need to be rebuilt

See documentation/PINECONE_SETUP.md for detailed configuration instructions.

5. Test Your Domain

That's it! The app will automatically discover your new domain:

streamlit run app.py

Your domain will be available at:

Root URL: http://localhost:8501 (defaults to "finance" domain, or first available domain)
Direct URL: http://localhost:8501/your_domain_name

6. Create a Domain README (Optional but Recommended)

Create a README.md in your domain directory to help users understand what questions they can ask. This is especially helpful for demos and testing.

See domains/finance/README.md and domains/healthcare/README.md for complete examples.

How To Add a Domain with Cursor

Watch the following video tutorial to see how to add a new domain using Cursor: https://drive.google.com/file/d/1yM0dMa9uNNJay1q9gfPZJ3eTJ4lPB129/view?usp=drive_link

Underlying Architecture

Data Flow

User Input → Streamlit UI captures user message
Agent Processing → AgentFactory creates domain-specific agent
Tool Execution → Agent decides which tools to call based on user intent
RAG Integration → Pinecone vector database provides relevant context when needed
Response Generation → Agent synthesizes final response
Observability → All interactions logged to Galileo automatically

Vector Database Architecture

The app uses Pinecone for vector storage with environment-based configuration:

Local Demos: Uses the galileo-demo-local Pinecone project
Hosted Demos (i.e., Streamlit): Uses the galileo-demo-hosted Pinecone project
Index Naming: {domain}-{environment}-index (e.g., finance-local-index)
Automatic Selection: When running vector database searches, the app automatically uses the correct project based on the environment setting
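The selection rules above can be sketched as a small lookup. `pinecone_target` is a hypothetical helper, and `secrets` stands for the parsed contents of `.streamlit/secrets.toml`; the key names and the index naming pattern come from this README, while defaulting to "local" when `environment` is absent is an assumption.

```python
from typing import Dict, Tuple


def pinecone_target(domain: str, secrets: Dict[str, str]) -> Tuple[str, str]:
    """Return (index_name, api_key) for the configured environment."""
    environment = secrets.get("environment", "local")  # assumed default
    if environment not in ("local", "hosted"):
        raise ValueError(f"Unknown environment: {environment!r}")
    index_name = f"{domain}-{environment}-index"
    api_key = secrets[f"pinecone_api_key_{environment}"]
    return index_name, api_key


example_secrets = {
    "environment": "local",
    "pinecone_api_key_local": "your_local_project_api_key",
    "pinecone_api_key_hosted": "your_hosted_project_api_key",
}
print(pinecone_target("finance", example_secrets)[0])  # finance-local-index
```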

See documentation/PINECONE_SETUP.md for detailed configuration instructions.

Code Structure

galileo-golden-demo/
├── app.py                    # Streamlit application entry point
├── agent_factory.py          # Agent creation and management
├── base_agent.py            # Abstract base agent class
├── domain_manager.py        # Domain configuration management
├── setup_env.py            # Environment setup utilities
├── run_streamlit.py        # Alternative app runner
├── requirements.txt         # Python dependencies
├── documentation/          # Setup guides and documentation
│   ├── MULTI_DOMAIN_SETUP.md  # Multi-domain configuration guide
│   └── PINECONE_SETUP.md      # Pinecone setup instructions
├── agent_frameworks/        # Agent framework implementations
│   └── langgraph/
│       ├── agent.py         # LangGraph agent implementation
│       └── langgraph_rag.py # RAG integration for LangGraph
├── domains/                 # Domain-specific configurations
│   └── finance/            # Example finance domain
│       ├── config.yaml     # Domain configuration
│       ├── system_prompt.json
│       ├── dataset.csv     # Evaluation data
│       ├── docs/          # RAG documents (for vectorDB)
│       └── tools/         # Domain tools
├── experiments/            # Experiment system (UI + CLI)
│   ├── experiment_helpers.py  # Shared experiment functions
│   ├── run_experiment.py      # CLI script to run experiments
│   ├── create_galileo_dataset.py  # CLI script to create datasets
│   └── README.md              # Detailed experiments documentation
├── helpers/                # Utility scripts
│   ├── setup_vectordb.py  # Pinecone vector database setup
│   ├── test_vectordb.py   # Vector database testing
│   ├── protect_helpers.py # Galileo Protect stage setup and rulesets
│   ├── hallucination_helpers.py  # Hallucination demo logging
│   └── galileo_api_helpers.py  # Galileo API utilities
└── tools/                 # Shared tools
    └── rag_retrieval.py   # General RAG functionality (not implemented)

For Sales Engineers

As an SE, you primarily need to focus on the domains/ directory:

To customize for a demo: Update the domain configuration files
To add new use cases: Create a new domain following the structure above
For troubleshooting: If you encounter issues with other files, reach out to the FDE team immediately

The system is designed so that domain customization requires just configuration updates and document additions.

Running Experiments

The demo includes a full experiments system to evaluate your agents using Galileo. Experiments can be run from both the Streamlit UI and the command line.

Quick Start

Via UI

Start the Streamlit app: streamlit run app.py
Click on the 🧪 Experiments tab
Follow the 3-step workflow:
- Select or create a dataset
- Configure experiment settings and metrics
- Run the experiment and view results

Via CLI

Step 1: Create a Dataset (one-time setup)

# Preview the dataset before creating
python experiments/create_galileo_dataset.py finance --preview

# Create the dataset in Galileo
python experiments/create_galileo_dataset.py finance

This script:

Reads the domains/{domain}/dataset.csv file
Validates it has input and output columns
Creates a Galileo dataset with name: "{Domain} Domain Dataset"
Returns the dataset ID for reference
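The column check and naming step can be approximated with the standard library. This is a sketch of the behavior described above, not the script's code: `missing_columns` and `dataset_name` are hypothetical names, and capitalizing the domain to fill `{Domain}` is an assumption.

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = ("input", "output")


def missing_columns(csv_text: str) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [column for column in REQUIRED_COLUMNS if column not in header]


def dataset_name(domain: str) -> str:
    """Build the dataset name, assuming the domain is simply capitalized."""
    return f"{domain.capitalize()} Domain Dataset"


print(missing_columns("input,output\nWhat is 2+2?,4\n"))  # []
print(missing_columns("input\nWhat is 2+2?\n"))  # ['output']
print(dataset_name("finance"))  # Finance Domain Dataset
```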

Step 2: Run an Experiment

# Run experiment with default settings
python experiments/run_experiment.py finance

# Run with custom experiment name
python experiments/run_experiment.py finance --experiment-name "my-experiment-v1"

This script:

Loads the dataset created in Step 1
Runs each input through the domain's agent
Evaluates responses with selected metrics
Logs all traces to Galileo as an experiment
Provides a link to view results in Galileo Console

Key Features

Multiple Dataset Options: Select existing datasets, create from sample data, or upload CSV files
Model Selection: Experiments use the same model as the sidebar (Model → LLM). Change it in the sidebar before running; the experiment config shows which model will be used.
Custom Naming: Avoid conflicts with customizable dataset and experiment names
Direct Links: Click through to view datasets and results in Galileo Console
Flexible Metrics: Choose which metrics to evaluate for each run
Tab Navigation: Easy access alongside the Chat interface

📖 Full Documentation

For detailed information including:

Complete UI workflow guide
CLI usage examples
Dataset format requirements
Architecture and integration details
Available metrics

See experiments/README.md for the full documentation.

Galileo Protect Integration

The demo includes Galileo Protect for runtime protection against harmful content. Protect can be enabled from the sidebar and is fully configurable per domain.

How to Enable

Enable in UI: Toggle "Enable Prompt Injection Protection" in the sidebar
Automatic Setup: The app automatically creates and configures a Protect stage
Runtime Protection: Each query is checked against configured rules before processing
Observability: All Protect checks are logged to Galileo along with agent traces

Configuring Protect for Your Domain

Add a protect section to your domain's config.yaml:

# Protect configuration
protect:
  metrics:
    - name: "prompt_injection"
      operator: "any"
      target_values:
        - "impersonation"
        - "obfuscation"
        - "simple_instruction"
        - "few_shot"
        - "new_context"
    - name: "input_toxicity"
      operator: "gt"
      threshold: 0.95
  messages:
    - "I'm sorry, but I cannot process that request."
    - "I've detected harmful content. Please rephrase your query."
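To read the rules above: an "any" rule fires when the metric's value is one of the listed `target_values`, and a "gt" rule fires when the score exceeds `threshold`. The sketch below mirrors that reading locally for illustration only. Galileo evaluates Protect rules itself, and `rule_triggers` is a hypothetical function that is not part of the Galileo SDK.

```python
from typing import Any, Dict


def rule_triggers(rule: Dict[str, Any], value: Any) -> bool:
    """Illustrative reading of a protect rule; not Galileo's implementation."""
    operator = rule["operator"]
    if operator == "any":
        # Categorical metric: fire when the detected label is a listed target
        return value in rule["target_values"]
    if operator == "gt":
        # Numeric metric: fire when the score is strictly above the threshold
        return float(value) > rule["threshold"]
    raise ValueError(f"Unsupported operator in this sketch: {operator}")


injection_rule = {
    "name": "prompt_injection",
    "operator": "any",
    "target_values": ["impersonation", "obfuscation"],
}
toxicity_rule = {"name": "input_toxicity", "operator": "gt", "threshold": 0.95}

print(rule_triggers(injection_rule, "impersonation"))  # True
print(rule_triggers(toxicity_rule, 0.40))  # False
```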

What You Get

Domain-Specific Rules: Configure different protection rules for each domain
Multiple Metrics: Combine prompt injection, toxicity, PII detection, and more
Custom Messages: Define what users see when Protect triggers
Full Observability: All checks logged to Galileo with complete trace visibility
Automatic Routing: Harmful queries are blocked before reaching your agent

Learn More

Protect Overview - Complete guide to runtime protection concepts and metrics
LangChain Integration - Using Protect with LangChain and LangGraph

Hallucination Demo

The demo includes a Hallucination Demo feature to showcase Galileo's hallucination detection capabilities. This allows you to log intentional hallucinations that contradict retrieved context.

How It Works

Click "Log Hallucination" in the sidebar
A pre-configured hallucination is logged to Galileo with:
- Real context documents (that say one thing)
- A hallucinated answer (that contradicts the context)
Galileo's hallucination detection flags the contradiction

Configuring Hallucinations for Your Domain

Add a demo_hallucinations section to your domain's config.yaml:

demo_hallucinations:
  - question: "What was the Q4 revenue?"
    hallucinated_answer: "Revenue was $9.3B, up 4% from the previous quarter."
    # NOTE: The real answer in context says "up 4% from a year ago"
    context:
      - "Q4 revenue was $9.3 billion, up 4% from a year ago."
      - "Additional context documents..."

Chaos Engineering

The demo includes a Chaos Engineering system to showcase Galileo's observability and detection capabilities by intentionally injecting failures. Chaos modes can be toggled from the sidebar during demos.

Available Chaos Modes

The system includes 5 chaos modes that work automatically across all domains:

🔧 Tool Instability - Simulate API failures with realistic HTTP errors
🔢 Sloppiness - Corrupt numbers in tool outputs before LLM sees them
💥 Data Corruption - Force LLM to corrupt data it receives correctly
📚 RAG Disconnects - Simulate vector database failures
⏱️ Rate Limits - Inject rate limit errors (429 responses)

All modes operate at 100% when enabled for predictable, demo-ready behavior.

Each mode tests different observability capabilities and helps demonstrate how Galileo detects issues at different levels (span, trace, session).
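As a rough picture of how a mode such as Tool Instability could be applied without touching domain code, the sketch below wraps a tool so that it fails on every call while the mode is enabled and counts each injected failure. All names here (`with_tool_instability`, `ToolUnavailableError`, `chaos_stats`, the `"tool_instability"` key) are hypothetical; the repo's own chaos modules may work differently.

```python
import functools
from collections import Counter
from typing import Any, Callable, Set


class ToolUnavailableError(RuntimeError):
    """Raised in place of a real tool call when chaos is injected."""


chaos_stats: Counter = Counter()  # per-mode count of injected failures


def with_tool_instability(tool: Callable[..., Any], enabled_modes: Set[str]) -> Callable[..., Any]:
    """Wrap a tool so it fails on every call while 'tool_instability' is enabled."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if "tool_instability" in enabled_modes:
            chaos_stats["tool_instability"] += 1
            raise ToolUnavailableError("503 Service Unavailable (injected by chaos mode)")
        return tool(*args, **kwargs)

    return wrapper
```

Because the wrapper reads `enabled_modes` on each call, toggling a mode takes effect immediately without re-wrapping the tool.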

How to Use

Enable in UI: Toggle chaos modes in the sidebar under "Chaos Engineering"
Run Queries: Ask normal questions; chaos is injected automatically for every enabled mode
Check Galileo: View traces in Galileo Console to see detected issues
View Stats: Real-time counters show how many chaos events occurred
Reset Stats: Click "Reset Stats" to clear counters between demos

What Makes This Special

🌍 Domain-Agnostic: Works automatically across all domains without custom code
🎯 Targeted Testing: Each mode tests specific observability capabilities
📊 Real-time Stats: See chaos injection rates and counts in the UI
🔧 Demo-Ready: Perfect for showing Galileo's detection capabilities in action

Learn More

📖 Full Chaos Engineering Documentation - Complete guide including:

Detailed explanation of each chaos mode
What Galileo detects for each type of failure
Technical architecture and how chaos is applied
Demo tips and best practices
Common questions and troubleshooting

What's Coming Next

Live deployment URL for easy demo access without local setup

Updates and Issues

If you encounter any issues or have feedback, please contact the FDE team via Slack.
