
devcolor/insight-alchemists-datathon


2025 Gates Foundation Datathon

Team: Insight Alchemists

An AI-powered Copilot chatbot that answers research questions about Postsecondary Data Partnership (PDP) data from the National Student Clearinghouse.

The Postsecondary Data Partnership (PDP) is a nationwide effort to help colleges and universities gain a fuller picture of student progress and outcomes, meet various reporting requirements, and identify where to focus their resources. Both institutions and educational organizations can participate in the PDP.

National Student Clearinghouse

This chatbot will help employees learn how to leverage PDP AR course, cohort, and financial aid data files in their work with students.

Copilot Agent Creation

Copilot Agent Interface

Quick Start

Get started with the AR data analysis agent by choosing one of the following options:

Option 1: Create M365 Copilot Agent (Recommended for Production)

This creates an AI-powered chatbot that can query the AR database through the Azure Function API.

Step 1: Generate Data Dictionaries (optional - metadata already included)

The data dictionaries provide context about the AR data structure. The metadata/ directory already contains generated data dictionaries, but you can regenerate them if needed:

cd data_fetching
make build && make dictionaries

This generates:

  • metadata/ar_data_dictionary.csv - Complete data dictionary for all AR fields
  • metadata/institutional_knowledge.csv - Context on data strengths, limitations, and common joins

These files will be used by the Copilot Agent to understand the data structure.
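As a quick sanity check, the dictionary CSVs can be loaded and indexed by field name. This is a minimal sketch: the column names (`field_name`, `description`) and sample rows below are illustrative assumptions, not the actual schema of metadata/ar_data_dictionary.csv.

```python
import csv
import io

def load_dictionary(csv_text: str) -> dict:
    """Index data-dictionary rows by field name for quick lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["field_name"]: row["description"] for row in reader}

# Illustrative rows only; the real columns in
# metadata/ar_data_dictionary.csv may differ.
sample = (
    "field_name,description\n"
    "cohort,First term the student enrolled\n"
    "enrollment_type,How the student first enrolled\n"
)
fields = load_dictionary(sample)
```

In practice you would pass the contents of the real CSV (e.g. `open("metadata/ar_data_dictionary.csv").read()`) instead of the sample string.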

Step 2: Create Copilot Agent

  1. Navigate to M365 Copilot Studio
  2. Click "Create Agent" button in the left sidebar
  3. Choose a name for your agent (e.g., "AR Data Analyst")
  4. In the chatbox at the bottom left, paste one of the prompts from CoPilot Agent Prompts/:
    • PDP Question Seer.txt - For question understanding and discoverability
    • PDP Data Alchemist.txt - For detailed data analysis (takes longer to process)
    • Prompt Minion.txt - For optimizing and creating prompts
  5. Press Enter - Copilot will create the agent based on the prompt
  6. Test the agent in the preview panel on the right side
  7. Click "Create" button in the top-right to finalize

Available Prompts (in CoPilot Agent Prompts/):

  • PDP Question Seer.txt - Analyzes if questions can be answered with the dataset
  • PDP Data Alchemist.txt - Provides detailed data analysis responses
  • Prompt Minion.txt - Helps optimize agent prompts
  • Test Questions.txt - Sample questions for testing

Note: See CoPilot Agent Prompts/prompts_readme.md for more details on each prompt.

Step 3: Create and Deploy Azure Function

The Azure Function provides the API endpoint that the Copilot Agent will call to execute SQL queries.

# Navigate to the Azure Function directory
cd agent-sql-connector

# Create and activate virtual environment (if not already done)
python3.10 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Deploy the Azure Function to Azure
func azure functionapp publish datathon-query-2025-aa --build remote

# Note the endpoint URL: https://datathon-query-2025-aa.azurewebsites.net/api/query

Note: Ensure you have Azure Functions Core Tools v4 installed and are authenticated with Azure CLI.

Step 4: Connect Agent to Azure Function

Now that both the agent and function are created, connect them:

  1. In your Copilot agent, go to Actions or Connections settings
  2. Add an HTTP action/connection to call the Azure Function endpoint
  3. Configure the connection:
    • URL: https://datathon-query-2025-aa.azurewebsites.net/api/query
    • Method: POST
    • Headers: Content-Type: application/json
    • Body: JSON with SQL query: {"sql": "{{generated_sql_query}}"}

The agent should:

  1. Generate SQL queries based on user questions
  2. Call the Azure Function with the generated SQL
  3. Format and return the results to the user

Step 5: Test Your Agent

Try asking questions like:

  • "What is the GPA of students who take Psychology in their 2nd year?"
  • "How many students are retained by cohort?"
  • "What is the retention rate for first-generation students?"

The agent will generate SQL queries, call the Azure Function, and return formatted results.

Option 2: Query Database Directly (Recommended for Testing)

# 1. Start the database (automatic bootstrap)
cd agent-sql-connector
pip install -r requirements.txt
python db_query.py

This will automatically:

  • Detect if database is running
  • Start Docker containers if needed
  • Wait for initialization
  • Connect and run example queries

Option 3: Use Azure Function API Directly

# 1. Start database (from sql_integration)
cd sql_integration
docker compose up -d

# 2. Start Azure Function locally
cd ../agent-sql-connector
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
func start

# 3. Test the endpoint
curl -X POST http://localhost:7071/api/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT TOP 10 * FROM ar_courses"}'

SQL Connector Results

Architecture

This project has four main components:

1. Copilot Agent

  • Location: CoPilot Agent Prompts/
  • Purpose: LLM prompts and agent configurations for querying AR data
  • Prompts: PDP Question Seer, PDP Data Alchemist, Prompt Minion, and Test Questions

Note: These prompts are designed for use with M365 Copilot Studio. See CoPilot Agent Prompts/prompts_readme.md for setup instructions.

2. SQL Server Integration

  • Location: sql_integration/
  • Purpose: Database backend with AR course, cohort, and financial aid data
  • Bootstrap Script: agent-sql-connector/db_query.py - Automatically starts database and connects
  • Features:
    • Auto-detects if database is running
    • Starts Docker containers if needed
    • Query guardrails (blocks destructive operations)
    • Returns results as pandas DataFrames
    • Used by Azure Function handler

3. Azure Function Query Proxy

  • Location: agent-sql-connector/
  • Purpose: Processes database queries triggered by the Copilot Agent, handles HTTP POST logic, and executes safe SQL queries
  • Documentation: agent-sql-connector/README.md
  • Endpoint: POST /api/query (unauthenticated MVP)
  • Deployment: Azure Function App on Consumption Plan
  • Features:
    • Receives triggers from Copilot Agent and processes HTTP POST requests
    • Executes safe SQL SELECT queries via db_query.query() function
    • Environment variable configuration for database connections
    • Supports ngrok tunneling for local database testing
    • Returns JSON responses with query results

4. Data Foundation

  • Location: metadata/
  • Pulled and transformed by tooling under data_fetching/
  • Purpose: AR data dictionary and institutional context
  • Files:
    • ar_data_dictionary.csv - Complete data dictionary for all AR fields
    • institutional_knowledge.csv - Context on data strengths, limitations, and common joins

System Architecture Diagram


Copilot Agent Setup

Simple Usage

The Copilot Agent uses LLM prompts located in CoPilot Agent Prompts/ to interact with the AR data through the Azure Function Query Proxy.

Available Prompts

  • PDP Data Alchemist: Prompt for detailed data analysis responses (takes longer to process)

  • PDP Question Seer: Prompt for question understanding and discoverability

  • Test Questions: Sample questions for testing agent responses

Additionally:

  • Prompt Minion: Prompt for optimizing and creating prompts

See CoPilot Agent Prompts/prompts_readme.md for detailed usage instructions.

Integration with Azure Function

The Copilot Agent triggers the Azure Function, which then handles the database query logic:

  1. Copilot Agent → Invokes/triggers the Azure Function (via function invocation, not direct HTTP)
  2. Azure Function → Receives the trigger and handles the HTTP POST request processing
  3. Azure Function → Executes the SQL query via db_query.query() function
  4. Azure Function → Returns JSON results back to the Copilot Agent

The Azure Function exposes the endpoint at:

POST https://datathon-query-2025-aa.azurewebsites.net/api/query
Content-Type: application/json

{
  "sql": "SELECT TOP 10 * FROM ar_courses"
}

Configuration

  1. Configure the Copilot Agent to trigger the Azure Function endpoint
  2. Use the prompts in CoPilot Agent Prompts/ as templates for your LLM agent
  3. The Azure Function enforces query safety (SELECT-only queries) and returns results as JSON

See agent-sql-connector/README.md for Azure Function API documentation.


SQL Server Integration

Automatic Setup (Recommended)

# This one script does everything:
cd agent-sql-connector
python db_query.py

The script will:

  1. Detect if database is running
  2. Start Docker containers if needed
  3. Wait for initialization (60-90 seconds)
  4. Connect and run example queries

The script does not include a cleanup step, but you can reset the environment between runs as follows:

  1. cd sql_integration
  2. make clean-data

Manual Setup

cd sql_integration
./bootstrap.sh           # Install dependencies
docker compose up --build # Start database
python test_query.py     # Test connection

Database Tables

  • ar_courses - Course-level enrollment data
  • ar_cohorts - Student cohort tracking and outcomes
  • ar_financial_aid - Financial aid information
  • llm_debug_logs - Chatbot interaction logs
  • weekly_trends - Aggregated question analytics

Architecture Documentation

See sql_integration/ARCHITECTURE.md for detailed architecture documentation.


Azure Function Query Proxy

The Azure Function provides a REST API endpoint that allows the Copilot Agent to execute database queries remotely.

Quick Start

cd agent-sql-connector
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
func start  # Local development

Deployment

# Deploy to Azure
func azure functionapp publish datathon-query-2025-aa --build remote

Features

  • HTTP Endpoint: POST /api/query accepts SQL queries and returns JSON results
  • Query Safety: Enforces SELECT-only queries (no destructive operations)
  • Remote Access: Allows Copilot Agent to query database from anywhere
  • Environment Configuration: Database connection via Azure App Settings
  • Local Testing: Supports ngrok tunnel for testing against local database

Testing with Local Database

For development, you can use ngrok to expose your local SQL Server:

# 1. Start ngrok tunnel
ngrok tcp 1433

# 2. Configure Azure Function App settings
az functionapp config appsettings set \
  --name datathon-query-2025-aa \
  --resource-group datathon \
  --settings "DB_SERVER=<ngrok_hostname>,<port>" \
             "DB_DATABASE=InsightAlchemists" \
             "DB_USERNAME=SA" \
             "DB_PASSWORD=<password>"

Documentation

See agent-sql-connector/README.md for:

  • Complete setup instructions
  • System architecture diagram
  • Environment variable configuration
  • Testing and deployment guides
  • Security considerations

Data Dictionary Generation

The metadata/ directory already contains generated data dictionaries. To regenerate:

Quick Start

cd data_fetching
make build
make dictionaries

Output Files

Generated files are saved to metadata/:

  • ar_data_dictionary.csv - Auto-generated data dictionary from source files
  • institutional_knowledge.csv - Manually maintained institutional context

Detailed Documentation

See data_fetching/README.md for:

  • Prerequisites and dependencies
  • Available make commands
  • Data sources and format
  • Troubleshooting tips

SQL Cookbooks

We saved some quick SQL cookbook patterns to sql_cookbooks/README.md. These queries are a starting point for analyzing:

  • Retention metrics
  • Gateway completion impact on retention
  • Credit Momentum Analysis
  • Equity Gap Analysis across student populations
  • Course-Level Success (using course data), including delivery method impact
  • Program of Study Success Rates
  • Toxic vs Synergetic course loads

Project Structure

insight-alchemists-datathon/
├── CoPilot Agent Prompts/    # LLM prompts for Copilot
│   ├── PDP Data Alchemist.txt
│   ├── PDP Question Seer.txt
│   ├── Prompt Minion.txt
│   ├── Test Questions.txt
│   └── prompts_readme.md
├── sql_integration/          # SQL Server Docker setup
│   ├── docker-compose.yml   # Database container config
│   ├── Dockerfile
│   ├── setup.sql            # Database schema and data import
│   ├── bootstrap.sh         # Dependency installation
│   ├── entrypoint.sh        # Container startup script
│   ├── test_query.py        # Connection test script
│   ├── Makefile            # Convenience commands
│   ├── ARCHITECTURE.md      # Architecture documentation
│   └── README.md
├── agent-sql-connector/      # Azure Function Query Proxy
│   ├── fetch_proxy/          # Function handler code
│   │   ├── __init__.py
│   │   ├── function_app.py
│   │   └── handler.py
│   ├── query/                # Function configuration
│   │   └── function.json
│   ├── tests/               # Unit tests
│   │   ├── __init__.py
│   │   └── test_handler.py
│   ├── synthetic_data/      # Local test data
│   ├── db_query.py          # Shared query function (canonical source)
│   ├── requirements.txt     # Function dependencies
│   ├── host.json            # Azure Functions config
│   ├── local.settings.json  # Local settings (gitignored)
│   ├── meta.yaml            # Function App metadata
│   ├── MVP_SECURITY_NOTE.md # Security documentation
│   ├── project_plan.md      # Development plan
│   └── README.md            # Complete Azure Function documentation
├── metadata/                 # Data dictionaries
│   ├── ar_data_dictionary.csv
│   └── institutional_knowledge.csv
├── data_fetching/           # Dictionary generation tools
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── prep_dictionaries.py
│   ├── Makefile
│   └── README.md
├── synthetic_data/          # Sample/test data (root level)
│   ├── ar_cohorts.csv
│   ├── ar_courses.csv
│   ├── ar_financial_aid.csv
│   └── README.md
└── sql_cookbooks/           # Example SQL query patterns
    └── README.md

Quick Reference

Common Commands

# Query database directly (using agent-sql-connector)
cd agent-sql-connector && python db_query.py

# Start database
cd sql_integration && docker compose up -d

# View database logs
cd sql_integration && docker compose logs

# Stop database
cd sql_integration && docker compose down

# Test database connection
cd sql_integration && python test_query.py

# Test Azure Function locally
cd agent-sql-connector && func start

# Deploy Azure Function
cd agent-sql-connector && func azure functionapp publish datathon-query-2025-aa --build remote

# Regenerate data dictionaries
cd data_fetching && make dictionaries

Requirements

  • Python 3.9+ for local development (the Azure Function requires Python 3.10)
  • Docker Desktop
  • ODBC Driver 18 for SQL Server (auto-installed by bootstrap.sh)
  • Azure Functions Core Tools v4 (for Azure Function development)

Support

For issues or questions:

  1. Check component-specific READMEs in subdirectories
  2. Review Docker logs: docker compose logs
  3. Verify Docker is running: docker ps
