An AI-powered Copilot chatbot that answers research questions about Postsecondary Data Partnership (PDP) data from the National Student Clearinghouse.
The Postsecondary Data Partnership (PDP) is a nationwide effort to help colleges and universities gain a fuller picture of student progress and outcomes, meet various reporting requirements, and identify where to focus their resources. Both institutions and educational organizations can participate in the PDP.
This chatbot will help employees learn how to leverage PDP AR course, cohort, and financial aid data files in their work with students.
- Quick Start
- Architecture
- Copilot Agent Setup
- SQL Server Integration
- Azure Function Query Proxy
- Data Dictionary Generation
- SQL Cookbooks
Get started with the AR data analysis agent by choosing one of the following options:
This creates an AI-powered chatbot that can query the AR database through the Azure Function API.
The data dictionaries provide context about the AR data structure. The metadata/ directory already contains generated data dictionaries, but you can regenerate them if needed:
```bash
cd data_fetching
make build && make dictionaries
```
This generates:
- `metadata/ar_data_dictionary.csv` - Complete data dictionary for all AR fields
- `metadata/institutional_knowledge.csv` - Context on data strengths, limitations, and common joins
These files will be used by the Copilot Agent to understand the data structure.
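Before wiring these files into the agent, you can spot-check them with pandas (a minimal sketch; run it from the repository root):

```python
import pandas as pd

# Load the generated dictionary and the institutional context file.
data_dict = pd.read_csv("metadata/ar_data_dictionary.csv")
knowledge = pd.read_csv("metadata/institutional_knowledge.csv")

# Quick sanity checks: size and the first few entries of each.
print(data_dict.shape)
print(data_dict.head())
print(knowledge.head())
```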
- Navigate to M365 Copilot Studio
- Click "Create Agent" button in the left sidebar
- Choose a name for your agent (e.g., "AR Data Analyst")
- In the chatbox at the bottom left, paste one of the prompts from `CoPilot Agent Prompts/`:
  - `PDP Question Seer.txt` - For question understanding and discoverability
  - `PDP Data Alchemist.txt` - For detailed data analysis (takes longer to process)
  - `Prompt Minion.txt` - For optimizing and creating prompts
- Press Enter - Copilot will create the agent based on the prompt
- Test the agent in the preview panel on the right side
- Click "Create" button in the top-right to finalize
Available Prompts (in CoPilot Agent Prompts/):
- PDP Question Seer.txt - Analyzes if questions can be answered with the dataset
- PDP Data Alchemist.txt - Provides detailed data analysis responses
- Prompt Minion.txt - Helps optimize agent prompts
- Test Questions.txt - Sample questions for testing
Note: See CoPilot Agent Prompts/prompts_readme.md for more details on each prompt.
The Azure Function provides the API endpoint that the Copilot Agent will call to execute SQL queries.
```bash
# Navigate to the Azure Function directory
cd agent-sql-connector

# Create and activate virtual environment (if not already done)
python3.10 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Deploy the Azure Function to Azure
func azure functionapp publish datathon-query-2025-aa --build remote

# Note the endpoint URL: https://datathon-query-2025-aa.azurewebsites.net/api/query
```
Note: Ensure you have Azure Functions Core Tools v4 installed and are authenticated with Azure CLI.
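Once deployed, you can smoke-test the endpoint from Python before touching Copilot Studio. This is a sketch using the requests library; the URL and body format come from this README, and the endpoint is unauthenticated in this MVP:

```python
import requests

# Deployed endpoint noted in the publish step above.
URL = "https://datathon-query-2025-aa.azurewebsites.net/api/query"

# The function expects a JSON body with a single "sql" key.
payload = {"sql": "SELECT TOP 10 * FROM ar_courses"}

resp = requests.post(URL, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```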
Now that both the agent and function are created, connect them:
- In your Copilot agent, go to Actions or Connections settings
- Add an HTTP action/connection to call the Azure Function endpoint
- Configure the connection:
  - URL: `https://datathon-query-2025-aa.azurewebsites.net/api/query`
  - Method: `POST`
  - Headers: `Content-Type: application/json`
  - Body: JSON with the SQL query: `{"sql": "{{generated_sql_query}}"}`
The agent should:
- Generate SQL queries based on user questions
- Call the Azure Function with the generated SQL
- Format and return the results to the user
Try asking questions like:
- "What is the GPA of students who take Psychology in their 2nd year?"
- "How many students are retained by cohort?"
- "What is the retention rate for first-generation students?"
The agent will generate SQL queries, call the Azure Function, and return formatted results.
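For intuition, the whole loop can be sketched outside Copilot Studio as plain Python. The SQL-generation step is stubbed out (in the real agent, the LLM produces it from the prompts and data dictionaries), and the JSON-list response shape is an assumption:

```python
import requests

API_URL = "https://datathon-query-2025-aa.azurewebsites.net/api/query"

def generate_sql(question: str) -> str:
    # Stub for the agent's LLM step; the real agent derives SQL from its
    # prompt plus the data dictionaries in metadata/.
    return "SELECT TOP 10 * FROM ar_cohorts"

def answer(question: str) -> str:
    sql = generate_sql(question)
    resp = requests.post(API_URL, json={"sql": sql}, timeout=30)
    resp.raise_for_status()
    rows = resp.json()  # assumed shape: a JSON list of row objects
    return f"Ran: {sql}\nReturned {len(rows)} rows."

print(answer("How many students are retained by cohort?"))
```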
```bash
# 1. Start the database (automatic bootstrap)
cd agent-sql-connector
pip install -r requirements.txt
python db_query.py
```
This will automatically do the following (see the sketch after this list):
- Detect if database is running
- Start Docker containers if needed
- Wait for initialization
- Connect and run example queries
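The sketch below shows roughly what that flow looks like. It is illustrative only, not the actual db_query.py code; the connection string handling and compose directory are assumptions:

```python
import subprocess
import time

import pyodbc

def ensure_database(conn_str: str, compose_dir: str = "../sql_integration"):
    """Connect to SQL Server, starting the Docker containers if needed."""
    try:
        # If the database is already up, this succeeds immediately.
        return pyodbc.connect(conn_str, timeout=5)
    except pyodbc.Error:
        # Not reachable: bring up the containers and wait for SQL Server
        # to finish initializing (60-90 seconds on first run).
        subprocess.run(["docker", "compose", "up", "-d"], cwd=compose_dir, check=True)
        time.sleep(90)
        return pyodbc.connect(conn_str, timeout=5)
```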
```bash
# 1. Start database (from sql_integration)
cd sql_integration
docker compose up -d

# 2. Start Azure Function locally
cd ../agent-sql-connector
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
func start

# 3. Test the endpoint
curl -X POST http://localhost:7071/api/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT TOP 10 * FROM ar_courses"}'
```
This project has four main components:
- Location: `CoPilot Agent Prompts/`
- Purpose: LLM prompts and agent configurations for querying AR data
- Prompts:
  - `PDP Data Alchemist.txt` - Prompt for detailed data analysis queries
  - `PDP Question Seer.txt` - Prompt for question understanding and discoverability
  - `Test Questions.txt` - Sample questions for testing

Note: These prompts are designed for use with M365 Copilot Studio. See `CoPilot Agent Prompts/prompts_readme.md` for setup instructions.
- Location: `sql_integration/`
- Purpose: Database backend with AR course, cohort, and financial aid data
- Bootstrap Script: `agent-sql-connector/db_query.py` - Automatically starts the database and connects
- Features:
  - Auto-detects whether the database is running
  - Starts Docker containers if needed
  - Query guardrails that block destructive operations (see the sketch after this list)
  - Returns results as pandas DataFrames
  - Used by the Azure Function handler
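The guardrail idea can be illustrated with a simple keyword check (a sketch; the actual rules in db_query.py may differ):

```python
import re

# Keywords that indicate a destructive or multi-statement query.
BLOCKED = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|exec|merge)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Allow only single SELECT statements; block destructive keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # no statement chaining
        return False
    if not stripped.lower().startswith("select"):
        return False
    return BLOCKED.search(stripped) is None

assert is_safe_select("SELECT TOP 10 * FROM ar_courses")
assert not is_safe_select("DROP TABLE ar_courses")
```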
- Location: `agent-sql-connector/`
- Purpose: Processes database queries triggered by the Copilot Agent, handles HTTP POST logic, and executes safe SQL queries
- Documentation: `agent-sql-connector/README.md`
- Endpoint: `POST /api/query` (unauthenticated MVP)
- Deployment: Azure Function App on a Consumption Plan
- Features (a minimal handler sketch follows this list):
  - Receives triggers from the Copilot Agent and processes HTTP POST requests
  - Executes safe SQL SELECT queries via the `db_query.query()` function
  - Environment variable configuration for database connections
  - Supports ngrok tunneling for local database testing
  - Returns JSON responses with query results
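The sketch below uses the Azure Functions Python v2 programming model. It is illustrative only, not the actual code in fetch_proxy/, which may be structured differently:

```python
import azure.functions as func

import db_query  # shared query helper described above

app = func.FunctionApp()

@app.route(route="query", methods=["POST"], auth_level=func.AuthLevel.ANONYMOUS)
def query(req: func.HttpRequest) -> func.HttpResponse:
    sql = req.get_json().get("sql", "")
    # db_query.query() enforces SELECT-only access and returns a DataFrame.
    df = db_query.query(sql)
    return func.HttpResponse(df.to_json(orient="records"), mimetype="application/json")
```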
- Location: `metadata/`
- Pulled and transformed by tooling under `data_fetching/`
- Purpose: AR data dictionary and institutional context
- Files:
  - `ar_data_dictionary.csv` - Complete data dictionary for all AR fields
  - `institutional_knowledge.csv` - Context on data strengths, limitations, and common joins
- See Quick Start
The Copilot Agent uses LLM prompts located in CoPilot Agent Prompts/ to interact with the AR data through the Azure Function Query Proxy.
- PDP Data Alchemist: Prompt for detailed data analysis responses (takes longer to process)
- PDP Question Seer: Prompt for question understanding and discoverability
- Test Questions: Sample questions for testing agent responses

Additionally:
- Prompt Minion: Prompt for optimizing and creating prompts

See `CoPilot Agent Prompts/prompts_readme.md` for detailed usage instructions.
The Copilot Agent triggers the Azure Function, which then handles the database query logic:
- Copilot Agent → Invokes/triggers the Azure Function (via function invocation, not direct HTTP)
- Azure Function → Receives the trigger and handles the HTTP POST request processing
- Azure Function → Executes the SQL query via the `db_query.query()` function
- Azure Function → Returns JSON results back to the Copilot Agent
The Azure Function exposes the endpoint at:
```
POST https://datathon-query-2025-aa.azurewebsites.net/api/query
Content-Type: application/json

{
  "sql": "SELECT TOP 10 * FROM ar_courses"
}
```
- Configure the Copilot Agent to trigger the Azure Function endpoint
- Use the prompts in `CoPilot Agent Prompts/` as templates for your LLM agent
- The Azure Function enforces query safety (SELECT-only queries) and returns results as JSON
See agent-sql-connector/README.md for Azure Function API documentation.
```bash
# This one script does everything:
cd agent-sql-connector
python db_query.py
```
The script will:
- Detect if database is running
- Start Docker containers if needed
- Wait for initialization (60-90 seconds)
- Connect and run example queries
We ran out of time to add a cleanup step, but you can reset the environment between runs with:
```bash
cd sql_integration
make clean-data
```
```bash
cd sql_integration
./bootstrap.sh              # Install dependencies
docker compose up --build   # Start database
python test_query.py        # Test connection
```
The database includes the following tables (a quick-query example follows the list):
- `ar_courses` - Course-level enrollment data
- `ar_cohorts` - Student cohort tracking and outcomes
- `ar_financial_aid` - Financial aid information
- `llm_debug_logs` - Chatbot interaction logs
- `weekly_trends` - Aggregated question analytics
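With the containers up, a quick way to confirm the import worked is to count rows through the shared helper. This sketch assumes `db_query.query()` takes a SQL string and returns a pandas DataFrame, as described above; run it from agent-sql-connector/:

```python
from db_query import query

# Count rows in each imported table.
for table in ["ar_courses", "ar_cohorts", "ar_financial_aid"]:
    df = query(f"SELECT COUNT(*) AS n FROM {table}")
    print(table, int(df["n"].iloc[0]))
```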
See sql_integration/ARCHITECTURE.md for detailed architecture documentation.
The Azure Function provides a REST API endpoint that allows the Copilot Agent to execute database queries remotely.
```bash
cd agent-sql-connector
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
func start  # Local development

# Deploy to Azure
func azure functionapp publish datathon-query-2025-aa --build remote
```
- HTTP Endpoint: `POST /api/query` accepts SQL queries and returns JSON results
- Query Safety: Enforces SELECT-only queries (no destructive operations)
- Remote Access: Allows the Copilot Agent to query the database from anywhere
- Environment Configuration: Database connection via Azure App Settings
- Local Testing: Supports an ngrok tunnel for testing against a local database
For development, you can use ngrok to expose your local SQL Server:
```bash
# 1. Start ngrok tunnel
ngrok tcp 1433

# 2. Configure Azure Function App settings
az functionapp config appsettings set \
  --name datathon-query-2025-aa \
  --resource-group datathon \
  --settings "DB_SERVER=<ngrok_hostname>,<port>" \
             "DB_DATABASE=InsightAlchemists" \
             "DB_USERNAME=SA" \
             "DB_PASSWORD=<password>"
```
See agent-sql-connector/README.md for:
- Complete setup instructions
- System architecture diagram
- Environment variable configuration
- Testing and deployment guides
- Security considerations
The metadata/ directory already contains generated data dictionaries. To regenerate:
```bash
cd data_fetching
make build
make dictionaries
```
Generated files are saved to metadata/:
- `ar_data_dictionary.csv` - Auto-generated data dictionary from source files
- `institutional_knowledge.csv` - Manually maintained institutional context
See data_fetching/README.md for:
- Prerequisites and dependencies
- Available make commands
- Data sources and format
- Troubleshooting tips
We saved some quick SQL cookbook patterns to sql_cookbooks/README.md. These queries are a starting point for analyzing (an example pattern follows the list):
- Retention metrics
- Gateway completion impact on retention
- Credit momentum analysis
- Equity gap analysis across student populations
- Course-level success (using course data), including delivery method impact
- Program of study success rates
- Toxic vs. synergistic course loads
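As a flavor of the cookbook style, here is a retention-by-cohort sketch run through the query helper. The column names cohort_term and retained are hypothetical placeholders; check metadata/ar_data_dictionary.csv for the real field names:

```python
from db_query import query

# NOTE: column names below are hypothetical placeholders.
sql = """
SELECT cohort_term,
       AVG(CAST(retained AS FLOAT)) AS retention_rate
FROM ar_cohorts
GROUP BY cohort_term
ORDER BY cohort_term
"""
print(query(sql))
```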
```
insight-alchemists-datathon/
├── CoPilot Agent Prompts/ # LLM prompts for Copilot
│ ├── PDP Data Alchemist.txt
│ ├── PDP Question Seer.txt
│ ├── Prompt Minion.txt
│ ├── Test Questions.txt
│ └── prompts_readme.md
├── sql_integration/ # SQL Server Docker setup
│ ├── docker-compose.yml # Database container config
│ ├── Dockerfile
│ ├── setup.sql # Database schema and data import
│ ├── bootstrap.sh # Dependency installation
│ ├── entrypoint.sh # Container startup script
│ ├── test_query.py # Connection test script
│ ├── Makefile # Convenience commands
│ ├── ARCHITECTURE.md # Architecture documentation
│ └── README.md
├── agent-sql-connector/ # Azure Function Query Proxy
│ ├── fetch_proxy/ # Function handler code
│ │ ├── __init__.py
│ │ ├── function_app.py
│ │ └── handler.py
│ ├── query/ # Function configuration
│ │ └── function.json
│ ├── tests/ # Unit tests
│ │ ├── __init__.py
│ │ └── test_handler.py
│ ├── synthetic_data/ # Local test data
│ ├── db_query.py # Shared query function (canonical source)
│ ├── requirements.txt # Function dependencies
│ ├── host.json # Azure Functions config
│ ├── local.settings.json # Local settings (gitignored)
│ ├── meta.yaml # Function App metadata
│ ├── MVP_SECURITY_NOTE.md # Security documentation
│ ├── project_plan.md # Development plan
│ └── README.md # Complete Azure Function documentation
├── metadata/ # Data dictionaries
│ ├── ar_data_dictionary.csv
│ └── institutional_knowledge.csv
├── data_fetching/ # Dictionary generation tools
│ ├── docker-compose.yml
│ ├── Dockerfile
│ ├── prep_dictionaries.py
│ ├── Makefile
│ └── README.md
├── synthetic_data/ # Sample/test data (root level)
│ ├── ar_cohorts.csv
│ ├── ar_courses.csv
│ ├── ar_financial_aid.csv
│ └── README.md
└── sql_cookbooks/ # Example SQL query patterns
    └── README.md
```
```bash
# Query database directly (using agent-sql-connector)
cd agent-sql-connector && python db_query.py
# Start database
cd sql_integration && docker compose up -d
# View database logs
cd sql_integration && docker compose logs
# Stop database
cd sql_integration && docker compose down
# Test database connection
cd sql_integration && python test_query.py
# Test Azure Function locally
cd agent-sql-connector && func start
# Deploy Azure Function
cd agent-sql-connector && func azure functionapp publish datathon-query-2025-aa --build remote
# Regenerate data dictionaries
cd data_fetching && make dictionaries
```
- Python 3.10+ (Azure Function requires 3.10; local development works with 3.9+)
- Docker Desktop
- ODBC Driver 18 for SQL Server (auto-installed by bootstrap.sh)
- Azure Functions Core Tools v4 (for Azure Function development)
For issues or questions:
- Check component-specific READMEs in subdirectories
- Review Docker logs: `docker compose logs`
- Verify Docker is running: `docker ps`



