
API Reference

Complete reference for all modules, classes, functions, and tools in the PII Agent project.


Table of Contents

- data/model.py
- data/load_csv.py
- data/docker-compose.yml
- src/agno/db-agent.py
- src/agno/prompts.py
- src/msft-agent-framework/data-agent.py
- Test Coverage

data/model.py

SQLAlchemy ORM definitions for the database schema.

class Base(DeclarativeBase)

SQLAlchemy 2.x declarative base class. All ORM models inherit from this.

class Person(Base)

ORM model for the persons table. Stores PII records with all fields as strings.

Table name: persons

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | Primary key | Unique record identifier (SSN-format string) |
| gender | String | | "m" or "f" |
| birthdate | String | | Date in YYYY/MM/DD format |
| maiden_name | String | | Maiden name |
| lname | String | | Last name |
| fname | String | | First name |
| address | String | | Street address |
| city | String | | City |
| state | String | | Two-letter state code |
| zip | String | | ZIP code (stored as a string to preserve leading zeros) |
| phone | String | | Phone number |
| email | String | | Email address |
| cc_type | String | | Credit card type ("m" = Mastercard, "v" = Visa) |
| cc_number | String | | Credit card number (space-separated groups) |
| cc_cvc | String | | Card verification code (stored as a string to preserve leading zeros) |
| cc_expiredate | String | | Card expiration date in YYYY/MM/DD format |

All columns use Mapped[str] with mapped_column(String). All dates and numeric-looking values are intentionally stored as strings for demo simplicity.


data/load_csv.py

Idempotent CSV loader that populates the persons table from a CSV file.

Constants

```python
DEFAULT_DATABASE_URL = "postgresql+psycopg://postgres:MyDbPassword@localhost:5432/pii_demo"
CSV_PATH = Path(__file__).parent / "sample-pii-data.csv"
```

load_csv(engine, csv_path: Path) -> int

Load CSV rows into the persons table, skipping rows whose id already exists.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| engine | sqlalchemy.Engine | SQLAlchemy engine connected to the target database |
| csv_path | pathlib.Path | Path to the CSV file to load |

Returns: int — Number of new rows inserted.

Behavior:

  1. Calls Base.metadata.create_all(engine) to ensure the persons table exists
  2. Reads the CSV file with csv.DictReader
  3. For each row, checks if a Person with that id already exists
  4. Skips existing rows; inserts new rows
  5. Commits the session and returns the insert count

Idempotency: Running load_csv() multiple times with the same data is safe. The second run returns 0.

main()

CLI entry point. Reads DATABASE_URL from the environment (falls back to DEFAULT_DATABASE_URL), creates an engine, calls load_csv(), and prints the result.

Usage:

```shell
python -m data.load_csv
# Output: Loaded 30 new rows into persons table.
```

data/docker-compose.yml

PostgreSQL 16 container definition.

| Setting | Value |
| --- | --- |
| Image | postgres:16 |
| POSTGRES_USER | postgres |
| POSTGRES_PASSWORD | MyDbPassword |
| POSTGRES_DB | pii_demo |
| Host port | 5432 |
| Volume | pgdata (named volume mounted at /var/lib/postgresql/data) |

Commands:

```shell
docker compose -f data/docker-compose.yml up -d     # Start
docker compose -f data/docker-compose.yml down      # Stop
docker compose -f data/docker-compose.yml down -v   # Stop and delete data
```
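Based on the settings in the table above, the compose file plausibly looks like this (the service name `db` is an assumption; everything else comes from the documented settings):

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: MyDbPassword
      POSTGRES_DB: pii_demo
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```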

src/agno/db-agent.py

Agno framework agent for querying the PII database.

Agent Configuration

| Setting | Value |
| --- | --- |
| Name | "Records Manager" |
| Model | LiteLLM(id="claude-sonnet-4-5") |
| API base | os.getenv("LLM_BASE_URL") |
| API key | os.getenv("LLM_API_KEY") |
| top_p | None (disabled) |

Tools

Uses Agno's built-in PostgresTools:

```python
PostgresTools(
    host="localhost",
    port=5432,
    db_name="pii_demo",
    user="postgres",
    password="MyDbPassword",
    table_schema="public",
)
```

PostgresTools provides automatic schema discovery and SQL execution. The agent code does not wire up individual tools — the framework handles tool orchestration internally.

System Instructions

```
You are a records manager for an information broker. You have access to a Postgres
database with information about the people we deal with. You can run SQL queries to
get information about these people and provide it to the user.
```

Usage

```shell
python src/agno/db-agent.py
```

Single-shot execution. Edit the agent.print_response(...) call to change the query.


src/agno/prompts.py

instructions

```python
instructions = """
You are a helpful assistant that provides information about people inside of our database."""
```

A simple system prompt string. Not currently used by db-agent.py (which defines its own inline instructions).


src/msft-agent-framework/data-agent.py

Microsoft Agent Framework agent with schema-aware tools and an interactive REPL.

Constants

```python
DB_CONNINFO = os.getenv(
    "DB_CONNINFO",
    "host=localhost port=5432 dbname=pii_demo user=postgres password=MyDbPassword",
)
```

libpq-format connection string. Overridable via DB_CONNINFO environment variable.

Tools

All tools are decorated with @tool(approval_mode="never_require"), meaning they execute without user confirmation.

get_tables() -> str

List all tables in the public schema.

Parameters: None

Returns: Comma-separated table names, or "No tables found.", or an error message.

SQL executed:

```sql
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public' ORDER BY table_name;
```

get_columns(table_name) -> str

Get column names and data types for a given table.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| table_name | Annotated[str, "Name of the table to inspect"] | Table to inspect |

Returns: Newline-separated "column_name (data_type)" strings, or "No columns found for table '...'.", or an error message.

SQL executed:

```sql
SELECT column_name, data_type FROM information_schema.columns
WHERE table_name = %s ORDER BY ordinal_position;
```

run_query(sql) -> str

Execute a SQL query and return formatted results.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| sql | Annotated[str, "SQL query to execute against the database"] | SQL to execute |

Returns: Pipe-separated header and rows, or "Query returned no rows.", or "Query executed successfully (no results returned)." for non-SELECT statements, or an error message.

Output format:

```
column1 | column2 | column3
value1  | value2  | value3
value4  | value5  | value6
```
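The pipe-separated rendering amounts to a small formatting helper. A sketch (the name `format_results` is illustrative, not necessarily the project's) that pads each column to its widest value so the pipes line up:

```python
def format_results(columns, rows):
    """Render cursor output as a pipe-separated table: a header line
    followed by one line per row, columns padded to equal width."""
    if not rows:
        return "Query returned no rows."
    table = [list(map(str, columns))] + [[str(v) for v in row] for row in rows]
    # Width of each column is the widest cell in that column, header included.
    widths = [max(len(r[i]) for r in table) for i in range(len(columns))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )
```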

Agent Configuration

```python
SYSTEM_PROMPT = (
    "You are a records manager with access to a Postgres database. "
    "Before answering questions about the data, use get_tables and get_columns "
    "to discover the schema. Then use run_query to execute SQL queries. "
    "Present results clearly to the user."
)

client = OpenAIChatClient(
    base_url=os.getenv("LLM_BASE_URL"),
    api_key=os.getenv("LLM_API_KEY"),
    model_id=os.getenv("LLM_MODEL_ID", "gpt-4o"),
)

agent = Agent(
    client=client,
    instructions=SYSTEM_PROMPT,
    name="PII Data Agent",
    tools=[get_tables, get_columns, run_query],
)
```

async main()

Interactive REPL loop.

  1. Creates a session with agent.create_session()
  2. Prompts user with "You: "
  3. Streams agent responses via agent.run(input, session=session, stream=True)
  4. Exits on "quit", "exit", empty input, EOFError, or KeyboardInterrupt
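The loop's exit handling reduces to a small predicate plus a read-respond cycle. A schematic version (`should_exit` and `repl` are illustrative names; `agent_respond` stands in for the streamed `agent.run(...)` call, and `prompt` is injectable so the loop can be exercised without a terminal):

```python
import asyncio

EXIT_WORDS = {"quit", "exit"}


def should_exit(user_input: str) -> bool:
    """Empty input or an exit word ends the session."""
    stripped = user_input.strip()
    return not stripped or stripped.lower() in EXIT_WORDS


async def repl(agent_respond, prompt=input):
    """Schematic REPL: read input, stop on exit conditions, else respond."""
    while True:
        try:
            user_input = prompt("You: ")
        except (EOFError, KeyboardInterrupt):
            break  # Ctrl-D / Ctrl-C also end the session
        if should_exit(user_input):
            break
        print(await agent_respond(user_input))
```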

Usage:

```shell
python src/msft-agent-framework/data-agent.py
```

Test Coverage

data/tests/test_model.py — Schema Verification

All tests use an in-memory SQLite database. No Postgres required.

| Test | Assertion |
| --- | --- |
| test_person_table_columns() | All 16 expected column names exist in the persons table |
| test_person_primary_key() | id is the sole primary key column |

data/tests/test_load_csv.py — Loader Behavior

Uses a temporary 2-row CSV file and in-memory SQLite. Cleans up temp files in finally blocks.

| Test | Assertion |
| --- | --- |
| test_load_csv_inserts_rows() | Loading a 2-row CSV inserts exactly 2 rows |
| test_load_csv_is_idempotent() | Running load_csv twice inserts 0 rows on the second run; total remains 2 |
| test_load_csv_maps_fields_correctly() | Specific fields match expected values: fname="Johnson", lname="White", email="jwhite@domain.com", cc_number="5270 4267 6450 5516" |

src/msft-agent-framework/test_tools.py — Smoke Tests

Requires: Running PostgreSQL container with loaded data (30 rows).

| Test | Assertion |
| --- | --- |
| test_get_tables_returns_persons() | "persons" appears in the list of public schema tables |
| test_get_columns_returns_expected() | "fname" and "email" are in the column list; total is 16 columns |
| test_run_query_returns_rows() | SELECT COUNT(*) FROM persons returns 30 |

Running Tests

```shell
# Unit tests (fast, no dependencies)
python -m pytest data/tests/ -v

# Smoke tests (requires live database)
python -m pytest src/msft-agent-framework/test_tools.py -v

# Single test
python -m pytest data/tests/test_model.py::test_person_table_columns -v
```