This document explains how the AI-SAST feedback system works to continuously improve scan accuracy based on developer feedback.
The feedback loop allows developers to mark findings as true positives (✅) or false positives (❌) directly in GitHub PR comments. This feedback is then:
- Stored in a database (SQLite by default, or Databricks)
- Retrieved during future scans
- Included in the AI prompt context to improve accuracy over time
┌─────────────────┐
│  PR Scan        │
│  (scan code)    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│  Post Comment   │◄─────┤  Retrieve        │
│  with Checkboxes│      │  Historical      │
└────────┬────────┘      │  Feedback        │
         │               └──────────────────┘
         │                       ▲
         ▼                       │
┌─────────────────┐              │
│  Developer      │              │
│  Checks Boxes   │              │
└────────┬────────┘              │
         │                       │
         ▼                       │
┌─────────────────┐              │
│  Collect        │──────────────┘
│  Feedback       │
│  (GitHub Action)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Store in       │
│  Database       │
└─────────────────┘
When a PR is created or updated:
- PR Scan Action runs (`pr_scan.py`)
- Scans only changed code in the PR
- Generates a markdown report with:
- Unique ID for each finding (8-char hash)
- Interactive checkboxes for feedback
- Complete vulnerability details
- Posts the report as a PR comment
Example Comment:
### 🤖 AI-SAST Security Scan
**2** potential issue(s) found.
> 💡 **Help us improve!** Use the checkboxes below to mark each finding...
---
<!-- vuln-id: abc12345 -->
- [ ] ✅ True Positive
- [ ] ❌ False Positive
**ID**: `abc12345`
**Severity**: High
**Issue**: SQL Injection
**Location**: [`user_query.py:42`](link-to-code)
**CVSS Vector**: `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`
<details><summary>📋 Click to see details</summary>
**Risk:**
User input is directly concatenated into SQL query...
**Remediation:**
Use parameterized queries instead...
</details>
**💬 Optional Comment**: (Reply to this PR to explain your feedback)
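As a rough illustration of how a finding block like the one above could be produced, here is a minimal sketch. The source only states that each finding gets an 8-character hash ID; the choice of SHA-256 and of hashing the file path, issue, and location are assumptions, and `finding_id` and `render_finding` are hypothetical helper names, not part of `pr_scan.py`.

```python
import hashlib


def finding_id(file_path: str, issue: str, location: str) -> str:
    """Return an 8-character hex ID for a finding.

    Assumption: the ID is derived from file path, issue, and location.
    The real scanner may hash different fields.
    """
    key = f"{file_path}|{issue}|{location}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:8]


def render_finding(file_path: str, issue: str, severity: str, location: str) -> str:
    """Render one finding as markdown with the two feedback checkboxes."""
    vuln_id = finding_id(file_path, issue, location)
    return "\n".join([
        f"<!-- vuln-id: {vuln_id} -->",  # hidden marker used to locate the finding later
        "- [ ] ✅ True Positive",
        "- [ ] ❌ False Positive",
        f"**ID**: `{vuln_id}`",
        f"**Severity**: {severity}",
        f"**Issue**: {issue}",
        f"**Location**: `{location}`",
    ])
```

Deriving the ID from stable properties of the finding means the same finding keeps the same ID across re-scans, which is what lets stored feedback be matched back to it.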
Developers review the findings and:
- Check one box per finding (True Positive OR False Positive)
- Optionally add comments explaining their decision
- The comment edit triggers the feedback collection workflow
When a PR comment is edited:
- GitHub Actions detects the comment edit
- collect-feedback.yml workflow triggers
- collect_feedback.py script:
- Parses the edited comment
- Extracts checked boxes and vulnerability details
- Stores feedback in database with:
- Repository URL
- PR number
- File path
- Vulnerability ID
- Issue type
- Severity
- Status (confirmed_vulnerability or false_positive)
- CVSS vector
- Location
- Optional developer comments
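The parsing step described above can be sketched roughly as follows. This is not the actual `collect_feedback.py` logic: `parse_feedback` is a hypothetical function, and skipping findings with zero or two checked boxes is an assumption about how ambiguous input might be handled.

```python
import re

# Each finding starts with a hidden marker and runs to the next marker or end of comment.
VULN_BLOCK = re.compile(r"<!-- vuln-id: (\w+) -->(.*?)(?=<!-- vuln-id: |\Z)", re.DOTALL)
TRUE_POS = re.compile(r"^- \[x\] ✅", re.MULTILINE)
FALSE_POS = re.compile(r"^- \[x\] ❌", re.MULTILINE)


def parse_feedback(comment_body: str) -> list[dict]:
    """Extract (vuln_id, status) pairs from an edited scan comment."""
    results = []
    for vuln_id, block in VULN_BLOCK.findall(comment_body):
        is_tp = bool(TRUE_POS.search(block))
        is_fp = bool(FALSE_POS.search(block))
        if is_tp == is_fp:
            # Neither box or both boxes checked: ambiguous, so skip (assumption).
            continue
        status = "confirmed_vulnerability" if is_tp else "false_positive"
        results.append({"vuln_id": vuln_id, "status": status})
    return results
```

Matching on the `<!-- vuln-id: ... -->` marker rather than on visible text keeps the parser independent of how the rest of the finding is worded.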
Feedback is stored in one of two backends:
SQLite (default):
- Location: `~/.ai-sast/scans.db`
- Configuration: None required (automatic)
- Use Case: Local development, single developer
- Tables:
  - `feedback`: Developer feedback on findings
  - `scan_results`: Historical scan results
Databricks:
- Location: Databricks SQL warehouse
- Configuration: Set environment variables:
AI_SAST_DATABRICKS_HOST=...
AI_SAST_DATABRICKS_HTTP_PATH=...
AI_SAST_DATABRICKS_TOKEN=...
AI_SAST_DATABRICKS_CATALOG=...
AI_SAST_DATABRICKS_SCHEMA=...
AI_SAST_DATABRICKS_TABLE=...
- Use Case: Enterprise, team-wide feedback sharing
During future scans, the system:
- Queries the database for historical feedback from the same repository
- Retrieves:
- False positives (last 90 days, max 100)
- Confirmed vulnerabilities (last 90 days, max 100)
- Formats the feedback into context text
- Includes it in the AI prompt
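For the SQLite backend, the retrieval step might look something like the sketch below. It assumes the `timestamp` column holds ISO-8601 UTC strings (so string comparison orders them chronologically); `recent_feedback` is a hypothetical helper, not the client's actual method.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def recent_feedback(conn, repository, status, days_back=90, limit=100):
    """Return recent feedback rows for one repository and one status."""
    # Assumption: timestamps are stored as ISO-8601 UTC text.
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days_back)).isoformat()
    return conn.execute(
        "SELECT issue, file_path, severity, feedback_text FROM feedback "
        "WHERE repository = ? AND status = ? AND timestamp >= ? "
        "ORDER BY timestamp DESC LIMIT ?",
        (repository, status, cutoff, limit),
    ).fetchall()
```

Calling it once with `status="false_positive"` and once with `status="confirmed_vulnerability"` would yield the two lists that feed the prompt context.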
Example Context Added to Prompt:
## Historical False Positives
Avoid reporting similar issues:
1. **Issue**: SQL Injection
- **File**: user_query.py
- **Severity**: HIGH
- **Reason**: Uses parameterized queries - safe pattern
2. **Issue**: Missing Authentication
- **File**: internal_api.py
- **Severity**: HIGH
- **Reason**: Internal API, behind VPN
## Previously Confirmed Vulnerabilities
Be vigilant about similar patterns:
1. **Issue**: Weak Password Hashing
- **File**: auth.py
- **Severity**: CRITICAL

This context helps the AI:
- ✅ Avoid reporting similar false positives
- ✅ Be more vigilant about confirmed vulnerability patterns
- ✅ Learn project-specific security patterns
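To make the formatting step concrete, here is a minimal sketch that turns feedback records into context text shaped like the example above. `build_context` and `_section` are hypothetical names, and the record keys (`issue`, `file_path`, `severity`, `feedback`) are assumed for illustration; the client's own `format_feedback_for_context` may differ.

```python
def _section(title, hint, items):
    """Render one markdown section; returns [] when there are no items."""
    if not items:
        return []
    lines = [f"## {title}", hint]
    for n, item in enumerate(items, start=1):
        lines.append(f"{n}. **Issue**: {item['issue']}")
        lines.append(f"   - **File**: {item['file_path']}")
        lines.append(f"   - **Severity**: {item['severity']}")
        if item.get("feedback"):
            # Only include a reason when the developer left a comment.
            lines.append(f"   - **Reason**: {item['feedback']}")
    return lines


def build_context(false_positives, confirmed):
    """Combine both feedback lists into one block of prompt context."""
    lines = _section("Historical False Positives",
                     "Avoid reporting similar issues:", false_positives)
    lines += _section("Previously Confirmed Vulnerabilities",
                      "Be vigilant about similar patterns:", confirmed)
    return "\n".join(lines)
```

Omitting empty sections keeps the prompt short for repositories that have little or no feedback yet.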
| Variable | Required | Default | Description |
|---|---|---|---|
| `AI_SAST_FEEDBACK_BACKEND` | No | `auto` | Backend selection: `sqlite`, `databricks`, or `auto` |
| `AI_SAST_DB_PATH` | No | `~/.ai-sast/scans.db` | SQLite database path |
| `AI_SAST_STORE_FINDINGS` | No | `false` | Store scan findings in database (set to `true` to enable) |
| `AI_SAST_DATABRICKS_HOST` | For Databricks | - | Databricks workspace hostname |
| `AI_SAST_DATABRICKS_HTTP_PATH` | For Databricks | - | SQL warehouse HTTP path |
| `AI_SAST_DATABRICKS_TOKEN` | For Databricks | - | Personal access token |
| `AI_SAST_DATABRICKS_CATALOG` | For Databricks | - | Unity Catalog name |
| `AI_SAST_DATABRICKS_SCHEMA` | For Databricks | - | Schema name |
| `AI_SAST_DATABRICKS_TABLE` | For Databricks | - | Table name |
By default, only feedback is stored in the database to keep it lightweight. If you want to also store the original scan findings (for analytics, tracking, etc.), set:
export AI_SAST_STORE_FINDINGS=true

When to enable:
- ✅ You want to track all vulnerabilities found over time
- ✅ You need analytics on vulnerability trends
- ✅ You want to calculate false positive rates
- ✅ You want historical records of all scans
When to keep disabled (default):
- ✅ You only care about feedback for improving accuracy
- ✅ You want to minimize database size
- ✅ Findings are already visible in PR comments/reports
The backend is selected as follows:
- If `AI_SAST_FEEDBACK_BACKEND=databricks` → Use Databricks
- Else if all Databricks variables are set → Use Databricks
- Otherwise → Use SQLite (default)
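The selection rules above can be sketched as a small function. This is an illustration, not the code behind `get_feedback_client`: `select_backend` is a hypothetical name, and treating an explicit `sqlite` setting as an override is an assumption that the rules above do not state.

```python
import os

DATABRICKS_VARS = (
    "AI_SAST_DATABRICKS_HOST",
    "AI_SAST_DATABRICKS_HTTP_PATH",
    "AI_SAST_DATABRICKS_TOKEN",
    "AI_SAST_DATABRICKS_CATALOG",
    "AI_SAST_DATABRICKS_SCHEMA",
    "AI_SAST_DATABRICKS_TABLE",
)


def select_backend(env=None) -> str:
    """Return 'databricks' or 'sqlite' based on environment variables."""
    env = os.environ if env is None else env
    choice = env.get("AI_SAST_FEEDBACK_BACKEND", "auto").lower()
    if choice == "databricks":
        return "databricks"
    if choice == "sqlite":
        # Assumption: an explicit sqlite setting wins over auto-detection.
        return "sqlite"
    # Auto mode: use Databricks only when every variable is set and non-empty.
    if all(env.get(name) for name in DATABRICKS_VARS):
        return "databricks"
    return "sqlite"
```

Passing a plain dict as `env` makes the rules easy to check without touching the real environment.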
The database has two main tables:
The feedback table stores developer feedback on security findings. This is the core of the feedback loop.
CREATE TABLE feedback (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    repository TEXT NOT NULL,
    pr_number INTEGER,
    file_path TEXT NOT NULL,
    vuln_id TEXT NOT NULL,
    issue TEXT NOT NULL,
    severity TEXT NOT NULL,
    status TEXT NOT NULL,          -- 'confirmed_vulnerability' or 'false_positive'
    feedback_text TEXT,            -- Optional developer comment
    cvss_vector TEXT,
    location TEXT,
    user TEXT,
    created_at TEXT NOT NULL,
    UNIQUE(repository, vuln_id, status)
);

The scan_results table stores original scan findings for analytics and tracking.
CREATE TABLE scan_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    scan_id TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    repository TEXT NOT NULL,
    pr_number INTEGER,
    file_path TEXT NOT NULL,
    vuln_id TEXT NOT NULL,
    issue TEXT NOT NULL,
    severity TEXT NOT NULL,
    cvss_vector TEXT,
    location TEXT,
    description TEXT,
    risk TEXT,
    fix TEXT,
    scan_type TEXT,                -- 'pr' or 'full'
    created_at TEXT NOT NULL,
    UNIQUE(repository, vuln_id, scan_id)
);

Note: The scan_results table is only populated when AI_SAST_STORE_FINDINGS=true is set.
from src.integrations.feedback import get_feedback_client

# Get appropriate client (SQLite or Databricks)
client = get_feedback_client()

# Store single feedback
client.store_feedback(
    repo_url="https://github.com/org/repo",
    pr_number=123,
    file_path="src/app.py",
    vulnerability_id="abc12345",
    issue="SQL Injection",
    severity="HIGH",
    status="false_positive",
    feedback="Uses parameterized queries",
    cvss_vector="CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
    location="Line 42",
    user="developer@example.com"
)

# Store batch feedback
client.store_batch_feedback(
    repo_url="https://github.com/org/repo",
    pr_number=123,
    feedback_list=[
        {
            'vuln_id': 'abc12345',
            'file_path': 'src/app.py',
            'issue': 'SQL Injection',
            'severity': 'HIGH',
            'status': 'false_positive',
            'feedback': 'Uses parameterized queries',
            'location': 'Line 42'
        }
    ]
)

# Get false positives
false_positives = client.get_false_positives_for_project(
    repo_url="https://github.com/org/repo",
    days_back=90,
    limit=100
)

# Get confirmed vulnerabilities
confirmed = client.get_confirmed_vulnerabilities_for_project(
    repo_url="https://github.com/org/repo",
    days_back=90,
    limit=100
)

# Format for AI context
context = client.format_feedback_for_context(
    false_positives=false_positives,
    confirmed_vulnerabilities=confirmed
)

client.close()

# Run the test suite
python tests/test_sqlite_feedback.py
# Test with actual repository
python -m src.integrations.scan_database --stats --repo-url https://github.com/org/repo

# Set up test environment
export GITHUB_EVENT_PATH=/path/to/test-event.json
export GITHUB_TOKEN=your-token
# Run feedback collector
python -m src.main.collect_feedback

# Run a scan and check logs for feedback context
python -m src.main.pr_scan
# Look for output:
# "✅ Loaded 5 historical feedback records for context"

# Overall statistics
python -m src.integrations.scan_database --stats
# Repository-specific statistics
python -m src.integrations.scan_database --stats --repo-url https://github.com/org/repo

Example Output:
✅ Database: /Users/username/.ai-sast/scans.db
📊 Statistics:
Scan results: 156
Total feedback: 43
False positives: 28
Confirmed vulnerabilities: 15
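Counts like those in the output above, along with the false positive rate, could be derived directly from the `feedback` table. The sketch below is an assumption about how such statistics might be computed for the SQLite backend; `feedback_stats` is a hypothetical helper and not the `scan_database` module's implementation.

```python
import sqlite3


def feedback_stats(conn):
    """Summarize feedback counts and the false positive rate."""
    counts = dict(
        conn.execute("SELECT status, COUNT(*) FROM feedback GROUP BY status").fetchall()
    )
    fp = counts.get("false_positive", 0)
    tp = counts.get("confirmed_vulnerability", 0)
    total = fp + tp
    return {
        "total_feedback": total,
        "false_positives": fp,
        "confirmed_vulnerabilities": tp,
        # Share of reviewed findings that developers rejected.
        "false_positive_rate": fp / total if total else 0.0,
    }
```

Note that this rate covers only findings that received feedback; a rate over all findings would also need the `scan_results` table, which is populated only when `AI_SAST_STORE_FINDINGS=true`.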
- Check workflow is enabled: `.github/workflows/collect-feedback.yml`
- Verify permissions: Workflow needs `pull-requests: read` and `issues: read`
- Check comment format: Must contain `🤖 AI-SAST Security Scan` marker
- Verify checkbox format: Must be `- [x]` or `- [ ]` (lowercase x)
- Check database location: Default is `~/.ai-sast/scans.db`
- Verify repository URL match: Must match exactly
- Check feedback age: Only last 90 days included by default
- Enable debug logging: Set `LOGLEVEL=DEBUG`
# Check SQLite database
sqlite3 ~/.ai-sast/scans.db ".tables"
sqlite3 ~/.ai-sast/scans.db "SELECT COUNT(*) FROM feedback;"
# Reset database (caution: deletes all data)
rm ~/.ai-sast/scans.db

- ✅ Review findings carefully before marking as false positive
- ✅ Add comments explaining your reasoning (helps future reviews)
- ✅ Check only one box per finding (true positive OR false positive)
- ✅ Update checkboxes as findings are fixed
- ✅ Monitor feedback statistics regularly
- ✅ Review false positives to identify AI tuning opportunities
- ✅ Share confirmed vulnerabilities across teams (use Databricks)
- ✅ Adjust severity filters based on false positive rates
- ✅ Use Databricks for centralized feedback across multiple repos
- ✅ Set up notifications for critical confirmed vulnerabilities
- ✅ Export feedback periodically for analysis
- ✅ Train developers on using the feedback system