system-prompts-forensics/methodology.md at main · rmax-ai/system-prompts-forensics

Methodology — System Prompt Forensics

Overview

This document codifies the reproducible workflow we use to: (1) capture raw client invocations, (2) extract the agent invocation payload as structured JSON, (3) run automated normalization and structural analysis with OpenAI (gpt-5.2) using a canonical normalization prompt and schema, and (4) synthesize comparative and final research reports.

Scope

Clients: IDE assistants (e.g. VS Code Copilot), CLIs (Codex/Copilot/OpenCode), and similar agent-enabled tooling.
Artifacts: raw HTTP captures, redacted captures, invocation payload JSON, normalized YAML analyses, automated comparison artifacts, and final reports.

High level workflow (one-line steps)

Capture raw HTTP requests (data/raw/).
Redact sensitive headers and fields (produce redacted artifact alongside raw).
Extract invocation payload into structured JSON (data/payload/).
Validate payload JSON.
Run automated normalization using the canonical normalization prompt + schema with gpt-5.2 (temperature=0).
Validate and add provenance to normalized YAML (data/analysis/).
Human review & QA.
Produce technical comparison report and final research report.

Detailed Procedure

Capture

Methods: proxy interception (mitmproxy), CLI debug output, startup/init logs, and public docs.
Store raw captures under data/raw/ and name with a clear convention: ---.request
Record minimal metadata in the capture (tool, version, mode, capture_method, timestamp).

Redaction (mandatory)

Always redact sensitive headers and tokens before any public storage or analysis. Use the repo script:

data/scripts/redact-headers.sh raw/.request > raw/.redacted.request
Keep raw files offline where necessary and mark redactions_applied = true in provenance.
Use data/scripts/parse-headers.sh to inspect header names where needed.

Payload extraction

Convert redacted captures into structured invocation JSON. The repo provides a Makefile target to produce payload JSONs:

make -C data all
The invocation payload should include: messages (system/user/assistant), tool/function declarations, session metadata, client/mode tags, and any relevant headers needed for context.

Validate payload JSON

Quick check: jq . data/payload/.json
Ensure required fields exist (messages array, explicit system/developer messages if present, declared tools/functions). If a payload is malformed, document and fix the parser or note the capture as unusable.

Automated normalization & analysis (MODEL RUN)

Use the canonical normalization prompt and schema to generate a structured analysis. Command example:

python data/scripts/system-prompt-analysis.py
--prompt data/prompts/normalize-system-prompt.md
--schema data/schema/system-prompt.v0.yaml
--invocation data/payload/.json
--model gpt-5.2 > data/analysis/.analysis.yaml
Determinism: always run with deterministic settings (temperature=0). Log model metadata (model name, response id, prompt hash/timestamp, any usage information) for reproducibility.
The normalization prompt enforces strict output rules (single fenced YAML block only). If the model output violates the format, mark the run as failed and route to manual review.

Post-processing, provenance & validation

Validate YAML parseability: yq eval . data/analysis/.analysis.yaml
Add/verify provenance fields in the analysis (source_references pointing at redacted raw & payload, redactions_applied boolean, artifact_hash (sha256 of payload), model, and ISO8601 timestamp). Example provenance keys we require:

provenance: source_references: ["data/raw/.redacted.request","data/payload/.json"] redactions_applied: true artifact_hash: model: gpt-5.2 timestamp: 2025-12-31T23:59:59Z
If a model-generated analysis omits required fields, add them programmatically (yq) or manually and document the change in a review note.

Human review & QA

Human reviewer must:
- Confirm no secrets or proprietary prompt text were leaked in the analysis file.
- Check analysis.implicit_assumptions entries for plausibility and add rationale where inferred.
- Note conspicuous absences under analysis.notable_absences.
- Sign off on the analysis (a short reviewer note is sufficient).

Comparative analysis

Use the normalized analyses in data/analysis/ to compute pairwise diffs, presence/absence matrices, and summary statistics across architectural dimensions (identity, authority, scope, tools exposure, termination, correction mechanisms).
Recommended outputs:
- A CSV or matrix of feature presence across artifacts.
- A short technical comparison report that highlights clusters, outliers, and trade-offs.
Scripts to automate this are encouraged (examples: scripts/generate-comparison.py) and should be versioned in the repo.

Final research report

Produce a narrative final report that includes:
- Executive summary and key findings
- Dataset and methods (capturing, redaction, normalization details)
- Taxonomy of prompt architectures and primitives
- Representative case studies and risks
- Policy/engineering recommendations and a canonical agent constitution prompt
- Reproducibility appendix (how to re-run the analysis)

Data management, storage & provenance

Store redacted raw captures in data/raw/ (redacted files only for public commits).
Store invocation JSON in data/payload/ and analyses in data/analysis/.
Do NOT commit raw captures with secrets. If raw must be stored for internal use, mark as private and do not push to public repos.
Each analysis must include provenance (see above).

Safety, ethics & legal considerations

DO NOT reproduce proprietary prompts verbatim in public outputs.
DO NOT assist in bypassing safeguards, creating jailbreaks, or enabling misuse.
When in doubt about IP or privacy exposure, consult a project maintainer before publishing an artifact.

Reproducibility & determinism

Use deterministic model settings (temperature=0) and log the exact model name and metadata for each run.
Archive the normalization prompt and schema version used for each analysis.
Consider re-running analyses (n=3) to detect nondeterministic output; if variance is observed, escalate to manual review and note instability in provenance.

Tools & scripts referenced

data/scripts/redact-headers.sh — header redaction helper
data/scripts/parse-headers.sh — helper to inspect headers
data/scripts/system-prompt-analysis.py — normalization runner (model invocation)
Makefile (data) — converts raw captures into payload JSON
jq, yq — JSON/YAML validation and edits
shasum / openssl sha256 — compute artifact_hash

Deliverables & acceptance criteria

Normalized YAML for each payload in data/analysis/ with valid YAML, required provenance fields, and human reviewer sign-off.
Technical comparison report (metrics, matrix, clusters, case studies).
Final research report (executive summary, methods, taxonomy, canonical prompt, replication instructions).

Checklist (per artifact)

Raw capture exists and is recorded with metadata
Redacted capture present and redactions_applied = true
Payload JSON parseable (jq .)
Normalization run completed with gpt-5.2 (temperature=0)
Analysis YAML parseable and contains provenance
Human reviewer sign-off recorded

Open improvements (TODOs)

Add a YAML schema validator pass for analysis files (automated CI gate).
Add an --output and --strict flag to system-prompt-analysis.py so runs can write files and fail fast when output formatting is incorrect.
Create comparison automation scripts (pairwise diff, clustering, visualization) and CI checks for them.

Change control

Update this methodology.md whenever a significant process, script, or tooling change is introduced. Cite the commit and rationale for the change.

Contact & escalation

For questions about redaction policy, consult AGENTS.md.
For ambiguous legal or IP issues, stop and ask a project maintainer before proceeding.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

methodology.md

Latest commit

History

methodology.md

File metadata and controls