This document describes the methodology, conventions, and rationale used for populating extended metadata fields (mappings, lifecycleStage, impactType, actorAccess) in Phase 2 of the CoSAI Risk Map development.
Version: 1.0 | Last Updated: 2025-11-05 | Phase: Phase 2 - Initial Data Population | Status: Completed (15 risks, 8 controls)
- Overview
- Methodology
- Framework Mappings
- Lifecycle Stage Assignments
- Impact Type Categorization
- Actor Access Level Determination
- Known Gaps and Limitations
- Sources and References
This Phase 2 implementation populates extended metadata for high-priority AI security risks and controls using a best-effort research methodology. The goal is to provide meaningful cross-references to external frameworks and categorize risks/controls along key dimensions (lifecycle, impact, and access requirements).
Risks (15 total):
- riskDataPoisoning (Data Poisoning)
- riskUnauthorizedTrainingData (Unauthorized Training Data)
- riskModelSourceTampering (Model Source Tampering)
- riskExcessiveDataHandling (Excessive Data Handling)
- riskModelExfiltration (Model Exfiltration)
- riskModelDeploymentTampering (Model Deployment Tampering)
- riskDenialOfMLService (Denial of ML Service)
- riskModelReverseEngineering (Model Reverse Engineering)
- riskInsecureIntegratedComponent (Insecure Integrated Component)
- riskPromptInjection (Prompt Injection)
- riskModelEvasion (Model Evasion)
- riskSensitiveDataDisclosure (Sensitive Data Disclosure)
- riskInferredSensitiveData (Inferred Sensitive Data)
- riskInsecureModelOutput (Insecure Model Output)
- riskRogueActions (Rogue Actions)
Controls (8 total):
- controlTrainingDataSanitization
- controlModelAndDataIntegrityManagement
- controlSecureByDefaultMLTooling
- controlInputValidationAndSanitization
- controlOutputValidationAndSanitization
- controlAdversarialTrainingAndTesting
- controlApplicationAccessManagement
- controlAgentPluginPermissions
- Framework Analysis: Analyzed each external framework (MITRE ATLAS, NIST AI RMF, STRIDE, OWASP Top 10 for LLM) to understand their taxonomy and scope
- Risk/Control Review: Reviewed existing longDescription fields in risks.yaml and controls.yaml to understand the nature of each item
- Conceptual Mapping: Mapped CoSAI risks/controls to framework concepts based on:
  - Attack techniques and tactics
  - Security impact categories
  - Real-world examples cited in descriptions
  - Industry best practices
- Validation: Cross-referenced mappings with published security research and framework documentation
This is a best-effort initial population based on:
- Public framework documentation
- Security research papers
- Industry guidance and blog posts
- Conceptual alignment between frameworks
Important Notes:
- Mappings are not authoritative or officially endorsed by framework maintainers
- Some mappings are approximate when exact matches don't exist
- Empty fields indicate uncertainty or lack of clear mapping
- Community feedback and iteration will improve accuracy over time
Source: MITRE ATLAS - Adversarial Threat Landscape for Artificial-Intelligence Systems
Mapping Convention:
- Used technique IDs in the format `AML.T####` or `AML.M####` (for mitigations)
- Mapped risks to attack techniques that directly enable or exemplify the risk
- Mapped controls to course-of-action mitigations where available
- Included sub-techniques (e.g., `AML.T0010.002`) when more specific than the parent
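The ID convention above can be checked mechanically. The following is a minimal sketch, assuming IDs consist of `AML.T` or `AML.M` plus four digits, with an optional three-digit sub-technique suffix; the function names are hypothetical and not part of any CoSAI tooling.

```python
import re

# Assumed shape: "AML.T" or "AML.M" plus four digits, with an optional
# three-digit sub-technique suffix (e.g. AML.T0010.002).
ATLAS_ID = re.compile(r"AML\.[TM]\d{4}(?:\.\d{3})?")


def is_atlas_id(value: str) -> bool:
    """Return True if value matches the assumed ATLAS ID shape."""
    return ATLAS_ID.fullmatch(value) is not None


def invalid_atlas_ids(ids: list[str]) -> list[str]:
    """Return the entries that do not match, preserving input order."""
    return [i for i in ids if not is_atlas_id(i)]
```

Under this pattern a wildcard entry such as `AML.T0024.*` is reported as invalid. Whether wildcards should be allowed is a project decision that this sketch does not settle.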
Key Mappings:
| Risk/Control | ATLAS Technique(s) | Rationale |
|---|---|---|
| riskDataPoisoning (Data Poisoning) | AML.T0020, AML.T0019, AML.T0010.002 | Direct poisoning attacks on training data and datasets |
| riskPromptInjection (Prompt Injection) | AML.T0051 | Prompt injection technique |
| riskModelEvasion (Model Evasion) | AML.T0015, AML.T0043 | Evade ML model and craft adversarial data |
| riskSensitiveDataDisclosure (Sensitive Data Disclosure) | AML.T0024.*, AML.T0024.000, AML.T0024.001 | Exfiltration via ML inference API techniques |
| riskModelExfiltration (Model Exfiltration) | AML.T0024.002, AML.T0025, AML.T0048.004 | Extract ML model, exfiltration, IP theft |
| riskDenialOfMLService (Denial of ML Service) | AML.T0029, AML.T0034 | Denial of ML service and cost harvesting |
Gaps:
- Some CoSAI risks (e.g., riskUnauthorizedTrainingData, riskExcessiveDataHandling, riskInferredSensitiveData, riskInsecureModelOutput) don't have direct ATLAS technique mappings as they focus on policy/compliance rather than attacks
- ATLAS is attack-focused; some defensive controls lack mitigation mappings
Source: NIST AI Risk Management Framework v1.0
Mapping Convention:
- Used category codes from the framework (e.g., `MS-2.7`, `GV-6.1`)
- Focused on controls rather than risks (RMF is control-oriented)
- Selected categories that address the control's primary function
- MS = Measure, MG = Manage, GV = Govern, MP = Map
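To illustrate the code format, here is a small sketch that splits a category code into its parts. It assumes every code is a two-letter prefix, a hyphen, and a `<category>.<subcategory>` pair; `RmfCode` and `parse_rmf_code` are hypothetical names introduced only for this example.

```python
import re
from typing import NamedTuple


class RmfCode(NamedTuple):
    prefix: str       # two-letter function prefix, e.g. "MS" or "GV"
    category: int     # number before the dot
    subcategory: int  # number after the dot


# Assumed shape: two uppercase letters, a hyphen, then "<category>.<subcategory>".
_RMF_CODE = re.compile(r"([A-Z]{2})-(\d+)\.(\d+)")


def parse_rmf_code(code: str) -> RmfCode:
    """Split a code such as "MS-2.11" into its parts; raise ValueError if malformed."""
    match = _RMF_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognized AI RMF code: {code!r}")
    prefix, category, subcategory = match.groups()
    return RmfCode(prefix, int(category), int(subcategory))
```

Parsing the numbers as integers lets `sorted(codes, key=parse_rmf_code)` place `MS-2.7` before `MS-2.11`, which a plain string sort would not do.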
Key Mappings:
| Control | NIST AI RMF | Rationale |
|---|---|---|
| controlTrainingDataSanitization | MS-2.7, MS-2.8 | Data quality and provenance management |
| controlModelAndDataIntegrityManagement | MS-2.3 | Integrity verification of AI system data and models |
| controlAdversarialTrainingAndTesting | MS-2.6 | Adversarial testing and robustness |
| controlApplicationAccessManagement | GV-6.1, MS-2.11 | Access controls and authentication |
Gaps:
- NIST AI RMF codes are less granular than needed for specific technical controls
- Many risk items don't map directly (framework is control-focused)
- NIST 800-53 controls (like SC-8, SI-7) are not included as AI RMF uses its own category system
Source: Microsoft STRIDE Threat Model
Mapping Convention:
- Used lowercase category names: `spoofing`, `tampering`, `repudiation`, `information-disclosure`, `denial-of-service`, `elevation-of-privilege`
- Mapped based on primary security impact of the risk
- Multiple STRIDE categories used when risk affects multiple impact areas
Key Mappings:
| STRIDE Category | CoSAI Risks | Rationale |
|---|---|---|
| tampering | riskDataPoisoning, riskModelSourceTampering, riskModelDeploymentTampering, riskPromptInjection, riskModelEvasion, riskInsecureModelOutput, riskRogueActions | Unauthorized modification of data, models, or outputs |
| information-disclosure | riskUnauthorizedTrainingData, riskExcessiveDataHandling, riskModelExfiltration, riskModelReverseEngineering, riskSensitiveDataDisclosure, riskInferredSensitiveData | Unauthorized access to sensitive information |
| denial-of-service | riskDenialOfMLService | Availability attacks on ML services |
| elevation-of-privilege | riskModelSourceTampering, riskModelDeploymentTampering, riskInsecureIntegratedComponent, riskPromptInjection, riskRogueActions | Gaining higher access than intended |
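The table above is keyed by STRIDE category, while metadata is populated per risk. As a sketch of how the two views relate, the snippet below inverts the table into a risk-to-categories lookup. The data is copied from the table; the variable and function names are hypothetical.

```python
from collections import defaultdict

# Category -> risks, copied from the Key Mappings table above.
STRIDE_TO_RISKS = {
    "tampering": [
        "riskDataPoisoning", "riskModelSourceTampering",
        "riskModelDeploymentTampering", "riskPromptInjection",
        "riskModelEvasion", "riskInsecureModelOutput", "riskRogueActions",
    ],
    "information-disclosure": [
        "riskUnauthorizedTrainingData", "riskExcessiveDataHandling",
        "riskModelExfiltration", "riskModelReverseEngineering",
        "riskSensitiveDataDisclosure", "riskInferredSensitiveData",
    ],
    "denial-of-service": ["riskDenialOfMLService"],
    "elevation-of-privilege": [
        "riskModelSourceTampering", "riskModelDeploymentTampering",
        "riskInsecureIntegratedComponent", "riskPromptInjection",
        "riskRogueActions",
    ],
}


def invert(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn category -> risks into risk -> categories, keeping table order."""
    by_risk: dict[str, list[str]] = defaultdict(list)
    for category, risks in mapping.items():
        for risk in risks:
            by_risk[risk].append(category)
    return dict(by_risk)
```

For example, `invert(STRIDE_TO_RISKS)["riskPromptInjection"]` yields `["tampering", "elevation-of-privilege"]`, and the inverted lookup contains all 15 risks in scope.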
Gaps:
- STRIDE is coarse-grained; doesn't capture AI-specific nuances
- `repudiation` and `spoofing` categories have limited applicability to current CoSAI risks
- Originally designed for traditional software systems
Source: OWASP Top 10 for LLM Applications 2025
Mapping Convention:
- Used 2025 version codes: `LLM01` through `LLM10`
- Mapped risks that directly align with OWASP categories
- Strong alignment as OWASP Top 10 for LLM is AI-specific
2025 OWASP Top 10:
- LLM01: Prompt Injection
- LLM02: Sensitive Information Disclosure
- LLM03: Supply Chain
- LLM04: Data and Model Poisoning
- LLM05: Improper Output Handling
- LLM06: Excessive Agency
- LLM07: System Prompt Leakage
- LLM08: Vector and Embedding Weaknesses
- LLM09: Misinformation
- LLM10: Unbounded Consumption
Key Mappings:
| OWASP LLM | CoSAI Risks | Rationale |
|---|---|---|
| LLM01 | riskPromptInjection, riskModelEvasion | Prompt injection and manipulation attacks |
| LLM02 | riskExcessiveDataHandling, riskSensitiveDataDisclosure, riskInferredSensitiveData | Sensitive data exposure and disclosure |
| LLM03 | riskModelSourceTampering, riskModelDeploymentTampering | Supply chain compromise of models/infrastructure |
| LLM04 | riskDataPoisoning, riskUnauthorizedTrainingData | Training data poisoning |
| LLM05 | riskInsecureModelOutput | Insecure model outputs |
| LLM06 | riskInsecureIntegratedComponent, riskRogueActions | Excessive permissions and rogue agent actions |
| LLM10 | riskDenialOfMLService | Resource exhaustion and DoS |
Gaps:
- LLM08 (Vector and Embedding Weaknesses) not yet represented in CoSAI risks
- Some OWASP categories (LLM07, LLM09) have partial overlap with CoSAI but no exact matches
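The gaps listed above can be derived from the Key Mappings table rather than tracked by hand. A minimal sketch, with the mapping data copied from the table and hypothetical names throughout:

```python
# OWASP code -> CoSAI risks, copied from the Key Mappings table above.
OWASP_TO_RISKS = {
    "LLM01": ["riskPromptInjection", "riskModelEvasion"],
    "LLM02": ["riskExcessiveDataHandling", "riskSensitiveDataDisclosure",
              "riskInferredSensitiveData"],
    "LLM03": ["riskModelSourceTampering", "riskModelDeploymentTampering"],
    "LLM04": ["riskDataPoisoning", "riskUnauthorizedTrainingData"],
    "LLM05": ["riskInsecureModelOutput"],
    "LLM06": ["riskInsecureIntegratedComponent", "riskRogueActions"],
    "LLM10": ["riskDenialOfMLService"],
}

# The full 2025 code range, LLM01 through LLM10.
ALL_CODES = [f"LLM{n:02d}" for n in range(1, 11)]


def unmapped_codes(mapping: dict[str, list[str]]) -> list[str]:
    """Return OWASP codes with no CoSAI risk mapped to them, in code order."""
    return [code for code in ALL_CODES if not mapping.get(code)]
```

With the table as it stands, `unmapped_codes(OWASP_TO_RISKS)` returns `["LLM07", "LLM08", "LLM09"]`, which matches the gaps listed above.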
Source: lifecycle-stage.yaml
- planning - Initial planning, design, and architecture definition
- data-preparation - Data collection, cleaning, labeling, and preparation
- model-training - Model training, fine-tuning, and optimization
- development - Application development and AI model integration
- evaluation - Testing, validation, and performance evaluation
- deployment - Production deployment and initial rollout
- runtime - Active operation and serving in production
- maintenance - Ongoing monitoring, updates, and retraining
Lifecycle assignments based on:
- Where the risk is introduced into the system
- Where the risk is exposed or manifests
- Where mitigations are most effective
Examples:
| Risk/Control | Stages | Rationale |
|---|---|---|
| riskDataPoisoning (Data Poisoning) | data-preparation, model-training, maintenance | Introduced during data collection, exposed during training, recurring risk in retraining |
| riskPromptInjection (Prompt Injection) | runtime | Runtime exploitation via user prompts |
| riskModelSourceTampering (Model Source Tampering) | development, model-training, deployment | Supply chain attacks during model development |
| controlTrainingDataSanitization | data-preparation, model-training, evaluation | Applied during data processing and model development |
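Stage assignments can be validated against the vocabulary listed earlier. The sketch below assumes the order of that list is the intended lifecycle order, which the source does not state explicitly; `normalize_stages` is a hypothetical helper.

```python
# Stage names as listed above; list order is assumed to be lifecycle order.
LIFECYCLE_STAGES = [
    "planning", "data-preparation", "model-training", "development",
    "evaluation", "deployment", "runtime", "maintenance",
]


def normalize_stages(stages: list[str]) -> list[str]:
    """Reject unknown stage names, drop duplicates, and sort by lifecycle order."""
    unknown = sorted(set(stages) - set(LIFECYCLE_STAGES))
    if unknown:
        raise ValueError(f"unknown lifecycle stage(s): {unknown}")
    return sorted(set(stages), key=LIFECYCLE_STAGES.index)
```

Applied to the riskModelSourceTampering row, `normalize_stages(["development", "model-training", "deployment"])` returns `["model-training", "development", "deployment"]`.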
Patterns:
- Supply chain risks (riskDataPoisoning, riskModelSourceTampering, riskUnauthorizedTrainingData): data-preparation, model-training
- Runtime input risks (riskPromptInjection, riskModelEvasion, riskDenialOfMLService): evaluation, runtime
  - Note: Evaluation stage added for risks that should be tested before deployment
- Data security risks (riskSensitiveDataDisclosure, riskInferredSensitiveData, riskExcessiveDataHandling): evaluation, runtime (+ training where applicable)
- Infrastructure risks (riskModelExfiltration, riskModelDeploymentTampering): deployment, runtime
- Development risks (riskInsecureIntegratedComponent): development, deployment, runtime
- Output security risks (riskInsecureModelOutput, riskRogueActions): evaluation, runtime
  - Testing and validation critical before production deployment
Source: impact-type.yaml
Traditional Security:
- confidentiality - Protection from unauthorized access or disclosure
- integrity - Ensuring accuracy and preventing tampering
- availability - Maintaining system accessibility
- privacy - Protection of personal/sensitive information
- compliance - Adherence to regulations and standards
AI-Specific:
- safety - Prevention of physical harm or dangerous outcomes
- fairness - Equitable treatment and absence of bias
- accountability - Traceability and responsibility attribution
- reliability - Consistency and dependability of performance
- transparency - Explainability and interpretability
Impact types assigned based on:
- Primary harm caused by successful exploitation of the risk
- Key security properties protected by the control
- Multiple impacts assigned when risk affects several dimensions
Examples:
| Risk/Control | Impact Types | Rationale |
|---|---|---|
| riskDataPoisoning (Data Poisoning) | integrity, reliability, safety | Corrupts model behavior, reduces reliability, can cause unsafe outputs |
| riskSensitiveDataDisclosure (Sensitive Data Disclosure) | confidentiality, privacy | Exposes sensitive personal or confidential data |
| riskDenialOfMLService (Denial of ML Service) | availability | Makes system unavailable |
| riskPromptInjection (Prompt Injection) | integrity, confidentiality, safety | Alters behavior, may expose data, can cause harm |
| riskUnauthorizedTrainingData (Unauthorized Training Data) | compliance, privacy, fairness | Legal/regulatory violations, privacy concerns, potential bias |
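As an illustration of the two impact groups, the following sketch splits a list of assigned impact types into traditional and AI-specific values and rejects anything outside the vocabulary. The set contents come from the lists above; the names are hypothetical.

```python
# Vocabulary from the two groups listed above.
TRADITIONAL = {"confidentiality", "integrity", "availability", "privacy", "compliance"}
AI_SPECIFIC = {"safety", "fairness", "accountability", "reliability", "transparency"}


def split_impacts(impacts: list[str]) -> tuple[list[str], list[str]]:
    """Return (traditional, ai_specific) impact types, preserving input order."""
    unknown = [i for i in impacts if i not in TRADITIONAL | AI_SPECIFIC]
    if unknown:
        raise ValueError(f"unknown impact type(s): {unknown}")
    traditional = [i for i in impacts if i in TRADITIONAL]
    ai_specific = [i for i in impacts if i in AI_SPECIFIC]
    return traditional, ai_specific
```

For the riskDataPoisoning row, `split_impacts(["integrity", "reliability", "safety"])` returns `(["integrity"], ["reliability", "safety"])`.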
Patterns:
- Poisoning/Tampering risks (riskDataPoisoning, riskModelSourceTampering, riskModelDeploymentTampering): integrity, reliability, safety
- Disclosure risks (riskSensitiveDataDisclosure, riskModelExfiltration, riskModelReverseEngineering): confidentiality, privacy
- DoS risks (riskDenialOfMLService): availability
- Inference/Fairness risks (riskInferredSensitiveData): fairness, privacy
- Policy/Legal risks (riskUnauthorizedTrainingData, riskExcessiveDataHandling): compliance, privacy
Source: actor-access.yaml
Traditional:
- external - External attackers with no direct system access
- api - Public or authenticated API endpoint access
- user - Standard authenticated user access
- privileged - Elevated privileges (admin, operator)
- physical - Physical access to hardware/facilities
Modern (AI-Specific):
- agent - AI agents with tool/plugin execution capabilities
- supply-chain - Position in software/data/model supply chain
- infrastructure-provider - Cloud or infrastructure provider access
- service-provider - Third-party service provider access
Access levels based on:
- For risks: Minimum access level required by attacker to exploit the vulnerability
- For controls: Access levels the control protects against
Examples:
| Risk/Control | Access Levels | Rationale |
|---|---|---|
| riskDataPoisoning (Data Poisoning) | supply-chain, privileged, service-provider | Requires access to training pipeline or data sources |
| riskPromptInjection (Prompt Injection) | external, api, user | Can be executed via public-facing interfaces |
| riskModelExfiltration (Model Exfiltration) | external, privileged, infrastructure-provider | Via external attack or insider/provider access |
| riskSensitiveDataDisclosure (Sensitive Data Disclosure) | external, api, user, agent | Exploitable by any user querying the model |
| riskModelEvasion (Model Evasion) | external, api, user, physical | Adversarial examples can be crafted externally or via physical access |
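One use of these assignments is answering "which risks can an actor with a given access level reach?". The sketch below runs that query over the five example rows above only; it is not the full data set, and the names are hypothetical.

```python
# Risk -> access levels, copied from the Examples table above (five rows only).
RISK_ACCESS = {
    "riskDataPoisoning": ["supply-chain", "privileged", "service-provider"],
    "riskPromptInjection": ["external", "api", "user"],
    "riskModelExfiltration": ["external", "privileged", "infrastructure-provider"],
    "riskSensitiveDataDisclosure": ["external", "api", "user", "agent"],
    "riskModelEvasion": ["external", "api", "user", "physical"],
}


def risks_reachable_from(level: str) -> list[str]:
    """Return the risks that list the given access level, sorted by name."""
    return sorted(risk for risk, levels in RISK_ACCESS.items() if level in levels)
```

For instance, `risks_reachable_from("privileged")` returns `["riskDataPoisoning", "riskModelExfiltration"]`, while `risks_reachable_from("external")` returns the other four example risks.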
Patterns:
- Supply chain risks (riskDataPoisoning, riskModelSourceTampering, riskUnauthorizedTrainingData): supply-chain, service-provider
- Runtime API risks (riskPromptInjection, riskModelEvasion, riskDenialOfMLService, riskModelReverseEngineering): external, api, user
- Data disclosure risks (riskSensitiveDataDisclosure, riskInferredSensitiveData): external, api, user (plus agent where applicable)
- Infrastructure risks (riskModelExfiltration, riskModelDeploymentTampering): privileged, infrastructure-provider
- Agent risks (riskRogueActions, riskInsecureIntegratedComponent): agent, api, external
Control Perspective:
- Control access levels indicate what level of actor the control defends against
- Example: controlApplicationAccessManagement defends against external, api, user (public-facing attacks)
- MITRE ATLAS
  - Limited coverage for policy/compliance risks (riskUnauthorizedTrainingData, riskExcessiveDataHandling)
  - Newer attack techniques (prompt injection, agent risks) still evolving in framework
  - Some CoSAI risks are broader than individual ATLAS techniques
- NIST AI RMF
  - Framework is control-oriented, making risk mapping difficult
  - Version 1.0 is still relatively new; mappings may evolve
  - Some specific controls lack direct RMF category equivalents
- STRIDE
  - Coarse-grained categories don't capture AI-specific nuances
  - `repudiation` and `spoofing` have limited applicability
  - Better suited for infrastructure than model-level risks
- OWASP Top 10 for LLM
  - Strong coverage for LLM risks but less applicable to traditional ML
  - 2025 version is recent; community consensus still forming
  - Some CoSAI risks predate or don't align with Top 10 categories
Risks/controls with uncertain or incomplete mappings:
- riskExcessiveDataHandling (Excessive Data Handling) - Policy-focused risk with limited attack technique mappings
- riskInferredSensitiveData (Inferred Sensitive Data) - Emergent risk without established framework coverage
- controlSecureByDefaultMLTooling - Broad infrastructure control spanning many framework areas
- NIST AI RMF Refinement: As the framework matures, mappings should be updated with more specific subcategories
- MITRE ATLAS Updates: Track new technique additions and update mappings accordingly
- Cross-Framework Validation: Engage with framework maintainers for feedback on mapping accuracy
- Quantitative Metrics: Consider adding confidence scores or priority levels to mappings
- Additional Frameworks: Consider mapping to ISO/IEC standards, ENISA guidelines, or regional frameworks (EU AI Act, etc.)
- Subjective Interpretation: Mappings involve judgment calls where multiple interpretations are valid
- Framework Evolution: External frameworks update independently; mappings require ongoing maintenance
- No Official Endorsement: Mappings are interpretive and not validated by framework authors
- Best-Effort Basis: Some assignments are provisional pending deeper research or community input
- MITRE ATLAS
  - Official Website: https://atlas.mitre.org/
  - GitHub Repository: https://github.com/mitre/advmlthreatmatrix
  - MISP Galaxy Cluster: https://github.com/MISP/misp-galaxy/blob/main/clusters/mitre-atlas-attack-pattern.json
  - Version: 5.0.1 (as of October 2025)
- NIST AI Risk Management Framework
  - Official Site: https://www.nist.gov/itl/ai-risk-management-framework
  - PDF Document: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
  - Version: 1.0 (January 2023)
  - Related: NIST SP 800-53 for specific security controls
- STRIDE Threat Model
  - Microsoft Threat Modeling Tool: https://learn.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats
  - Wikipedia: https://en.wikipedia.org/wiki/STRIDE_model
  - Original Paper: Praerit Garg and Loren Kohnfelder (Microsoft, 1999)
- OWASP Top 10 for LLM Applications
  - Project Page: https://owasp.org/www-project-top-10-for-large-language-model-applications/
  - Version: 2025 (released November 2024)
  - LLM Risks Archive: https://genai.owasp.org/llm-top-10/
- Academic Research
  - Adversarial Machine Learning Survey Papers
  - Data Poisoning Attack Papers (cited in risks.yaml)
  - Prompt Injection Research (arXiv, security conferences)
- Industry Guidance
  - HiddenLayer: MITRE ATLAS Implementation Guides
  - Practical DevSecOps: MITRE ATLAS Framework Guide (2025)
  - Security Vendor Blogs: Trend Micro, Bluetuple.ai, Indusface
- CoSAI Internal Documentation
  - guide-metadata.md - Metadata field definitions
  - guide-frameworks.md - Framework integration guide
  - frameworks.yaml - Framework registry
  - risks.yaml - Risk descriptions and examples
  - controls.yaml - Control descriptions
This mapping research was conducted on November 5, 2025 and reflects the state of external frameworks as of that date.
This mapping document represents Phase 2 initial population and is intended to evolve:
- Community Feedback: Corrections and improvements welcome via GitHub issues
- Framework Expertise: Input from framework maintainers and security practitioners encouraged
- Ongoing Maintenance: Mappings will be updated as frameworks evolve and new research emerges
For questions or suggestions, please open an issue in the CoSAI secure-ai-tooling repository.
Document Version: 1.0 | Last Updated: 2025-11-05 | Authors: CoSAI Risk Map Working Group (Phase 2) | License: Apache 2.0