| id | title | description | category | subcategory | edges |
|---|---|---|---|---|---|
| componentAgentInputHandling | Input Handling | An agent’s interaction with the world begins at the User or Application, which serves as the interface for collecting both explicit user instructions and passively collected contextual data from its environment. This blend of inputs creates a primary security challenge of reliably distinguishing trusted commands from the controlling user versus potentially untrusted information from other sources. An agent application processes explicit user instructions, which can be given directly (synchronously) like a typed command, or be configured to execute automatically when a specific event occurs (asynchronously). It also gathers implicit contextual inputs: data that isn’t a direct command but is passively collected from the environment, such as sensor readings, application state, tool call responses, or the content of recently opened documents. Input Handling is responsible for processing and understanding these inputs before they are sent to the agent’s reasoning core. This handoff is a critical security juncture, as the perception layer must reliably distinguish trusted user commands from untrusted data to prevent manipulation of the agent’s core logic. | componentsApplication | componentsAgent | To: componentReasoningCore From: componentAgentUserQuery componentAgentSystemInstruction componentTheModel componentApplication |
| componentAgentOutputHandling | Output Handling | The final step in an agent’s workflow is response rendering, the process of formatting an AI agent’s generated output for display and interaction within a user application. This stage is a critical security boundary because it involves taking dynamic content from the agent and displaying it within the trusted context of a user’s application, such as a web browser or mobile application. Flaws in this process can allow malicious content generated by a compromised agent to be executed by the application, leading to significant security breaches. Agents often produce content in a universal format like Markdown, which is then interpreted and rendered by the specific client application. If this output isn’t properly sanitized according to the content type, it can create severe vulnerabilities. For example, unsanitized output can enable data exfiltration when content-embedded resources like images are automatically loaded by the application, implicitly passing sensitive information to an attacker's server via the resource's URL. Similarly, improper sanitization can lead to cross-site scripting (XSS) attacks. | componentsApplication | componentsAgent | To: componentApplication componentTheModel From: componentReasoningCore |
| componentAgentSystemInstruction | Agent System Instructions | These define an agent’s capabilities, permissions, and limitations, such as the actions it can take and the tools it is allowed to use. For security, it’s critical to unambiguously separate these instructions from user data and other inputs, often using special control tokens as a defense against prompt injection attacks. | componentsApplication | componentsAgent | To: componentAgentInputHandling |
| componentAgentUserQuery | Agent User Query | These contain the specific details of a user’s request after being processed. The query is then combined with system instructions and other contextual data, like agent memory or external information, to create a single, structured prompt for the reasoning core to process. | componentsApplication | componentsAgent | To: componentAgentInputHandling |
| componentApplication | Application | The application, product, or feature that uses an AI model for functionality. These applications might be directly user-facing, as in the case of a customer service chatbot, or the “user” might be a service within an organization, querying the model to power an upstream process. If an application has the ability to execute tools on behalf of its user, it is sometimes referred to as an Agent. | componentsApplication | componentsApplicationCore | To: componentApplicationOutputHandling componentAgentInputHandling From: componentApplicationInputHandling componentAgentOutputHandling |
| componentApplicationInputHandling | Input Handling | Input handling components filter, sanitize, and protect against potentially malicious inputs, whether from a user or more generally from anything outside the trusted system. Input handling acts as a control against numerous risks and is an area ripe for more research and development. | componentsApplication | componentsApplicationCore | To: componentApplication From: componentTheModel |
| componentApplicationOutputHandling | Output Handling | Similar to input handling, output handling components filter, sanitize, and protect against unwanted, unexpected, or dangerous outputs from a model. Output handling is a major line of defense against various risks and an area primed for more development. | componentsApplication | componentsApplicationCore | To: componentTheModel From: componentApplication |
| componentDataFilteringAndProcessing | Data Filtering and Processing | The processes of cleaning, transforming, and preparing raw data from various sources to make it suitable for training. This may include labeling data, removing duplicates or errors, and even generating new synthetic data to enhance the model's learning. | componentsInfrastructure | componentsData | To: componentTrainingData From: componentDataSources |
| componentDataSources | Data Sources | The original sources or repositories from which data is gathered for potential use in training an AI model. These can include databases, APIs, web scraping, or even sensor data. The quality and diversity of data sources significantly impact the model's capabilities. | componentsInfrastructure | componentsData | To: componentDataFilteringAndProcessing |
| componentDataStorage | Data Storage Infrastructure | Storage for training data. Training data is stored from ingestion through filtering and usage during training. | componentsInfrastructure | componentsData | To: componentModelTrainingTuning From: componentTrainingData |
| componentMemory | Model Memory | Memory allows a model or agent to retain context and learn facts across interactions. Memory implementations may result in additional security and data risk exposures requiring mitigation. Examples include: (1) persistent attacks, where malicious data stored in memory could enable ongoing attacks against a user; (2) data loss through improper isolation, where sensitive data may be disclosed due to inadequate memory isolation between different users and contexts; and (3) unintended data transfer, where sensitive information may be inappropriately transferred to third parties through undesired or unexpected tool calls. | componentsModel | componentsOrchestration | To: componentOrchestrationOutputHandling From: componentOrchestrationInputHandling |
| componentModelEvaluation | Model Evaluation | The process of testing the model against new data to see how well it performs (evaluation). Evaluation happens in two stages: during the training process, when each checkpointed update to the model is evaluated, and after the model is trained, to assess how well it performs at its intended purpose. | componentsModel | componentsModelTraining | To: componentModelTrainingTuning From: componentTheModel |
| componentModelFrameworksAndCode | Model Frameworks and Code | The code and frameworks necessary to train and use a model. Model code defines the model architecture and the number and types of layers in the model. Framework code implements the steps for each layer to train and evaluate the model. The framework code is generally necessary not just for training a model but also for running inference (i.e., making predictions) when the model is in use. Usually framework code is shipped separately from the model itself and needs to be installed to use the model. | componentsModel | componentsModelTraining | To: componentModelTrainingTuning |
| componentModelServing | Model Serving Infrastructure | The systems and processes used to deploy models in production, making them available for services and applications. Note: Many model consumers use remote models served via API. Those that serve their own models, though, should consider the same Model Serving risks that apply to model creators. | componentsInfrastructure | componentsModelDeployment | To: componentTheModel |
| componentModelStorage | Model Storage | Storage for the model. Model storage refers to multiple stages in the development process: local storage during training, in which each checkpoint is stored until overwritten, and published storage, after training is completed and the model is uploaded to a model hub (a centralized model repository). Note: Many model consumers use remote models served by API. Those model consumers that store models themselves, though, should consider the same Model Storage risks that apply to model creators. | componentsInfrastructure | componentsModelDeployment | To: componentTheModel |
| componentModelTrainingTuning | Training and Tuning | The process of teaching a model to extract the correct patterns and inferences from data by adjusting the probability of a given outcome (training) and adjusting a smaller set of probabilities to tune a model to a specific task (tuning). Given the enormous cost of training, many model creators take a preexisting model and tune it to their needs, by focusing only on the training related to a specific type of task. | componentsModel | componentsModelTraining | To: componentTheModel From: componentModelEvaluation componentModelFrameworksAndCode componentDataStorage |
| componentOrchestrationInputHandling | Input Handling | Orchestration input handling is responsible for validating, sanitizing, and normalizing all data entering the system from external sources before it reaches core orchestration logic. | componentsModel | componentsOrchestration | To: componentTools componentMemory componentRAGContent From: componentTheModel componentReasoningCore |
| componentOrchestrationOutputHandling | Output Handling | Orchestration output handling is responsible for validating, sanitizing, and safely formatting data as it exits the system to external or downstream components. This control ensures outbound data meets defined schemas, strips sensitive information that shouldn't be exposed, prevents injection attacks by properly encoding outputs for their destination context (such as HTML encoding for web responses or parameterization for database queries), and enforces data classification policies. | componentsModel | componentsOrchestration | To: componentReasoningCore componentTheModel From: componentTools componentMemory componentRAGContent |
| componentRAGContent | Retrieval Augmented Generation & Content | Content for Retrieval-Augmented Generation (RAG) provides the agent with curated knowledge to ground its responses and improve accuracy. The main security risk is data poisoning, where an attacker corrupts this knowledge source to manipulate the agent's output. | componentsModel | componentsOrchestration | To: componentOrchestrationOutputHandling From: componentOrchestrationInputHandling |
| componentReasoningCore | Agent Reasoning Core | The core of an agent’s functionality is its ability to reason about a user’s goal and create a plan to achieve it. The reasoning core processes system instructions, user queries, and contextual information to generate a sequence of actions. The actions, or tool calls, allow the agent to affect the real world by interacting with external systems, retrieving new information, or making changes to data and resources. The reasoning core typically consists of one or more models, possibly separate models for the reasoning and planning steps, or potentially one large model able to do both. The process of planning is often iterative, taking place in a “reasoning loop” where the plan is refined based on new information or the results of previous actions. This iterative nature, combined with the ingestion of external data, creates a vulnerability to indirect prompt injection, where adversarially crafted information can manipulate the agent's planning process. The complexity of plans determines the agent’s level of autonomy, which can range from selecting a predefined workflow to dynamically orchestrating multi-step actions. This level of autonomy directly governs the potential severity of a security failure: the more an agent can do on its own, the greater the risk from manipulation or misalignment. This risk can be at least partially mitigated through guardrails that constrain the actions an agent can take, for example by making certain actions subject to mandatory user confirmation. However, such guardrails limit the agent's autonomy, so they require a careful tradeoff between autonomy and security. | componentsApplication | componentsAgent | To: componentAgentOutputHandling componentOrchestrationInputHandling From: componentAgentInputHandling componentOrchestrationOutputHandling |
| componentTheModel | The Model | A pairing of code and weights, created with data during a training process. In the CoSAI Risk Map, the model is represented as the result of the output of the Data Components being trained, stored, and served using the Infrastructure Components. A model is ultimately useful when deployed in applications, using Application Components. | componentsModel | componentsModelCore | To: componentModelEvaluation componentAgentInputHandling componentApplicationInputHandling componentOrchestrationInputHandling From: componentModelTrainingTuning componentModelServing componentModelStorage componentAgentOutputHandling componentApplicationOutputHandling componentOrchestrationOutputHandling |
| componentTools | External Tools and Services | Tools are the external APIs and services an agent uses to take action in the world, which must be secured with least-privilege permissions. A key risk comes from deceptive descriptions on third-party tools, which can trick the agent into performing unintended, harmful functions. | componentsModel | componentsOrchestration | To: componentOrchestrationOutputHandling From: componentOrchestrationInputHandling |
| componentTrainingData | Training Data | The final, curated subset of data that is fed into the AI model during the training process. This data is used to adjust the model's internal parameters, enabling it to learn patterns and make predictions or inferences. | componentsInfrastructure | componentsData | To: componentDataStorage From: componentDataFilteringAndProcessing |
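
The `edges` column lists each connection twice: once as a `To:` entry on the source row and once as a `From:` entry on the target row. The following is a minimal Python sketch, not part of the risk map itself, of how that symmetry could be checked. The `EDGES` mapping is copied from four data component rows of the table; the function names `parse_edges` and `find_mismatches` are hypothetical, and the parser assumes the whitespace-separated `To: … From: …` cell format used above.

```python
# Edge cells copied from four rows of the table (the data components).
EDGES = {
    "componentDataSources": "To: componentDataFilteringAndProcessing",
    "componentDataFilteringAndProcessing": "To: componentTrainingData From: componentDataSources",
    "componentTrainingData": "To: componentDataStorage From: componentDataFilteringAndProcessing",
    "componentDataStorage": "To: componentModelTrainingTuning From: componentTrainingData",
}


def parse_edges(cell):
    """Split an edges cell into {'To': [...], 'From': [...]} lists of ids."""
    parsed = {"To": [], "From": []}
    label = None
    for token in cell.split():
        if token in ("To:", "From:"):
            label = token[:-1]  # drop the trailing colon
        elif label is not None:
            parsed[label].append(token)
    return parsed


def find_mismatches(edge_cells):
    """Return a message for every edge that is declared on only one side."""
    graph = {cid: parse_edges(cell) for cid, cell in edge_cells.items()}
    problems = []
    for source, edges in graph.items():
        for target in edges["To"]:
            # Only check targets whose row is present in the mapping.
            if target in graph and source not in graph[target]["From"]:
                problems.append(f"{source} -> {target} has no matching From entry")
        for origin in edges["From"]:
            if origin in graph and source not in graph[origin]["To"]:
                problems.append(f"{origin} -> {source} has no matching To entry")
    return problems


if __name__ == "__main__":
    print(find_mismatches(EDGES) or "edges are consistent")
```

Edges that point at a component missing from the mapping (here `componentModelTrainingTuning`) are skipped rather than reported, so the check can be run on a subset of rows; passing the full table would check every edge.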