[Doc]: fixing typos in different files (#369)

didier-durand · web-flow · commit 68e33196ee25 · 2025-12-19T13:09:33.000+01:00
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -307,8 +307,8 @@ This is an example of how a text sould be sanitized:
 
 Some annotation rules:
 - Each detected entity should be sanitized using the **format: [ENTITY_TYPE]**
-- Priorize IP_ADDRESS to URL: `https://192.168.2.100` is anonimized like this: `https://[IP_ADDRESS]:5050` instead of [URL]
-- DATE_TIME is used for dates and for times, in this case `2025-03-11 11:41 UTC` it sould be anonimized like this: ` [DATE_TIME] [DATE_TIME]`
+- Priorize IP_ADDRESS to URL: `https://192.168.2.100` is anonymized like this: `https://[IP_ADDRESS]:5050` instead of [URL]
+- DATE_TIME is used for dates and for times, in this case `2025-03-11 11:41 UTC` it should be anonymized like this: ` [DATE_TIME] [DATE_TIME]`
 
 If you have any questions about the annotation, please write to us.
 
diff --git a/docs/cai_benchmark.md b/docs/cai_benchmark.md
@@ -171,7 +171,7 @@ Currently, supporting the following benchmarks, refer to [`ctf_configs.jsonl`](h
 
 [^3]: **Medium (`Graduate Level`)**: Aimed at participants with a solid grasp of cybersecurity principles. Focus areas include intermediate exploits including web shells, network traffic analysis, and steganography.
 
-[^4]: **Hard (`Professionals`)**: Crafted for experienced penetration testers. Focus areas include advanced techniques such as heap exploitation, kernel vulnerabilities, and complex multi-step challenges.
+[^4]: **Hard (`Professionals`)**: Crafted for experienced penetration testers. Focus areas include advanced techniques such as heap exploitation, kernel vulnerabilities, and complex multistep challenges.
 
 [^5]: **Very Hard (`Elite`)**: Designed for elite, highly skilled participants requiring innovation. Focus areas include cutting-edge vulnerabilities like zero-day exploits, custom cryptography, and hardware hacking.
 
@@ -215,7 +215,7 @@ Some of the backends need and url to the api base, set as follows in .env: NAME_
 OLLAMA_API_BASE="..."
 OPENROUTER_API_BASE="..."
 ```
-Once evething is configured run the script
+Once everything is configured run the script
 
 ```bash
 python benchmarks/eval.py --model MODEL_NAME --dataset_file INPUT_FILE --eval EVAL_TYPE --backend BACKEND
@@ -324,7 +324,7 @@ IBAN
 EUROPEAN_BANK_ACCOUNT
 ```
 
-This is an example of how a text sould be sanitized:
+This is an example of how a text should be sanitized:
 
 ```
 "Contact Mikel at mikel@example.com" → "Contact [PERSON] at [EMAIL_ADDRESS]"
@@ -333,8 +333,8 @@ This is an example of how a text sould be sanitized:
 
 Some annotation rules:
 - Each detected entity should be sanitized using the **format: [ENTITY_TYPE]**
-- Priorize IP_ADDRESS to URL: `https://192.168.2.100` is anonimized like this: `https://[IP_ADDRESS]:5050` instead of [URL]
-- DATE_TIME is used for dates and for times, in this case `2025-03-11 11:41 UTC` it sould be anonimized like this: ` [DATE_TIME] [DATE_TIME]`
+- Priorize IP_ADDRESS to URL: `https://192.168.2.100` is anonymized like this: `https://[IP_ADDRESS]:5050` instead of [URL]
+- DATE_TIME is used for dates and for times, in this case `2025-03-11 11:41 UTC` it should be anonymized like this: ` [DATE_TIME] [DATE_TIME]`
 
 If you have any questions about the annotation, please write to us.
 
@@ -397,7 +397,7 @@ python benchmarks/eval.py --model alias1 --dataset_file benchmarks/cyberPII-benc
 The input CSV file must contain the following columns:
 
 - id: Unique row identifier
-- target_text: The original text from memory01_80 dataseto be annotated
+- target_text: The original text from memory01_80 dataset to be annotated
 - target_text_{annotator}_sanitized: The sanitized version of the text produced by each annotator
 
 
diff --git a/docs/cai_prompt_injection.md b/docs/cai_prompt_injection.md
@@ -2,7 +2,7 @@
 
 ## Summary
 
-This implementation adds guardrails to protect CAI agents from prompt injection attacks when interacting with untrusted external content (web pages, server responses, CTF challenges, etc).
+This implementation adds guardrails to protect CAI agents from prompt injection attacks when interacting with untrusted external content (web pages, server responses, CTF challenges, etc.).
 
 ## Problem
 
diff --git a/docs/index.md b/docs/index.md
@@ -118,7 +118,7 @@ CAI's capabilities are validated through rigorous peer-reviewed research demonst
 
 ## Motivation
 ### Why CAI?
-The cybersecurity landscape is undergoing a dramatic transformation as AI becomes increasingly integrated into security operations. **We predict that by 2028, AI-powered security testing tools will outnumber human pentesters**. This shift represents a fundamental change in how we approach cybersecurity challenges. *AI is not just another tool - it's becoming essential for addressing complex security vulnerabilities and staying ahead of sophisticated threats. As organizations face more advanced cyber attacks, AI-enhanced security testing will be crucial for maintaining robust defenses.*
+The cybersecurity landscape is undergoing a dramatic transformation as AI becomes increasingly integrated into security operations. **We predict that by 2028, AI-powered security testing tools will outnumber human pentesters**. This shift represents a fundamental change in how we approach cybersecurity challenges. *AI is not just another tool - it's becoming essential for addressing complex security vulnerabilities and staying ahead of sophisticated threats. As organizations face more advanced cyberattacks, AI-enhanced security testing will be crucial for maintaining robust defenses.*
 
 This work builds upon prior efforts[1] and similarly, we believe that democratizing access to advanced cybersecurity AI tools is vital for the entire security community. That's why we're releasing Cybersecurity AI (`CAI`) as an open source framework. Our goal is to empower security researchers, ethical hackers, and organizations to build and deploy powerful AI-driven security tools. By making these capabilities openly available, we aim to level the playing field and ensure that cutting-edge security AI technology isn't limited to well-funded private companies or state actors.
 
diff --git a/docs/multi_agent.md b/docs/multi_agent.md
@@ -26,7 +26,7 @@ While orchestrating via LLM is powerful, orchestrating via code makes tasks more
 
 - Using [Guardrails](guardrails.md) and LLM_as_judge: They are agents that evaluates and provides feedback, until they says the inputs/outputs passes certain criteria. The agent ensures inputs/outputs are appropriate.
 
-- Paralelization of task: Running multiple agents in parallel. This is useful for speed when you have multiple tasks.
+- Parallelization of task: Running multiple agents in parallel. This is useful for speed when you have multiple tasks.
 
 ## Running Agents in Parallel
 
diff --git a/src/cai/tools/web/search_web.py b/src/cai/tools/web/search_web.py
@@ -40,7 +40,7 @@ def query_perplexity(query: str = "", context: str = "") -> str:
                 "over general explanations. Your team relies on your research to "
                 "identify attack vectors, bypass security controls, and capture "
                 "flags. Always suggest concrete next steps based on your findings."
-                "Put the neccesary code in each iteration"
+                "Put the necessary code in each iteration"
             ),
         },
         {