MemorAIs
====================

*Your data, remembered. Your questions, answered.*

MemorAIs is a full-fledged chat app to interact with a capable language model. It employs a Retrieval-Augmented Generation (RAG) pipeline to provide more accurate responses, taking into account context uploaded by the user (so-called **memories**).
Further, memorAIs acts as a Model Context Protocol (MCP) host, enabling user agents to access the users' chat history with the language model.
This AI-native application emphasizes user privacy and data security, ensuring that sensitive information is handled with care.

MemorAIs is built using microservices, which include:

- **Backend**: ASP.NET Core 9.0
- **Frontend**: React.js, serving a single-page application (SPA)
- **Proxy**: Nginx reverse proxy (running in the frontend container) that serves the frontend and proxies the backend
- **Database**: MariaDB to store relational data like user accounts, chat sessions, and textual representations of memories (context)
- **Vector Database**: Qdrant for storing and searching the embeddings of memories
- **LM Runtime**: Ollama for hosting and running the language model (SmolLM 135M)
- **Cleanup**: A cleanup service that periodically removes old data from the database and vector database

Two vulnerabilities are implemented in memorAIs.

## Embedding Inversion Attack

### Background

Embeddings are a common way to represent text as a vector, allowing models to understand and process language.
A Retrieval-Augmented Generation pipeline is assembled by first encoding given text (here: so-called memories) into embeddings using an embedding model and storing them in a vector database for efficient retrieval. At query time, the user's question is embedded, similar vectors are retrieved from the database along with their textual representations, and those results are passed to an LLM to generate a context-aware response.
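The pipeline described above can be sketched end to end. The character-count embedding and in-memory store below are toy stand-ins for memorAIs' real embedding model and Qdrant; all names are illustrative:

```python
# Toy end-to-end RAG retrieval: embed "memories", store them, then
# retrieve the most similar one for a query via cosine similarity.
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    """Toy embedding: lowercase character counts (stand-in for a real model)."""
    return dict(Counter(text.lower()))

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Indexing": store (embedding, original text) pairs, as a vector DB would.
memories = ["the cat sat on the mat", "quarterly revenue grew 12 percent"]
store = [(embed(m), m) for m in memories]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Embed the query and return the k most similar memories."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# The retrieved text would then be prepended to the LLM prompt as context.
```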

This attack vector is designed to mimic a genuine embedding-inversion attack.
Classic inversion attacks reconstruct portions of a sentence by training a model to reverse the embedding vector,
an approach that depends on each word's semantic meaning being preserved in the embedding space.

Since flags, in contrast, are random character strings with no inherent semantics and have to be recovered fully and with high accuracy from the embeddings,

The flag can then be reconstructed.

flowchart TD
A["Get flag hint session ID"] --> B["Register an attacker user"]
B --> C["Login as attacker"]
C --> D["Import shared session containing flag embeddings by its ID"]
D --> E["Create search query with flag-alike characters"]
E --> F["Search for similar entities in the chat session <br/> and retrieve vector and norm"]
F --> G["Build character embedding lookup table<br/>by embedding all possible flag characters"]
G --> H["For each found entity"]
H --> I["Extract embedding vector and norm"]
I --> J["Un-normalize embedding vector<br/>by multiplying with norm"]
J --> K["Reconstruct character by character<br/>using nearest neighbor matching"]
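
Steps F–K of the flowchart can be sketched with a toy model. The key assumption here, made only for illustration, is that a stored vector is the L2-normalized concatenation of fixed per-character vectors with its norm stored alongside; the vulnerable embedding in memorAIs differs, but the un-normalize-then-nearest-neighbor logic is the same. `ALPHABET`, `DIM`, and all function names are hypothetical:

```python
# Toy flag reconstruction: build a character embedding lookup table (G),
# un-normalize the stored vector by its norm (J), then match each
# per-character chunk to its nearest neighbor in the table (K).
import math
import random

ALPHABET = "ABCDEF0123456789_{}"  # assumed set of possible flag characters
DIM = 8                           # illustrative per-character embedding width

rng = random.Random(42)
# Step G: character embedding lookup table.
CHAR_EMB = {c: [rng.uniform(-1, 1) for _ in range(DIM)] for c in ALPHABET}

def store(flag: str) -> tuple[list[float], float]:
    """Embed a flag: concatenate char vectors, return (unit vector, norm)."""
    v = [x for c in flag for x in CHAR_EMB[c]]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm

def reconstruct(unit_vec: list[float], norm: float) -> str:
    # Step J: un-normalize by multiplying with the stored norm.
    v = [x * norm for x in unit_vec]
    out = []
    # Step K: nearest-neighbor match each DIM-sized chunk to a character.
    for i in range(0, len(v), DIM):
        chunk = v[i:i + DIM]
        out.append(min(
            CHAR_EMB,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(chunk, CHAR_EMB[c])),
        ))
    return "".join(out)
```

Because the un-normalization is exact up to floating-point error, each chunk lands essentially on top of its original character vector, so nearest-neighbor matching recovers the flag with high accuracy even though the characters carry no semantics.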