Summary: The presentation introduces the concept of feeding personal knowledge to a local LLM -- "your second brain" -- but does not explain the mechanism that makes this work. This document fills that gap. It explains Retrieval Augmented Generation (RAG) in plain language, describes the practical copy-paste workflow that works today, maps the roadmap toward automated RAG integration, and covers personal knowledge management with Obsidian. Understanding this bridge between data and inference is what makes the entire local LLM value proposition operational.
- The Problem This Document Addresses
- What Is RAG (Retrieval Augmented Generation)
- Current Reality: The Manual Copy-Paste Workflow
- The Roadmap: Automated RAG Integration
- Personal Knowledge Management with Obsidian
- Putting It Together: Data Strategy by Phase
- Key Links
- References
The presentation introduces a compelling idea: feed personal notes, case knowledge, institutional memory, and analytical frameworks into a local LLM, transforming it from a generic text generator into a contextual assistant that "knows" what the analyst knows. This is the "second brain" concept -- an AI system augmented by personal and organizational knowledge.
The idea is sound. But the presentation skips the most important question: how does the data actually get from a collection of notes and documents into the LLM's context?
A local LLM (Ollama running Llama 3.1, for example) does not have access to files on the hard drive by default. It does not read Obsidian vaults, browse local directories, or index documents automatically. It processes only what is explicitly provided in each prompt. Without a mechanism to bridge the gap between stored documents and the LLM's inference engine, "feed your data to the AI" remains an aspiration rather than a workflow.
This document explains the three levels of that bridge:
| Level | Approach | Complexity | Phase |
|---|---|---|---|
| Level 1 | Manual copy-paste into prompts | None -- works today | Phase 1 |
| Level 2 | Document upload via Open WebUI | Low -- built into existing tools | Phase 1-2 |
| Level 3 | Automated RAG pipeline | Moderate -- requires additional tooling | Phase 2+ |
Each level builds on the previous one. Most deployments should start at Level 1 and advance only when the workflow demands it.
RAG is the technical mechanism that connects a collection of documents to an LLM. Understanding RAG, even at a conceptual level, is essential for anyone evaluating whether local LLMs can meet their analytical needs.
A standard LLM answers questions based on its training data -- the very large body of text it processed during model training, which ended at a specific cutoff date. It has no knowledge of documents on the local machine, events after that cutoff, or organization-specific information.
RAG changes this by inserting a retrieval step before the LLM generates its answer. Instead of answering from training data alone, the system first searches through a collection of provided documents, finds the passages most relevant to the question, and then feeds those passages into the LLM's prompt as context. The LLM generates its response based on the retrieved context plus its general language capabilities.
In plain language: RAG is an automated way to find the right page in the right document and paste it into the prompt before the AI answers.
YOUR DOCUMENTS
(Obsidian vault / local files / reports)
            |
            v
+------------------------+
| STEP 1: INDEXING       |
| (one-time setup)       |
|                        |
| Documents are split    |
| into chunks and        |
| converted into         |
| numerical vectors      |
| (embeddings)           |
+------------------------+
            |
            v
+------------------------+
| VECTOR DATABASE        |
| (searchable index)     |
|                        |
| Stores document        |
| chunks as vectors      |
| for fast similarity    |
| search                 |
+------------------------+
            ^
            |
USER ASKS A QUESTION
            |
            v
+------------------------+
| STEP 2: RETRIEVAL      |
| (every query)          |
|                        |
| Question is converted  |
| to a vector and        |
| compared against the   |
| document index         |
|                        |
| Most relevant chunks   |
| are selected           |
+------------------------+
            |
            v
+------------------------+
| STEP 3: GENERATION     |
| (every query)          |
|                        |
| Retrieved chunks +     |
| original question      |
| are combined into a    |
| prompt and sent to     |
| Ollama                 |
|                        |
| LLM generates answer   |
| BASED ON the           |
| retrieved context      |
+------------------------+
            |
            v
RESPONSE TO USER
(grounded in your documents)
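The three steps in the diagram can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular RAG tool is implemented: word-count vectors stand in for real embeddings (so matching here is by shared words, not by meaning), a plain list stands in for the vector database, and the prompt is only built, not sent to Ollama. The function names and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Step 1 (indexing): store each chunk alongside its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, top_k: int = 2) -> list[str]:
    """Step 2 (retrieval): rank chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3 (input to generation): combine retrieved chunks with the question."""
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


# Made-up sample chunks, for illustration only.
chunks = [
    "Evidence must be logged within 24 hours of collection.",
    "Vehicle break-ins in District 3 rose during the holiday period.",
    "Search warrant applications require supervisor review.",
]
question = "How quickly must evidence be logged?"
index = build_index(chunks)
top = retrieve(index, question, top_k=1)
prompt = build_prompt(question, top)
```

In a real pipeline, `embed` would call an embedding model and the index would live in a vector database, but the shape of the flow (index once, retrieve per query, build the prompt) stays the same.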
RAG addresses three fundamental limitations of standalone LLMs:
| Limitation | Without RAG | With RAG |
|---|---|---|
| Knowledge cutoff | Model only knows its training data (often months or years old) | Model answers from current documents provided at query time |
| Hallucination | Model generates plausible-sounding text that may not be factually grounded | Model generates text grounded in specific retrieved passages, significantly reducing hallucination (though not eliminating it entirely) |
| Organizational knowledge | Model has no knowledge of agency-specific procedures, cases, or context | Model can reference agency documents, SOPs, and analytical products |
Important caveat: RAG reduces hallucination but does not eliminate it. The model can still misinterpret retrieved passages, combine information from multiple passages incorrectly, or generate text that sounds like it came from the documents but is actually a confabulation. Human review remains mandatory. See Understanding LLMs for more on hallucination.
| Term | Definition |
|---|---|
| Embedding | A numerical representation (vector) of a piece of text. Similar texts have similar embeddings, enabling semantic search. |
| Vector database | A specialized database optimized for storing and searching embeddings. Examples: ChromaDB, FAISS, Qdrant. |
| Chunk | A segment of a document (typically 200-1000 words). Documents are split into chunks because most LLMs cannot process entire large documents at once. |
| Semantic search | Finding documents by meaning rather than exact keyword match. "Officer safety incident" can match passages about "use of force events" even without those exact words. |
| Context window | The maximum amount of text an LLM can process in a single prompt. Retrieved chunks must fit within this window alongside the question and instructions. |
| Retrieval | The process of searching the vector database to find chunks relevant to the user's question. |
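To make the "chunk" entry concrete, the following is a minimal sketch of word-based chunking. The overlap between neighboring chunks is an assumption added here (a common practice so that a sentence cut at a boundary still appears whole in one chunk); it is not something the table prescribes, and real tools may split on sentences, headings, or tokens instead. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words (an illustrative choice).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final words are already covered
    return chunks


# A synthetic 500-word document, for illustration only.
sample = " ".join(f"word{i}" for i in range(500))
pieces = chunk_words(sample, chunk_size=200, overlap=20)
```

With these settings the 500-word sample yields three chunks, and the last 20 words of each chunk repeat as the first 20 words of the next.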
For most initial deployments, RAG infrastructure is not necessary. The practical day-one workflow is manual: identify the relevant content, copy it into the prompt, and let the LLM process it.
Step 1: Analyst identifies relevant documents
(case file section, SOP excerpt, previous analysis)
|
v
Step 2: Analyst copies relevant text
(Ctrl+C from the source document)
|
v
Step 3: Analyst pastes into the LLM prompt
(along with the task instructions)
|
v
Step 4: LLM processes the provided context + task
|
v
Step 5: Analyst reviews and refines the output
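In practice these steps happen in the chat interface, but the same pattern can be scripted. The sketch below assumes Ollama is running locally on its default port (11434) and that a model named `llama3.1` has already been pulled; the helper names and the context markers are hypothetical. Only the prompt is built when the block runs; the network call is left commented out.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes the default local port


def build_manual_prompt(task: str, pasted_context: str) -> str:
    """Steps 2-3: place the copied text and the task instructions in one prompt."""
    return (
        "Use only the context between the markers to complete the task.\n"
        "--- BEGIN CONTEXT ---\n"
        f"{pasted_context.strip()}\n"
        "--- END CONTEXT ---\n\n"
        f"Task: {task.strip()}"
    )


def ask_ollama(prompt: str, model: str = "llama3.1") -> str:
    """Step 4: send the prompt to a locally running Ollama instance."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


prompt = build_manual_prompt(
    task="Summarize the key steps in three bullet points.",
    pasted_context="(text copied from an SOP excerpt goes here)",
)
# print(ask_ollama(prompt))  # uncomment when Ollama is running locally
```

Step 5 stays with the analyst: whatever `ask_ollama` returns is a draft to review, not a finished product.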
| Advantage | Detail |
|---|---|
| Zero additional tooling | Works with Ollama and Open WebUI as-is, no configuration required |
| Complete human control | The analyst decides exactly what context enters the prompt |
| No indexing required | No need to pre-process or index any documents |
| Immediate value | Available from the first day of deployment |
| Transparency | The analyst knows exactly what data the model is working with |
| Limitation | Detail |
|---|---|
| Manual effort | Finding and copying relevant content takes time, especially across multiple documents |
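| Context window constraints | Only as much text as fits in the model's context window can be included. Local 7B-13B deployments commonly run with 4K-8K tokens (roughly 3,000-6,000 words); models such as Llama 3.1 support much longer windows, but using them requires more memory |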
| Does not scale | Practical for a handful of documents; impractical for searching across hundreds or thousands |
| No persistent indexing | Each session starts from zero -- there is no accumulated knowledge base the model can search |
| Analyst must know what to include | The quality of the output depends on the analyst selecting the right context |
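The context window figures above imply a rule of thumb of about 0.75 words per token (4K tokens is roughly 3,000 words). A rough pre-check before pasting can be sketched as follows. The ratio is an approximation for English text, actual tokenizers vary by model, and the function names and the 1,024-token reserve for instructions and the answer are assumptions chosen for illustration.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English; actual tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text: str, context_tokens: int = 4096, reserved_tokens: int = 1024) -> bool:
    """True if the text likely fits, leaving room for instructions and the answer."""
    return estimate_tokens(text) <= context_tokens - reserved_tokens


short_excerpt = "word " * 1500  # about 2,000 estimated tokens
long_report = "word " * 6000    # about 8,000 estimated tokens

short_ok = fits_in_context(short_excerpt)                   # 4K window
long_ok = fits_in_context(long_report, context_tokens=8192)  # 8K window
```

Under these assumptions the 1,500-word excerpt fits a 4K window, while the 6,000-word report does not fit even an 8K window once space is reserved for the response.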
The manual approach works well when:
- The analyst already knows which documents are relevant
- The relevant content fits within a few thousand words
- The task involves a small number of source documents (1-5)
- The deployment is in Phase 1 (non-CJI pilot) and the priority is proving value
This is where most organizations start, and it remains a valid long-term approach for many workflows. The use cases described in the Use Cases document are all designed around this manual approach.
Open WebUI includes a built-in document upload feature that serves as a middle ground between fully manual copy-paste and a complete RAG pipeline. When a user uploads a document through the Open WebUI interface, the system:
- Processes the document into chunks
- Creates embeddings using a local embedding model
- Stores the chunks in a local vector database
- Automatically retrieves relevant chunks when the user asks questions
This is, functionally, a simplified RAG pipeline built into the chat interface. It eliminates the manual copy-paste step for documents that have been uploaded.
Advantages:
- No additional tools to install beyond Open WebUI (which is recommended as the primary interface -- see Ollama Quickstart)
- Upload once, query multiple times
- Semantic search across uploaded documents
Limitations:
- Document processing quality varies (PDFs with complex formatting may not parse cleanly)
- Limited control over chunking strategy and embedding model
- Not designed for large-scale document collections (hundreds or thousands of files)
- Uploaded documents persist only within Open WebUI's local database
When to use this: Open WebUI document upload is the recommended second step after the manual workflow. Once an analyst finds themselves repeatedly copying the same documents into prompts, uploading those documents to Open WebUI saves time while maintaining simplicity.
Automated RAG is a Phase 2+ consideration. It is not required for initial deployment or for demonstrating value. However, understanding the options is important for planning.
The case for automated RAG emerges when:
- Analysts need to search across large document collections (hundreds of files or more)
- The same knowledge base is queried repeatedly by multiple users
- The manual workflow becomes a bottleneck -- more time is spent finding context than reviewing output
- The deployment moves beyond individual use toward team-level knowledge management
What it is: AnythingLLM is a dedicated local RAG platform designed specifically to connect document collections to local LLMs. It is open source (MIT license).
| Feature | Detail |
|---|---|
| Document ingestion | Supports PDF, DOCX, TXT, CSV, and web page import |
| Vector database | Built-in (LanceDB) or external (ChromaDB, Qdrant, Pinecone). Pinecone is a hosted cloud service, so it is unsuitable where data must stay on the local machine |
| LLM backend | Connects directly to Ollama |
| Workspace model | Documents organized into workspaces with separate indexes |
| User interface | Chat-based, similar to ChatGPT |
| Deployment | Local Docker container or native install |
Strengths: Purpose-built for the local RAG use case. Provides workspace isolation (different document sets for different projects). Straightforward setup for users already running Ollama.
Limitations: Another application to deploy and maintain. Adds complexity to the technology stack. Workspace management requires deliberate organizational structure.
Repository: github.com/Mintplex-Labs/anything-llm
What it is: PrivateGPT is a local RAG platform focused on privacy and data sovereignty. It is open source (Apache 2.0 license) and designed from the ground up for environments where data must never leave the local machine.
| Feature | Detail |
|---|---|
| Document ingestion | PDF, DOCX, TXT, and other common formats |
| Vector database | Built-in (Qdrant or ChromaDB) |
| LLM backend | Supports Ollama and other local backends |
| Privacy focus | All processing is local -- no telemetry, no external API calls |
| API | OpenAI-compatible API for integration with other tools |
| Deployment | Local installation |
Strengths: Strong privacy posture aligns with law enforcement data handling requirements. API compatibility enables integration with other tools. Active development community.
Limitations: Setup is more involved than AnythingLLM. Documentation assumes some technical familiarity. Feature set is narrower than AnythingLLM.
Repository: github.com/zylon-ai/private-gpt
For analysts already using Obsidian for knowledge management (see next section), several community plugins connect the Obsidian vault directly to a local LLM.
Smart Connections is the most mature plugin in this space. It creates embeddings of Obsidian notes and enables semantic search and AI-assisted queries across the vault.
| Feature | Detail |
|---|---|
| Integration | Runs inside Obsidian -- no separate application |
| Indexing | Embeds all notes in the vault for semantic search |
| LLM connection | Connects to Ollama for local inference |
| Workflow | Ask questions about notes; get answers grounded in vault content |
| Privacy | All processing local when using Ollama backend |
Strengths: No additional application to deploy or manage. The analyst's existing Obsidian workflow becomes the knowledge base. Natural integration between note-taking and AI-assisted analysis.
Limitations: Requires Obsidian (which the presentation recommends). Plugin quality and maintenance depend on community contributors. Less robust than dedicated RAG platforms for large-scale document collections. Limited to content within the Obsidian vault.
| Feature | Open WebUI Upload | AnythingLLM | PrivateGPT | Obsidian Plugins |
|---|---|---|---|---|
| Setup complexity | None (built-in) | Low-moderate | Moderate | Low |
| Additional software | None | Docker container | Local install | Obsidian plugin |
| Document scale | Dozens | Hundreds-thousands | Hundreds-thousands | Vault-sized |
| Multi-user | Yes (Open WebUI) | Yes (workspaces) | Yes (API) | No (single user) |
| Best for | Getting started | Team knowledge base | Privacy-critical | Individual analyst |
| CJI considerations | Same as Ollama deployment | Requires same hardening | Requires same hardening | Single-user, local only |
Recommendation for Phase 1: Use the manual copy-paste workflow and Open WebUI document upload. These require no additional tooling and provide immediate value.
Recommendation for Phase 2+: Evaluate AnythingLLM or PrivateGPT if the deployment scales beyond individual use or if analysts need to search across large document collections. The choice between them depends on whether workspace management (AnythingLLM) or API integration (PrivateGPT) is the higher priority.
Important CJIS note: Any RAG platform that ingests, indexes, or stores CJI must meet the same CJIS compliance requirements as the LLM itself. The vector database, document store, and embedding index all constitute CJI processing and storage. See the CJIS Compliance Framework for requirements. During the non-CJI pilot (Phase 1), this is not a concern. During CJI production (Phase 4), it is critical.
The presentation references Obsidian as the "second brain" for building a personal knowledge repository. This section explains what Obsidian is, why it is well-suited for law enforcement professionals, and how it connects to the local LLM workflow.
Obsidian is a free, local-first note-taking application that stores all data as plain-text Markdown files in a folder on the local machine. There is no cloud dependency, no account required (for local use), and no vendor lock-in. Notes are interlinked, searchable, and organized however the user prefers.
| Feature | Detail |
|---|---|
| Data format | Plain-text Markdown (.md files) |
| Storage | Local folder (the "vault") -- user controls where files live |
| Cost | Free for personal use; paid sync is optional and not required |
| Linking | Notes can link to other notes, creating a knowledge graph |
| Search | Full-text search across all notes, with tag and metadata filtering |
| Plugins | Extensible via community plugins (500+), including LLM integration |
| Platforms | Windows, macOS, Linux, iOS, Android |
| Benefit | Relevance |
|---|---|
| All data stays local | Notes never leave the machine unless the user explicitly exports or syncs them. No cloud storage by default. This is critical for sensitive (but non-CJI) analytical notes. |
| Plain-text durability | Markdown files are readable by any text editor. If Obsidian disappears tomorrow, the notes remain accessible. No proprietary format lock-in. |
| Linking and graph view | Connecting notes about related subjects, locations, patterns, and cases creates a visual knowledge graph that reveals relationships across analytical products. |
| Tagging and metadata | Notes can be tagged by case, subject, crime type, jurisdiction, or any other taxonomy. Combined with search, this enables rapid retrieval across years of accumulated knowledge. |
| Templates | Obsidian supports templates for standardized notes (daily briefing template, case note template, meeting notes template), ensuring consistency across the knowledge base. |
| Free and self-contained | No budget request required. No IT approval needed for a note-taking application. Low barrier to entry. |
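Because notes are plain text, the tagging benefit is easy to demonstrate outside Obsidian as well. The sketch below finds notes carrying a given inline tag. It assumes tags are written inline as `#tag-name`; the pattern is a simplification (it ignores tags in YAML front matter and does not replicate every Obsidian tag rule), and the note names, contents, and function names are made up for illustration.

```python
import re

# A '#' at the start of a word followed by tag characters; "# Heading" does not match.
TAG_PATTERN = re.compile(r"(?<!\S)#([A-Za-z0-9_/-]+)")


def extract_tags(note_text: str) -> set[str]:
    """Return the inline tags found in a note, lowercased (simplified rules)."""
    return {tag.lower() for tag in TAG_PATTERN.findall(note_text)}


def notes_with_tag(notes: dict[str, str], tag: str) -> list[str]:
    """Return the names of notes that contain the given tag, sorted by name."""
    wanted = tag.lstrip("#").lower()
    return sorted(name for name, text in notes.items() if wanted in extract_tags(text))


# Hypothetical vault contents, keyed by path relative to the vault root.
notes = {
    "Intelligence/Patterns/Retail-Theft-Ring.md": "# Summary\nPattern notes. #retail-theft #district-3",
    "Intelligence/Areas/District-3-Trends.md": "Weekly trends. #district-3",
    "Training/Interview-Techniques.md": "No tags in this note.",
}
matches = notes_with_tag(notes, "#district-3")
```

Here `matches` contains the two notes tagged `#district-3`; the Markdown heading line `# Summary` is not mistaken for a tag.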
A well-organized Obsidian vault can serve as the foundation for the LLM-augmented "second brain" the presentation describes. The following structure is one approach -- not prescriptive, but illustrative:
Analyst Vault/
|
|-- Daily Notes/
|   |-- 2025-01-13.md    (daily observations, briefing notes)
|   |-- 2025-01-14.md
|
|-- Cases/
|   |-- Case-2025-001.md    (case-specific notes and links)
|   |-- Case-2025-002.md
|
|-- Intelligence/
|   |-- Patterns/
|   |   |-- Retail-Theft-Ring.md
|   |   |-- Vehicle-Break-In-Series.md
|   |-- Subjects/
|   |   |-- (subject tracking notes)
|   |-- Areas/
|       |-- District-3-Trends.md
|
|-- Procedures/
|   |-- Search-Warrant-Checklist.md
|   |-- Evidence-Submission-Process.md
|
|-- Training/
|   |-- Interview-Techniques.md
|   |-- Report-Writing-Standards.md
|
|-- Templates/
|   |-- Daily-Note-Template.md
|   |-- Case-Note-Template.md
|   |-- Intelligence-Product-Template.md
|
|-- Prompts/
    |-- Situational-Awareness-Brief.md
    |-- Case-Summary-Prompt.md
    |-- OSINT-Synthesis-Prompt.md
The connection between Obsidian and Ollama can operate at three levels:
Level 1: Manual reference (today). The analyst uses Obsidian to organize notes and knowledge, then manually copies relevant content into LLM prompts. Obsidian serves as the organized retrieval system; the human serves as the retrieval mechanism.
Level 2: Plugin-assisted (near-term). Obsidian plugins like Smart Connections create embeddings of vault content and enable semantic search within Obsidian, with the option to send relevant notes directly to Ollama for processing. The plugin acts as a simplified RAG layer within the Obsidian interface.
Level 3: Full RAG pipeline (future). A dedicated RAG platform (AnythingLLM, PrivateGPT) indexes the Obsidian vault as a document source, enabling automated retrieval across all vault content when querying the LLM. The analyst asks a question; the system finds the relevant notes and provides them as context automatically.
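At every level, the vault is just a folder of Markdown files, which is what makes it usable as a document source. A minimal sketch of reading a vault into memory follows. It is not how AnythingLLM, PrivateGPT, or Smart Connections ingest notes; the function name, the choice to skip the Templates folder, and the demo files are assumptions for illustration. Hidden folders such as `.obsidian` (where Obsidian keeps its settings) are skipped.

```python
import tempfile
from pathlib import Path


def load_vault_notes(vault_dir: str, exclude_folders: tuple[str, ...] = ("Templates",)) -> dict[str, str]:
    """Return {relative path: note text} for every Markdown note in the vault."""
    root = Path(vault_dir)
    notes = {}
    for path in sorted(root.rglob("*.md")):
        relative = path.relative_to(root)
        folders = relative.parts[:-1]
        # Skip hidden folders (e.g. .obsidian) and any explicitly excluded folders.
        if any(part.startswith(".") or part in exclude_folders for part in folders):
            continue
        notes[relative.as_posix()] = path.read_text(encoding="utf-8")
    return notes


# Demo against a throwaway vault with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "Cases").mkdir()
    (vault / "Cases" / "Case-2025-001.md").write_text("Links to the official record.", encoding="utf-8")
    (vault / "Templates").mkdir()
    (vault / "Templates" / "Case-Note-Template.md").write_text("template text", encoding="utf-8")
    (vault / ".obsidian").mkdir()
    (vault / ".obsidian" / "scratch.md").write_text("settings residue", encoding="utf-8")
    demo_notes = load_vault_notes(tmp)
```

The resulting dictionary could feed the manual workflow (pick a note and paste it) or serve as the input to an indexing step, depending on the level in use.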
Obsidian is a note-taking and knowledge management tool, not a records management system. It is appropriate for:
- Personal analytical notes and observations
- Collected open-source information
- Procedural references and checklists
- Prompt templates and LLM workflow documentation
- Training notes and professional development
- Non-CJI intelligence products and assessments
It is not appropriate as a repository for CJI, original case files, evidence, or official records. Those belong in the agency's authorized records management system. Obsidian notes should reference case numbers and link to official records but should not contain CJI content -- particularly if the vault is not stored on a CJIS-compliant system.
The following table maps the knowledge management approach to the deployment phases described in the CJIS Compliance Framework:
| Phase | Knowledge Management Approach | Tools | CJI Status |
|---|---|---|---|
| Phase 1: Non-CJI Pilot | Manual copy-paste + Open WebUI document upload | Ollama, Open WebUI, Obsidian | Non-CJI only |
| Phase 2: Security Hardening | Begin evaluating RAG platforms; test with non-CJI documents | Add AnythingLLM or PrivateGPT evaluation | Non-CJI only |
| Phase 3: Compliance Validation | Validate RAG platform against CJIS requirements if CJI ingestion is planned | Include RAG platform in security assessment | Pre-validation |
| Phase 4: CJI Production | Deploy hardened RAG pipeline for CJI documents (if validated) | Full stack with CJIS controls | CJI authorized |
| Phase 5: Scale | Expand document collections, multi-user knowledge bases, advanced workflows | Team-level RAG deployment | CJI authorized |
Key principle: The knowledge management tooling advances in lockstep with the compliance posture. Do not ingest CJI into any system -- including a RAG vector database -- before that system has been hardened and validated to the same standard as the LLM itself.
| Resource | URL | Description |
|---|---|---|
| Obsidian | obsidian.md | Free, local-first knowledge management |
| Open WebUI | github.com/open-webui/open-webui | Chat interface for Ollama with built-in document upload |
| AnythingLLM | github.com/Mintplex-Labs/anything-llm | Local RAG platform with workspace management |
| PrivateGPT | github.com/zylon-ai/private-gpt | Privacy-focused local RAG platform |
| Smart Connections | github.com/brianpetro/obsidian-smart-connections | Obsidian plugin for LLM integration |
| ChromaDB | trychroma.com | Open-source embedding database |
| Ollama | ollama.com | Local LLM runtime |
- Ollama -- Local LLM runtime
- Open WebUI -- Chat interface with built-in RAG capability
- AnythingLLM -- Local RAG platform
- PrivateGPT -- Privacy-focused local RAG
- Obsidian -- Local-first knowledge management
- Lewis, P., et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. -- The original RAG paper.
- FBI CJIS Security Policy v6.0 -- CJIS Resource Center
This document is part of the Secure and Affordable In-House AI companion resource. It is an educational resource, not official guidance. Consult your agency's CJIS Systems Officer (CSO) for compliance decisions.