Your friendly workspace buddy for Miles on Slack. Milo uses local AI models through Ollama and Retrieval-Augmented Generation (RAG) to provide helpful responses while keeping your data private.
This project demonstrates a complete RAG pipeline in .NET, integrated with Slack. Key features include:
- Setting up a PostgreSQL vector database for knowledge storage
- Generating embeddings from documents using local Ollama models
- Storing documents and their embeddings securely
- Retrieving relevant document chunks based on semantic search
- Using retrieved context to enhance LLM queries with Ollama
- Chatting with Milo directly in Slack by mentioning `@milo`
- Privacy-focused: all processing happens locally or within your controlled environment
You will need:
- .NET 9.0 SDK
- PostgreSQL 15+ with the pgvector extension
- Ollama running locally with compatible models (embedding and generation models)
- (Optional) Slack Bot Token and App Token for Slack integration
The solution consists of five main projects:
- DBSetup: Contains the `Program.cs` which executes SQL scripts to set up the PostgreSQL database schema, including enabling the `vector` extension and creating the `documents` and `embeddings` tables with appropriate indexing.
- Utils: Provides shared utilities, including database connection helpers (`DatabaseHelper.cs`), Ollama client configuration (`OllamaClient.cs`), and potentially text processing functions used across other projects. Relies on the `Npgsql` and `Microsoft.Extensions.AI` libraries.
- Embeddings: Reads documents (e.g., from `HandbookDocuments`), processes them into chunks, generates vector embeddings using the configured Ollama embedding model via `Utils`, and stores both the document content/source and the embeddings in the PostgreSQL database using `Utils`.
- QueryOllama: Takes a user query, generates an embedding for it, performs a semantic similarity search against the stored embeddings in PostgreSQL (using `pgvector`'s `<=>` operator), retrieves the top-k relevant document chunks, constructs a prompt including this context, sends the prompt to the configured Ollama generation model, and outputs the response.
- SlackIntegration: Implements a Slack bot using `SlackNet` (and potentially `Microsoft.Extensions.Hosting`). It listens for mentions (`@milo`), extracts the user's query, uses the logic from `QueryOllama` (or similar shared logic) to get a RAG-based answer, and posts the response back to the Slack channel. Handles configuration loading for Slack tokens and potentially Ollama/DB settings.
Set up the following environment variables. These are typically loaded using `Microsoft.Extensions.Configuration` within the applications. Default values might be provided in the code if variables are not set.
```bash
# PostgreSQL connection string (Required by DBSetup, Embeddings, QueryOllama, SlackIntegration)
POSTGRES_CONNECTION_STRING="Host=localhost;Username=your_username;Password=your_password;Database=your_database"

# Ollama settings (Used by Embeddings, QueryOllama, SlackIntegration)
# Defaults: Endpoint=http://localhost:11434, Model=gemma3, EmbeddingModel=jeffh/intfloat-multilingual-e5-large-instruct:f16
OLLAMA_ENDPOINT="http://localhost:11434"
OLLAMA_MODEL="gemma3"
OLLAMA_EMBEDDING_MODEL="jeffh/intfloat-multilingual-e5-large-instruct:f16"

# Slack settings (Required by SlackIntegration)
SLACK_BOT_TOKEN="xoxb-your-bot-token"
SLACK_APP_TOKEN="xapp-your-app-level-token"
```
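As a rough illustration, configuration might be read like this (a minimal sketch using `Microsoft.Extensions.Configuration`; the actual key names and fallback logic in the projects may differ):

```csharp
using Microsoft.Extensions.Configuration;

var config = new ConfigurationBuilder()
    .AddEnvironmentVariables()
    .Build();

// Fall back to the documented defaults when a variable is not set.
string ollamaEndpoint = config["OLLAMA_ENDPOINT"] ?? "http://localhost:11434";
string ollamaModel    = config["OLLAMA_MODEL"] ?? "gemma3";
string embeddingModel = config["OLLAMA_EMBEDDING_MODEL"]
    ?? "jeffh/intfloat-multilingual-e5-large-instruct:f16";

// The connection string has no sensible default and is required.
string connectionString = config["POSTGRES_CONNECTION_STRING"]
    ?? throw new InvalidOperationException("POSTGRES_CONNECTION_STRING is required.");
```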
Ensure PostgreSQL is running and the `pgvector` extension is available. Then, run the DBSetup project:
```bash
cd Milo/DBSetup
dotnet run
```
This creates:
- A `documents` table (`id`, `content`, `source`)
- An `embeddings` table (`id`, `document_id`, `embedding VECTOR(1024)`)
- An IVFFlat index (`idx_embeddings`) on the `embedding` column for efficient similarity search.
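Conceptually, `DBSetup` boils down to executing the schema script with `Npgsql` (a simplified sketch; the real `Program.cs` structure and script file names may differ):

```csharp
using Npgsql;

// Hypothetical script path; the actual project may name or embed its scripts differently.
var sql = await File.ReadAllTextAsync("Scripts/setup.sql");

await using var conn = new NpgsqlConnection(
    Environment.GetEnvironmentVariable("POSTGRES_CONNECTION_STRING"));
await conn.OpenAsync();

// Npgsql can execute a multi-statement script in a single command.
await using var cmd = new NpgsqlCommand(sql, conn);
await cmd.ExecuteNonQueryAsync();
```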
Place your text-based documents (e.g., `.md`, `.txt` files) in the `Milo/Embeddings/HandbookDocuments` directory. The `Embeddings` project will recursively scan this directory.
Run the Embeddings project:
```bash
cd Milo/Embeddings
dotnet run
```
This process will:
- Find document files.
- Read and chunk the content of each file.
- For each chunk, call the Ollama API (via `OLLAMA_ENDPOINT`) to generate an embedding using `OLLAMA_EMBEDDING_MODEL`.
- Store the document source, chunk content, and the generated embedding vector in the PostgreSQL database (see the outline below).
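In outline, the ingestion loop looks roughly like this (the helper names are illustrative placeholders, not the project's actual methods):

```csharp
// Walk the handbook directory recursively, as described above.
foreach (var file in Directory.EnumerateFiles(
    "HandbookDocuments", "*.*", SearchOption.AllDirectories))
{
    string text = await File.ReadAllTextAsync(file);

    // ChunkAsync / EmbedAsync / StoreAsync stand in for the chunking,
    // embedding, and database code detailed in the sections below.
    foreach (string chunk in await ChunkAsync(text))
    {
        float[] vector = await EmbedAsync(chunk);
        await StoreAsync(file, chunk, vector);
    }
}
```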
Run the QueryOllama project to ask questions using the RAG approach:
```bash
cd Milo/QueryOllama
dotnet run "Your question here?"
# Example: dotnet run "What is the process for requesting time off?"
```
You can provide your question as a command-line argument. The system performs the RAG steps: embedding the query, searching for similar chunks in the DB, creating a context-enhanced prompt, and querying the `OLLAMA_MODEL`.
To interact with the RAG system via Slack:
```bash
cd Milo/SlackIntegration
dotnet run
```
This starts the Slack bot. Ensure your bot is configured in Slack (App Manifest, OAuth scopes, Socket Mode enabled, Event Subscriptions for `app_mention`) and invited to the relevant channels. Mention the bot (e.g., `@milo How do I reset my password?`) to get a response.
The system leverages PostgreSQL with the `pgvector` extension:
- Storage: Stores the `VECTOR` data type directly in the `embeddings` table. The dimension (e.g., 1024) is set during table creation, matching the output of the chosen embedding model.
- Indexing: Uses an `IVFFlat` index for Approximate Nearest Neighbor (ANN) search, significantly speeding up similarity queries compared to exact K-Nearest Neighbor (KNN) search on large datasets.
- Similarity Search: Employs the cosine distance operator (`<=>`) provided by `pgvector` to find the embeddings (document chunks) most similar to the query embedding. The query typically computes `1 - (embedding <=> queryEmbedding)` to get similarity scores (where 1 is most similar).
Database schema (simplified):
```sql
-- Ensure pgvector is enabled
CREATE EXTENSION IF NOT EXISTS vector;

-- Table for original document source and content
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    source TEXT -- e.g., filename or URL
);

-- Table for embeddings linked to document chunks
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    document_id INT REFERENCES documents(id) ON DELETE CASCADE,
    embedding VECTOR(1024) -- Dimension matches the embedding model
);

-- Index for fast similarity search
CREATE INDEX idx_embeddings ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100); -- Adjust 'lists' based on dataset size
```
The `Embeddings` project uses an LLM-based approach for chunking:
- Reading: Reads text content from files in the specified directory (`HandbookDocuments`).
- LLM-based Splitting: Instead of a simple rule-based splitter, it sends the entire document content to the configured Ollama generation model (e.g., `gemma3`) via the `Embeddings/Chunking/LLMChunking.cs` class.
- Prompting for Chunks: A specific system prompt instructs the LLM to act as an assistant that divides documents into meaningful segments suitable for embedding. The prompt encourages grouping related content, favoring larger chunks over smaller ones, and maintaining the original language. The LLM is expected to return the chunks in a structured JSON format (defined by `Utils/Models/ChunkedData.cs`); a sketch follows this list.
- Metadata: Each resulting chunk is associated with its original source document (`source` column in the `documents` table) when stored.
- Goal: Leverage the LLM's understanding to create semantically coherent chunks, potentially leading to better context retrieval compared to simple fixed-size or sentence splitting.
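A minimal sketch of that idea, assuming the preview `Microsoft.Extensions.AI` API (method names have shifted across preview releases, and the real `LLMChunking.cs` may differ):

```csharp
using System.Text.Json;
using Microsoft.Extensions.AI;

IChatClient chat = new OllamaChatClient(new Uri("http://localhost:11434"), "gemma3");

const string systemPrompt =
    "You divide documents into meaningful segments suitable for embedding. " +
    "Group related content, favor larger chunks over smaller ones, keep the " +
    "original language, and reply with JSON: {\"chunks\": [\"...\"]}.";

// documentText holds the full file content read earlier.
var completion = await chat.CompleteAsync(new[]
{
    new ChatMessage(ChatRole.System, systemPrompt),
    new ChatMessage(ChatRole.User, documentText),
});

// Parse the structured JSON reply (shape loosely mirrors Utils/Models/ChunkedData.cs).
var chunked = JsonSerializer.Deserialize<ChunkedData>(
    completion.Message.Text!,
    new JsonSerializerOptions(JsonSerializerDefaults.Web));

record ChunkedData(string[] Chunks);
```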
Vector embeddings are generated via the Ollama API:
- Model: Uses the model specified by `OLLAMA_EMBEDDING_MODEL` (e.g., `jeffh/intfloat-multilingual-e5-large-instruct:f16`).
- Client: Uses `Microsoft.Extensions.AI.Ollama` to interact with the Ollama `/api/embeddings` endpoint.
- Process: Text chunks are sent to Ollama, which returns corresponding vector embeddings. These vectors are then stored in the `embeddings` table (see the example below).
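For instance, generating a single chunk's embedding might look like this (a sketch against the preview `Microsoft.Extensions.AI.Ollama` package; exact method names vary between preview versions):

```csharp
using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaEmbeddingGenerator(
        new Uri("http://localhost:11434"),
        "jeffh/intfloat-multilingual-e5-large-instruct:f16");

// One request can embed several chunks at once; here we embed a single chunk.
GeneratedEmbeddings<Embedding<float>> result =
    await generator.GenerateAsync(new[] { "chunk text to embed" });

ReadOnlyMemory<float> vector = result[0].Vector; // 1024 dimensions for this model
```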
The `QueryOllama` project (and `SlackIntegration`) performs semantic search:
- Query Embedding: The user's question is first converted into a vector embedding using the same `OLLAMA_EMBEDDING_MODEL`.
- Database Query: A SQL query is executed against the PostgreSQL database using `Npgsql`. It uses the `<=>` operator to find the embeddings with the smallest cosine distance to the query embedding.
- Retrieval: The `content` and `source` from the corresponding `documents` table rows for the top-k most similar embeddings are retrieved.
Example `Npgsql` query structure:
```csharp
using Npgsql;

// _connectionString comes from POSTGRES_CONNECTION_STRING. `embeddings` holds the
// query vector in pgvector's text form (e.g., "[0.1,0.2,...]") so the ::vector cast
// works, and `limit` is the number of chunks to retrieve (top-k).
var results = new List<DocumentSearchResult>();

using var conn = new NpgsqlConnection(_connectionString);
conn.Open();
using var command = new NpgsqlCommand();
command.Connection = conn;
command.CommandText =
    "SELECT d.id, d.content, d.source, 1 - (e.embedding <=> @embedding::vector) AS similarity " +
    "FROM embeddings e " +
    "JOIN documents d ON e.document_id = d.id " +
    "ORDER BY similarity DESC " +
    "LIMIT @limit";
command.Parameters.AddWithValue("@embedding", embeddings);
command.Parameters.AddWithValue("@limit", limit);

using var reader = command.ExecuteReader();
while (reader.Read())
{
    results.Add(new DocumentSearchResult
    {
        Id = reader.GetInt32(0),
        Content = reader.GetString(1),
        Source = reader.GetString(2),
        Similarity = reader.GetDouble(3)
    });
}
```
The retrieved document chunks provide context for the final query to the LLM:
- Prompt Construction: A prompt is dynamically created, typically including:
  - System instructions (e.g., "Answer the user's question based only on the provided context. Cite sources if possible.")
  - The retrieved context chunks (formatted clearly).
  - The original user question.
- LLM Interaction: The combined prompt is sent to the Ollama generation model (sketched below).
- Response Generation: The LLM generates an answer based on the user's question and the provided context.
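Putting those steps together might look like this (illustrative only; assumes the preview `Microsoft.Extensions.AI` API and reuses `results` and a `question` string from the search example above):

```csharp
using System.Text;
using Microsoft.Extensions.AI;

IChatClient chat = new OllamaChatClient(new Uri("http://localhost:11434"), "gemma3");

// Format the retrieved chunks clearly, tagging each with its source.
var context = new StringBuilder();
foreach (var doc in results)
    context.AppendLine($"[{doc.Source}] {doc.Content}");

var completion = await chat.CompleteAsync(new[]
{
    new ChatMessage(ChatRole.System,
        "Answer the user's question based only on the provided context. " +
        "Cite sources if possible."),
    new ChatMessage(ChatRole.User, $"Context:\n{context}\n\nQuestion: {question}"),
});

Console.WriteLine(completion.Message.Text);
```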
The `SlackIntegration` project bridges the RAG pipeline with Slack:
- Framework: Uses `SlackNet` for handling Slack API interactions (Events API, Web API) via Socket Mode or HTTP.
- Event Handling: Listens for `app_mention` events or slash commands (a handler sketch follows this list).
- Workflow:
  - Receives the event.
  - Extracts the user's text query from the event payload.
  - Invokes the RAG logic (embedding, search, context prompt, LLM query).
  - Replies with the final LLM response back to the user.
- Configuration: Loads `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` from environment variables or other configuration sources.
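A rough sketch of what an `app_mention` handler could look like with `SlackNet` (registration and hosting wiring omitted; `AnswerAsync` is a placeholder for the shared RAG logic, not the project's actual method):

```csharp
using SlackNet;
using SlackNet.Events;
using SlackNet.WebApi;

public class MiloMentionHandler : IEventHandler<AppMention>
{
    private readonly ISlackApiClient _slack;

    public MiloMentionHandler(ISlackApiClient slack) => _slack = slack;

    public async Task Handle(AppMention mention)
    {
        // Drop the leading "<@bot-id>" token to isolate the user's question.
        string question = mention.Text[(mention.Text.IndexOf('>') + 1)..].Trim();

        string answer = await AnswerAsync(question);

        await _slack.Chat.PostMessage(new Message
        {
            Channel = mention.Channel,
            Text = answer,
        });
    }

    // Placeholder: embed the question, search pgvector, build the context prompt,
    // and query the Ollama generation model (see the sections above).
    private static Task<string> AnswerAsync(string question) =>
        throw new NotImplementedException();
}
```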
Key technologies:
- .NET 9: Core framework.
- PostgreSQL: Relational database.
- pgvector: PostgreSQL extension for vector similarity search.
- Ollama: Local inference server for running LLMs and embedding models.
- Npgsql: .NET data provider for PostgreSQL.
- SlackNet: .NET library for Slack API interaction.
Milo uses Ollama models for both embeddings and language generation. You can pull different models to experiment or use specialized ones.
Install additional models with:
```bash
ollama pull <model-name>
```
Some recommended models:
- `gemma3` (default LLM model)
- `jeffh/intfloat-multilingual-e5-large-instruct:f16` (default embedding model)
- `llama`
- `mistral`
Ensure the models you want to use are specified in your environment variables or the application configuration.
- DB Connection Issues: Verify `POSTGRES_CONNECTION_STRING` is correct and accessible from where the application is running. Check the PostgreSQL logs. Ensure the user has permissions on the database and tables.
- pgvector Not Found: Ensure the `pgvector` extension is installed in your PostgreSQL instance and enabled in the target database (`CREATE EXTENSION IF NOT EXISTS vector;`).
- Ollama Connection Issues: Verify `OLLAMA_ENDPOINT` is correct and that Ollama is running and accessible. Check firewall rules. Test the endpoint directly (e.g., `curl http://localhost:11434/api/tags`). Ensure the specified `OLLAMA_MODEL` and `OLLAMA_EMBEDDING_MODEL` are pulled (`ollama list`).
- Slack Integration Errors: Double-check `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN`. Ensure the bot has the correct OAuth scopes (e.g., `app_mentions:read`, `chat:write`). Verify Socket Mode is enabled in the Slack app settings if used. Check the Slack API dashboard for errors.
- Embedding Dimension Mismatch: Ensure the `VECTOR(dimension)` size in the `embeddings` table schema matches the output dimension of the `OLLAMA_EMBEDDING_MODEL`. If you change the model, you may need to recreate the table and re-embed the documents.
- Document Loaders: Modify `Embeddings/Program.cs` to support different file types (PDF, DOCX) by adding relevant parsing libraries (e.g., `PdfPig`, `DocumentFormat.OpenXml`).
- Chunking Strategies: Experiment with different text splitting methods (e.g., recursive character splitting, semantic chunking) in the `Embeddings` project.
- Vector Databases: Replace PostgreSQL/pgvector with another vector store (e.g., ChromaDB, Qdrant, Weaviate) by updating the data access logic in `Utils` and dependent projects.
- Models: Change the `OLLAMA_MODEL` or `OLLAMA_EMBEDDING_MODEL` environment variables to use different Ollama models. Ensure compatibility (e.g., embedding dimensions).
- LLM Providers: Adapt the code in `QueryOllama` and `Utils` to call different LLM APIs (e.g., OpenAI, Anthropic) instead of Ollama.
Milo processes all queries and document embeddings locally using Ollama and your PostgreSQL database, ensuring your conversations and data stay private and secure within your infrastructure.
Issues and pull requests are welcome! Feel free to contribute to make Milo even better.