1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
notes/
42 changes: 42 additions & 0 deletions smart-notes/rag_mvp/README.md
@@ -0,0 +1,42 @@
# Smart Notes – Local Q&A (RAG MVP)

This is a minimal, local-first MVP that allows users to ask natural-language questions over their markdown notes.

## Features (Current MVP)

- Loads markdown files from a local `notes/` directory
- Supports natural-language questions (e.g., "what is AI", "where is AI used")
- Returns sentence-level answers from notes
- Shows the source note filename
- Interactive CLI loop (type `exit` to quit)

This is a starter implementation intended to be extended with embeddings and vector search in future iterations.

---

## How it works

1. Notes are loaded from the local `notes/` directory.
2. Question words (what, where, who, when, etc.) are filtered.
3. Notes are split into sentences.
4. Relevant sentences are returned based on keyword matching.

---

## How to run

```bash
python smart-notes/rag_mvp/qa_cli.py



>> what is AI

[1] From test.md:
Artificial Intelligence (AI) is the simulation of human intelligence in machines.


>> what is machine learning
how is machine learning used
difference between AI and ML

Comment on lines +28 to +42
⚠️ Potential issue | 🟡 Minor

The "How to run" code block is malformed and the example is confusing.

The `` ```bash `` block opened at line 28 is never closed, so the remaining lines (example prompts, outputs, and follow-up queries) all run together inside it. Lines 39–41 also lack the `>>` prompt prefix, which makes it unclear whether they are user input or program output.

Consider closing the bash block after the run command and using a separate block for the example session:

📝 Suggested fix
 ## How to run
 
 ```bash
 python smart-notes/rag_mvp/qa_cli.py
+```
 
+### Example session
 
-
->> what is AI
-
-[1] From test.md:
-Artificial Intelligence (AI) is the simulation of human intelligence in machines.
-
-
->>  what is machine learning
-how is machine learning used
-difference between AI and ML
+```text
+>> what is AI
+[1] From test.md:
+Artificial Intelligence (AI) is the simulation of human intelligence in machines.
+
+>> what is machine learning
+[1] From test.md:
+Machine learning is a subset of AI.
+```
🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/README.md` around lines 28 - 42, Close the opening
```bash fence immediately after the run command (python
smart-notes/rag_mvp/qa_cli.py) and move the interactive example into its own
fenced block (e.g., ```text) so prompts and outputs are separated from the shell
instruction; in that example block ensure every user prompt is prefixed with
">>" and the outputs are plain text lines (add missing ">>" prefixes to the
lines currently at the end of the file and format outputs like "[1] From
test.md: ..." on separate lines) to match the suggested "Example session"
structure.

77 changes: 77 additions & 0 deletions smart-notes/rag_mvp/qa_cli.py
@@ -0,0 +1,77 @@
import os
import re

QUESTION_WORDS = {
    "what", "where", "who", "when", "which",
    "is", "are", "was", "were", "the", "a", "an",
    "of", "to", "in", "on", "for"
}

NOTES_DIR = "notes"
⚠️ Potential issue | 🟡 Minor

`NOTES_DIR` is relative to the current working directory, not to the script location.

If a user runs the script from any directory other than the repository root, the `notes/` path won't resolve correctly. Consider deriving the path relative to the script file:

`NOTES_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "notes")`

Or at minimum, document the expected working directory clearly.

🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/qa_cli.py` at line 10, The NOTES_DIR constant is
currently a relative path (NOTES_DIR) which breaks when the script is run from a
different CWD; change NOTES_DIR to be computed relative to the script file by
using the script's directory (via __file__ and os.path.abspath/os.path.dirname)
and joining the repository's notes directory (e.g., two levels up then "notes")
with os.path.join so the path resolves regardless of working directory, or
alternatively add a clear comment documenting the required working directory if
you intentionally keep a relative path.
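For comparison, a minimal sketch of the same script-relative lookup written with `pathlib`. Like the one-line suggestion, it assumes `qa_cli.py` lives at `smart-notes/rag_mvp/` and that `notes/` sits at the repository root (where the new `.gitignore` entry ignores it); `resolve_notes_dir` is a hypothetical helper name, not something in the PR.

```python
from pathlib import Path


def resolve_notes_dir(script_path: str) -> Path:
    """Return the repository-level notes/ directory for a given script path.

    Assumes (hypothetically) the layout <repo>/smart-notes/rag_mvp/qa_cli.py,
    so the repository root is two levels above the script's own directory.
    """
    script_dir = Path(script_path).resolve().parent  # <repo>/smart-notes/rag_mvp
    repo_root = script_dir.parent.parent             # <repo>
    return repo_root / "notes"


if __name__ == "__main__":
    # Same result regardless of which directory the script is launched from.
    print(resolve_notes_dir(__file__))
```

One small difference from the `os.path.abspath` version: `Path.resolve()` also follows symlinks, so a symlinked checkout resolves to the real location.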



def load_notes():
    notes = []
    if not os.path.exists(NOTES_DIR):
        print(f"Notes directory '{NOTES_DIR}' not found.")
        return notes

    for file in os.listdir(NOTES_DIR):
        if file.endswith(".md"):
            path = os.path.join(NOTES_DIR, file)
            with open(path, "r", encoding="utf-8") as f:
                notes.append({
                    "filename": file,
                    "content": f.read()
                })
    return notes


def split_sentences(text):
    return re.split(r'(?<=[.!?])\s+', text)


def search_notes(query, notes):
    results = []

    query_words = [
        word.lower()
        for word in query.split()
        if word.lower() not in QUESTION_WORDS
    ]

    for note in notes:
        sentences = split_sentences(note["content"])
        for sentence in sentences:
            sentence_lower = sentence.lower()
            if any(word in sentence_lower for word in query_words):
                results.append({
                    "filename": note["filename"],
                    "sentence": sentence.strip()
                })

    return results
Comment on lines +34 to +53
⚠️ Potential issue | 🔴 Critical

Substring matching causes false positives — use word-boundary matching.

`word in sentence_lower` (line 47) performs a substring check, not a whole-word check. For example, the query "what is AI" filters to `query_words = ["ai"]`, which then matches sentences containing "said", "explain", "brain", "aim", etc.

Use a regex word-boundary check or split the sentence into words and check set membership.

🐛 Proposed fix using word boundaries
 def search_notes(query, notes):
     results = []
 
     query_words = [
         word.lower()
         for word in query.split()
         if word.lower() not in QUESTION_WORDS
     ]
 
+    if not query_words:
+        return results
+
     for note in notes:
         sentences = split_sentences(note["content"])
         for sentence in sentences:
             sentence_lower = sentence.lower()
-            if any(word in sentence_lower for word in query_words):
+            if any(re.search(r'\b' + re.escape(word) + r'\b', sentence_lower) for word in query_words):
                 results.append({
                     "filename": note["filename"],
                     "sentence": sentence.strip()
                 })
 
     return results
🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/qa_cli.py` around lines 34 - 53, The search_notes
function currently does substring matching using "word in sentence_lower" which
yields false positives (e.g., "ai" matching "said"); update the matching to use
whole-word checks instead: for each sentence from
split_sentences(note["content"]) normalize/tokenize it into words (or use a
regex with word boundaries) and test membership against query_words (and respect
QUESTION_WORDS filtering already applied). Modify the inner loop where
sentence_lower is used and replace the substring check with either a compiled
word-boundary regex or a set-based word membership test so results.append still
uses note["filename"] and sentence.strip().
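The comment also mentions set membership as an alternative to the regex. A minimal sketch of that variant, assuming the same `QUESTION_WORDS` set as `qa_cli.py`; `tokenize`, `query_terms`, and `matches_query` are hypothetical helper names. Tokenizing with `\w+` treats letters, digits, and underscores as word characters, which is the same notion of a word that `\b` uses in the proposed fix.

```python
import re

# Same question/stop words as qa_cli.py.
QUESTION_WORDS = {
    "what", "where", "who", "when", "which",
    "is", "are", "was", "were", "the", "a", "an",
    "of", "to", "in", "on", "for"
}


def tokenize(text):
    """Lowercase text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def query_terms(query):
    """Keep the query tokens that are not question/stop words."""
    return {word for word in tokenize(query) if word not in QUESTION_WORDS}


def matches_query(sentence, terms):
    """True if at least one query term appears as a whole word in the sentence.

    An empty term set never matches, mirroring the early return in the fix.
    """
    return not terms.isdisjoint(tokenize(sentence))


if __name__ == "__main__":
    sentences = [
        "Artificial Intelligence (AI) is the simulation of human intelligence in machines.",
        "She said the results were promising.",
    ]
    terms = query_terms("what is AI")
    for s in sentences:
        substring_hit = any(t in s.lower() for t in terms)  # current behaviour
        print(substring_hit, matches_query(s, terms), s)
```

The second sentence reproduces the false positive from the comment: the substring check hits on "said", while the whole-word check does not. Tokenizing the query with the same regex also drops trailing punctuation, so `what is AI?` still yields `{"ai"}`, whereas `query.split()` would keep `"ai?"`.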



if __name__ == "__main__":
    notes = load_notes()

    print("Ask questions about your notes (type 'exit' to quit)\n")

    while True:
        query = input(">> ").strip()

        if query.lower() == "exit":
            print("Goodbye 👋")
            break

        matches = search_notes(query, notes)

        if not matches:
            print("No relevant notes found.\n")
        else:
            print("\n--- Answers ---\n")
            for i, m in enumerate(matches, 1):
                print(f"[{i}] From {m['filename']}:")
                print(m["sentence"])
                print()