Add local Q&A CLI MVP for markdown notes #36
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| notes/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # Smart Notes – Local Q&A (RAG MVP) | ||
|
|
||
| This is a minimal, local-first MVP that allows users to ask natural-language questions over their markdown notes. | ||
|
|
||
| ## Features (Current MVP) | ||
|
|
||
| - Loads markdown files from a local `notes/` directory | ||
| - Supports natural-language questions (e.g., "what is AI", "where is AI used") | ||
| - Returns sentence-level answers from notes | ||
| - Shows the source note filename | ||
| - Interactive CLI loop (type `exit` to quit) | ||
|
|
||
| This is a starter implementation intended to be extended with embeddings and vector search in future iterations. | ||
|
|
||
| --- | ||
|
|
||
| ## How it works | ||
|
|
||
| 1. Notes are loaded from the local `notes/` directory. | ||
| 2. Question words (what, where, who, when, etc.) are filtered. | ||
| 3. Notes are split into sentences. | ||
| 4. Relevant sentences are returned based on keyword matching. | ||
|
|
||
| --- | ||
|
|
||
| ## How to run | ||
|
|
||
| ```bash | ||
| python smart-notes/rag_mvp/qa_cli.py | ||
|
|
||
|
|
||
|
|
||
| >> what is AI | ||
|
|
||
| [1] From test.md: | ||
| Artificial Intelligence (AI) is the simulation of human intelligence in machines. | ||
|
|
||
|
|
||
| >> what is machine learning | ||
| how is machine learning used | ||
| difference between AI and ML | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
    import os
    import re

    QUESTION_WORDS = {
        "what", "where", "who", "when", "which",
        "is", "are", "was", "were", "the", "a", "an",
        "of", "to", "in", "on", "for"
    }

    NOTES_DIR = "notes"

Contributor

If a user runs the script from any directory other than the repository root, the relative `NOTES_DIR` path will not resolve correctly. Consider resolving it relative to the script location instead:

    NOTES_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "notes")

Or at minimum, document the expected working directory clearly.
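To illustrate the suggestion above, here is a minimal sketch of resolving the notes directory from the script's own location rather than the current working directory. The helper name `resolve_notes_dir` and its `levels_up` parameter are hypothetical and not part of this PR; the default of two levels assumes the script sits at `smart-notes/rag_mvp/qa_cli.py` and that `notes/` lives at the repository root, as the comment implies.

```python
from pathlib import Path


def resolve_notes_dir(script_path, levels_up=2, dirname="notes"):
    """Return <ancestor of script_path>/<dirname>, independent of the CWD.

    levels_up=2 is an assumption: it fits a script located two
    directories below the repository root.
    """
    root = Path(script_path).resolve().parent  # directory containing the script
    for _ in range(levels_up):
        root = root.parent  # climb one directory per level
    return root / dirname


# Hypothetical absolute location, used only for illustration.
example = resolve_notes_dir("/repo/smart-notes/rag_mvp/qa_cli.py")
print(example)
```

Inside the script itself, the call would presumably be `resolve_notes_dir(__file__)`, so the result no longer depends on where the user launches Python from.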

    def load_notes():
        notes = []
        if not os.path.exists(NOTES_DIR):
            print(f"Notes directory '{NOTES_DIR}' not found.")
            return notes

        for file in os.listdir(NOTES_DIR):
            if file.endswith(".md"):
                path = os.path.join(NOTES_DIR, file)
                with open(path, "r", encoding="utf-8") as f:
                    notes.append({
                        "filename": file,
                        "content": f.read()
                    })
        return notes


    def split_sentences(text):
        return re.split(r'(?<=[.!?])\s+', text)


    def search_notes(query, notes):
        results = []

        query_words = [
            word.lower()
            for word in query.split()
            if word.lower() not in QUESTION_WORDS
        ]

        for note in notes:
            sentences = split_sentences(note["content"])
            for sentence in sentences:
                sentence_lower = sentence.lower()
                if any(word in sentence_lower for word in query_words):
                    results.append({
                        "filename": note["filename"],
                        "sentence": sentence.strip()
                    })

        return results

Comment on lines +34 to +53

Contributor

Substring matching causes false positives; use word-boundary matching.

Use a regex word-boundary check or split the sentence into words and check set membership.

🐛 Proposed fix using word boundaries:

     def search_notes(query, notes):
         results = []
         query_words = [
             word.lower()
             for word in query.split()
             if word.lower() not in QUESTION_WORDS
         ]
    +    if not query_words:
    +        return results
    +
         for note in notes:
             sentences = split_sentences(note["content"])
             for sentence in sentences:
                 sentence_lower = sentence.lower()
    -            if any(word in sentence_lower for word in query_words):
    +            if any(re.search(r'\b' + re.escape(word) + r'\b', sentence_lower) for word in query_words):
                     results.append({
                         "filename": note["filename"],
                         "sentence": sentence.strip()
                     })
         return results
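The false positive described in this comment can be reproduced in isolation. The sketch below compares the submitted substring check with the two alternatives the reviewer names (a regex word-boundary check and set membership over tokens). The three function names and the sample sentences are made up for illustration; tokenizing with `\w+` is one assumed way to "split the sentence into words".

```python
import re


def substring_match(word, sentence):
    # Behaviour of the submitted code: plain substring containment.
    return word in sentence.lower()


def boundary_match(word, sentence):
    # Regex alternative: match only when the word stands on its own.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence.lower()) is not None


def token_match(word, sentence):
    # Set-membership alternative: tokenize on runs of word characters.
    return word in set(re.findall(r"\w+", sentence.lower()))


noisy = "Please maintain the training logs."   # contains "ai" only inside other words
relevant = "AI is used in robotics."           # contains "AI" as a standalone word

for matcher in (substring_match, boundary_match, token_match):
    print(matcher.__name__, matcher("ai", noisy), matcher("ai", relevant))
```

Only the substring check reports a match for the first sentence, because "ai" occurs inside "maintain" and "training". All three agree on the second sentence.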

    if __name__ == "__main__":
        notes = load_notes()

        print("Ask questions about your notes (type 'exit' to quit)\n")

        while True:
            query = input(">> ").strip()

            if query.lower() == "exit":
                print("Goodbye 👋")
                break

            matches = search_notes(query, notes)

            if not matches:
                print("No relevant notes found.\n")
            else:
                print("\n--- Answers ---\n")
                for i, m in enumerate(matches, 1):
                    print(f"[{i}] From {m['filename']}:")
                    print(m["sentence"])
                    print()

The "How to run" code block is malformed and the example is confusing.

The `` ```bash `` block opened at line 28 is never closed, so the remaining lines (example prompts, outputs, and follow-up queries) all run together inside it. Lines 39–41 also lack the `>>` prompt prefix, making it unclear whether they are user input or program output.

Consider closing the bash block after the run command and using a separate block for the example session.