docs : Improve README navigation and contribution guidance #45

Open
tubajavedd wants to merge 7 commits into AOSSIE-Org:main from tubajavedd:improve-readme

Conversation

@tubajavedd tubajavedd commented Feb 13, 2026

✨ Improve README navigation and contribution guidance

Summary

This PR improves the README structure and usability for new contributors and community members.

Changes Made

  • Added a Table of Contents for easier navigation
  • Converted absolute GitHub links to relative links
  • Added a "How to Contribute" section
  • Improved minor naming and formatting consistency

Why This Is Useful

These changes make the documentation cleaner, easier to maintain, and more welcoming for new contributors and GSoC applicants.

Scope

  • Documentation-only changes
  • No policy or functional changes

Fixes #40

Summary by CodeRabbit

  • New Features

    • Interactive Q&A CLI enabling intelligent search over markdown notes with semantic matching and source attribution.
  • Documentation

    • Updated project README with contribution guidelines and improved navigation structure.
    • Added architecture documentation for local-first markdown storage system.
    • New documentation for Q&A search implementations.

coderabbitai bot commented Feb 13, 2026

Walkthrough

This PR establishes a local-first markdown notes storage system with retrieval-augmented generation (RAG) capabilities. It introduces design documentation, updates the README with improved navigation, and implements a pipeline for chunking, embedding, and semantically searching markdown notes through a CLI interface.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Configuration: .gitignore, README.md, Design/local markdown storage.md, smart-notes-design/README.md, smart-notes/rag_mvp/README.md | Added .gitignore entry for notes directory. Restructured README with Table of Contents, relative links, and "How to Contribute" section. Created design documents outlining local markdown storage architecture and landing page UI design. Added README for RAG MVP project describing Q&A and embeddings-based implementations. |
| Embeddings & Indexing Infrastructure: smart-notes/rag_mvp/embeddings/chunker.py, embedder.py, indexer.py | Introduced text chunking utility supporting overlapping chunks. Created Embedder class wrapping SentenceTransformer for text-to-vector conversion. Implemented VectorIndexer class backed by FAISS for efficient semantic search over embeddings. |
| RAG Pipeline & CLI: smart-notes/rag_mvp/pipelines/embedding_pipeline.py, smart-notes/rag_mvp/qa_cli.py | Added EmbeddingPipeline orchestrating end-to-end workflow: chunking, embedding generation, FAISS index creation, and semantic search. Implemented CLI module providing interactive note search with keyword-based fallback and embedding pipeline demonstration. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as qa_cli
    participant Pipeline as EmbeddingPipeline
    participant Chunker as chunker
    participant Embedder as embedder
    participant Indexer as VectorIndexer

    User->>CLI: Enter query
    CLI->>Pipeline: process_notes(text)
    Pipeline->>Chunker: chunk_text(text)
    Chunker-->>Pipeline: List[str]
    Pipeline->>Embedder: embed(chunks)
    Embedder-->>Pipeline: np.ndarray
    Pipeline->>Indexer: add(embeddings, chunks)
    Indexer-->>Pipeline: ✓
    
    CLI->>Pipeline: semantic_search(query)
    Pipeline->>Embedder: embed(query)
    Embedder-->>Pipeline: np.ndarray
    Pipeline->>Indexer: search(query_embedding, k=3)
    Indexer-->>Pipeline: List[str]
    Pipeline-->>CLI: Results
    CLI-->>User: Display results
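
The flow in the diagram can be tried out without the real dependencies. The sketch below is a toy stand-in, not the PR's code: it swaps SentenceTransformer for a hashed bag-of-words vector and FAISS for a brute-force dot-product search. `ToyPipeline`, `toy_embed`, the paragraph-based chunking, and the vector dimension are all hypothetical choices made for illustration.

```python
import math
import re
import zlib
from typing import List


def toy_embed(text: str, dim: int = 256) -> List[float]:
    """Hashed bag-of-words vector, L2-normalised (stand-in for a real embedder)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class ToyPipeline:
    """Mirrors the diagram: process_notes() indexes, semantic_search() queries."""

    def __init__(self) -> None:
        self.chunks: List[str] = []
        self.vectors: List[List[float]] = []

    def process_notes(self, text: str) -> List[str]:
        # Chunk by paragraph for brevity; the PR chunks by character window.
        self.chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
        self.vectors = [toy_embed(c) for c in self.chunks]
        return self.chunks

    def semantic_search(self, query: str, k: int = 3) -> List[str]:
        if not self.chunks:
            return []  # nothing has been indexed yet
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [self.chunks[i] for _score, i in scored[:k]]
```

Calling `semantic_search` before `process_notes` returns an empty list here, which is one way to handle the "index not built" case raised later in the review.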

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • Zahnentferner

Poem

🐰 A warren of notes, now nestled with care,
Chunked into fragments that float through the air,
Embeddings dance through the vector space wide,
While FAISS indexes help seekers find answers inside! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 3
❌ Failed checks (3 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | The PR contains out-of-scope changes beyond issue #40 requirements: .gitignore modification, new design documentation, and RAG MVP implementation code are unrelated to README improvements. | Separate non-documentation changes (embeddings, indexer, pipeline, CLI code) and architecture design into dedicated PRs to maintain focus on issue #40 objectives. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (3 files): ⚔️ GSoC-Ideas/2026/MiniChain.md (content), ⚔️ GSoC-Ideas/2026/Rein.md (content), ⚔️ README.md (content). These conflicts must be resolved before merging into main. | Resolve conflicts locally and push changes to this branch. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: improving README navigation and contribution guidance, which aligns with the PR objectives. |
| Linked Issues check | ✅ Passed | The PR fulfills all coding-related objectives from issue #40: adds Table of Contents [#40], converts absolute to relative links [#40], adds How to Contribute section [#40], and improves formatting consistency [#40]. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@Design/local markdown storage.md`:
- Line 170: The section numbering skips from 10 to 12: rename the header "###
12. Summary" to "### 11. Summary" and adjust any following numbered section
headers so numbering is consecutive (e.g., decrement subsequent headings by
one); search for the literal header "### 12. Summary" and update it and any
later section numbers to maintain proper sequence.
- Line 47: Fix the mismatched and inconsistent quotation marks in the phrase
containing "in-memory index " and "local db(SQLite)"; remove the stray trailing
space inside the first quoted phrase so it reads "in-memory index", add the
missing closing quote after local db(SQLite) so it reads "local db(SQLite)", and
ensure both quoted phrases use the same quote style and spacing.
- Line 15: The phrase "NO Cloud Serives" contains a typo; update the text to
read "NO Cloud Services" (replace the string "NO Cloud Serives" with "NO Cloud
Services") so the word "Services" is spelled correctly.

In `@smart-notes/rag_mvp/embeddings/chunker.py`:
- Around line 9-29: The chunk_text function can loop infinitely when overlap >=
max_length; add an input validation or clamp near the top of chunk_text to
ensure overlap < max_length (e.g., raise a ValueError or set overlap =
min(overlap, max_length - 1)), and then continue using start/end as before;
reference the variables overlap, max_length and the function chunk_text to
locate where to add the guard.

In `@smart-notes/rag_mvp/embeddings/indexer.py`:
- Around line 34-39: FAISS returns -1 for empty result slots so the loop in the
search logic must skip negative indices and avoid indexing self.texts with -1;
update the call to self.index.search to assign distances to _distances (to
address the unused-variable warning) and then filter indices before using them
(e.g., only consider idx >= 0 and idx < len(self.texts)) when building results
from self.texts using the query_embedding search results and indices returned by
self.index.search.

In `@smart-notes/rag_mvp/pipelines/embedding_pipeline.py`:
- Line 10: The code hardcodes a Windows-only cache path when instantiating
SentenceTransformer (self.model = SentenceTransformer(model_name,
cache_folder="D:/models_cache")), which will fail on other OSes; change this by
removing the cache_folder argument to use the library default or compute a
cross-platform path (e.g., using os.path.join(os.path.expanduser("~"), ".cache",
"sentence_transformers") or respect an env var like MODEL_CACHE_DIR) and pass
that variable as cache_folder when creating SentenceTransformer with model_name
so the code works across platforms.
- Around line 14-24: Delete the duplicate chunk_text method from
embedding_pipeline.py and instead import and call the canonical chunk_text
utility from smart-notes/rag_mvp/embeddings/chunker.py; update any local calls
in EmbeddingPipeline (or wherever chunk_text was used) to use the imported
function and pass max_length/overlap explicitly if you need the 300/50 behavior
(the canonical utility includes empty-input/strip safeguards and fixes the
overlap infinite-loop risk). Ensure the import targets the exact function name
chunk_text and remove the local definition to avoid shadowing.
- Around line 41-46: semantic_search currently assumes self.index and
self.chunks are ready; add a guard at the top of semantic_search to raise or
return an empty list if self.index is None (i.e., process_notes / build_index
hasn't been run), and after calling self.index.search, filter out the FAISS
sentinel -1 index values from indices before dereferencing self.chunks.
Specifically, in semantic_search, check self.index (and optionally self.chunks)
and handle the case where it's None, ensure the query vector is the expected 2D
shape, then post-process the returned indices array to ignore any -1 entries and
only map valid indices into self.chunks so you don't accidentally read
self.chunks[-1].

In `@smart-notes/rag_mvp/qa_cli.py`:
- Around line 63-70: In search_notes, detect when query_words (computed from
query and QUESTION_WORDS) is empty and surface a clear message instead of
silently returning no matches; update the function search_notes to check if not
query_words and then either raise a ValueError or return a sentinel (e.g., an
empty list plus a message or a tuple) or print "No searchable keywords found" so
callers know the query had no searchable terms; ensure the behavior is
documented in the function docstring and that callers of search_notes handle the
chosen sentinel/exception appropriately.
- Line 4: Fix the typo in the inline comment that reads
"#-------------------emedding-pipeline-chunking concept" by changing "emedding"
to "embedding" so the comment becomes
"#-------------------embedding-pipeline-chunking concept"; this is purely a
documentation change in the top-of-file comment in qa_cli.py.
- Around line 74-76: The current loop uses substring checks (if any(word in
sentence_lower for word in query_words)) which yields false positives; change it
to perform whole-word matching by either compiling each query word into a
word-boundary regex (e.g., r'\b{word}\b') and testing against sentence_lower, or
tokenize sentence_lower into a set of words (split/punctuation-normalize) and
check membership against that set; update the loop that iterates over sentences
(variables: sentences, sentence_lower, query_words) to use the chosen whole-word
approach so queries like "AI" don't match substrings like "again".

In `@smart-notes/rag_mvp/README.md`:
- Around line 28-43: The README has an unclosed triple-backtick code fence after
the example starting with the command "python smart-notes/rag_mvp/qa_cli.py",
causing the subsequent example queries and rest of the document to be inside the
code block; fix this by adding a closing ``` after the example output (after the
sample responses such as the "[1] From test.md: Artificial Intelligence..."
lines) to properly terminate the code block.
- Around line 75-84: The README's project structure is out of date: it lists
embed.py, index.py, and utils.py but the code now lives under embeddings/ and
pipelines/; update the README to reflect the actual modules (e.g.,
embeddings/chunker.py, embeddings/embedder.py, embeddings/indexer.py, and
pipelines/embedding_pipeline.py), keep notes/, requirements.txt and top-level
README.md entries, and adjust any references to qa_cli or utils to point to the
new files or document where equivalent functionality lives (for example note
which file provides embedding, chunking, indexing, and the CLI/pipeline
entrypoints).
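
For the hardcoded `D:/models_cache` path flagged in the embedding_pipeline.py item above, one possible resolution order is sketched here. `MODEL_CACHE_DIR` is the variable name suggested in that item, not an established sentence-transformers convention, and `resolve_cache_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick a model cache directory that does not depend on a drive letter."""
    env = os.environ if env is None else env
    override = env.get("MODEL_CACHE_DIR")  # assumed variable name
    if override:
        return Path(override).expanduser()
    # Fall back to a per-user directory under the home folder.
    return Path.home() / ".cache" / "sentence_transformers"
```

The result could then be passed as `cache_folder=str(resolve_cache_dir())` where the pipeline constructs `SentenceTransformer(model_name, ...)`. Dropping the argument altogether and relying on the library default is the simpler option.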
🧹 Nitpick comments (5)
smart-notes/rag_mvp/pipelines/embedding_pipeline.py (1)

3-4: Hard imports will crash at import time if dependencies are missing.

Unlike embedder.py and indexer.py which gracefully handle missing dependencies with try/except ImportError, this file uses bare imports. Consider aligning with the pattern used in the sibling modules, or reusing Embedder and VectorIndexer directly instead of reimplementing their logic.
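
The guarded-import pattern mentioned here might look roughly like the following. The actual code in embedder.py and indexer.py is not shown in this review, so the helper names `optional_import` and `require` and the error message are assumptions.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def require(module: Optional[ModuleType], name: str) -> ModuleType:
    """Fail at call time with a clear message, rather than at import time."""
    if module is None:
        raise RuntimeError(f"{name} is required for this feature; install it first")
    return module
```

A module would call `optional_import("faiss")` once at the top and `require(...)` inside the method that needs it, so the keyword-search CLI can still be imported when the heavy dependencies are absent.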

smart-notes/rag_mvp/qa_cli.py (4)

16-16: Unused variable embeddings.

The unpacked embeddings is never used, as flagged by Ruff. Prefix with _ to signal intent.

Fix
-    chunks, embeddings = pipeline.process_notes(note_text)
+    chunks, _embeddings = pipeline.process_notes(note_text)

85-89: demo_embeddings_pipeline() runs unconditionally, blocking the keyword-search CLI.

The demo requires sentence-transformers and faiss to be installed and downloads a model on first run. This makes the interactive keyword search unusable if those heavy dependencies aren't available. Consider guarding the demo behind a flag or try/except.

Suggested guard
 if __name__ == "__main__":
-
-    demo_embeddings_pipeline()      # Temporary demo for embeddings pipeline
+    try:
+        demo_embeddings_pipeline()      # Temporary demo for embeddings pipeline
+    except Exception as e:
+        print(f"Skipping embedding demo: {e}")
 
     notes = load_notes()

39-39: Relative NOTES_DIR is fragile — depends on the working directory.

If the script is invoked from a different directory, it will fail to find notes/. Consider resolving relative to the script's own location.

Suggested fix
-NOTES_DIR = "notes"
+NOTES_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "notes")

Adjust the relative traversal ("..") to match your actual project layout.


93-98: No handling for EOFError / KeyboardInterrupt in the input loop.

If stdin is piped or the user presses Ctrl+C / Ctrl+D, the program crashes with an unhandled exception.

Suggested fix
     while True:
-        query = input(">> ").strip()
-
-        if query.lower() == "exit":
-            print("Goodbye 👋")
-            break
+        try:
+            query = input(">> ").strip()
+        except (EOFError, KeyboardInterrupt):
+            print("\nGoodbye 👋")
+            break
+
+        if query.lower() == "exit":
+            print("Goodbye 👋")
+            break


Offline-first experience :
NO Internet,
NO Cloud Serives,

⚠️ Potential issue | 🟡 Minor

Typo: "Serives" → "Services".

-    NO Cloud Serives,
+    NO Cloud Services,
🤖 Prompt for AI Agents
In `@Design/local markdown storage.md` at line 15, The phrase "NO Cloud Serives"
contains a typo; update the text to read "NO Cloud Services" (replace the string
"NO Cloud Serives" with "NO Cloud Services") so the word "Services" is spelled
correctly.


The app maintains:

"in-memory index " and "local db(SQLite)

⚠️ Potential issue | 🟡 Minor

Mismatched quotation marks.

The line has an opening " before local db(SQLite) but no closing ", and there's a stray space inside the first quoted phrase.

-  "in-memory index " and "local db(SQLite) 
+  "in-memory index" and "local db (SQLite)"
🤖 Prompt for AI Agents
In `@Design/local markdown storage.md` at line 47, Fix the mismatched and
inconsistent quotation marks in the phrase containing "in-memory index " and
"local db(SQLite)"; remove the stray trailing space inside the first quoted
phrase so it reads "in-memory index", add the missing closing quote after local
db(SQLite) so it reads "local db(SQLite)", and ensure both quoted phrases use
the same quote style and spacing.




### 12. Summary

⚠️ Potential issue | 🟡 Minor

Section numbering skips from 10 to 12.

Section 11 is missing — ### 12. Summary should likely be ### 11. Summary, and subsequent sections renumbered accordingly.

🤖 Prompt for AI Agents
In `@Design/local markdown storage.md` at line 170, The section numbering skips
from 10 to 12: rename the header "### 12. Summary" to "### 11. Summary" and
adjust any following numbered section headers so numbering is consecutive (e.g.,
decrement subsequent headings by one); search for the literal header "### 12.
Summary" and update it and any later section numbers to maintain proper
sequence.

Comment on lines +9 to +29
def chunk_text(text: str, max_length: int = 500, overlap: int = 50) -> List[str]:
    if not text:
        return []

    chunks = []
    start = 0
    text = text.strip()

    while start < len(text):
        end = start + max_length
        chunk = text[start:end].strip()

        if chunk:
            chunks.append(chunk)

        if end >= len(text):
            break

        start = end - overlap
        if start < 0:
            start = 0

⚠️ Potential issue | 🟠 Major

Infinite loop when overlap >= max_length.

If overlap >= max_length, then start = end - overlap never advances past the current start, causing an infinite loop. Add a guard.

🐛 Proposed fix
 def chunk_text(text: str, max_length: int = 500, overlap: int = 50) -> List[str]:
-    if not text:
+    if not text or max_length <= 0:
+        return []
+    if overlap >= max_length:
+        raise ValueError("overlap must be less than max_length")
+    if not text.strip():
         return []
🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/embeddings/chunker.py` around lines 9 - 29, The
chunk_text function can loop infinitely when overlap >= max_length; add an input
validation or clamp near the top of chunk_text to ensure overlap < max_length
(e.g., raise a ValueError or set overlap = min(overlap, max_length - 1)), and
then continue using start/end as before; reference the variables overlap,
max_length and the function chunk_text to locate where to add the guard.
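
Combining the proposed guard with the quoted excerpt gives roughly the function below. The excerpt stops at line 29, so the final `return chunks` is an assumption about the rest of the original function.

```python
from typing import List


def chunk_text(text: str, max_length: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if not text or max_length <= 0:
        return []
    if overlap >= max_length:
        # Without this check, start = end - overlap would never advance.
        raise ValueError("overlap must be less than max_length")
    if not text.strip():
        return []

    chunks: List[str] = []
    start = 0
    text = text.strip()

    while start < len(text):
        end = start + max_length
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        start = max(end - overlap, 0)

    return chunks  # assumed: the excerpt ends before the return statement
```

Each iteration moves `start` forward by `max_length - overlap` characters, which is why the guard on `overlap` is enough to guarantee termination.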

Comment on lines +34 to +39
        distances, indices = self.index.search(query_embedding.reshape(1, -1), k)
        results = []

        for idx in indices[0]:
            if idx < len(self.texts):
                results.append(self.texts[idx])

⚠️ Potential issue | 🔴 Critical

Bug: FAISS returns -1 for unfilled result slots, which passes the idx < len check.

When fewer than k vectors are in the index, FAISS fills remaining indices with -1. In Python, -1 < len(self.texts) is True, so self.texts[-1] silently returns the last chunk — a wrong result.

🐛 Proposed fix
-        distances, indices = self.index.search(query_embedding.reshape(1, -1), k)
+        _distances, indices = self.index.search(query_embedding.reshape(1, -1), k)
         results = []
 
         for idx in indices[0]:
-            if idx < len(self.texts):
+            if 0 <= idx < len(self.texts):
                 results.append(self.texts[idx])

This also addresses the Ruff warning about the unused distances variable (rename to _distances).

🧰 Tools
🪛 Ruff (0.15.0)

[warning] 34-34: Unpacked variable distances is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/embeddings/indexer.py` around lines 34 - 39, FAISS
returns -1 for empty result slots so the loop in the search logic must skip
negative indices and avoid indexing self.texts with -1; update the call to
self.index.search to assign distances to _distances (to address the
unused-variable warning) and then filter indices before using them (e.g., only
consider idx >= 0 and idx < len(self.texts)) when building results from
self.texts using the query_embedding search results and indices returned by
self.index.search.
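
The pitfall can be reproduced without FAISS installed. In the sketch below, the `indices` list is written by hand to imitate what a search with k=3 over a two-vector index would return; `collect_results` is a hypothetical helper, not the PR's method.

```python
from typing import List, Sequence


def collect_results(indices: Sequence[int], texts: List[str]) -> List[str]:
    """Map result indices to texts, skipping the -1 padding for empty slots."""
    return [texts[i] for i in indices if 0 <= i < len(texts)]


texts = ["chunk A", "chunk B"]
indices = [1, 0, -1]  # simulated search output: two hits, one empty slot

# The original check lets -1 through, so texts[-1] repeats the last chunk.
buggy = [texts[i] for i in indices if i < len(texts)]
fixed = collect_results(indices, texts)
```

Here `buggy` contains three entries, with "chunk B" appearing twice, while `fixed` contains only the two real hits.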

import os
import re

#-------------------emedding-pipeline-chunking concept

⚠️ Potential issue | 🟡 Minor

Typo: "emedding" → "embedding".

-#-------------------emedding-pipeline-chunking concept
+#-------------------embedding-pipeline-chunking concept
🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/qa_cli.py` at line 4, Fix the typo in the inline comment
that reads "#-------------------emedding-pipeline-chunking concept" by changing
"emedding" to "embedding" so the comment becomes
"#-------------------embedding-pipeline-chunking concept"; this is purely a
documentation change in the top-of-file comment in qa_cli.py.

Comment on lines +63 to +70
def search_notes(query, notes):
    results = []

    query_words = [
        word.lower()
        for word in query.split()
        if word.lower() not in QUESTION_WORDS
    ]

⚠️ Potential issue | 🟡 Minor

Empty query_words silently returns no results — consider informing the user.

If the user's query consists entirely of stop/question words (e.g., "What is the?"), query_words will be empty and any(...) will always be False, returning zero matches with no indication of why. A short message like "No searchable keywords found" would improve UX.

🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/qa_cli.py` around lines 63 - 70, In search_notes, detect
when query_words (computed from query and QUESTION_WORDS) is empty and surface a
clear message instead of silently returning no matches; update the function
search_notes to check if not query_words and then either raise a ValueError or
return a sentinel (e.g., an empty list plus a message or a tuple) or print "No
searchable keywords found" so callers know the query had no searchable terms;
ensure the behavior is documented in the function docstring and that callers of
search_notes handle the chosen sentinel/exception appropriately.
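
One way to surface the empty-keyword case is sketched below. The contents of `QUESTION_WORDS` are hypothetical, since the PR's actual set is not shown here, and `extract_keywords` and `describe_query` are illustrative names. The sketch also strips surrounding punctuation, which the quoted code does not do; with a plain `query.split()`, a token such as "the?" would pass the filter unless the set happened to contain it.

```python
from typing import List

# Hypothetical stop list; the PR's actual QUESTION_WORDS is not shown.
QUESTION_WORDS = {"what", "is", "the", "how", "why", "a", "an"}


def extract_keywords(query: str) -> List[str]:
    """Lowercase the query, strip surrounding punctuation, drop question words."""
    words = (w.strip("?!.,;:").lower() for w in query.split())
    return [w for w in words if w and w not in QUESTION_WORDS]


def describe_query(query: str) -> str:
    """Return a user-facing message instead of silently matching nothing."""
    keywords = extract_keywords(query)
    if not keywords:
        return "No searchable keywords found"
    return "Searching for: " + ", ".join(keywords)
```

Returning a message from a separate helper keeps `search_notes` free to return a plain list, so existing callers would not need to handle a new sentinel or exception.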

Comment on lines +74 to +76
        for sentence in sentences:
            sentence_lower = sentence.lower()
            if any(word in sentence_lower for word in query_words):

⚠️ Potential issue | 🟠 Major

Substring matching causes false positives in keyword search.

word in sentence_lower performs a substring check, not a whole-word match. For example, querying "AI" would match any sentence containing "contain", "again", "plain", etc., because "ai" is a substring of those words.

Use a word-boundary regex or tokenize sentences into word sets instead.

Proposed fix using word-boundary regex
-            if any(word in sentence_lower for word in query_words):
+            if any(re.search(r'\b' + re.escape(word) + r'\b', sentence_lower) for word in query_words):
🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/qa_cli.py` around lines 74 - 76, The current loop uses
substring checks (if any(word in sentence_lower for word in query_words)) which
yields false positives; change it to perform whole-word matching by either
compiling each query word into a word-boundary regex (e.g., r'\b{word}\b') and
testing against sentence_lower, or tokenize sentence_lower into a set of words
(split/punctuation-normalize) and check membership against that set; update the
loop that iterates over sentences (variables: sentences, sentence_lower,
query_words) to use the chosen whole-word approach so queries like "AI" don't
match substrings like "again".
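
The difference between the two checks can be seen in a small standalone sketch. `matches_any_word` is a hypothetical helper name; the regex is the one from the proposed fix.

```python
import re
from typing import Iterable


def matches_any_word(sentence: str, query_words: Iterable[str]) -> bool:
    """True if any query word appears in the sentence as a whole word."""
    sentence_lower = sentence.lower()
    return any(
        re.search(r"\b" + re.escape(word.lower()) + r"\b", sentence_lower)
        for word in query_words
    )


sentence = "Try again with plain text."

# Substring check: "ai" is found inside "again" and "plain".
substring_hit = any(w in sentence.lower() for w in ["ai"])

# Whole-word check: no standalone "ai" token in the sentence.
whole_word_hit = matches_any_word(sentence, ["ai"])
```

One caveat: `\b` only matches next to a word character, so a query term that begins or ends with punctuation (for example "c++") would not match reliably with this pattern. Tokenising the sentence into a set of words is the alternative the comment mentions.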

Comment on lines +28 to +43
```bash
python smart-notes/rag_mvp/qa_cli.py



>> what is AI

[1] From test.md:
Artificial Intelligence (AI) is the simulation of human intelligence in machines.


>> what is machine learning
how is machine learning used
difference between AI and ML



⚠️ Potential issue | 🟡 Minor

Unclosed code block breaks rendering.

The code block opened at line 28 is never closed with a matching ```. The example queries (lines 33–41) bleed into the code block, and the rest of the document renders incorrectly.

Add a closing ``` after line 29 (or after the example output) to properly terminate the block.

🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/README.md` around lines 28 - 43, The README has an
unclosed triple-backtick code fence after the example starting with the command
"python smart-notes/rag_mvp/qa_cli.py", causing the subsequent example queries
and rest of the document to be inside the code block; fix this by adding a
closing ``` after the example output (after the sample responses such as the
"[1] From test.md: Artificial Intelligence..." lines) to properly terminate the
code block.

Comment on lines +75 to +84
```bash
smart-notes/
├── rag_mvp/
│ ├── embed.py # Embedding logic
│ ├── index.py # FAISS index creation
│ ├── qa_cli.py # CLI for asking questions
│ └── utils.py # Helper functions
├── notes/ # Put your .txt notes here
├── requirements.txt
└── README.md

⚠️ Potential issue | 🟡 Minor

Project structure doesn't match actual files.

The documented structure references embed.py, index.py, and utils.py, but the actual modules in this PR are embeddings/chunker.py, embeddings/embedder.py, embeddings/indexer.py, and pipelines/embedding_pipeline.py. This will confuse contributors trying to navigate the codebase.

🤖 Prompt for AI Agents
In `@smart-notes/rag_mvp/README.md` around lines 75 - 84, The README's project
structure is out of date: it lists embed.py, index.py, and utils.py but the code
now lives under embeddings/ and pipelines/; update the README to reflect the
actual modules (e.g., embeddings/chunker.py, embeddings/embedder.py,
embeddings/indexer.py, and pipelines/embedding_pipeline.py), keep notes/,
requirements.txt and top-level README.md entries, and adjust any references to
qa_cli or utils to point to the new files or document where equivalent
functionality lives (for example note which file provides embedding, chunking,
indexing, and the CLI/pipeline entrypoints).


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Improve README navigation and contribution guidance

1 participant