Commit 4cc34f1 (1 parent: d97e547)

Added mkdocs and minor fixes.

File tree: 13 files changed (+237, −22 lines)
New GitHub Actions workflow

Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+name: Deploy MkDocs to GitHub Pages
+
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+
+    steps:
+      # Checkout the repository
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      # Set up Python
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.x'
+
+      # Install dependencies
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install mkdocs mkdocs-material mkdocstrings-python
+
+      # Build the MkDocs site
+      - name: Build and Publish MkDocs site
+        run: mkdocs gh-deploy
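`mkdocs gh-deploy` builds the site and pushes it to the `gh-pages` branch, so every push to `main` republishes the docs. The same build can be smoke-tested locally before pushing; a minimal sketch, assuming `mkdocs` and the plugins installed by the workflow above are available:

```python
# Build the MkDocs site locally; --strict turns warnings
# (e.g., broken internal links) into errors before they reach CI.
import subprocess

subprocess.run(["mkdocs", "build", "--strict"], check=True)
```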

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions

@@ -2,11 +2,11 @@
 
 ReadmeReady welcomes contributions from the community.
 
-# Questions and Reporting Issues
+# Asking Questions and Reporting Issues
 
 Have a question? Have you identified a reproducible problem in ReadmeReady? Have a feature request? We want to hear about it!
 
-Submit a bug report or feature request on [GitHub Issues](https://github.com/souradipp76/ReadMeReady/issues).
+Ask a question, submit a bug report or feature request on [GitHub Issues](https://github.com/souradipp76/ReadMeReady/issues).
 
 # How to develop on this project
 
README.md

Lines changed: 7 additions & 0 deletions

@@ -10,6 +10,13 @@ Auto-generate code documentation in Markdown format in seconds.
 Automated documentation of programming source code is a challenging task with significant practical and scientific implications for the developer community. ReadmeReady is a large language model (LLM)-based application that developers can use as a support tool to generate basic documentation for any publicly available or custom repository. Over the last decade, considerable research has been done on generating documentation for source code using neural network architectures. With the recent advancements in LLM technology, some open-source applications have been developed to address this problem. However, these applications typically rely on the OpenAI APIs, which incur substantial financial costs, particularly for large repositories. Moreover, none of these open-source applications offer a fine-tuned model or features to enable users to fine-tune custom LLMs. Additionally, finding suitable data for fine-tuning is often challenging. Our application addresses these issues.
 
 ## Installation
+
+ReadmeReady is available only on Linux/Windows.
+
+### Dependencies
+
+Please follow the installation guide [here](https://pypi.org/project/python-magic/) to install `python-magic`.
+
 ### Install it from PyPI
 
 The simplest way to install ReadmeReady and its dependencies is from PyPI with pip, Python's preferred package installer.
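`python-magic` wraps the libmagic C library, which must be present on the system. A minimal sketch to verify the dependency is wired up correctly (the file path is just an example):

```python
# Verify that python-magic and its libmagic backend are importable and working.
import magic

print(magic.from_file("README.md"))             # e.g. "ASCII text"
print(magic.from_file("README.md", mime=True))  # e.g. "text/plain"
```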

docs/index.md

Lines changed: 119 additions & 7 deletions

@@ -1,11 +1,123 @@
-# Welcome to ReadMeReady
+# ReadmeReady
 
-## Commands
+Auto-generate code documentation in Markdown format in seconds.
 
-* `readme_ready` - Start generating README documentation.
+## What is ReadmeReady?
 
-## Project layout
+Automated documentation of programming source code is a challenging task with significant practical and scientific implications for the developer community. ReadmeReady is a large language model (LLM)-based application that developers can use as a support tool to generate basic documentation for any publicly available or custom repository. Over the last decade, considerable research has been done on generating documentation for source code using neural network architectures. With the recent advancements in LLM technology, some open-source applications have been developed to address this problem. However, these applications typically rely on the OpenAI APIs, which incur substantial financial costs, particularly for large repositories. Moreover, none of these open-source applications offer a fine-tuned model or features to enable users to fine-tune custom LLMs. Additionally, finding suitable data for fine-tuning is often challenging. Our application addresses these issues.
 
-    mkdocs.yml    # The configuration file.
-    docs/
-        index.md  # The documentation homepage.
+## Installation
+
+ReadmeReady is available only on Linux/Windows.
+
+### Dependencies
+
+Please follow the installation guide [here](https://pypi.org/project/python-magic/) to install `python-magic`.
+
+### Install it from PyPI
+
+The simplest way to install ReadmeReady and its dependencies is from PyPI with pip, Python's preferred package installer.
+
+```bash
+$ pip install readme_ready
+```
+
+In order to upgrade ReadmeReady to the latest version, use pip as follows.
+
+```bash
+$ pip install -U readme_ready
+```
+
+### Install it from source
+
+You can also install ReadmeReady from source as follows.
+
+```bash
+$ git clone https://github.com/souradipp76/ReadMeReady.git
+$ cd ReadMeReady
+$ make install
+```
+
+To create a virtual environment before installing ReadmeReady, use the commands:
+
+```bash
+$ make virtualenv
+$ source .venv/bin/activate
+```
+
+## Usage
+
+### Initialize
+
+```bash
+$ export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
+$ export HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN>
+```
+
+Set `OPENAI_API_KEY=dummy` to use only open-source models.
+
+### Command-Line
+
+```bash
+$ python -m readme_ready
+# or
+$ readme_ready
+```
+
+### In Code
+
+```py
+from readme_ready.query import query
+from readme_ready.index import index
+from readme_ready.types import (
+    AutodocReadmeConfig,
+    AutodocRepoConfig,
+    AutodocUserConfig,
+    LLMModels,
+)
+
+model = LLMModels.LLAMA2_7B_CHAT_GPTQ  # Choose model from supported models
+
+repo_config = AutodocRepoConfig(
+    name="<NAME>",  # Replace <NAME>
+    root="<PROJECT_ROOT>",  # Replace <PROJECT_ROOT>
+    repository_url="<PROJECT_URL>",  # Replace <PROJECT_URL>
+    output="<OUTPUT_DIR>",  # Replace <OUTPUT_DIR>
+    llms=[model],
+    peft_model_path="<PEFT_MODEL_NAME_OR_PATH>",  # Replace <PEFT_MODEL_NAME_OR_PATH>
+    ignore=[
+        ".*",
+        "*package-lock.json",
+        "*package.json",
+        "node_modules",
+        "*dist*",
+        "*build*",
+        "*test*",
+        "*.svg",
+        "*.md",
+        "*.mdx",
+        "*.toml",
+    ],
+    file_prompt="",
+    folder_prompt="",
+    chat_prompt="",
+    content_type="docs",
+    target_audience="smart developer",
+    link_hosted=True,
+    priority=None,
+    max_concurrent_calls=50,
+    add_questions=False,
+    device="auto",  # Select device "cpu" or "auto"
+)
+
+user_config = AutodocUserConfig(
+    llms=[model]
+)
+
+readme_config = AutodocReadmeConfig(
+    headings="Description,Requirements,Installation,Usage,Contributing,License"
+)
+
+index.index(repo_config)
+query.generate_readme(repo_config, user_config, readme_config)
+```
+
+Run the sample script in `examples/example.py` to see typical code usage.
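The same credentials from the Initialize step apply to in-code usage; a minimal sketch of setting them from Python before running the example above (values are placeholders):

```python
import os

# A dummy OpenAI key is enough when only open-source models are used.
os.environ["OPENAI_API_KEY"] = "dummy"
os.environ["HF_TOKEN"] = "<YOUR_HUGGINGFACE_TOKEN>"  # placeholder: replace with a real token
```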

docs/reference.md

Lines changed: 43 additions & 0 deletions

@@ -0,0 +1,43 @@
+# API Reference
+
+::: readme_ready.index.index
+    handler: python
+    options:
+      members:
+        - index
+      show_root_heading: true
+      show_source: false
+
+::: readme_ready.index.create_vector_store
+    handler: python
+    options:
+      members:
+        - create_vector_store
+      show_root_heading: true
+      show_source: false
+
+::: readme_ready.index.process_repository
+    handler: python
+    options:
+      members:
+        - process_repository
+      show_root_heading: true
+      show_source: false
+
+::: readme_ready.query.query
+    handler: python
+    options:
+      members:
+        - query
+        - generate_readme
+      show_root_heading: true
+      show_source: false
+
+::: readme_ready.query.create_chat_chain
+    handler: python
+    options:
+      members:
+        - make_qa_chain
+        - make_readme_chain
+      show_root_heading: true
+      show_source: false
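mkdocstrings generates these reference pages from docstrings in the listed modules. A sketch of the kind of Google-style docstring that `mkdocstrings-python` renders; the signature and text here are illustrative assumptions, not the project's actual code:

```python
# Hypothetical docstring shape for readme_ready.index.index; mkdocstrings
# renders the summary and arguments under a root heading per the options above.
def index(repo_config):
    """Index a repository and build its vector store for README generation.

    Args:
        repo_config: An AutodocRepoConfig with the repository paths,
            LLM selection, and output settings.
    """
    ...
```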

mkdocs.yml

Lines changed: 4 additions & 1 deletion

@@ -1,2 +1,5 @@
-site_name: ReadMeReady
+site_name: ReadmeReady
 theme: readthedocs
+plugins:
+  - search
+  - mkdocstrings
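Registering `mkdocstrings` here is what makes the `:::` directives in docs/reference.md render at build time. A quick parse check of the config; a sketch using PyYAML, which MkDocs itself depends on:

```python
# Load mkdocs.yml and confirm the expected plugins are registered.
import yaml

with open("mkdocs.yml") as f:
    cfg = yaml.safe_load(f)

assert cfg["site_name"] == "ReadmeReady"
assert "mkdocstrings" in cfg["plugins"]
```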

readme_ready/main.py

Lines changed: 6 additions & 0 deletions

@@ -74,6 +74,8 @@ def url_validator(x):
             LLMModels.CODELLAMA_13B_INSTRUCT_HF.value,
             LLMModels.GOOGLE_GEMMA_2B_INSTRUCT.value,
             LLMModels.GOOGLE_GEMMA_7B_INSTRUCT.value,
+            LLMModels.GOOGLE_CODEGEMMA_2B.value,
+            LLMModels.GOOGLE_CODEGEMMA_7B_INSTRUCT.value,
         ],
         default=LLMModels.TINYLLAMA_1p1B_CHAT_GGUF.value,
     ).ask()
@@ -113,6 +115,10 @@ def url_validator(x):
             model = LLMModels.GOOGLE_GEMMA_2B_INSTRUCT
         case LLMModels.GOOGLE_GEMMA_7B_INSTRUCT.value:
             model = LLMModels.GOOGLE_GEMMA_7B_INSTRUCT
+        case LLMModels.GOOGLE_CODEGEMMA_2B.value:
+            model = LLMModels.GOOGLE_CODEGEMMA_2B
+        case LLMModels.GOOGLE_CODEGEMMA_7B_INSTRUCT.value:
+            model = LLMModels.GOOGLE_CODEGEMMA_7B_INSTRUCT
         case _:
             model = LLMModels.LLAMA2_7B_CHAT_HF
     print("Initialization Complete.\n")

readme_ready/query/query.py

Lines changed: 4 additions & 2 deletions

@@ -128,9 +128,11 @@ def generate_readme(
         )
         try:
             response = chain.invoke({"input": question})
-            print("\n\nMarkdown:\n")
-            print(markdown(response["answer"]))
+            # print("\n\nMarkdown:\n")
+            # print(markdown(response["answer"]))
             file.write(markdown(response["answer"]))
         except RuntimeError as error:
             print(f"Something went wrong: {error}")
             traceback.print_exc()
+
+    print(f"README generated at {readme_path}")

readme_ready/types.py

Lines changed: 2 additions & 2 deletions

@@ -15,6 +15,7 @@ class LLMModels(str, Enum):
     GPT4 = "gpt-4"
     GPT432k = "gpt-4-32k"
     TINYLLAMA_1p1B_CHAT_GGUF = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
+    GOOGLE_GEMMA_2B_INSTRUCT_GGUF = "bartowski/gemma-2-2b-it-GGUF"
     LLAMA2_7B_CHAT_GPTQ = "TheBloke/Llama-2-7B-Chat-GPTQ"
     LLAMA2_13B_CHAT_GPTQ = "TheBloke/Llama-2-13B-Chat-GPTQ"
     CODELLAMA_7B_INSTRUCT_GPTQ = "TheBloke/CodeLlama-7B-Instruct-GPTQ"
@@ -25,9 +26,8 @@ class LLMModels(str, Enum):
     CODELLAMA_13B_INSTRUCT_HF = "meta-llama/CodeLlama-13b-Instruct-hf"
     GOOGLE_GEMMA_2B_INSTRUCT = "google/gemma-2b-it"
     GOOGLE_GEMMA_7B_INSTRUCT = "google/gemma-7b-it"
-    GOOGLE_CODEGEMMA_2B_INSTRUCT = "google/codegemma-2b-it"
+    GOOGLE_CODEGEMMA_2B = "google/codegemma-2b"
     GOOGLE_CODEGEMMA_7B_INSTRUCT = "google/codegemma-7b-it"
-    GOOGLE_GEMMA_2B_INSTRUCT_GGUF = "bartowski/gemma-2-2b-it-GGUF"
 
 
 class Priority(str, Enum):

readme_ready/utils/llm_utils.py

Lines changed: 10 additions & 6 deletions

@@ -22,9 +22,7 @@ def get_gemma_chat_model(model_name: str, streaming=False, model_kwargs=None):
         gguf_file = model_kwargs["gguf_file"]
         _ = hf_hub_download(model_name, gguf_file)
         tokenizer = get_tokenizer(model_name, gguf_file)
-    if (
-        sys.platform == "linux" or sys.platform == "linux2"
-    ) and "gptq" not in model_name.lower():
+    if sys.platform != "darwin" and "gptq" not in model_name.lower():
         from transformers import BitsAndBytesConfig
 
         bnb_config = BitsAndBytesConfig(
@@ -57,6 +55,10 @@ def get_gemma_chat_model(model_name: str, streaming=False, model_kwargs=None):
         PEFT_MODEL = model_kwargs["peft_model_path"]
         model = PeftModel.from_pretrained(model, PEFT_MODEL)
 
+    print(
+        f"Memory footprint: {model.get_memory_footprint() / 1024 ** 3:.2f} GB."
+    )
+
     return HuggingFacePipeline(
         pipeline=pipeline(
             "text-generation",
@@ -78,9 +80,7 @@ def get_llama_chat_model(model_name: str, streaming=False, model_kwargs=None):
         _ = hf_hub_download(model_name, gguf_file)
         tokenizer = get_tokenizer(model_name, gguf_file)
         tokenizer.pad_token = tokenizer.eos_token
-    if (
-        sys.platform == "linux" or sys.platform == "linux2"
-    ) and "gptq" not in model_name.lower():
+    if sys.platform != "darwin" and "gptq" not in model_name.lower():
         from transformers import BitsAndBytesConfig
 
         bnb_config = BitsAndBytesConfig(
@@ -112,6 +112,10 @@ def get_llama_chat_model(model_name: str, streaming=False, model_kwargs=None):
         PEFT_MODEL = model_kwargs["peft_model"]
         model = PeftModel.from_pretrained(model, PEFT_MODEL)
 
+    print(
+        f"Memory footprint: {model.get_memory_footprint() / 1024 ** 3:.2f} GB."
+    )
+
     return HuggingFacePipeline(
         pipeline=pipeline(
             "text-generation",
