Local Docs Manager

A Python application for managing local documentation files, extracting GitHub repository URLs, structuring notes, and providing a searchable web interface.

Features

  • 📄 File Parser: Scan and parse .txt and .md files to extract GitHub repository URLs with descriptions
  • 🔗 GitHub API Integration: Fetch repository metadata (name, description, topics, stars, language) with rate limiting
  • 📝 Notes Structuring: Parse freeform notes into organized format with topics, summaries, and keywords
  • 💾 SQLite Database: Store repositories, notes, and files with full-text search capabilities
  • 🔍 Web Interface: Flask-based UI for searching, filtering, and browsing content
  • 🤖 Optional LLM Integration: Use OpenAI API for enhanced content analysis and summarization

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

  1. Clone the repository:
git clone https://github.com/lshtech2021/local-docs-manager.git
cd local-docs-manager
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure environment variables:
cp .env.example .env
# Edit .env with your configuration

Environment Configuration

Edit the .env file with your settings:

  • GITHUB_TOKEN: Your GitHub personal access token (optional but recommended)
  • OPENAI_API_KEY: Your OpenAI API key (optional, for LLM features)
  • FLASK_SECRET_KEY: Secret key for Flask sessions
  • DATABASE_PATH: Path to SQLite database file
  • DOCS_FOLDER: Path to your documentation folder
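
As a rough illustration of how these variables might be read at startup, the sketch below uses only the standard library. The `load_settings` helper and every default value are assumptions made for this example; the project's own config.py may work differently.

```python
import os


def load_settings(env=None):
    """Collect the documented settings from an environment mapping.

    The defaults below are illustrative assumptions, not the project's values.
    """
    env = os.environ if env is None else env
    return {
        # Optional: empty or missing values are treated as "not configured".
        "GITHUB_TOKEN": env.get("GITHUB_TOKEN") or None,
        # Optional: only needed for the LLM features.
        "OPENAI_API_KEY": env.get("OPENAI_API_KEY") or None,
        "FLASK_SECRET_KEY": env.get("FLASK_SECRET_KEY", "change-me"),
        "DATABASE_PATH": env.get("DATABASE_PATH", "docs.db"),
        "DOCS_FOLDER": env.get("DOCS_FOLDER", "docs"),
    }


# Show which settings are present without printing secret values.
settings = load_settings()
print({key: bool(value) for key, value in settings.items()})
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in tests.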

Usage

Commands

The application provides four main commands:

1. Scan Local Files

Scan your documentation folder for .txt and .md files and extract GitHub URLs and notes:

python main.py scan

With LLM enhancement (requires OpenAI API key):

python main.py scan --llm
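
To make the scan step more concrete, here is a minimal sketch of URL extraction over a folder. The regular expression, the function names, and the recursive directory walk are assumptions for illustration; parsers/file_parser.py may use different rules.

```python
import re
from pathlib import Path

# Matches https://github.com/<owner>/<repo>; this pattern is an assumption.
GITHUB_URL = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)")


def extract_repo_urls(text):
    """Return unique, normalized repository URLs in order of appearance."""
    seen = []
    for owner, repo in GITHUB_URL.findall(text):
        repo = repo.rstrip(".")  # drop sentence-ending punctuation
        if repo.endswith(".git"):
            repo = repo[:-4]
        if not repo:
            continue
        url = f"https://github.com/{owner}/{repo}"
        if url not in seen:
            seen.append(url)
    return seen


def scan_folder(folder):
    """Map each .txt/.md file under `folder` to the URLs found in it."""
    results = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".txt", ".md"}:
            # Undecodable bytes are replaced rather than aborting the scan.
            text = path.read_text(encoding="utf-8", errors="replace")
            results[str(path)] = extract_repo_urls(text)
    return results
```

Normalizing each match to the owner/repo form means links to issues or subpages collapse into a single repository entry.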

2. Fetch GitHub Metadata

Fetch metadata for all repositories in the database:

python main.py fetch
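
The fetch step can be pictured with the helpers below. They rely on GitHub's public REST API (the `/repos/{owner}/{repo}` endpoint and the `X-RateLimit-Remaining` / `X-RateLimit-Reset` response headers), but the function names and the way the pieces fit together are assumptions, not the contents of api/github_api.py. No network call is made here.

```python
import time

API_ROOT = "https://api.github.com/repos"


def api_url(repo_url):
    """Turn https://github.com/<owner>/<repo> into the REST API URL."""
    owner, repo = repo_url.rstrip("/").split("/")[-2:]
    return f"{API_ROOT}/{owner}/{repo}"


def pick_metadata(payload):
    """Keep only the fields this README lists for a repository."""
    return {
        "name": payload.get("name"),
        "description": payload.get("description"),
        "topics": ",".join(payload.get("topics", [])),
        "stars": payload.get("stargazers_count", 0),
        "language": payload.get("language"),
    }


def seconds_until_reset(headers, now=None):
    """Return how long to pause when the rate limit is exhausted, else 0.

    `headers` is a plain dict here, so key lookup is case-sensitive.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - now)
```

A client built on these would sleep for `seconds_until_reset(...)` before the next request, which is one simple way to implement the rate limiting mentioned in the feature list.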

3. Start Web Server

Launch the Flask web interface:

python main.py server

Then open your browser to http://localhost:5000

4. View Statistics

Display database statistics:

python main.py stats

Project Structure

local-docs-manager/
├── main.py               # Main entry point
├── config.py             # Configuration management
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── .gitignore            # Git ignore rules
├── README.md             # This file
│
├── parsers/              # Parser modules
│   ├── __init__.py
│   ├── file_parser.py    # GitHub URL extraction
│   └── notes_parser.py   # Note structuring
│
├── api/                  # API integration modules
│   ├── __init__.py
│   ├── github_api.py     # GitHub API client
│   └── llm_api.py        # OpenAI LLM integration
│
├── database/             # Database layer
│   ├── __init__.py
│   ├── models.py         # Data models
│   └── db_manager.py     # Database operations
│
├── ui/                   # Web interface
│   ├── __init__.py
│   └── app.py            # Flask application
│
└── templates/            # HTML templates
    ├── base.html
    ├── index.html
    ├── repositories.html
    ├── notes.html
    ├── files.html
    └── search.html

Web Interface

The Flask web interface provides:

  • Home: Dashboard with statistics and recent items
  • Repositories: Browse and filter GitHub repositories
  • Notes: View and search structured notes
  • Files: Track processed files
  • Search: Unified search across all content

Search Features

  • Filter repositories by keyword, topic, or language
  • Search notes by keyword or topic
  • View repository details including stars, language, and topics
  • Browse note content with expandable details

API Endpoints

The application also provides REST API endpoints:

  • GET /api/stats - Get database statistics
  • GET /api/repositories?keyword=&topic=&language= - Search repositories
  • GET /api/notes?keyword=&topic= - Search notes
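
A small client-side helper can assemble the query strings for these endpoints. The sketch below only builds URLs; `build_search_url` is a hypothetical name, and the base address assumes the server is running on the default port from the Usage section.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # default address from the Usage section


def build_search_url(resource, **filters):
    """Build a query URL for /api/repositories or /api/notes.

    Empty filters are dropped so the query string stays short.
    """
    if resource not in {"repositories", "notes"}:
        raise ValueError(f"unknown resource: {resource}")
    query = urlencode({key: value for key, value in filters.items() if value})
    url = f"{BASE_URL}/api/{resource}"
    return f"{url}?{query}" if query else url


print(build_search_url("repositories", keyword="flask", language="Python"))
# http://localhost:5000/api/repositories?keyword=flask&language=Python
```

The resulting URL can be passed to any HTTP client once `python main.py server` is running.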

Database Schema

Repositories Table

  • repo_id: Primary key
  • url: Unique repository URL
  • name: Repository name
  • description: Repository description
  • topics: Comma-separated topics/tags
  • stars: Star count
  • language: Primary language
  • created_at: Creation timestamp
  • updated_at: Update timestamp

Notes Table

  • note_id: Primary key
  • topic: Note topic/title
  • summary: Brief summary
  • keywords: Comma-separated keywords
  • source_file: Source file path
  • content: Full note content
  • created_at: Creation timestamp
  • updated_at: Update timestamp

Files Table

  • file_id: Primary key
  • filepath: Unique file path
  • file_type: File extension
  • status: Processing status (pending/processed/error)
  • error_message: Error details if any
  • created_at: Creation timestamp
  • updated_at: Update timestamp
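
One way the three tables above could be expressed in SQLite is sketched below. Column types, defaults, and the CHECK constraint are assumptions inferred from the field descriptions, and the `LIKE` query is a simplified stand-in for the full-text search; database/models.py and db_manager.py may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS repositories (
    repo_id     INTEGER PRIMARY KEY,
    url         TEXT NOT NULL UNIQUE,
    name        TEXT,
    description TEXT,
    topics      TEXT,              -- comma-separated
    stars       INTEGER DEFAULT 0,
    language    TEXT,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS notes (
    note_id     INTEGER PRIMARY KEY,
    topic       TEXT,
    summary     TEXT,
    keywords    TEXT,              -- comma-separated
    source_file TEXT,
    content     TEXT,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS files (
    file_id       INTEGER PRIMARY KEY,
    filepath      TEXT NOT NULL UNIQUE,
    file_type     TEXT,
    status        TEXT DEFAULT 'pending'
                  CHECK (status IN ('pending', 'processed', 'error')),
    error_message TEXT,
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn):
    """Create the three tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def find_repositories(conn, keyword):
    """Keyword filter using LIKE; a stand-in for the full-text search."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT url FROM repositories "
        "WHERE name LIKE ? OR description LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Calling `create_schema(sqlite3.connect(":memory:"))` gives a throwaway database that is convenient for experimenting with queries.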

LLM Integration

When enabled with the --llm flag, the application uses OpenAI's API to:

  • Generate summaries for repositories
  • Extract keywords from notes
  • Categorize content automatically
  • Enhance note metadata

Note: Requires a valid OpenAI API key in .env

Error Handling

The application includes comprehensive error handling:

  • Graceful handling of network failures
  • Rate limiting for GitHub API
  • File encoding error handling
  • Database transaction safety
  • Logging of all operations
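
As an example of what file encoding error handling can look like, the sketch below tries a list of encodings before falling back to lossy decoding. The `read_text_lenient` helper and the choice of fallback encodings are assumptions for illustration, not the project's actual behavior.

```python
import logging

logger = logging.getLogger(__name__)


def read_text_lenient(path, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn; fall back to replacing undecodable bytes."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Record the failure and move on to the next candidate encoding.
            logger.warning("%s is not valid %s", path, encoding)
    return raw.decode("utf-8", errors="replace")
```

Reading the bytes once and decoding in memory avoids reopening the file for every attempt.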

Development

Adding New Features

The modular structure makes it easy to extend:

  • Add new parsers in parsers/
  • Add new API integrations in api/
  • Extend database models in database/models.py
  • Add new routes in ui/app.py
  • Create new templates in templates/

Logging

All modules use Python's logging system. Configure the log level in config.py:

LOG_LEVEL = 'INFO'  # DEBUG, INFO, WARNING, ERROR, CRITICAL
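
A setting like this is typically applied through the standard `logging` module. The sketch below shows one way to do that; `configure_logging`, the fallback to INFO, and the message format are assumptions rather than the project's code.

```python
import logging

LOG_LEVEL = "INFO"  # same name as the setting shown above


def configure_logging(level_name=LOG_LEVEL):
    """Apply a level given by name, falling back to INFO for unknown names."""
    level = getattr(logging, str(level_name).upper(), None)
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return level
```

Note that `logging.basicConfig` only takes effect the first time it configures the root logger, so a helper like this is best called once at startup.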

Best Practices

  • Place your documentation files in the configured DOCS_FOLDER
  • Use .txt or .md extensions, since these are the file types the scanner reads
  • Run scan before fetch so the database contains repositories to fetch metadata for
  • Set a GitHub token to avoid API rate limits
  • Use LLM features sparingly to manage API costs

Troubleshooting

No files found

  • Check DOCS_FOLDER path in .env
  • Ensure files have .txt or .md extensions

GitHub API rate limit

  • Add a GITHUB_TOKEN to .env
  • Wait for the rate limit to reset (the reset time is shown in the logs)

OpenAI errors

  • Verify OPENAI_API_KEY in .env
  • Check API quota and billing

License

This project is open source and available under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

Author

Created by lshtech2021
