A Python application for managing local documentation files, extracting GitHub repository URLs, structuring notes, and providing a searchable web interface.
- File Parser: Scan and parse .txt and .md files to extract GitHub repository URLs with descriptions
- GitHub API Integration: Fetch repository metadata (name, description, topics, stars, language) with rate limiting
- Notes Structuring: Parse freeform notes into an organized format with topics, summaries, and keywords
- SQLite Database: Store repositories, notes, and files with full-text search capabilities
- Web Interface: Flask-based UI for searching, filtering, and browsing content
- Optional LLM Integration: Use the OpenAI API for enhanced content analysis and summarization
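As an illustration of the File Parser feature, here is a minimal sketch of URL extraction with a regular expression. The pattern and the function name `extract_github_urls` are assumptions made for this example; the actual logic in `parsers/file_parser.py` may differ.

```python
import re

# Assumed pattern: matches http(s)://github.com/<owner>/<repo>.
# The project's real regex is not shown in this README.
GITHUB_URL_RE = re.compile(
    r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
)


def extract_github_urls(text):
    """Return unique, normalized repository URLs in order of first appearance."""
    urls = []
    for owner, repo in GITHUB_URL_RE.findall(text):
        repo = repo.rstrip(".")      # drop a sentence-ending period
        if repo.endswith(".git"):    # normalize clone URLs
            repo = repo[:-4]
        url = f"https://github.com/{owner}/{repo}"
        if repo and url not in urls:
            urls.append(url)
    return urls
```

Normalizing to `https://github.com/<owner>/<repo>` means that clone URLs, issue links, and plain links to the same repository collapse into one entry.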
- Python 3.8 or higher
- pip package manager
- Clone the repository:

git clone https://github.com/lshtech2021/local-docs-manager.git
cd local-docs-manager

- Install dependencies:

pip install -r requirements.txt

- Configure environment variables:

cp .env.example .env
# Edit .env with your configuration

Edit the .env file with your settings:
- GITHUB_TOKEN: Your GitHub personal access token (optional but recommended)
- OPENAI_API_KEY: Your OpenAI API key (optional, for LLM features)
- FLASK_SECRET_KEY: Secret key for Flask sessions
- DATABASE_PATH: Path to SQLite database file
- DOCS_FOLDER: Path to your documentation folder
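The settings above could be read from the environment along these lines. This is a sketch only: the function name `load_config` and all default values are assumptions, and it presumes the values from .env have already been exported into the process environment by whatever mechanism the project uses.

```python
import os


def load_config(env=None):
    """Collect the documented settings from an environment mapping."""
    env = os.environ if env is None else env
    return {
        # Optional credentials: an empty string is treated as "not set".
        "GITHUB_TOKEN": env.get("GITHUB_TOKEN") or None,
        "OPENAI_API_KEY": env.get("OPENAI_API_KEY") or None,
        # The defaults below are placeholders chosen for this example.
        "FLASK_SECRET_KEY": env.get("FLASK_SECRET_KEY", "change-me"),
        "DATABASE_PATH": env.get("DATABASE_PATH", "docs.db"),
        "DOCS_FOLDER": env.get("DOCS_FOLDER", "./docs"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.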
The application provides four main commands:

Scan your documentation folder for .txt and .md files, extracting GitHub URLs and notes:

python main.py scan

With LLM enhancement (requires an OpenAI API key):

python main.py scan --llm

Fetch metadata for all repositories in the database:

python main.py fetch

Launch the Flask web interface:

python main.py server

Then open your browser to http://localhost:5000

Display database statistics:

python main.py stats

local-docs-manager/
├── main.py                 # Main entry point
├── config.py               # Configuration management
├── requirements.txt        # Python dependencies
├── .env.example            # Environment template
├── .gitignore              # Git ignore rules
├── README.md               # This file
│
├── parsers/                # Parser modules
│   ├── __init__.py
│   ├── file_parser.py      # GitHub URL extraction
│   └── notes_parser.py     # Note structuring
│
├── api/                    # API integration modules
│   ├── __init__.py
│   ├── github_api.py       # GitHub API client
│   └── llm_api.py          # OpenAI LLM integration
│
├── database/               # Database layer
│   ├── __init__.py
│   ├── models.py           # Data models
│   └── db_manager.py       # Database operations
│
├── ui/                     # Web interface
│   ├── __init__.py
│   └── app.py              # Flask application
│
└── templates/              # HTML templates
    ├── base.html
    ├── index.html
    ├── repositories.html
    ├── notes.html
    ├── files.html
    └── search.html
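Since main.py is the entry point for the scan, fetch, server, and stats commands, its argument handling might resemble the following `argparse` sketch. The helper name `build_parser` and the help strings are assumptions; only the command names and the `--llm` flag come from this README.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per documented command."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    scan = sub.add_parser("scan", help="Scan DOCS_FOLDER for .txt and .md files")
    scan.add_argument("--llm", action="store_true",
                      help="Enable LLM enhancement (needs OPENAI_API_KEY)")

    sub.add_parser("fetch", help="Fetch GitHub metadata for stored repositories")
    sub.add_parser("server", help="Launch the Flask web interface")
    sub.add_parser("stats", help="Display database statistics")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # A real entry point would dispatch to the matching module here.
    print(f"Selected command: {args.command}")
```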
The Flask web interface provides:
- Home: Dashboard with statistics and recent items
- Repositories: Browse and filter GitHub repositories
- Notes: View and search structured notes
- Files: Track processed files
- Search: Unified search across all content
- Filter repositories by keyword, topic, or language
- Search notes by keyword or topic
- View repository details including stars, language, and topics
- Browse note content with expandable details
The application also provides REST API endpoints:
- GET /api/stats - Get database statistics
- GET /api/repositories?keyword=&topic=&language= - Search repositories
- GET /api/notes?keyword=&topic= - Search notes
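A small client for these endpoints could be sketched with the standard library as follows. The helper names `api_url` and `fetch_json` are hypothetical, the base URL assumes the server is running locally on port 5000, and the assumption that responses are JSON is not confirmed by this README.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumes a locally running server


def api_url(path, **filters):
    """Build an endpoint URL, omitting filters that are empty."""
    query = urlencode({k: v for k, v in filters.items() if v})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def fetch_json(path, **filters):
    """Call an endpoint and decode the body, assuming it is JSON."""
    with urlopen(api_url(path, **filters)) as response:
        return json.load(response)
```

For example, `api_url("/api/repositories", keyword="flask", language="Python")` yields a URL carrying only the two filters that were supplied, and `fetch_json("/api/stats")` would retrieve the statistics while the server is running.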
Repositories:
- repo_id: Primary key
- url: Unique repository URL
- name: Repository name
- description: Repository description
- topics: Comma-separated topics/tags
- stars: Star count
- language: Primary language
- created_at: Creation timestamp
- updated_at: Update timestamp

Notes:
- note_id: Primary key
- topic: Note topic/title
- summary: Brief summary
- keywords: Comma-separated keywords
- source_file: Source file path
- content: Full note content
- created_at: Creation timestamp
- updated_at: Update timestamp

Files:
- file_id: Primary key
- filepath: Unique file path
- file_type: File extension
- status: Processing status (pending/processed/error)
- error_message: Error details if any
- created_at: Creation timestamp
- updated_at: Update timestamp
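The fields listed above could map onto SQLite tables roughly as shown below. The table names, column types, defaults, and the `init_db` helper are all assumptions for illustration; the real definitions live in `database/models.py` and `database/db_manager.py` and may differ.

```python
import sqlite3

# Assumed table names and column types; only the column names are documented.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repositories (
    repo_id     INTEGER PRIMARY KEY,
    url         TEXT UNIQUE NOT NULL,
    name        TEXT,
    description TEXT,
    topics      TEXT,              -- comma-separated
    stars       INTEGER,
    language    TEXT,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS notes (
    note_id     INTEGER PRIMARY KEY,
    topic       TEXT,
    summary     TEXT,
    keywords    TEXT,              -- comma-separated
    source_file TEXT,
    content     TEXT,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS files (
    file_id       INTEGER PRIMARY KEY,
    filepath      TEXT UNIQUE NOT NULL,
    file_type     TEXT,
    status        TEXT DEFAULT 'pending'
                  CHECK (status IN ('pending', 'processed', 'error')),
    error_message TEXT,
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path=":memory:"):
    """Open a connection and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The `UNIQUE` constraints mirror the "unique repository URL" and "unique file path" descriptions, so re-scanning the same file or URL would raise `sqlite3.IntegrityError` unless the insert is written as an upsert.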
When enabled with the --llm flag, the application uses OpenAI's API to:
- Generate summaries for repositories
- Extract keywords from notes
- Categorize content automatically
- Enhance note metadata
Note: Requires a valid OpenAI API key in .env
The application includes comprehensive error handling:
- Graceful handling of network failures
- Rate limiting for GitHub API
- File encoding error handling
- Database transaction safety
- Logging of all operations
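To illustrate the rate-limiting item, the sketch below computes how long to pause based on the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers that the GitHub REST API returns. The function name and the plain-dictionary header lookup are assumptions; the project's own client in `api/github_api.py` may handle this differently.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request (0 if quota remains).

    `headers` is a mapping of response headers; the reset header holds a
    UTC epoch timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0.0
    # Never return a negative wait if the reset time has already passed.
    return float(max(0, reset_at - now))
```

A caller could pass the result to `time.sleep()` before retrying. Note that a plain `dict` lookup is case-sensitive, so an HTTP library's case-insensitive header object is the safer input.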
The modular structure makes it easy to extend:
- Add new parsers in parsers/
- Add new API integrations in api/
- Extend database models in database/models.py
- Add new routes in ui/app.py
- Create new templates in templates/
All modules use Python's logging system. Configure the log level in config.py:

LOG_LEVEL = 'INFO' # DEBUG, INFO, WARNING, ERROR, CRITICAL

- Place your documentation files in the configured DOCS_FOLDER
- Use .txt or .md extensions for better parsing
- Run scan before fetch to populate repository data
- Set a GitHub token to avoid API rate limits
- Use LLM features sparingly to manage API costs
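For reference, the LOG_LEVEL setting in config.py could be applied to Python's logging system as sketched here. The helper `configure_logging` and the log format string are assumptions, not the project's actual code.

```python
import logging

LOG_LEVEL = "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL


def configure_logging(level_name=LOG_LEVEL):
    """Translate a level name into a logging constant and apply it."""
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {level_name}")
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return level
```

Rejecting unknown names early avoids silently running at the wrong verbosity after a typo in the configuration.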
No files found:
- Check the DOCS_FOLDER path in .env
- Ensure files have .txt or .md extensions

GitHub API rate limit errors:
- Add a GITHUB_TOKEN to .env
- Wait for the rate limit to reset (shown in logs)

LLM features not working:
- Verify OPENAI_API_KEY in .env
- Check API quota and billing
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit pull requests or open issues.
Created by lshtech2021