Book Library Parser & Web Explorer

A comprehensive book library indexing and analysis system with genre detection, topic modeling, and a modern web interface for exploring your book collection.

Entirely vibe-coded in a single week-end so please be indulgent with the code quality. :)

-> My scriptorium website

🚀 Quick Start

1. Install Dependencies

pip install -r config/requirements.txt

2. Index Your Library

python scripts/index_library.py --directory data/full_library

3. Start the Web Application

python webapp/run_refactored.py

The web interface will be available at http://localhost:5000

📁 Project Structure

poubelle_parsing/
├── webapp/                    # Modern Flask web application
│   ├── config.py             # Configuration management
│   ├── models.py             # Database models
│   ├── services.py           # Business logic layer
│   ├── controllers.py        # Route handlers
│   ├── utils.py              # Utility functions
│   ├── extensions.py         # Flask extensions
│   ├── app_refactored.py    # Application factory
│   ├── run_refactored.py    # Web app runner
│   └── templates/            # HTML templates
├── src/                      # Core parsing modules
│   ├── epub_parser.py       # EPUB file parsing
│   ├── genre_detector.py    # Genre classification
│   ├── topic_modeler.py     # Topic modeling
│   └── database.py          # Database operations
├── scripts/                  # Indexing and utility scripts
│   ├── index_library.py     # Main indexing tool
│   ├── regenerate_subset.py # Database subset generation
│   └── tools/               # Utility tools
├── data/                     # Book library data
│   ├── full_library/        # Complete book collection
│   ├── genres/              # Genre definitions
│   └── themes/              # Theme definitions
├── databases/                # SQLite databases
│   ├── books.db             # Main database
│   └── subset_books.db      # Subset for development
├── docs/                     # Documentation
│   ├── QUICK_START.md       # Quick start guide
│   └── README_INDEXER.md    # Indexing documentation
└── config/                   # Configuration files
    └── requirements.txt      # Python dependencies

✨ Features

📚 Book Analysis

EPUB Parsing: Extract metadata and content from EPUB files
Genre Detection: Automatic genre classification using ML and rule-based methods
Topic Modeling: Extract themes and topics from book content
Text Analysis: Analyze text complexity, readability, and language
Multi-language Support: French and English content analysis

🌐 Web Interface

Modern UI: Clean, responsive web interface
Book Explorer: Browse and search your entire library
Book Details: Detailed view with metadata, analysis, and recommendations
Statistics: Library overview with charts and metrics
Recommendations: Find similar books based on content analysis
Search: Full-text search across titles, authors, and content

🗄️ Data Management

SQLite Database: Efficient storage with indexing
Auto-reload: Database changes are automatically detected
Export Capabilities: Export data in various formats
Backup System: Automatic database backups

🛠️ Development

Architecture

The webapp uses a modern modular architecture:

Models: Database schema and data structures
Services: Business logic and data processing
Controllers: HTTP route handlers
Utils: Helper functions and utilities
Extensions: Flask extensions and middleware

Running in Development

# Start with auto-reload
python webapp/run_refactored.py

# Or use Flask development server
export FLASK_DEBUG=True
python webapp/run_refactored.py

Database Management

# Index new books
python scripts/index_library.py --directory /path/to/books

# Regenerate subset database
python scripts/regenerate_subset.py

# View database statistics
python scripts/index_library.py stats

Regenerating the Webapp Database

To regenerate the webapp database using a different book collection:

Using small_library folder (recommended for development):

python scripts/index_library.py --db-path databases/subset_books.db index "data/small_library"

Using full_library folder (for production):

python scripts/index_library.py --db-path databases/books.db index "data/full_library"

Using a custom directory:

python scripts/index_library.py --db-path databases/custom.db index "/path/to/your/books"

Note: The webapp is configured to use databases/subset_books.db by default. If you regenerate the main database (books.db), you'll need to either:

Update the webapp configuration to point to the new database, or
Run python scripts/regenerate_subset.py to create a subset from the main database

Database Statistics:

# Check database contents
python scripts/index_library.py --db-path databases/subset_books.db stats

📖 Documentation

Quick Start: See docs/QUICK_START.md for getting started
Indexing Guide: See docs/README_INDEXER.md for detailed indexing instructions
Webapp Architecture: See webapp/README_REFACTORED.md for webapp documentation

🔧 Configuration

The application uses environment variables for configuration:

FLASK_DEBUG: Enable debug mode (default: True)
FLASK_HOST: Host to bind to (default: 0.0.0.0)
FLASK_PORT: Port to bind to (default: 5000)
DATABASE_PATH: Path to SQLite database

🎯 Recent Improvements

Refactored Webapp: Modular architecture with clear separation of concerns
Improved Performance: Optimized database queries and caching
Better UI: Enhanced web interface with modern design
Clean Codebase: Removed debug files and organized structure
Comprehensive Documentation: Detailed guides for all components

📄 License

This project is for educational and personal use.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
config		config
data		data
docs		docs
img		img
scripts		scripts
src		src
webapp		webapp
.gitignore		.gitignore
Dockerfile		Dockerfile
Procfile		Procfile
README.md		README.md
README_DEPLOY.md		README_DEPLOY.md
analyze_genre_issues.py		analyze_genre_issues.py
book_tags_v2.json		book_tags_v2.json
check_genres.py		check_genres.py
docker-compose.yml		docker-compose.yml
refactor_library.log		refactor_library.log
test_autocomplete.py		test_autocomplete.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Book Library Parser & Web Explorer

🚀 Quick Start

1. Install Dependencies

2. Index Your Library

3. Start the Web Application

📁 Project Structure

✨ Features

📚 Book Analysis

🌐 Web Interface

🗄️ Data Management

🛠️ Development

Architecture

Running in Development

Database Management

Regenerating the Webapp Database

📖 Documentation

🔧 Configuration

🎯 Recent Improvements

📄 License

About

Uh oh!

Releases

Packages

Languages

az-fouche/scriptorium

Folders and files

Latest commit

History

Repository files navigation

Book Library Parser & Web Explorer

🚀 Quick Start

1. Install Dependencies

2. Index Your Library

3. Start the Web Application

📁 Project Structure

✨ Features

📚 Book Analysis

🌐 Web Interface

🗄️ Data Management

🛠️ Development

Architecture

Running in Development

Database Management

Regenerating the Webapp Database

📖 Documentation

🔧 Configuration

🎯 Recent Improvements

📄 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages