Releases: huypq02/word-frequency-mini-project
Release v2.1.1
Release Notes - v2.1.1
Release Date: January 17, 2026
Type: Hotfix Release
Overview
This hotfix release addresses critical deployment issues encountered when running the application in containerized environments (Docker/Render) with non-root users.
Fixes
🐛 Container Permission Issues
Problem: Application failed to start in production due to permission errors when running as non-root user.
Issues Fixed:
1. NLTK Data Download Failures
   - Error: `PermissionError: [Errno 13] Permission denied: '/home/app'`
   - Root Cause: NLTK attempted to download language resources to `/home/app/nltk_data` at runtime, which is not writable by the `app` user
   - Solution: Pre-download NLTK data (`punkt`, `punkt_tab`, `stopwords`) during the Docker build as the root user to `/usr/local/share/nltk_data/`
   - Impact: Faster startup time, no runtime network calls, eliminates permission errors
2. Matplotlib Cache Directory Errors
   - Error: `mkdir -p failed for path /home/app/.config/matplotlib: Permission denied`
   - Root Cause: Matplotlib tried to create its cache directory in the user's home directory
   - Solution: Set the `MPLCONFIGDIR=/tmp/matplotlib` environment variable
   - Impact: Matplotlib can now write cache files to the writable `/tmp` directory
3. Fontconfig Cache Errors
   - Error: `Fontconfig error: No writable cache directories`
   - Root Cause: Fontconfig (used by matplotlib for font rendering) couldn't write its cache
   - Solution: Set the `XDG_CACHE_HOME=/tmp/.cache` environment variable
   - Impact: Eliminates font cache warnings, improves matplotlib performance
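For local debugging, the two cache fixes can be reproduced with a short sketch. This is not taken from the project's code; the paths simply mirror the `ENV` values used in the container fix:

```python
# Sketch (not from the project): reproduce the container fix locally by
# pointing matplotlib's and fontconfig's caches at a writable temp dir.
# These variables must be set before matplotlib is first imported.
import os
import tempfile

cache_root = tempfile.mkdtemp()
os.environ["MPLCONFIGDIR"] = os.path.join(cache_root, "matplotlib")
os.environ["XDG_CACHE_HOME"] = os.path.join(cache_root, ".cache")
os.makedirs(os.environ["MPLCONFIGDIR"], exist_ok=True)
```

Matplotlib reads `MPLCONFIGDIR` once at import time, which is why the container sets it via `ENV` rather than in application code.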
🚀 CI/CD Improvements
CD Workflow Enhancement
- Added conditional check to deploy only when CI tests pass
- Change: Added `if: ${{ github.event.workflow_run.conclusion == 'success' }}` to the deployment job
- Impact: Prevents deploying broken builds to production
Technical Details
Dockerfile Changes
# Pre-download NLTK data as root before switching to non-root user
RUN pip install --no-cache-dir -r requirements.txt && \
python -c "import nltk; \
nltk.download('punkt', download_dir='/usr/local/share/nltk_data'); \
nltk.download('punkt_tab', download_dir='/usr/local/share/nltk_data'); \
nltk.download('stopwords', download_dir='/usr/local/share/nltk_data')"
# Set cache directories to /tmp to avoid permission issues
ENV MPLCONFIGDIR=/tmp/matplotlib
ENV XDG_CACHE_HOME=/tmp/.cache

Files Modified
- `Dockerfile` - Added NLTK pre-download and cache environment variables
- `.github/workflows/cd.yml` - Added success-only deployment condition
Testing
✅ Verified on Render deployment platform
✅ Confirmed NLTK resources load successfully
✅ Matplotlib/fontconfig errors eliminated
✅ Application starts without permission errors
Upgrade Notes
- No breaking changes
- No database migrations required
- No API changes
- Simply redeploy using the updated Docker image
Full Changelog: v2.1.0...v2.1.1
Release v2.1.0
Release Notes - Version 2.1.0
Release Date: January 13, 2026
🎉 What's New in v2.1.0
1. CI/CD Workflow with GitHub Actions
We've implemented automated Continuous Integration and Continuous Deployment workflows to improve code quality and streamline the deployment process.
CI Pipeline (.github/workflows/ci.yml)
- Automated Linting: Code quality checks using `black` and `flake8`
- Multi-Version Testing: Automatic testing across Python 3.9, 3.10, 3.11, 3.12, and 3.13
- Code Coverage: Generates coverage reports to track test coverage
- Docker Build: Automated Docker image builds and publishing to GitHub Container Registry (ghcr.io)
- Triggers: Runs on push to the `main` branch or `releases/**` branches
CD Pipeline (.github/workflows/cd.yml)
- Automated Deployment: Deploys to Render production environment after successful CI
- Production Tracking: Environment tracking with deployment URLs
- Triggers: Runs after the CI workflow completes successfully on the `main` branch
Benefits:
- Ensures code quality before merging
- Prevents breaking changes from reaching production
- Automatic deployment on successful builds
- Multi-version Python compatibility verification
2. Basic Test Cases for import_data Function
Added initial unit tests for the import_data function to ensure reliable file handling.
Test Coverage (tests/test_text_stats.py)
- ✅ Verifies function returns a string
- ✅ Tests file reading functionality
- ✅ Ensures proper UTF-8 encoding support
Benefits:
- Validates core functionality
- Prevents regressions
- Foundation for expanded test coverage
- Automated testing in CI pipeline
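A test in this style might look like the following sketch. The local `import_data` here is a stand-in that mirrors the documented behavior (read a file, return a UTF-8 string); the project's real tests import the function from the pipeline module instead:

```python
# Sketch: a unit test in the style described above. The local import_data
# is a stand-in; real tests would import it from the project's pipeline.
import os
import tempfile
import unittest

def import_data(path):
    """Stand-in: read a text file and return its contents as a string."""
    with open(path, encoding="utf-8") as f:
        return f.read()

class TestImportData(unittest.TestCase):
    def test_reads_utf8_file_as_string(self):
        # Write a small UTF-8 file with Vietnamese diacritics
        with tempfile.NamedTemporaryFile(
            "w", suffix=".txt", encoding="utf-8", delete=False
        ) as f:
            f.write("Xin chào")
            path = f.name
        try:
            result = import_data(path)
            self.assertIsInstance(result, str)
            self.assertEqual(result, "Xin chào")
        finally:
            os.remove(path)
```

Run it with `python -m unittest`, matching the commands in the "How to Use" section.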
📚 Documentation Updates
- Added comprehensive CI/CD Pipeline section to README.md
- Documented testing procedures and commands
- Added CI/CD requirements (GitHub secrets and variables)
- Included badge status examples for repository visibility
🚀 How to Use
Run Tests Locally
# Run all tests
python -m unittest discover tests
# Run specific test
python -m unittest tests.test_text_stats

CI/CD Setup
See the CI/CD Pipeline section in README.md for complete setup instructions.
📦 Installation
No breaking changes. Simply pull the latest code:
git pull origin main
pip install -r requirements.txt
👥 Contributors
Thank you to everyone who contributed to this release!
Previous Version: 2.0.0
Current Version: 2.1.0
Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.1.0
Release v2.0.0
Release Notes - Word Frequency Mini Project
Version 2.0.0 (January 4, 2026)
🚀 Major Release - Web API & Enhanced Architecture
Complete rewrite with FastAPI web service, professional NLP processing, and automated setup.
🎯 What's New
🌐 Web API Service
- RESTful API with FastAPI framework
- Swagger UI at `/docs` for interactive testing
- Two endpoints: `/analyses/text` (JSON input) and `/analyses/file` (file upload)
- Multiple formats: JSON, CSV, PNG outputs
🔒 Security & Validation
- File size limits (5MB default)
- Content-Type validation (text/plain only)
- Pydantic models for request/response validation
- Comprehensive error handling
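A hypothetical sketch of the request model behind this validation; the field names (`text`, `format`) follow the API examples in these notes, but the actual classes in `src/app/models.py` may differ:

```python
# Hypothetical sketch (not the project's actual models.py): Pydantic
# validates the request body before it reaches the endpoint handler.
from typing import Literal

from pydantic import BaseModel, Field

class TextAnalysisRequest(BaseModel):
    text: str = Field(..., min_length=1)            # reject empty input
    format: Literal["json", "csv", "png"] = "json"  # allowed output formats
```

With a model like this, FastAPI automatically returns a 422 response when a request body fails validation.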
🛠️ Developer Experience
- One-click setup: `python start.py` or `start.bat`/`start.sh`
- Automated installer: Checks dependencies, downloads NLTK data, starts the server
- Modular architecture: Clean separation (app, pipeline, models, middleware)
- Modular architecture: Clean separation (app, pipeline, models, middleware)
- Cross-platform: Windows, Linux, macOS support
📊 Version Comparison
| Feature | v1.0.0 | v2.0.0 |
|---|---|---|
| Interface | CLI script | FastAPI REST API |
| Server | None | Uvicorn ASGI |
| Setup | Manual | Automated scripts |
| Documentation | Basic | Comprehensive + API docs |
| File Upload | Manual script execution | HTTP multipart upload |
🆕 New Dependencies
# NEW in v2.0.0
fastapi>=0.104.0 # Web framework
uvicorn[standard]>=0.24.0 # ASGI server
python-multipart>=0.0.6 # File upload support
pandas>=1.5.0 # Data manipulation (replaces csv module)
numpy>=1.21.0 # Numerical operations
nltk>=3.8 # English NLP (replaces re module)
underthesea>=1.3.0 # Vietnamese NLP (NEW)
# CARRIED OVER from v1.0.0
matplotlib>=3.5.0 # Visualization
setuptools>=65.0.0 # Build tools
wheel>=0.37.0             # Build tools

🚀 Quick Start
# Automated setup (recommended)
python start.py
# Manual setup
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
uvicorn src.app.main:app --reload --port 5000

Access: http://localhost:5000/docs
📡 API Usage Examples
Text Analysis:
curl -X POST "http://localhost:5000/analyses/text" \
-H "Content-Type: application/json" \
-d '{"text": "Xin chào! Đây là ví dụ.", "format": "json"}'

File Upload:
curl -X POST "http://localhost:5000/analyses/file" \
-F "file=@sample.txt" \
-F "format=csv"

Python:
import requests
response = requests.post("http://localhost:5000/analyses/text",
json={"text": "Hello world!", "format": "json"})
print(response.json())

⚠️ Breaking Changes
Migration required from v1.0.0:
- Installation: Install the new dependencies (`fastapi`, `uvicorn`, `nltk`, `underthesea`, `pandas`, `numpy`)
- Execution: Run via `uvicorn` or the startup scripts instead of direct Python execution
- Interface: Primary interface is the HTTP API (CLI script deprecated)
- Output: Files saved to the `output/` directory with timestamp naming
- Imports: Module structure changed to `src.pipeline.text_stats`
Direct function usage still available:
from src.pipeline import text_stats as ts
tokens = ts.preprocessing(text)
stats = ts.statistics(tokens)

🐛 Bug Fixes
- ✅ Better handling of punctuation and special characters
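The improved handling can be illustrated with a minimal sketch. This is illustrative only; the project's actual preprocessing lives in `src/pipeline/text_stats.py`:

```python
# Illustrative sketch: keep Unicode letters and digits (so Vietnamese
# diacritics survive), drop punctuation and special characters.
import re

def strip_punctuation(text: str) -> str:
    # In Python 3, \w matches Unicode word characters by default
    return " ".join(re.findall(r"\w+", text.lower()))

print(strip_punctuation("Xin chào! Đây là ví dụ."))  # xin chào đây là ví dụ
```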
📁 Project Structure
word-frequency-mini-project/
├── src/
│ ├── app/
│ │ ├── main.py # FastAPI application
│ │ ├── models.py # Request/Response models
│ │ └── middleware.py # Security middleware
│ └── pipeline/
│ └── text_stats.py # Core NLP processing
├── data/ # Input files
├── output/ # Generated reports
├── start.py # Automated setup
├── start.bat/start.sh # Platform-specific launchers
└── requirements.txt # Dependencies
🔮 Roadmap
v2.1.0 (Q2 2026):
- Word cloud visualization (from v1.0.0 roadmap)
- TF-IDF analysis (from v1.0.0 roadmap)
- Batch file processing (from v1.0.0 roadmap)
- Export to PDF
v3.0.0 (Q3 2026):
- .docx and .pdf file support (from v1.0.0 roadmap)
- Database integration
- User authentication
- Docker containerization
- Web UI frontend (from v1.0.0 roadmap)
🙏 Acknowledgments
- NLTK Team - Natural Language Toolkit
- Underthesea Team - Vietnamese NLP
- FastAPI - Modern web framework
- Matplotlib - Visualization library
Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.0.0/
Release v1.0.0
Release Notes
Version 1.0.0 (September 24, 2025)
🎉 Initial Release
Word Frequency Mini Project - A text analysis tool for Vietnamese and English documents.
✅ Completed Features
Input
- Support for `.txt` file input
- Support for Vietnamese text
- Support for English text
Text Processing
- Convert all text to lowercase
- Remove punctuation and special characters (keep only letters and numbers)
- Split text into individual words (tokenization)
Statistics
- Count the number of occurrences of each word in the text
Output
- Print the list of words with their frequencies
- Sort results in descending order of frequency
- Save results to `.csv` file
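Taken together, the statistics and output steps amount to a count-sort-save flow. A minimal sketch with the stdlib (the sample tokens are invented for illustration):

```python
# Minimal sketch of the count -> sort -> save-to-CSV flow described above.
import csv
from collections import Counter

tokens = ["học", "tập", "học", "công", "nghệ", "học"]  # invented sample
counts = Counter(tokens).most_common()  # sorted by descending frequency

with open("word_counts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Words", "Counts"])
    writer.writerows(counts)

print(counts[0])  # ('học', 3)
```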
Advanced Features (Optional)
- Remove stopwords for English text
- Remove stopwords for Vietnamese text
- Visualize results with bar chart (using matplotlib)
- Count phrase frequencies (bigram analysis)
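Bigram counting can be sketched in a few lines (sample tokens invented for illustration):

```python
# Sketch: count two-word phrases (bigrams) by pairing adjacent tokens.
from collections import Counter

def count_bigrams(tokens):
    # zip the token list against itself shifted by one position
    return Counter(zip(tokens, tokens[1:]))

tokens = ["học", "tập", "giúp", "học", "tập"]  # invented sample
print(count_bigrams(tokens).most_common(1))  # [(('học', 'tập'), 2)]
```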
📁 Output Sample
| Words | Counts |
|---|---|
| học tập | 6 |
| công nghệ | 4 |
| phát triển | 4 |
| giúp | 4 |
| mà còn | 2 |
| xã hội | 2 |
🛠️ Tech Stack
- Language: Python 3.x
- Libraries:
  - `matplotlib` - Data visualization
  - `csv` - CSV file handling
  - `re` - Regular expressions for text processing
📝 Notes
- First stable release
- Tested with Vietnamese educational content
- CSV output compatible with Excel and data analysis tools
🔮 Future Improvements
- Support for additional file formats (`.docx`, `.pdf`)
- Web interface for file upload
- Word cloud visualization
- TF-IDF analysis
- Multi-file batch processing