
Releases: huypq02/word-frequency-mini-project

Release v2.1.1

17 Jan 16:53


Release Notes - v2.1.1

Release Date: January 17, 2026
Type: Hotfix Release

Overview

This hotfix release addresses critical deployment issues encountered when running the application in containerized environments (Docker/Render) with non-root users.

Fixes

🐛 Container Permission Issues

Problem: The application failed to start in production due to permission errors when running as a non-root user.

Issues Fixed:

  1. NLTK Data Download Failures

    • Error: PermissionError: [Errno 13] Permission denied: '/home/app'
    • Root Cause: NLTK attempted to download language resources to /home/app/nltk_data at runtime, which is not writable by the app user
    • Solution: Pre-download NLTK data (punkt, punkt_tab, stopwords) during Docker build as root user to /usr/local/share/nltk_data/
    • Impact: Faster startup time, no runtime network calls, eliminates permission errors
  2. Matplotlib Cache Directory Errors

    • Error: mkdir -p failed for path /home/app/.config/matplotlib: Permission denied
    • Root Cause: Matplotlib tried to create cache directory in user home directory
    • Solution: Set MPLCONFIGDIR=/tmp/matplotlib environment variable
    • Impact: Matplotlib can now write cache files to the writable /tmp directory
  3. Fontconfig Cache Errors

    • Error: Fontconfig error: No writable cache directories
    • Root Cause: Fontconfig (used by matplotlib for font rendering) couldn't write its cache files
    • Solution: Set XDG_CACHE_HOME=/tmp/.cache environment variable
    • Impact: Eliminates font cache warnings, improves matplotlib performance
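As a quick sanity check (not part of the release itself), a container startup script could confirm that the redirected cache directories are actually writable. The helper below is an illustrative stdlib-only sketch; the function name is an assumption, not code from this project:

```python
import os
import tempfile

def is_writable_dir(path):
    """Return True if `path` exists (or can be created) and allows writes."""
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False

# The Dockerfile points both caches at /tmp, which any container user can write.
for var, default in [("MPLCONFIGDIR", "/tmp/matplotlib"),
                     ("XDG_CACHE_HOME", "/tmp/.cache")]:
    path = os.environ.get(var, default)
    print(var, "->", path, "writable:", is_writable_dir(path))
```

Running this once at container start surfaces permission problems immediately instead of as a mid-request matplotlib failure.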

🚀 CI/CD Improvements

CD Workflow Enhancement

  • Added conditional check to deploy only when CI tests pass
  • Change: Added if: ${{ github.event.workflow_run.conclusion == 'success' }} to deployment job
  • Impact: Prevents deploying broken builds to production
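In `.github/workflows/cd.yml`, the guard described above could look like the following fragment. Only the `if:` condition is taken from this release; the workflow name, job name, and runner are illustrative assumptions:

```yaml
# Illustrative cd.yml fragment; workflow/job names are assumptions.
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  deploy:
    # Skip deployment unless the triggering CI run succeeded
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
```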

Technical Details

Dockerfile Changes

# Pre-download NLTK data as root before switching to non-root user
RUN pip install --no-cache-dir -r requirements.txt && \
    python -c "import nltk; \
    nltk.download('punkt', download_dir='/usr/local/share/nltk_data'); \
    nltk.download('punkt_tab', download_dir='/usr/local/share/nltk_data'); \
    nltk.download('stopwords', download_dir='/usr/local/share/nltk_data')"

# Set cache directories to /tmp to avoid permission issues
ENV MPLCONFIGDIR=/tmp/matplotlib
ENV XDG_CACHE_HOME=/tmp/.cache

Files Modified

  • Dockerfile - Added NLTK pre-download and cache environment variables
  • .github/workflows/cd.yml - Added success-only deployment condition

Testing

✅ Verified on Render deployment platform
✅ Confirmed NLTK resources load successfully
✅ Matplotlib/fontconfig errors eliminated
✅ Application starts without permission errors

Upgrade Notes

  • No breaking changes
  • No database migrations required
  • No API changes
  • Simply redeploy using the updated Docker image

Full Changelog: v2.1.0...v2.1.1

Release v2.1.0

12 Jan 18:37


Release Notes - Version 2.1.0

Release Date: January 13, 2026


🎉 What's New in v2.1.0

1. CI/CD Workflow with GitHub Actions

We've implemented automated Continuous Integration and Continuous Deployment workflows to improve code quality and streamline the deployment process.

CI Pipeline (.github/workflows/ci.yml)

  • Automated Linting: Code quality checks using black and flake8
  • Multi-Version Testing: Automatic testing across Python 3.9, 3.10, 3.11, 3.12, and 3.13
  • Code Coverage: Generates coverage reports to track test coverage
  • Docker Build: Automated Docker image builds and publishing to GitHub Container Registry (ghcr.io)
  • Triggers: Runs on push to main branch or releases/** branches

CD Pipeline (.github/workflows/cd.yml)

  • Automated Deployment: Deploys to Render production environment after successful CI
  • Production Tracking: Environment tracking with deployment URLs
  • Triggers: Runs after CI workflow completes successfully on main branch

Benefits:

  • Ensures code quality before merging
  • Prevents breaking changes from reaching production
  • Automatic deployment on successful builds
  • Multi-version Python compatibility verification

2. Basic Test Cases for import_data Function

Added initial unit tests for the import_data function to ensure reliable file handling.

Test Coverage (tests/test_text_stats.py)

  • ✅ Verifies function returns a string
  • ✅ Tests file reading functionality
  • ✅ Ensures proper UTF-8 encoding support
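A test along these lines could be sketched as follows. The inline `import_data` here is a hypothetical stand-in for `src.pipeline.text_stats.import_data` (assumed to read a UTF-8 text file and return its contents), so the real tests in `tests/test_text_stats.py` may differ:

```python
import os
import tempfile
import unittest

def import_data(path):
    """Hypothetical stand-in for src.pipeline.text_stats.import_data:
    read a UTF-8 text file and return its contents as a string."""
    with open(path, encoding="utf-8") as f:
        return f.read()

class TestImportData(unittest.TestCase):
    def setUp(self):
        # Create a temporary UTF-8 file with Vietnamese content
        fd, self.path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("Xin chào thế giới")

    def tearDown(self):
        os.remove(self.path)

    def test_returns_string(self):
        self.assertIsInstance(import_data(self.path), str)

    def test_reads_utf8_content(self):
        self.assertIn("chào", import_data(self.path))

if __name__ == "__main__":
    unittest.main(exit=False)  # run the two checks above
```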

Benefits:

  • Validates core functionality
  • Prevents regressions
  • Foundation for expanded test coverage
  • Automated testing in CI pipeline

📚 Documentation Updates

  • Added comprehensive CI/CD Pipeline section to README.md
  • Documented testing procedures and commands
  • Added CI/CD requirements (GitHub secrets and variables)
  • Included badge status examples for repository visibility

🚀 How to Use

Run Tests Locally

# Run all tests
python -m unittest discover tests

# Run specific test
python -m unittest tests.test_text_stats

CI/CD Setup

See the CI/CD Pipeline section in README.md for complete setup instructions.


📦 Installation

No breaking changes. Simply pull the latest code:

git pull origin main
pip install -r requirements.txt


👥 Contributors

Thank you to everyone who contributed to this release!


Previous Version: 2.0.0
Current Version: 2.1.0


Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.1.0

Release v2.0.0

04 Jan 17:02
694a19d


Release Notes - Word Frequency Mini Project

Version 2.0.0 (January 4, 2026)

🚀 Major Release - Web API & Enhanced Architecture

Complete rewrite with FastAPI web service, professional NLP processing, and automated setup.


🎯 What's New

🌐 Web API Service

  • RESTful API with FastAPI framework
  • Swagger UI at /docs for interactive testing
  • Two endpoints: /analyses/text (JSON input) and /analyses/file (file upload)
  • Multiple formats: JSON, CSV, PNG outputs

🔒 Security & Validation

  • File size limits (5MB default)
  • Content-Type validation (text/plain only)
  • Pydantic models for request/response validation
  • Comprehensive error handling
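The size and content-type rules could be expressed as a small pure function like the sketch below; `validate_upload` and `MAX_UPLOAD_BYTES` are illustrative names, not the app's actual API:

```python
# Illustrative sketch of the validation rules above; names are assumptions.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB default limit

def validate_upload(content_type: str, size_bytes: int):
    """Return (ok, reason) for an incoming upload."""
    if content_type != "text/plain":
        return False, "unsupported Content-Type"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds size limit"
    return True, "ok"

print(validate_upload("text/plain", 1024))  # accepted
print(validate_upload("image/png", 1024))   # rejected: wrong type
```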

🛠️ Developer Experience

  • One-click setup: python start.py or start.bat/start.sh
  • Automated installer: Checks dependencies, downloads NLTK data, starts server
  • Modular architecture: Clean separation (app, pipeline, models, middleware)
  • Cross-platform: Windows, Linux, macOS support

📊 Version Comparison

Feature       | v1.0.0                  | v2.0.0
------------- | ----------------------- | ------------------------
Interface     | CLI script              | FastAPI REST API
Server        | None                    | Uvicorn ASGI
Setup         | Manual                  | Automated scripts
Documentation | Basic                   | Comprehensive + API docs
File Upload   | Manual script execution | HTTP multipart upload

🆕 New Dependencies

# NEW in v2.0.0
fastapi>=0.104.0          # Web framework
uvicorn[standard]>=0.24.0 # ASGI server
python-multipart>=0.0.6   # File upload support
pandas>=1.5.0             # Data manipulation (replaces csv module)
numpy>=1.21.0             # Numerical operations
nltk>=3.8                 # English NLP (replaces re module)
underthesea>=1.3.0        # Vietnamese NLP (NEW)

# CARRIED OVER from v1.0.0
matplotlib>=3.5.0         # Visualization
setuptools>=65.0.0        # Build tools
wheel>=0.37.0             # Build tools

🚀 Quick Start

# Automated setup (recommended)
python start.py

# Manual setup
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
uvicorn src.app.main:app --reload --port 5000

Access: http://localhost:5000/docs


📡 API Usage Examples

Text Analysis:

curl -X POST "http://localhost:5000/analyses/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Xin chào! Đây là ví dụ.", "format": "json"}'

File Upload:

curl -X POST "http://localhost:5000/analyses/file" \
  -F "file=@sample.txt" \
  -F "format=csv"

Python:

import requests
response = requests.post("http://localhost:5000/analyses/text",
    json={"text": "Hello world!", "format": "json"})
print(response.json())

⚠️ Breaking Changes

Migration required from v1.0.0:

  1. Installation: Install new dependencies (fastapi, uvicorn, nltk, underthesea, pandas, numpy)
  2. Execution: Run via uvicorn or startup scripts instead of direct Python execution
  3. Interface: Primary interface is HTTP API (CLI script deprecated)
  4. Output: Files saved to output/ directory with timestamp naming
  5. Imports: Module structure changed to src.pipeline.text_stats

Direct function usage still available:

from src.pipeline import text_stats as ts
tokens = ts.preprocessing(text)
stats = ts.statistics(tokens)

🐛 Bug Fixes

  • ✅ Better handling of punctuation and special characters

📁 Project Structure

word-frequency-mini-project/
├── src/
│   ├── app/
│   │   ├── main.py          # FastAPI application
│   │   ├── models.py        # Request/Response models
│   │   └── middleware.py    # Security middleware
│   └── pipeline/
│       └── text_stats.py    # Core NLP processing
├── data/                    # Input files
├── output/                  # Generated reports
├── start.py                 # Automated setup
├── start.bat/start.sh       # Platform-specific launchers
└── requirements.txt         # Dependencies

🔮 Roadmap

v2.1.0 (Q2 2026):

  • Word cloud visualization (from v1.0.0 roadmap)
  • TF-IDF analysis (from v1.0.0 roadmap)
  • Batch file processing (from v1.0.0 roadmap)
  • Export to PDF

v3.0.0 (Q3 2026):

  • .docx and .pdf file support (from v1.0.0 roadmap)
  • Database integration
  • User authentication
  • Docker containerization
  • Web UI frontend (from v1.0.0 roadmap)

🙏 Acknowledgments

  • NLTK Team - Natural Language Toolkit
  • Underthesea Team - Vietnamese NLP
  • FastAPI - Modern web framework
  • Matplotlib - Visualization library


Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.0.0/

Release v1.0.0

04 Dec 16:58
cd1454d


Release Notes

Version 1.0.0 (September 24, 2025)

🎉 Initial Release

Word Frequency Mini Project - A text analysis tool for Vietnamese and English documents.


✅ Completed Features

Input

  • Support for .txt file input
  • Support for Vietnamese text
  • Support for English text

Text Processing

  • Convert all text to lowercase
  • Remove punctuation and special characters (keep only letters and numbers)
  • Split text into individual words (tokenization)

Statistics

  • Count the number of occurrences of each word in the text

Output

  • Print the list of words with their frequencies
  • Sort results in descending order of frequency
  • Save results to .csv file
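Taken together, the processing, statistics, and output steps above amount to a small pipeline. The stdlib-only sketch below illustrates it; the function name is illustrative, not the project's actual code:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase, strip punctuation, tokenize, count, and sort by
    descending frequency -- the v1.0.0 pipeline in miniature."""
    # \w+ matches Unicode letters and digits, so Vietnamese text works too
    tokens = re.findall(r"\w+", text.lower())
    return sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))

print(word_frequencies("Học tập giúp phát triển. Học tập tốt!"))
```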

Advanced Features (Optional)

  • Remove stopwords for English text
  • Remove stopwords for Vietnamese text
  • Visualize results with bar chart (using matplotlib)
  • Count phrase frequencies (bigram analysis)
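Bigram counting can be done by pairing each token with its successor; this is an illustrative sketch, not the project's implementation:

```python
from collections import Counter

def bigram_frequencies(tokens):
    """Count adjacent word pairs (bigrams) in a token list."""
    return Counter(zip(tokens, tokens[1:]))

print(bigram_frequencies(["học", "tập", "học", "tập"]))
```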

📁 Output Sample

Word       | Count
---------- | -----
học tập    | 6
công nghệ  | 4
phát triển | 4
giúp       | 4
mà còn     | 2
xã hội     | 2

🛠️ Tech Stack

  • Language: Python 3.x
  • Libraries:
    • matplotlib - Data visualization
    • csv - CSV file handling
    • re - Regular expressions for text processing

📝 Notes

  • First stable release
  • Tested with Vietnamese educational content
  • CSV output compatible with Excel and data analysis tools

🔮 Future Improvements

  • Support for additional file formats (.docx, .pdf)
  • Web interface for file upload
  • Word cloud visualization
  • TF-IDF analysis
  • Multi-file batch processing