Release Notes - Word Frequency Mini Project

Version 2.0.0 (January 4, 2026)

🚀 Major Release - Web API & Enhanced Architecture

Complete rewrite with a FastAPI web service, NLP processing based on NLTK (English) and underthesea (Vietnamese), and automated setup.

🎯 What's New

🌐 Web API Service

RESTful API with FastAPI framework
Swagger UI at /docs for interactive testing
Two endpoints: /analyses/text (JSON input) and /analyses/file (file upload)
Multiple formats: JSON, CSV, PNG outputs

🔒 Security & Validation

File size limits (5 MB default)
Content-Type validation (text/plain only)
Pydantic models for request/response validation
Comprehensive error handling
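
The upload checks listed above can be sketched without any framework. This is only an illustration under stated assumptions: the names `validate_upload` and `UploadError`, the HTTP-style status codes, and the UTF-8 check are hypothetical and not taken from the project, which uses Pydantic models and middleware for this.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB default mentioned in these notes
ALLOWED_CONTENT_TYPES = {"text/plain"}


class UploadError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def validate_upload(content_type, data, max_bytes=MAX_UPLOAD_BYTES):
    """Return the decoded text, or raise UploadError if a check fails."""
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ALLOWED_CONTENT_TYPES:
        raise UploadError(415, f"unsupported content type: {media_type}")
    if len(data) > max_bytes:
        raise UploadError(413, f"file exceeds {max_bytes} bytes")
    try:
        # Assumption: uploads are expected to be UTF-8 encoded text.
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise UploadError(400, "file is not valid UTF-8") from exc
```

Checking the media type before the size keeps the cheapest rejection first; the real service may order or report these checks differently.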

🛠️ Developer Experience

One-click setup: python start.py or start.bat/start.sh
Automated installer: Checks dependencies, downloads NLTK data, starts server
Modular architecture: Clean separation (app, pipeline, models, middleware)
Cross-platform: Windows, Linux, macOS support
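
As a rough idea of what the dependency check in an automated installer could look like, here is a minimal sketch using only the standard library. The helper name `missing_modules` and the module list are assumptions for illustration; the actual logic in start.py is not shown in these notes.

```python
import importlib.util

# Import names assumed from the dependency list in these release notes.
REQUIRED_MODULES = [
    "fastapi", "uvicorn", "pandas", "numpy", "nltk", "underthesea", "matplotlib",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as pandas.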

📊 Version Comparison

Feature        v1.0.0                   v2.0.0
Interface      CLI script               FastAPI REST API
Server         None                     Uvicorn ASGI
Setup          Manual                   Automated scripts
Documentation  Basic                    Comprehensive + API docs
File Upload    Manual script execution  HTTP multipart upload

🆕 New Dependencies

# NEW in v2.0.0
fastapi>=0.104.0          # Web framework
uvicorn[standard]>=0.24.0 # ASGI server
python-multipart>=0.0.6   # File upload support
pandas>=1.5.0             # Data manipulation (replaces csv module)
numpy>=1.21.0             # Numerical operations
nltk>=3.8                 # English NLP (replaces re module)
underthesea>=1.3.0        # Vietnamese NLP (NEW)

# CARRIED OVER from v1.0.0
matplotlib>=3.5.0         # Visualization
setuptools>=65.0.0        # Build tools
wheel>=0.37.0             # Build tools

🚀 Quick Start

# Automated setup (recommended)
python start.py

# Manual setup
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
uvicorn src.app.main:app --reload --port 5000

Access: http://localhost:5000/docs

📡 API Usage Examples

Text Analysis:

curl -X POST "http://localhost:5000/analyses/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Xin chào! Đây là ví dụ.", "format": "json"}'

File Upload:

curl -X POST "http://localhost:5000/analyses/file" \
  -F "file=@sample.txt" \
  -F "format=csv"

Python:

import requests

response = requests.post("http://localhost:5000/analyses/text",
    json={"text": "Hello world!", "format": "json"}, timeout=10)
response.raise_for_status()  # fail fast on 4xx/5xx responses
print(response.json())
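
If installing requests is not an option, the same call can be made with the standard library. The sketch below assumes the server from the Quick Start is running on port 5000 and that a "json" format response body is JSON; the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # port taken from the Quick Start section


def build_text_request(text, fmt="json", base_url=BASE_URL):
    """Build (but do not send) a POST request for /analyses/text."""
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/analyses/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_text(text, fmt="json", base_url=BASE_URL, timeout=10):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_text_request(text, fmt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `print(analyze_text("Hello world!"))` should behave like the requests example above. Keeping request construction separate from sending makes the payload easy to inspect without a live server.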

⚠️ Breaking Changes

Migration required from v1.0.0:

Installation: Install the new dependencies (fastapi, uvicorn, python-multipart, nltk, underthesea, pandas, numpy)
Execution: Run via uvicorn or startup scripts instead of direct Python execution
Interface: Primary interface is HTTP API (CLI script deprecated)
Output: Files saved to output/ directory with timestamp naming
Imports: Module structure changed to src.pipeline.text_stats
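
Scripts that consumed v1.0.0 output may need to adapt to the timestamped names. The exact naming pattern is not documented in these notes, so the "analysis" prefix and the timestamp layout below are purely hypothetical; the sketch only shows the general idea of building such a path.

```python
from datetime import datetime
from pathlib import Path

OUTPUT_DIR = Path("output")  # directory named in the breaking changes above


def timestamped_path(extension, now=None, prefix="analysis"):
    """Build a path such as output/analysis_YYYYMMDD_HHMMSS.<extension>."""
    now = now or datetime.now()
    # Hypothetical pattern; check the generated files for the real one.
    return OUTPUT_DIR / f"{prefix}_{now:%Y%m%d_%H%M%S}.{extension}"
```

Because the name changes on every run, downstream scripts are usually better off picking the newest matching file in output/ than hard-coding a filename.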

Direct function usage still available:

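from src.pipeline import text_stats as ts

text = "Hello world!"            # any input string
tokens = ts.preprocessing(text)  # turn raw text into tokens
stats = ts.statistics(tokens)    # compute statistics from the tokens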
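
The internals of `preprocessing` and `statistics` are not described in these notes. As a rough stand-in for the tokenize-then-count idea, the sketch below uses a plain regular expression and `collections.Counter` instead of the NLTK and underthesea tokenizers the project relies on; both function names are hypothetical.

```python
import re
from collections import Counter


def simple_tokens(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def top_words(tokens, n=3):
    """Return the n most frequent tokens as (token, count) pairs."""
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    tokens = simple_tokens("The cat and the hat.")
    print(top_words(tokens, n=1))  # prints [('the', 2)]
```

A regex split like this treats every whitespace-separated syllable as its own token, which is one reason a dedicated tokenizer is preferable for Vietnamese, where a single word often spans several syllables.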

🐛 Bug Fixes

✅ Better handling of punctuation and special characters

📁 Project Structure

word-frequency-mini-project/
├── src/
│   ├── app/
│   │   ├── main.py          # FastAPI application
│   │   ├── models.py        # Request/Response models
│   │   └── middleware.py    # Security middleware
│   └── pipeline/
│       └── text_stats.py    # Core NLP processing
├── data/                    # Input files
├── output/                  # Generated reports
├── start.py                 # Automated setup
├── start.bat/start.sh       # Platform-specific launchers
└── requirements.txt         # Dependencies

🔮 Roadmap

v2.1.0 (Q2 2026):

Word cloud visualization (from v1.0.0 roadmap)
TF-IDF analysis (from v1.0.0 roadmap)
Batch file processing (from v1.0.0 roadmap)
Export to PDF

v3.0.0 (Q3 2026):

.docx and .pdf file support (from v1.0.0 roadmap)
Database integration
User authentication
Docker containerization
Web UI frontend (from v1.0.0 roadmap)

🙏 Acknowledgments

NLTK Team - Natural Language Toolkit
Underthesea Team - Vietnamese NLP
FastAPI - Modern web framework
Matplotlib - Visualization library

🛠️ Tech Stack

Language: Python 3.x

Libraries:

matplotlib - Data visualization
csv - CSV file handling
re - Regular expressions for text processing

📝 Notes

First stable release
Tested with Vietnamese educational content
CSV output compatible with Excel and data analysis tools

🔮 Future Improvements

Support for additional file formats (.docx, .pdf)
Web interface for file upload
Word cloud visualization
TF-IDF analysis
Multi-file batch processing

Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.0.0/
