AI-powered text summarization with real-time streaming, built with FastAPI and Next.js
Smart Summary App is an enterprise-ready solution that transforms lengthy documents into concise, actionable summaries using artificial intelligence. The application addresses a common business challenge: information overload. Teams across organizations spend significant time reading and processing large volumes of text—reports, articles, research papers, and documentation.
This solution enables employees to paste any text and receive an intelligent summary within seconds. The AI processes the content in real-time, showing results as they are generated rather than requiring users to wait for completion. This streaming approach provides immediate feedback and improves productivity.
Key Business Benefits:
- Reduce time spent reading lengthy documents by up to 80%
- Maintain consistency in how information is summarized across teams
- Scale document processing without increasing headcount
- Authenticated, token-based access ensures only authorized users can reach the system
- Flexible deployment options support cloud or on-premise infrastructure
The application is designed to integrate with existing enterprise workflows and can be customized to meet specific organizational needs. It supports multiple AI providers, allowing organizations to choose the model that best fits their requirements and budget.
- Technical Requirements
- Features
- Architecture
- Summarization Strategies
- Compression Ratio
- Technology Stack
- Project Structure
- Getting Started
- Deployment
- API Documentation
- Testing
- Security
- Scaling Considerations
- Future Roadmap
- Design Decisions
- Assumptions and Constraints
The application fulfills the following technical specifications:
| Requirement | Implementation |
|---|---|
| Frontend | React with Next.js 14 (App Router) |
| Backend | FastAPI with Python 3.11+ |
| LLM Integration | LangChain with Anthropic Claude (switchable to OpenAI/Gemini) |
| Streaming | Server-Sent Events (SSE) for progressive summary generation |
| Deployment | Configured for Vercel (frontend) and Render (backend) |
Demo Credentials (Testing Only):
- Username: `demo`
- Password: As configured in environment variables
Important: For production environments, create users with strong passwords. Demo credentials should never be used in production.
The application provides two categories of functionality: core features essential for summarization and advanced features that enhance the user experience.
Real-time Streaming enables users to see the summary being generated token by token, providing immediate feedback rather than waiting for the entire response.
Multiple Summarization Strategies allow users to choose between simple, hierarchical, or detailed approaches depending on their needs.
JWT Authentication secures all API endpoints with industry-standard token-based authentication.
LLM Flexibility allows organizations to switch between Anthropic, OpenAI, or Gemini models without modifying application code.
Responsive Interface works seamlessly across desktop and mobile devices using Tailwind CSS.
Docker Support provides full containerization for consistent development and deployment environments.
- Progress indicators during summarization
- Adjustable compression ratio (5% to 50% of original text)
- Real-time character and word counting
- One-click copy to clipboard
- Input validation and prompt injection protection
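The length limits and prompt-injection checks above can be sketched in plain Python. This is an illustrative stand-in (the backend uses Pydantic models per the stack table); the limits mirror the documented 100–300,000 character range, while the injection patterns are hypothetical examples, not the actual rules.

```python
# Illustrative input checks: length limits plus a simple prompt-injection
# heuristic. The patterns below are example assumptions, not the app's real list.
import re

MIN_CHARS, MAX_CHARS = 100, 300_000
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"system prompt",
]

def validate_input(text: str) -> list[str]:
    """Return a list of validation errors (an empty list means the text is accepted)."""
    errors = []
    if not MIN_CHARS <= len(text) <= MAX_CHARS:
        errors.append(f"text must be {MIN_CHARS}-{MAX_CHARS} characters")
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            errors.append(f"possible prompt injection: {pattern!r}")
    return errors
```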
The application follows a modern three-tier architecture separating the user interface, business logic, and AI processing layers. This separation enables independent scaling and maintenance of each component.
```
┌────────────────────────────────────────────────────┐
│            Next.js Frontend (Vercel)               │
│  - React Components (TypeScript)                   │
│  - SSE Client for Streaming                        │
│  - JWT Token Management                            │
└─────────────────┬──────────────────────────────────┘
                  │ HTTPS + SSE
┌─────────────────▼──────────────────────────────────┐
│           FastAPI Backend (Render.com)             │
│  ┌──────────────────────────────────────────────┐  │
│  │  API Routes (JWT Protected)                  │  │
│  │  - /api/auth/*    - Login/Register           │  │
│  │  - /api/summary/* - Streaming/Sync           │  │
│  └──────────────────┬───────────────────────────┘  │
│  ┌──────────────────▼───────────────────────────┐  │
│  │  Services Layer                              │  │
│  │  - SummarizerService (3 strategies)          │  │
│  │  - LLMService (LangChain abstraction)        │  │
│  │  - TextProcessor (chunking, cleaning)        │  │
│  └──────────────────┬───────────────────────────┘  │
│  ┌──────────────────▼───────────────────────────┐  │
│  │  LangChain + LLM APIs                        │  │
│  │  - Anthropic Claude 4.5 Sonnet               │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘
```
Technical Details:
- The frontend communicates with the backend over HTTPS using Server-Sent Events for streaming responses
- All API routes are protected by JWT authentication except for login and registration
- The Services Layer implements business logic independently from the API layer
- LangChain provides an abstraction layer that enables switching between AI providers without code changes
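The provider-switching point can be sketched as a config-driven factory. The `StubModel` class and registry below are assumptions standing in for LangChain's chat-model classes; the idea being illustrated is that the provider is a configuration value, not code.

```python
# Sketch of config-driven LLM provider selection. StubModel stands in for the
# real LangChain chat-model classes; only the factory pattern is the point here.
from dataclasses import dataclass
from typing import Callable

@dataclass
class StubModel:
    provider: str
    model: str

PROVIDERS: dict[str, Callable[[str], StubModel]] = {
    "anthropic": lambda m: StubModel("anthropic", m),
    "openai":    lambda m: StubModel("openai", m),
    "gemini":    lambda m: StubModel("gemini", m),
}

def make_llm(provider: str, model: str) -> StubModel:
    """Pick the model backend from configuration rather than code."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDERS[provider](model)
```

Switching from Anthropic to OpenAI then means changing two environment variables, with no changes to the services layer.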
Users can choose from three summarization approaches depending on the length and complexity of their source material. Each strategy is optimized for different use cases.
Best for quick summaries of shorter documents. The system sends the entire text to the AI model in a single request and returns the summary.
| Attribute | Value |
|---|---|
| Recommended text length | Under 50,000 characters |
| Speed | Fast (single LLM call) |
| Use cases | Quick summaries, simple documents |
Best for most business documents. The system breaks the text into semantic chunks, summarizes each chunk in parallel, then combines the results into a coherent final summary.
| Attribute | Value |
|---|---|
| Recommended text length | 50,000 to 300,000 characters |
| Method | Semantic chunking, parallel summarization, combination |
| Use cases | Articles, reports, documentation |
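The chunking step of the hierarchical strategy can be sketched as sentence-boundary splitting with a size cap. The real `TextProcessor` presumably does more sophisticated semantic chunking; the regex and chunk size below are illustrative assumptions.

```python
# Simplified chunking sketch: split at sentence boundaries so no chunk exceeds
# max_chars. Each chunk can then be summarized in parallel and the partial
# summaries combined, as the hierarchical strategy describes.
import re

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```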
Best for comprehensive analysis where preserving key details is critical. The system first extracts important sentences using the TextRank algorithm, then generates an abstractive summary from those extracted sentences.
| Attribute | Value |
|---|---|
| Recommended text length | Any size |
| Method | Extractive sentence extraction plus abstractive LLM summary |
| Use cases | Research papers, detailed analysis |
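As a rough illustration of the extractive step, the sketch below scores sentences by average word frequency and keeps the top fraction in original order. This is a deliberately simplified stand-in for TextRank, which instead ranks a sentence-similarity graph.

```python
# Frequency-based extractive scoring: a simplified stand-in for TextRank.
# Sentences whose words occur often across the document score higher.
import re
from collections import Counter

def extract_key_sentences(text: str, keep_ratio: float = 0.3) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, int(len(sentences) * keep_ratio))
    top = sorted(sentences, key=score, reverse=True)[:keep]
    # Preserve the original sentence order for readability.
    return [s for s in sentences if s in top]
```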
The compression ratio determines the target length of the summary as a percentage of the original text. Users can adjust this setting based on their needs.
| Ratio | Description | Use Case |
|---|---|---|
| 5% | Ultra-brief | Executive summaries, key points only |
| 15% | Brief | Main ideas and conclusions |
| 20% | Balanced (default) | General-purpose summaries |
| 30% | Moderate detail | Comprehensive overviews |
| 50% | Comprehensive | Detailed summaries preserving nuance |
Note: Actual compression may vary based on content complexity and the selected strategy.
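The mapping from ratio to target length is simple arithmetic; a minimal sketch (the actual service may compute targets differently, as the note above cautions):

```python
# Convert a compression ratio into a target summary length in characters,
# enforcing the documented 5%-50% range.
def target_length(original_chars: int, ratio: float) -> int:
    if not 0.05 <= ratio <= 0.50:
        raise ValueError("ratio must be between 0.05 and 0.50")
    return int(original_chars * ratio)
```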
This section details the technologies used in each layer of the application.
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Frontend | Next.js | 14+ | App Router, server-side rendering, streaming |
| UI Framework | Tailwind CSS | Latest | Responsive styling |
| Backend | FastAPI | 0.109+ | High-performance async API |
| LLM Integration | LangChain | Latest | Provider abstraction |
| AI Models | Anthropic Claude | 4.5 Sonnet | Text summarization |
| Authentication | python-jose | Latest | JWT token management |
| Password Security | bcrypt | Latest | Password hashing |
| Testing | Pytest, Jest | Latest | Unit and integration tests |
| Containerization | Docker | Latest | Development and deployment |
The repository is organized into separate frontend and backend directories, each with its own configuration and deployment settings.
```
smart-summary/
├── backend/
│   ├── app/
│   │   ├── api/routes/              # API endpoints
│   │   │   ├── auth.py              # Login/register
│   │   │   └── summary.py           # Summarization endpoints
│   │   ├── services/                # Business logic
│   │   │   ├── llm_service.py       # LangChain wrapper
│   │   │   ├── summarizer.py        # Strategies
│   │   │   └── text_processor.py    # Text utilities
│   │   ├── core/                    # Configuration
│   │   │   ├── config.py            # Settings
│   │   │   └── security.py          # JWT & passwords
│   │   ├── models/
│   │   │   └── schemas.py           # Pydantic models
│   │   └── main.py                  # FastAPI app
│   ├── tests/
│   │   ├── test_api.py              # Endpoint tests
│   │   └── test_services.py         # Service tests
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── render.yaml                  # Deployment config
│   └── .env.example
├── frontend/
│   ├── app/
│   │   ├── page.tsx                 # Main page
│   │   ├── layout.tsx               # Root layout
│   │   └── globals.css              # Global styles
│   ├── components/
│   │   ├── SummaryForm.tsx          # Input form
│   │   ├── SummaryDisplay.tsx       # Output display
│   │   └── AuthModal.tsx            # Login/register
│   ├── lib/
│   │   └── api.ts                   # API client
│   ├── Dockerfile
│   ├── vercel.json                  # Deployment config
│   └── .env.example
└── docker-compose.yml               # Local development
```
This section provides instructions for setting up the application locally. For a faster setup experience, refer to the Quick Start Guide.
Before beginning, ensure the following software is installed:
- Node.js 20 or later
- Python 3.11 or later
- Docker and Docker Compose
- An Anthropic API key (obtain from console.anthropic.com)
Docker provides the simplest path to running the application locally. Follow these steps:
Step 1: Clone the repository

```bash
git clone https://github.com/ebertolo/smart-summary.git
cd smart-summary
```

Step 2: Configure the backend

```bash
cp backend/.env.example backend/.env
```

Edit backend/.env and configure the following values:

- `ANTHROPIC_API_KEY`: Your Anthropic API key
- `JWT_SECRET`: A secure random string for token signing
- `DEMO_USER_PASSWORD`: Password for the demo user account

Step 3: Configure the frontend

```bash
cp frontend/.env.example frontend/.env.local
```

The default value `NEXT_PUBLIC_API_URL=http://localhost:8000` is correct for local development.

Step 4: Start the application

```bash
docker-compose up --build
```

Step 5: Access the application
- Frontend: http://localhost:3000
- Backend API Documentation: http://localhost:8000/docs
Log in with username `demo` and the password configured in your `.env` file.
Important: The .env files must be created before running Docker. The application will not start without proper configuration.
For development work where you need to modify code and see changes immediately, you may prefer running the services directly.
Backend Setup:
```bash
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add ANTHROPIC_API_KEY
python scripts/init_db.py  # Creates demo user
uvicorn app.main:app --reload
```

Backend API documentation: http://localhost:8000/docs
Frontend Setup:
```bash
cd frontend
npm install
cp .env.example .env.local
# Verify NEXT_PUBLIC_API_URL=http://localhost:8000
npm run dev
```

Frontend: http://localhost:3000
For detailed backend setup instructions, see backend/QUICKSTART.md.
This section covers deploying the application to production environments.
Before deploying to production, complete the following security tasks:
- Change `JWT_SECRET` to a cryptographically secure random value
- Create production users with strong passwords (never use demo credentials)
- Update `CORS_ORIGINS` with your production frontend URL
- Set `PYTHON_ENV=production` in environment variables
- Review and rotate all API keys
- Consider migrating from SQLite to PostgreSQL for production workloads
Option 1: Vercel Dashboard
- Navigate to vercel.com and sign in
- Import your GitHub repository
- Set the root directory to `frontend`
- Add environment variable: `NEXT_PUBLIC_API_URL` with your backend URL
- Deploy

Option 2: Vercel CLI

```bash
cd frontend
npm install -g vercel
vercel --prod
```

Option 1: Render Dashboard
- Navigate to render.com and sign in
- Create a new Web Service and connect your GitHub repository
- Set the root directory to `backend`
- Render will automatically detect the `render.yaml` configuration
- Add environment variable: `ANTHROPIC_API_KEY`
- Deploy
Post-Deployment: Update CORS Configuration
After deploying the backend, update the CORS configuration to allow requests from your frontend:
```python
# backend/app/core/config.py
CORS_ORIGINS = [
    "http://localhost:3000",
    "https://your-app.vercel.app",  # Add your Vercel URL
]
```

Commit and push this change to trigger a redeployment.
The backend exposes a RESTful API with authentication and summarization endpoints. Full interactive documentation is available at /docs when running the backend.
POST /api/auth/login
Authenticates a user and returns a JWT token for subsequent API calls.
Request:
```json
{
  "username": "demo",
  "password": "your_password"
}
```

Response:

```json
{
  "access_token": "eyJhbGciOiJIUz...",
  "token_type": "bearer",
  "expires_in": 3600
}
```

POST /api/auth/register
Creates a new user account.
Request:
```json
{
  "username": "newuser",
  "password": "your_secure_password"
}
```

POST /api/summary/summarize (Streaming)
Generates a summary with real-time streaming response.
```bash
curl -X POST http://localhost:8000/api/summary/summarize \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long text here...",
    "strategy": "hierarchical",
    "compression_ratio": 0.20
  }'
```

Parameters:
| Parameter | Required | Description |
|---|---|---|
| text | Yes | Text to summarize (100 to 300,000 characters) |
| strategy | No | simple, hierarchical, or detailed (default: hierarchical) |
| compression_ratio | No | 0.05 to 0.50 (default: 0.20) |
Response (SSE Stream):
```
data: {"type": "content", "content": "First chunk", "done": false}
data: {"type": "content", "content": " continues...", "done": false}
data: {"type": "complete", "done": true}
```
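A client can reconstruct the summary from this stream by parsing each `data:` line as JSON and concatenating `content` events until `done` is reported. A minimal, transport-agnostic sketch (the actual frontend uses an SSE client in TypeScript; the event shapes below match the example stream):

```python
# Accumulate a summary from SSE lines shaped like the example stream above.
import json

def accumulate_summary(sse_lines: list[str]) -> str:
    summary = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # ignore comments and keep-alive lines
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "content":
            summary.append(event["content"])
        if event.get("done"):
            break
    return "".join(summary)
```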
POST /api/summary/summarize-sync (Non-Streaming)
Returns the complete summary in a single response. Use this endpoint when streaming is not required.
The application includes comprehensive test suites for both backend and frontend components.
Backend Tests:
```bash
cd backend
pytest --cov=app --cov-report=html
```

Run a specific test:

```bash
pytest tests/test_api.py::TestAuthEndpoints::test_login_with_valid_credentials
```

Frontend Tests:

```bash
cd frontend
npm test
```

The test suites cover the following areas:
- Authentication endpoints (login, registration, token validation)
- All summarization strategies
- Text processing utilities
- JWT token generation and validation
- Input validation and error handling
This section details the security measures implemented in the application and recommendations for production environments.
Authentication and Authorization:
- JWT authentication with configurable token expiration
- Bcrypt password hashing with salt
- Token-based API access control
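For illustration, here is what HS256 signing and verification look like using only the standard library. The application delegates this to python-jose, which additionally handles claims such as expiration; this sketch is educational only.

```python
# Educational HS256 JWT sketch: header.payload.signature, each base64url-encoded.
# Not a replacement for python-jose (no claims validation or expiry handling).
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify_jwt(token: str, secret: str) -> bool:
    header, payload, sig = token.rsplit(".", 2)
    expected = hmac.new(secret.encode(), f"{header}.{payload}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(b64url(expected), sig)
```

Because the signature covers the header and payload, the backend can validate any token with just the shared secret, with no database lookup, which is the stateless property discussed in Design Decisions.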
Input Protection:
- CORS protection limiting allowed origins
- Pydantic-based input validation
- Text length limits to prevent abuse
- Prompt injection protection
Configuration Security:
- Environment variable management for secrets
- Separation of development and production configurations
For production deployments, implement the following additional measures:
Critical (Must Implement):
- Generate a new `JWT_SECRET` using: `python -c "import secrets; print(secrets.token_urlsafe(32))"`
- Create production users with strong passwords using `scripts/init_db.py`
- Use HTTPS exclusively
- Update CORS configuration with production URLs only
Recommended:
- Implement rate limiting (consider slowapi library)
- Add refresh token rotation
- Implement API key rotation schedule
- Set up logging and monitoring (APM tools)
- Add request signing for sensitive operations
- Build a user management interface
Never use demo credentials in production. Create new users with the following command:
```bash
cd backend
python scripts/init_db.py --username admin --password YOUR_SECURE_PASSWORD --email admin@yourcompany.com
```

The application is designed to scale from small teams to enterprise deployments. This section outlines the current capacity and the path to higher scale.
| Metric | Capacity |
|---|---|
| Concurrent users | 100 to 1,000 |
| Requests per second | 50 to 100 |
| Maximum text size | 300,000 characters (~75,000 tokens) |
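The ~75,000-token figure follows from a rough heuristic of about four characters per token for English text:

```python
# Back-of-envelope token estimate for the 300,000-character cap.
max_chars = 300_000
chars_per_token = 4  # rough average for English text; varies by content
est_tokens = max_chars // chars_per_token
print(est_tokens)  # 75000
```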
Phase 1: 1,000 to 10,000 Users
- Migrate from SQLite to PostgreSQL for reliable concurrent access
- Enable Render auto-scaling for the backend
- Implement per-user rate limiting
Phase 2: 10,000 to 100,000 Users
- Add Celery for background job processing
- Deploy a CDN for static assets
- Configure database read replicas
- Implement caching layer
Phase 3: 100,000+ Users
- Migrate to microservices architecture
- Implement message queue (RabbitMQ or Kafka)
- Deploy on Kubernetes for orchestration
- Configure multi-region deployment for global availability
- Summary history with database persistence
- File upload support (PDF, DOCX, TXT)
- Export summaries to PDF and Markdown
- Rate limiting implementation
- Multi-language support
- Custom prompts and templates
- Batch processing interface
- Summary sharing via public links
- Mobile application
- Voice input and output
- Collaborative features
- Analytics dashboard
This section explains the rationale behind key architectural and technology choices.
The application maintains no server-side session state. All user data is stored in the database (SQLite for development, PostgreSQL recommended for production). This design enables horizontal scaling—multiple backend instances can serve requests without coordination.
LangChain provides an abstraction layer over multiple LLM providers. This enables organizations to switch between Anthropic, OpenAI, or Gemini models by changing configuration rather than modifying code. This flexibility protects against vendor lock-in and allows optimization based on cost and performance requirements.
Server-Sent Events deliver summary content to users as it is generated. For long documents, this means users see results within seconds rather than waiting 30-60 seconds for completion. This approach significantly improves perceived performance and user satisfaction.
JSON Web Tokens provide stateless authentication suitable for scalable APIs. Tokens are self-contained—the backend can validate them without database lookups. This eliminates the need for session storage and simplifies horizontal scaling.
| Constraint | Details |
|---|---|
| Text size | Maximum 300,000 characters (~75,000 tokens), aligned with Claude model limits |
| Compression range | 5% to 50% of original text, configurable per request |
| User storage | SQLite for development; PostgreSQL recommended for production |
| Rate limiting | Not implemented; required for production deployment |
| Monitoring | Basic logging only; APM tools recommended for production |
| Security | Prompt injection protection via input validation |