A detailed, end-to-end explanation of the entire DMS platform: every microservice, every database, every infrastructure piece, every concept (Kafka, Kubernetes, JWT, Redis, MinIO/S3, Cassandra…) and how everything is wired together.
- Project Overview
- Architectural Style
- Components at a Glance
- Full System Architecture Diagram
- Service-by-Service Deep Dive
- 5.1 Gateway Service
- 5.2 Authentication Service
- 5.3 Documents Service
- 5.4 Comments Service
- 5.5 Orchestration Service
- 5.6 Translator Service
- 5.7 Frontend (React + Vite)
- Data Stores
- Messaging: Apache Kafka
- Authentication & Security (JWT)
- Inter-Service Communication
- End-to-End Request Flows
- Containerization (Docker)
- Kubernetes Deployment
- Local Development Workflow
- Operational Concerns
- Glossary of Concepts
The Document Management System (DMS) is a distributed, microservices-based academic lab platform that lets users:
- Register, log in, and obtain a JWT (JSON Web Token).
- Upload, list, version, download, and delete documents (with file blobs stored in S3-compatible object storage).
- Comment on documents.
- Trigger asynchronous translations of document titles/contents through an LLM (Gemini / OpenRouter).
- Manage users, roles, departments, and categories from an admin UI.
It is intentionally built as multiple independent services to illustrate real-world distributed-system concepts:
| Concept | Where it lives |
|---|---|
| API Gateway | Spring Cloud Gateway (backend/gateway) |
| Stateless JWT-based auth | Auth service (backend/authentication) |
| Domain-driven service boundaries | Documents / Comments / Orchestration / Translator |
| Polyglot persistence | PostgreSQL + Cassandra + Redis + MinIO |
| Event-driven asynchronous workflow | Kafka topics for translation events |
| External AI integration | OpenRouter / Gemini API |
| Containerization | Per-service Dockerfile (multi-stage) |
| Orchestration | Kubernetes manifests (infra/k8s) on Minikube/kind |
The project follows a microservices style with these properties:
- One service, one responsibility — every backend module owns a single bounded context (auth, documents, comments, orchestration, translation).
- Database-per-service — each domain service owns its own datastore so that storage outages or schema changes never cascade across services.
- Hybrid sync + async communication
- Synchronous REST for user-facing requests through the Gateway.
- Asynchronous Kafka events for slow / failure-prone work (translation).
- Stateless services — services don't keep request state in memory. The only "state" is in the databases / object store. This enables horizontal scaling (multiple pod replicas) and trivial failover.
- Token-based security — the Auth service issues a signed JWT once at login; all other services validate it locally with the shared HS256 secret, so no per-request call back to Auth is required.
- Single entry point — clients only ever talk to the Gateway (the internal cluster topology is hidden from the outside world).
| Component | Tech | Port | Folder | Role |
|---|---|---|---|---|
| Frontend UI | React 19 + Vite | 5173 | frontend/ui/ | User & admin web app |
| Gateway | Spring Cloud Gateway (MVC) | 8080 | backend/gateway/ | API entry point, routing, CORS |
| Auth | Spring Boot 3 + JPA | 8083 | backend/authentication/ | Login, register, JWT issuance |
| Documents | Spring Boot 3 + JPA + Spring Kafka + AWS SDK | 8081 | backend/documents/ | Document CRUD, files, departments, translation orchestration |
| Comments | Spring Boot 3 + Spring Data Cassandra | 8082 | backend/comments/ | Comments per document |
| Orchestration | Spring Boot + Spring Integration | 8084 | backend/orchestration/ | ESB-style aggregator (docs+comments) |
| Translator | Python 3.12 + kafka-python + boto3 + OpenAI SDK | – | backend/translation/ | Kafka consumer that calls LLM |
| Auth DB | PostgreSQL 16 | 5432 | StatefulSet auth-postgres | Persists auth users |
| Documents DB | PostgreSQL 16 | 5432 | StatefulSet docs-postgres | Persists documents / categories / departments |
| Comments DB | Cassandra 4.1 | 9042 | StatefulSet cassandra | Persists comments |
| Cache | Redis 7 | 6379 | Deployment redis | Document cache layer (Spring Cache) |
| Object store | MinIO (S3 API) | 9000 / 9001 | Deployment minio | Holds raw file bytes |
| Message broker | Apache Kafka 3.7 (KRaft) | 9092 | Deployment kafka | Translation event bus |
| External LLM | OpenRouter (Gemini / Llama) | HTTPS | – | Translation provider |
Internal naming convention: every service exposes a unique port so that local non-Kubernetes runs don't collide:
Auth 8083
Docs 8081
Comments 8082
Orch 8084
Gateway 8080
MinIO 9000 (api) / 9001 (console)
Kafka 9092 (broker) / 9094 (host advertised in compose)
Postgres 5432
Cassandra 9042
Redis 6379
UI (vite) 5173
flowchart LR
UI[Frontend UI<br/>React + Vite]
GW[Gateway<br/>Spring Cloud Gateway]
AUTH[Auth Service]
DOCS[Documents Service]
COMM[Comments Service]
ORCH[Orchestration Service]
TRANS[Translator Worker<br/>Python]
AUTHDB[(Auth PostgreSQL)]
DOCSDB[(Documents PostgreSQL)]
CASS[(Comments Cassandra)]
REDIS[(Redis cache)]
MINIO[(MinIO / S3)]
KAFKA[[Kafka broker]]
LLM((OpenRouter / Gemini))
UI -->|HTTPS REST| GW
GW -->|/api/auth| AUTH
GW -->|/api/documents<br/>/api/users<br/>/api/admin<br/>/api/categories<br/>/api/departments| DOCS
GW -->|/api/comments| COMM
GW -->|optional| ORCH
AUTH --> AUTHDB
DOCS --> DOCSDB
DOCS --> REDIS
DOCS --> MINIO
DOCS -->|RestClient HTTP| COMM
COMM --> CASS
ORCH -->|RestTemplate| DOCS
ORCH -->|RestTemplate| COMM
DOCS ==>|publish dms.documents.uploaded| KAFKA
KAFKA ==>|consume| TRANS
TRANS -->|getObject / putObject| MINIO
TRANS -->|chat.completions| LLM
TRANS ==>|publish dms.documents.translated<br/>dms.documents.translation-progress<br/>dms.documents.translation-failed| KAFKA
KAFKA ==>|consume| DOCS
Legend:
- Solid arrows: synchronous HTTP / gRPC-ish calls
- Double arrows (`==>`): asynchronous Kafka publish/consume
- `-->` to a database: JDBC / CQL / S3 SDK call
Tech: Spring Cloud Gateway (MVC variant) on Spring Boot 3, port 8080. Folder: backend/gateway/
Responsibilities:
- Single externally-reachable entry point for all client traffic.
- Path-based routing to the correct downstream service.
- CORS handling for the Vite dev server (`http://localhost:5173`).
- Stripping the `/api/` prefix before forwarding so downstream services see clean paths (e.g. `/documents/list`).
Routes (application.yml:11-55):
| Predicate | Forwards to | Resolved env var |
|---|---|---|
| `/api/auth/**` | Auth service :8083 | `AUTH_SERVICE_URL` |
| `/api/documents/**` | Documents service :8081 | `DOCUMENTS_SERVICE_URL` |
| `/api/users/**` | Documents service :8081 | `DOCUMENTS_SERVICE_URL` |
| `/api/admin/**` | Documents service :8081 | `DOCUMENTS_SERVICE_URL` |
| `/api/categories/**` | Documents service :8081 | `DOCUMENTS_SERVICE_URL` |
| `/api/departments/**` | Documents service :8081 | `DOCUMENTS_SERVICE_URL` |
| `/api/comments/**` | Comments service :8082 | `COMMENTS_SERVICE_URL` |
All routes use a RewritePath=/api/(?<segment>.*), /${segment} filter so a
request to /api/documents/list becomes /documents/list on the downstream
service.
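The effect of that filter can be reproduced with an ordinary regular expression. A minimal sketch in Python, where `rewrite_path` is a hypothetical helper and the named group is written `(?P<segment>...)` because Python does not accept the Java spelling `(?<segment>...)`:

```python
import re

# Same pattern as the RewritePath filter, in Python's named-group syntax.
REWRITE = re.compile(r"^/api/(?P<segment>.*)$")


def rewrite_path(path: str) -> str:
    """Strip the /api prefix the way the gateway filter is described to."""
    match = REWRITE.match(path)
    # Paths that do not start with /api/ are returned unchanged.
    return "/" + match.group("segment") if match else path


print(rewrite_path("/api/documents/list"))
```

Downstream controllers can therefore be mapped at `/documents/**` without knowing that the public prefix is `/api`.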
Why a gateway?
- Hides internal cluster topology from clients.
- One CORS configuration to manage instead of N.
- Centralizes the spot where rate-limiting, auth-introspection, request logging or tracing could be added later.
- Externalizes service URLs as env variables so the same image works in any environment (local dev → Kubernetes).
CORS is defined in GatewayCorsConfig.java and intentionally only allows the Vite dev server origin.
Tech: Spring Boot 3 + Spring Data JPA + jjwt, port 8083.
Folder: backend/authentication/
DB: Dedicated PostgreSQL instance (auth-postgres) — schema authdb.
Responsibilities:
- Persist user accounts (`auth_user` table) with salted password hashes.
- Issue a signed JWT after a successful `POST /auth/login`.
- Manage user roles (collection table `user_roles`).
- Handle password changes and logout timestamps.
- Provide internal endpoints to sync departments and wipe users (admin use).
Endpoints (AuthController.java):
| Method | Path | Purpose |
|---|---|---|
| POST | `/auth/login` | Verify credentials, return JWT + identity payload |
| POST | `/auth/register` | Create a new user account |
| POST | `/auth/change-password` | Self-service password change (requires bearer token) |
| POST | `/auth/logout` | Stamp last_logout (no token blacklist; stateless JWT) |
| POST | `/auth/users/{email}/department` | Update a user's department (internal sync) |
| DELETE | `/auth/users?email=...` | Delete a user by email |
| DELETE | `/auth/admin/users/wipe` | Delete all users |
| GET | `/auth/health` | Simple health check |
Why is SecurityConfig permitAll?
The Auth service is the issuer — clients must reach /auth/login without
already having a token. Spring Security is enabled (CSRF disabled) but every
endpoint is permitted; authentication is enforced inside the controllers
when needed (e.g. for change-password).
Password storage (PasswordService):
- Per-user random salt + salted hash.
- Verified with `passwordService.matches(rawPassword, hash, salt)` on login.
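The text does not say which hash function `PasswordService` uses, so the following is only an illustration of the salt-then-hash idea. It assumes PBKDF2-HMAC-SHA256 with an arbitrary iteration count; `hash_password` and `matches` are hypothetical names that mirror the `matches(rawPassword, hash, salt)` call above.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, not taken from the project


def hash_password(raw_password: str, salt: Optional[bytes] = None) -> Tuple[str, str]:
    """Return (hash_hex, salt_hex); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", raw_password.encode("utf-8"), salt, ITERATIONS)
    return digest.hex(), salt.hex()


def matches(raw_password: str, stored_hash: str, stored_salt: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate, _ = hash_password(raw_password, bytes.fromhex(stored_salt))
    return hmac.compare_digest(candidate, stored_hash)
```

Because each user gets a different salt, two users with the same password end up with different stored hashes.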
JWT issuance (JwtService.java):
- Algorithm: HMAC-SHA256 (HS256).
- Secret: provided via the `JWT_SECRET` env var (base64 encoded by default).
- Default lifetime: 24 hours (`jwt.expiration-seconds=86400`).
- Claims embedded in the token: `id`, `email`, `name`, `role`, `roles`, `department`, `mustChangePassword`. Subject = email.
The exact same secret is configured in the Documents service so it can verify tokens locally — see §8 Authentication & Security.
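To make the "sign once, verify locally" idea concrete, here is a minimal HS256 sketch built only on the Python standard library. It is not the jjwt-based `JwtService`; `issue_token` and `verify_token` are hypothetical helpers, the secret is passed as raw bytes (the real services decode it from `JWT_SECRET`), and only the signature, the `alg` header, and `exp` are checked.

```python
import base64
import hashlib
import hmac
import json

EXPIRATION_SECONDS = 86400  # jwt.expiration-seconds from the text


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(claims: dict, secret: bytes, now: float) -> str:
    """Roughly what the issuer does at login: header.payload.signature."""
    payload = dict(claims, iat=int(now), exp=int(now) + EXPIRATION_SECONDS)
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
             for part in (header, payload)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: float) -> dict:
    """Roughly what a downstream service does per request, with no call to Auth."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(header_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

Any service that holds the same secret can run `verify_token` on its own, which is why existing tokens keep working while the issuer is unavailable.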
Tech: Spring Boot 3 + Spring Data JPA + Spring Security + Spring Kafka +
AWS SDK v2 (S3) + Spring Cache (Redis), port 8081.
Folder: backend/documents/
DB: Its own PostgreSQL instance (docs-postgres) — schema documentsdb.
This is the largest service. It owns the document domain, the file storage integration, the user/department/category catalog, JWT validation, the translation request pipeline, and the cross-service call to Comments.
Domain entities (JPA):
- `Document` (model/Document.java): the central entity. Holds title, ownership, department FK, S3 `fileKey`, translation status fields, and a JSON `versionsJson` column for past file versions.
- `AppUser` (model/AppUser.java): application-side user record with a `@ManyToMany` `Set<Department>` through the `user_department` join table. Separate from the Auth user but kept in sync by email.
- `Department`, `Category`: admin-managed reference data.
Note: there are two user tables in the system. The Auth service has `auth_user` (credentials + roles). The Documents service has `app_user` (departments + admin-style metadata). They are linked by email: the Documents service looks the user up by the JWT `email` claim.
Controllers:
- DocumentController.java: `/documents/**` (list, get, add, update, delete, file upload/download, presign, translation request, versioning, full-with-comments).
- UserController: `/users/**` user management.
- DepartmentController: `/departments/**`.
- CategoryController: `/categories/**`.
- AdminController.java: destructive `/admin/**` operations (wipe documents/categories/etc.).
- AuthController: legacy / local-mode login (the real auth lives in the Auth service).
Key service-layer components:
| Class | Role |
|---|---|
| `DocumentService` | CRUD + Redis caching (@CachePut, @CacheEvict). |
| `DocumentS3Service` | MinIO upload/download/presign; ensures bucket exists at startup. |
| `CommentClient` | Synchronous HTTP call to Comments service via Spring 6.1 RestClient. Returns empty list on failure (graceful degradation). |
| `DocumentEventProducer` | Publishes DocumentUploadedEvent to Kafka dms.documents.uploaded. |
| `TranslationConsumerService` | @KafkaListener consuming dms.documents.translated, …translation-progress, …translation-failed. Persists translation results back into PostgreSQL and uploads the translated .txt to MinIO. |
Security (security/):
- `JwtAuthFilter` runs before Spring Security's username/password filter. Every non-actuator request must carry an `Authorization: Bearer …` header. The filter parses claims, builds Spring authorities (`ROLE_*` prefixed), and populates `SecurityContextHolder` with a `UsernamePasswordAuthenticationToken` whose principal is the parsed `Claims` object, so controllers can read `id`, `email`, `name`, `role` directly from the JWT without DB hits.
- `SecurityConfig` is `STATELESS` with anonymous access disabled; every `/documents/**` endpoint requires a valid token. Only `/actuator/health*` and `/actuator/info` are public for K8s probes.
Access control: documents are scoped by department.
- Admins see everything.
- Regular users only see documents whose `departmentEntity.id` is in their assigned `Set<Department>` (`DocumentController.callerHasAccessToDocument`).
- A user who tries to read a document in a foreign department gets 403 rather than 404. Note that a 403 confirms the document exists; returning 404 instead would be the way to hide cross-department documents entirely.
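The department rule above can be sketched as a pure function. This is an illustration rather than the body of `callerHasAccessToDocument`: the `Caller` type, the `"ADMIN"` role string, and `status_for_read` are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Caller:
    email: str
    roles: frozenset                     # e.g. frozenset({"ADMIN"}), assumed role names
    department_ids: frozenset = field(default_factory=frozenset)


def caller_has_access(caller: Caller, document_department_id: int) -> bool:
    """Admins see everything; other users only see their own departments."""
    if "ADMIN" in caller.roles:
        return True
    return document_department_id in caller.department_ids


def status_for_read(caller: Caller, document_department_id: int) -> int:
    """HTTP status the text describes for a read attempt."""
    return 200 if caller_has_access(caller, document_department_id) else 403
```

Keeping the check free of I/O makes it easy to unit-test independently of Spring Security.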
Translation lifecycle (per-document fields on Document):
title ← original
translatedTitle ← filled in once translation succeeds
translationStatus ∈ {NOT_REQUESTED, PENDING, SUCCESS, FAILED}
translationProgress ∈ 0..100
translationError ← message if FAILED
translatedLanguage ← e.g. "French"
translatedFileKey ← S3 key of the .txt produced by the consumer
The flow:
- User clicks Translate in the UI.
- UI calls `POST /api/documents/{id}/request-translation`.
- Documents sets the doc to `PENDING` / 0% and publishes a `DocumentUploadedEvent` to `dms.documents.uploaded` (Kafka).
- The translator worker consumes, calls the LLM, publishes progress events, and finally publishes a SUCCESS or FAILED event.
- The Documents service's `TranslationConsumerService` updates the DB and uploads the translated text to MinIO at `documents/{id}/translated_{lang}.txt`.
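The last step can be pictured as a small reducer over the translation fields. The sketch below is hypothetical: the field and payload names come from this document, but the guards (ignore progress once the document is no longer `PENDING`, never let progress go backwards) are assumptions about how one might keep the handler safe under duplicate or late events.

```python
def apply_translation_event(doc: dict, topic: str, event: dict) -> dict:
    """Return a new document dict with the event applied; the input is not mutated."""
    updated = dict(doc)
    if topic == "dms.documents.translation-progress":
        # Assumed guard: a late progress event must not reopen a finished document.
        if doc.get("translationStatus") == "PENDING":
            updated["translationProgress"] = max(
                doc.get("translationProgress", 0), event["translationProgress"]
            )
    elif topic == "dms.documents.translated":
        updated.update(
            translationStatus="SUCCESS",
            translationProgress=100,
            translatedTitle=event["translatedTitle"],
            translatedLanguage=event["targetLanguage"],
            translationError=None,
        )
    elif topic == "dms.documents.translation-failed":
        updated.update(translationStatus="FAILED", translationError=event["errorMessage"])
    return updated
```

Applying the same `dms.documents.translated` event twice yields the same document, which is the property an at-least-once consumer needs.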
Tech: Spring Boot 3 + Spring Data Cassandra, port 8082.
Folder: backend/comments/
DB: Apache Cassandra 4.1 (keyspace dms).
Why Cassandra? Comments are write-heavy, append-only, and naturally
partition by docId. Cassandra's wide-row model is a great fit: each
document becomes a partition and comments are stored as clustered rows
ordered by commentId. The keyspace is created at deploy time by the
ensure-keyspace init container in 05-comments-service.yml.
Schema (auto-managed by Spring Data Cassandra, schema-action=CREATE_IF_NOT_EXISTS):
- Table `comments`
- Composite primary key:
  - Partition: `docId` (Long)
  - Clustering: `commentId` (UUID), descending so the newest comment is read first.
- Columns: `author`, `content`.
Endpoints (CommentController.java):
- `GET /comments/list/{docId}`: list comments newest-first.
- `POST /comments/add`: body `{docId, author, content}` (server generates UUID if missing).
Interactions:
- Called synchronously by Documents (for `/{id}/full`) via `CommentClient`.
- Called directly by clients through the Gateway (`/api/comments/**`).
- Called by Orchestration as an alternative aggregation path.
Tech: Spring Boot + Spring Integration (Enterprise-Integration-Pattern DSL), port 8084. Folder: backend/orchestration/
Purpose: demonstrate an ESB-style (Enterprise Service Bus) orchestrator that aggregates the responses of multiple services and returns a single combined document.
Endpoint: GET /document/{id} returns { document: {...}, comments: [...] }.
How it works (OrchestrationIntegrationConfig.java):
Two Spring Integration IntegrationFlows, each sitting between a
request DirectChannel and a response QueueChannel:
documentRequestChannel → HTTP GET service.d.url/{id} → documentResponseChannel
commentsRequestChannel → HTTP GET service.m.url/{id} → commentsResponseChannel
The controller sends the document ID to both request channels, then blocks on both response channels with a 5-second timeout. If the Comments service (Service M, `service.m.url`) is unreachable, the comments flow catches the exception and forwards an empty list, so the document data is still returned (availability over consistency).
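The same fan-out-and-degrade pattern can be sketched without Spring Integration. In the example below, `aggregate` is a hypothetical function, the two fetchers are injected callables standing in for the HTTP calls, and a thread pool plays the role of the request/response channels.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate(doc_id, fetch_document, fetch_comments, timeout_seconds=5.0):
    """Fetch document and comments in parallel; comments degrade to an empty list."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        document_future = pool.submit(fetch_document, doc_id)
        comments_future = pool.submit(fetch_comments, doc_id)
        # A document failure or timeout propagates: there is nothing to return without it.
        document = document_future.result(timeout=timeout_seconds)
        try:
            comments = comments_future.result(timeout=timeout_seconds)
        except Exception:
            # Comments service down or too slow: still answer with the document.
            comments = []
    finally:
        # Do not block the caller on a fetch that is still hanging.
        pool.shutdown(wait=False)
    return {"document": document, "comments": comments}
```

The response shape matches the `{ document: {...}, comments: [...] }` payload described for `GET /document/{id}`.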
Why a separate service for this when Documents already has /documents/{id}/full?
- Different style: Spring Integration vs. plain RestClient — useful for a course/lab comparing patterns.
- Decouples aggregation from the canonical Documents service.
Tech: Python 3.12 + kafka-python-ng + boto3 (S3) + openai SDK (used
against OpenRouter), no HTTP server, only Kafka consumer/producer.
Folder: backend/translation/
Entry point: translator.py
Why Python? This service is integrating with external LLM APIs that have well-supported Python SDKs and demonstrates polyglot microservices.
Kafka topology used by the translator:
| Topic | Direction | Purpose |
|---|---|---|
| `dms.documents.uploaded` | consume | A new document needs translation |
| `dms.documents.translation-progress` | produce | Streaming progress 0–99% |
| `dms.documents.translated` | produce | SUCCESS event with translated title (and optional content) |
| `dms.documents.translation-failed` | produce | Dead-letter queue (DLQ) for failed translations |
Consumer group: translator-service (durable offsets).
Offset reset: earliest (don't silently miss documents uploaded while the worker was down).
Auto-commit: disabled — the worker commits after publishing the result. This is the at-least-once invariant: if it crashes between Gemini and the publish, the message is re-processed; if it crashes after publish but before commit, the downstream sees a duplicate (TranslationConsumerService is idempotent).
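The commit-after-publish ordering is easier to see with the broker replaced by an in-memory stand-in. Everything in this sketch is hypothetical (`InMemoryLog`, `run_worker`, the simulated crash); it only illustrates why a crash between publish and commit produces a duplicate rather than a loss.

```python
class InMemoryLog:
    """Stand-in for one Kafka partition plus the consumer group's committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to read after a restart

    def poll_from_committed(self):
        return list(enumerate(self.messages))[self.committed:]

    def commit(self, offset):
        self.committed = offset + 1


def run_worker(log, translate, publish, crash_after_publish_at=None):
    """Process messages; the offset is committed only after the result is published."""
    for offset, message in log.poll_from_committed():
        result = translate(message)
        publish(result)
        if offset == crash_after_publish_at:
            raise RuntimeError("simulated crash before commit")
        log.commit(offset)


log = InMemoryLog(["a", "b"])
published = []
try:
    run_worker(log, str.upper, published.append, crash_after_publish_at=1)
except RuntimeError:
    pass
run_worker(log, str.upper, published.append)  # restart: offset 1 is processed again
print(published)
```

After the restart the result for the second message has been published twice, so the consumer of those results has to tolerate duplicates.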
Retry strategy (translate_with_retry):
- Up to `MAX_RETRIES=3` Gemini calls per event.
- Exponential backoff: 1 s, 2 s, 4 s, with override from `retry_delay_seconds` extracted from Gemini's `RESOURCE_EXHAUSTED` error.
- Errors are classified (`classify_gemini_error`) into retryable vs. non-retryable (`insufficient_credits`, `model_not_found`, `quota_exhausted_daily`, …). If a model isn't usable, the worker falls through to the next configured model in `GEMINI_MODELS`.
- After all retries are exhausted the worker publishes to the DLQ topic with the original payload + error info, then commits the offset to make forward progress. Operators can replay from the DLQ later.
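A rough sketch of that retry loop follows. It is not the worker's `translate_with_retry`: the `TranslationError` and `TranslationFailed` types, the injected `call_model` and `sleep` parameters, and the decision to count a model switch against the same three-call budget are all assumptions. The sketch also skips the delay after the final attempt, so with three calls only the 1 s and 2 s delays are used.

```python
import time

MAX_RETRIES = 3
NON_RETRYABLE = {"insufficient_credits", "model_not_found", "quota_exhausted_daily"}


class TranslationError(Exception):
    def __init__(self, kind, retry_delay_seconds=None):
        super().__init__(kind)
        self.kind = kind
        self.retry_delay_seconds = retry_delay_seconds


class TranslationFailed(Exception):
    """Raised when the budget is spent; the caller would publish to the DLQ."""


def translate_with_retry(call_model, models, text, sleep=time.sleep):
    errors = []
    model_index = 0
    for attempt in range(MAX_RETRIES):
        if model_index >= len(models):
            break  # no usable model left
        model = models[model_index]
        try:
            return call_model(model, text)
        except TranslationError as exc:
            errors.append((model, exc.kind))
            if exc.kind in NON_RETRYABLE:
                model_index += 1  # this model is unusable: fall through to the next
                continue
            if attempt < MAX_RETRIES - 1:
                # A server-provided delay overrides the exponential schedule 1, 2, 4.
                delay = exc.retry_delay_seconds
                sleep(delay if delay is not None else 2 ** attempt)
    raise TranslationFailed(errors)
```

Injecting `sleep` keeps the function testable without real waiting.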
Streaming progress: the worker uses the OpenAI SDK in stream=True mode
so it can publish progress events as the LLM streams chunks, letting the UI
show a live progress bar.
Security: GEMINI_API_KEY is never baked into the image. It is
injected from a Kubernetes Secret (openrouter-secret) created by
deploy.fish from the local openrouter_key.txt file (which is gitignored).
S3 access: the worker uses service credentials to read source files directly from MinIO (no JWT path through Documents) — this is intentional: the translator only speaks Kafka + Gemini + S3, never the Documents HTTP API.
Tech: React 19, Vite 6, React Router 6, vanilla CSS. Tests via Cypress and Playwright. Folder: frontend/ui/
Entry points:
- `src/main.jsx`: React root.
- `src/App.jsx`: routing.
Layout:
src/
services/api.js — fetch wrapper, JWT injection, base URLs
context/AppContext.jsx — global auth + user state
hooks/ — useUsers, useDocuments, useAppContext
pages/ — Login, Signup, DocumentList, DocumentDetail,
DocumentUploadPage, AdminDashboard, …
components/ — DocumentUploadModal, UserImportWizard, …
API base URLs (src/services/api.js):
- `VITE_DMS_API_BASE_URL` defaults to `/api` (Vite proxy).
- The token is stored in `localStorage` under the key `dms-auth` and replayed on every fetch as `Authorization: Bearer <token>`.
Vite proxy (vite.config.js):
- Dev server runs on `:5173`.
- `/api` is proxied to `http://127.0.0.1:8090`, which is what `kubectl port-forward svc/gateway-service 8090:8080` exposes locally.
Optional mock backend: npm run db runs a JSON-server using
db.json so the UI can be developed without the full
backend stack.
- Two physically separate instances to honor the "database-per-service" pattern. Each runs as a Kubernetes `StatefulSet` with a `PersistentVolumeClaim` so data survives pod restarts.
- Image: `postgres:16-alpine`.
- Auth DB credentials in `auth-postgres-secret` (`authuser` / `authpass` / `authdb`).
- Documents DB credentials in `docs-postgres-secret` (`dmsuser` / `dmspass` / `documentsdb`).
- Each service uses Hibernate with `ddl-auto=update` so tables are auto-managed (acceptable for a lab; production would use Flyway/Liquibase).
- Single-node `StatefulSet` `cassandra:4.1` with 2Gi storage.
- Keyspace `dms` is created at startup by the `ensure-keyspace` init container with `SimpleStrategy` + RF=1 (suitable for one-node dev clusters).
- Schema-on-startup via Spring Data Cassandra (`schema-action=CREATE_IF_NOT_EXISTS`).
- Why Cassandra instead of just another Postgres? Comments are append-heavy with a clear partition key (`docId`), the canonical Cassandra workload, and the project uses this to demonstrate polyglot persistence.
- Image: `redis:7-alpine`, single replica Deployment.
- Used by the Documents service via Spring Cache abstraction (`spring.cache.type=redis`, TTL 60 s).
- DocumentService annotates writes with `@CachePut` and deletes with `@CacheEvict` so the next read serves the cached entity instead of hitting Postgres.
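The put-on-write, evict-on-delete behaviour can be modelled in a few lines. The classes below are hypothetical stand-ins (an in-memory `TtlCache` instead of Redis, a dict instead of Postgres) and only mirror the semantics described above, with the 60 s TTL taken from the text.

```python
import time


class TtlCache:
    """Tiny in-memory stand-in for the Redis cache."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def evict(self, key):
        self._entries.pop(key, None)


class DocumentService:
    def __init__(self, repository, cache):
        self.repository = repository  # dict standing in for Postgres
        self.cache = cache
        self.db_reads = 0

    def get(self, doc_id):
        cached = self.cache.get(doc_id)
        if cached is not None:
            return cached
        self.db_reads += 1
        document = self.repository.get(doc_id)
        if document is not None:
            self.cache.put(doc_id, document)
        return document

    def save(self, doc_id, document):
        self.repository[doc_id] = document
        self.cache.put(doc_id, document)  # analogue of @CachePut

    def delete(self, doc_id):
        self.repository.pop(doc_id, None)
        self.cache.evict(doc_id)          # analogue of @CacheEvict
```

A read right after `save` is served from the cache; once the TTL has passed, the next read goes back to the repository and repopulates the entry.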
- Image: `quay.io/minio/minio:latest`, single replica Deployment with a 5Gi PVC.
- Two ports: 9000 (S3 API) and 9001 (web console at http://localhost:9001, login `admin` / `ensia2026`).
- Bucket: `ensia` (auto-created by `DocumentS3Service.ensureBucket()`).
- Object key conventions:
  - Original file: `documents/{documentId}/{originalFilename}`
  - Translated text file: `documents/{documentId}/translated_{lang}.txt`
- The Documents service uses the AWS SDK v2 for Java with path-style access enabled (required by MinIO).
- The Documents service can generate pre-signed URLs (5 minutes default) so the browser can download large files directly from MinIO without proxying through Spring Boot.
- The Translator uses boto3 to fetch sources and write translated files.
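Because two services written in different languages build these keys independently, it can help to pin the convention down in one place. The helpers below are hypothetical and simply encode the two key patterns listed above; the value passed as `lang` is used as-is.

```python
def original_key(document_id: int, original_filename: str) -> str:
    """Key of the uploaded source file."""
    return f"documents/{document_id}/{original_filename}"


def translated_key(document_id: int, lang: str) -> str:
    """Key of the translated text file produced after a successful translation."""
    return f"documents/{document_id}/translated_{lang}.txt"


print(original_key(42, "thesis.pdf"))
print(translated_key(42, "FR"))
```

Grouping every object under `documents/{documentId}/` means all files of one document share a common prefix.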
- Image: `apache/kafka:3.7.1` running in KRaft mode (no ZooKeeper); see 10-kafka.yml.
- Single-node cluster with `process.roles=broker,controller`, `num.partitions=1`, RF=1 for the internal topics (typical for a lab).
- Advertised listener: `kafka:9092` so other pods in the `dms` namespace can resolve it via DNS.
| Topic | Producer | Consumer | Payload |
|---|---|---|---|
| `dms.documents.uploaded` | Documents | Translator | DocumentUploadedEvent (JSON string) |
| `dms.documents.translation-progress` | Translator | Documents | { documentId, translationStatus, translationProgress } |
| `dms.documents.translated` | Translator | Documents | { documentId, translatedTitle, translatedContent, targetLanguage, translationModel, … } |
| `dms.documents.translation-failed` | Translator | (DLQ: Documents updates status to FAILED) | { documentId, error, errorMessage, retryable, modelsTried, … } |
The Documents service intentionally uses StringSerializer and serializes
the event with Jackson manually. The reason is documented at
KafkaProducerConfig.java:21-32:
mixing Jackson 2.x and 3.x on the same classpath (some Spring Kafka
internals use 2.x; the project has 3.x deps) breaks JsonSerializer.
Avoiding it sidesteps the classpath conflict.
For every translation event the message key is String.valueOf(documentId).
This guarantees all events for the same document land on the same partition,
preserving ordering per-document (especially important for progress events).
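The reasoning is that a partitioner maps equal keys to equal partitions. The sketch below illustrates this with CRC32 as a stand-in hash (Kafka's default partitioner hashes the key bytes with murmur2, so the actual partition numbers would differ); `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition (stand-in hash, not murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [(42, "progress 10"), (7, "progress 5"), (42, "progress 50"), (42, "translated")]
by_partition = {}
for document_id, payload in events:
    partition = partition_for(str(document_id), num_partitions=3)
    by_partition.setdefault(partition, []).append((document_id, payload))

# Every event of document 42 sits in one partition, in publish order.
print(by_partition[partition_for("42", 3)])
```

With `num.partitions=1`, as in this lab cluster, every key maps to partition 0, so the keying only starts to matter once a topic has more partitions.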
- Producer: `acks=all` (wait for leader + all in-sync replicas before ack).
- Consumer: at-least-once.
  - Translator manually commits after a successful publish to `dms.documents.translated` (or DLQ).
  - Documents service uses `enable.auto.commit=true` because the update is idempotent (setting the same translatedTitle twice is harmless).
If the translator is down, documents pile up in the topic but no data is
lost (Kafka keeps them durably). When it restarts, auto.offset.reset=earliest
plus the persistent consumer group offset means it picks up exactly where it
left off.
- Client `POST /api/auth/login` → Gateway → Auth service.
- Auth verifies the salted hash and signs a JWT with HS256 and the shared secret `JWT_SECRET`.
- Response body returns the token plus profile (`id`, `email`, `name`, `role`, `roles`, `department`, `mustChangePassword`).
- The browser persists it in `localStorage` (`dms-auth`).
- The Documents service validates JWTs locally with the same secret using `JwtAuthFilter`.
- No call to Auth is made on every request: the secret is shared via the Kubernetes Secret `jwt-secret` so both services can sign/verify.
- This means if Auth is down, existing valid tokens still work; only new logins fail.
- `id` → owner ID stamped on new documents.
- `email` → looked up in `app_user` to resolve `Set<Department>`.
- `roles` → mapped to Spring authorities (`ROLE_ADMIN`, `ROLE_USER`, …).
- `name` → stamped as document `ownerName`.
- All `/documents/**` endpoints are authenticated and authorized by department.
- Actuator probes (`/actuator/health*`, `/actuator/info`) are public so Kubernetes liveness/readiness probes don't need credentials.
- Sessions are `STATELESS`: no `HttpSession`, no CSRF (REST APIs).
- CORS allowed only from the dev Vite origin (the Gateway pins the list).
| Caller | Callee | Mechanism |
|---|---|---|
| UI | Gateway | fetch |
| Gateway | All backends | Spring Cloud Gateway HTTP routing |
| Documents | Comments | Spring RestClient (CommentClient.java) |
| Orchestration | Documents | Spring Integration HTTP outbound gateway |
| Orchestration | Comments | Spring RestTemplate (wrapped in try/catch for graceful degradation) |
| Producer | Topic | Consumer |
|---|---|---|
| Documents | `dms.documents.uploaded` | Translator |
| Translator | `dms.documents.translated` | Documents |
| Translator | `dms.documents.translation-progress` | Documents |
| Translator | `dms.documents.translation-failed` | Documents |
The Documents service and the Translator both speak the S3 API directly to MinIO. The translator deliberately avoids hitting the Documents HTTP API to keep the asynchronous tier loosely coupled.
All in-cluster communication uses the Kubernetes DNS name of the
target service: e.g. the Documents pod reaches Comments at
http://comments-service:8082 because both live in the dms namespace.
Outside the cluster, kubectl port-forward is used.
Browser → Gateway → Auth service → auth-postgres
↑ │
└──── JWT (+ profile) ─────┘
Browser
→ GET /api/documents/list (Bearer <jwt>)
→ Gateway rewrites to /documents/list
→ Documents.JwtAuthFilter parses claims
→ DocumentController.getAllDocuments:
if admin: documentRepository.findAll()
else: documentRepository.findByDepartmentIdIn(userDepartments)
→ Postgres + Redis cache
→ JSON list
Browser
→ GET /api/documents/42/full
→ Gateway → Documents
→ Documents reads doc from Postgres,
verifies caller has access to its department,
then CommentClient.fetchComments("42")
→ HTTP GET http://comments-service:8082/comments/list/42
→ Comments queries Cassandra
→ Documents returns { document, comments }
If Comments is down the response still contains the document and an empty
comments: [] (graceful degradation).
Browser ──POST multipart── /api/documents/add-with-file
│
▼
Documents controller:
1. Resolves department from JWT
2. Creates Document row in Postgres
3. Uploads bytes to MinIO at
documents/{id}/{filename}
4. Stores fileKey on the Document row
│
▼
201 + Document JSON
1. UI clicks "Translate"
2. POST /api/documents/42/request-translation
3. Documents sets status=PENDING, progress=0, publishes
key="42"
topic="dms.documents.uploaded"
value=DocumentUploadedEvent{...}
4. Translator consumer (group=translator-service) picks the event up
5. Translator publishes progress 5%
6. Translator calls OpenRouter (Gemini) with streaming
7. As chunks arrive, translator publishes progress 10..90%
on dms.documents.translation-progress
(Documents service's KafkaListener writes them to the doc row)
8. Translator finishes:
- SUCCESS → publish to dms.documents.translated
- FAILURE → publish to dms.documents.translation-failed
9. Documents.TranslationConsumerService:
- On success: updates doc fields + uploads translated_FR.txt to MinIO
- On failure: marks status=FAILED with truncated error message
10. UI polls /documents/42 and sees the live status / progress / final title
This single flow exercises every pillar of the system: REST, JWT, Postgres, Kafka producer/consumer, an external API call, retry/DLQ, S3 upload, and the cache invalidation that follows the document update.
Every service has its own Dockerfile. All Java services follow the same
multi-stage pattern (illustrated by backend/authentication/Dockerfile):
# Stage 1: build the fat JAR with the full JDK
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /build
COPY . .
RUN ./mvnw -q package -DskipTests   # produces target/<artifact>.jar

# Stage 2: runtime image with the JRE only
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
EXPOSE 8083
ENTRYPOINT ["java", "-jar", "app.jar"]
Benefits:
- Final image only contains the JRE + the fat JAR, not Maven and the JDK → much smaller, much faster pulls.
- Build is reproducible: the Maven Wrapper (`mvnw`) is checked in.
- No JDK is needed on the developer's host.
The Python translator uses the same idea (backend/translation/Dockerfile):
- Stage 1 installs dependencies into `/install`.
- Stage 2 copies them into a clean `python:3.12-slim`, drops privileges to a non-root `translator` user, and runs `python translator.py`.
- `GEMINI_API_KEY` is never baked into the image (it lives in a Secret).
The script build-and-load.fish automates this:
- For minikube it runs `eval (minikube docker-env)` so `docker build` runs inside the minikube VM's daemon. Result: the images are already present on the node, so K8s can start them immediately with `imagePullPolicy: Never`. No registry needed.
- For kind it builds locally then runs `kind load docker-image …`.
Images produced:
dms/auth-service:latest
dms/documents-service:latest
dms/comments-service:latest
dms/gateway-service:latest
dms/orchestration-service:latest
dms/translator:latest
All manifests live in infra/k8s/, numbered so they apply in dependency order.
| File | What it creates |
|---|---|
| `00-namespace.yml` | Namespace dms + Secret jwt-secret |
| `01-documents-postgres.yml` | Documents Postgres: Secret + StatefulSet + ClusterIP Service |
| `02-comments-postgres.yml` | Cassandra: StatefulSet + ClusterIP Service |
| `03-minio.yml` | MinIO: Secret + PVC + Deployment + Service (9000 + 9001) |
| `04-documents-service.yml` | Documents app: Deployment (+ initContainer waiting for Postgres) + ClusterIP Service |
| `05-comments-service.yml` | Comments app: Deployment (+ initContainers waiting for Cassandra and creating keyspace) + Service |
| `06-gateway-service.yml` | Gateway: Deployment + NodePort Service (30080) |
| `07-auth-service.yml` | Auth app: Deployment (+ initContainer waiting for auth-postgres) + Service |
| `08-auth-postgres.yml` | Auth Postgres: Secret + StatefulSet + Service |
| `09-redis.yml` | Redis: Deployment + Service |
| `10-kafka.yml` | Kafka (KRaft single-node): Deployment + Service |
| `11-orchestration-service.yml` | Orchestration: Deployment + Service |
| `12-translator.yml` | Translator: Deployment (consumes openrouter-secret for the LLM API key) |
| Concept | How the project uses it |
|---|---|
| Namespace | Everything lives in dms. kubectl delete ns dms wipes the world. |
| Pod | Smallest deployable unit: one or more containers scheduled together. All our pods carry exactly one main container (plus init containers where needed). |
| Deployment | Used for stateless apps: gateway, auth, docs, comments, orchestration, translator, redis, kafka, minio. |
| StatefulSet | Used for stateful apps that need stable hostnames and per-replica persistent disks: docs-postgres, auth-postgres, cassandra. |
| PersistentVolumeClaim (PVC) / volumeClaimTemplates | Requests disk space; Kubernetes binds a PersistentVolume. Survives pod restarts. Used by all databases and MinIO. |
| Secret | Holds DB credentials, MinIO root creds, JWT secret, OpenRouter API key. Mounted as env vars via secretKeyRef. |
| ConfigMap | Not currently used — non-sensitive config is passed as inline env vars. |
| Service / ClusterIP | Stable virtual IP + DNS name (<svc>.<ns>.svc.cluster.local). All internal calls use it. |
| Service / NodePort | The gateway is exposed on port 30080 of every node so external clients can reach it. |
| initContainer | Runs before the main container starts. Used to nc -z poll a database port or to cqlsh CREATE KEYSPACE for Cassandra. Prevents Spring Boot from crashing on "Connection refused". |
| Probes (readiness / liveness) | Every app pod exposes /actuator/health/readiness and /actuator/health/liveness. K8s stops sending traffic to an unready pod and restarts an unhealthy one. |
| imagePullPolicy: Never | Tells K8s to use the locally loaded image rather than pulling from a remote registry; required for kind/minikube. |
| Resource ordering | The deploy.fish script applies infra first, waits for it to be Ready, then applies app services. |
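
The initContainer row above describes polling a database port with nc -z until it accepts connections. As a rough illustration of that wait loop (not the project's actual init script), here is a minimal Python sketch; the function name `wait_for_port` and its timeout parameters are assumptions made for this example.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 2.0) -> bool:
    """Poll host:port until a TCP connect succeeds or timeout_s elapses.

    Hypothetical helper: it mirrors the idea of `nc -z host port` in a loop.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Like `nc -z`: open the connection, send nothing, close it.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused, DNS not ready yet, or connect timed out.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

An init step could call something like `wait_for_port("auth-postgres", 5432)` and exit non-zero when it returns False, so the main container only starts once the dependency is reachable.
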
Pod → Service (ClusterIP) → selected Pods
Example: the gateway pod sets DOCUMENTS_SERVICE_URL=http://documents-service:8081.
DNS resolves documents-service to the ClusterIP of that Service, which
load-balances over all Pods labelled app: documents-service. If we
scaled the Deployment to replicas: 3, the Service would automatically
fan out traffic across the three pods — no code change needed.
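
To make the selector mechanism concrete, the following toy model shows how a Service could pick its backing pods by label and spread requests over the ready ones. It is only a sketch: real clusters implement this in kube-proxy (iptables/IPVS rules), and the pod names, the `ready` flag layout, and both function names are hypothetical.

```python
import random


def select_endpoints(pods, selector):
    """Return names of ready pods whose labels contain every selector pair."""
    return [
        pod["name"]
        for pod in pods
        if pod["ready"]
        and all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


def route(pods, selector, rng=random):
    """Pick one matching endpoint at random for a single request."""
    endpoints = select_endpoints(pods, selector)
    if not endpoints:
        raise RuntimeError(f"no ready endpoints for selector {selector!r}")
    return rng.choice(endpoints)


# Hypothetical cluster state after scaling documents-service to 3 replicas.
pods = [
    {"name": "documents-service-a", "labels": {"app": "documents-service"}, "ready": True},
    {"name": "documents-service-b", "labels": {"app": "documents-service"}, "ready": False},
    {"name": "documents-service-c", "labels": {"app": "documents-service"}, "ready": True},
    {"name": "comments-service-a", "labels": {"app": "comments-service"}, "ready": True},
]
selector = {"app": "documents-service"}
```

Here `select_endpoints(pods, selector)` skips the unready replica and the pod with a different label, which matches the behaviour described for readiness probes: an unready pod stays in the Deployment but receives no traffic.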
For Minikube the recommended path is:
kubectl -n dms port-forward svc/gateway-service 8090:8080
This forwards localhost:8090 on the dev machine to port 8080 on the
gateway Service inside the cluster. The Vite dev server is then configured
(see vite.config.js) to proxy /api to
127.0.0.1:8090.
The NodePort 30080 is also defined on the Service so minikube service gateway-service -n dms works as an alternative.
- Real microservice independence: an outage in docs-postgres doesn't take out auth login.
- Schema isolation: each team can evolve its schema without coordinating with others.
- Demonstrates the database-per-service pattern.
cd infra/k8s
fish build-and-load.fish minikube   # build all images inside minikube
fish deploy.fish                    # apply manifests in order, wait at each step
kubectl -n dms port-forward svc/gateway-service 8090:8080

In another shell:

cd frontend/ui
npm install
npm run dev
# open http://localhost:5173

cd frontend/ui
npm install
npm run db      # JSON-server with frontend/ui/db.json
npm run dev

docker build -t dms/auth-service:latest backend/authentication
minikube image load dms/auth-service:latest
kubectl -n dms rollout restart deployment/auth-service

kubectl get pods -n dms
kubectl logs -n dms deploy/translator -f
kubectl exec -it -n dms deploy/documents-service -- sh
kubectl -n dms exec statefulset/cassandra -- cqlsh -e "DESCRIBE KEYSPACES"

kubectl delete namespace dms   # wipes pods, PVCs, secrets, services, …
minikube stop                  # or `minikube delete` to start fresh

Every service follows the 12-factor app principle: configuration comes
from environment variables, with sensible local defaults in
application.properties / application.yml:
spring.datasource.url=jdbc:postgresql://${DB_HOST:localhost}:${DB_PORT:5432}/${DB_NAME:documentsdb}
spring.kafka.bootstrap-servers=${KAFKA_BOOTSTRAP_SERVERS:localhost:9094}
This means the same JAR/Docker image runs unchanged from a local run on
your laptop (next to npm run dev for the frontend) to a Kubernetes pod; only the env vars change.
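
The `${NAME:default}` syntax in the two properties above can be read as "use the environment variable if it is set, otherwise the default". The sketch below imitates that rule in Python purely for illustration; it is not how Spring resolves placeholders internally (nested placeholders, for instance, are not handled), and the `resolve` function is an assumption of this example.

```python
import os
import re

# Matches ${NAME} or ${NAME:default}; the default may itself contain colons.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def resolve(template: str, env=None) -> str:
    """Replace each placeholder with env[NAME], falling back to its default."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"no value and no default for {name}")

    return _PLACEHOLDER.sub(substitute, template)


url = "jdbc:postgresql://${DB_HOST:localhost}:${DB_PORT:5432}/${DB_NAME:documentsdb}"
local_url = resolve(url, env={})                              # laptop: defaults only
cluster_url = resolve(url, env={"DB_HOST": "docs-postgres"})  # pod: one override
```

With an empty environment every default applies; setting only `DB_HOST` changes the host and leaves the port and database name at their defaults, which is the same override-by-env-var behaviour the manifests rely on.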
The Documents service caches single-document reads in Redis with a 60-second
TTL. @CachePut updates the cache on every write so the next read is fresh.
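
The interaction between the 60-second TTL and the write-through update can be modelled in a few lines. This is a simplified in-memory stand-in for Redis plus Spring Cache, not the service's code; `TtlCache`, `DocumentReader`, and the injectable `clock` are names invented for the sketch.

```python
import time


class TtlCache:
    """Tiny in-memory cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl_s, value)


class DocumentReader:
    """Read-through on get, write-through on update (the @CachePut idea)."""

    def __init__(self, store, cache):
        self._store = store  # dict standing in for the database
        self._cache = cache
        self.store_reads = 0

    def get(self, doc_id):
        cached = self._cache.get(doc_id)
        if cached is not None:
            return cached
        self.store_reads += 1
        doc = self._store[doc_id]
        self._cache.put(doc_id, doc)
        return doc

    def update(self, doc_id, doc):
        self._store[doc_id] = doc
        self._cache.put(doc_id, doc)  # refresh the cache on every write
        return doc
```

Because `update` overwrites the cached entry, a read right after a write never sees the stale version, and the TTL only matters for entries that are read but not written for a while.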
| Pattern | Where |
|---|---|
| Graceful degradation | CommentClient.fetchComments returns [] if Comments is down |
| Init container waits | every service that depends on a DB waits with nc -z |
| Liveness/readiness probes | every Java service exposes /actuator/health/{liveness,readiness} |
| Idempotent consumers | TranslationConsumerService is safe to replay |
| At-least-once messaging | translator commits after publish; Documents auto-commits but is idempotent |
| Dead-letter queue | dms.documents.translation-failed carries unrecoverable events |
| Pre-signed URLs | offload bulk file traffic from the JVM to MinIO directly |
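
Two rows of the table above work as a pair: at-least-once delivery means an event can arrive twice, and an idempotent consumer makes the second arrival harmless. The simulation below sketches that interplay without Kafka. The event fields (`eventId`, `documentId`, `translatedTitle`) and both names are hypothetical, and the real TranslationConsumerService may achieve idempotency differently.

```python
class IdempotentTranslationConsumer:
    """Applies each translation event at most once, keyed by eventId."""

    def __init__(self):
        self.documents = {}    # documentId -> translated title
        self.applied = set()   # eventIds that were already applied
        self.side_effects = 0  # counts real state changes

    def handle(self, event):
        if event["eventId"] in self.applied:
            return False       # duplicate delivery: ignore it
        self.documents[event["documentId"]] = event["translatedTitle"]
        self.applied.add(event["eventId"])
        self.side_effects += 1
        return True


def deliver_at_least_once(log, consumer, committed_offset,
                          crash_before_commit_at=None):
    """Process log[committed_offset:], committing the offset after each event.

    If crash_before_commit_at is hit, the event has been processed but its
    offset is not committed, so it will be delivered again on restart.
    Returns the last committed offset.
    """
    offset = committed_offset
    while offset < len(log):
        consumer.handle(log[offset])
        if offset == crash_before_commit_at:
            return offset      # simulated crash between process and commit
        offset += 1            # commit
    return offset


log = [
    {"eventId": "e1", "documentId": 10, "translatedTitle": "Bonjour"},
    {"eventId": "e2", "documentId": 11, "translatedTitle": "Hallo"},
]
consumer = IdempotentTranslationConsumer()
after_crash = deliver_at_least_once(log, consumer, 0, crash_before_commit_at=1)
after_restart = deliver_at_least_once(log, consumer, after_crash)
```

In this run the second event is handled twice (once before the simulated crash, once after the restart), yet `side_effects` stays at 2 because the duplicate is detected and skipped.
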
- Every Spring app exposes Spring Boot Actuator endpoints: health, info, metrics, env, beans (where useful).
- Probes use the health/readiness and health/liveness groups (MANAGEMENT_ENDPOINT_HEALTH_PROBES_ENABLED=true).
There's a tiny k6 benchmark in infra/benchmark/
(comments-benchmark.js + sample output files for the Postgres vs.
Cassandra showdown that motivated the C2 migration).
| Term | Meaning in this project |
|---|---|
| API Gateway | Spring Cloud Gateway pod that fronts the cluster and routes /api/**. |
| Microservice | A small, independently-deployable service that owns its data and exposes a narrow API. |
| JWT (JSON Web Token) | Signed token issued by Auth, validated locally by Documents using a shared HMAC-SHA256 secret. |
| Spring Boot | Java framework used by every Java service for HTTP, JPA, security, Kafka, caching. |
| Spring Cloud Gateway | Reactive/MVC API gateway with declarative routing. |
| Spring Data JPA | ORM layer used by Auth and Documents to talk to Postgres. |
| Spring Data Cassandra | Same idea, for Cassandra (Comments service). |
| Spring Kafka | KafkaTemplate + @KafkaListener annotations used by the Documents service. |
| Spring Integration | EIP framework used by the Orchestration service for channels + flows. |
| Spring Security | Filters / SecurityFilterChain configuration enforcing JWT auth in Documents. |
| Spring Cache (@CachePut, @CacheEvict) | Annotation-based caching layer wired to Redis. |
| JPA / Hibernate | The implementation behind Spring Data JPA. ddl-auto=update lets it manage schema in dev. |
| PostgreSQL | Relational DB used twice (Auth + Documents), each on its own StatefulSet. |
| Cassandra | Wide-column NoSQL DB for comments; great for write-heavy partitioned data. |
| Redis | In-memory key/value store used as a cache. |
| MinIO | Self-hosted S3-compatible object store for raw files. |
| S3 | AWS's object storage API; MinIO speaks it. |
| Pre-signed URL | A short-lived URL that grants direct read access to an S3 object without server-side proxying. |
| Apache Kafka | Distributed event log used for async translation events. |
| KRaft | Kafka's built-in metadata mode (no ZooKeeper) — what 10-kafka.yml uses. |
| Topic / Partition / Offset | Kafka primitives. We use 1 partition per topic; the message key is documentId so per-doc ordering is preserved. |
| Consumer group | Set of consumer instances that share work. translator-service reads dms.documents.uploaded; dms-translation-consumer reads the result topics. |
| At-least-once delivery | The translator commits offsets after publishing the result; on crash, an event might be re-processed. Downstream consumers are written to be idempotent. |
| Dead-Letter Queue (DLQ) | dms.documents.translation-failed is the DLQ for translation events that couldn't be processed. |
| Docker | Container engine; every service has a Dockerfile. |
| Multi-stage Docker build | Separate builder stage (with JDK + Maven) from a minimal runtime stage (JRE only). |
| Kubernetes | The cluster orchestrator running everything. |
| Minikube / kind | Local single-node K8s distributions. The project supports both. |
| kubectl | CLI used to apply manifests, port-forward, view logs, exec into pods. |
| Namespace | Logical isolation inside the cluster; everything is in dms. |
| Pod | Smallest deployable unit (one container in our case). |
| Deployment | Manages stateless replicas with rolling updates. |
| StatefulSet | Manages stateful replicas with stable hostnames and per-replica disks. |
| PVC (PersistentVolumeClaim) | A request for storage that survives pod restarts. |
| Service (ClusterIP) | Stable cluster-internal DNS + virtual IP for a set of pods. |
| Service (NodePort) | Same + exposes the port on every cluster node (used by the gateway). |
| Secret | Base64-encoded sensitive value injected as env var (DB creds, JWT secret, OpenRouter API key). |
| initContainer | Runs to completion before the main container starts; we use it to wait for dependencies (Postgres ready, Cassandra keyspace created). |
| Readiness probe | When it fails, the pod stops receiving traffic. |
| Liveness probe | When it fails enough times, the pod is restarted. |
| OpenRouter / Gemini | External LLM provider used for translation; called by the translator with the OpenAI-compatible SDK. |
| CORS | Cross-Origin Resource Sharing — configured on the Gateway to allow the Vite dev server origin. |
Everything in this document is grounded in the actual code paths in the repository. If you want to dig deeper, the most informative single files are, in order:
- README.md — high-level overview + operational runbook.
- infra/k8s/README.md — Kubernetes walkthrough with diagrams.
- backend/documents/src/main/java/com/documents_service/documents/controller/DocumentController.java — the heart of the domain logic.
- backend/translation/translator.py — the entire asynchronous pipeline in 600 commented lines.
- backend/documents/src/main/java/com/documents_service/documents/service/TranslationConsumerService.java — how the Documents service closes the translation loop.
- infra/k8s/deploy.fish — the deployment order that ties it all together.