Document Management System — Complete Project Documentation

A detailed, end-to-end explanation of the entire DMS platform: every microservice, every database, every infrastructure piece, every concept (Kafka, Kubernetes, JWT, Redis, MinIO/S3, Cassandra…) and how everything is wired together.

1. Project Overview
2. Architectural Style
3. Components at a Glance
4. Full System Architecture Diagram
5. Service-by-Service Deep Dive
- 5.1 Gateway Service
- 5.2 Authentication Service
- 5.3 Documents Service
- 5.4 Comments Service
- 5.5 Orchestration Service
- 5.6 Translator Service
- 5.7 Frontend (React + Vite)
6. Data Stores
- 6.1 PostgreSQL (Auth & Documents)
- 6.2 Cassandra (Comments)
- 6.3 Redis (Cache)
- 6.4 MinIO / S3 (Object Storage)
7. Messaging: Apache Kafka
8. Authentication & Security (JWT)
9. Inter-Service Communication
10. End-to-End Request Flows
11. Containerization (Docker)
12. Kubernetes Deployment
13. Local Development Workflow
14. Operational Concerns
15. Glossary of Concepts

1. Project Overview

The Document Management System (DMS) is a distributed, microservices-based academic lab platform that lets users:

Register, log in, and obtain a JWT (JSON Web Token).
Upload, list, version, download, and delete documents (with file blobs stored in S3-compatible object storage).
Comment on documents.
Trigger asynchronous translations of document titles/contents through an LLM (Gemini / OpenRouter).
Manage users, roles, departments, and categories from an admin UI.

It is intentionally built as multiple independent services to illustrate real-world distributed-system concepts:

Concept	Where it lives
API Gateway	Spring Cloud Gateway (`backend/gateway`)
Stateless JWT-based auth	Auth service (`backend/authentication`)
Domain-driven service boundaries	Documents / Comments / Orchestration / Translator
Polyglot persistence	PostgreSQL + Cassandra + Redis + MinIO
Event-driven asynchronous workflow	Kafka topics for translation events
External AI integration	OpenRouter / Gemini API
Containerization	Per-service Dockerfile (multi-stage)
Orchestration	Kubernetes manifests (`infra/k8s`) on Minikube/kind

2. Architectural Style

The project follows a microservices style with these properties:

One service, one responsibility — every backend module owns a single bounded context (auth, documents, comments, orchestration, translation).
Database-per-service: each stateful domain service (Auth, Documents, Comments) owns its own datastore, so a schema change or storage outage in one service does not directly break the others.
Hybrid sync + async communication
- Synchronous REST for user-facing requests through the Gateway.
- Asynchronous Kafka events for slow / failure-prone work (translation).
Stateless services: services don't keep request state in memory; durable state lives only in the databases, the object store and Kafka. Any replica can therefore serve any request, which enables horizontal scaling (multiple pod replicas) and simple failover.
Token-based security: the Auth service issues a signed JWT once at login, and the Documents service validates it locally with the shared HS256 secret, so no per-request call back to Auth is required.
Single entry point — clients only ever talk to the Gateway (the internal cluster topology is hidden from the outside world).

3. Components at a Glance

Component	Tech	Port	Folder	Role
Frontend UI	React 19 + Vite	5173	frontend/ui/	User & admin web app
Gateway	Spring Cloud Gateway (MVC)	8080	backend/gateway/	API entry point, routing, CORS
Auth	Spring Boot 3 + JPA	8083	backend/authentication/	Login, register, JWT issuance
Documents	Spring Boot 3 + JPA + Spring Kafka + AWS SDK	8081	backend/documents/	Document CRUD, files, departments, translation orchestration
Comments	Spring Boot 3 + Spring Data Cassandra	8082	backend/comments/	Comments per document
Orchestration	Spring Boot + Spring Integration	8084	backend/orchestration/	ESB-style aggregator (docs+comments)
Translator	Python 3.12 + kafka-python-ng + boto3 + OpenAI SDK	–	backend/translation/	Kafka consumer that calls the LLM
Auth DB	PostgreSQL 16	5432	StatefulSet `auth-postgres`	Persists auth users
Documents DB	PostgreSQL 16	5432	StatefulSet `docs-postgres`	Persists documents / categories / departments
Comments DB	Cassandra 4.1	9042	StatefulSet `cassandra`	Persists comments
Cache	Redis 7	6379	Deployment `redis`	Document cache layer (Spring Cache)
Object store	MinIO (S3 API)	9000 / 9001	Deployment `minio`	Holds raw file bytes
Message broker	Apache Kafka 3.7 (KRaft)	9092	Deployment `kafka`	Translation event bus
External LLM	OpenRouter (Gemini / Llama)	HTTPS	–	Translation provider

Port convention: every service listens on its own port so that local non-Kubernetes runs don't collide:

Auth        8083
Docs        8081
Comments    8082
Orch        8084
Gateway     8080
MinIO       9000 (api) / 9001 (console)
Kafka       9092 (broker) / 9094 (host advertised in compose)
Postgres    5432
Cassandra   9042
Redis       6379
UI (vite)   5173

4. Full System Architecture Diagram

flowchart LR
    UI[Frontend UI<br/>React + Vite]
    GW[Gateway<br/>Spring Cloud Gateway]
    AUTH[Auth Service]
    DOCS[Documents Service]
    COMM[Comments Service]
    ORCH[Orchestration Service]
    TRANS[Translator Worker<br/>Python]
    AUTHDB[(Auth PostgreSQL)]
    DOCSDB[(Documents PostgreSQL)]
    CASS[(Comments Cassandra)]
    REDIS[(Redis cache)]
    MINIO[(MinIO / S3)]
    KAFKA[[Kafka broker]]
    LLM((OpenRouter / Gemini))

    UI -->|HTTPS REST| GW
    GW -->|/api/auth| AUTH
    GW -->|/api/documents<br/>/api/users<br/>/api/admin<br/>/api/categories<br/>/api/departments| DOCS
    GW -->|/api/comments| COMM
    GW -->|optional| ORCH

    AUTH --> AUTHDB
    DOCS --> DOCSDB
    DOCS --> REDIS
    DOCS --> MINIO
    DOCS -->|RestClient HTTP| COMM
    COMM --> CASS
    ORCH -->|RestTemplate| DOCS
    ORCH -->|RestTemplate| COMM

    DOCS ==>|publish dms.documents.uploaded| KAFKA
    KAFKA ==>|consume| TRANS
    TRANS -->|getObject / putObject| MINIO
    TRANS -->|chat.completions| LLM
    TRANS ==>|publish dms.documents.translated<br/>dms.documents.translation-progress<br/>dms.documents.translation-failed| KAFKA
    KAFKA ==>|consume| DOCS

Legend:

Solid arrows between services: synchronous HTTP (REST) calls
Double arrows (==>): asynchronous Kafka publish/consume
--> to a database = JDBC / CQL / S3 SDK call

5. Service-by-Service Deep Dive

5.1 Gateway Service

Tech: Spring Cloud Gateway (MVC variant) on Spring Boot 3, port 8080. Folder: backend/gateway/

Responsibilities:

Single externally-reachable entry point for all client traffic.
Path-based routing to the correct downstream service.
CORS handling for the Vite dev server (http://localhost:5173).
Stripping the /api/ prefix before forwarding so downstream services see clean paths (e.g. /documents/list).

Routes (application.yml:11-55):

Predicate	Forwards to	Resolved env var
`/api/auth/**`	Auth service `:8083`	`AUTH_SERVICE_URL`
`/api/documents/**`	Documents service `:8081`	`DOCUMENTS_SERVICE_URL`
`/api/users/**`	Documents service `:8081`	`DOCUMENTS_SERVICE_URL`
`/api/admin/**`	Documents service `:8081`	`DOCUMENTS_SERVICE_URL`
`/api/categories/**`	Documents service `:8081`	`DOCUMENTS_SERVICE_URL`
`/api/departments/**`	Documents service `:8081`	`DOCUMENTS_SERVICE_URL`
`/api/comments/**`	Comments service `:8082`	`COMMENTS_SERVICE_URL`

All routes use a RewritePath=/api/(?<segment>.*), /${segment} filter so a request to /api/documents/list becomes /documents/list on the downstream service.
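To see exactly what the filter does to a path, here is the same substitution in Python. It is an illustrative sketch only (the gateway applies the Java regex itself); note that Python spells a named group `(?P<segment>...)` and refers to it as `\g<segment>`, whereas Java uses `(?<segment>...)` and the Spring filter uses `${segment}`.

```python
import re

# The gateway's pattern, rewritten with Python's named-group syntax (?P<name>...).
API_PREFIX = re.compile(r"/api/(?P<segment>.*)")

def rewrite_path(path: str) -> str:
    """Replace '/api/<rest>' with '/<rest>'; paths without the prefix are returned unchanged."""
    return API_PREFIX.sub(r"/\g<segment>", path)

print(rewrite_path("/api/documents/list"))  # /documents/list
print(rewrite_path("/api/auth/login"))      # /auth/login
```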

Why a gateway?

Hides internal cluster topology from clients.
One CORS configuration to manage instead of N.
Centralizes the spot where rate-limiting, auth-introspection, request logging or tracing could be added later.
Externalizes service URLs as env variables so the same image works in any environment (local dev → Kubernetes).

CORS is defined in GatewayCorsConfig.java and intentionally only allows the Vite dev server origin.

5.2 Authentication Service

Tech: Spring Boot 3 + Spring Data JPA + jjwt, port 8083.
Folder: backend/authentication/
DB: Dedicated PostgreSQL instance (auth-postgres), database authdb.

Responsibilities:

Persist user accounts (auth_user table) with salted password hashes.
Issue a signed JWT after a successful POST /auth/login.
Manage user roles (collection table user_roles).
Handle password changes and logout timestamps.
Provide internal endpoints to sync departments and wipe users (admin use).

Endpoints (AuthController.java):

Method	Path	Purpose
POST	`/auth/login`	Verify credentials, return JWT + identity payload
POST	`/auth/register`	Create a new user account
POST	`/auth/change-password`	Self-service password change (requires bearer token)
POST	`/auth/logout`	Stamp `last_logout` (no token blacklist — stateless JWT)
POST	`/auth/users/{email}/department`	Update a user's department (internal sync)
DELETE	`/auth/users?email=...`	Delete a user by email
DELETE	`/auth/admin/users/wipe`	Delete all users
GET	`/auth/health`	Simple health check

Why is SecurityConfig permitAll? The Auth service is the issuer — clients must reach /auth/login without already having a token. Spring Security is enabled (CSRF disabled) but every endpoint is permitted; authentication is enforced inside the controllers when needed (e.g. for change-password).

Password storage (PasswordService):

Per-user random salt + salted hash.
Verified with passwordService.matches(rawPassword, hash, salt) on login.
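The document does not name the hashing algorithm, so the following is only a sketch of the salt + hash + `matches(rawPassword, hash, salt)` contract. It assumes PBKDF2-HMAC-SHA256, a 16-byte salt, an illustrative iteration count and base64-encoded storage; none of these choices come from the project's PasswordService.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor, not taken from the project

def hash_password(raw_password: str) -> Tuple[str, str]:
    """Return (hash, salt), both base64-encoded, for storage in separate columns."""
    salt = secrets.token_bytes(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", raw_password.encode(), salt, ITERATIONS)
    return base64.b64encode(digest).decode(), base64.b64encode(salt).decode()

def matches(raw_password: str, stored_hash: str, salt: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", raw_password.encode(),
                                 base64.b64decode(salt), ITERATIONS)
    return hmac.compare_digest(digest, base64.b64decode(stored_hash))

stored_hash, salt = hash_password("correct horse")
print(matches("correct horse", stored_hash, salt))  # True
print(matches("wrong guess", stored_hash, salt))    # False
```

Because every user gets a different salt, two users with the same password end up with different hashes, and `hmac.compare_digest` avoids leaking timing information during the comparison.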

JWT issuance (JwtService.java):

Algorithm: HMAC-SHA256 (HS256).
Secret: provided via JWT_SECRET env var (base64 encoded by default).
Default lifetime: 24 hours (jwt.expiration-seconds=86400).
Claims embedded in the token: id, email, name, role, roles, department, mustChangePassword. Subject = email.

The exact same secret is configured in the Documents service so it can verify tokens locally — see §8 Authentication & Security.

5.3 Documents Service

Tech: Spring Boot 3 + Spring Data JPA + Spring Security + Spring Kafka + AWS SDK v2 (S3) + Spring Cache (Redis), port 8081.
Folder: backend/documents/
DB: Its own PostgreSQL instance (docs-postgres), database documentsdb.

This is the largest service. It owns the document domain, the file storage integration, the user/department/category catalog, JWT validation, the translation request pipeline, and the cross-service call to Comments.

Domain entities (JPA):

Document (model/Document.java) — the central entity. Holds title, ownership, department FK, S3 fileKey, translation status fields, and a JSON versionsJson column for past file versions.
AppUser (model/AppUser.java) — application-side user record with @ManyToMany Set<Department> through the user_department join table. Separate from the Auth user but kept in sync by email.
Department, Category — admin-managed reference data.

Note: there are two user tables in the system. The Auth service has auth_user (credentials + roles); the Documents service has app_user (departments + admin-style metadata). They are linked by email: the Documents service looks the user up by the JWT email claim.

Controllers:

DocumentController.java — /documents/** (list, get, add, update, delete, file upload/download, presign, translation request, versioning, full-with-comments).
UserController — /users/** user management.
DepartmentController — /departments/**.
CategoryController — /categories/**.
AdminController.java — destructive /admin/** operations (wipe documents/categories/etc.).
AuthController — legacy /local-mode login (the real auth lives in the Auth service).

Key service-layer components:

Class	Role
`DocumentService`	CRUD + Redis caching (`@CachePut`, `@CacheEvict`).
`DocumentS3Service`	MinIO upload/download/presign; ensures bucket exists at startup.
`CommentClient`	Synchronous HTTP call to Comments service via Spring 6.1 `RestClient`. Returns empty list on failure (graceful degradation).
`DocumentEventProducer`	Publishes `DocumentUploadedEvent` to Kafka `dms.documents.uploaded`.
`TranslationConsumerService`	`@KafkaListener` consuming `dms.documents.translated`, `…translation-progress`, `…translation-failed`. Persists translation results back into PostgreSQL and uploads the translated `.txt` to MinIO.

Security (security/):

JwtAuthFilter runs before Spring Security's username/password filter. Every non-actuator request must carry an Authorization: Bearer … header. The filter parses the claims, builds Spring authorities (ROLE_* prefixed), and populates SecurityContextHolder with a UsernamePasswordAuthenticationToken whose principal is the parsed Claims object, so controllers can read id, email, name and role straight from the JWT without a database lookup.
SecurityConfig uses a STATELESS session policy, disables anonymous access, and requires a valid token on every /documents/** endpoint. Only /actuator/health* and /actuator/info are public, for the Kubernetes probes.
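The header-parsing and authority-mapping steps of the filter can be sketched in a few lines. This Python version is a hypothetical illustration, not JwtAuthFilter itself: it assumes the token has already been verified, that `roles` is a list claim with `role` as a single-value fallback, and that names already carrying `ROLE_` are left alone. The real filter builds Spring `GrantedAuthority` objects rather than strings.

```python
from typing import List, Mapping, Optional

def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not authorization:
        return None
    scheme, _, token = authorization.strip().partition(" ")
    token = token.strip()
    # The auth scheme name is case-insensitive in HTTP.
    if scheme.lower() != "bearer" or not token:
        return None
    return token

def authorities_from_claims(claims: Mapping) -> List[str]:
    """Map the 'roles' claim (or the single 'role' claim) to ROLE_-prefixed names."""
    roles = claims.get("roles") or ([claims["role"]] if claims.get("role") else [])
    return sorted({r if r.startswith("ROLE_") else f"ROLE_{r}" for r in roles})
```

A request without a usable bearer token yields `None`, which is the point where the real filter lets Spring Security reject the request.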

Access control: documents are scoped by department.

Admins see everything.
Regular users only see documents whose departmentEntity.id is in their assigned Set<Department> (DocumentController.callerHasAccessToDocument).
A user who tries to read a document in a foreign department gets 403 Forbidden. Note that a 403 confirms that the document exists; fully hiding cross-department documents would require answering 404 Not Found instead.
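The department rule above can be expressed as a small predicate plus a list filter. This is a hypothetical Python rendering for illustration, not the Java `callerHasAccessToDocument`; the `Caller` type, the `departmentId` dictionary key, and the decision to hide department-less documents from non-admins are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set

@dataclass
class Caller:
    roles: Set[str]                                        # e.g. {"ROLE_USER"}
    department_ids: Set[int] = field(default_factory=set)  # resolved from app_user by email

def caller_has_access_to_document(caller: Caller, document_department_id: Optional[int]) -> bool:
    """Admins see everything; other users only documents in one of their departments."""
    if "ROLE_ADMIN" in caller.roles:
        return True
    # Assumption: a document without a department is hidden from non-admins.
    return document_department_id is not None and document_department_id in caller.department_ids

def visible_documents(caller: Caller, documents: Iterable[dict]) -> List[dict]:
    """List view: the in-memory equivalent of findAll() vs findByDepartmentIdIn(...)."""
    return [d for d in documents if caller_has_access_to_document(caller, d.get("departmentId"))]
```

In the service, the list case is pushed down into the query (`findByDepartmentIdIn`), so non-visible rows are never loaded; the predicate form is what guards single-document reads.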

Translation lifecycle (per-document fields on Document):

title              ← original
translatedTitle    ← filled in once translation succeeds
translationStatus  ∈ {NOT_REQUESTED, PENDING, SUCCESS, FAILED}
translationProgress ∈ 0..100
translationError   ← message if FAILED
translatedLanguage ← e.g. "French"
translatedFileKey  ← S3 key of the .txt produced by the consumer

The flow:

User clicks Translate in the UI.
UI calls POST /api/documents/{id}/request-translation.
Documents sets the doc to PENDING/0% and publishes a DocumentUploadedEvent to dms.documents.uploaded (Kafka).
The translator worker consumes, calls the LLM, publishes progress events, and finally publishes a SUCCESS or FAILED event.
The Documents service's TranslationConsumerService updates the DB and uploads the translated text to MinIO at documents/{id}/translated_{lang}.txt.

5.4 Comments Service

Tech: Spring Boot 3 + Spring Data Cassandra, port 8082.
Folder: backend/comments/
DB: Apache Cassandra 4.1 (keyspace dms).

Why Cassandra? Comments are write-heavy, append-only, and naturally partition by docId. Cassandra's wide-row model is a great fit: each document becomes a partition and comments are stored as clustered rows ordered by commentId. The keyspace is created at deploy time by the ensure-keyspace init container in 05-comments-service.yml.

Schema (auto-managed by Spring Data Cassandra, schema-action=CREATE_IF_NOT_EXISTS):

Table comments
Composite primary key:
- Partition: docId (Long)
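- Clustering: commentId (UUID), descending so the newest comment is read first. This holds only if commentId is a time-based (version 1) UUID, i.e. a Cassandra timeuuid; random (version 4) UUIDs do not sort chronologically.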
Columns: author, content.
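The partition/clustering layout can be modelled in memory to show why reads are cheap: a dictionary keyed by docId stands in for partitions, and sorting by the UUID's embedded timestamp stands in for the descending clustering order. The sketch assumes commentId is a version-1 UUID (Python's `uuid.uuid1()` plays the role of a Cassandra timeuuid, which Cassandra orders by its timestamp); it does not model the CQL table itself.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional

# docId -> rows; each key plays the role of one Cassandra partition.
partitions: Dict[int, List[dict]] = defaultdict(list)

def add_comment(doc_id: int, author: str, content: str,
                comment_id: Optional[uuid.UUID] = None) -> uuid.UUID:
    """Append a row to the document's partition, generating a time-based id if missing."""
    comment_id = comment_id or uuid.uuid1()  # version-1 UUID, the same layout as a timeuuid
    partitions[doc_id].append({"commentId": comment_id, "author": author, "content": content})
    return comment_id

def list_comments(doc_id: int) -> List[dict]:
    """Read a single partition, newest first, by the UUID's embedded 100-ns timestamp."""
    return sorted(partitions.get(doc_id, []), key=lambda row: row["commentId"].time, reverse=True)
```

Listing the comments of one document touches exactly one partition, which is the access pattern Cassandra serves best; the same sort over `uuid.uuid4()` ids would produce an arbitrary order.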

Endpoints (CommentController.java):

GET /comments/list/{docId} — list comments newest-first.
POST /comments/add — body {docId, author, content} (server generates UUID if missing).

Interactions:

Called synchronously by Documents (for /{id}/full) via CommentClient.
Called directly by clients through the Gateway (/api/comments/**).
Called by Orchestration as an alternative aggregation path.

5.5 Orchestration Service

Tech: Spring Boot + Spring Integration (Enterprise-Integration-Pattern DSL), port 8084. Folder: backend/orchestration/

Purpose: demonstrate an ESB-style (Enterprise Service Bus) orchestrator that aggregates the responses of multiple services and returns a single combined document.

Endpoint: GET /document/{id} returns { document: {...}, comments: [...] }.

How it works (OrchestrationIntegrationConfig.java):

Two Spring Integration IntegrationFlows, each sitting between a request DirectChannel and a response QueueChannel:

documentRequestChannel  → HTTP GET service.d.url/{id} → documentResponseChannel
commentsRequestChannel  → HTTP GET service.m.url/{id} → commentsResponseChannel

The controller sends the document ID to both request channels, then blocks on both response channels with a 5-second timeout. If Service M (the Comments service, addressed through service.m.url) is unreachable, the comments flow catches the exception and forwards an empty list, so the document data is still returned (availability over consistency).
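The same scatter-gather behaviour can be approximated with two concurrent calls and a deadline. In this Python sketch, `concurrent.futures` replaces the Spring Integration channels, and `fetch_document` / `fetch_comments` are hypothetical callables standing in for the two HTTP flows. One deliberate deviation: the 5-second deadline is shared by both waits instead of applying to each channel separately.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

def get_full_document(doc_id: int,
                      fetch_document: Callable[[int], Any],
                      fetch_comments: Callable[[int], List[Any]],
                      timeout: float = 5.0) -> dict:
    """Call both services in parallel; degrade to an empty comment list on error or timeout."""
    deadline = time.monotonic() + timeout
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        doc_future = pool.submit(fetch_document, doc_id)
        comments_future = pool.submit(fetch_comments, doc_id)
        # The document is mandatory: a failure or timeout here propagates to the caller.
        document = doc_future.result(timeout=timeout)
        try:
            remaining = max(0.0, deadline - time.monotonic())
            comments = comments_future.result(timeout=remaining)
        except Exception:
            comments = []  # comments are optional: return the document without them
        return {"document": document, "comments": comments}
    finally:
        # Do not wait for a hung call; cancel_futures requires Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)
```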

Why a separate service for this when Documents already has /documents/{id}/full?

Different style: Spring Integration vs. plain RestClient — useful for a course/lab comparing patterns.
Decouples aggregation from the canonical Documents service.

5.6 Translator Service

Tech: Python 3.12 + kafka-python-ng + boto3 (S3) + openai SDK (used against OpenRouter); no HTTP server, only a Kafka consumer/producer.
Folder: backend/translation/
Entry point: translator.py

Why Python? The service integrates with external LLM APIs that have well-supported Python SDKs, and it demonstrates a polyglot microservice architecture.

Kafka topology used by the translator:

Topic	Direction	Purpose
`dms.documents.uploaded`	consume	A new document needs translation
`dms.documents.translation-progress`	produce	Streaming progress 0–99%
`dms.documents.translated`	produce	SUCCESS event with translated title (and optional content)
`dms.documents.translation-failed`	produce	Dead-letter queue (DLQ) for failed translations

Consumer group: translator-service (durable offsets). Offset reset: earliest (don't silently miss documents uploaded while the worker was down). Auto-commit: disabled — the worker commits after publishing the result. This is the at-least-once invariant: if it crashes between Gemini and the publish, the message is re-processed; if it crashes after publish but before commit, the downstream sees a duplicate (TranslationConsumerService is idempotent).
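The consequence of committing after publishing (a crash can cause a duplicate but never a loss) can be simulated without a broker. In this Python sketch, an in-memory list stands in for the dms.documents.uploaded partition, an integer for the group's committed offset, and another list for the result topic. The `crash_after_publish_at` parameter is a hypothetical fault-injection hook, not something the real worker has.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Partition:
    messages: List[dict]
    committed_offset: int = 0  # next offset the consumer group will read after a restart

@dataclass
class Worker:
    published: List[dict] = field(default_factory=list)  # stands in for the result topic

    def run(self, partition: Partition, crash_after_publish_at: int = -1) -> None:
        """Resume from the committed offset; publish the result first, commit second."""
        offset = partition.committed_offset
        while offset < len(partition.messages):
            event = partition.messages[offset]
            self.published.append({"documentId": event["documentId"], "status": "SUCCESS"})
            if offset == crash_after_publish_at:
                return  # simulated crash: result is out, offset is not committed
            partition.committed_offset = offset + 1  # manual commit after the publish
            offset += 1

def downstream_state(published: List[dict]) -> Dict[int, str]:
    """Idempotent consumer: the last write per documentId wins, so duplicates are harmless."""
    return {e["documentId"]: e["status"] for e in published}
```

Running the worker once with a crash after the second publish and then again without one publishes document 2 twice, yet `downstream_state` ends in the same state as a crash-free run.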

Retry strategy (translate_with_retry):

Up to MAX_RETRIES=3 Gemini calls per event.
Exponential backoff: 1 s, 2 s, 4 s, with override from retry_delay_seconds extracted from Gemini's RESOURCE_EXHAUSTED error.
Errors are classified (classify_gemini_error) into retryable vs. non-retryable (insufficient_credits, model_not_found, quota_exhausted_daily, …). If a model isn't usable, the worker falls through to the next configured model in GEMINI_MODELS.
After all retries are exhausted the worker publishes to the DLQ topic with the original payload + error info, then commits the offset to make forward progress. Operators can replay from the DLQ later.
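The retry policy can be sketched in Python as follows. This sketch is a simplified stand-in, not the project's `translate_with_retry`. The `LLMError` type, the `kind` labels, the per-model attempt loop, and the choice to move on to the next model after transient failures are all assumptions. The sketch waits only *between* attempts, so three attempts on one model produce waits of 1 s and 2 s. `sleep` is injectable so the schedule can be observed without actually waiting.

```python
import time
from typing import Callable, List, Optional, Sequence

# Labels in the spirit of classify_gemini_error's non-retryable classes.
NON_RETRYABLE = {"insufficient_credits", "model_not_found", "quota_exhausted_daily"}

class LLMError(Exception):
    """A classified provider error; retry_delay_seconds mimics the RESOURCE_EXHAUSTED hint."""
    def __init__(self, kind: str, retry_delay_seconds: Optional[float] = None):
        super().__init__(kind)
        self.kind = kind
        self.retry_delay_seconds = retry_delay_seconds

class TranslationFailed(Exception):
    """Raised when every model failed; the caller would publish this to the DLQ topic."""
    def __init__(self, last_error: Optional[Exception], models_tried: List[str]):
        super().__init__(f"translation failed: {last_error}")
        self.last_error = last_error
        self.models_tried = models_tried

def translate_with_fallback(text: str, models: Sequence[str],
                            call_model: Callable[[str, str], str],
                            max_attempts: int = 3, base_delay: float = 1.0,
                            sleep: Callable[[float], None] = time.sleep) -> str:
    models_tried: List[str] = []
    last_error: Optional[Exception] = None
    for model in models:
        models_tried.append(model)
        for attempt in range(max_attempts):
            try:
                return call_model(model, text)
            except LLMError as err:
                last_error = err
                if err.kind in NON_RETRYABLE:
                    break  # this model is unusable: move on to the next configured one
                if attempt + 1 < max_attempts:
                    # Provider hint wins; otherwise back off exponentially: 1 s, 2 s, 4 s, ...
                    delay = (err.retry_delay_seconds if err.retry_delay_seconds is not None
                             else base_delay * 2 ** attempt)
                    sleep(delay)
    raise TranslationFailed(last_error, models_tried)
```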

Streaming progress: the worker uses the OpenAI SDK in stream=True mode so it can publish progress events as the LLM streams chunks, letting the UI show a live progress bar.
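How streamed chunks turn into percentages is not spelled out here. The Python sketch below shows one plausible scheme and relies on these assumptions:

- The expected output length is approximated by a caller-supplied character count (hypothetical).
- Progress is clamped to the 10..90 % band that Flow 5 mentions.
- An event is emitted only when the integer percentage increases, so a fast stream does not flood the progress topic.

`publish` stands in for the Kafka producer, and the chunk strings stand in for the SDK's streamed deltas.

```python
from typing import Callable, Iterable, List, Optional

def stream_with_progress(chunks: Iterable[str], expected_chars: int,
                         publish: Callable[[int], None],
                         low: int = 10, high: int = 90) -> str:
    """Join streamed chunks while publishing monotonically increasing progress in [low, high]."""
    parts: List[str] = []
    received = 0
    last_sent: Optional[int] = None
    for chunk in chunks:
        parts.append(chunk)
        received += len(chunk)
        fraction = min(1.0, received / max(1, expected_chars))  # guard against overshoot and 0
        percent = low + int((high - low) * fraction)
        if last_sent is None or percent > last_sent:
            publish(percent)  # e.g. one event on dms.documents.translation-progress
            last_sent = percent
    return "".join(parts)
```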

Security: GEMINI_API_KEY is never baked into the image. It is injected from a Kubernetes Secret (openrouter-secret) created by deploy.fish from the local openrouter_key.txt file (which is gitignored).

S3 access: the worker uses service credentials to read source files directly from MinIO (no JWT path through Documents) — this is intentional: the translator only speaks Kafka + Gemini + S3, never the Documents HTTP API.

5.7 Frontend (React + Vite)

Tech: React 19, Vite 6, React Router 6, vanilla CSS. Tests via Cypress and Playwright. Folder: frontend/ui/

Entry points:

src/main.jsx — React root.
src/App.jsx — routing.

Layout:

src/
  services/api.js          — fetch wrapper, JWT injection, base URLs
  context/AppContext.jsx   — global auth + user state
  hooks/                   — useUsers, useDocuments, useAppContext
  pages/                   — Login, Signup, DocumentList, DocumentDetail,
                              DocumentUploadPage, AdminDashboard, …
  components/              — DocumentUploadModal, UserImportWizard, …

API base URLs (src/services/api.js):

VITE_DMS_API_BASE_URL defaults to /api (Vite proxy).
The token is stored in localStorage under the key dms-auth and replayed on every fetch as Authorization: Bearer <token>.

Vite proxy (vite.config.js):

Dev server runs on :5173.
/api is proxied to http://127.0.0.1:8090, the local port exposed by kubectl port-forward -n dms svc/gateway-service 8090:8080.

Optional mock backend: npm run db runs a JSON-server using db.json so the UI can be developed without the full backend stack.

6. Data Stores

6.1 PostgreSQL (Auth & Documents)

Two physically separate instances to honor the "database-per-service" pattern. Each runs as a Kubernetes StatefulSet with a PersistentVolumeClaim so data survives pod restarts.
Image: postgres:16-alpine.
Auth DB credentials in auth-postgres-secret (authuser / authpass / authdb).
Documents DB credentials in docs-postgres-secret (dmsuser / dmspass / documentsdb).
Each service uses Hibernate with ddl-auto=update so tables are auto-managed (acceptable for a lab; production would use Flyway/Liquibase).

6.2 Cassandra (Comments)

Single-node StatefulSet cassandra:4.1 with 2Gi storage.
Keyspace dms is created at startup by the ensure-keyspace init container with SimpleStrategy + RF=1 (suitable for one-node dev clusters).
Schema-on-startup via Spring Data Cassandra (schema-action=CREATE_IF_NOT_EXISTS).
Why Cassandra instead of just another Postgres? Comments are append-heavy with a clear partition key (docId) — the canonical Cassandra workload — and the project uses this to demonstrate polyglot persistence.

6.3 Redis (Cache)

Image: redis:7-alpine, single replica Deployment.
Used by the Documents service via Spring Cache abstraction (spring.cache.type=redis, TTL 60 s).
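DocumentService annotates writes with @CachePut, so the next read of that document is served from Redis instead of Postgres, and deletes with @CacheEvict, so a deleted document is never served from a stale cache entry. Entries also expire after the 60 s TTL.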
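The interplay of @CachePut, @CacheEvict and the TTL can be modelled with a dictionary. This minimal Python sketch assumes a single cache keyed by document ID, an injectable clock, and read-through behaviour on `get` (the text above only mentions writes and deletes). It illustrates the caching semantics, not Spring's Redis cache manager.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class DocumentCache:
    """Dictionary stand-in for the Redis-backed document cache (TTL per entry)."""

    def __init__(self, load_from_db: Callable[[int], Optional[dict]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[float, dict]] = {}  # id -> (expires_at, document)

    def get(self, doc_id: int) -> Optional[dict]:
        """Read-through (assumed): serve a live entry, otherwise load from the DB and cache it."""
        entry = self._entries.get(doc_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        document = self._load(doc_id)
        if document is not None:
            self._entries[doc_id] = (self._clock() + self._ttl, document)
        return document

    def put(self, document: dict) -> dict:
        """@CachePut semantics: after a write, store the fresh entity under its id."""
        self._entries[document["id"]] = (self._clock() + self._ttl, document)
        return document

    def evict(self, doc_id: int) -> None:
        """@CacheEvict semantics: after a delete, drop the entry so it cannot be served stale."""
        self._entries.pop(doc_id, None)
```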

6.4 MinIO / S3 (Object Storage)

Image: quay.io/minio/minio:latest, single replica Deployment with a 5Gi PVC.
Two ports: 9000 (S3 API) and 9001 (web console at http://localhost:9001 — login admin / ensia2026).
Bucket: ensia (auto-created by DocumentS3Service.ensureBucket()).
Object key conventions:
- Original file: documents/{documentId}/{originalFilename}
- Translated text file: documents/{documentId}/translated_{lang}.txt
The Documents service uses the AWS SDK v2 for Java with path-style access enabled (required by MinIO).
The Documents service can generate pre-signed URLs (5 minutes default) so the browser can download large files directly from MinIO without proxying through Spring Boot.
The Translator uses boto3 to fetch sources and write translated files.

7. Messaging: Apache Kafka

Broker setup

Image: apache/kafka:3.7.1 running in KRaft mode (no ZooKeeper) — see 10-kafka.yml.
Single-node cluster with process.roles=broker,controller, num.partitions=1, RF=1 for the internal topics — typical for a lab.
Advertised listener: kafka:9092 so other pods in the dms namespace can resolve it via DNS.

Topics used

Topic	Producer	Consumer	Payload
`dms.documents.uploaded`	Documents	Translator	`DocumentUploadedEvent` (JSON string)
`dms.documents.translation-progress`	Translator	Documents	`{ documentId, translationStatus, translationProgress }`
`dms.documents.translated`	Translator	Documents	`{ documentId, translatedTitle, translatedContent, targetLanguage, translationModel, … }`
`dms.documents.translation-failed`	Translator	(DLQ — Documents updates status to FAILED)	`{ documentId, error, errorMessage, retryable, modelsTried, … }`

Why string-serialized JSON instead of Spring's JsonSerializer?

The Documents service intentionally uses StringSerializer and serializes the event with Jackson manually. The reason is documented at KafkaProducerConfig.java:21-32: mixing Jackson 2.x and 3.x on the same classpath (some Spring Kafka internals use 2.x; the project has 3.x deps) breaks JsonSerializer. Avoiding it sidesteps the classpath conflict.

Message key

For every translation event the message key is String.valueOf(documentId). Kafka's default partitioner hashes the key, so all events for the same document land on the same partition (as long as the topic's partition count does not change), which preserves per-document ordering; this matters most for progress events.
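The "same key, same partition" property comes from hashing the key bytes and taking the result modulo the partition count. Kafka's Java producer uses murmur2 for this hash. The Python sketch below substitutes `zlib.crc32` only to stay dependency-free, so its partition numbers will not match Kafka's, but the determinism it demonstrates is the same.

```python
import zlib

def partition_for(document_id: int, num_partitions: int) -> int:
    """Map a record key to a partition: hash(key bytes) mod partition count.

    Kafka's Java producer uses murmur2 here; crc32 is only a dependency-free stand-in.
    """
    key_bytes = str(document_id).encode("utf-8")  # key = String.valueOf(documentId)
    return zlib.crc32(key_bytes) % num_partitions
```

With the lab's num.partitions=1, every key maps to partition 0, so ordering holds trivially. The key starts to matter once a topic has several partitions, and changing the partition count remaps keys.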

Delivery semantics

Producer: acks=all — wait for leader + all in-sync replicas before ack.
Consumer: at-least-once.
- Translator manually commits after a successful publish to dms.documents.translated (or DLQ).
- Documents service uses enable.auto.commit=true because the update is idempotent (setting the same translatedTitle twice is harmless).
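Idempotency here means that applying the same event twice leaves the row exactly as applying it once. This Python sketch of such a handler uses the Document fields from §5.3 and the topic names above, with a dictionary standing in for the JPA entity. Three choices are assumptions for illustration:

- The 255-character error limit.
- Setting progress to 100 on success.
- Ignoring a late progress event once the status is terminal.

```python
from typing import Dict

TERMINAL = {"SUCCESS", "FAILED"}
MAX_ERROR_LENGTH = 255  # hypothetical limit for the translationError column

def apply_translation_event(document: Dict, topic: str, event: Dict) -> Dict:
    """Return the updated row; applying the same event again yields the same row."""
    updated = dict(document)
    if topic == "dms.documents.translation-progress":
        # Assumption: a late progress event never reopens a finished translation.
        if updated.get("translationStatus") not in TERMINAL:
            updated["translationStatus"] = event.get("translationStatus", "PENDING")
            updated["translationProgress"] = max(updated.get("translationProgress", 0),
                                                 event["translationProgress"])
    elif topic == "dms.documents.translated":
        updated.update(translationStatus="SUCCESS", translationProgress=100,
                       translatedTitle=event["translatedTitle"],
                       translatedLanguage=event["targetLanguage"])
    elif topic == "dms.documents.translation-failed":
        updated.update(translationStatus="FAILED",
                       translationError=str(event.get("errorMessage", ""))[:MAX_ERROR_LENGTH])
    return updated
```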

Failure isolation

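If the translator is down, documents pile up in the topic but are not lost. Kafka retains them for the topic's retention period, provided the broker's log directory survives (a Kafka pod without a persistent volume loses its log when it is rescheduled). When the translator restarts, it resumes from its consumer group's last committed offset, so events that were published but not yet committed are processed again. auto.offset.reset=earliest only applies when the group has no committed offset yet, e.g. on its very first start.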

8. Authentication & Security (JWT)

Token issuance

Client POST /api/auth/login → Gateway → Auth service.
Auth verifies the salted hash and signs a JWT with HS256 and the shared secret JWT_SECRET.
Response body returns the token plus profile (id, email, name, role, roles, department, mustChangePassword).
The browser persists it in localStorage (dms-auth).

Token validation

The Documents service validates JWTs locally with the same secret using JwtAuthFilter.
No call to Auth is made on each request: the secret is shared via the Kubernetes Secret jwt-secret, so both services can sign and verify (HS256 is symmetric, so whoever holds the key can do both).
This means if Auth is down, existing valid tokens still work; only new logins fail.

Claims used by Documents

id → owner ID stamped on new documents.
email → looked up in app_user to resolve Set<Department>.
roles → mapped to Spring authorities (ROLE_ADMIN, ROLE_USER, …).
name → stamped as document ownerName.

Security defaults

All /documents/** endpoints are authenticated and authorized by department.
Actuator probes (/actuator/health*, /actuator/info) are public so Kubernetes liveness/readiness probes don't need credentials.
Sessions are STATELESS — no HttpSession, no CSRF (REST APIs).
CORS allowed only from the dev Vite origin (the Gateway pins the list).

9. Inter-Service Communication

Synchronous HTTP

Caller	Callee	Mechanism
UI	Gateway	`fetch`
Gateway	All backends	Spring Cloud Gateway HTTP routing
Documents	Comments	Spring `RestClient` (CommentClient.java)
Orchestration	Documents	Spring Integration HTTP outbound gateway
Orchestration	Comments	Spring `RestTemplate` (wrapped in try/catch for graceful degradation)

Asynchronous (Kafka)

Producer	Topic	Consumer
Documents	`dms.documents.uploaded`	Translator
Translator	`dms.documents.translated`	Documents
Translator	`dms.documents.translation-progress`	Documents
Translator	`dms.documents.translation-failed`	Documents

Object store

The Documents service and the Translator both speak the S3 API directly to MinIO. The translator deliberately avoids hitting the Documents HTTP API to keep the asynchronous tier loosely coupled.

Service discovery

All in-cluster communication uses the Kubernetes DNS name of the target service: e.g. the Documents pod reaches Comments at http://comments-service:8082 because both live in the dms namespace. Outside the cluster, kubectl port-forward is used.

10. End-to-End Request Flows

Flow 1 — Login (synchronous)

Browser  →  Gateway  →  Auth service  →  auth-postgres
   ↑                          │
   └──── JWT (+ profile) ─────┘

Flow 2 — List my documents (synchronous, authorized by department)

Browser
  → GET /api/documents/list (Bearer <jwt>)
  → Gateway rewrites to /documents/list
  → Documents.JwtAuthFilter parses claims
  → DocumentController.getAllDocuments:
        if admin: documentRepository.findAll()
        else:    documentRepository.findByDepartmentIdIn(userDepartments)
  → Postgres + Redis cache
  → JSON list

Flow 3 — Document + comments aggregated (Documents → Comments)

Browser
  → GET /api/documents/42/full
  → Gateway → Documents
  → Documents reads doc from Postgres,
    verifies caller has access to its department,
    then CommentClient.fetchComments("42")
  → HTTP GET http://comments-service:8082/comments/list/42
  → Comments queries Cassandra
  → Documents returns { document, comments }

If Comments is down the response still contains the document and an empty comments: [] (graceful degradation).

Flow 4 — Upload a file

Browser  ──POST multipart── /api/documents/add-with-file
                                  │
                                  ▼
                         Documents controller:
                           1. Resolves department from JWT
                           2. Creates Document row in Postgres
                           3. Uploads bytes to MinIO at
                              documents/{id}/{filename}
                           4. Stores fileKey on the Document row
                                  │
                                  ▼
                                 201 + Document JSON

Flow 5 — Translation (the asynchronous showpiece)

1.  UI clicks "Translate"
2.  POST /api/documents/42/request-translation
3.  Documents sets status=PENDING, progress=0, publishes
      key="42"
      topic="dms.documents.uploaded"
      value=DocumentUploadedEvent{...}
4.  Translator consumer (group=translator-service) picks the event up
5.  Translator publishes progress 5%
6.  Translator calls OpenRouter (Gemini) with streaming
7.  As chunks arrive, translator publishes progress 10..90%
    on dms.documents.translation-progress
    (Documents service's KafkaListener writes them to the doc row)
8.  Translator finishes:
      - SUCCESS  → publish to dms.documents.translated
      - FAILURE  → publish to dms.documents.translation-failed
9.  Documents.TranslationConsumerService:
      - On success: updates doc fields + uploads translated_FR.txt to MinIO
      - On failure: marks status=FAILED with truncated error message
10. UI polls /documents/42 and sees the live status / progress / final title

This single flow exercises most of the system's building blocks: REST, JWT, Postgres, Kafka producer/consumer, an external API call, retry/DLQ, S3 upload, and the cache invalidation that follows the document update.

11. Containerization (Docker)

Every service has its own Dockerfile. All Java services follow the same multi-stage pattern (illustrated by backend/authentication/Dockerfile):

FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /build
COPY . .
# Produces target/<artifact>.jar (tests are skipped during the image build)
RUN ./mvnw -q package -DskipTests

FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
EXPOSE 8083
ENTRYPOINT ["java", "-jar", "app.jar"]

Benefits:

Final image only contains the JRE + the fat JAR, not Maven and the JDK → much smaller, much faster pulls.
Build is reproducible: the Maven Wrapper (mvnw) is checked in.
No JDK is needed on the developer's host.

The Python translator uses the same idea (backend/translation/Dockerfile):

Stage 1 installs dependencies into /install.
Stage 2 copies them into a clean python:3.12-slim, drops privileges to a non-root translator user, and runs python translator.py.
GEMINI_API_KEY is never baked into the image (it lives in a Secret).

Building & loading into the cluster

The script build-and-load.fish automates this:

For minikube it runs eval (minikube docker-env) so docker build runs inside the minikube VM's daemon. Result: the images are already on the node, and K8s uses them directly because of imagePullPolicy: Never. No registry needed.
For kind it builds locally then runs kind load docker-image ….

Images produced:

dms/auth-service:latest
dms/documents-service:latest
dms/comments-service:latest
dms/gateway-service:latest
dms/orchestration-service:latest
dms/translator:latest

12. Kubernetes Deployment

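All manifests live in infra/k8s/, prefixed with numbers that give a rough apply order. The numbering is not a strict dependency order (for example, 07-auth-service.yml comes before 08-auth-postgres.yml), so deploy.fish applies the infrastructure first and waits for it, and the app pods' init containers wait for their databases.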

Manifest map

File	What it creates
`00-namespace.yml`	Namespace `dms` + Secret `jwt-secret`
`01-documents-postgres.yml`	Documents Postgres: Secret + StatefulSet + ClusterIP Service
`02-comments-postgres.yml`	Cassandra (despite the file name): StatefulSet + ClusterIP Service
`03-minio.yml`	MinIO: Secret + PVC + Deployment + Service (9000 + 9001)
`04-documents-service.yml`	Documents app: Deployment (+ initContainer waiting for Postgres) + ClusterIP Service
`05-comments-service.yml`	Comments app: Deployment (+ initContainers waiting for Cassandra and creating keyspace) + Service
`06-gateway-service.yml`	Gateway: Deployment + NodePort Service (`30080`)
`07-auth-service.yml`	Auth app: Deployment (+ initContainer waiting for auth-postgres) + Service
`08-auth-postgres.yml`	Auth Postgres: Secret + StatefulSet + Service
`09-redis.yml`	Redis: Deployment + Service
`10-kafka.yml`	Kafka (KRaft single-node): Deployment + Service
`11-orchestration-service.yml`	Orchestration: Deployment + Service
`12-translator.yml`	Translator: Deployment (consumes `openrouter-secret` for the LLM API key)

Kubernetes concepts used

Concept	How the project uses it
Namespace	Everything lives in `dms`. `kubectl delete ns dms` wipes the world.
Pod	Smallest deployable unit: one or more containers sharing a network namespace and volumes. All our pods run exactly one main container (plus init containers where needed).
Deployment	Used for stateless apps: gateway, auth, docs, comments, orchestration, translator, redis, kafka, minio.
StatefulSet	Used for stateful apps that need stable hostnames and per-replica persistent disks: docs-postgres, auth-postgres, cassandra.
PersistentVolumeClaim (PVC) / volumeClaimTemplates	Requests disk space; Kubernetes binds it to a PersistentVolume whose data survives pod restarts. Used by the Postgres and Cassandra StatefulSets (via `volumeClaimTemplates`) and by MinIO (a standalone PVC).
Secret	Holds DB credentials, MinIO root creds, JWT secret, OpenRouter API key. Injected into containers as env vars via `secretKeyRef`.
ConfigMap	Not currently used — non-sensitive config is passed as inline env vars.
Service / ClusterIP	Stable virtual IP + DNS name (`<svc>.<ns>.svc.cluster.local`). All internal calls use it.
Service / NodePort	The gateway is exposed on port `30080` of every node so external clients can reach it.
initContainer	Runs before the main container starts. Used to `nc -z` poll a database port or to `cqlsh CREATE KEYSPACE` for Cassandra. Prevents Spring Boot from crashing on "Connection refused".
Probes (readiness / liveness)	Every Spring Boot app pod exposes `/actuator/health/readiness` and `/actuator/health/liveness`. K8s stops sending traffic to a pod whose readiness probe fails and restarts a container whose liveness probe keeps failing.
`imagePullPolicy: Never`	Tells K8s to use the image already present on the node instead of pulling from a remote registry. Needed here because the images are built locally for kind/minikube and never pushed anywhere.
Resource ordering	The `deploy.fish` script applies infra first, waits for it to be Ready, then applies app services.
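The initContainer wait in the table above is a shell loop around `nc -z`. Below is a minimal Python sketch of what that loop checks, for illustration only. The function name `wait_for_port` and the timeout and interval values are assumptions and do not come from the manifests. The function retries a plain TCP connect until the database port accepts connections or a deadline passes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Retry a TCP connect to host:port until it succeeds or timeout_s elapses.

    Mirrors the check done by `nc -z host port`: open a connection, send nothing,
    close it. Returns True if the port became reachable, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A completed TCP handshake is all we need; the socket is closed immediately.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused, DNS name not resolvable yet, or connect timeout.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

In a real init container, a `False` result would become a non-zero exit code. Kubernetes then restarts the init container, and Spring Boot never starts against a database that is not listening yet.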

Networking inside the cluster

Pod  →  Service (ClusterIP)  →  selected Pods

Example: the gateway pod sets DOCUMENTS_SERVICE_URL=http://documents-service:8081. DNS resolves documents-service to the ClusterIP of that Service, which load-balances over all Pods labelled app: documents-service. If we scaled the Deployment to replicas: 3, the Service would automatically fan out traffic across the three pods — no code change needed.
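The following Python sketch models what happens behind a ClusterIP. The label selector picks the matching pods, only ready pods become endpoints, and each new connection goes to one of them. All names (`Pod`, `service_fqdn`, `endpoints`, `pick_backend`) are hypothetical, and this is a model rather than code the project contains. The random choice approximates kube-proxy's default iptables mode.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict[str, str] = field(default_factory=dict)
    ready: bool = True  # outcome of the readiness probe


def service_fqdn(service: str, namespace: str) -> str:
    """Full DNS name of a Service; inside the same namespace the short name also resolves."""
    return f"{service}.{namespace}.svc.cluster.local"


def endpoints(pods: list[Pod], selector: dict[str, str]) -> list[str]:
    """IPs of pods that match every selector label AND are currently ready."""
    return [
        p.ip
        for p in pods
        if p.ready and all(p.labels.get(k) == v for k, v in selector.items())
    ]


def pick_backend(pods: list[Pod], selector: dict[str, str], rng: Optional[random.Random] = None) -> str:
    """Choose one endpoint for a new connection (random, like kube-proxy's iptables mode)."""
    chooser = rng if rng is not None else random
    eps = endpoints(pods, selector)
    if not eps:
        raise LookupError("Service has no ready endpoints")
    return chooser.choice(eps)
```

Scaling to three replicas just adds three matching `Pod` entries. Callers keep using the same Service name, which is why no code change is needed.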

Networking into the cluster

For Minikube the recommended path is:

kubectl -n dms port-forward svc/gateway-service 8090:8080

This forwards localhost:8090 on the dev machine to port 8080 on the gateway Service inside the cluster. The Vite dev server is then configured (see vite.config.js) to proxy /api to 127.0.0.1:8090.

The NodePort 30080 is also defined on the Service so minikube service gateway-service -n dms works as an alternative.

Why a separate Postgres per service?

Real microservice independence: an outage in docs-postgres doesn't take out auth login.
Schema isolation: each team can evolve its schema without coordinating with others.
Demonstrates the database-per-service pattern.

13. Local Development Workflow

Run the whole platform

cd infra/k8s
fish build-and-load.fish minikube     # build all images inside minikube
fish deploy.fish                      # apply manifests in order, wait at each step
kubectl -n dms port-forward svc/gateway-service 8090:8080

In another shell:

cd frontend/ui
npm install
npm run dev
# open http://localhost:5173

Frontend only (mock backend)

cd frontend/ui
npm install
npm run db          # mock REST API (json-server) backed by frontend/ui/db.json
npm run dev

Rebuild a single service after code changes

docker build -t dms/auth-service:latest backend/authentication
minikube image load dms/auth-service:latest
kubectl -n dms rollout restart deployment/auth-service

Useful inspection commands

kubectl get pods -n dms
kubectl logs -n dms deploy/translator -f
kubectl exec -it -n dms deploy/documents-service -- sh
kubectl -n dms exec statefulset/cassandra -- cqlsh -e "DESCRIBE KEYSPACES"

Teardown

kubectl delete namespace dms      # wipes pods, PVCs, secrets, services, …
minikube stop                     # or `minikube delete` to start fresh

14. Operational Concerns

Configuration

Every service follows the 12-factor app principle: configuration comes from environment variables, with sensible local defaults in application.properties / application.yml:

spring.datasource.url=jdbc:postgresql://${DB_HOST:localhost}:${DB_PORT:5432}/${DB_NAME:documentsdb}
spring.kafka.bootstrap-servers=${KAFKA_BOOTSTRAP_SERVERS:localhost:9094}

This means the same JAR / Docker image runs unchanged on a developer laptop (falling back to the localhost defaults) and inside a Kubernetes pod; only the env vars change.
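Here is a simplified Python model of how those `${NAME:default}` placeholders resolve. It assumes environment variables are the only source. Spring also consults other property sources and supports nested placeholders, which this sketch ignores, and the function name `resolve` is hypothetical. The default is everything after the first colon, which is why `localhost:9094` survives intact.

```python
import os
import re
from collections.abc import Mapping

# ${NAME} or ${NAME:default}. The default is everything after the FIRST colon,
# so ${KAFKA_BOOTSTRAP_SERVERS:localhost:9094} defaults to "localhost:9094".
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each placeholder with the env var if it is set, else with its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # No env var and no default: fail fast instead of starting half-configured.
        raise KeyError(f"missing required setting {name}")

    return _PLACEHOLDER.sub(substitute, value)
```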

Caching

The Documents service caches single-document reads in Redis with a 60-second TTL. @CachePut updates the cache on every write so the next read is fresh.
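The caching behaviour described above can be modelled in a few lines. This is a hypothetical Python sketch, not the service's code:

- `TtlCache` stands in for Redis, where the equivalent is a key written with a 60-second expiry (e.g. `SET key value EX 60`).
- Reads are served from the cache when possible.
- `update_document` mirrors `@CachePut` by overwriting the cached entry after the write.
- `delete_document` mirrors `@CacheEvict`.
- The injectable clock exists only so the TTL can be tested.

```python
import time
from typing import Any, Callable, Hashable, Optional


class TtlCache:
    """In-process stand-in for a Redis cache with a per-entry TTL."""

    def __init__(self, ttl_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: dict = {}

    def get(self, key: Hashable) -> Optional[Any]:
        hit = self._data.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if self.clock() >= expires_at:
            del self._data[key]  # expired, like a Redis key whose TTL ran out
            return None
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl_s)

    def evict(self, key: Hashable) -> None:
        self._data.pop(key, None)


class DocumentService:
    """Hypothetical service: `db` is any dict-like store standing in for Postgres."""

    def __init__(self, cache: TtlCache, db: dict):
        self.cache, self.db = cache, db

    def get_document(self, doc_id):
        # Serve from cache; on a miss, load from the DB and populate the cache.
        doc = self.cache.get(doc_id)
        if doc is None:
            doc = self.db[doc_id]
            self.cache.put(doc_id, doc)
        return doc

    def update_document(self, doc_id, doc):
        # @CachePut-style write: persist, then overwrite the cached copy.
        self.db[doc_id] = doc
        self.cache.put(doc_id, doc)
        return doc

    def delete_document(self, doc_id):
        # @CacheEvict-style delete: remove from the DB and drop the cached copy.
        del self.db[doc_id]
        self.cache.evict(doc_id)
```

One consequence is worth noting. A change made to the database behind the service's back (for example, a manual SQL update) stays invisible until the 60-second TTL expires.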

Resilience patterns

Pattern	Where
Graceful degradation	`CommentClient.fetchComments` returns `[]` if Comments is down
Init container waits	every service that depends on a DB waits with `nc -z`
Liveness/readiness probes	every Java service exposes `/actuator/health/{liveness,readiness}`
Idempotent consumers	`TranslationConsumerService` is safe to replay
At-least-once messaging	translator commits after publish; Documents auto-commits but is idempotent
Dead-letter queue	`dms.documents.translation-failed` carries unrecoverable events
Pre-signed URLs	offload bulk file traffic from the JVM to MinIO directly
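The interaction between at-least-once delivery, idempotent consumers, and the DLQ is easiest to see in a toy simulation. The Python sketch below is hypothetical: `Topic`, `Translator` and `DocumentStore` are illustrative names, and an in-memory list stands in for a single-partition Kafka topic. It shows three behaviours:

- The translator commits an offset only after publishing, so a crash between publish and commit produces a duplicate result.
- The consumer applies results as an upsert keyed by document ID, so the duplicate is harmless.
- A message whose translation raises goes to the dead-letter topic instead of blocking the partition.

How `TranslationConsumerService` actually achieves idempotency is not shown here; the upsert is one common approach.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Topic:
    """Append-only list standing in for a single-partition Kafka topic."""
    messages: list = field(default_factory=list)

    def publish(self, key: str, value) -> None:
        self.messages.append((key, value))


@dataclass
class Translator:
    source: Topic
    results: Topic
    dlq: Topic
    committed: int = 0  # next offset to read, as stored for the consumer group

    def poll_once(self, translate: Callable[[str], str], crash_before_commit: bool = False) -> None:
        """Process messages from the committed offset, committing only after each publish."""
        for offset in range(self.committed, len(self.source.messages)):
            key, text = self.source.messages[offset]
            try:
                self.results.publish(key, translate(text))
            except Exception as exc:
                # Unrecoverable input: park it on the dead-letter topic and move on.
                self.dlq.publish(key, {"text": text, "error": str(exc)})
            if crash_before_commit:
                return  # simulated crash: offset NOT committed, so the message is redelivered
            self.committed = offset + 1


class DocumentStore:
    """Idempotent consumer: applying the same result twice leaves the same state."""

    def __init__(self):
        self.translations: dict = {}

    def apply(self, key: str, translated: str) -> None:
        self.translations[key] = translated  # upsert keyed by documentId
```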

Observability

Every Spring app exposes Spring Boot Actuator endpoints: health, info, metrics, env, beans (where useful).
Probes use the health/readiness and health/liveness groups (MANAGEMENT_ENDPOINT_HEALTH_PROBES_ENABLED=true).

Performance benchmark

There's a tiny k6 benchmark in infra/benchmark/ (comments-benchmark.js + sample output files for the Postgres vs. Cassandra showdown that motivated the C2 migration).

15. Glossary of Concepts

Term	Meaning in this project
API Gateway	Spring Cloud Gateway pod that fronts the cluster and routes `/api/**`.
Microservice	A small, independently-deployable service that owns its data and exposes a narrow API.
JWT (JSON Web Token)	Signed token issued by Auth, validated locally by Documents using a shared HMAC-SHA256 secret.
Spring Boot	Java framework used by every Java service for HTTP, JPA, security, Kafka, caching.
Spring Cloud Gateway	Reactive/MVC API gateway with declarative routing.
Spring Data JPA	ORM layer used by Auth and Documents to talk to Postgres.
Spring Data Cassandra	Same idea, for Cassandra (Comments service).
Spring Kafka	`KafkaTemplate` + `@KafkaListener` annotations used by the Documents service.
Spring Integration	EIP framework used by the Orchestration service for channels + flows.
Spring Security	Filters / `SecurityFilterChain` configuration enforcing JWT auth in Documents.
Spring Cache (`@CachePut`, `@CacheEvict`)	Annotation-based caching layer wired to Redis.
JPA / Hibernate	The implementation behind Spring Data JPA. `ddl-auto=update` lets it manage schema in dev.
PostgreSQL	Relational DB used twice (Auth + Documents), each on its own StatefulSet.
Cassandra	Wide-column NoSQL DB for comments; great for write-heavy partitioned data.
Redis	In-memory key/value store used as a cache.
MinIO	Self-hosted S3-compatible object store for raw files.
S3	AWS's object storage API; MinIO speaks it.
Pre-signed URL	A short-lived URL that grants direct read access to an S3 object without server-side proxying.
Apache Kafka	Distributed event log used for async translation events.
KRaft	Kafka's built-in metadata mode (no ZooKeeper) — what `10-kafka.yml` uses.
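Topic / Partition / Offset	Kafka primitives. We use 1 partition per topic, so ordering within each topic is total. The message key is `documentId`; because Kafka routes all messages with the same key to the same partition, this would also keep each document's events in order on a multi-partition topic.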
Consumer group	Set of consumer instances that share work. `translator-service` reads `dms.documents.uploaded`; `dms-translation-consumer` reads the result topics.
At-least-once delivery	The translator commits offsets after publishing the result; on crash, an event might be re-processed. Downstream consumers are written to be idempotent.
Dead-Letter Queue (DLQ)	`dms.documents.translation-failed` is the DLQ for translation events that couldn't be processed.
Docker	Container engine; every service has a Dockerfile.
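Multi-stage Docker build	A `builder` stage (JDK + Maven) compiles the app; a separate minimal runtime stage (JRE only) runs it, so build tools never ship in the final image. The translator applies the same split to its Python dependencies.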
Kubernetes	The cluster orchestrator running everything.
Minikube / kind	Local single-node K8s distributions. The project supports both.
kubectl	CLI used to apply manifests, port-forward, view logs, exec into pods.
Namespace	Logical isolation inside the cluster; everything is in `dms`.
Pod	Smallest deployable unit (one main container in our case, plus init containers where needed).
Deployment	Manages stateless replicas with rolling updates.
StatefulSet	Manages stateful replicas with stable hostnames and per-replica disks.
PVC (PersistentVolumeClaim)	A request for storage that survives pod restarts.
Service (ClusterIP)	Stable cluster-internal DNS + virtual IP for a set of pods.
Service (NodePort)	Same + exposes the port on every cluster node (used by the gateway).
Secret	Base64-encoded (not encrypted) sensitive value injected as an env var (DB creds, JWT secret, OpenRouter API key).
initContainer	Runs to completion before the main container starts; we use it to wait for dependencies (Postgres ready, Cassandra keyspace created).
Readiness probe	When it fails, the pod stops receiving traffic.
Liveness probe	When it fails enough times, the pod is restarted.
OpenRouter / Gemini	External LLM provider used for translation; called by the translator with the OpenAI-compatible SDK.
CORS	Cross-Origin Resource Sharing — configured on the Gateway to allow the Vite dev server origin.
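Since JWT validation in Documents is purely local, it helps to see what validating an HMAC-SHA256 token involves. Below is a minimal Python sketch using only the standard library. The function names and the `sub`/`exp` claims are illustrative assumptions, and the Java services do this through Spring Security rather than hand-written code. The sketch recomputes the signature over `header.payload` with the shared secret, compares it in constant time, and checks the `exp` claim. Production code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token the way an auth service would: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Validate locally with the shared secret; no call back to the issuer is needed."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")  # never let the token pick the algorithm
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```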

Closing notes

Everything in this document is grounded in the actual code paths in the repository. If you want to dig deeper, the most informative single files are, in order:

README.md — high-level overview + operational runbook.
infra/k8s/README.md — Kubernetes walkthrough with diagrams.
backend/documents/src/main/java/com/documents_service/documents/controller/DocumentController.java — the heart of the domain logic.
backend/translation/translator.py — the entire asynchronous pipeline in 600 commented lines.
backend/documents/src/main/java/com/documents_service/documents/service/TranslationConsumerService.java — how the Documents service closes the translation loop.
infra/k8s/deploy.fish — the deployment order that ties it all together.
