Document Management System — Complete Project Documentation

A detailed, end-to-end explanation of the entire DMS platform: every microservice, every database, every infrastructure piece, every concept (Kafka, Kubernetes, JWT, Redis, MinIO/S3, Cassandra…) and how everything is wired together.


Table of Contents

  1. Project Overview
  2. Architectural Style
  3. Components at a Glance
  4. Full System Architecture Diagram
  5. Service-by-Service Deep Dive
  6. Data Stores
  7. Messaging: Apache Kafka
  8. Authentication & Security (JWT)
  9. Inter-Service Communication
  10. End-to-End Request Flows
  11. Containerization (Docker)
  12. Kubernetes Deployment
  13. Local Development Workflow
  14. Operational Concerns
  15. Glossary of Concepts

1. Project Overview

The Document Management System (DMS) is a distributed, microservices-based academic lab platform that lets users:

  • Register, log in, and obtain a JWT (JSON Web Token).
  • Upload, list, version, download, and delete documents (with file blobs stored in S3-compatible object storage).
  • Comment on documents.
  • Trigger asynchronous translations of document titles/contents through an LLM (Gemini / OpenRouter).
  • Manage users, roles, departments, and categories from an admin UI.

It is intentionally built as multiple independent services to illustrate real-world distributed-system concepts:

| Concept | Where it lives |
| --- | --- |
| API Gateway | Spring Cloud Gateway (backend/gateway) |
| Stateless JWT-based auth | Auth service (backend/authentication) |
| Domain-driven service boundaries | Documents / Comments / Orchestration / Translator |
| Polyglot persistence | PostgreSQL + Cassandra + Redis + MinIO |
| Event-driven asynchronous workflow | Kafka topics for translation events |
| External AI integration | OpenRouter / Gemini API |
| Containerization | Per-service Dockerfile (multi-stage) |
| Orchestration | Kubernetes manifests (infra/k8s) on Minikube/kind |

2. Architectural Style

The project follows a microservices style with these properties:

  1. One service, one responsibility — every backend module owns a single bounded context (auth, documents, comments, orchestration, translation).
  2. Database-per-service: each domain service owns its own datastore, so a storage outage or schema change in one service does not directly affect the others.
  3. Hybrid sync + async communication
    • Synchronous REST for user-facing requests through the Gateway.
    • Asynchronous Kafka events for slow / failure-prone work (translation).
  4. Stateless services — services don't keep request state in memory. The only "state" is in the databases / object store. This enables horizontal scaling (multiple pod replicas) and trivial failover.
  5. Token-based security — the Auth service issues a signed JWT once at login; all other services validate it locally with the shared HS256 secret, so no per-request call back to Auth is required.
  6. Single entry point — clients only ever talk to the Gateway (the internal cluster topology is hidden from the outside world).

3. Components at a Glance

| Component | Tech | Port | Folder / K8s resource | Role |
| --- | --- | --- | --- | --- |
| Frontend UI | React 19 + Vite | 5173 | frontend/ui/ | User & admin web app |
| Gateway | Spring Cloud Gateway (MVC) | 8080 | backend/gateway/ | API entry point, routing, CORS |
| Auth | Spring Boot 3 + JPA | 8083 | backend/authentication/ | Login, register, JWT issuance |
| Documents | Spring Boot 3 + JPA + Spring Kafka + AWS SDK | 8081 | backend/documents/ | Document CRUD, files, departments, translation orchestration |
| Comments | Spring Boot 3 + Spring Data Cassandra | 8082 | backend/comments/ | Comments per document |
| Orchestration | Spring Boot + Spring Integration | 8084 | backend/orchestration/ | ESB-style aggregator (docs+comments) |
| Translator | Python 3.12 + kafka-python + boto3 + OpenAI SDK | | backend/translation/ | Kafka consumer that calls LLM |
| Auth DB | PostgreSQL 16 | 5432 | StatefulSet auth-postgres | Persists auth users |
| Documents DB | PostgreSQL 16 | 5432 | StatefulSet docs-postgres | Persists documents / categories / departments |
| Comments DB | Cassandra 4.1 | 9042 | StatefulSet cassandra | Persists comments |
| Cache | Redis 7 | 6379 | Deployment redis | Document cache layer (Spring Cache) |
| Object store | MinIO (S3 API) | 9000 / 9001 | Deployment minio | Holds raw file bytes |
| Message broker | Apache Kafka 3.7 (KRaft) | 9092 | Deployment kafka | Translation event bus |
| External LLM | OpenRouter (Gemini / Llama) | HTTPS | | Translation provider |

Port convention: every application service listens on its own port so that local non-Kubernetes runs don't collide (the two PostgreSQL instances share the default 5432 because each runs in its own pod):

Auth        8083
Docs        8081
Comments    8082
Orch        8084
Gateway     8080
MinIO       9000 (api) / 9001 (console)
Kafka       9092 (broker) / 9094 (host advertised in compose)
Postgres    5432
Cassandra   9042
Redis       6379
UI (vite)   5173

4. Full System Architecture Diagram

flowchart LR
    UI[Frontend UI<br/>React + Vite]
    GW[Gateway<br/>Spring Cloud Gateway]
    AUTH[Auth Service]
    DOCS[Documents Service]
    COMM[Comments Service]
    ORCH[Orchestration Service]
    TRANS[Translator Worker<br/>Python]
    AUTHDB[(Auth PostgreSQL)]
    DOCSDB[(Documents PostgreSQL)]
    CASS[(Comments Cassandra)]
    REDIS[(Redis cache)]
    MINIO[(MinIO / S3)]
    KAFKA[[Kafka broker]]
    LLM((OpenRouter / Gemini))

    UI -->|HTTPS REST| GW
    GW -->|/api/auth| AUTH
    GW -->|/api/documents<br/>/api/users<br/>/api/admin<br/>/api/categories<br/>/api/departments| DOCS
    GW -->|/api/comments| COMM
    GW -->|optional| ORCH

    AUTH --> AUTHDB
    DOCS --> DOCSDB
    DOCS --> REDIS
    DOCS --> MINIO
    DOCS -->|RestClient HTTP| COMM
    COMM --> CASS
    ORCH -->|RestTemplate| DOCS
    ORCH -->|RestTemplate| COMM

    DOCS ==>|publish dms.documents.uploaded| KAFKA
    KAFKA ==>|consume| TRANS
    TRANS -->|getObject / putObject| MINIO
    TRANS -->|chat.completions| LLM
    TRANS ==>|publish dms.documents.translated<br/>dms.documents.translation-progress<br/>dms.documents.translation-failed| KAFKA
    KAFKA ==>|consume| DOCS

Legend:

  • Solid arrows (-->): synchronous calls, i.e. HTTP between services and JDBC / CQL / S3 SDK calls towards a data store
  • Double arrows (==>): asynchronous Kafka publish/consume

5. Service-by-Service Deep Dive

5.1 Gateway Service

Tech: Spring Cloud Gateway (MVC variant) on Spring Boot 3, port 8080.
Folder: backend/gateway/

Responsibilities:

  • Single externally-reachable entry point for all client traffic.
  • Path-based routing to the correct downstream service.
  • CORS handling for the Vite dev server (http://localhost:5173).
  • Stripping the /api/ prefix before forwarding so downstream services see clean paths (e.g. /documents/list).

Routes (application.yml:11-55):

| Predicate | Forwards to | Resolved env var |
| --- | --- | --- |
| /api/auth/** | Auth service :8083 | AUTH_SERVICE_URL |
| /api/documents/** | Documents service :8081 | DOCUMENTS_SERVICE_URL |
| /api/users/** | Documents service :8081 | DOCUMENTS_SERVICE_URL |
| /api/admin/** | Documents service :8081 | DOCUMENTS_SERVICE_URL |
| /api/categories/** | Documents service :8081 | DOCUMENTS_SERVICE_URL |
| /api/departments/** | Documents service :8081 | DOCUMENTS_SERVICE_URL |
| /api/comments/** | Comments service :8082 | COMMENTS_SERVICE_URL |

All routes use a RewritePath=/api/(?<segment>.*), /${segment} filter so a request to /api/documents/list becomes /documents/list on the downstream service.
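The effect of that filter can be reproduced with an ordinary regular expression. The following is a minimal Python sketch, not the gateway's code; the function name rewrite_path is hypothetical, and Python writes the named group as (?P<segment>...) where the gateway configuration uses (?<segment>...).

```python
import re

# Mirrors RewritePath=/api/(?<segment>.*), /${segment} from the gateway config.
# Python spells the named group (?P<segment>...) and the reference \g<segment>.
API_PREFIX = re.compile(r"^/api/(?P<segment>.*)$")


def rewrite_path(path: str) -> str:
    """Strip the /api prefix; paths that do not match are returned unchanged."""
    return API_PREFIX.sub(r"/\g<segment>", path)


print(rewrite_path("/api/documents/list"))  # -> /documents/list
```

Paths outside /api (for example a health endpoint) pass through untouched in this sketch, which is an assumption about how unmatched requests should behave.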

Why a gateway?

  • Hides internal cluster topology from clients.
  • One CORS configuration to manage instead of N.
  • Centralizes the spot where rate-limiting, auth-introspection, request logging or tracing could be added later.
  • Externalizes service URLs as env variables so the same image works in any environment (local dev → Kubernetes).

CORS is defined in GatewayCorsConfig.java and intentionally only allows the Vite dev server origin.


5.2 Authentication Service

Tech: Spring Boot 3 + Spring Data JPA + jjwt, port 8083.
Folder: backend/authentication/
DB: Dedicated PostgreSQL instance (auth-postgres), schema authdb.

Responsibilities:

  • Persist user accounts (auth_user table) with salted password hashes.
  • Issue a signed JWT after a successful POST /auth/login.
  • Manage user roles (collection table user_roles).
  • Handle password changes and logout timestamps.
  • Provide internal endpoints to sync departments and wipe users (admin use).

Endpoints (AuthController.java):

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /auth/login | Verify credentials, return JWT + identity payload |
| POST | /auth/register | Create a new user account |
| POST | /auth/change-password | Self-service password change (requires bearer token) |
| POST | /auth/logout | Stamp last_logout (no token blacklist; stateless JWT) |
| POST | /auth/users/{email}/department | Update a user's department (internal sync) |
| DELETE | /auth/users?email=... | Delete a user by email |
| DELETE | /auth/admin/users/wipe | Delete all users |
| GET | /auth/health | Simple health check |

Why is SecurityConfig permitAll? The Auth service is the issuer — clients must reach /auth/login without already having a token. Spring Security is enabled (CSRF disabled) but every endpoint is permitted; authentication is enforced inside the controllers when needed (e.g. for change-password).

Password storage (PasswordService):

  • Per-user random salt + salted hash.
  • Verified with passwordService.matches(rawPassword, hash, salt) on login.
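The salt-then-hash scheme can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the source does not say which hash function PasswordService uses, so PBKDF2-HMAC-SHA256, the iteration count, and the hex encoding below are all illustrative choices, and hash_password is a hypothetical name. Only the matches(rawPassword, hash, salt) shape comes from the text above.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, not taken from the project


def hash_password(raw_password: str) -> tuple:
    """Return (hash_hex, salt_hex) using a fresh random salt per user."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", raw_password.encode(), salt, ITERATIONS)
    return digest.hex(), salt.hex()


def matches(raw_password: str, stored_hash: str, stored_salt: str) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", raw_password.encode(), bytes.fromhex(stored_salt), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), stored_hash)
```

Because the salt is random per user, two accounts with the same password end up with different stored hashes.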

JWT issuance (JwtService.java):

  • Algorithm: HMAC-SHA256 (HS256).
  • Secret: provided via JWT_SECRET env var (base64 encoded by default).
  • Default lifetime: 24 hours (jwt.expiration-seconds=86400).
  • Claims embedded in the token: id, email, name, role, roles, department, mustChangePassword. Subject = email.

The exact same secret is configured in the Documents service so it can verify tokens locally — see §8 Authentication & Security.
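To make the token format concrete, here is a minimal standard-library Python sketch of HS256 signing. It is not the service's JwtService (which uses jjwt in Java); the function names, the iat claim, and the example claim values are assumptions for illustration. Only the algorithm, the 86400-second default lifetime, and the subject-equals-email convention come from the list above.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as required by the JWT compact format."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_token(secret: bytes, claims: dict, lifetime_seconds: int = 86400, now=None) -> str:
    """Build header.payload.signature and sign it with HMAC-SHA256."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "iat": issued_at, "exp": issued_at + lifetime_seconds}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


# Hypothetical claim values; the claim names follow the list above.
token = issue_token(b"0" * 32, {"sub": "alice@example.com", "role": "USER"})
```

Any service holding the same secret can recompute the signature over the first two segments and compare, which is what makes local validation possible.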


5.3 Documents Service

Tech: Spring Boot 3 + Spring Data JPA + Spring Security + Spring Kafka + AWS SDK v2 (S3) + Spring Cache (Redis), port 8081.
Folder: backend/documents/
DB: Its own PostgreSQL instance (docs-postgres), schema documentsdb.

This is the largest service. It owns the document domain, the file storage integration, the user/department/category catalog, JWT validation, the translation request pipeline, and the cross-service call to Comments.

Domain entities (JPA):

  • Document (model/Document.java) — the central entity. Holds title, ownership, department FK, S3 fileKey, translation status fields, and a JSON versionsJson column for past file versions.
  • AppUser (model/AppUser.java) — application-side user record with @ManyToMany Set<Department> through the user_department join table. Separate from the Auth user but kept in sync by email.
  • Department, Category — admin-managed reference data.

Note: there are two user tables in the system. The Auth service has auth_user (credentials + roles). The Documents service has app_user (departments + admin-style metadata). They are linked by email: the Documents service looks the user up by the JWT email claim.

Controllers:

Key service-layer components:

| Class | Role |
| --- | --- |
| DocumentService | CRUD + Redis caching (@CachePut, @CacheEvict). |
| DocumentS3Service | MinIO upload/download/presign; ensures bucket exists at startup. |
| CommentClient | Synchronous HTTP call to Comments service via Spring 6.1 RestClient. Returns empty list on failure (graceful degradation). |
| DocumentEventProducer | Publishes DocumentUploadedEvent to Kafka dms.documents.uploaded. |
| TranslationConsumerService | @KafkaListener consuming dms.documents.translated, …translation-progress, …translation-failed. Persists translation results back into PostgreSQL and uploads the translated .txt to MinIO. |

Security (security/):

  • JwtAuthFilter runs before Spring Security's username/password filter. Every non-actuator request must carry an Authorization: Bearer … header. The filter parses the claims, builds Spring authorities (ROLE_* prefixed), and populates SecurityContextHolder with a UsernamePasswordAuthenticationToken whose principal is the parsed Claims object, so controllers can read id, email, name, and role directly from the JWT without DB hits.
  • SecurityConfig is STATELESS, anonymous disabled, every /documents/** endpoint requires a valid token. Only /actuator/health* and /actuator/info are public for K8s probes.

Access control: documents are scoped by department.

  • Admins see everything.
  • Regular users only see documents whose departmentEntity.id is in their assigned Set<Department> (DocumentController.callerHasAccessToDocument).
  • A user who tries to read a document in a foreign department gets 403 Forbidden. Note that a 403 confirms the document exists; hiding the existence of cross-department documents would require returning 404 instead.
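The department rule above boils down to a set-membership test. The sketch below restates it in Python under assumed names (Doc, caller_has_access, visible_documents, read_status); the real check lives in DocumentController.callerHasAccessToDocument and may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: int
    department_id: int


def caller_has_access(doc: Doc, is_admin: bool, department_ids: set) -> bool:
    """Admins see everything; others need the document's department assigned."""
    return is_admin or doc.department_id in department_ids


def visible_documents(docs: list, is_admin: bool, department_ids: set) -> list:
    """List view: keep only the documents the caller may see."""
    return [d for d in docs if caller_has_access(d, is_admin, department_ids)]


def read_status(doc: Doc, is_admin: bool, department_ids: set) -> int:
    """Single-document view: HTTP 200 when allowed, 403 otherwise."""
    return 200 if caller_has_access(doc, is_admin, department_ids) else 403
```

In the service itself the list filter is pushed down to the database rather than applied in memory; the in-memory filter here only illustrates the rule.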

Translation lifecycle (per-document fields on Document):

title              ← original
translatedTitle    ← filled in once translation succeeds
translationStatus  ∈ {NOT_REQUESTED, PENDING, SUCCESS, FAILED}
translationProgress ∈ 0..100
translationError   ← message if FAILED
translatedLanguage ← e.g. "French"
translatedFileKey  ← S3 key of the .txt produced by the consumer

The flow:

  1. User clicks Translate in the UI.
  2. UI calls POST /api/documents/{id}/request-translation.
  3. Documents sets the doc to PENDING/0% and publishes a DocumentUploadedEvent to dms.documents.uploaded (Kafka).
  4. The translator worker consumes, calls the LLM, publishes progress events, and finally publishes a SUCCESS or FAILED event.
  5. The Documents service's TranslationConsumerService updates the DB and uploads the translated text to MinIO at documents/{id}/translated_{lang}.txt.
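The status fields form a small state machine. The following Python sketch models it with a hypothetical TranslationState class; the rules that progress never moves backwards, that late progress events are ignored once a terminal state is reached, and the 255-character error cap are assumptions chosen to make duplicate or out-of-order events harmless, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ERROR_LENGTH = 255  # assumed truncation limit


@dataclass
class TranslationState:
    status: str = "NOT_REQUESTED"
    progress: int = 0
    translated_title: Optional[str] = None
    error: Optional[str] = None

    def request(self) -> None:
        """Step 3: the document is marked PENDING at 0%."""
        self.status, self.progress, self.error = "PENDING", 0, None

    def on_progress(self, percent: int) -> None:
        """Progress events only apply while PENDING and never go backwards."""
        if self.status == "PENDING":
            self.progress = max(self.progress, min(percent, 99))

    def on_success(self, title: str) -> None:
        """Applying the same success event twice leaves the state unchanged."""
        self.status, self.progress, self.translated_title = "SUCCESS", 100, title

    def on_failure(self, message: str) -> None:
        self.status, self.error = "FAILED", message[:MAX_ERROR_LENGTH]
```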

5.4 Comments Service

Tech: Spring Boot 3 + Spring Data Cassandra, port 8082.
Folder: backend/comments/
DB: Apache Cassandra 4.1 (keyspace dms).

Why Cassandra? Comments are write-heavy, append-only, and naturally partition by docId. Cassandra's wide-row model is a great fit: each document becomes a partition and comments are stored as clustered rows ordered by commentId. The keyspace is created at deploy time by the ensure-keyspace init container in 05-comments-service.yml.

Schema (auto-managed by Spring Data Cassandra, schema-action=CREATE_IF_NOT_EXISTS):

  • Table comments
  • Composite primary key:
    • Partition: docId (Long)
    • Clustering: commentId (UUID), descending, so the newest comment is read first (this ordering is chronological only if the generated UUIDs are time-based, i.e. version 1).
  • Columns: author, content.

Endpoints (CommentController.java):

  • GET /comments/list/{docId} — list comments newest-first.
  • POST /comments/add — body {docId, author, content} (server generates UUID if missing).

Interactions:

  • Called synchronously by Documents (for /{id}/full) via CommentClient.
  • Called directly by clients through the Gateway (/api/comments/**).
  • Called by Orchestration as an alternative aggregation path.

5.5 Orchestration Service

Tech: Spring Boot + Spring Integration (Enterprise-Integration-Pattern DSL), port 8084.
Folder: backend/orchestration/

Purpose: demonstrate an ESB-style (Enterprise Service Bus) orchestrator that aggregates the responses of multiple services and returns a single combined document.

Endpoint: GET /document/{id} returns { document: {...}, comments: [...] }.

How it works (OrchestrationIntegrationConfig.java):

Two Spring Integration IntegrationFlows, each sitting between a request DirectChannel and a response QueueChannel:

documentRequestChannel  → HTTP GET service.d.url/{id} → documentResponseChannel
commentsRequestChannel  → HTTP GET service.m.url/{id} → commentsResponseChannel

The controller sends the document ID to both request channels, then blocks on both response channels with a 5-second timeout. If the Comments service (configured as service.m.url) is unreachable, the comments flow catches the exception and forwards an empty list, so the document data is still returned (availability over consistency).
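The same scatter-gather idea can be sketched without Spring Integration. The Python version below uses a thread pool instead of message channels; aggregate, fetch_document, and fetch_comments are hypothetical names, and treating a comments timeout the same way as a comments failure is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate(doc_id, fetch_document, fetch_comments, timeout_seconds=5.0):
    """Call both backends in parallel and merge their answers.

    A failure or timeout on the comments side degrades to an empty list;
    a failure on the document side propagates to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        document_future = pool.submit(fetch_document, doc_id)
        comments_future = pool.submit(fetch_comments, doc_id)
        document = document_future.result(timeout=timeout_seconds)
        try:
            comments = comments_future.result(timeout=timeout_seconds)
        except Exception:
            comments = []  # graceful degradation
        return {"document": document, "comments": comments}
    finally:
        # Do not block on a slow backend after the timeout has expired.
        pool.shutdown(wait=False)
```

Only the comments call is allowed to fail quietly here, because a response without the document itself would have nothing useful to return.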

Why a separate service for this when Documents already has /documents/{id}/full?

  • Different style: Spring Integration vs. plain RestClient — useful for a course/lab comparing patterns.
  • Decouples aggregation from the canonical Documents service.

5.6 Translator Service

Tech: Python 3.12 + kafka-python-ng + boto3 (S3) + openai SDK (used against OpenRouter); no HTTP server, only a Kafka consumer/producer.
Folder: backend/translation/
Entry point: translator.py

Why Python? The external LLM APIs have well-supported Python SDKs, and a Python worker also demonstrates polyglot microservices.

Kafka topology used by the translator:

| Topic | Direction | Purpose |
| --- | --- | --- |
| dms.documents.uploaded | consume | A new document needs translation |
| dms.documents.translation-progress | produce | Streaming progress 0–99% |
| dms.documents.translated | produce | SUCCESS event with translated title (and optional content) |
| dms.documents.translation-failed | produce | Dead-letter queue (DLQ) for failed translations |

Consumer group: translator-service (durable offsets).
Offset reset: earliest, so documents uploaded while the worker was down are not silently missed.
Auto-commit: disabled; the worker commits only after publishing the result. This gives at-least-once delivery: if the worker crashes between the Gemini call and the publish, the message is re-processed; if it crashes after the publish but before the commit, the downstream consumer sees a duplicate, which is harmless because TranslationConsumerService is idempotent.
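The ordering that produces at-least-once delivery can be shown without a broker. In the Python sketch below, translate, publish, and commit are injected callables standing in for the LLM call, the Kafka producer, and the consumer's offset commit; process_record and the record layout are hypothetical, and retries are left out for brevity.

```python
SUCCESS_TOPIC = "dms.documents.translated"
FAILURE_TOPIC = "dms.documents.translation-failed"


def process_record(record: dict, translate, publish, commit) -> None:
    """Translate, publish the outcome, and only then commit the offset."""
    try:
        result = translate(record["value"])
        topic = SUCCESS_TOPIC
    except Exception as exc:
        # A failed translation is still an outcome: it goes to the DLQ topic.
        result = {
            "documentId": record["value"].get("documentId"),
            "errorMessage": str(exc),
        }
        topic = FAILURE_TOPIC
    # If publish raises, commit is never reached, so the record is redelivered.
    publish(topic, record["key"], result)
    # Kafka offsets point at the next record to read, hence offset + 1.
    commit(record["offset"] + 1)
```

A crash between publish and commit leads to a second publish after restart, which is why the downstream consumer has to tolerate duplicates.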

Retry strategy (translate_with_retry):

  • Up to MAX_RETRIES=3 Gemini calls per event.
  • Exponential backoff: 1 s, 2 s, 4 s, with override from retry_delay_seconds extracted from Gemini's RESOURCE_EXHAUSTED error.
  • Errors are classified (classify_gemini_error) into retryable vs. non-retryable (insufficient_credits, model_not_found, quota_exhausted_daily, …). If a model isn't usable, the worker falls through to the next configured model in GEMINI_MODELS.
  • After all retries are exhausted the worker publishes to the DLQ topic with the original payload + error info, then commits the offset to make forward progress. Operators can replay from the DLQ later.
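The retry policy above can be sketched as a small helper. This is not the worker's translate_with_retry: the names call_with_backoff and NonRetryableError are hypothetical, and how the 1 s, 2 s, 4 s schedule maps onto the call count is an assumption. The sketch sleeps only between calls, so three calls use the first two delays and a fourth call would use the third.

```python
import time


class NonRetryableError(Exception):
    """Stand-in for errors classified as not worth retrying."""


def call_with_backoff(call, max_calls=3, base_delay=1.0, sleep=time.sleep):
    """Invoke call() up to max_calls times with exponential backoff."""
    if max_calls < 1:
        raise ValueError("max_calls must be at least 1")
    last_error = None
    for attempt in range(max_calls):
        try:
            return call()
        except NonRetryableError:
            raise  # give up immediately; the caller decides what happens next
        except Exception as exc:
            last_error = exc
            if attempt < max_calls - 1:
                # A delay suggested by the provider overrides the schedule.
                hinted = getattr(exc, "retry_delay_seconds", None)
                sleep(hinted if hinted else base_delay * 2 ** attempt)
    raise last_error
```

Passing sleep as a parameter keeps the helper testable without real waiting; when the final exception escapes, the caller would publish to the DLQ topic and commit.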

Streaming progress: the worker uses the OpenAI SDK in stream=True mode so it can publish progress events as the LLM streams chunks, letting the UI show a live progress bar.

Security: GEMINI_API_KEY is never baked into the image. It is injected from a Kubernetes Secret (openrouter-secret) created by deploy.fish from the local openrouter_key.txt file (which is gitignored).

S3 access: the worker uses service credentials to read source files directly from MinIO (no JWT path through Documents) — this is intentional: the translator only speaks Kafka + Gemini + S3, never the Documents HTTP API.


5.7 Frontend (React + Vite)

Tech: React 19, Vite 6, React Router 6, vanilla CSS. Tests via Cypress and Playwright. Folder: frontend/ui/

Entry points:

  • src/main.jsx — React root.
  • src/App.jsx — routing.

Layout:

src/
  services/api.js          — fetch wrapper, JWT injection, base URLs
  context/AppContext.jsx   — global auth + user state
  hooks/                   — useUsers, useDocuments, useAppContext
  pages/                   — Login, Signup, DocumentList, DocumentDetail,
                              DocumentUploadPage, AdminDashboard, …
  components/              — DocumentUploadModal, UserImportWizard, …

API base URLs (src/services/api.js):

  • VITE_DMS_API_BASE_URL defaults to /api (Vite proxy).
  • The token is stored in localStorage under the key dms-auth and replayed on every fetch as Authorization: Bearer <token>.

Vite proxy (vite.config.js):

  • Dev server runs on :5173.
  • /api is proxied to http://127.0.0.1:8090 — which is what kubectl port-forward svc/gateway-service 8090:8080 exposes locally.

Optional mock backend: npm run db runs a JSON-server using db.json so the UI can be developed without the full backend stack.


6. Data Stores

6.1 PostgreSQL (Auth & Documents)

  • Two physically separate instances to honor the "database-per-service" pattern. Each runs as a Kubernetes StatefulSet with a PersistentVolumeClaim so data survives pod restarts.
  • Image: postgres:16-alpine.
  • Auth DB credentials in auth-postgres-secret (authuser / authpass / authdb).
  • Documents DB credentials in docs-postgres-secret (dmsuser / dmspass / documentsdb).
  • Each service uses Hibernate with ddl-auto=update so tables are auto-managed (acceptable for a lab; production would use Flyway/Liquibase).

6.2 Cassandra (Comments)

  • Single-node StatefulSet cassandra:4.1 with 2Gi storage.
  • Keyspace dms is created at startup by the ensure-keyspace init container with SimpleStrategy + RF=1 (suitable for one-node dev clusters).
  • Schema-on-startup via Spring Data Cassandra (schema-action=CREATE_IF_NOT_EXISTS).
  • Why Cassandra instead of just another Postgres? Comments are append-heavy with a clear partition key (docId) — the canonical Cassandra workload — and the project uses this to demonstrate polyglot persistence.

6.3 Redis (Cache)

  • Image: redis:7-alpine, single replica Deployment.
  • Used by the Documents service via Spring Cache abstraction (spring.cache.type=redis, TTL 60 s).
  • DocumentService annotates writes with @CachePut, so the next read is served from the cache instead of hitting Postgres, and deletes with @CacheEvict, so removed documents do not linger in the cache.
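The caching behaviour can be modelled with an in-memory dictionary. The Python sketch below is a toy stand-in for Redis plus Spring Cache; the TtlCache class is hypothetical, and the read-through get method (comparable to @Cacheable) is an assumption, since the text only names @CachePut and @CacheEvict.

```python
import time


class TtlCache:
    """Tiny in-memory cache with a per-entry time-to-live."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def put(self, key, value):
        """Like @CachePut: always overwrite the entry after a write."""
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def evict(self, key):
        """Like @CacheEvict: drop the entry after a delete."""
        self._entries.pop(key, None)

    def get(self, key, loader):
        """Serve a fresh entry, otherwise load it and cache the result."""
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        return self.put(key, loader(key))
```

Here loader plays the role of the repository call to Postgres; it only runs on a miss or after the 60-second TTL has passed.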

6.4 MinIO / S3 (Object Storage)

  • Image: quay.io/minio/minio:latest, single replica Deployment with a 5Gi PVC.
  • Two ports: 9000 (S3 API) and 9001 (web console at http://localhost:9001 — login admin / ensia2026).
  • Bucket: ensia (auto-created by DocumentS3Service.ensureBucket()).
  • Object key conventions:
    • Original file: documents/{documentId}/{originalFilename}
    • Translated text file: documents/{documentId}/translated_{lang}.txt
  • The Documents service uses the AWS SDK v2 for Java with path-style access enabled (required by MinIO).
  • The Documents service can generate pre-signed URLs (5 minutes default) so the browser can download large files directly from MinIO without proxying through Spring Boot.
  • The Translator uses boto3 to fetch sources and write translated files.
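Because both the Documents service and the Translator must agree on object names, the key conventions are worth pinning down in one place. A minimal Python sketch with hypothetical helper names:

```python
def original_key(document_id: int, original_filename: str) -> str:
    """Key of the uploaded file: documents/{documentId}/{originalFilename}."""
    return f"documents/{document_id}/{original_filename}"


def translated_key(document_id: int, lang: str) -> str:
    """Key of the translated text: documents/{documentId}/translated_{lang}.txt."""
    return f"documents/{document_id}/translated_{lang}.txt"
```

The sketch does not sanitize filenames; whether the real service does so is not stated in this document.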

7. Messaging: Apache Kafka

Broker setup

  • Image: apache/kafka:3.7.1 running in KRaft mode (no ZooKeeper) — see 10-kafka.yml.
  • Single-node cluster with process.roles=broker,controller, num.partitions=1, RF=1 for the internal topics — typical for a lab.
  • Advertised listener: kafka:9092 so other pods in the dms namespace can resolve it via DNS.

Topics used

| Topic | Producer | Consumer | Payload |
| --- | --- | --- | --- |
| dms.documents.uploaded | Documents | Translator | DocumentUploadedEvent (JSON string) |
| dms.documents.translation-progress | Translator | Documents | { documentId, translationStatus, translationProgress } |
| dms.documents.translated | Translator | Documents | { documentId, translatedTitle, translatedContent, targetLanguage, translationModel, … } |
| dms.documents.translation-failed | Translator | Documents (DLQ; updates status to FAILED) | { documentId, error, errorMessage, retryable, modelsTried, … } |

Why string-serialized JSON instead of Spring's JsonSerializer?

The Documents service intentionally uses StringSerializer and serializes the event with Jackson manually. The reason is documented at KafkaProducerConfig.java:21-32: mixing Jackson 2.x and 3.x on the same classpath (some Spring Kafka internals use 2.x; the project has 3.x deps) breaks JsonSerializer. Avoiding it sidesteps the classpath conflict.

Message key

For every translation event the message key is String.valueOf(documentId). This guarantees all events for the same document land on the same partition, preserving ordering per-document (especially important for progress events).
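Why a stable key implies a stable partition can be seen from a hash-modulo sketch. Kafka's default partitioner hashes the key bytes with murmur2; the Python example below substitutes CRC32 purely to stay within the standard library, so the partition numbers it yields are not the ones Kafka would pick. The function name partition_for is hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every event keyed by the same document ID lands on the same partition.
document_key = str(42)
assert partition_for(document_key, 6) == partition_for(document_key, 6)
```

With the lab's single partition every key maps to partition 0, so the guarantee only starts to matter once the topic is given more partitions.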

Delivery semantics

  • Producer: acks=all — wait for leader + all in-sync replicas before ack.
  • Consumer: at-least-once.
    • Translator manually commits after a successful publish to dms.documents.translated (or DLQ).
    • Documents service uses enable.auto.commit=true because the update is idempotent (setting the same translatedTitle twice is harmless).

Failure isolation

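If the translator is down, events pile up in the topic but none are lost as long as they stay within Kafka's retention window. When it restarts, the committed offset of the translator-service consumer group lets it resume from the first uncommitted event; auto.offset.reset=earliest only applies when the group has no committed offset yet, in which case it starts from the beginning of the topic.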


8. Authentication & Security (JWT)

Token issuance

  1. Client POST /api/auth/login → Gateway → Auth service.
  2. Auth verifies the salted hash and signs a JWT with HS256 and the shared secret JWT_SECRET.
  3. Response body returns the token plus profile (id, email, name, role, roles, department, mustChangePassword).
  4. The browser persists it in localStorage (dms-auth).

Token validation

  • The Documents service validates JWTs locally with the same secret using JwtAuthFilter.
  • No call to Auth is made on every request — the secret is shared via the Kubernetes Secret jwt-secret so both services can sign/verify.
  • This means if Auth is down, existing valid tokens still work; only new logins fail.
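Local validation amounts to recomputing the HMAC and checking the expiry. The Python sketch below shows the idea with the standard library only; it is not the Java JwtAuthFilter, verify_token is a hypothetical name, and rejecting any header whose alg is not HS256 is a defensive assumption rather than documented behaviour.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Nothing in this function talks to the Auth service, which is why existing tokens keep working while Auth is down.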

Claims used by Documents

  • id → owner ID stamped on new documents.
  • email → looked up in app_user to resolve Set<Department>.
  • roles → mapped to Spring authorities (ROLE_ADMIN, ROLE_USER, …).
  • name → stamped as document ownerName.

Security defaults

  • All /documents/** endpoints are authenticated and authorized by department.
  • Actuator probes (/actuator/health*, /actuator/info) are public so Kubernetes liveness/readiness probes don't need credentials.
  • Sessions are STATELESS: no HttpSession is created, and CSRF protection is disabled because the REST APIs authenticate with bearer tokens rather than cookies.
  • CORS allowed only from the dev Vite origin (the Gateway pins the list).

9. Inter-Service Communication

Synchronous HTTP

| Caller | Callee | Mechanism |
| --- | --- | --- |
| UI | Gateway | fetch |
| Gateway | All backends | Spring Cloud Gateway HTTP routing |
| Documents | Comments | Spring RestClient (CommentClient.java) |
| Orchestration | Documents | Spring Integration HTTP outbound gateway |
| Orchestration | Comments | Spring RestTemplate (wrapped in try/catch for graceful degradation) |

Asynchronous (Kafka)

| Producer | Topic | Consumer |
| --- | --- | --- |
| Documents | dms.documents.uploaded | Translator |
| Translator | dms.documents.translated | Documents |
| Translator | dms.documents.translation-progress | Documents |
| Translator | dms.documents.translation-failed | Documents |

Object store

The Documents service and the Translator both speak the S3 API directly to MinIO. The translator deliberately avoids hitting the Documents HTTP API to keep the asynchronous tier loosely coupled.

Service discovery

All in-cluster communication uses the Kubernetes DNS name of the target service: e.g. the Documents pod reaches Comments at http://comments-service:8082 because both live in the dms namespace. Outside the cluster, kubectl port-forward is used.


10. End-to-End Request Flows

Flow 1 — Login (synchronous)

Browser  →  Gateway  →  Auth service  →  auth-postgres
   ↑                          │
   └──── JWT (+ profile) ─────┘

Flow 2 — List my documents (synchronous, authorized by department)

Browser
  → GET /api/documents/list (Bearer <jwt>)
  → Gateway rewrites to /documents/list
  → Documents.JwtAuthFilter parses claims
  → DocumentController.getAllDocuments:
        if admin: documentRepository.findAll()
        else:    documentRepository.findByDepartmentIdIn(userDepartments)
  → Postgres + Redis cache
  → JSON list

Flow 3 — Document + comments aggregated (Documents → Comments)

Browser
  → GET /api/documents/42/full
  → Gateway → Documents
  → Documents reads doc from Postgres,
    verifies caller has access to its department,
    then CommentClient.fetchComments("42")
  → HTTP GET http://comments-service:8082/comments/list/42
  → Comments queries Cassandra
  → Documents returns { document, comments }

If Comments is down the response still contains the document and an empty comments: [] (graceful degradation).

Flow 4 — Upload a file

Browser  ──POST multipart── /api/documents/add-with-file
                                  │
                                  ▼
                         Documents controller:
                           1. Resolves department from JWT
                           2. Creates Document row in Postgres
                           3. Uploads bytes to MinIO at
                              documents/{id}/{filename}
                           4. Stores fileKey on the Document row
                                  │
                                  ▼
                                 201 + Document JSON

Flow 5 — Translation (the asynchronous showpiece)

1.  UI clicks "Translate"
2.  POST /api/documents/42/request-translation
3.  Documents sets status=PENDING, progress=0, publishes
      key="42"
      topic="dms.documents.uploaded"
      value=DocumentUploadedEvent{...}
4.  Translator consumer (group=translator-service) picks the event up
5.  Translator publishes progress 5%
6.  Translator calls OpenRouter (Gemini) with streaming
7.  As chunks arrive, translator publishes progress 10..90%
    on dms.documents.translation-progress
    (Documents service's KafkaListener writes them to the doc row)
8.  Translator finishes:
      - SUCCESS  → publish to dms.documents.translated
      - FAILURE  → publish to dms.documents.translation-failed
9.  Documents.TranslationConsumerService:
      - On success: updates doc fields + uploads translated_{lang}.txt to MinIO
      - On failure: marks status=FAILED with truncated error message
10. UI polls /api/documents/42 and sees the live status / progress / final title

This single flow exercises every pillar of the system: REST, JWT, Postgres, Kafka producer/consumer, an external API call, retry/DLQ, S3 upload, and the cache invalidation that follows the document update.


11. Containerization (Docker)

Every service has its own Dockerfile. All Java services follow the same multi-stage pattern (illustrated by backend/authentication/Dockerfile):

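# Stage 1: build the fat JAR with the JDK and the Maven Wrapper
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /build
COPY . .
RUN ./mvnw -q package -DskipTests          # produces target/<artifact>.jar

# Stage 2: runtime image with only the JRE and the JAR
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
EXPOSE 8083
ENTRYPOINT ["java", "-jar", "app.jar"]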

Benefits:

  • Final image only contains the JRE + the fat JAR, not Maven and the JDK → much smaller, much faster pulls.
  • Build is reproducible: the Maven Wrapper (mvnw) is checked in.
  • No JDK is needed on the developer's host.

The Python translator uses the same idea (backend/translation/Dockerfile):

  • Stage 1 installs dependencies into /install.
  • Stage 2 copies them into a clean python:3.12-slim, drops privileges to a non-root translator user, and runs python translator.py.
  • GEMINI_API_KEY is never baked into the image (it lives in a Secret).

Building & loading into the cluster

The script build-and-load.fish automates this:

  • For minikube it runs eval (minikube docker-env) so docker build talks to the Docker daemon inside the minikube VM. Result: the images are already present on the node, so Kubernetes can start them with imagePullPolicy: Never and no registry is needed.
  • For kind it builds locally then runs kind load docker-image ….

Images produced:

dms/auth-service:latest
dms/documents-service:latest
dms/comments-service:latest
dms/gateway-service:latest
dms/orchestration-service:latest
dms/translator:latest

12. Kubernetes Deployment

All manifests live in infra/k8s/, numbered so they apply in dependency order.

Manifest map

| File | What it creates |
| --- | --- |
| 00-namespace.yml | Namespace dms + Secret jwt-secret |
| 01-documents-postgres.yml | Documents Postgres: Secret + StatefulSet + ClusterIP Service |
| 02-comments-postgres.yml | Cassandra: StatefulSet + ClusterIP Service |
| 03-minio.yml | MinIO: Secret + PVC + Deployment + Service (9000 + 9001) |
| 04-documents-service.yml | Documents app: Deployment (+ initContainer waiting for Postgres) + ClusterIP Service |
| 05-comments-service.yml | Comments app: Deployment (+ initContainers waiting for Cassandra and creating keyspace) + Service |
| 06-gateway-service.yml | Gateway: Deployment + NodePort Service (30080) |
| 07-auth-service.yml | Auth app: Deployment (+ initContainer waiting for auth-postgres) + Service |
| 08-auth-postgres.yml | Auth Postgres: Secret + StatefulSet + Service |
| 09-redis.yml | Redis: Deployment + Service |
| 10-kafka.yml | Kafka (KRaft single-node): Deployment + Service |
| 11-orchestration-service.yml | Orchestration: Deployment + Service |
| 12-translator.yml | Translator: Deployment (consumes openrouter-secret for the LLM API key) |

Kubernetes concepts used

| Concept | How the project uses it |
| --- | --- |
| Namespace | Everything lives in dms. kubectl delete ns dms wipes the world. |
| Pod | The smallest deployable unit: one or more containers that share a network namespace. All our pods carry exactly one main container (plus init containers where needed). |
| Deployment | Used for the stateless apps (gateway, auth, docs, comments, orchestration, translator) and also for redis, kafka, and minio. |
| StatefulSet | Used for stateful apps that need stable hostnames and per-replica persistent disks: docs-postgres, auth-postgres, cassandra. |
| PersistentVolumeClaim (PVC) / volumeClaimTemplates | Requests disk space; Kubernetes binds a PersistentVolume. Survives pod restarts. Used by all databases and MinIO. |
| Secret | Holds DB credentials, MinIO root creds, JWT secret, OpenRouter API key. Injected as env vars via secretKeyRef. |
| ConfigMap | Not currently used; non-sensitive config is passed as inline env vars. |
| Service / ClusterIP | Stable virtual IP + DNS name (<svc>.<ns>.svc.cluster.local). All internal calls use it. |
| Service / NodePort | The gateway is exposed on port 30080 of every node so external clients can reach it. |
| initContainer | Runs to completion before the main container starts. Used to nc -z poll a database port or to cqlsh CREATE KEYSPACE for Cassandra. Prevents Spring Boot from crashing on "Connection refused". |
| Probes (readiness / liveness) | Every Java app pod exposes /actuator/health/readiness and /actuator/health/liveness. K8s stops sending traffic to an unready pod and restarts the container of an unhealthy one. |
| imagePullPolicy: Never | Tells K8s to use the locally loaded image rather than pulling from a remote registry. Needed here because the images exist only inside kind/minikube, and the default policy for :latest tags is Always. |
| Resource ordering | The deploy.fish script applies infra first, waits for it to be Ready, then applies app services. |
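
The nc -z polling that the init containers perform can be sketched in Python as follows. This is only an illustration of the idea, not the project's actual init container; the function name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds (like `nc -z` in a loop).

    Returns True as soon as the port accepts a connection, or False once
    timeout_s has elapsed without a successful connect.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect is all we need; close it immediately.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Covers "connection refused", DNS failures, and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

An init container built on this idea would exit with a non-zero status when the function returns False, so that Kubernetes retries it before starting the main container.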

Networking inside the cluster

Pod  →  Service (ClusterIP)  →  selected Pods

Example: the gateway pod sets DOCUMENTS_SERVICE_URL=http://documents-service:8081. DNS resolves documents-service to the ClusterIP of that Service, which load-balances over all Pods labelled app: documents-service. If we scaled the Deployment to replicas: 3, the Service would automatically fan out traffic across the three pods — no code change needed.

Networking into the cluster

For Minikube the recommended path is:

kubectl -n dms port-forward svc/gateway-service 8090:8080

This forwards localhost:8090 on the dev machine to port 8080 on the gateway Service inside the cluster. The Vite dev server is then configured (see vite.config.js) to proxy /api to 127.0.0.1:8090.

The NodePort 30080 is also defined on the Service so minikube service gateway-service -n dms works as an alternative.

Why a separate Postgres per service?

  • Real microservice independence: an outage in docs-postgres doesn't take out auth login.
  • Schema isolation: each team can evolve its schema without coordinating with others.
  • Demonstrates the database-per-service pattern.

13. Local Development Workflow

Run the whole platform

cd infra/k8s
fish build-and-load.fish minikube     # build all images inside minikube
fish deploy.fish                      # apply manifests in order, wait at each step
kubectl -n dms port-forward svc/gateway-service 8090:8080

In another shell:

cd frontend/ui
npm install
npm run dev
# open http://localhost:5173

Frontend only (mock backend)

cd frontend/ui
npm install
npm run db          # JSON-server with frontend/ui/db.json
npm run dev

Rebuild a single service after code changes

docker build -t dms/auth-service:latest backend/authentication
minikube image load dms/auth-service:latest
kubectl -n dms rollout restart deployment/auth-service

Useful inspection commands

kubectl get pods -n dms
kubectl logs -n dms deploy/translator -f
kubectl exec -it -n dms deploy/documents-service -- sh
kubectl -n dms exec statefulset/cassandra -- cqlsh -e "DESCRIBE KEYSPACES"

Teardown

kubectl delete namespace dms      # wipes pods, PVCs, secrets, services, …
minikube stop                     # or `minikube delete` to start fresh

14. Operational Concerns

Configuration

Every service follows the 12-factor app principle: configuration comes from environment variables, with sensible local defaults in application.properties / application.yml:

spring.datasource.url=jdbc:postgresql://${DB_HOST:localhost}:${DB_PORT:5432}/${DB_NAME:documentsdb}
spring.kafka.bootstrap-servers=${KAFKA_BOOTSTRAP_SERVERS:localhost:9094}

This means the same JAR/Docker image runs unchanged everywhere, from a local run on your laptop to a Kubernetes pod; only the env vars change.
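
To make the ${NAME:default} syntax concrete, here is a small Python sketch of how such placeholders resolve against the environment. It is a simplified model, not Spring's actual resolver (which also handles nested placeholders and other property sources); the function name resolve and the host name in the usage note are illustrative.

```python
import os
import re

# Matches ${NAME} or ${NAME:default}; the default may itself contain colons.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def resolve(template: str, env=None) -> str:
    """Replace each ${NAME:default} with env[NAME], else with the default."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")

    return _PLACEHOLDER.sub(substitute, template)
```

With an empty environment, the datasource URL above resolves to jdbc:postgresql://localhost:5432/documentsdb; setting only DB_HOST=docs-postgres changes the host and leaves the port and database name at their defaults.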

Caching

The Documents service caches single-document reads in Redis with a 60-second TTL. @CachePut updates the cache on every write so the next read is fresh.
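
The behaviour described here (read-through with a TTL, plus an update on write) can be modelled in a few lines of Python. This is a conceptual sketch using an in-memory dictionary in place of Redis; TtlCache, read_document, write_document, and the document:<id> key format are hypothetical names, not the service's real code.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """In-memory stand-in for Redis: entries expire ttl_s seconds after a put."""

    def __init__(self, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a cache miss
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl_s, value)


def read_document(doc_id: str, cache: TtlCache, load_from_db) -> Any:
    """Read-through: serve from the cache, else load and remember the result."""
    key = f"document:{doc_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(doc_id)
    cache.put(key, value)
    return value


def write_document(doc_id: str, value: Any, cache: TtlCache, save_to_db) -> Any:
    """Write, then refresh the cache entry (the role @CachePut plays)."""
    saved = save_to_db(doc_id, value)
    cache.put(f"document:{doc_id}", saved)
    return saved
```

Because the write path overwrites the cached entry rather than waiting for it to expire, a read immediately after an update returns the new value without touching the database.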

Resilience patterns

| Pattern | Where |
| --- | --- |
| Graceful degradation | CommentClient.fetchComments returns [] if Comments is down |
| Init container waits | every service that depends on a DB waits with nc -z |
| Liveness/readiness probes | every Java service exposes /actuator/health/{liveness,readiness} |
| Idempotent consumers | TranslationConsumerService is safe to replay |
| At-least-once messaging | translator commits after publish; Documents auto-commits but is idempotent |
| Dead-letter queue | dms.documents.translation-failed carries unrecoverable events |
| Pre-signed URLs | offload bulk file traffic from the JVM to MinIO directly |
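
Why idempotency makes at-least-once delivery safe can be shown with a short Python sketch. It assumes, purely for illustration, that a translation result is an upsert keyed by document and language; the class and field names are hypothetical and do not mirror TranslationConsumerService.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class TranslationEvent:
    # Hypothetical event shape; the real payload may differ.
    document_id: str
    language: str
    translated_title: str


@dataclass
class TranslationStore:
    """Applies events as upserts, so replaying an event changes nothing."""

    titles: Dict[Tuple[str, str], str] = field(default_factory=dict)
    applied: int = 0  # counts events that actually changed state

    def apply(self, event: TranslationEvent) -> bool:
        key = (event.document_id, event.language)
        if self.titles.get(key) == event.translated_title:
            return False  # duplicate delivery: state is already correct
        self.titles[key] = event.translated_title
        self.applied += 1
        return True
```

If the consumer crashes after applying an event but before its offset is committed, Kafka redelivers the event; the second apply call returns False and the stored state is identical to what a single delivery would have produced.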

Observability

  • Every Spring app exposes Spring Boot Actuator endpoints: health, info, metrics, env, beans (where useful).
  • Probes use the health/readiness and health/liveness groups (MANAGEMENT_ENDPOINT_HEALTH_PROBES_ENABLED=true).

Performance benchmark

There's a tiny k6 benchmark in infra/benchmark/ (comments-benchmark.js + sample output files for the Postgres vs. Cassandra showdown that motivated the C2 migration).


15. Glossary of Concepts

| Term | Meaning in this project |
| --- | --- |
| API Gateway | Spring Cloud Gateway pod that fronts the cluster and routes /api/**. |
| Microservice | A small, independently-deployable service that owns its data and exposes a narrow API. |
| JWT (JSON Web Token) | Signed token issued by Auth, validated locally by Documents using a shared HMAC-SHA256 secret. |
| Spring Boot | Java framework used by every Java service for HTTP, JPA, security, Kafka, caching. |
| Spring Cloud Gateway | Reactive/MVC API gateway with declarative routing. |
| Spring Data JPA | ORM layer used by Auth and Documents to talk to Postgres. |
| Spring Data Cassandra | Same idea, for Cassandra (Comments service). |
| Spring Kafka | KafkaTemplate + @KafkaListener annotations used by the Documents service. |
| Spring Integration | EIP framework used by the Orchestration service for channels + flows. |
| Spring Security | Filters / SecurityFilterChain configuration enforcing JWT auth in Documents. |
| Spring Cache (@CachePut, @CacheEvict) | Annotation-based caching layer wired to Redis. |
| JPA / Hibernate | The implementation behind Spring Data JPA. ddl-auto=update lets it manage schema in dev. |
| PostgreSQL | Relational DB used twice (Auth + Documents), each on its own StatefulSet. |
| Cassandra | Wide-column NoSQL DB for comments; great for write-heavy partitioned data. |
| Redis | In-memory key/value store used as a cache. |
| MinIO | Self-hosted S3-compatible object store for raw files. |
| S3 | AWS's object storage API; MinIO speaks it. |
| Pre-signed URL | A short-lived URL that grants direct read access to an S3 object without server-side proxying. |
| Apache Kafka | Distributed event log used for async translation events. |
| KRaft | Kafka's built-in Raft-based metadata mode (no ZooKeeper), which is what 10-kafka.yml uses. |
| Topic / Partition / Offset | Kafka primitives. We use 1 partition per topic; the message key is documentId so per-doc ordering is preserved. |
| Consumer group | Set of consumer instances that share work. translator-service reads dms.documents.uploaded; dms-translation-consumer reads the result topics. |
| At-least-once delivery | The translator commits offsets after publishing the result; on crash, an event might be re-processed. Downstream consumers are written to be idempotent. |
| Dead-Letter Queue (DLQ) | dms.documents.translation-failed is the DLQ for translation events that couldn't be processed. |
| Docker | Container engine; every service has a Dockerfile. |
| Multi-stage Docker build | Separate builder stage (with JDK + Maven) from a minimal runtime stage (JRE only). |
| Kubernetes | The cluster orchestrator running everything. |
| Minikube / kind | Local K8s distributions (single-node by default). The project supports both. |
| kubectl | CLI used to apply manifests, port-forward, view logs, exec into pods. |
| Namespace | Logical isolation inside the cluster; everything is in dms. |
| Pod | Smallest deployable unit: one or more containers (one main container per pod in our case). |
| Deployment | Manages stateless replicas with rolling updates. |
| StatefulSet | Manages stateful replicas with stable hostnames and per-replica disks. |
| PVC (PersistentVolumeClaim) | A request for storage that survives pod restarts. |
| Service (ClusterIP) | Stable cluster-internal DNS + virtual IP for a set of pods. |
| Service (NodePort) | Same + exposes the port on every cluster node (used by the gateway). |
| Secret | Base64-encoded (not encrypted) sensitive value injected as env var (DB creds, JWT secret, OpenRouter API key). |
| initContainer | Runs to completion before the main container starts; we use it to wait for dependencies (Postgres ready, Cassandra keyspace created). |
| Readiness probe | When it fails, the pod stops receiving traffic. |
| Liveness probe | When it fails enough times, the pod's container is restarted. |
| OpenRouter / Gemini | External LLM provider used for translation; called by the translator with the OpenAI-compatible SDK. |
| CORS | Cross-Origin Resource Sharing, configured on the Gateway to allow the Vite dev server origin. |
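
As a companion to the JWT entry, the following Python sketch shows what "validated locally using a shared HMAC-SHA256 secret" means mechanically. It is a teaching example built on the standard library only; the Java services would normally rely on a JWT library rather than hand-written code, and the function names here are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token: base64url(header).base64url(claims).base64url(signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now=None) -> dict:
    """Validate locally: recompute the HMAC with the shared secret, check exp."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    exp = claims.get("exp")
    if exp is not None and (time.time() if now is None else now) >= exp:
        raise ValueError("token expired")
    return claims
```

No network call to the issuer is needed during verification: any service that holds the same secret can check a token on its own, which is what keeps the authentication stateless.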

Closing notes

Everything in this document is grounded in the actual code paths in the repository. If you want to dig deeper, the most informative single files are, in order:

  1. README.md — high-level overview + operational runbook.
  2. infra/k8s/README.md — Kubernetes walkthrough with diagrams.
  3. backend/documents/src/main/java/com/documents_service/documents/controller/DocumentController.java — the heart of the domain logic.
  4. backend/translation/translator.py — the entire asynchronous pipeline in 600 commented lines.
  5. backend/documents/src/main/java/com/documents_service/documents/service/TranslationConsumerService.java — how the Documents service closes the translation loop.
  6. infra/k8s/deploy.fish — the deployment order that ties it all together.