A modular Spring Boot toolkit that bundles computer-vision and multimodal utilities such as detection, retrieval, and LLM integration.
- Overview
- Repository Layout
- Environment Setup
- Quick Start
- API Reference
- Resources
- Endpoint Flow Reference
- Roadmap
JavaVisionMind is a collection of independent Spring Boot services that cover object detection, pose estimation, face recognition, person re-identification, text-based image retrieval, and large-language-model interactions. Each capability ships as a separate module so you can deploy only what you need.
| Module | Description |
|---|---|
| `vision-mind-yolo-core` | Core inference utilities for YOLOv11, FAST-SAM, pose estimation, and segmentation models. |
| `vision-mind-yolo-app` | REST facade that exposes the image-analysis capabilities from `vision-mind-yolo-core`. |
| `vision-mind-ocr-core` | PaddleOCR detector/recognizer/classifier pipeline reused by the OCR service. |
| `vision-mind-ocr-app` | REST wrapper that surfaces OCR results as JSON or annotated images. |
| `vision-mind-ffe-app` | Face feature extraction service covering detection, alignment, similarity search, and index maintenance. |
| `vision-mind-reid-app` | Person re-identification workflows backed by Lucene for vector retrieval. |
| `vision-mind-tbir-app` | Text-based image retrieval service built on CLIP embeddings plus Lucene vector search. |
| `vision-mind-llm-core` | Wrapper around OpenAI/Ollama-style chat endpoints that powers multimodal prompts. |
| `vision-mind-common` | Shared DTOs, math helpers, and image/vector utilities. |
| `vision-mind-test-sth` | Scratchpad for integration experiments and manual verification. |
- Install JDK 17 and Maven 3.8+.
- Download the required model bundles and the OpenCV native runtime. Define the `VISION_MIND_PATH` environment variable so every module can locate weights and `.dll`/`.so` files. The model files are hosted on Alibaba Cloud Drive at https://www.alipan.com/s/ChvZFAKXUDp (extraction code: 7i5y).

  ```bash
  # Windows PowerShell
  setx VISION_MIND_PATH "F:\\TestSth\\JavaVisionMind\\resource"

  # Linux / macOS shell
  export VISION_MIND_PATH=/opt/JavaVisionMind/resource
  ```

  Expected structure:

  ```
  ${VISION_MIND_PATH}
  |-- lib
      |-- opencv
          |-- opencv_java490.dll    # Windows
          `-- libopencv_java490.so  # Linux
  ```

- Verify the JVM can load `opencv_java490` for your OS (the services automatically pick the `.dll` or `.so`).
- Download `resource.7z` from the project release page and extract it to the repository root so that model files sit alongside the modules (for example `resource/yolo/model/yolo.onnx`).
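To see which file name the JVM expects on your platform, `System.mapLibraryName` applies the native naming convention to the logical library name. This is only an illustration of how `opencv_java490.dll` and `libopencv_java490.so` map to one name, not the project's actual loader code:

```java
public class NativeLibName {
    // System.mapLibraryName applies the platform convention to a logical name:
    // "opencv_java490.dll" on Windows, "libopencv_java490.so" on Linux,
    // "libopencv_java490.dylib" on macOS.
    public static String platformName() {
        return System.mapLibraryName("opencv_java490");
    }

    public static void main(String[] args) {
        // The services locate this file under ${VISION_MIND_PATH}/lib/opencv
        // and load it by absolute path before any OpenCV call.
        System.out.println(platformName());
    }
}
```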
Build all modules:

```bash
mvn clean install -DskipTests
```

Run the services you need:

- YOLO API: `mvn -pl vision-mind-yolo-app spring-boot:run`
- OCR service: `mvn -pl vision-mind-ocr-app spring-boot:run`
- Face feature service: `mvn -pl vision-mind-ffe-app spring-boot:run`
- Person re-identification: `mvn -pl vision-mind-reid-app spring-boot:run`
- Text-based image retrieval: `mvn -pl vision-mind-tbir-app spring-boot:run`
- LLM chat facade: `mvn -pl vision-mind-llm-core spring-boot:run`
Each service uses `/api` as the context root. Default ports can be overridden in the respective `application.properties`.
- `vision-mind-ffe-app`, `vision-mind-reid-app`, and `vision-mind-tbir-app` expose a `vector.store.mode` switch.
- Set it to `lucene` (default) to persist vectors on disk, `memory` to use the embedded chroma store, or `elasticsearch` to back vectors with an external ES cluster.
- The Elasticsearch mode stores full-dimension embeddings; only the Lucene backend applies the ReID projection matrix.
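A sketch of the corresponding configuration (the property key is taken from the text above; the file location follows the standard Spring Boot layout):

```properties
# e.g. vision-mind-ffe-app/src/main/resources/application.properties
# lucene (default, on-disk) | memory (embedded chroma store) | elasticsearch (external ES cluster)
vector.store.mode=lucene
```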
The OCR stack combines `vision-mind-ocr-core` (engine orchestration) and `vision-mind-ocr-app` (REST facade), wrapping the PaddleOCR models in ONNX runtimes with optional post-processing.
- Switch between the lite and ex detector/recognizer pairs with the `detectionLevel` flag (`lite` by default, `ex` for higher accuracy).
- Choose the reconstruction strategy via `plan`, or call `/detectWithSR` and `/detectWithLLM` for semantic or LLM-driven text refinement.
- Request JPEG overlays from `/detectI` or `/detectWithLLMI` to visualise polygons and fine-tuned spans.
- Ensure `VISION_MIND_PATH` points to the OCR ONNX bundle and dictionary so both engines initialise correctly.
```bash
curl -X POST http://localhost:17006/vision-mind-ocr/api/v1/ocr/detect \
  -H "Content-Type: application/json" \
  -d '{ "imgUrl": "https://example.com/receipt.jpg", "detectionLevel": "lite" }'
```

The response is wrapped in `HttpResult<List<OcrDetectionResult>>`, where each detection includes polygon coordinates, recognised text, and confidence.
The tables below outline the primary REST endpoints exposed by each runnable module. `HttpResult<T>` denotes the project-wide response wrapper containing `success`, `message`, and `data` fields.
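In spirit, the wrapper is a small generic envelope. A minimal sketch (field names from the description above; the real class lives in `vision-mind-common` and may differ in constructors and accessors):

```java
// Minimal sketch of the project-wide response envelope; illustrative only.
public class HttpResult<T> {
    private final boolean success; // whether the call succeeded
    private final String message;  // human-readable status or error text
    private final T data;          // endpoint-specific payload

    public HttpResult(boolean success, String message, T data) {
        this.success = success;
        this.message = message;
        this.data = data;
    }

    public boolean isSuccess() { return success; }
    public String getMessage() { return message; }
    public T getData() { return data; }
}
```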
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/img/detect` | Run object detection within optional include/exclude polygons. | `DetectionRequestWithArea` JSON (`imgUrl`, `threshold?`, `types?`, `detectionFrames?`, `blockingFrames?`) | `HttpResult<List<Box>>` |
| POST | `/api/v1/img/detectI` | Same as above but returns the annotated image. | `DetectionRequestWithArea` | `image/jpeg` bytes |
| POST | `/api/v1/img/detectFace` | Detect faces in the given regions. | `DetectionRequestWithArea` | `HttpResult<List<Box>>` |
| POST | `/api/v1/img/detectFaceI` | Face detection with inline visualization. | `DetectionRequestWithArea` | `image/jpeg` bytes |
| POST | `/api/v1/img/pose` | Human pose estimation. | `DetectionRequestWithArea` | `HttpResult<List<BoxWithKeypoints>>` |
| POST | `/api/v1/img/poseI` | Pose estimation with skeleton overlay. | `DetectionRequestWithArea` | `image/jpeg` bytes |
| POST | `/api/v1/img/sam` | FAST-SAM segmentation; returns bounding boxes. | `DetectionRequest` (`imgUrl`, `threshold?`, `types?`) | `HttpResult<List<Box>>` |
| POST | `/api/v1/img/samI` | FAST-SAM segmentation visualization. | `DetectionRequest` | `image/jpeg` bytes |
| POST | `/api/v1/img/seg` | YOLO segmentation output with masks. | `DetectionRequestWithArea` | `HttpResult<List<SegDetection>>` |
| POST | `/api/v1/img/segI` | Segmentation visualization. | `DetectionRequestWithArea` | `image/jpeg` bytes |
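As an illustration, a `DetectionRequestWithArea` body might look like the following. Field names come from the table above; the nested point-array encoding of the polygons is an assumption, so treat the bundled Postman collection as authoritative:

```json
{
  "imgUrl": "https://example.com/street.jpg",
  "threshold": 0.5,
  "types": [0],
  "detectionFrames": [[[0, 0], [1920, 0], [1920, 1080], [0, 1080]]],
  "blockingFrames": []
}
```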
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/ocr/detect` | Run PaddleOCR text detection/recognition across the full image with switchable `lite` (det/rec.onnx) or `ex` (det2/rec2.onnx) models. | `OcrDetectionRequest` (`detectionLevel?`, `imgUrl`) | `HttpResult<List<OcrDetectionResult>>` |
| POST | `/api/v1/ocr/detectI` | Same as above but streams the annotated image. | `OcrDetectionRequest` (`detectionLevel?`, `imgUrl`) | `image/jpeg` bytes |
| POST | `/api/v1/ocr/detectWithSR` | Applies the semantic-reconstruction decoder to smooth noisy OCR output. | `OcrDetectionRequest` (`detectionLevel?`, `plan?`, `imgUrl`) | `HttpResult<String>` |
| POST | `/api/v1/ocr/detectWithLLM` | Feeds detections through the LLM prompt for higher-level reasoning. | `OcrDetectionRequest` (`detectionLevel?`, `plan?`, `imgUrl`) | `HttpResult<String>` |
| POST | `/api/v1/ocr/detectWithLLMI` | Returns an LLM-refined overlay image with polygon annotations. | `OcrDetectionRequest` (`detectionLevel?`, `plan?`, `imgUrl`) | `image/jpeg` bytes |
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/face/computeFaceVector` | Detect faces and return embeddings without persisting. | `InputWithUrl` (`imgUrl`, `groupId?`, `faceScoreThreshold?`) | `HttpResult<FaceImage>` |
| POST | `/api/v1/face/saveFaceVector` | Persist an externally computed face vector. | `Input4Save` (`imgUrl`, `groupId`, `id`, `embeds`) | `HttpResult<Void>` |
| POST | `/api/v1/face/computeAndSaveFaceVector` | Detect faces, store high-quality embeddings, and return the inserted items. | `InputWithUrl` | `HttpResult<List<FaceInfo4Add>>` |
| POST | `/api/v1/face/deleteFace` | Remove a stored face vector by document ID. | `Input4Del` (`id`) | `HttpResult<Void>` |
| POST | `/api/v1/face/findMostSimilarFace` | Search the index with a probe image. | `Input4Search` (`imgUrl`, `groupId?`, `faceScoreThreshold?`, `confidenceThreshold?`) | `HttpResult<List<FaceInfo4Search>>` |
| POST | `/api/v1/face/findMostSimilarFaceI` | Retrieve the best-match preview image. | `Input4Search` | `image/jpeg` bytes |
| POST | `/api/v1/face/calculateSimilarity` | Compare two image URLs using cosine similarity. | `Input4Compare` (`imgUrl`, `imgUrl2`) | `HttpResult<Double>` |
| POST | `/api/v1/face/findSave` | Search first; if nothing matches, insert the face into the index. | `Input4Search` | `HttpResult<FaceInfo4SearchAdd>` |
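For intuition, `/calculateSimilarity` boils down to a cosine between the two face embeddings. A minimal sketch of the math (helper name hypothetical; the real service first extracts one embedding per image URL, then compares the vectors):

```java
// Hypothetical helper showing the cosine-similarity math behind
// /api/v1/face/calculateSimilarity; not the project's actual class.
public class FaceCosine {
    public static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // accumulate dot product
            na  += a[i] * a[i];   // squared norm of a
            nb  += b[i] * b[i];   // squared norm of b
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```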
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/reid/feature/single` | Extract a single body feature vector. | JSON map `{ "imgUrl": "..." }` | `HttpResult<Feature>` |
| POST | `/api/v1/reid/feature/calculateSimilarity` | Compare two person crops. | JSON map `{ "imgUrl1", "imgUrl2" }` | `HttpResult<Float>` |
| POST | `/api/v1/reid/feature/multi` | Detect multiple persons and return a vector for each. | JSON map `{ "imgUrl": "..." }` | `HttpResult<List<Feature>>` |
| POST | `/api/v1/reid/store/single` | Extract and store a feature with metadata. | JSON map `{ "imgUrl", "cameraId?", "humanId?" }` | `HttpResult<Feature>` |
| POST | `/api/v1/reid/search` | Search the gallery by image. | JSON map `{ "imgUrl", "cameraId?", "topN", "threshold" }` | `HttpResult<List<Human>>` |
| POST | `/api/v1/reid/searchOrStore` | Single-cover workflow: search first, otherwise insert. | JSON map `{ "imgUrl", "threshold" }` | `HttpResult<Human>` |
| POST | `/api/v1/reid/associateStore` | Multi-cover workflow: always store the probe and link it to the match. | JSON map `{ "imgUrl", "threshold" }` | `HttpResult<Human>` |
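The `searchOrStore` decision flow can be sketched with an in-memory stand-in for the Lucene gallery: return the best match at or above the threshold, otherwise register the probe as a new identity. Class and method names are hypothetical, and the real service returns a `Human` DTO rather than a bare ID:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// In-memory sketch of the single-cover searchOrStore workflow.
public class SearchOrStoreSketch {
    private final Map<String, float[]> gallery = new HashMap<>();

    public String searchOrStore(float[] probe, double threshold) {
        String bestId = null;
        double bestScore = -1;
        for (Map.Entry<String, float[]> e : gallery.entrySet()) {
            double s = cosine(probe, e.getValue());
            if (s > bestScore) { bestScore = s; bestId = e.getKey(); }
        }
        if (bestId != null && bestScore >= threshold) {
            return bestId;                       // existing human matched
        }
        String id = UUID.randomUUID().toString(); // no match: insert as new human
        gallery.put(id, probe);
        return id;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

`associateStore` differs in that it always persists the probe and only links it to the matched identity when one is found.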
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/tbir/saveImg` | Ingest an image: detect, augment, vectorize, and index. | `SaveImageRequest` (`imgUrl`, `imgId?`, `cameraId?`, `groupId?`, `meta?`, `threshold?`, `types?`) | `HttpResult<ImageSaveResult>` |
| POST | `/api/v1/tbir/deleteImg` | Remove an image and its variants from the index. | `DeleteImageRequest` (`imgId`) | `HttpResult<Void>` |
| POST | `/api/v1/tbir/searchImg` | Retrieve metadata by stored image ID. | `SearchImageRequest` (`imgId`) | `HttpResult<SearchResult>` |
| POST | `/api/v1/tbir/searchImgI` | Render bounding boxes for the search results of an image ID. | `SearchImageRequest` | `image/jpeg` bytes |
| POST | `/api/v1/tbir/search` | Text-to-image retrieval. | `SearchRequest` (`query`, `cameraId?`, `groupId?`, `topN?`) | `HttpResult<SearchResult>` |
| POST | `/api/v1/tbir/searchI` | Text-to-image retrieval with visualization. | `SearchRequest` | `image/jpeg` bytes |
| POST | `/api/v1/tbir/imgSearch` | Image-to-image search via multipart upload. | `multipart/form-data` (`image`, `topN`) | `HttpResult<SearchResult>` |
DTO quick reference:

- `SaveImageRequest` extends `DetectionRequestWithArea`, adding optional `imgId`, `cameraId`, `groupId`, and an arbitrary metadata map.
- `SearchResult` wraps a list of `HitImage` entries (image URL, boxes, score, metadata).
- `HitImage` retains matched sub-boxes for the visualization endpoints.
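Putting the DTO notes together, a hypothetical `SaveImageRequest` body could look like the following (all values invented for illustration):

```json
{
  "imgUrl": "https://example.com/frame-001.jpg",
  "imgId": "frame-001",
  "cameraId": "cam-07",
  "groupId": "lobby",
  "meta": { "recordedAt": "2024-05-01T08:00:00Z" },
  "threshold": 0.4
}
```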
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/translate` | Prompt the configured LLM to translate Chinese text to English. | `Message` (`message`, optional `img`) | Plain text |
| POST | `/api/chat` | Free-form chat completion. | `Message` (`message`) | Plain text |
| POST | `/api/chatWithImg` | Multimodal chat using an image URL/base64 plus a prompt. | `Message` (`message`, `img`) | Plain text |
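A hypothetical `Message` payload for `/api/chatWithImg` (field names from the table above; whether `img` carries a URL or a base64 string depends on your deployment):

```json
{
  "message": "Describe the scene in this image.",
  "img": "https://example.com/scene.jpg"
}
```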
- `JavaVisionMind.postman_collection.json` (repository root) provides ready-to-run Postman/Apifox requests for every endpoint.
- Model configuration lives under each module's `src/main/resources/application*.properties` for per-service tuning.
- LLaMA deployment support with streaming responses.
- Alternative in-memory vector backends alongside Lucene.
- Revival of the YOLO video-stream processing pipeline in `vision-mind-yolo-core`.
- Controller validates imgUrl and logs before delegating (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:45).
- ImgAnalysisService.detectArea downloads the image into an OpenCV Mat (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:70).
- analysis runs YOLOv11 inference, maps raw outputs to Box objects, and filters by requested class IDs (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:121).
- Detections must overlap include polygons and avoid block polygons according to the configured ratios before they are returned (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:82).
- Remaining boxes are wrapped in HttpResult and returned (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:60).
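The include/block filtering step above can be approximated with standard `java.awt` geometry. This is a simplified sketch: the real service applies its own configured overlap ratios, and the intersection area here is approximated by its bounding box:

```java
import java.awt.Polygon;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Sketch: a detection box is kept when its overlap with an include polygon
// reaches the ratio and its overlap with every block polygon stays below it.
public class AreaFilterSketch {
    static double overlapRatio(Rectangle2D box, Polygon poly) {
        Area inter = new Area(box);
        inter.intersect(new Area(poly));
        if (inter.isEmpty()) return 0;
        Rectangle2D r = inter.getBounds2D();
        // Bounding-box approximation of the intersection area.
        return (r.getWidth() * r.getHeight()) / (box.getWidth() * box.getHeight());
    }

    public static boolean keep(Rectangle2D box, Polygon include, Polygon block, double ratio) {
        boolean inInclude = include == null || overlapRatio(box, include) >= ratio;
        boolean inBlock = block != null && overlapRatio(box, block) >= ratio;
        return inInclude && !inBlock;
    }
}
```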
- Controller repeats validation and timing (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:70).
- detectAreaI renders the image as BufferedImage and reuses detectArea (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:110).
- Include/block frames and boxes are drawn over the image before the controller streams JPEG bytes (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:80).
- Controller checks the payload (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:99).
- ImgAnalysisService.detectFace runs the face-trained YOLO model (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:213).
- Polygon filtering is applied identically to generic detections (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:220).
- Boxes are returned to the controller for response wrapping (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:112).
- Validation mirrors the JSON endpoint (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:118).
- detectFaceI draws bounding boxes plus include/exclude frames and returns the annotated image (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:253).
- Controller streams the JPEG bytes (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:128).
- Controller validates payload and logs (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:147).
- poseArea invokes the YOLOv11 pose model and filters polygons (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:148).
- Filtered BoxWithKeypoints are returned (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:160).
- Controller handles validation (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:173).
- poseAreaI reuses poseArea, draws skeleton overlays, and returns a BufferedImage (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:187).
- Controller streams JPEG (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:183).
- Controller validates and passes through (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:197).
- sam executes FastSAM segmentation and returns boxes (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:279).
- Controller validates (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:216).
- samI draws FastSAM boxes onto the image and returns annotated bytes (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:284).
- Controller checks payload and delegates (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:260).
- segArea runs segmentation and returns per-class polygons (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:294).
- Controller forwards to the service (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:238).
- segAreaI draws segmentation polygons on the original image and returns them (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:299).
- Controller validates input, logs timing, and delegates to the service (vision-mind-ocr-app/src/main/java/com/yuqiangdede/ocr/controller/OcrController.java:30).
- `OcrService.detect` routes the request into the shared inference pipeline (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:93).
- `runInference` downloads the image, selects the light/heavy engine, executes PaddleOCR, and applies include/exclude polygons (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:115).
- Area-filtered detections are returned to the controller for wrapping (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:146).
- Controller invokes the overlay variant and prepares HTTP headers (vision-mind-ocr-app/src/main/java/com/yuqiangdede/ocr/controller/OcrController.java:47).
- `detectWithOverlayBytes` reuses `detectWithOverlay` and encodes the annotated image as JPEG (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:107).
- `detectWithOverlay` draws OCR polygons plus include/exclude frames before returning (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:98).
- Controller validates imgUrl and logs (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:60).
- FaceService.computeFaceVector extracts faces and embeddings (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:142).
- getFaceInfos strips base64 payloads before returning (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:154).
- Controller demands vector info (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:78).
- saveFaceVector persists embeddings with FfeVectorStoreUtil.add (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:95).
- Controller validates payload (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:96).
- computeAndSaveFaceVector filters faces by the requested threshold, stores qualifying embeddings, and returns the trimmed list (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:77).
- Controller checks document ID (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:118).
- delete removes the Lucene record (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:105).
- Controller validates thresholds (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:135).
- findMostSimilarFace runs extraction, filters by quality, and executes a Lucene top-1 search (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:116).
- Controller repeats validation (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:153).
- The controller streams the top match image returned by the service (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:163).
- Controller ensures two URLs (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:186).
- calculateSimilarity extracts both embeddings, normalizes them, and computes cosine similarity (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:177).
- Controller validates the request (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:212).
- findSave searches for each quality face, inserting any misses and returning both found and added items (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:197).
- Controller validates request (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:23).
- featureSingle embeds the probe and tags it with a UUID (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:75).
- Controller checks both URLs (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:39).
- calculateSimilarity embeds both probes and computes cosine similarity (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:82).
- Controller validates payload (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:56).
- featureMulti runs YOLO detection via ImgAnalysisService.detectArea, crops each person, embeds them, and returns the list (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:89).
- Controller enforces required IDs (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:72).
- storeSingle embeds the probe, assigns a UUID, and stores using ReidVectorStoreUtil.add (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:109).
- Controller validates imgUrl, topN, and threshold (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:106).
- search embeds the probe and queries Lucene for matching humans with optional camera scoping (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:117).
- Controller validates payload (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:125).
- searchOrStore returns the best match or persists a new feature when none is found (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:123).
- Controller validates request (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:142).
- associateStore searches for an existing match and always persists the new embedding, linking it to the matched human if available (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:138).
- Controller validates payload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:46).
- saveImg generates or reuses imgId, optionally collects YOLO/FastSAM detections, crops and augments regions, embeds both main and sub-images with CLIP, and persists embeddings with metadata (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:61).
- Controller checks imgId (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:66).
- deleteImg validates the identifier, invokes the vector store deletion, and records execution time (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:167).
- Controller validates (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:82).
- searchImg collects Lucene hits by stored ID and merges them into HitImage DTOs (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:321).
- Controller validates payload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:98).
- searchImgI reuses searchImg, downloads matched images, draws boxes, and returns buffered previews (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:331).
- Controller validates query text (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:124).
- searchByText expands prompts via LLM, embeds each with CLIP, queries Lucene, merges hits through getFinalList, and returns ranked HitImage results (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:182).
- Controller validates and delegates (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:143).
- searchByTextI draws matched boxes on each result image for preview streaming (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:285).
- Controller accepts multipart upload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:170).
- imgSearch embeds the probe image, queries Lucene, and returns ranked matches (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:302).
- Controller applies a translation prompt wrapper and delegates (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:23).
- LLMService.chat validates input and routes to OpenAI or Ollama, throwing if neither is configured (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:22).
- Controller forwards the free-form prompt (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:39).
- LLMService.chat handles provider selection as above (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:22).
- Controller validates text and optional image (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:50).
- chatWithImg enforces payload completeness, injects a default system prompt if needed, and calls the configured OpenAI vision endpoint (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:49).
Contributions and issue reports are welcome.
The following directions can extend the current toolkit and may serve as inspiration for upcoming releases:
- Multi-object tracking (MOT): integrate trackers such as DeepSORT or ByteTrack within `vision-mind-yolo-core` and pair them with detection outputs to provide cross-frame trajectories for security patrols or pedestrian-path analytics.
- Fine-grained attribute recognition: add attribute classifiers for pedestrians, faces, or vehicles (e.g., gender, clothing color, license-plate region) so that vector indexes can support richer filtering.
- Video structuring pipeline: Build a batch video ingestion service that runs detection, segmentation, and re-identification on key frames, then archives the structured results for large-scale video libraries or case investigations.
- Cross-camera association: Combine the existing re-identification stack with spatiotemporal constraints to correlate identities across camera feeds and trigger rule-based alerts.
- Richer multimodal interactions: extend `vision-mind-llm-core` with image captioning, visual question answering (VQA), or prompt-template management to improve multimodal Q&A use cases.
- Model management & observability: provide unified model versioning, hot swapping, and inference performance dashboards to streamline operating multiple models in production.