A modular Spring Boot toolkit that bundles computer-vision and multimodal utilities such as detection, retrieval, and LLM integration.
- Overview
- Repository Layout
- Environment Setup
- Quick Start
- API Reference
- Resources
- Endpoint Flow Reference
- Roadmap
JavaVisionMind is a collection of independent Spring Boot services that cover object detection, pose estimation, face recognition, person re-identification, text-based image retrieval, and large-language-model interactions. Each capability ships as a separate module so you can deploy only what you need.
| Module | Description |
|---|---|
| `vision-mind-yolo-core` | Core inference utilities for YOLOv11, FAST-SAM, pose estimation, and segmentation models. |
| `vision-mind-yolo-app` | REST facade that exposes the image-analysis capabilities from `vision-mind-yolo-core`. |
| `vision-mind-ocr-core` | PaddleOCR detector/recognizer/classifier pipeline reused by the OCR service. |
| `vision-mind-ocr-app` | REST wrapper that surfaces OCR results as JSON or annotated images. |
| `vision-mind-ffe-app` | Face feature extraction service covering detection, alignment, similarity search, and index maintenance. |
| `vision-mind-reid-app` | Person re-identification workflows backed by Lucene for vector retrieval. |
| `vision-mind-tbir-app` | Text-based image retrieval service built on CLIP embeddings plus Lucene vector search. |
| `vision-mind-llm-core` | Wrapper around OpenAI/Ollama-style chat endpoints that powers multimodal prompts. |
| `vision-mind-common` | Shared DTOs, math helpers, and image/vector utilities. |
| `vision-mind-test-sth` | Scratchpad for integration experiments and manual verification. |
- Install JDK 17 and Maven 3.8+.
- Download the required model bundles and the OpenCV native runtime. Define the `VISION_MIND_PATH` environment variable so every module can locate weights and `.dll`/`.so` files. The model files are hosted on Alibaba Cloud Drive at https://www.alipan.com/s/ChvZFAKXUDp (extraction code: 7i5y).

  ```bash
  # Windows PowerShell
  setx VISION_MIND_PATH "F:\\TestSth\\JavaVisionMind\\resource"

  # Linux / macOS shell
  export VISION_MIND_PATH=/opt/JavaVisionMind/resource
  ```

  Expected structure:

  ```
  ${VISION_MIND_PATH}
  |-- lib
      |-- opencv
          |-- opencv_java490.dll    # Windows
          `-- libopencv_java490.so  # Linux
  ```

- Verify the JVM can load `opencv_java490` for your OS (the services automatically pick the `.dll` or `.so`).
- Download `resource.7z` from the project release page and extract it to the repository root so that model files sit alongside the modules (for example `resource/yolo/model/yolo.onnx`).
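To see which file name the JVM expects on your platform, `System.mapLibraryName` applies the native naming convention to the logical library name. This is only an illustration of how `opencv_java490.dll` and `libopencv_java490.so` map to one name, not the project's actual loader code:

```java
public class NativeLibName {
    // System.mapLibraryName applies the platform convention to a logical name:
    // "opencv_java490.dll" on Windows, "libopencv_java490.so" on Linux,
    // "libopencv_java490.dylib" on macOS.
    public static String platformName() {
        return System.mapLibraryName("opencv_java490");
    }

    public static void main(String[] args) {
        // The services locate this file under ${VISION_MIND_PATH}/lib/opencv
        // and load it by absolute path before any OpenCV call.
        System.out.println(platformName());
    }
}
```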
Build all modules:

```bash
mvn clean install -DskipTests
```

Run the services you need:

- YOLO API: `mvn -pl vision-mind-yolo-app spring-boot:run`
- OCR service: `mvn -pl vision-mind-ocr-app spring-boot:run`
- Face feature service: `mvn -pl vision-mind-ffe-app spring-boot:run`
- Person re-identification: `mvn -pl vision-mind-reid-app spring-boot:run`
- Text-based image retrieval: `mvn -pl vision-mind-tbir-app spring-boot:run`
- LLM chat facade: `mvn -pl vision-mind-llm-core spring-boot:run`
Each service uses `/api` as the context root. Default ports can be overridden in the respective `application.properties`.
- `vision-mind-ffe-app`, `vision-mind-reid-app`, and `vision-mind-tbir-app` expose a `vector.store.mode` switch.
- Set it to `lucene` (default) to persist vectors on disk, `memory` to use the embedded chroma store, or `elasticsearch` to back vectors with an external ES cluster.
- The Elasticsearch mode stores full-dimension embeddings; only the Lucene backend applies the ReID projection matrix.
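A sketch of the corresponding configuration (the property key is taken from the text above; the file location follows the standard Spring Boot layout):

```properties
# e.g. vision-mind-ffe-app/src/main/resources/application.properties
# lucene (default, on-disk) | memory (embedded chroma store) | elasticsearch (external ES cluster)
vector.store.mode=lucene
```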
The OCR stack combines `vision-mind-ocr-core` (engine orchestration) and `vision-mind-ocr-app` (REST facade), wrapping the PaddleOCR models in ONNX runtimes with optional post-processing.
- Switch between the lite and ex detector/recognizer pairs with the `detectionLevel` flag (`lite` by default, `ex` for higher accuracy).
- Choose the reconstruction strategy via `plan`, or call `/detectWithSR` and `/detectWithLLM` for semantic or LLM-driven text refinement.
- Request JPEG overlays from `/detectI` or `/detectWithLLMI` to visualise polygons and fine-tuned spans.
- Ensure `VISION_MIND_PATH` points to the OCR ONNX bundle and dictionary so both engines initialise correctly.
```bash
curl -X POST http://localhost:17006/vision-mind-ocr/api/v1/ocr/detect \
  -H "Content-Type: application/json" \
  -d '{ "imgUrl": "https://example.com/receipt.jpg", "detectionLevel": "lite" }'
```

The response is wrapped in `HttpResult<List<OcrDetectionResult>>`, where each detection includes polygon coordinates, recognised text, and confidence.
The tables below outline the primary REST endpoints exposed by each runnable module. `HttpResult<T>` denotes the project-wide response wrapper containing `success`, `message`, and `data` fields.
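In spirit, the wrapper is a small generic envelope. A minimal sketch (field names from the description above; the real class lives in `vision-mind-common` and may differ in constructors and accessors):

```java
// Minimal sketch of the project-wide response envelope; illustrative only.
public class HttpResult<T> {
    private final boolean success; // whether the call succeeded
    private final String message;  // human-readable status or error text
    private final T data;          // endpoint-specific payload

    public HttpResult(boolean success, String message, T data) {
        this.success = success;
        this.message = message;
        this.data = data;
    }

    public boolean isSuccess() { return success; }
    public String getMessage() { return message; }
    public T getData() { return data; }
}
```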
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/img/detect` | Run object detection within optional include/exclude polygons. | `DetectionRequestWithArea` JSON (`imgUrl`, `threshold?`, `types?`, `detectionFrames?`, `blockingFrames?`) | `HttpResult<List<Box>>` |
| POST | `/api/v1/img/detectI` | Same as above but returns the annotated image. | `DetectionRequestWithArea` | `image/jpeg` bytes |
| POST | `/api/v1/img/detectFace` | Detect faces in the given regions. | `DetectionRequestWithArea` | `HttpResult<List<Box>>` |
| POST | `/api/v1/img/detectFaceI` | Face detection with inline visualization. | `DetectionRequestWithArea` | `image/jpeg` bytes |
| POST | `/api/v1/img/pose` | Human pose estimation. | `DetectionRequestWithArea` | `HttpResult<List<BoxWithKeypoints>>` |
| POST | `/api/v1/img/poseI` | Pose estimation with skeleton overlay. | `DetectionRequestWithArea` | `image/jpeg` bytes |
| POST | `/api/v1/img/sam` | FAST-SAM segmentation; returns bounding boxes. | `DetectionRequest` (`imgUrl`, `threshold?`, `types?`) | `HttpResult<List<Box>>` |
| POST | `/api/v1/img/samI` | FAST-SAM segmentation visualization. | `DetectionRequest` | `image/jpeg` bytes |
| POST | `/api/v1/img/seg` | YOLO segmentation output with masks. | `DetectionRequestWithArea` | `HttpResult<List<SegDetection>>` |
| POST | `/api/v1/img/segI` | Segmentation visualization. | `DetectionRequestWithArea` | `image/jpeg` bytes |
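As an illustration, a `DetectionRequestWithArea` body might look like the following. Field names come from the table above; the nested point-array encoding of the polygons is an assumption, so treat the bundled Postman collection as authoritative:

```json
{
  "imgUrl": "https://example.com/street.jpg",
  "threshold": 0.5,
  "types": [0],
  "detectionFrames": [[[0, 0], [1920, 0], [1920, 1080], [0, 1080]]],
  "blockingFrames": []
}
```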
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/ocr/detect` | Run PaddleOCR text detection/recognition across the full image with switchable `lite` (det/rec.onnx) or `ex` (det2/rec2.onnx) models. | `OcrDetectionRequest` (`detectionLevel?`, `imgUrl`) | `HttpResult<List<OcrDetectionResult>>` |
| POST | `/api/v1/ocr/detectI` | Same as above but streams the annotated image. | `OcrDetectionRequest` (`detectionLevel?`, `imgUrl`) | `image/jpeg` bytes |
| POST | `/api/v1/ocr/detectWithSR` | Applies the semantic-reconstruction decoder to smooth noisy OCR output. | `OcrDetectionRequest` (`detectionLevel?`, `plan?`, `imgUrl`) | `HttpResult<String>` |
| POST | `/api/v1/ocr/detectWithLLM` | Feeds detections through the LLM prompt for higher-level reasoning. | `OcrDetectionRequest` (`detectionLevel?`, `plan?`, `imgUrl`) | `HttpResult<String>` |
| POST | `/api/v1/ocr/detectWithLLMI` | Returns an LLM-refined overlay image with polygon annotations. | `OcrDetectionRequest` (`detectionLevel?`, `plan?`, `imgUrl`) | `image/jpeg` bytes |
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/face/computeFaceVector` | Detect faces and return embeddings without persisting. | `InputWithUrl` (`imgUrl`, `groupId?`, `faceScoreThreshold?`) | `HttpResult<FaceImage>` |
| POST | `/api/v1/face/saveFaceVector` | Persist an externally computed face vector. | `Input4Save` (`imgUrl`, `groupId`, `id`, `embeds`) | `HttpResult<Void>` |
| POST | `/api/v1/face/computeAndSaveFaceVector` | Detect faces, store high-quality embeddings, and return the inserted items. | `InputWithUrl` | `HttpResult<List<FaceInfo4Add>>` |
| POST | `/api/v1/face/deleteFace` | Remove a stored face vector by document ID. | `Input4Del` (`id`) | `HttpResult<Void>` |
| POST | `/api/v1/face/findMostSimilarFace` | Search the index with a probe image. | `Input4Search` (`imgUrl`, `groupId?`, `faceScoreThreshold?`, `confidenceThreshold?`) | `HttpResult<List<FaceInfo4Search>>` |
| POST | `/api/v1/face/findMostSimilarFaceI` | Retrieve the best-match preview image. | `Input4Search` | `image/jpeg` bytes |
| POST | `/api/v1/face/calculateSimilarity` | Compare two image URLs using cosine similarity. | `Input4Compare` (`imgUrl`, `imgUrl2`) | `HttpResult<Double>` |
| POST | `/api/v1/face/findSave` | Search first; if nothing matches, insert the face into the index. | `Input4Search` | `HttpResult<FaceInfo4SearchAdd>` |
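For intuition, `/calculateSimilarity` boils down to a cosine between the two face embeddings. A minimal sketch of the math (helper name hypothetical; the real service first extracts one embedding per image URL, then compares the vectors):

```java
// Hypothetical helper showing the cosine-similarity math behind
// /api/v1/face/calculateSimilarity; not the project's actual class.
public class FaceCosine {
    public static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // accumulate dot product
            na  += a[i] * a[i];   // squared norm of a
            nb  += b[i] * b[i];   // squared norm of b
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```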
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/reid/feature/single` | Extract a single body feature vector. | JSON map `{ "imgUrl": "..." }` | `HttpResult<Feature>` |
| POST | `/api/v1/reid/feature/calculateSimilarity` | Compare two person crops. | JSON map `{ "imgUrl1", "imgUrl2" }` | `HttpResult<Float>` |
| POST | `/api/v1/reid/feature/multi` | Detect multiple persons and return a vector for each. | JSON map `{ "imgUrl": "..." }` | `HttpResult<List<Feature>>` |
| POST | `/api/v1/reid/store/single` | Extract and store a feature with metadata. | JSON map `{ "imgUrl", "cameraId?", "humanId?" }` | `HttpResult<Feature>` |
| POST | `/api/v1/reid/search` | Search the gallery by image. | JSON map `{ "imgUrl", "cameraId?", "topN", "threshold" }` | `HttpResult<List<Human>>` |
| POST | `/api/v1/reid/searchOrStore` | Single-cover workflow: search first, otherwise insert. | JSON map `{ "imgUrl", "threshold" }` | `HttpResult<Human>` |
| POST | `/api/v1/reid/associateStore` | Multi-cover workflow: always store the probe and link it to the match. | JSON map `{ "imgUrl", "threshold" }` | `HttpResult<Human>` |
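The `searchOrStore` decision flow can be sketched with an in-memory stand-in for the Lucene gallery: return the best match at or above the threshold, otherwise register the probe as a new identity. Class and method names are hypothetical, and the real service returns a `Human` DTO rather than a bare ID:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// In-memory sketch of the single-cover searchOrStore workflow.
public class SearchOrStoreSketch {
    private final Map<String, float[]> gallery = new HashMap<>();

    public String searchOrStore(float[] probe, double threshold) {
        String bestId = null;
        double bestScore = -1;
        for (Map.Entry<String, float[]> e : gallery.entrySet()) {
            double s = cosine(probe, e.getValue());
            if (s > bestScore) { bestScore = s; bestId = e.getKey(); }
        }
        if (bestId != null && bestScore >= threshold) {
            return bestId;                       // existing human matched
        }
        String id = UUID.randomUUID().toString(); // no match: insert as new human
        gallery.put(id, probe);
        return id;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

`associateStore` differs in that it always persists the probe and only links it to the matched identity when one is found.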
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/v1/tbir/saveImg` | Ingest an image: detect, augment, vectorize, and index. | `SaveImageRequest` (`imgUrl`, `imgId?`, `cameraId?`, `groupId?`, `meta?`, `threshold?`, `types?`) | `HttpResult<ImageSaveResult>` |
| POST | `/api/v1/tbir/deleteImg` | Remove an image and its variants from the index. | `DeleteImageRequest` (`imgId`) | `HttpResult<Void>` |
| POST | `/api/v1/tbir/searchImg` | Retrieve metadata by stored image ID. | `SearchImageRequest` (`imgId`) | `HttpResult<SearchResult>` |
| POST | `/api/v1/tbir/searchImgI` | Render bounding boxes for the search results of an image ID. | `SearchImageRequest` | `image/jpeg` bytes |
| POST | `/api/v1/tbir/search` | Text-to-image retrieval. | `SearchRequest` (`query`, `cameraId?`, `groupId?`, `topN?`) | `HttpResult<SearchResult>` |
| POST | `/api/v1/tbir/searchI` | Text-to-image retrieval with visualization. | `SearchRequest` | `image/jpeg` bytes |
| POST | `/api/v1/tbir/imgSearch` | Image-to-image search via multipart upload. | `multipart/form-data` (`image`, `topN`) | `HttpResult<SearchResult>` |
DTO quick reference:

- `SaveImageRequest` extends `DetectionRequestWithArea`, adding optional `imgId`, `cameraId`, `groupId`, and an arbitrary metadata map.
- `SearchResult` wraps a list of `HitImage` entries (image URL, boxes, score, metadata).
- `HitImage` retains matched sub-boxes for the visualization endpoints.
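Putting the DTO notes together, a hypothetical `SaveImageRequest` body could look like the following (all values invented for illustration):

```json
{
  "imgUrl": "https://example.com/frame-001.jpg",
  "imgId": "frame-001",
  "cameraId": "cam-07",
  "groupId": "lobby",
  "meta": { "recordedAt": "2024-05-01T08:00:00Z" },
  "threshold": 0.4
}
```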
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | `/api/translate` | Prompt the configured LLM to translate Chinese text to English. | `Message` (`message`, optional `img`) | Plain text |
| POST | `/api/chat` | Free-form chat completion. | `Message` (`message`) | Plain text |
| POST | `/api/chatWithImg` | Multimodal chat using an image URL/base64 plus a prompt. | `Message` (`message`, `img`) | Plain text |
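A hypothetical `Message` payload for `/api/chatWithImg` (field names from the table above; whether `img` carries a URL or a base64 string depends on your deployment):

```json
{
  "message": "Describe the scene in this image.",
  "img": "https://example.com/scene.jpg"
}
```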
- `JavaVisionMind.postman_collection.json` (repository root) provides ready-to-run Postman/Apifox requests for every endpoint.
- Model configuration lives under each module's `src/main/resources/application*.properties` for per-service tuning.
- LLaMA deployment support with streaming responses.
- Alternative in-memory vector backends alongside Lucene.
- Revival of the YOLO video-stream processing pipeline in `vision-mind-yolo-core`.
- Controller validates imgUrl and logs before delegating (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:45).
- ImgAnalysisService.detectArea downloads the image into an OpenCV Mat (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:70).
- analysis runs YOLOv11 inference, maps raw outputs to Box objects, and filters by requested class IDs (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:121).
- Detections must overlap include polygons and avoid block polygons according to the configured ratios before they are returned (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:82).
- Remaining boxes are wrapped in HttpResult and returned (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:60).
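The include/block filtering step above can be approximated with standard `java.awt` geometry. This is a simplified sketch: the real service applies its own configured overlap ratios, and the intersection area here is approximated by its bounding box:

```java
import java.awt.Polygon;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Sketch: a detection box is kept when its overlap with an include polygon
// reaches the ratio and its overlap with every block polygon stays below it.
public class AreaFilterSketch {
    static double overlapRatio(Rectangle2D box, Polygon poly) {
        Area inter = new Area(box);
        inter.intersect(new Area(poly));
        if (inter.isEmpty()) return 0;
        Rectangle2D r = inter.getBounds2D();
        // Bounding-box approximation of the intersection area.
        return (r.getWidth() * r.getHeight()) / (box.getWidth() * box.getHeight());
    }

    public static boolean keep(Rectangle2D box, Polygon include, Polygon block, double ratio) {
        boolean inInclude = include == null || overlapRatio(box, include) >= ratio;
        boolean inBlock = block != null && overlapRatio(box, block) >= ratio;
        return inInclude && !inBlock;
    }
}
```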
- Controller repeats validation and timing (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:70).
- detectAreaI renders the image as BufferedImage and reuses detectArea (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:110).
- Include/block frames and boxes are drawn over the image before the controller streams JPEG bytes (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:80).
- Controller checks the payload (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:99).
- ImgAnalysisService.detectFace runs the face-trained YOLO model (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:213).
- Polygon filtering is applied identically to generic detections (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:220).
- Boxes are returned to the controller for response wrapping (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:112).
- Validation mirrors the JSON endpoint (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:118).
- detectFaceI draws bounding boxes plus include/exclude frames and returns the annotated image (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:253).
- Controller streams the JPEG bytes (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:128).
- Controller validates payload and logs (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:147).
- poseArea invokes the YOLOv11 pose model and filters polygons (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:148).
- Filtered BoxWithKeypoints are returned (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:160).
- Controller handles validation (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:173).
- poseAreaI reuses poseArea, draws skeleton overlays, and returns a BufferedImage (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:187).
- Controller streams JPEG (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:183).
- Controller validates and passes through (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:197).
- sam executes FastSAM segmentation and returns boxes (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:279).
- Controller validates (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:216).
- samI draws FastSAM boxes onto the image and returns annotated bytes (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:284).
- Controller checks payload and delegates (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:260).
- segArea runs segmentation and returns per-class polygons (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:294).
- Controller forwards to the service (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:238).
- segAreaI draws segmentation polygons on the original image and returns them (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:299).
- Controller validates input, logs timing, and delegates to the service (vision-mind-ocr-app/src/main/java/com/yuqiangdede/ocr/controller/OcrController.java:30).
- `OcrService.detect` routes the request into the shared inference pipeline (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:93).
- `runInference` downloads the image, selects the light/heavy engine, executes PaddleOCR, and applies include/exclude polygons (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:115).
- Area-filtered detections are returned to the controller for wrapping (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:146).
- Controller invokes the overlay variant and prepares HTTP headers (vision-mind-ocr-app/src/main/java/com/yuqiangdede/ocr/controller/OcrController.java:47).
- `detectWithOverlayBytes` reuses `detectWithOverlay` and encodes the annotated image as JPEG (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:107).
- `detectWithOverlay` draws OCR polygons plus include/exclude frames before returning (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:98).
- Controller validates imgUrl and logs (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:60).
- FaceService.computeFaceVector extracts faces and embeddings (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:142).
- getFaceInfos strips base64 payloads before returning (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:154).
- Controller demands vector info (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:78).
- saveFaceVector persists embeddings with FfeVectorStoreUtil.add (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:95).
- Controller validates payload (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:96).
- computeAndSaveFaceVector filters faces by the requested threshold, stores qualifying embeddings, and returns the trimmed list (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:77).
- Controller checks document ID (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:118).
- delete removes the Lucene record (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:105).
- Controller validates thresholds (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:135).
- findMostSimilarFace runs extraction, filters by quality, and executes a Lucene top-1 search (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:116).
- Controller repeats validation (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:153).
- The controller streams the top match image returned by the service (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:163).
- Controller ensures two URLs (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:186).
- calculateSimilarity extracts both embeddings, normalizes them, and computes cosine similarity (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:177).
- Controller validates the request (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:212).
- findSave searches for each quality face, inserting any misses and returning both found and added items (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:197).
- Controller validates request (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:23).
- featureSingle embeds the probe and tags it with a UUID (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:75).
- Controller checks both URLs (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:39).
- calculateSimilarity embeds both probes and computes cosine similarity (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:82).
- Controller validates payload (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:56).
- featureMulti runs YOLO detection via ImgAnalysisService.detectArea, crops each person, embeds them, and returns the list (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:89).
- Controller enforces required IDs (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:72).
- storeSingle embeds the probe, assigns a UUID, and stores using ReidVectorStoreUtil.add (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:109).
- Controller validates imgUrl, topN, and threshold (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:106).
- search embeds the probe and queries Lucene for matching humans with optional camera scoping (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:117).
- Controller validates payload (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:125).
- searchOrStore returns the best match or persists a new feature when none is found (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:123).
- Controller validates request (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:142).
- associateStore searches for an existing match and always persists the new embedding, linking it to the matched human if available (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:138).
- Controller validates payload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:46).
- saveImg generates or reuses imgId, optionally collects YOLO/FastSAM detections, crops and augments regions, embeds both main and sub-images with CLIP, and persists embeddings with metadata (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:61).
- Controller checks imgId (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:66).
- deleteImg validates the identifier, invokes the vector store deletion, and records execution time (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:167).
- Controller validates (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:82).
- searchImg collects Lucene hits by stored ID and merges them into HitImage DTOs (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:321).
- Controller validates payload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:98).
- searchImgI reuses searchImg, downloads matched images, draws boxes, and returns buffered previews (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:331).
- Controller validates query text (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:124).
- searchByText expands prompts via LLM, embeds each with CLIP, queries Lucene, merges hits through getFinalList, and returns ranked HitImage results (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:182).
- Controller validates and delegates (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:143).
- searchByTextI draws matched boxes on each result image for preview streaming (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:285).
- Controller accepts multipart upload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:170).
- imgSearch embeds the probe image, queries Lucene, and returns ranked matches (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:302).
- Controller applies a translation prompt wrapper and delegates (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:23).
- LLMService.chat validates input and routes to OpenAI or Ollama, throwing if neither is configured (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:22).
- Controller forwards the free-form prompt (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:39).
- LLMService.chat handles provider selection as above (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:22).
- Controller validates text and optional image (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:50).
- chatWithImg enforces payload completeness, injects a default system prompt if needed, and calls the configured OpenAI vision endpoint (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:49).
Contributions and issue reports are welcome.
The following directions can extend the current toolkit and may serve as inspiration for upcoming releases:
- Multi-object tracking (MOT): integrate trackers such as DeepSORT or ByteTrack within `vision-mind-yolo-core` and pair them with detection outputs to provide cross-frame trajectories for security patrols or pedestrian-path analytics.
- Fine-grained attribute recognition: add attribute classifiers for pedestrians, faces, or vehicles (e.g., gender, clothing color, license-plate region) so that vector indexes can support richer filtering.
- Video structuring pipeline: Build a batch video ingestion service that runs detection, segmentation, and re-identification on key frames, then archives the structured results for large-scale video libraries or case investigations.
- Cross-camera association: Combine the existing re-identification stack with spatiotemporal constraints to correlate identities across camera feeds and trigger rule-based alerts.
- Richer multimodal interactions: extend `vision-mind-llm-core` with image captioning, visual question answering (VQA), or prompt-template management to improve multimodal Q&A use cases.
- Model management & observability: provide unified model versioning, hot swapping, and inference performance dashboards to streamline operating multiple models in production.