Description
Currently, the KnowledgeSpace AI agent primarily relies on text-based queries. However, much valuable neuroscience metadata is "trapped" within non-editable formats like research paper figures, tables, and presentation screenshots. I propose a multimodal feature that allows users to upload images and automatically extract refined search queries to discover datasets in the KnowledgeSpace ecosystem.
1. Proposed Solution
I have developed a prototype for an end-to-end pipeline that includes:
- Frontend: React-based upload interface with a dynamic, auto-expanding search area to handle long scientific queries.
- Backend: FastAPI endpoint that utilizes Pytesseract for raw text extraction.
- Intelligence Layer: A refinement step using Gemini 2.0 Flash-Lite to perform zero-shot Named Entity Recognition (NER).
- Refinement Logic: The prompt is optimized to move away from conversational explanations (e.g., “Here are your options…”) and instead produce a clean, comma-separated list of extracted entities.
- Impact:
  - Prevents query pollution by ensuring the backend search engine receives only high-signal scientific terms.
  - Significantly improves the relevance and precision of discovered datasets.
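A minimal sketch of how the OCR and refinement steps above could fit together. This is not the prototype's actual code: the prompt wording, the function names (`parse_entities`, `image_to_entities`), and the `max_words` heuristic are all illustrative assumptions. The OCR and model calls are passed in as plain callables so the sketch runs without Pytesseract or the GenAI SDK installed.

```python
import re
from collections.abc import Callable

# Hypothetical prompt: the prototype's real wording is not shown in this issue.
NER_PROMPT = (
    "Extract the neuroscience entities (species, strains, brain regions, "
    "techniques, cell types) from the OCR text below. "
    "Reply with a single comma-separated list and no explanation.\n\n"
    "OCR text:\n{ocr_text}"
)


def parse_entities(model_reply: str, max_words: int = 5) -> list[str]:
    """Turn a model reply into a de-duplicated list of entity strings.

    Heuristic (an assumption): fragments that end with a colon or have more
    than `max_words` words are treated as conversational filler and dropped.
    """
    seen: set[str] = set()
    entities: list[str] = []
    # Accept commas, semicolons, or newlines as separators.
    for raw in re.split(r"[,\n;]+", model_reply):
        # Trim whitespace, bullet markers, and quotes around the term.
        term = raw.strip().strip("-*• \t\"'")
        if not term or term.endswith(":") or len(term.split()) > max_words:
            continue
        key = term.lower()  # case-insensitive de-duplication
        if key not in seen:
            seen.add(key)
            entities.append(term)
    return entities


def image_to_entities(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    llm: Callable[[str], str],
) -> list[str]:
    """Run OCR, ask the model to refine the text, and parse its reply.

    `ocr` would typically wrap Pytesseract and `llm` a Gemini call; both are
    injected here so the flow can be exercised with stand-ins.
    """
    ocr_text = ocr(image_bytes)
    if not ocr_text.strip():
        return []  # nothing legible in the image, so skip the model call
    reply = llm(NER_PROMPT.format(ocr_text=ocr_text))
    return parse_entities(reply)
```

Parsing the reply defensively means a stray preface line from the model is dropped rather than passed to the search backend, which matches the "query pollution" concern in the list above.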
2. Technical Stack
- Python 3.12 & FastAPI
- React (TypeScript)
- Pytesseract & Pillow
- Google GenAI SDK (Gemini 2.0 Flash-Lite)
3. Example Use Case
Example 1
- Input Image (from a research paper table):
- Refined OCR Output:
  - Rattus norvegicus
  - Striatum
  - Two-photon
  - Sprague-Dawley
  - Motor Cortex
  - Patch-clamp
- Assistant Discovery Results:
  - IonChannelGenealogy: Kir2 channel model (Rattus norvegicus, Striatum)
  - EBRAINS: MiniVess 3D vasculature (Rattus norvegicus, Two-photon)
  - NeuroMorpho.Org: Neuron morphology (Sprague-Dawley, frontal neocortex)
Example 2
- Input Image (from an actual research paper):
- Refined OCR Output:
  - Sst-Cre
  - Ai32
  - Vip-Cre
  - CA1
- Assistant Discovery Results:
  - IonChannelGenealogy: Relevant ion channel and electrophysiology datasets (Sst-Cre, CA1)
  - EBRAINS: Modeling and experimental datasets (Vip-Cre, Ai32)
The refined entities are injected directly into the existing keyword + vector retrieval pipeline.
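As a rough illustration of that hand-off, the sketch below turns a list of refined entities into one string for the keyword side and one for the vector side. The `RetrievalQuery` type, the `build_retrieval_query` helper, and the `OR`/phrase-quoting syntax are assumptions made for the example. The actual query format expected by the KnowledgeSpace retrieval pipeline is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass
class RetrievalQuery:
    keyword_query: str  # string for the keyword search side (syntax assumed)
    vector_text: str    # text that would be embedded for the vector side


def build_retrieval_query(entities: list[str]) -> RetrievalQuery:
    """Build keyword and vector inputs from refined entities (hypothetical helper)."""
    # Drop empty or whitespace-only entries.
    cleaned = [e.strip() for e in entities if e.strip()]
    # Quote multi-word terms so they can be matched as phrases.
    keyword_terms = [f'"{e}"' if " " in e else e for e in cleaned]
    return RetrievalQuery(
        keyword_query=" OR ".join(keyword_terms),
        vector_text=", ".join(cleaned),
    )


query = build_retrieval_query(["Rattus norvegicus", "Striatum", "Two-photon"])
print(query.keyword_query)
print(query.vector_text)
```

Keeping the two representations separate lets each retrieval path receive the form it handles best without re-parsing the entity list.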
I am planning to apply for GSoC 2026 with INCF and would like to lead the implementation of this feature. I have already set up a local development environment and verified the core logic. I would appreciate any feedback from the mentors and am ready to submit a PR!