sema-py

A semantic search engine for local files and images with GPU acceleration and multi-modal retrieval capabilities.

Demo

Web Interface

Terminal Interface

Features

Search text files and images using natural language queries
GPU-accelerated embedding generation (Apple Silicon and NVIDIA CUDA)
Incremental indexing with change detection
Multi-threaded file processing
Directory-scoped search with global caching
CLI and web interfaces
Persistent vector database storage
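Multi-threaded processing can be sketched with a thread pool over file reads. This is a minimal sketch, not sema-py's code: `scan_files`, `read_text` and `process_directory` are hypothetical names, and PDF parsing is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scan_files(root, extensions=(".txt", ".md", ".pdf")):
    """Recursively collect files under root whose suffix is in extensions."""
    exts = {e.lower() for e in extensions}
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in exts
    )


def read_text(path):
    # Hypothetical per-file work: read plain text only (a real indexer
    # would also extract text from PDFs here).
    return path, path.read_text(encoding="utf-8", errors="ignore")


def process_directory(root, extensions=(".txt", ".md"), max_workers=None):
    files = scan_files(root, extensions)
    # File I/O releases the GIL, so threads can overlap reads of many files.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(read_text, files))
```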

Installation

Requires Python 3.12+

git clone https://github.com/akshitsinha/sema-py
cd sema-py
uv sync

Usage

CLI REPL Interface

uv run main.py <directory> [options]

Options:

--extensions: Comma-separated file extensions to index (default: .txt, .md, .pdf)
--chunk-size: Characters per chunk (default: 800)
--chunk-overlap: Characters shared by consecutive chunks (default: 100)
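To illustrate what --chunk-size and --chunk-overlap control, here is a minimal sketch of fixed-size character chunking with overlap. `chunk_text` is a hypothetical helper; sema-py's own splitter may also align chunks to line or word boundaries (its search output reports line ranges).

```python
def chunk_text(text, chunk_size=800, chunk_overlap=100):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, each chunk starts 700 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.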

Example:

uv run main.py ./documents --extensions .md,.txt,.pdf --chunk-size 500
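main.py's argument handling is not shown in this README. The following hypothetical argparse sketch only mirrors the documented options and defaults, including turning the comma-separated --extensions value into a list.

```python
import argparse


def comma_list(value):
    # Hypothetical converter: ".md,.txt,.pdf" -> [".md", ".txt", ".pdf"]
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Semantic search (illustrative sketch)")
    parser.add_argument("directory", help="Directory to index and search")
    parser.add_argument("--extensions", type=comma_list,
                        default=[".txt", ".md", ".pdf"],
                        help="Comma-separated file extensions to index")
    parser.add_argument("--chunk-size", type=int, default=800,
                        help="Characters per chunk")
    parser.add_argument("--chunk-overlap", type=int, default=100,
                        help="Characters shared by consecutive chunks")
    return parser.parse_args(argv)
```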

Web GUI Interface

uv run gui.py

Opens a Gradio web interface on http://localhost:7860 with:

Directory browser and loader
Text search with context display
Image search (text-to-image and image-to-image)
Indexing management and statistics

REPL Commands

Text Search

Command	Description
`<query>`	Search for semantic matches in text files
`/index`	Index/reindex text files (skip unchanged)
`/reindex`	Force complete reindex of text files
`/status`	Show database statistics
`/files`	List indexed files with chunk counts
`/clear`	Clear text database

Image Search

Command	Description
`/isearch <query>`	Search images by text description
`/imsearch <path>`	Find similar images by reference
`/iindex`	Index/reindex images
`/ireindex`	Force complete reindex of images
`/istatus`	Show image database statistics
`/ifiles`	List indexed images
`/iclear`	Clear image database

General

Command	Description
`/help`	Show available commands
`/config`	Show current configuration
`/exit`	Exit the program
`Ctrl+C`	Exit the program

How It Works

Indexing: Files are scanned recursively, and text files are split into overlapping chunks. Each text chunk is converted to an embedding vector using EmbeddingGemma or all-MiniLM-L6-v2, and each image using CLIP; the vectors are then stored in ChromaDB. A file is reindexed only when its content changes, which is detected by comparing content hashes.
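Hash-based change detection can be sketched as follows. This assumes SHA-256 (the README does not say which hash is used) and a plain dict of digests recorded at the previous run; `file_hash` and `changed_files` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_hash(path, block_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(paths, known_hashes):
    """Map each new or modified file (as str) to its current digest.

    known_hashes: str(path) -> digest from the last indexing run.
    It is not modified here; the caller records the new digests
    only after the changed files have been re-embedded.
    """
    current = {str(p): file_hash(p) for p in paths}
    return {p: d for p, d in current.items() if known_hashes.get(p) != d}
```

Returning the new digests instead of writing them back immediately means a failed re-embedding does not mark a file as up to date.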

Search: Your query is converted to an embedding vector and compared against the stored vectors using cosine similarity. The most similar chunks are retrieved and merged to provide surrounding context. Results can be restricted to a single directory, while all indexed files remain in one global cache.
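A brute-force sketch of the scoring step, in plain Python for clarity: cosine similarity between the query vector and each stored vector, with an optional directory filter applied before ranking. In sema-py the nearest-neighbour lookup is done by ChromaDB; `search` and its `(path, text, vector)` entry format are hypothetical.

```python
import math
from pathlib import PurePath


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, entries, top_k=3, directory=None):
    """Rank (path, text, vector) entries from the global index by similarity.

    If directory is given, only files under it are considered.
    """
    scored = []
    for path, text, vec in entries:
        if directory is not None and not PurePath(path).is_relative_to(directory):
            continue  # outside the requested directory scope
        scored.append((cosine_similarity(query_vec, vec), path, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

PurePath.is_relative_to compares path components rather than string prefixes, so a filter on /docs does not also match /docs-archive.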

GPU Support: The system automatically detects and uses an available GPU (Apple Silicon or NVIDIA CUDA) for faster embedding generation, and falls back to the CPU if no GPU is available.
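Device selection of this kind can be sketched with PyTorch's availability checks. This assumes the embedding models run on PyTorch and that CUDA is preferred over MPS when both are present; `pick_device` is a hypothetical name.

```python
def pick_device():
    """Return "cuda", "mps" or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: CPU only
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU (assumed to take priority)
    if torch.backends.mps.is_available():
        return "mps"  # Apple Silicon via Metal Performance Shaders
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```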

Example Usage

Text Search

$ uv run main.py ./documents

Using device: mps

3 new · 12 chunks indexed
completed in 2.34s

> machine learning fundamentals

3 results · 0.18s

┌─ ml_guide.pdf ──────────────────────────────── lines 12-15 · score 0.87 ┐
│   12 | Machine learning is a subset of artificial intelligence    │
│   13 | that enables systems to learn and improve from experience  │
│   14 | without being explicitly programmed. The core principle    │
│   15 | involves training models on data to make predictions.      │
└────────────────────────────────────────────────────────────────────────┘

> /status

  files     8
  chunks    142
  size      2.5 mb

> /exit

Image Search

> /isearch sunset over mountains

5 results · 0.22s

┌─ vacation_2024/IMG_5432.jpg ──────────────── score 0.92 ┐
│ /Users/docs/photos/vacation_2024/IMG_5432.jpg           │
│ 1920x1080                                                │
└──────────────────────────────────────────────────────────┘

> /imsearch reference_image.jpg

Finding similar images...

4 results · 0.19s

License

MIT
