Univox is an AI-powered study assistant. It leverages natural language processing, computer vision, and graph-based reasoning to help students learn more effectively through document analysis, content extraction, and personalized support.
The system is built on top of LangGraph, enabling a modular, agent-driven architecture where different AI components collaborate seamlessly. With Univox, students can interact naturally — through text or voice — and access a wide range of academic resources in an intuitive way.
A core feature of Univox is its RAG pipeline, which enhances responses with relevant documents retrieved from the student’s course materials:
- Document retrieval: Every query is matched against a FAISS-based vectorstore built from parsed syllabi, books, slides, notes, and multimedia.
- Context enrichment: Retrieved documents are injected into the LLM prompt, ensuring answers are grounded in the actual study material.
- Downloadable resources: In addition to enriched answers, students are given direct links to download useful files (e.g., lecture slides, exam papers, reference books).
This means Univox is not just a conversational agent — it acts as a personal knowledge navigator, combining semantic search with LLM reasoning to maximize learning.
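The retrieve-then-enrich flow above can be sketched in a few lines. This is a dependency-free illustration, not Univox's actual pipeline: the hash-based `embed` is a toy stand-in for the BAAI/bge-m3 embedding model, and the plain dot product plays the role of a FAISS index lookup.

```python
import math

def embed(text, dim=64):
    # Toy bag-of-words embedding: hash each token into a fixed-size vector.
    # A stand-in for a real embedding model such as BAAI/bge-m3.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token.strip(".,!?")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    # On unit-normalized vectors, cosine similarity reduces to a dot product,
    # the same operation a FAISS inner-product index performs at scale.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, docs, k=2):
    # Document retrieval: rank stored passages by similarity to the query.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # Context enrichment: inject the retrieved passages into the LLM prompt
    # so the answer stays grounded in the actual study material.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The midterm exam is scheduled for May 12 in room B2.",
    "Office hours: Tuesdays 14:00-16:00.",
    "Cosine similarity measures the angle between two vectors.",
]
print(build_prompt("When is the midterm exam?", docs))
```

In the real system the ranked passages also carry file metadata, which is how Univox can attach download links and citations to each answer.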
study-buddy/
│
├── data/                     # Raw data (kept locally only), metadata, and processed datasets
│ └── README.md
│
├── faiss_index/ # Vectorstore index
│
├── images/
│
├── study_buddy/ # Core source code
│ └── README.md
│
├── tests/
│ ├── performance/
│ │ └── README.md
│ └── tool_tests/ # Unit and load tests
│
├── streamlit_frontend.py # Streamlit web app entrypoint
├── config.yaml # Configurable models, embeddings, vectorstore
├── langgraph.json
├── pyproject.toml # Project metadata and dependencies
├── requirements.txt
├── setup.cfg
├── .gitignore
├── LICENSE
└── README.md
Univox acts as your personal academic companion, combining course-specific knowledge with powerful AI tools to assist you throughout your learning journey.
Need quick access to important information? Univox can:
- Provide contact details for professors and teaching assistants
- Show exam dates, office hours, and course schedules
- Extract information directly from syllabi, announcements, and handouts
For example:
“When are the midterm exams for this course?”
“How can I contact Professor Lops?”
Univox helps you study smarter, not harder:
- Retrieve past exam questions and exercises from your uploaded materials
- Generate custom practice problems tailored to your curriculum
- Suggest relevant topics and resources for your upcoming tests
For example:
“Give me practice problems for cosine similarity”
“Show me past exams from the MRI course”
Stuck on a concept? Univox explains it clearly and links you back to the exact source:
- Summarizes difficult topics from lecture notes
- Highlights key formulas and definitions
- Points you to the relevant pages and documents
For example:
“Explain mean reciprocal rank in simple terms”
“What’s the difference between item-based and user-based recommender systems?”
Univox isn’t just a Q&A bot — it’s also a research companion:
Document Processing
- Extracts text from scanned PDFs and images using Tesseract OCR
- Summarizes long research papers and textbooks
- Handles multiple document formats effortlessly
Data Analysis & Visualization
- Analyzes datasets and generates insights
- Creates visualizations from CSV files using natural language
- Runs custom Python code for statistical tasks
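As an illustration of the data-analysis tooling, here is a minimal stdlib-only sketch of the kind of task Univox's Python execution tool might run against an uploaded CSV; the column names and sample data are hypothetical.

```python
import csv
import io
import statistics

def summarize_scores(csv_text, column):
    # Parse the CSV and compute summary statistics for one numeric column,
    # the sort of request a student might phrase in natural language.
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(r[column]) for r in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

sample = "student,score\nada,90\ngrace,80\nalan,70\n"
print(summarize_scores(sample, "score"))
# -> {'count': 3, 'mean': 80.0, 'stdev': 8.164965809277263}
```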
Research Enhancement
- Searches academic sources like ArXiv, Google Scholar, and Wikidata
- Finds books, papers, and supplementary materials
- Accesses up-to-date information via web search
Univox also offers hands-free interaction:
- Ask questions using voice commands
- Receive audio-based responses
- Transcribe recorded lectures into searchable text (supports Italian)
- Convert summaries and explanations into natural-sounding audio
This makes Univox highly useful for students with visual impairments, reading difficulties, or those who prefer multimodal learning.
- Ask natural language or voice questions about your uploaded materials
- Get direct citations with every answer
- Manage and browse your content via an intuitive Streamlit-based web app
- Integrate multimedia: process text, images, audio, and datasets seamlessly
Univox is designed to be trustworthy:
- Every answer includes clear references to the source documents
- Displays page numbers and relevant sections where possible
- Ensures traceability so you can always verify information
- Python 3.12
- LangGraph for orchestrating multi-agent workflows
- Tesseract OCR for document text extraction
- Configurable Embedding Models — default: BAAI/bge-m3
- Together.ai for LLM integration
- FAISS for fast vector indexing
- Transformers for model inference
- TensorFlow / PyTorch with CUDA acceleration
- Streamlit for the interactive web interface
- Windows 11 64-bit (tested environment)
- Python 3.12
- NVIDIA GPU with CUDA support (RTX 5080 recommended)
- 8GB+ RAM
- ~5GB disk space for models and dependencies
- A Together.ai API key
- Download the installer: Tesseract OCR – UB Mannheim
- Install it in the default directory.
- Add these paths to your System Environment Variables:
  C:\Program Files\Tesseract-OCR
  C:\Program Files\Tesseract-OCR\tesseract.exe
- Verify the installation:
  tesseract --version
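If you prefer a programmatic check, a small Python helper (assuming the install directory was added to PATH as above) can confirm the binary is reachable before any OCR code runs:

```python
import shutil

def tesseract_available():
    # shutil.which searches PATH for the executable, the same lookup
    # OCR wrappers rely on to find the tesseract binary.
    return shutil.which("tesseract") is not None

print(tesseract_available())
```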
git clone https://github.com/npinto97/univox.git
cd univox
python -m venv venv
venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
Check CUDA availability:
nvidia-smi
Then install the correct CUDA and cuDNN versions for your GPU. For RTX 5080, CUDA 12.x is recommended. See NVIDIA CUDA downloads for more details.
- Parse your course metadata:
  python parse_course_metadata.py
- Build the FAISS index:
  python update_faiss_index.py
The system uses config.yaml to customize the LLM, embeddings, and vector store:
llm:
model: "meta-llama/Llama-3.3-70B-Instruct-Turbo"
embeddings:
model: "BAAI/bge-m3"
vector_store:
type: "faiss"
You can replace the model names with any alternative available on Together.ai or Hugging Face.
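As a sketch, the config can be read from Python with PyYAML; the inline string below mirrors config.yaml, and exactly which keys Univox consumes internally is an assumption here.

```python
import yaml  # PyYAML

CONFIG = """
llm:
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo"
embeddings:
  model: "BAAI/bge-m3"
vector_store:
  type: "faiss"
"""

def load_config(text):
    # safe_load parses plain YAML into nested dicts without
    # executing arbitrary tags, unlike yaml.load.
    return yaml.safe_load(text)

cfg = load_config(CONFIG)
print(cfg["llm"]["model"])
# -> meta-llama/Llama-3.3-70B-Instruct-Turbo
```

Swapping models then amounts to editing one string in the file and rebuilding the FAISS index if the embedding model changed.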
Start the Streamlit app:
streamlit run streamlit_frontend.py
By default, it launches at: http://localhost:8501
Workflow:
- Upload your study materials
- Ask questions via chat or voice
- View responses with source citations
- More lightweight embedding models for CPU-only setups
- Enhanced semantic search and document summarization
- Richer multimodal learning experience