🐨 Koala

A web search engine built with FastAPI. Koala allows you to crawl websites, index their content, and perform semantic search with relevance scoring.

Features

  • Semantic Search: Embedding-based search using sentence transformers, ranking results by semantic relevance rather than keyword overlap
  • Web Crawling: Automated website crawling with configurable depth and page limits
  • Real-time Analytics: Search statistics and popular query tracking
  • RESTful API: Full-featured API for integration with other applications
  • Background Processing: Non-blocking website crawling with job status tracking

Quick Start

Prerequisites

  • Python 3.10 or higher
  • uv (recommended)

Installation

  1. Clone the repository:

    git clone https://github.com/Abhinavexists/Koala.git
    cd Koala
  2. Install backend dependencies:

    cd backend
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
    

    Recommended: Install with uv (faster, more reliable):

    cd backend
    uv venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
    
    
  3. Start the static server:

    python static_server.py
  4. Start the search API:

    python search_api.py
  5. Open in a browser: navigate to http://localhost:8000
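
To confirm the setup, a quick smoke test from Python (a minimal sketch assuming the default port from step 5 and the third-party requests package):

import requests

# The static server from step 3 should answer on the port from step 5.
resp = requests.get("http://localhost:8000")
print(resp.status_code)  # expect 200 once the server is up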

Usage Guide

Adding Websites

  1. Go to the Websites tab
  2. Fill in the website details:
    • URL: The website to crawl (e.g., https://example.com)
    • Name: Display name for the website
    • Description: Optional description
    • Max Pages: Maximum number of pages to crawl (default: 50)
    • Max Depth: How deep to crawl from the starting URL (default: 2)
  3. Click Add Website
  4. The system will start crawling in the background

Searching

  1. Go to the Search tab
  2. Enter your search query in the search box
  3. Press Enter or click the Search button
  4. View results with relevance scores and snippets
  5. Use pagination to browse through results

Analytics

  1. Go to the Analytics tab to view:
    • Total number of searches performed
    • Number of websites indexed
    • Active crawling jobs
    • Popular search queries

Key Endpoints

  • GET /api/search - Perform search queries
  • GET /api/websites - List all websites
  • POST /api/websites - Add new website to crawl
  • DELETE /api/websites/{id} - Remove website
  • POST /api/websites/{id}/recrawl - Recrawl website
  • GET /api/popular - Get popular search queries
  • GET /api/stats - Get system statistics

Example API Usage

# Search for content
curl "http://localhost:8080/api/search?q=python&page=1&per_page=10"

# Add a website
curl -X POST "http://localhost:8080/api/websites" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "name": "Example Site",
    "max_pages": 100,
    "max_depth": 3
  }'
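
The same calls from Python, using the third-party requests package (endpoints and parameters as documented under Key Endpoints; the response schemas are not specified here, so the snippet simply decodes the JSON):

import requests

BASE = "http://localhost:8080"

# Search for content (GET /api/search)
results = requests.get(
    f"{BASE}/api/search",
    params={"q": "python", "page": 1, "per_page": 10},
).json()

# Add a website to crawl (POST /api/websites)
created = requests.post(
    f"{BASE}/api/websites",
    json={"url": "https://example.com", "name": "Example Site",
          "max_pages": 100, "max_depth": 3},
).json()

# System statistics (GET /api/stats)
stats = requests.get(f"{BASE}/api/stats").json()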

Architecture

Backend (Python/FastAPI)

  • search_api.py: Main API endpoints and business logic
  • search_engine.py: Semantic search implementation using sentence transformers
  • crawler.py: Web crawling functionality with BeautifulSoup
  • static_server.py: Combined static file server and API gateway
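
For a rough feel of how the API layer delegates to the search layer, a minimal sketch in the spirit of search_api.py (the semantic_search placeholder and the response shape are illustrative assumptions, not the project's actual code):

from fastapi import FastAPI, Query

app = FastAPI()

# Placeholder for the semantic index; in Koala this logic lives in
# search_engine.py (sentence transformers + FAISS).
def semantic_search(query: str) -> list[dict]:
    return []

@app.get("/api/search")
def search(q: str = Query(..., min_length=1), page: int = 1, per_page: int = 10):
    hits = semantic_search(q)
    start = (page - 1) * per_page
    return {"query": q, "page": page, "results": hits[start:start + per_page]}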

Data Storage

  • prepared_data.json: Indexed website content and metadata
  • websites.json: Website configuration and crawl status
  • Vector Index: In-memory semantic search index using FAISS

Configuration

Environment Variables

  • HOST: Server host (default: 0.0.0.0)
  • PORT: Server port (default: 8080)
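
A sketch of how these variables are typically consumed (assuming the servers read them with os.getenv; the actual startup code may differ):

import os

# Fall back to the documented defaults when the variables are unset.
HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8080"))

# e.g. uvicorn.run(app, host=HOST, port=PORT)

With that in place, HOST=127.0.0.1 PORT=9000 python search_api.py overrides the defaults for a single run.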

Search Configuration

  • Model: The all-MiniLM-L6-v2 sentence transformer
  • Device: Automatically detects CUDA/CPU
  • Index Type: FAISS for efficient similarity search
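
A condensed sketch of that pipeline (model name, device detection, and FAISS as documented above; the sample documents and the choice of an inner-product index over normalized embeddings are illustrative):

import faiss
import torch
from sentence_transformers import SentenceTransformer

# Pick CUDA automatically when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

docs = ["FastAPI is a Python web framework.", "Koalas are marsupials."]

# Normalized embeddings make inner product equivalent to cosine similarity.
embeddings = model.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = model.encode(["python web framework"], normalize_embeddings=True)
scores, ids = index.search(query, 2)
print([(docs[i], float(s)) for i, s in zip(ids[0], scores[0])])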

Project Structure

koala/
├── frontend/           # Frontend assets
│   ├── css/           # Stylesheets
│   ├── js/            # JavaScript modules
│   └── index.html     # Main HTML file
├── search_api.py      # API endpoints
├── search_engine.py   # Search implementation
├── crawler.py         # Web crawler
├── static_server.py   # Server entry point
├── pyproject.toml     # Project metadata
└── requirements.txt   # Python dependencies

Running in Development Mode

To run the backend API by itself (on port 8000):

    python search_api.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

(Just a fun practice project to gain an understanding of Redis, crawlers, and semantic search)
