Internship Project: built for the Endee.io AI Engineering Internship Evaluation.
Demonstrates a production-grade RAG pipeline architecture using the Endee Vector Database.
A user types a career goal like "I want to become a Machine Learning Engineer" and the system:
- Embeds the query into a 384-dimensional vector using sentence-transformers
- Retrieves semantically relevant documents from 5 Endee collections in parallel
- Augments an LLM prompt with the retrieved context (RAG)
- Generates a structured career intelligence report containing:
  - Step-by-step learning roadmap (0–12 months)
  - Required skills ranked by relevance score
  - Real job roles with salary ranges and demand signals
  - Salary trends and market outlook
  - Recommended portfolio projects
  - Career timeline milestones
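The report described above is returned as a structured object. The following is a hypothetical sketch of its shape, for illustration only; the project's actual `CareerInsight` dataclass may use different field names:

```python
from dataclasses import dataclass, field, asdict

# Illustrative shape of the career report (field names are assumptions,
# not the project's real schema).
@dataclass
class CareerInsight:
    career_goal: str
    roadmap: list[str] = field(default_factory=list)    # 0-12 month steps
    skills: list[dict] = field(default_factory=list)    # name + relevance score
    jobs: list[dict] = field(default_factory=list)      # title, salary, demand
    salary_outlook: str = ""
    projects: list[str] = field(default_factory=list)
    milestones: list[str] = field(default_factory=list)

insight = CareerInsight(
    career_goal="Machine Learning Engineer",
    roadmap=["Month 0-3: Python + math foundations"],
)
# asdict() is what makes the dataclass trivially JSON-serializable.
print(asdict(insight)["career_goal"])  # Machine Learning Engineer
```

A dataclass-to-JSON boundary like this keeps the LLM output validated before it reaches the frontend.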
```
User Input
    │
    ▼
┌──────────────────────────────────────────────────────────────────┐
│                         FastAPI Backend                          │
│                                                                  │
│  POST /api/v1/career/analyze                                     │
│            │                                                     │
│            ▼                                                     │
│  EmbeddingEngine (sentence-transformers/all-MiniLM-L6-v2)        │
│      query → 384-dim float vector                                │
│            │                                                     │
│            ▼                                                     │
│  ┌───────────────────── Endee Vector DB ──────────────────────┐  │
│  │  Parallel cosine-similarity search across 5 collections    │  │
│  │                                                            │  │
│  │  job_roles       → top-5 semantically relevant jobs        │  │
│  │  skill_taxonomy  → top-5 relevant skills                   │  │
│  │  learning_paths  → top-5 resources                         │  │
│  │  salary_insights → top-5 salary records                    │  │
│  │  projects        → top-5 portfolio projects                │  │
│  └────────────────────────────────────────────────────────────┘  │
│      Retrieved SearchResult objects (id, score, metadata)        │
│            │                                                     │
│            ▼                                                     │
│  RAGPipeline.build_prompt(query + retrieved_context)             │
│            │                                                     │
│            ▼                                                     │
│  LLM Client (OpenAI / Groq / Ollama)                             │
│      system: expert career counselor                             │
│      user: career goal + retrieved context                       │
│            │                                                     │
│            ▼                                                     │
│  CareerInsight (structured Python dataclass → JSON)              │
└──────────────────────────────────────────────────────────────────┘
    │
    ▼
HTML/JS Frontend (renders roadmap, skills, jobs, salary, projects)
```
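The prompt-construction step in the middle of the pipeline can be sketched roughly as follows. The function name matches the diagram, but the internals and prompt wording here are illustrative assumptions, not the project's actual code:

```python
# Hypothetical sketch of RAGPipeline.build_prompt (the real implementation
# lives in core/rag_pipeline.py and may differ).
def build_prompt(query: str, retrieved: dict[str, list[dict]]) -> list[dict]:
    # Flatten the per-collection hits into one context string.
    context_lines = []
    for collection, hits in retrieved.items():
        for hit in hits:
            context_lines.append(f"[{collection}] {hit['metadata']['content']}")
    context = "\n".join(context_lines)
    return [
        {"role": "system",
         "content": "You are an expert career counselor. Answer only from "
                    "the provided context."},
        {"role": "user",
         "content": f"Career goal: {query}\n\nContext:\n{context}"},
    ]

messages = build_prompt(
    "I want to become a Machine Learning Engineer",
    {"job_roles": [{"metadata": {"content": "ML Engineer builds ML systems"}}]},
)
print(messages[0]["role"], len(messages))  # system 2
```

Grounding the user message in retrieved context is what makes this retrieval-augmented rather than a plain LLM call.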
```
ai-career-engine/
│
├── backend/
│   ├── main.py                  # FastAPI app, startup, routing
│   ├── core/
│   │   ├── vector_store.py      # Endee SDK integration + InMemoryFallback
│   │   ├── embeddings.py        # sentence-transformers wrapper + LRU cache
│   │   └── rag_pipeline.py      # RAG orchestration + LLM client + prompts
│   └── api/
│       ├── career_routes.py     # /career/analyze, /search/semantic, /stats
│       └── health_routes.py     # /health
│
├── frontend/
│   └── templates/
│       └── index.html           # Single-file frontend (HTML + CSS + JS)
│
├── scripts/
│   └── seed_data.py             # One-time dataset loader → Endee collections
│
├── .env.example                 # Environment variable template
├── requirements.txt             # All Python dependencies
└── README.md
```
| Layer | Technology | Purpose |
|---|---|---|
| Vector DB | Endee | Store & search 384-dim embeddings |
| Embeddings | sentence-transformers (MiniLM) | Convert text to dense vectors |
| RAG | Custom pipeline | Retrieval-augmented prompt construction |
| LLM | OpenAI / Groq / Ollama | Career insight generation from context |
| Backend | FastAPI + Python 3.11 | REST API, async, Pydantic validation |
| Frontend | Vanilla HTML/CSS/JS | Zero-dependency terminal-aesthetic UI |
Endee serves as the semantic memory of the system, replacing traditional keyword search with meaning-aware retrieval.
| Collection | Documents | Content |
|---|---|---|
| `job_roles` | 10 | Job titles, salaries, required skills, demand |
| `skill_taxonomy` | 15 | Skills, categories, importance, learning time |
| `learning_paths` | 10 | Courses, books, platforms with metadata |
| `salary_insights` | 5 | Compensation by role, location, YoY growth |
| `projects` | 6 | Portfolio projects with tech stacks |
```python
from core.vector_store import VectorStore
from core.embeddings import EmbeddingEngine

engine = EmbeddingEngine()
store = VectorStore()
await store.initialize()

# Embed a document and upsert it into the job_roles collection
vector = engine.embed_single("Machine Learning Engineer requires Python and PyTorch")
await store.upsert_vectors("job_roles", [{
    "id": "job_001",
    "values": vector,  # 384-dim float list
    "metadata": {
        "job_title": "Machine Learning Engineer",
        "salary_range": "$130k–$180k",
        "demand": "Very High",
        "key_skills": ["Python", "PyTorch", "MLOps"],
        "content": "Machine Learning Engineer builds production ML systems..."
    }
}])

# Search the collection semantically
query_vector = engine.embed_single("I want a career in deep learning")
results = await store.semantic_search(
    collection="job_roles",
    query_vector=query_vector,
    top_k=5
)
for r in results:
    print(f"[{r.score:.3f}] {r.metadata['job_title']} – {r.metadata['salary_range']}")
```

- Python 3.11+
- Endee API key (sign up at endee.io)
- OpenAI or Groq API key (or run Ollama locally for free)
```bash
git clone https://github.com/yourusername/ai-career-engine.git
cd ai-career-engine
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

```bash
cp .env.example .env
# Edit .env and fill in:
# ENDEE_API_KEY=your_key_here
# OPENAI_API_KEY=your_key_here   (or GROQ_API_KEY for free tier)
```

```bash
python scripts/seed_data.py
```

This embeds 46 career documents and upserts them into 5 Endee collections.

```bash
cd backend
uvicorn main:app --reload --port 8000
```

Navigate to http://localhost:8000 in your browser.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/career/analyze` | Full RAG career analysis |
| GET | `/api/v1/career/skills?role=` | Ranked skills for a role |
| GET | `/api/v1/career/jobs?goal=` | Job roles matching a goal |
| POST | `/api/v1/search/semantic` | Raw semantic search in any collection |
| GET | `/api/v1/stats` | Vector DB collection counts + cache stats |
| GET | `/api/v1/health` | Health check |
```bash
curl -X POST http://localhost:8000/api/v1/career/analyze \
  -H "Content-Type: application/json" \
  -d '{"career_goal": "I want to become a Machine Learning Engineer", "top_k": 5}'
```

| Option | Cost | Setup |
|---|---|---|
| OpenAI GPT-4o-mini | ~$0.01/query | `OPENAI_API_KEY=sk-...` |
| Groq Llama-3 | Free tier | `GROQ_API_KEY=gsk_...` + update `.env` |
| Ollama (local) | Free | `ollama pull llama3.2` + update `.env` |
No LLM key? The system still works: it falls back to rule-based generation using the retrieved Endee data.
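The fallback path can be sketched like this; the names `generate_insight` and `generate_with_llm` are illustrative assumptions, not the project's actual API:

```python
# Hypothetical sketch of graceful degradation: with no LLM key, the report
# is assembled directly from the metadata Endee already returned.
def generate_insight(career_goal, retrieved, llm_key=None):
    if llm_key:
        return generate_with_llm(career_goal, retrieved, llm_key)
    # Rule-based fallback: no generation, just structured retrieved facts.
    return {
        "goal": career_goal,
        "top_jobs": [r["metadata"]["job_title"] for r in retrieved],
        "note": "rule-based fallback (no LLM key configured)",
    }

def generate_with_llm(goal, retrieved, key):
    raise NotImplementedError  # placeholder for the real LLM call

retrieved = [{"metadata": {"job_title": "ML Engineer"}}]
report = generate_insight("become an ML engineer", retrieved)
print(report["top_jobs"])  # ['ML Engineer']
```

Because retrieval quality does most of the work in a RAG system, even this degraded path returns something useful.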
```bash
pytest tests/ -v
```

**Why Endee over SQL/keyword search?**
A user asking about a "deep learning career" and one asking for a "neural network engineer path" mean the same thing, yet the phrasings share no keywords, so SQL `LIKE` queries fail here. Endee's cosine similarity returns the same top results for both because the two queries are close in embedding space.
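A toy illustration of the difference (the 4-dim vectors below are made-up stand-ins for real 384-dim embeddings, chosen only to show the idea):

```python
import math

def cosine(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

q1 = "deep learning career"
q2 = "neural network engineer path"

# Keyword matching: the two queries share zero words, so LIKE/BM25-style
# overlap scores them as unrelated.
shared = set(q1.split()) & set(q2.split())
print(shared)  # set()

# Dense retrieval: an encoder maps both phrases to nearby points, so their
# cosine similarity is high (toy values, not real model output).
v1 = [0.81, 0.52, 0.11, 0.05]
v2 = [0.78, 0.57, 0.15, 0.02]
print(round(cosine(v1, v2), 3))  # close to 1.0 → "same topic"
```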
**Why sentence-transformers (MiniLM)?**
Fast (14k tokens/sec on CPU), small (22 MB), good quality for retrieval tasks. Outperforms TF-IDF and BM25 on semantic similarity benchmarks.
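The LRU cache noted for `core/embeddings.py` can be sketched as below; `_encode` is a stub standing in for the real `SentenceTransformer.encode` call so the example runs without the model installed, and the internals are assumptions about the wrapper, not its actual code:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the "model" actually runs

def _encode(text: str) -> tuple[float, ...]:
    # Stand-in for SentenceTransformer("all-MiniLM-L6-v2").encode(text);
    # returns a fake fixed-size vector so the sketch is self-contained.
    CALLS["count"] += 1
    return tuple(float(ord(c) % 7) for c in text[:8])

@lru_cache(maxsize=1024)
def embed_single(text: str) -> tuple[float, ...]:
    # Tuples (not lists) so results are hashable and cacheable.
    return _encode(text)

embed_single("deep learning career")
embed_single("deep learning career")  # served from the cache
print(CALLS["count"])  # 1 - the encoder ran only once
```

Repeated or popular queries then skip the encoder entirely, which matters most under CPU-only deployment.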
**Why async throughout?**
The 5 Endee collections are queried concurrently with asyncio; done sequentially, retrieval would take roughly 5x as long.
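A minimal sketch of that concurrency win, using a sleep as a stand-in for network I/O (the search function is a stub, not the project's `semantic_search`):

```python
import asyncio
import time

COLLECTIONS = ["job_roles", "skill_taxonomy", "learning_paths",
               "salary_insights", "projects"]

async def search_collection(name: str) -> str:
    # Stand-in for an Endee search; simulates ~0.1 s of network latency.
    await asyncio.sleep(0.1)
    return f"top-5 hits from {name}"

async def retrieve_all() -> list[str]:
    # gather() awaits all five searches concurrently, so total latency is
    # roughly one call's latency rather than the sum of five.
    return await asyncio.gather(*(search_collection(c) for c in COLLECTIONS))

start = time.perf_counter()
results = asyncio.run(retrieve_all())
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")  # 5 results in ~0.1 s, not ~0.5 s
```

`asyncio.gather` also preserves input order, so each result can be matched back to its collection by index.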
**Why a fallback for empty collections?**
Graceful degradation: the system still produces useful output even before the DB is seeded, making it easier to demo.
Apoorva Deep Singh, B.Tech CSE
Built as part of the Endee.io AI Internship Evaluation
This project demonstrates production-level RAG system design: vector database integration, semantic retrieval, LLM prompt engineering, async FastAPI, and clean modular architecture.