Hiring Agent

Resume-to-Score pipeline that extracts structured data from PDFs, enriches with GitHub signals, and outputs a fair, explainable evaluation.

Python | License: MIT | Code style: Black




Overview

Hiring Agent parses a resume PDF to Markdown, extracts sectioned JSON using a local or hosted LLM, augments the data with GitHub profile and repository signals, and then produces an objective evaluation with category scores, evidence, bonus points, and deductions. You can run fully locally with Ollama or use Google Gemini.


Architecture

Flow

  1. pymupdf_rag.py converts PDF pages to Markdown-like text.
  2. pdf.py calls the LLM per section using Jinja templates under prompts/templates.
  3. github.py fetches profile and repos, classifies projects, and asks the LLM to select the top 7.
  4. evaluator.py runs a strict-scored evaluation with fairness constraints.
  5. score.py orchestrates everything end to end and writes CSV when development mode is on.
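
A minimal sketch of the pipeline shape is shown below. The function names are illustrative placeholders, not the actual APIs exposed by these modules; the real entry points live in score.py and the files listed above.

# Illustrative pipeline sketch; function names are placeholders, not the repo's API.
def pdf_to_markdown(pdf_path: str) -> str:
    """Stand-in for pymupdf_rag.py: convert PDF pages to Markdown-like text."""
    ...

def extract_sections(markdown: str) -> dict:
    """Stand-in for pdf.py: one LLM call per section, driven by Jinja templates."""
    ...

def enrich_with_github(resume: dict) -> dict:
    """Stand-in for github.py: fetch profile and repos, pick the top 7 projects."""
    ...

def evaluate(resume: dict, github_signals: dict) -> dict:
    """Stand-in for evaluator.py: strict, fairness-constrained scoring."""
    ...

def score(pdf_path: str) -> dict:
    """Stand-in for score.py: run the stages end to end."""
    markdown = pdf_to_markdown(pdf_path)
    resume = extract_sections(markdown)
    signals = enrich_with_github(resume)
    return evaluate(resume, signals)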

Key modules

  • models.py: Pydantic schemas and LLM provider interfaces.
  • llm_utils.py: provider initialization and response cleanup.
  • transform.py: normalization from loose LLM JSON to JSON Resume style.
  • prompts/: all Jinja templates for extraction and scoring.
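
For orientation, the extracted resume roughly follows JSON Resume conventions. A trimmed-down sketch of the kind of Pydantic models involved (field names here are illustrative; see models.py for the real schemas):

# Illustrative schemas only; the real models in models.py differ in detail.
from typing import List, Optional
from pydantic import BaseModel

class Basics(BaseModel):
    name: str
    email: Optional[str] = None
    profiles: List[str] = []          # e.g. GitHub or LinkedIn URLs

class Project(BaseModel):
    name: str
    description: Optional[str] = None
    url: Optional[str] = None

class JSONResume(BaseModel):
    basics: Basics
    work: List[dict] = []
    education: List[dict] = []
    skills: List[dict] = []
    projects: List[Project] = []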


Installation and Setup

Prerequisites

  • Python 3.11+

    The repository pins .python-version to 3.11.13.

  • One LLM backend (choose one of the two)

    • Ollama for local models: install it from the official site, then run ollama serve.
    • Google Gemini: requires an API key, supplied via GEMINI_API_KEY (see Configuration below).

Quick setup with pip

$ git clone https://github.com/interviewstreet/hiring-agent
$ cd hiring-agent

$ python -m venv .venv
# Linux or macOS
$ source .venv/bin/activate
# Windows
# .venv\Scripts\activate

$ pip install -r requirements.txt

Ollama Models

Pull the model you want to use. For example:

$ ollama pull gemma3:4b

Depending on your hardware and the quality you need, you can pull a larger or smaller model instead:

# Larger model, for machines with more resources
$ ollama pull gemma3:12b

# Smaller model, for constrained machines
$ ollama pull gemma3:1b

Configuration

Copy the template and set your environment variables.

$ cp .env.example .env

Environment variables

  • LLM_PROVIDER: ollama or gemini. Selects the provider; defaults to ollama.
  • DEFAULT_MODEL: model name passed to the provider, for example gemma3:4b or gemini-2.5-pro.
  • GEMINI_API_KEY: required when LLM_PROVIDER=gemini.
  • GITHUB_TOKEN: optional; inherited from your shell environment and used to raise GitHub API rate limits.
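
For example, a fully local Ollama setup could use a .env like the following (values are illustrative; .env.example shows the exact keys the project ships with):

LLM_PROVIDER=ollama
DEFAULT_MODEL=gemma3:4b
# Only needed when LLM_PROVIDER=gemini
# GEMINI_API_KEY=...
# Optional, improves GitHub API rate limits
# GITHUB_TOKEN=...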

Provider mapping lives in prompt.py and models.py. The config.py file has a single flag:

# config.py
DEVELOPMENT_MODE = True  # enables caching and CSV export

You can leave it on during iteration. See the next section for details.


How it works

1) PDF extraction
  • pymupdf_rag.py and pdf.py read the PDF using PyMuPDF and convert pages to Markdown-like text.
  • The to_markdown routine handles headings, links, tables, and basic formatting.
2) Section parsing with templates
  • prompts/templates/*.jinja define strict instructions for each section: Basics, Work, Education, Skills, Projects, and Awards.
  • pdf.PDFHandler calls the LLM per section and assembles a JSONResume object (see models.py).
3) GitHub enrichment
  • github.py extracts a username from the resume profiles, fetches profile and repos, and classifies each project.
  • It asks the LLM to select exactly 7 unique projects with a minimum author commit threshold, favoring meaningful contributions.
4) Evaluation
  • evaluator.py uses templates that encode fairness and scoring rules.
  • Scores cover open_source, self_projects, production, and technical_skills, plus bonus points and deductions, along with an explanation that cites evidence.
5) Output and CSV export
  • score.py prints a readable summary to stdout.
  • When DEVELOPMENT_MODE=True it creates or appends to resume_evaluations.csv with key fields and caches intermediate JSON under cache/ (sketched below).
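
A rough sketch of that development-mode behavior, assuming nothing about the actual helper names or CSV columns in score.py:

# Sketch only; helper names and CSV columns are assumptions, not the repo's code.
import csv
import json
import os

def cache_json(name: str, data: dict, cache_dir: str = "cache") -> None:
    """Write an intermediate result to cache/<name> as pretty-printed JSON."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, name), "w") as f:
        json.dump(data, f, indent=2)

def append_csv_row(row: dict, path: str = "resume_evaluations.csv") -> None:
    """Create the CSV on first use, then append one row per evaluated resume."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)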

CLI usage

End-to-end scoring

Provide a path to a resume PDF.

$ python score.py /path/to/resume.pdf

What happens:

  1. If development mode is on, the PDF extraction result is cached to cache/resumecache_<basename>.json.
  2. If a GitHub profile is found in the resume, repositories are fetched and cached to cache/githubcache_<basename>.json (see the sketch after this list).
  3. The evaluator prints a report and, in development mode, appends a CSV row to resume_evaluations.csv.
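
Step 2 boils down to a GitHub API call of roughly this shape (illustrative sketch, assuming the requests library; the actual logic in github.py is more involved):

# Illustrative sketch, not the repo's code; GITHUB_TOKEN is optional but raises rate limits.
import os
import requests

def fetch_repos(username: str) -> list:
    headers = {"Accept": "application/vnd.github+json"}
    token = os.getenv("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    response = requests.get(
        f"https://api.github.com/users/{username}/repos",
        headers=headers,
        params={"per_page": 100, "sort": "updated"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()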

Directory layout

.
├── .env.example
├── .python-version
├── config.py
├── evaluator.py
├── github.py
├── llm_utils.py
├── models.py
├── pdf.py
├── prompt.py
├── prompts/
│   ├── template_manager.py
│   └── templates/
│       ├── awards.jinja
│       ├── basics.jinja
│       ├── education.jinja
│       ├── github_project_selection.jinja
│       ├── projects.jinja
│       ├── resume_evaluation_criteria.jinja
│       ├── resume_evaluation_system_message.jinja
│       ├── skills.jinja
│       ├── system_message.jinja
│       └── work.jinja
├── pymupdf_rag.py
├── requirements.txt
├── score.py
└── transform.py

Provider details

Ollama

  • Set LLM_PROVIDER=ollama
  • Set DEFAULT_MODEL to any pulled model, for example gemma3:4b
  • The provider wrapper in models.OllamaProvider calls ollama.chat
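
A minimal sketch of what such a wrapper looks like (illustrative; the real OllamaProvider interface may differ). It assumes ollama serve is running and the model has been pulled:

# Sketch of an Ollama-backed provider; class and method names are illustrative.
import ollama

class OllamaChatProvider:
    def __init__(self, model: str = "gemma3:4b"):
        self.model = model

    def complete(self, system: str, user: str) -> str:
        response = ollama.chat(
            model=self.model,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
        )
        return response["message"]["content"]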

Gemini

  • Set LLM_PROVIDER=gemini
  • Set DEFAULT_MODEL to a supported Gemini model, for example gemini-2.0-flash
  • Provide GEMINI_API_KEY
  • The wrapper in models.GeminiProvider adapts responses to a unified format
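
A comparable sketch for Gemini (illustrative; models.GeminiProvider may use a different client or method names), assuming the google-generativeai package:

# Sketch of a Gemini-backed provider; names here are illustrative, not the repo's API.
import os
import google.generativeai as genai

class GeminiChatProvider:
    def __init__(self, model: str = "gemini-2.0-flash"):
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        self.model = genai.GenerativeModel(model)

    def complete(self, prompt: str) -> str:
        response = self.model.generate_content(prompt)
        return response.text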

Contributing

Please read CONTRIBUTING.md for detailed guidelines on filing issues, proposing changes, and submitting pull requests. Key principles include:

  • Keep prompts declarative and provider-agnostic.
  • Validate changes with a couple of real resumes under different providers.
  • Add or adjust lightweight smoke tests that call each stage with minimal inputs.

License

MIT © HackerRank
