# Hiring Agent

Resume-to-Score pipeline that extracts structured data from PDFs, enriches it with GitHub signals, and outputs a fair, explainable evaluation.
- Overview
- Architecture
- Installation and Setup
- Configuration
- How it works
- CLI usage
- Directory layout
- Provider details
- Contributing
- License
## Overview

Hiring Agent parses a resume PDF to Markdown, extracts sectioned JSON using a local or hosted LLM, augments the data with GitHub profile and repository signals, then produces an objective evaluation with category scores, evidence, bonus points, and deductions. You can run fully locally with Ollama or use Google Gemini.
## Architecture

| Flow | Key modules |
|---|---|
| PDF extraction | `pymupdf_rag.py`, `pdf.py` |
| Section parsing | `prompts/templates/*.jinja`, `pdf.py` |
| GitHub enrichment | `github.py` |
| Evaluation | `evaluator.py` |
| Output and CSV export | `score.py` |
## Installation and Setup

Prerequisites:

- Python 3.11+. The repository pins `.python-version` to 3.11.13.
- One LLM backend (either of them):
  - Ollama for local models. Install it from the official site, then run `ollama serve`.
  - Google Gemini if you have an API key (available from Google AI Studio).
Clone the repository and install dependencies:

```bash
$ git clone https://github.com/interviewstreet/hiring-agent
$ cd hiring-agent
$ python -m venv .venv

# Linux or macOS
$ source .venv/bin/activate

# Windows
# .venv\Scripts\activate

$ pip install -r requirements.txt
```

Pull the model you want to use. For example:

```bash
$ ollama pull gemma3:4b
```

If you want different results, you can pull other models such as:

```bash
# For higher-spec systems
$ ollama pull gemma3:12b

# For lower-spec systems
$ ollama pull gemma3:1b
```

Copy the template and set your environment variables:

```bash
$ cp .env.example .env
```

## Configuration

Environment variables:
| Variable | Values | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` or `gemini` | Chooses the provider. Defaults to Ollama. |
| `DEFAULT_MODEL` | for example `gemma3:4b` or `gemini-2.5-pro` | Model name passed to the provider. |
| `GEMINI_API_KEY` | string | Required when `LLM_PROVIDER=gemini`. |
| `GITHUB_TOKEN` | optional | Inherited from your shell environment; improves GitHub API rate limits. |
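For reference, a filled-in `.env` could look like the following; the values are illustrative placeholders, not working credentials:

```
LLM_PROVIDER=ollama
DEFAULT_MODEL=gemma3:4b

# Only required when LLM_PROVIDER=gemini
GEMINI_API_KEY=your-gemini-api-key

# Optional; can also be inherited from your shell environment
GITHUB_TOKEN=your-github-token
```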
Provider mapping lives in `prompt.py` and `models.py`. The `config.py` file has a single flag:

```python
# config.py
DEVELOPMENT_MODE = True  # enables caching and CSV export
```

You can leave it on during iteration. See the next section for details.
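As a rough illustration of what the flag gates, a caching helper along these lines would reuse earlier extractions; the function and helper names here are invented, and the real logic lives in the pipeline modules:

```python
# Hypothetical illustration of DEVELOPMENT_MODE gating the cache;
# the actual implementation in the repository may differ.
import json
import os

from config import DEVELOPMENT_MODE

def load_or_extract(pdf_path: str, extract_fn):
    """Reuse a cached extraction during iteration, re-extract otherwise."""
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    cache_path = os.path.join("cache", f"resumecache_{base}.json")
    if DEVELOPMENT_MODE and os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    result = extract_fn(pdf_path)
    if DEVELOPMENT_MODE:
        os.makedirs("cache", exist_ok=True)
        with open(cache_path, "w") as f:
            json.dump(result, f)
    return result
```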
## How it works

1) PDF extraction

- `pymupdf_rag.py` and `pdf.py` read the PDF using PyMuPDF and convert pages to Markdown-like text.
- The `to_markdown` routine handles headings, links, tables, and basic formatting.
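A minimal sketch of this stage, assuming the vendored `to_markdown` keeps the Document-in, Markdown-out shape of the PyMuPDF RAG helper it derives from:

```python
# Stage 1 sketch; the repository's wiring in pdf.py may differ.
import fitz  # PyMuPDF

from pymupdf_rag import to_markdown

doc = fitz.open("resume.pdf")
md_text = to_markdown(doc)  # Markdown-like text for the whole document
print(md_text[:500])
```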
2) Section parsing with templates

- `prompts/templates/*.jinja` define strict instructions for each section: Basics, Work, Education, Skills, Projects, Awards.
- `pdf.PDFHandler` calls the LLM per section and assembles a `JSONResume` object (see `models.py`).
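Conceptually, the per-section loop looks something like the sketch below. The repository routes templates through `prompts/template_manager.py`; `llm_complete` and the `resume_text` template variable are stand-ins, not the actual names:

```python
# Illustrative per-section loop; PDFHandler's real methods may differ.
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("prompts/templates"))
SECTIONS = ["basics", "work", "education", "skills", "projects", "awards"]

def parse_sections(md_text: str, llm_complete) -> dict:
    resume = {}
    for section in SECTIONS:
        prompt = env.get_template(f"{section}.jinja").render(resume_text=md_text)
        resume[section] = llm_complete(prompt)  # expects JSON for the section
    return resume
```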
3) GitHub enrichment

- `github.py` extracts a username from the resume profiles, fetches the profile and repositories, and classifies each project.
- It asks the LLM to select exactly 7 unique projects with a minimum author-commit threshold, favoring meaningful contributions.
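A minimal sketch of the fetch side using the public GitHub REST API directly; `github.py`'s actual interface and pagination handling may differ:

```python
# Enrichment fetch sketch; endpoints are the standard GitHub REST API.
import os

import requests

def fetch_github(username: str) -> tuple[dict, list]:
    headers = {"Accept": "application/vnd.github+json"}
    token = os.getenv("GITHUB_TOKEN")  # optional; raises rate limits
    if token:
        headers["Authorization"] = f"Bearer {token}"
    base = "https://api.github.com"
    profile = requests.get(f"{base}/users/{username}", headers=headers).json()
    repos = requests.get(
        f"{base}/users/{username}/repos",
        headers=headers,
        params={"per_page": 100, "sort": "updated"},
    ).json()
    return profile, repos
```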
4) Evaluation

- `evaluator.py` uses templates that encode fairness and scoring rules.
- Scores cover `open_source`, `self_projects`, `production`, and `technical_skills`, plus bonus points and deductions, each backed by an evidence-based explanation.
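The shape of the result can be pictured as a small dataclass; the four category names come from the templates, while any weighting and extra fields are defined there too:

```python
# Hypothetical result shape; evaluator.py's actual model may differ.
from dataclasses import dataclass

@dataclass
class Evaluation:
    open_source: float
    self_projects: float
    production: float
    technical_skills: float
    bonus: float = 0.0
    deductions: float = 0.0

    @property
    def total(self) -> float:
        categories = (self.open_source + self.self_projects
                      + self.production + self.technical_skills)
        return categories + self.bonus - self.deductions
```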
5) Output and CSV export

- `score.py` prints a readable summary to stdout.
- When `DEVELOPMENT_MODE=True`, it creates or appends to `resume_evaluations.csv` with key fields and caches intermediate JSON under `cache/`.
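The create-or-append behavior is the standard `csv.DictWriter` pattern; the exact column names are defined in `score.py`:

```python
# Create-or-append sketch for the development-mode CSV export.
import csv
import os

def append_evaluation(row: dict, path: str = "resume_evaluations.csv") -> None:
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()  # header only on first creation
        writer.writerow(row)
```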
## CLI usage

Provide a path to a resume PDF:

```bash
$ python score.py /path/to/resume.pdf
```

What happens:

- If development mode is on, the PDF extraction result is cached to `cache/resumecache_<basename>.json`.
- If a GitHub profile is found in the resume, repositories are fetched and cached to `cache/githubcache_<basename>.json`.
- The evaluator prints a report and, in development mode, appends a CSV row to `resume_evaluations.csv`.
## Directory layout

```text
.
├── .env.example
├── .python-version
├── config.py
├── evaluator.py
├── github.py
├── llm_utils.py
├── models.py
├── pdf.py
├── prompt.py
├── prompts/
│   ├── template_manager.py
│   └── templates/
│       ├── awards.jinja
│       ├── basics.jinja
│       ├── education.jinja
│       ├── github_project_selection.jinja
│       ├── projects.jinja
│       ├── resume_evaluation_criteria.jinja
│       ├── resume_evaluation_system_message.jinja
│       ├── skills.jinja
│       ├── system_message.jinja
│       └── work.jinja
├── pymupdf_rag.py
├── requirements.txt
├── score.py
└── transform.py
```
## Provider details

### Ollama

- Set `LLM_PROVIDER=ollama`.
- Set `DEFAULT_MODEL` to any pulled model, for example `gemma3:4b`.
- The provider wrapper in `models.OllamaProvider` calls `ollama.chat`.
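A minimal sketch of the wrapper pattern with the `ollama` Python client; the real `models.OllamaProvider` may add options and error handling:

```python
# Provider wrapper sketch around ollama.chat.
import ollama

class OllamaProvider:
    def __init__(self, model: str):
        self.model = model

    def chat(self, prompt: str) -> str:
        response = ollama.chat(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response["message"]["content"]
```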
### Gemini

- Set `LLM_PROVIDER=gemini`.
- Set `DEFAULT_MODEL` to a supported Gemini model, for example `gemini-2.0-flash`.
- Provide `GEMINI_API_KEY`.
- The wrapper in `models.GeminiProvider` adapts responses to a unified format.
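And a matching sketch for Gemini, assuming the `google-generativeai` package; again, the real `models.GeminiProvider` may be organized differently:

```python
# Gemini wrapper sketch adapting responses to the same text-in, text-out
# interface as the Ollama wrapper above.
import os

import google.generativeai as genai

class GeminiProvider:
    def __init__(self, model: str):
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        self._model = genai.GenerativeModel(model)

    def chat(self, prompt: str) -> str:
        return self._model.generate_content(prompt).text
```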
## Contributing

Please read CONTRIBUTING.md for detailed guidelines on filing issues, proposing changes, and submitting pull requests. Key principles include:
- Keep prompts declarative and provider-agnostic.
- Validate changes with a couple of real resumes under different providers.
- Add or adjust unit-free smoke tests that call each stage with minimal inputs.
## License

MIT © HackerRank