An intelligent resume screening system that uses AI-powered vector embeddings and semantic search to match candidate resumes with job descriptions. Built with Spring Boot and PostgreSQL with pgvector extension.
- AI-Powered Resume Parsing: Automatically extracts text from PDF resumes using Apache Tika
- Vector Embeddings: Generates semantic embeddings using the all-MiniLM-L6-v2 sentence transformer model
- Semantic Search: Finds the best matching candidates using cosine similarity on vector embeddings
- Modern Web Interface:
  - Candidate upload page with drag-and-drop functionality
  - HR dashboard for job description matching
  - Real-time match scoring and ranking
- Database-Backed: Uses PostgreSQL with pgvector extension for efficient vector similarity search
- REST API: Clean RESTful endpoints for resume upload and candidate matching
- Java 21
- Spring Boot 3.4.1
  - Spring Web
  - Spring Data JPA
  - Spring Boot DevTools
- Maven - Dependency management and build tool
- DJL (Deep Java Library) 0.25.0
  - PyTorch engine
  - HuggingFace tokenizers
  - Sentence transformer model: all-MiniLM-L6-v2
- Apache Tika 2.9.1 - PDF parsing and text extraction
- Apache OpenNLP 1.9.2 - Natural language processing
- PostgreSQL with pgvector extension - Vector similarity search
- Hypersistence Utils 3.9.0 - PostgreSQL vector type support
- Lombok - Reduce boilerplate code
- Commons IO - File handling utilities
- MinIO - Object storage (configured for future use)
- HTML5/CSS3/JavaScript
- Tailwind CSS - Modern styling
- Font Awesome - Icons
Before you begin, ensure you have the following installed:
- Java Development Kit (JDK) 21 or higher
- Maven 3.6+ (or use the included Maven wrapper)
- PostgreSQL 12+ with pgvector extension
- Git (for cloning the repository)
git clone https://github.com/Advay-S/AI-based-Resume-shortlisting-application-.git
cd AI-based-Resume-shortlisting-application-

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib
# macOS (using Homebrew)
brew install postgresql

# Ubuntu/Debian
sudo apt-get install postgresql-server-dev-all
cd /tmp
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
# macOS (using Homebrew)
brew install pgvector

# Connect to PostgreSQL
sudo -u postgres psql
-- Create database
CREATE DATABASE resume_db;
-- Connect to the database
\c resume_db
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create the table (or use the init.sql file)
CREATE TABLE IF NOT EXISTS candidate_profiles (
    id SERIAL PRIMARY KEY,
    full_text TEXT,
    embedding vector(384)
);

Alternatively, you can use the provided init.sql file:

sudo -u postgres psql -d resume_db -f init.sql

Create or update src/main/resources/application.properties:
# Database Configuration
spring.datasource.url=jdbc:postgresql://localhost:5432/resume_db
spring.datasource.username=postgres
spring.datasource.password=your_password_here
spring.datasource.driver-class-name=org.postgresql.Driver
# JPA Configuration
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
# File Upload Configuration (adjust as needed)
spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB

Using Maven wrapper (recommended):

./mvnw clean install

Or using your system Maven:

mvn clean install

Run the application:

./mvnw spring-boot:run

Or run the generated JAR:

java -jar target/AirsApplication-0.0.1-SNAPSHOT.jar

The application will start on http://localhost:8080
- Navigate to http://localhost:8080/ClientSidepage.html
- Drag and drop a PDF resume or click to browse
- The system will:
  - Extract text from the PDF
  - Generate vector embeddings
  - Store in the database
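The same upload can be done without the browser through the `POST /api/resumecontroller/upload` endpoint listed in this README. The following is a minimal sketch using only the JDK's `java.net.http` API; the class name, the helper names, and the boundary string are hypothetical, and the part name `file` is taken from the endpoint's documented parameter. The JDK has no built-in multipart encoder, so the body is assembled by hand here.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ResumeUploadSketch {

    /** Encodes one PDF as a multipart/form-data body with a single part named "file". */
    static byte[] multipartBody(String boundary, String fileName, byte[] pdfBytes) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(pdfBytes);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds (but does not send) the upload request against the given base URL. */
    static HttpRequest buildUploadRequest(String baseUrl, String fileName, byte[] pdfBytes) {
        // Hypothetical boundary; any string that does not occur in the file content works.
        String boundary = "----ResumeUpload" + System.nanoTime();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/resumecontroller/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, fileName, pdfBytes)))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real caller would read the PDF with Files.readAllBytes.
        byte[] fakePdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        HttpRequest request = buildUploadRequest("http://localhost:8080", "resume.pdf", fakePdf);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the application is running; the response body should be the confirmation string described for the upload endpoint.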
- Navigate to http://localhost:8080/HRSidePage.html
- Enter a job description in the text area
- Click "Run Analysis"
- View ranked candidates with match scores
- Candidates are sorted by semantic similarity (0-100%)
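The ranking is also available without the browser through the `POST /api/resumecontroller/rankmatch` endpoint. The sketch below builds that request with the JDK's `java.net.http` API and parses one response entry of the documented form `"Match Score: 0.XX | [resume text]"`. The class, record, and method names are hypothetical, and the conversion to a percentage assumes the 0-100% figure is simply the score multiplied by 100.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RankMatchClientSketch {

    /** One parsed entry of the rankmatch response (hypothetical type for this sketch). */
    record Match(double score, String resumeText) {}

    /** Builds (but does not send) the text/plain POST request carrying the job description. */
    static HttpRequest buildRequest(String baseUrl, String jobDescription) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/resumecontroller/rankmatch"))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(jobDescription))
                .build();
    }

    /** Parses an entry of the form "Match Score: 0.XX | [resume text]". */
    static Match parseEntry(String entry) {
        String prefix = "Match Score: ";
        int separator = entry.indexOf(" | ");
        if (!entry.startsWith(prefix) || separator < 0) {
            throw new IllegalArgumentException("Unexpected entry format: " + entry);
        }
        double score = Double.parseDouble(entry.substring(prefix.length(), separator));
        return new Match(score, entry.substring(separator + 3));
    }

    /** Assumption: the percentage shown to HR users is the 0..1 score times 100, rounded. */
    static long toPercent(double score) {
        return Math.round(score * 100);
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:8080", "Senior Java developer");
        System.out.println(request.method() + " " + request.uri());

        // Made-up entry, used only to exercise the parser.
        Match match = parseEntry("Match Score: 0.87 | Jane Doe, Java developer");
        System.out.println(toPercent(match.score()) + "% " + match.resumeText());
    }
}
```

Sending the request needs a running application and `HttpClient.newHttpClient().send(...)`. The response is a JSON array, so a real client would first decode it with a JSON library and then apply `parseEntry` to each element.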
POST /api/resumecontroller/upload
Content-Type: multipart/form-data
Parameters:
  file: PDF file (required)
Response: String confirmation message

POST /api/resumecontroller/match
Content-Type: text/plain
Body: Job description text
Response: JSON array of matching resumes with scores

POST /api/resumecontroller/rankmatch
Content-Type: text/plain
Body: Job description text
Response: JSON array of ranked resumes with match scores
Format: "Match Score: 0.XX | [resume text]"

AI-based-Resume-shortlisting-application-/
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── airesumescreener/airs/AirsApplication/
│   │   │       ├── AirsApplication.java          # Main Spring Boot application
│   │   │       └── InfrastructureDB/
│   │   │           ├── DocumentParser.java       # PDF text extraction
│   │   │           ├── EmbeddingService.java     # AI vector generation
│   │   │           ├── ResumeController.java     # REST API endpoints
│   │   │           ├── ResumeRepository.java     # Business logic
│   │   │           └── analyse.java              # Analysis utilities
│   │   └── resources/
│   │       ├── static/
│   │       │   ├── ClientSidepage.html           # Candidate upload UI
│   │       │   └── HRSidePage.html               # HR matching UI
│   │       └── application.properties            # Configuration
│   └── test/
│       └── java/
│           └── airesumescreener/airs/AirsApplication/
│               └── AirsApplicationTests.java     # Unit tests
├── pom.xml                                       # Maven dependencies
├── init.sql                                      # Database initialization
├── mvnw                                          # Maven wrapper (Unix)
├── mvnw.cmd                                      # Maven wrapper (Windows)
└── README.md                                     # This file
- User uploads a PDF resume via the web interface
- DocumentParser extracts text using Apache Tika
- EmbeddingService generates a 384-dimensional vector embedding using the pre-trained sentence transformer model
- Resume text and embedding are stored in PostgreSQL with pgvector
- HR user enters a job description
- System generates vector embedding for the job description
- PostgreSQL performs cosine similarity search using pgvector's <=> operator
- Results are ranked by similarity score (1 - distance)
- Top 20 matches are returned with scores
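For the job-description embedding to reach PostgreSQL, the `float[]` has to be turned into something a `?::vector` parameter can accept. One option is pgvector's text form, `[v1,v2,...]`. The helper below is a sketch under that assumption; it is not taken from `ResumeRepository.java`, and the class and method names are hypothetical. It also checks the dimension, since the `embedding` column is declared as `vector(384)`.

```java
public class VectorLiteralSketch {

    /** Dimension of the embedding column in candidate_profiles, per the table definition. */
    static final int EMBEDDING_DIMENSIONS = 384;

    /**
     * Formats an embedding as pgvector's text representation, e.g. "[0.5,-1.0,2.0]",
     * suitable for binding as a string to a "?::vector" parameter.
     */
    static String toVectorLiteral(float[] embedding, int expectedDimensions) {
        if (embedding.length != expectedDimensions) {
            throw new IllegalArgumentException(
                    "Expected " + expectedDimensions + " dimensions but got " + embedding.length);
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // A tiny 3-dimensional vector keeps the output readable; real embeddings use 384.
        System.out.println(toVectorLiteral(new float[]{0.5f, -1f, 2f}, 3));

        // An all-zero vector of the real size, only to show the dimension check passing.
        String literal = toVectorLiteral(new float[EMBEDDING_DIMENSIONS], EMBEDDING_DIMENSIONS);
        System.out.println(literal.length() + " characters");
    }
}
```

With plain JDBC the resulting string would be bound via `PreparedStatement.setString`, leaving the cast to `vector` to the SQL itself.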
SELECT
    1 - (embedding <=> job_vector) as score,
    full_text
FROM candidate_profiles
ORDER BY score DESC
LIMIT 20

The <=> operator calculates cosine distance, and 1 - distance gives the similarity score.
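To make the scoring concrete, here is a small in-memory sketch of the same computation in plain Java: cosine distance between two vectors, the `1 - distance` score, and a descending sort with a limit. It only illustrates the arithmetic; the application delegates this work to PostgreSQL, and the class, record, and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CosineRankingSketch {

    /** A stored resume: its text and its embedding (hypothetical type for this sketch). */
    record Candidate(String fullText, float[] embedding) {}

    /** A resume paired with its similarity score. */
    record ScoredResume(double score, String fullText) {}

    /** Cosine distance: 1 - (a . b) / (|a| * |b|). Yields NaN if either vector is all zeros. */
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** In-memory counterpart of "ORDER BY score DESC LIMIT limit". */
    static List<ScoredResume> rank(float[] jobVector, List<Candidate> candidates, int limit) {
        List<ScoredResume> scored = new ArrayList<>();
        for (Candidate c : candidates) {
            double score = 1.0 - cosineDistance(c.embedding(), jobVector);
            scored.add(new ScoredResume(score, c.fullText()));
        }
        scored.sort(Comparator.comparingDouble(ScoredResume::score).reversed());
        return new ArrayList<>(scored.subList(0, Math.min(limit, scored.size())));
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors; real embeddings have 384 dimensions.
        float[] job = {1f, 0f};
        List<Candidate> candidates = List.of(
                new Candidate("orthogonal", new float[]{0f, 1f}),
                new Candidate("identical direction", new float[]{2f, 0f}),
                new Candidate("diagonal", new float[]{1f, 1f}));
        for (ScoredResume r : rank(job, candidates, 2)) {
            System.out.println(r.score() + " " + r.fullText());
        }
    }
}
```

Because cosine similarity depends only on direction, the vector `{2, 0}` scores the same against `{1, 0}` as `{1, 0}` itself would; vector length does not affect the ranking.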
Run the test suite:
./mvnw test

The default model is all-MiniLM-L6-v2. To use a different model, modify the Criteria builder in the EmbeddingService constructor:
Criteria<String, float[]> criteria = Criteria.builder()
        .setTypes(String.class, float[].class)
        // Replace "your-model-name" with the sentence-transformers model to load
        .optModelUrls("djl://ai.djl.huggingface.pytorch/sentence-transformers/your-model-name")
        .optEngine("PyTorch")
        .optProgress(new ProgressBar())
        .build();

Modify the LIMIT in the SQL query in ResumeRepository.java (in the findMatchingResume method):
String sql = """
    SELECT
        1 - (embedding <=> ?::vector) as score,
        full_text
    FROM candidate_profiles
    ORDER BY score DESC
    LIMIT 20 -- Change this value to show more/fewer results
    """;

Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is open source and available under the MIT License.
Solution: Ensure you have a stable internet connection. The DJL library downloads the model on first run (~80MB).
Solution: Make sure pgvector is properly installed and the extension is enabled in your database:

CREATE EXTENSION IF NOT EXISTS vector;

Solution: Increase JVM heap size:

java -Xmx2g -jar target/AirsApplication-0.0.1-SNAPSHOT.jar

Solution: Ensure the PDF is not corrupted and contains extractable text (not scanned images).
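A quick way to rule out an obviously broken or mislabeled file before uploading is to check that it begins with the PDF signature `%PDF-`. The sketch below does only that, using the JDK; the class and method names are hypothetical. It cannot tell whether the PDF contains extractable text, so a scanned, image-only resume will still pass this check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfHeaderCheck {

    /** Every PDF file starts with these five ASCII bytes. */
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes begin with the PDF signature. */
    static boolean hasPdfHeader(byte[] leadingBytes) {
        return leadingBytes.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(leadingBytes, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Reads only the first few bytes of the file and checks them for the signature. */
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in.readNBytes(PDF_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java PdfHeaderCheck <path-to-resume.pdf>");
            return;
        }
        Path file = Path.of(args[0]);
        System.out.println(looksLikePdf(file)
                ? file + " starts with a PDF header"
                : file + " does not start with a PDF header");
    }
}
```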
For issues, questions, or suggestions, please open an issue on the GitHub repository.
- Support for multiple file formats (DOCX, TXT, etc.)
- Batch resume upload
- Advanced filtering (experience, skills, location)
- Resume ranking explanations
- Export results to CSV/Excel
- User authentication and authorization
- Resume deduplication
- Analytics dashboard
- Integration with ATS (Applicant Tracking Systems)