Faster, Fairer, and More Efficient Hiring Using Graph RAG
Graph-RAG is a sophisticated question-answering system that leverages knowledge graphs and Retrieval-Augmented Generation (RAG) to provide accurate and context-aware responses. The system processes PDF documents, extracts structured information, and stores it in a Neo4j graph database for efficient querying and retrieval.
Chatbot UI: (screenshot)
Connect with us: https://bit.ly/poster-informs-2025
Features:
- PDF document processing and text extraction
- Automatic knowledge graph construction from documents
- Structured information extraction
- Graph-based question answering
- Interview question generation
- Interactive web interface using Streamlit
- Support for multiple LLM providers (Groq, OpenAI, Ollama)
- Flexible embedding options (Cohere, OpenAI, Ollama)
Prerequisites:
- Python 3.8+
- Neo4j Database
- API keys for:
  - Groq (or other LLM provider)
  - Cohere (or other embedding provider)
Installation:
1. Clone the repository:

```bash
git clone https://github.com/mahithabsl/Graph-RAG.git
cd Graph-RAG
```

2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

3. Configure Neo4j:
   - Install and start the Neo4j database
   - Update the config.ini file with your Neo4j credentials
4. Set up API keys:
   - Update the config.ini file with your API keys for the LLM and embedding providers
Configuration:
The config.ini file contains all necessary configuration settings:
- Neo4j connection details
- LLM provider settings
- PDF processing parameters
- Embedding model configuration
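A minimal config.ini might look like the sketch below. The section and key names here are assumptions for illustration; check them against the sample file shipped with the repository:

```ini
[neo4j]
uri = bolt://localhost:7687
user = neo4j
password = your-password

[llm]
provider = groq
api_key = your-groq-api-key

[embeddings]
provider = cohere
api_key = your-cohere-api-key

[pdf]
chunk_size = 1000
chunk_overlap = 100
```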
Project Structure:
- `Pdf_2_Text.py`: Handles PDF text extraction and resume parsing
- `MetaData_Extraction.ipynb`: Extracts structured metadata from parsed text
- `Pdf_2_Graph.py`: Constructs the knowledge graph from extracted information
- `Graph_QA.py`: Implements the question-answering system
- `Generate_Interview_Questions.ipynb`: Generates interview questions based on resume content
- `data/`: Directory for storing PDF documents
- `static/`: Static assets for the web interface
Usage:
1. Process a PDF resume:

```python
from graph_rag.Pdf_2_Text import process_pdf

# Process a PDF resume
text_content = process_pdf("path/to/your/resume.pdf")
```

2. Run the MetaData_Extraction.ipynb notebook to extract structured information from the parsed text.
3. Build the knowledge graph:

```python
from graph_rag.Pdf_2_Graph import process_document

# Process a document and build the knowledge graph
process_document("path/to/your/document.pdf", metadata_info={}, meta={})
```

4. Launch the question-answering interface:

```bash
streamlit run graph_rag/Graph_QA.py
```

Access the web interface at http://localhost:8501.

5. Run the Generate_Interview_Questions.ipynb notebook to generate interview questions based on the resume content.
How It Works:

1. Resume Parsing and Text Extraction:
   - PDFs are processed and converted to text
   - Structured information is extracted from resumes
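The section-splitting part of this step can be sketched as plain Python. This is an illustrative stand-in, not the actual logic in `Pdf_2_Text.py`, and the heading keywords are assumptions:

```python
# Illustrative sketch of resume section splitting; the real parser in
# Pdf_2_Text.py may differ in details.
SECTION_HEADINGS = {"education", "experience", "skills", "projects"}

def split_resume_sections(text: str) -> dict:
    """Group resume lines under the most recent section heading."""
    sections, current = {}, "header"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().rstrip(":") in SECTION_HEADINGS:
            current = stripped.lower().rstrip(":")
            sections.setdefault(current, [])
        elif stripped:
            sections.setdefault(current, []).append(stripped)
    return sections

sample = "Jane Doe\nEducation:\nB.S. Computer Science\nSkills:\nPython, Neo4j"
parsed = split_resume_sections(sample)
```

Lines before the first recognized heading land in a catch-all "header" bucket, which is where contact details typically live.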
2. Metadata Extraction:
   - Key information is identified and structured
   - Education, experience, skills, and other relevant data are extracted
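This step can be pictured as a pure function over the parsed sections. The field names below are illustrative; the real schema is defined in `MetaData_Extraction.ipynb`:

```python
def extract_metadata(sections: dict) -> dict:
    """Flatten parsed resume sections into a structured metadata record.

    Field names are illustrative, not the notebook's actual schema.
    """
    # Comma-separated skill lines are split into individual skill entries.
    skills = [
        s.strip()
        for line in sections.get("skills", [])
        for s in line.split(",")
        if s.strip()
    ]
    return {
        "education": sections.get("education", []),
        "experience": sections.get("experience", []),
        "skills": skills,
    }

meta = extract_metadata({
    "education": ["B.S. Computer Science"],
    "skills": ["Python, Neo4j", "Cypher"],
})
```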
3. Knowledge Graph Construction:
   - Entities and relationships are identified
   - Information is stored in the Neo4j graph database
   - The graph structure enables complex relationship queries
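One way to picture this step: each metadata field becomes a set of (subject, relation, object) triples, which map directly onto Cypher MERGE statements. The relation names and node labels below are assumptions for illustration, not the repository's actual schema:

```python
def metadata_to_triples(candidate: str, metadata: dict) -> list:
    """Turn a metadata record into (subject, relation, object) triples."""
    triples = []
    for skill in metadata.get("skills", []):
        triples.append((candidate, "HAS_SKILL", skill))
    for degree in metadata.get("education", []):
        triples.append((candidate, "HAS_DEGREE", degree))
    return triples

def triple_to_cypher(triple: tuple) -> str:
    """Render one triple as a Cypher MERGE statement.

    The Candidate/Entity labels are hypothetical; Pdf_2_Graph.py defines
    the real graph schema.
    """
    s, rel, o = triple
    return (f"MERGE (a:Candidate {{name: '{s}'}}) "
            f"MERGE (b:Entity {{name: '{o}'}}) "
            f"MERGE (a)-[:{rel}]->(b)")

triples = metadata_to_triples("Jane Doe", {"skills": ["Python"], "education": ["B.S."]})
```

Using MERGE rather than CREATE keeps repeated entities (e.g. a skill shared by many candidates) as a single node, which is what makes cross-resume queries possible.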
4. Question Answering:
   - User questions are processed
   - Relevant information is retrieved from the graph
   - The LLM generates context-aware responses
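The retrieve-then-generate loop can be sketched as below. The word-overlap scorer is a toy stand-in for the system's actual graph retrieval, and the prompt wording is hypothetical:

```python
def retrieve(triples, question: str, k: int = 3):
    """Score triples by word overlap with the question.

    A toy stand-in for real graph/embedding retrieval.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        triples,
        key=lambda t: -len(q_words & set(" ".join(t).lower().split())),
    )
    return scored[:k]

def build_prompt(question: str, facts) -> str:
    """Assemble the context-grounded prompt sent to the LLM."""
    context = "\n".join(f"{s} {r} {o}" for s, r, o in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"

triples = [("Jane Doe", "HAS_SKILL", "Python"),
           ("Jane Doe", "HAS_DEGREE", "B.S. Computer Science")]
facts = retrieve(triples, "Which Python skills does the candidate have?", k=1)
prompt = build_prompt("Which Python skills does the candidate have?", facts)
```

Grounding the LLM on retrieved facts, rather than letting it answer from its own weights, is what keeps responses tied to the candidate's actual resume.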
5. Interview Question Generation:
   - Resume content is analyzed
   - Relevant interview questions are generated
   - Questions are tailored to the candidate's background
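Tailoring happens by injecting the candidate's extracted metadata into the generation prompt. The template text below is illustrative, not the notebook's exact wording:

```python
# Hypothetical prompt template; Generate_Interview_Questions.ipynb
# defines the real one.
INTERVIEW_PROMPT = (
    "You are an interviewer. Based on this candidate profile:\n"
    "{profile}\n"
    "Write {n} interview questions targeting their listed skills and experience."
)

def build_interview_prompt(metadata: dict, n: int = 5) -> str:
    """Format resume metadata into an LLM prompt for question generation."""
    profile = "; ".join(f"{k}: {', '.join(v)}" for k, v in metadata.items() if v)
    return INTERVIEW_PROMPT.format(profile=profile, n=n)

prompt = build_interview_prompt({"skills": ["Python", "Neo4j"], "education": ["B.S."]})
```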
Contributing:
Contributions are welcome! Please feel free to submit a Pull Request.
