Figure: VERGE dataset generation process
This repository contains the implementation of VERGE, a verification-enhanced methodology for generating multi-hop datasets to evaluate Retrieval-Augmented Generation (RAG) systems. VERGE addresses methodological gaps in existing RAG evaluation frameworks by generating task-specific, multi-hop reasoning datasets.
- VERGE: Implements a novel verification agent that ensures generated questions necessitate genuine multi-hop reasoning and maintain factual consistency
- Hierarchical Error Taxonomy: Provides structured analysis of RAG system failure patterns specifically in multi-hop reasoning contexts
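To make the multi-hop requirement concrete, here is a minimal sketch of one way such a check could work. It is not the verification agent from this repository. It assumes a hypothetical `can_answer` judge (in practice probably an LLM call) that reports whether a question is answerable from a given set of chunks, and it accepts a question only if the full set of supporting chunks is sufficient while no single chunk is. The names `requires_multi_hop` and `make_keyword_judge` are illustrative only.

```python
from typing import Callable, List, Sequence

# Hypothetical judge signature: (question, chunks) -> True if answerable.
Judge = Callable[[str, Sequence[str]], bool]


def requires_multi_hop(
    question: str,
    supporting_chunks: Sequence[str],
    can_answer: Judge,
) -> bool:
    """Return True if the question needs more than one supporting chunk."""
    # A question backed by fewer than two chunks cannot be multi-hop.
    if len(supporting_chunks) < 2:
        return False
    # The chunks taken together must be enough to answer the question.
    if not can_answer(question, supporting_chunks):
        return False
    # No single chunk may be enough on its own.
    return not any(can_answer(question, [chunk]) for chunk in supporting_chunks)


def make_keyword_judge(required_terms: List[str]) -> Judge:
    """Toy stand-in for a real judge: answerable if all terms are present."""
    def judge(question: str, chunks: Sequence[str]) -> bool:
        text = " ".join(chunks).lower()
        return all(term in text for term in required_terms)
    return judge


judge = make_keyword_judge(["acme", "oslo"])
question = "Where was the founder of Acme born?"

two_hop = ["Acme was founded by Jane Doe.", "Jane Doe was born in Oslo."]
one_hop = ["The founder of Acme was born in Oslo.", "An unrelated passage."]

print(requires_multi_hop(question, two_hop, judge))
print(requires_multi_hop(question, one_hop, judge))
```

The keyword judge exists only to keep the example runnable. A real check would replace it with a model-based judgment and would also need to verify factual consistency, which this sketch does not attempt.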
- Chunker/: Scripts for chunking documents
- Data/: Scripts for downloading the datasets
- ExamProcesser: Scripts for processing generated exams
- Solver/: Scripts for solving the generated exams
- categorise_errors.py: Script for categorising error types
- generate_exam.py: Script for generating an exam
- prompt_templates.py: Prompt templates for question generation, verification, and evaluation
- retriever.py: Retriever class
pip install -r requirements.txt
python src/Data/long_bench_downloader.py
python src/Data/download_documents_sec_filings.py
python src/Chunker/document_chunker.py
python src/generate_exam.py
python src/Solver/solve_exam_rag.py
python src/categorise_errors.py
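The commands above can also be chained from a small driver script so that a failure in one stage stops the later ones. The sketch below is not part of the repository. It assumes that the scripts are run from the repository root, that they need no command-line arguments (none are shown above), and that both download scripts are wanted. `PIPELINE_STEPS` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script paths copied from the commands listed in this README, in order.
PIPELINE_STEPS: List[str] = [
    "src/Data/long_bench_downloader.py",
    "src/Data/download_documents_sec_filings.py",
    "src/Chunker/document_chunker.py",
    "src/generate_exam.py",
    "src/Solver/solve_exam_rag.py",
    "src/categorise_errors.py",
]


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable = subprocess.run,
) -> List[str]:
    """Run each script in order and return the list of completed steps.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit code, which aborts the loop.
    """
    completed: List[str] = []
    for script in steps:
        # sys.executable reuses the interpreter that runs this driver,
        # so the scripts see the same installed requirements.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs all six stages. Passing a shorter `steps` list, for example `PIPELINE_STEPS[2:]`, skips the downloads when the data is already on disk. The `runner` parameter exists so the sequence can be exercised without launching real processes.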