LiHua-World Dataset

Chinese documentation (中文说明)

LiHua-World is a dataset specifically designed for local RAG (Retrieval-Augmented Generation) scenarios. It contains one year's worth of chat records from a virtual user named LiHua.

Dataset Features

  • Includes three types of questions:
    • Single-hop
    • Multi-hop
    • Summary
  • Each question is accompanied by manually annotated answers and supporting documents
  • The chat records cover various aspects of daily life, including:
    • Social interactions
    • Fitness training
    • Entertainment activities
    • Life affairs
    • ...

Dataset Structure

The dataset mainly consists of the following parts:

1. Original Chat Records (/data)

  • Chat messages organized in chronological order
  • Each message contains:
    • Timestamp
    • Sender
    • Message content
    • Message type
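As a rough illustration of the per-message fields listed above, a record could be modeled as below. This is only a sketch: the class name `ChatMessage`, the attribute names, and the default `"text"` type are assumptions made for the example, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChatMessage:
    """One chat message. Field names are illustrative, not the dataset's schema."""

    timestamp: datetime        # when the message was sent
    sender: str                # who sent it
    content: str               # the message body
    message_type: str = "text" # assumed default; the real type values are not documented here


def sort_chronologically(messages):
    """Return the messages ordered by timestamp, oldest first."""
    return sorted(messages, key=lambda m: m.timestamp)
```

Sorting by `timestamp` mirrors the chronological organization described above; a real loader would first need to parse whatever on-disk format the extracted records use.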

2. Q&A Data (/qa)

  • query_set.csv: Contains the questions, reference answers, and supporting evidence
  • query_set.json: The same content as the CSV file, in JSON format
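A minimal sketch of reading either file with the Python standard library is shown below. The helper names `load_queries_csv` and `load_queries_json` are hypothetical, and the code deliberately makes no assumption about column or key names: it returns whatever headers and structure the files actually contain.

```python
import csv
import json
from pathlib import Path


def load_queries_csv(path):
    """Read the CSV into a list of dicts keyed by the file's own header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_queries_json(path):
    """Parse the JSON file and return its top-level object unchanged."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_queries_csv("qa/query_set.csv")` would give one dict per question; inspecting the keys of the first row is a quick way to discover the real column names.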

3. Metadata

  • User information
  • Time range: January 2026 to December 2026
  • List of conversation participants

Usage Instructions

Step 1. Unzip the LiHuaWorld.zip file in the ./data directory to obtain the original chat records.

Step 2. Use all the chat records in the ./data directory as the knowledge base.

Step 3. Use query_set.csv or query_set.json in the ./qa directory as the question set to run the RAG evaluation.
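Steps 1 and 2 can be sketched with the standard library as follows. The function names are hypothetical, and the sketch assumes the extracted records are text files readable as UTF-8; the layout inside the archive is not described here, so every extracted file is simply treated as one document.

```python
import zipfile
from pathlib import Path


def extract_chat_records(zip_path, out_dir):
    """Step 1: unpack the archive into out_dir and return that directory."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return out_dir


def collect_knowledge_base(data_dir):
    """Step 2: map each extracted file's relative path to its text content."""
    data_dir = Path(data_dir)
    docs = {}
    for path in sorted(data_dir.rglob("*")):
        # Skip directories and the archive itself if it sits in the same folder.
        if path.is_file() and path.suffix != ".zip":
            key = path.relative_to(data_dir).as_posix()
            docs[key] = path.read_text(encoding="utf-8", errors="replace")
    return docs
```

A typical call would be `collect_knowledge_base(extract_chat_records("data/LiHuaWorld.zip", "data"))`; the resulting documents can then be indexed by the RAG system under test and queried with the question set from Step 3.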