Fix: Correct location extraction logic and RAG other mode initialization #380

Open
ARYANPATEL-BIT wants to merge 1 commit into kubeedge:main from ARYANPATEL-BIT:fix/gov-rag-location-extraction

ARYANPATEL-BIT commented Apr 10, 2026

Description

This PR fixes critical data parsing and logic errors in the Government RAG benchmarking example that previously broke location-based filtering and caused the [other] mode to crash.


Changes

1. Location Extraction Overhaul (basemodel.py)

Issue

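  • Used os.path.basename(os.getcwd()) to determine the province
  • This returns the name of the current working directory (e.g., "ianvs" when run from the project root), never a province
  • As a result, location-based RAG filtering was broken for every query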
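
The failure mode can be reproduced in isolation. The following is a minimal sketch using only the standard library; the function name `province_from_cwd` and the temporary directory layout are hypothetical, and only the `os.path.basename(os.getcwd())` expression and the "ianvs" directory name come from the description above.

```python
import os
import tempfile


def province_from_cwd():
    # The approach described above: take the last component of the
    # current working directory and treat it as the province.
    return os.path.basename(os.getcwd())


original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a checkout of the project root.
    project_root = os.path.join(tmp, "ianvs")
    os.mkdir(project_root)
    os.chdir(project_root)
    try:
        detected = province_from_cwd()
    finally:
        # Leave the temporary directory before it is cleaned up.
        os.chdir(original_dir)

# The result depends only on where the process was started,
# not on the query or the dataset.
print(detected)
```

Because the value depends on the launch directory alone, every query would be assigned the same "province" regardless of its content.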

Fix

  • Added _load_locations_from_dataset(data) helper method

Strategy

  • Primary: Reads .jsonl dataset and extracts province from level_4_dim
  • Fallback: Uses NLP-based text matching + internal dictionary to detect and translate Chinese province names
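
The two-step strategy above could look roughly like the sketch below. This is not the code from the PR: the helper names, the `query` field, and the three-entry province dictionary are assumptions made for illustration, and only the `level_4_dim` field and the .jsonl format are taken from the description.

```python
import json

# Hypothetical and deliberately incomplete mapping of Chinese province
# names to English; the dictionary used in the PR may differ.
PROVINCE_NAMES = {"上海": "Shanghai", "北京": "Beijing", "广东": "Guangdong"}


def province_from_record(line):
    """Return the province for one .jsonl line, or None if none is found."""
    record = json.loads(line)

    # Primary: use the level_4_dim field when it is present and non-empty.
    value = record.get("level_4_dim")
    if isinstance(value, str) and value.strip():
        value = value.strip()
        # Translate a Chinese name if it is known, otherwise keep it as-is.
        return PROVINCE_NAMES.get(value, value)

    # Fallback: look for a known province name inside the query text.
    # The "query" field name is an assumption about the dataset schema.
    text = str(record.get("query", ""))
    for chinese, english in PROVINCE_NAMES.items():
        if chinese in text or english.lower() in text.lower():
            return english
    return None


def load_locations(lines):
    """Map each non-empty .jsonl line to a province (or None)."""
    return [province_from_record(line) for line in lines if line.strip()]


sample = [
    json.dumps({"query": "q1", "level_4_dim": "上海"}),
    json.dumps({"query": "北京市的政策是什么?"}),
    json.dumps({"query": "no location mentioned"}),
]
locations = load_locations(sample)
print(locations)
```

Returning None for an undetected location, rather than a default province, keeps a failed lookup visible to the caller.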

2. Fixed Missing all_locations Attribute

Issue

  • [other] mode used self.all_locations without initialization
  • Caused AttributeError

Fix

  • Properly initializes self.all_locations during location extraction
  • Stores all valid provinces dynamically
  • Enables correct inverse filtering logic
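
A minimal sketch of the initialization and inverse filtering described above, assuming a hypothetical `LocationFilter` class and a hypothetical "local" mode name; only the `all_locations` attribute and the [other] mode come from the description.

```python
class LocationFilter:
    def __init__(self):
        # Initialised up front so that [other] mode can never raise
        # AttributeError, even before any dataset has been read.
        self.all_locations = []

    def register(self, provinces):
        """Record each detected province once, preserving first-seen order."""
        seen = set(self.all_locations)
        for province in provinces:
            if province and province not in seen:
                seen.add(province)
                self.all_locations.append(province)

    def locations_for(self, province, mode):
        """Return the provinces whose documents should be searched."""
        if mode == "other":
            # Inverse filtering: every known province except the query's own.
            return [p for p in self.all_locations if p != province]
        return [province]


location_filter = LocationFilter()
location_filter.register(["Shanghai", "Beijing", None, "Shanghai", "Guangdong"])
print(location_filter.locations_for("Shanghai", "other"))
print(location_filter.locations_for("Shanghai", "local"))
```

Skipping None and duplicate entries in `register` means `all_locations` holds only provinces that were actually detected.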

Testing Performed

  • ✅ Verified correct province mapping per query (e.g., "Shanghai")
  • ✅ Confirmed RAG filtering works as expected
  • ✅ Tested [other] mode → no crashes, correct inverted dataset behavior

Type of Change

  • Bug fix (non-breaking change)
  • Performance improvement

Impact

  • Restores correct RAG filtering behavior
  • Fixes [other] mode crash
  • Improves robustness of location detection across environments

Fixes: #379

@kubeedge-bot
Collaborator

Welcome @ARYANPATEL-BIT! It looks like this is your first PR to kubeedge/ianvs 🎉

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ARYANPATEL-BIT
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kubeedge-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Apr 10, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request improves location-based RAG filtering by extracting province information directly from the dataset or query text, replacing the previous unreliable directory-based detection. Review feedback suggests optimizing the location extraction logic by ensuring all_locations only contains detected provinces to avoid inefficient searches and invalid path lookups. Additionally, it is recommended to cache the dataset parsing results to prevent performance degradation during the prediction process.
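
One way to act on both suggestions is sketched below, assuming the dataset path is available as a string. The function name `detected_provinces` is hypothetical, and `functools.lru_cache` is one possible caching choice, not necessarily the one the reviewer had in mind.

```python
import functools
import json
import os
import tempfile


@functools.lru_cache(maxsize=None)
def detected_provinces(dataset_path):
    """Parse the .jsonl file once per path and return the provinces found."""
    provinces = []
    with open(dataset_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            province = json.loads(line).get("level_4_dim")
            # Keep only provinces that actually occur in the dataset.
            if province and province not in provinces:
                provinces.append(province)
    # A tuple is returned so the cached value cannot be mutated by callers.
    return tuple(provinces)


# Small demonstration with a temporary dataset file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as handle:
    for province in ["Shanghai", "Beijing", "Shanghai"]:
        handle.write(json.dumps({"level_4_dim": province}) + "\n")
    path = handle.name

try:
    first = detected_provinces(path)
    second = detected_provinces(path)  # served from the cache, no file read
finally:
    os.remove(path)

cache_hits = detected_provinces.cache_info().hits
print(first, cache_hits)
```

The cache is keyed by path only, so it would not notice a dataset file that changes on disk during a run; that trade-off seems acceptable for a benchmark whose dataset is fixed, but it is an assumption.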

… example

Signed-off-by: Aryan Patel <aryan.patel7291@gmail.com>
ARYANPATEL-BIT force-pushed the fix/gov-rag-location-extraction branch from c02937e to 8adef5d, Apr 10, 2026 19:19
@ARYANPATEL-BIT
Author

/assign @MooreZheng

Development

Successfully merging this pull request may close these issues.

Government RAG Example: Incorrect location extraction in basemodel.py breaks province filtering