
NavQA eval issue #21

@Yixun-Hu

Thank you so much for your wonderful work. Besides #19, I found another issue with this dataset.

Problem Overview of NavQA dataset

All position-related questions in the NavQA dataset have the same ground-truth answer. This is a data annotation error that affects evaluation accuracy.

Cause Analysis

  1. Time Parsing Issue
    Human annotations: timestamps such as 7:57:53 (7:57 AM)
    Actual caption time range: 10:49:45 to 11:03:17 (10:49-11:03 AM)
    Parsed timestamps: 1673873873.0 (corresponding to 7:57:53 AM)
    Problem: parsed timestamps are ~3 hours earlier than the actual caption time range
  2. Caption Index Lookup Failure
    Due to the incorrect time parsing, np.argmax(diff > 0) - 1 returns -1
    All position questions fail to find a corresponding caption
    context_captions becomes an empty list
  3. Fallback Mechanism Issue
    When context_captions is empty, the code uses a fallback mechanism
    All position questions end up using the same caption (index 162)
    This caption has position: [-0.39364902000000007, 0.0023255999999999897, -0.011701379999999999]

Data Flow Analysis

Human CSV annotation → Time parsing → Caption lookup → Position extraction
     ↓                    ↓              ↓                ↓
  "7:57:53"        → 1673873873.0  →  No match    →  Fallback to same position
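
To make the lookup failure concrete, here is a minimal sketch that reproduces the -1 index with the numbers from this issue. It assumes that diff is computed as caption time minus query time, which I have not confirmed against the repository; the caption_times array is hypothetical and only mirrors the 10:49:45 to 11:03:17 window. The UTC-5 offset is my own observation about the epoch value, not something stated in the dataset.

```python
from datetime import datetime, timedelta, timezone

import numpy as np

# Parsed timestamp reported in this issue (from the annotation "7:57:53").
query_ts = 1673873873.0

# The epoch value reads as 12:57:53 in UTC and 07:57:53 at UTC-5.
utc_time = datetime.fromtimestamp(query_ts, tz=timezone.utc).strftime("%H:%M:%S")
est_time = datetime.fromtimestamp(
    query_ts, tz=timezone(timedelta(hours=-5))
).strftime("%H:%M:%S")

# Hypothetical caption timestamps: first caption at 10:49:45 and last at
# 11:03:17 on the same clock as the annotation, i.e. about 3 hours later.
offset = (10 * 3600 + 49 * 60 + 45) - (7 * 3600 + 57 * 60 + 53)  # 10312 s
first_caption = query_ts + offset
caption_times = np.array([first_caption, first_caption + 400.0, first_caption + 812.0])

# Assumed definition of diff: caption time minus query time.
diff = caption_times - query_ts

# Every entry of (diff > 0) is True, so argmax returns 0 and idx is -1.
idx = int(np.argmax(diff > 0)) - 1
print(utc_time, est_time, idx)
```

Note that in Python an index of -1 does not raise; it silently selects the last element of a list or array, so any code path that uses this value directly would not fail visibly.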

Code Location

File: remembr/scripts/question_scripts/form_question_jsons.py
Function: parse_answer() (lines 60-64):

elif q_type == 'position':
    if len(context) == 1:
        out_dict = {
            'position': context[0]['position']  # Always same position
        }
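
Since parse_answer() uses whatever single context entry it is given, one possible code-level safeguard is to make the upstream lookup fail loudly when a timestamp falls outside the caption range. The sketch below is not the repository's code: find_caption_index is a hypothetical helper, it uses np.searchsorted in place of the np.argmax expression, and it assumes caption_times is sorted in ascending order.

```python
import numpy as np


def find_caption_index(caption_times, query_ts):
    """Return the index of the last caption at or before query_ts.

    Hypothetical helper. Raises ValueError when query_ts lies outside the
    caption time range, instead of returning -1 or using a default caption.
    """
    caption_times = np.asarray(caption_times, dtype=float)
    if caption_times.size == 0:
        raise ValueError("no captions available")
    if query_ts < caption_times[0] or query_ts > caption_times[-1]:
        raise ValueError(
            f"timestamp {query_ts} is outside the caption range "
            f"[{caption_times[0]}, {caption_times[-1]}]"
        )
    # side="right" counts the captions whose time is <= query_ts.
    return int(np.searchsorted(caption_times, query_ts, side="right")) - 1
```

With a guard like this, a mis-parsed annotation would surface as an error while the question JSONs are built, rather than as a plausible-looking but wrong position.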

Impact

Evaluation accuracy: all position questions share the same ground-truth position, so scores on these questions are unreliable

Current Status

The script runs without errors but selects the same caption (index 162) for every position question, with a position error of ~200 m.
This is a data quality issue that needs to be addressed at the annotation level rather than just the code level.
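
As a starting point for an annotation-level check, something like the following could flag annotated times that fall outside the caption window before any lookup is attempted. This is only a sketch: to_seconds and out_of_window are hypothetical helper names, "10:55:00" is a made-up in-range example, and it assumes annotations and captions are expressed on the same clock.

```python
def to_seconds(hms):
    """Convert an "H:MM:SS" string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def out_of_window(annotated_times, window_start, window_end):
    """Return the annotated times that fall outside [window_start, window_end]."""
    lo, hi = to_seconds(window_start), to_seconds(window_end)
    return [t for t in annotated_times if not lo <= to_seconds(t) <= hi]


# "7:57:53" is the annotation from this issue; "10:55:00" is hypothetical.
flagged = out_of_window(["7:57:53", "10:55:00"], "10:49:45", "11:03:17")
print(flagged)  # ['7:57:53']
```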
