
Inconsistent Citation Results Across Identical Queries with Multi-Document Sets #509

@paullizer

Description


When submitting the same query multiple times across a multi-document set, the system returns inconsistent citation results. For example:

  • First query: Returns citations from Document A (searching across 5 documents)
  • Second query (identical): Returns citations from Documents B and C, but excludes Document A (searching across the same 5 documents)

This inconsistency undermines user confidence in the retrieval system and makes results unpredictable.

Expected Behavior

Identical queries submitted against the same multi-document set should return consistent citation results, maintaining deterministic ranking and document selection across requests.

Current Behavior

  • Same query produces different citation sets on subsequent executions
  • Documents that appear in first query results may be excluded in second query results
  • Citation ordering and relevance scores appear non-deterministic

Root Cause Analysis

After investigating the codebase, several factors likely contribute to this inconsistency:

1. Azure AI Search Semantic Ranking Non-Determinism

Location: functions_search.py (Lines 1-280)

The hybrid search uses Azure AI Search with semantic ranking enabled:

query_type="semantic",
semantic_configuration_name="nexus-user-index-semantic-configuration",
query_caption="extractive",
query_answer="extractive",

Issue: Azure AI Search's semantic ranker can produce slightly different scores across identical queries due to:

  • Internal model variations
  • Non-deterministic tie-breaking when scores are similar
  • Semantic reranking behavior that may vary slightly per request
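If strict reproducibility matters more than semantic relevance for some callers, one mitigation is to make semantic reranking opt-out. The sketch below is hypothetical (the helper name and `deterministic` flag are not in the codebase), but the parameter names match the snippet above and the valid `query_type` values in the Azure AI Search Python SDK:

```python
# Hypothetical helper: build kwargs for SearchClient.search() with semantic
# reranking toggled. query_type="simple" trades semantic relevance for
# reproducible BM25/vector scoring with no reranker in the loop.
def build_search_kwargs(query, deterministic=False):
    kwargs = {"search_text": query}
    if deterministic:
        kwargs["query_type"] = "simple"
    else:
        kwargs.update(
            query_type="semantic",
            semantic_configuration_name="nexus-user-index-semantic-configuration",
            query_caption="extractive",
            query_answer="extractive",
        )
    return kwargs
```
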

2. Multi-Index Result Merging Without Normalization

Location: functions_search.py (Lines 133-139)

When doc_scope="all", results are merged from three separate indexes:

user_results_final = extract_search_results(user_results, top_n)
group_results_final = extract_search_results(group_results, top_n)
public_results_final = extract_search_results(public_results, top_n)
results = user_results_final + group_results_final + public_results_final

Issue:

  • Scores from different indexes may be on different scales
  • No score normalization occurs before merging
  • Final sorting (Line 258) treats all scores as directly comparable when they may not be
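A toy example (scores invented for illustration) shows how merging unnormalized scores lets one index dominate:

```python
# Suppose the user index returns raw scores around 30 while the public index
# returns semantic reranker scores in [0, 4]. Sorting the merged list on raw
# values ranks every user-index hit above every public-index hit, regardless
# of how relevant each document was within its own index.
user_results = [{"file_name": "a.pdf", "score": 31.2},
                {"file_name": "b.pdf", "score": 29.8}]
public_results = [{"file_name": "c.pdf", "score": 3.9},
                  {"file_name": "d.pdf", "score": 0.4}]

merged = sorted(user_results + public_results,
                key=lambda x: x["score"], reverse=True)
top = [r["file_name"] for r in merged]
# → ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]: c.pdf loses to both user-index
# hits even though it scored near the top of its own index's range.
```
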

3. Score-Based Sorting Without Secondary Sort Keys

Location: functions_search.py (Line 258)

results = sorted(results, key=lambda x: x['score'], reverse=True)[:top_n]

Issue:

  • When multiple documents have identical or near-identical scores, their relative order is effectively undefined
  • Python's sort is stable, so tied items keep their input order; that input order depends on how the three indexes happen to respond, which varies across requests
  • No secondary sort criteria (e.g., document_id, timestamp, file_name) to ensure deterministic ordering
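The failure mode and the fix fit in a few lines (field names follow the snippets in this issue; the data is invented):

```python
# Two runs return the same documents at tied scores, but in different orders,
# as happens when three indexes respond independently.
run_1 = [{"file_name": "b.pdf", "chunk_sequence": 2, "score": 0.9},
         {"file_name": "a.pdf", "chunk_sequence": 1, "score": 0.9}]
run_2 = list(reversed(run_1))

def deterministic_sort(results):
    # Secondary (file_name) and tertiary (chunk_sequence) keys break ties
    # the same way on every request, independent of input order.
    return sorted(results,
                  key=lambda x: (-x["score"], x["file_name"], x["chunk_sequence"]))

# Sorting on score alone preserves the (varying) input order for ties;
# the composite key yields identical output for both runs.
assert deterministic_sort(run_1) == deterministic_sort(run_2)
```
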

4. No Result Caching or Request Deduplication

Location: route_backend_chats.py (Lines 559-639)

Each request executes a fresh search with no caching:

search_results = hybrid_search(**search_args)

Issue:

  • Every request hits Azure AI Search independently
  • No mechanism to detect and return cached results for identical queries
  • No session-based consistency guarantee

Impact

  • User Trust: Inconsistent results reduce confidence in the system
  • Reproducibility: Users cannot reliably reference or share specific query results
  • Testing/Validation: Difficult to validate system accuracy when results vary
  • Enterprise Adoption: Organizations require consistent behavior for compliance and audit purposes

Affected Components

  • functions_search.py - hybrid_search() function
  • functions_search.py - extract_search_results() function
  • route_backend_chats.py - Chat endpoint search integration
  • functions_conversation_metadata.py - Citation tracking
  • route_backend_documents.py - /api/get_citation endpoint

Proposed Solutions

Option 1: Deterministic Sorting (Quick Fix)

Add secondary sort keys to ensure consistent ordering when scores are equal:

# In functions_search.py, line 258
results = sorted(
    results, 
    key=lambda x: (
        -x['score'],  # Primary: score (descending)
        x['file_name'],  # Secondary: filename (ascending)
        x['chunk_sequence']  # Tertiary: chunk order (ascending)
    )
)[:top_n]

Option 2: Score Normalization (Medium Fix)

Normalize scores from different indexes before merging:

def normalize_scores(results, min_score=0.0, max_score=1.0):
    """Min-max normalize search scores in place to a consistent range."""
    if not results:
        return results

    scores = [r['score'] for r in results]
    min_s, max_s = min(scores), max(scores)
    range_s = max_s - min_s if max_s > min_s else 1.0

    for r in results:
        # Keep the raw score for debugging, and overwrite 'score' so the
        # final sort (line 258) actually compares normalized values.
        r['original_score'] = r['score']
        r['score'] = min_score + ((r['original_score'] - min_s) / range_s) * (max_score - min_score)

    return results

# Apply normalization to each index's results before merging
user_results_final = normalize_scores(extract_search_results(user_results, top_n))
group_results_final = normalize_scores(extract_search_results(group_results, top_n))
public_results_final = normalize_scores(extract_search_results(public_results, top_n))

Option 3: Result Caching (Comprehensive Fix)

Implement query-based caching to return identical results for identical queries within a session:

import hashlib
import time

def generate_search_cache_key(query, user_id, document_id, doc_scope,
                              active_group_id, active_public_workspace_id,
                              enable_file_sharing, top_n):
    """Generate a cache key covering every parameter that affects results."""
    key_data = (f"{query}|{user_id}|{document_id}|{doc_scope}|{active_group_id}|"
                f"{active_public_workspace_id}|{enable_file_sharing}|{top_n}")
    return hashlib.sha256(key_data.encode()).hexdigest()

# Simple in-memory cache with time-based expiration (e.g., 5 minutes)
search_cache = {}
cache_ttl = 300  # seconds

def cached_hybrid_search(query, user_id, document_id=None, top_n=12, doc_scope="all",
                         active_group_id=None, active_public_workspace_id=None,
                         enable_file_sharing=True):
    cache_key = generate_search_cache_key(query, user_id, document_id, doc_scope,
                                          active_group_id, active_public_workspace_id,
                                          enable_file_sharing, top_n)

    # Check cache
    if cache_key in search_cache:
        cached_result, cached_time = search_cache[cache_key]
        if time.time() - cached_time < cache_ttl:
            return cached_result

    # Execute search
    results = hybrid_search(query, user_id, document_id, top_n, doc_scope,
                            active_group_id, active_public_workspace_id,
                            enable_file_sharing)

    # Store in cache
    search_cache[cache_key] = (results, time.time())

    return results
