
Inconsistent Citation Results Across Identical Queries with Multi-Document Sets #509

@paullizer

Description


When submitting the same query multiple times across a multi-document set, the system returns inconsistent citation results. For example:

  • First query: Returns citations from Document A (searching across 5 documents)
  • Second query (identical): Returns citations from Documents B and C, but excludes Document A (searching across the same 5 documents)

This inconsistency undermines user confidence in the retrieval system and makes results unpredictable.

Expected Behavior

Identical queries submitted against the same multi-document set should return consistent citation results, maintaining deterministic ranking and document selection across requests.

Current Behavior

  • Same query produces different citation sets on subsequent executions
  • Documents that appear in first query results may be excluded in second query results
  • Citation ordering and relevance scores appear non-deterministic

Root Cause Analysis

After investigating the codebase, several factors likely contribute to this inconsistency:

1. Azure AI Search Semantic Ranking Non-Determinism

Location: functions_search.py (Lines 1-280)

The hybrid search uses Azure AI Search with semantic ranking enabled:

query_type="semantic",
semantic_configuration_name="nexus-user-index-semantic-configuration",
query_caption="extractive",
query_answer="extractive",

Issue: Azure AI Search's semantic ranker can produce slightly different scores across identical queries due to:

  • Internal model variations
  • Non-deterministic tie-breaking when scores are similar
  • Semantic reranking behavior that may vary slightly per request
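If strict reproducibility matters more than semantic relevance for some callers, one mitigation is to make semantic reranking opt-out. The sketch below is hypothetical (the helper name and `deterministic` flag are not in the codebase), but the parameter names match the snippet above and the valid `query_type` values in the Azure AI Search Python SDK:

```python
# Hypothetical helper: build kwargs for SearchClient.search() with semantic
# reranking toggled. query_type="simple" trades semantic relevance for
# reproducible BM25/vector scoring with no reranker in the loop.
def build_search_kwargs(query, deterministic=False):
    kwargs = {"search_text": query}
    if deterministic:
        kwargs["query_type"] = "simple"
    else:
        kwargs.update(
            query_type="semantic",
            semantic_configuration_name="nexus-user-index-semantic-configuration",
            query_caption="extractive",
            query_answer="extractive",
        )
    return kwargs
```
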

2. Multi-Index Result Merging Without Normalization

Location: functions_search.py (Lines 133-139)

When doc_scope="all", results are merged from three separate indexes:

user_results_final = extract_search_results(user_results, top_n)
group_results_final = extract_search_results(group_results, top_n)
public_results_final = extract_search_results(public_results, top_n)
results = user_results_final + group_results_final + public_results_final

Issue:

  • Scores from different indexes may be on different scales
  • No score normalization occurs before merging
  • Final sorting (Line 258) treats all scores as directly comparable when they may not be
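A toy example (scores invented for illustration) shows how merging unnormalized scores lets one index dominate:

```python
# Suppose the user index returns raw scores around 30 while the public index
# returns semantic reranker scores in [0, 4]. Sorting the merged list on raw
# values ranks every user-index hit above every public-index hit, regardless
# of how relevant each document was within its own index.
user_results = [{"file_name": "a.pdf", "score": 31.2},
                {"file_name": "b.pdf", "score": 29.8}]
public_results = [{"file_name": "c.pdf", "score": 3.9},
                  {"file_name": "d.pdf", "score": 0.4}]

merged = sorted(user_results + public_results,
                key=lambda x: x["score"], reverse=True)
top = [r["file_name"] for r in merged]
# → ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]: c.pdf loses to both user-index
# hits even though it scored near the top of its own index's range.
```
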

3. Score-Based Sorting Without Secondary Sort Keys

Location: functions_search.py (Line 258)

results = sorted(results, key=lambda x: x['score'], reverse=True)[:top_n]

Issue:

  • When multiple documents have identical or near-identical scores, their relative order is effectively undefined
  • Python's sort is stable, so tied items keep their input order; that input order depends on how the three indexes happen to respond, which varies across requests
  • No secondary sort criteria (e.g., document_id, timestamp, file_name) to ensure deterministic ordering
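The failure mode and the fix fit in a few lines (field names follow the snippets in this issue; the data is invented):

```python
# Two runs return the same documents at tied scores, but in different orders,
# as happens when three indexes respond independently.
run_1 = [{"file_name": "b.pdf", "chunk_sequence": 2, "score": 0.9},
         {"file_name": "a.pdf", "chunk_sequence": 1, "score": 0.9}]
run_2 = list(reversed(run_1))

def deterministic_sort(results):
    # Secondary (file_name) and tertiary (chunk_sequence) keys break ties
    # the same way on every request, independent of input order.
    return sorted(results,
                  key=lambda x: (-x["score"], x["file_name"], x["chunk_sequence"]))

# Sorting on score alone preserves the (varying) input order for ties;
# the composite key yields identical output for both runs.
assert deterministic_sort(run_1) == deterministic_sort(run_2)
```
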

4. No Result Caching or Request Deduplication

Location: route_backend_chats.py (Lines 559-639)

Each request executes a fresh search with no caching:

search_results = hybrid_search(**search_args)

Issue:

  • Every request hits Azure AI Search independently
  • No mechanism to detect and return cached results for identical queries
  • No session-based consistency guarantee

Impact

  • User Trust: Inconsistent results reduce confidence in the system
  • Reproducibility: Users cannot reliably reference or share specific query results
  • Testing/Validation: Difficult to validate system accuracy when results vary
  • Enterprise Adoption: Organizations require consistent behavior for compliance and audit purposes

Affected Components

  • functions_search.py - hybrid_search() function
  • functions_search.py - extract_search_results() function
  • route_backend_chats.py - Chat endpoint search integration
  • functions_conversation_metadata.py - Citation tracking
  • route_backend_documents.py - /api/get_citation endpoint

Proposed Solutions

Option 1: Deterministic Sorting (Quick Fix)

Add secondary sort keys to ensure consistent ordering when scores are equal:

# In functions_search.py, line 258
results = sorted(
    results, 
    key=lambda x: (
        -x['score'],  # Primary: score (descending)
        x['file_name'],  # Secondary: filename (ascending)
        x['chunk_sequence']  # Tertiary: chunk order (ascending)
    )
)[:top_n]

Option 2: Score Normalization (Medium Fix)

Normalize scores from different indexes before merging:

def normalize_scores(results, min_score=0.0, max_score=1.0):
    """Min-max normalize search scores in place to a consistent range."""
    if not results:
        return results

    scores = [r['score'] for r in results]
    min_s, max_s = min(scores), max(scores)
    range_s = max_s - min_s if max_s > min_s else 1.0

    for r in results:
        # Keep the raw score for debugging, and overwrite 'score' so the
        # final sort (line 258) actually compares normalized values.
        r['original_score'] = r['score']
        r['score'] = min_score + ((r['original_score'] - min_s) / range_s) * (max_score - min_score)

    return results

# Apply normalization to each index's results before merging
user_results_final = normalize_scores(extract_search_results(user_results, top_n))
group_results_final = normalize_scores(extract_search_results(group_results, top_n))
public_results_final = normalize_scores(extract_search_results(public_results, top_n))

Option 3: Result Caching (Comprehensive Fix)

Implement query-based caching to return identical results for identical queries within a session:

import hashlib
import time

def generate_search_cache_key(query, user_id, document_id, doc_scope,
                              active_group_id, active_public_workspace_id,
                              enable_file_sharing, top_n):
    """Generate a cache key covering every parameter that affects results."""
    key_data = (f"{query}|{user_id}|{document_id}|{doc_scope}|{active_group_id}|"
                f"{active_public_workspace_id}|{enable_file_sharing}|{top_n}")
    return hashlib.sha256(key_data.encode()).hexdigest()

# Simple in-memory cache with time-based expiration (e.g., 5 minutes)
search_cache = {}
cache_ttl = 300  # seconds

def cached_hybrid_search(query, user_id, document_id=None, top_n=12, doc_scope="all",
                         active_group_id=None, active_public_workspace_id=None,
                         enable_file_sharing=True):
    cache_key = generate_search_cache_key(query, user_id, document_id, doc_scope,
                                          active_group_id, active_public_workspace_id,
                                          enable_file_sharing, top_n)

    # Check cache
    if cache_key in search_cache:
        cached_result, cached_time = search_cache[cache_key]
        if time.time() - cached_time < cache_ttl:
            return cached_result

    # Execute search
    results = hybrid_search(query, user_id, document_id, top_n, doc_scope,
                            active_group_id, active_public_workspace_id,
                            enable_file_sharing)

    # Store in cache
    search_cache[cache_key] = (results, time.time())

    return results
