PostgreSQL/MSSQL: id_columns case sensitivity causes all documents to receive the same _id #3884

@dennis-tismenko

Bug Description

When using advanced sync rules with the id_columns option in PostgreSQL or MSSQL connectors, documents may all receive the same _id, causing each document to overwrite the previous one. This results in only 1 document being indexed instead of the expected count.

The root cause is a case sensitivity mismatch between how id_columns are transformed and how column names are mapped internally:

  • map_column_names() applies .lower() to all column names
  • The id_columns transformation in get_docs() does not apply .lower()

This causes hash_id() to fail when looking up primary key values because the keys don't match (e.g., "dbo_COAXIS_RV_CALL_Call_No" vs "dbo_coaxis_rv_call_call_no").
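The mismatch can be sketched in isolation. The helpers below (`build_key`, `hash_id_sketch`) are hypothetical stand-ins, not the connector's actual code, and the sketch assumes that a failed key lookup contributes an empty value to the hash; the real `hash_id()` may degrade differently.

```python
import hashlib


def build_key(schema, table, column, lowercase):
    """Build a prefixed column key such as 'dbo_COAXIS_RV_CALL_Call_No'."""
    key = f"{schema}_{table}_{column}"
    return key.lower() if lowercase else key


def hash_id_sketch(row, id_keys):
    """Hash the values found under id_keys; a missing key contributes ''.

    Assumption: this is how a failed lookup degrades. The real hash_id()
    may behave differently.
    """
    values = [str(row.get(key, "")) for key in id_keys]
    return hashlib.md5(";".join(values).encode("utf-8")).hexdigest()


# Rows as they might look after every column name has been lowercased.
rows = [
    {build_key("dbo", "COAXIS_RV_CALL", "Call_No", lowercase=True): call_no}
    for call_no in range(1, 6)
]

# id_columns keys built without .lower() (buggy) and with it (fixed).
buggy_keys = [build_key("dbo", "COAXIS_RV_CALL", "Call_No", lowercase=False)]
fixed_keys = [build_key("dbo", "COAXIS_RV_CALL", "Call_No", lowercase=True)]

# Collect the distinct ids produced for the five rows in each case.
buggy_ids = {hash_id_sketch(row, buggy_keys) for row in rows}
fixed_ids = {hash_id_sketch(row, fixed_keys) for row in rows}
```

Under these assumptions, `buggy_ids` collapses to a single value because no lookup ever matches, while `fixed_ids` holds one distinct value per row.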

Additionally, the MSSQL connector doesn't sort tables when building the id_columns prefix, but map_column_names() does sort them, causing a secondary mismatch for multi-table queries.
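A minimal sketch of the ordering mismatch follows. The prefix format (schema and table names joined with underscores) is an assumption extrapolated from the single-table key shown above, and `prefix_sorted` / `prefix_unsorted` are hypothetical names used only for illustration.

```python
def prefix_sorted(schema, tables):
    """Prefix built from sorted table names (the behaviour described for map_column_names())."""
    return f"{schema}_{'_'.join(sorted(tables))}_".lower()


def prefix_unsorted(schema, tables):
    """Prefix built from table names in their configured order."""
    return f"{schema}_{'_'.join(tables)}_".lower()


# Hypothetical multi-table rule whose tables are not listed alphabetically.
tables = ["orders", "customers"]

row_key = prefix_sorted("dbo", tables) + "id"
id_key = prefix_unsorted("dbo", tables) + "id"

# Even with lowercasing applied on both sides, the two keys differ.
keys_match = row_key == id_key
```

With a single table, or with tables already listed in sorted order, both prefixes coincide, which would explain why this mismatch only appears for some multi-table queries.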

To Reproduce

  1. Create a database view without a primary key (e.g., COAXIS_RV_CALL in MSSQL)
  2. Configure an MSSQL connector with advanced sync rules:
   [
       {
           "tables": ["COAXIS_RV_CALL"],
           "query": "SELECT * FROM COAXIS_RV_CALL",
           "id_columns": ["Call_No"]
       }
   ]
  3. Run a full sync
  4. Observe in logs:
  • docs_extracted: N (e.g., 5)
  • bulk_item_responses._id_duplicates: N-1 (e.g., 4)
  • indexed_document_count: 1
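The relationship between these three counters can be illustrated with a small simulation. This is only a sketch of overwrite-by-`_id` semantics using a dictionary; `simulate_bulk_index` is a hypothetical helper, not part of the connector framework.

```python
def simulate_bulk_index(doc_ids):
    """Index documents by _id, counting how many overwrite an earlier one."""
    index = {}
    duplicates = 0
    for position, doc_id in enumerate(doc_ids):
        if doc_id in index:
            duplicates += 1  # same _id seen before: this document overwrites it
        index[doc_id] = position
    return {
        "docs_extracted": len(doc_ids),
        "_id_duplicates": duplicates,
        "indexed_document_count": len(index),
    }


# Five extracted documents that all received the same _id.
stats = simulate_bulk_index(["same-id"] * 5)
```

When every document shares one `_id`, the duplicate count is always one less than the number extracted and exactly one document survives, matching the pattern in the logs above.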

Expected behavior

All documents should be indexed with unique _id values based on the specified
