Bug Description
When using advanced sync rules with the id_columns option in PostgreSQL or MSSQL connectors, documents may all receive the same _id, causing each document to overwrite the previous one. This results in only 1 document being indexed instead of the expected count.
The root cause is a case sensitivity mismatch between how id_columns are transformed and how column names are mapped internally:
- map_column_names() applies .lower() to all column names
- The id_columns transformation in get_docs() does not apply .lower()
This causes hash_id() to fail when looking up primary key values because the keys don't match (e.g., "dbo_COAXIS_RV_CALL_Call_No" vs "dbo_coaxis_rv_call_call_no").
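The case mismatch can be reproduced in isolation with a minimal sketch. The helpers below are simplified, hypothetical stand-ins and not the connector's actual code: their names, the prefix format, and the assumption that a key missing from the row is silently skipped when building the ID are all illustrative.

```python
def mapped_name(schema, tables, column):
    # Assumed shape of the internal mapping:
    # schema + sorted table names + column, all lowercased.
    return f"{schema}_{'_'.join(sorted(tables))}_{column}".lower()


def id_column_key(schema, tables, column, lowercase):
    # Same prefix, but lowercasing is optional so both behaviours can be compared.
    key = f"{schema}_{'_'.join(sorted(tables))}_{column}"
    return key.lower() if lowercase else key


def doc_id(tables, row, keys):
    # Assumption: keys that are not present in the row contribute nothing.
    values = [str(row[k]) for k in keys if k in row]
    return "_".join(sorted(tables) + values)


tables = ["COAXIS_RV_CALL"]
# Three rows whose keys went through the lowercasing mapping.
rows = [{mapped_name("dbo", tables, "Call_No"): n} for n in (101, 102, 103)]

buggy_keys = [id_column_key("dbo", tables, "Call_No", lowercase=False)]
fixed_keys = [id_column_key("dbo", tables, "Call_No", lowercase=True)]

buggy_ids = {doc_id(tables, row, buggy_keys) for row in rows}
fixed_ids = {doc_id(tables, row, fixed_keys) for row in rows}

print(len(buggy_ids), len(fixed_ids))  # 1 3
```

Under these assumptions, the non-lowercased key never matches a row key, so every row collapses to the same ID, which mirrors the single indexed document reported in this issue.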
Additionally, the MSSQL connector doesn't sort tables when building the id_columns prefix, but map_column_names() does sort them, causing a secondary mismatch for multi-table queries.
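The table-ordering mismatch can be sketched the same way. The `build_prefix` helper, the prefix format, and the table names are hypothetical and only illustrate why an unsorted prefix stops matching a sorted one once a query involves more than one table.

```python
def build_prefix(schema, tables, sort_tables):
    # Hypothetical helper: joins schema and table names into a column prefix.
    names = sorted(tables) if sort_tables else list(tables)
    return f"{schema}_{'_'.join(names)}_".lower()


# Illustrative multi-table rule; the table names are made up.
multi = ["orders", "customers"]
sorted_prefix = build_prefix("dbo", multi, sort_tables=True)
unsorted_prefix = build_prefix("dbo", multi, sort_tables=False)
print(sorted_prefix)    # dbo_customers_orders_
print(unsorted_prefix)  # dbo_orders_customers_

# With a single table, sorting makes no difference, so the mismatch stays hidden.
single = ["COAXIS_RV_CALL"]
same = build_prefix("dbo", single, True) == build_prefix("dbo", single, False)
print(same)  # True
```

This is why the ordering problem only surfaces for multi-table queries, and only when the tables are not already listed in sorted order.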
To Reproduce
- Create a database view without a primary key (e.g., COAXIS_RV_CALL in MSSQL)
- Configure an MSSQL connector with advanced sync rules:
[
  {
    "tables": ["COAXIS_RV_CALL"],
    "query": "SELECT * FROM COAXIS_RV_CALL",
    "id_columns": ["Call_No"]
  }
]
- Run a full sync
- Observe in logs:
  - docs_extracted: N (e.g., 5)
  - bulk_item_responses._id_duplicates: N-1 (e.g., 4)
  - indexed_document_count: 1
Expected behavior
All documents should be indexed with unique _id values based on the specified