Closed
Description
SynapseML version
1.0.9
System information
- Language version (e.g. python 3.8, scala 2.12): Python 3.10.12, Scala 2.12.17
- Spark Version (e.g. 3.2.3): 3.4.3
- Spark Platform (e.g. Synapse, Databricks): Microsoft Fabric
Describe the problem
When writing data to an existing Azure Search index that contains scoring profiles, the operation fails with a spray.json.DeserializationException because the JSON parser expects scoring profiles to be simple strings but Azure Search returns complex JSON objects.
Root cause:
- AzureSearchSchemas.scala defines scoringProfiles: Option[Seq[String]]
- But Azure Search actually returns complex objects with functionAggregation, functions, text, etc.
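The mismatch can be illustrated with a small Python analogue. This is only a sketch of the failure mode, not the SynapseML or spray-json code: the function names and the `SAMPLE_INDEX_JSON` constant are hypothetical, and the payload is the example index definition from the reproduction below.

```python
import json

# Hypothetical sample payload, modelled on the index definition in the repro.
SAMPLE_INDEX_JSON = json.dumps({
    "name": "my-index",
    "fields": [],
    "scoringProfiles": [{
        "name": "freshness_boost",
        "functionAggregation": "sum",
        "functions": [{
            "type": "freshness",
            "boost": 2.0,
            "fieldName": "date",
            "interpolation": "constant",
            "freshness": {"boostingDuration": "P1D"},
        }],
    }],
})


def parse_profiles_as_strings(index_json):
    """Analogue of a schema declaring scoringProfiles as a list of strings."""
    profiles = json.loads(index_json).get("scoringProfiles")
    if profiles is None:
        return None
    for profile in profiles:
        if not isinstance(profile, str):
            # Mirrors the "Expected String ..., but got {...}" style of failure.
            raise TypeError(f"Expected string, but got {type(profile).__name__}")
    return profiles


def parse_profiles_as_objects(index_json):
    """Analogue of a schema that accepts scoring profiles as JSON objects."""
    profiles = json.loads(index_json).get("scoringProfiles")
    if profiles is None:
        return None
    for profile in profiles:
        if not isinstance(profile, dict):
            raise TypeError(f"Expected object, but got {type(profile).__name__}")
    return profiles
```

With the sample payload, the string-based parser raises while the object-based parser returns the profile list, which is the behaviour a fix to the schema type would presumably need.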
Expected behavior:
Writing data to an index with scoring profiles should work without parsing errors.
Actual behavior:
Operation fails with DeserializationException when trying to parse the index definition.
Impact:
This prevents users from writing data to any Azure Search index that has scoring profiles configured, which is a common production scenario for relevance tuning.
Current workaround:
- Create indexes without scoring profiles when using SynapseML
- Add scoring profiles later via Azure Portal/REST API after data is written
- Or recreate the index without scoring profiles each time
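The workaround amounts to setting the scoring profiles aside and putting them back after the write. A minimal sketch of that bookkeeping on an index definition already loaded as a Python dict follows. The helper names are hypothetical, sending the definitions to the service (Portal or REST API) is not shown, and treating `defaultScoringProfile` as a key that must travel with the profiles is an assumption about the index definition.

```python
import copy

# Keys set aside together; "defaultScoringProfile" is assumed to reference
# a profile by name, so it is removed along with the profiles themselves.
PROFILE_KEYS = ("scoringProfiles", "defaultScoringProfile")


def split_scoring_profiles(index_definition):
    """Return (definition without scoring profiles, the removed entries)."""
    stripped = copy.deepcopy(index_definition)
    set_aside = {}
    for key in PROFILE_KEYS:
        if key in stripped:
            set_aside[key] = stripped.pop(key)
    return stripped, set_aside


def restore_scoring_profiles(index_definition, set_aside):
    """Return a copy of the definition with the removed entries added back."""
    restored = copy.deepcopy(index_definition)
    restored.update(copy.deepcopy(set_aside))
    return restored
```

The stripped definition would be used to create the index before writing, and the restored one to update the index afterwards. Neither helper mutates its input.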
Code to reproduce issue
from synapse.ml.services import *
from pyspark.sql import functions as F
AZURE_SEARCH_SUBSCRIPTION_KEY = "<your-subscription-key>"
AZURE_SEARCH_SERVICE_NAME = "<your-service-name>"
AZURE_SEARCH_INDEX_NAME = "existing-index-with-scoring-profiles"
# Create simple test DataFrame
test_df = spark.createDataFrame([
    ("TEST01", "item1", "2025-05-15"),
    ("TEST02", "item2", "2025-05-20")
], ["id", "name", "date"])
test_df = test_df.withColumn("SearchAction", F.lit("upload"))
# Assume you have an existing Azure Search index that contains scoring profiles like:
# {
#   "name": "my-index",
#   "fields": [...],
#   "scoringProfiles": [{
#     "name": "freshness_boost",
#     "functionAggregation": "sum",
#     "functions": [{
#       "type": "freshness",
#       "boost": 2.0,
#       "fieldName": "date",
#       "interpolation": "constant",
#       "freshness": {"boostingDuration": "P1D"}
#     }]
#   }]
# }
# This FAILS with DeserializationException when the index has scoring profiles
try:
    test_df.writeToAzureSearch(
        subscriptionKey=AZURE_SEARCH_SUBSCRIPTION_KEY,
        serviceName=AZURE_SEARCH_SERVICE_NAME,
        indexName=AZURE_SEARCH_INDEX_NAME,  # Index with scoring profiles
        keyCol="id",
        actionCol="SearchAction"
    )
except Exception as e:
    print(f"Error: {e}")
# Error: spray.json.DeserializationException: Expected String as JsString, but got {complex scoring profile object}
# WORKAROUND: Create/use an index without scoring profiles
AZURE_SEARCH_INDEX_NAME_NO_PROFILES = "same-index-no-scoring-profiles"
# This works when the index has no scoring profiles
test_df.writeToAzureSearch(
    subscriptionKey=AZURE_SEARCH_SUBSCRIPTION_KEY,
    serviceName=AZURE_SEARCH_SERVICE_NAME,
    indexName=AZURE_SEARCH_INDEX_NAME_NO_PROFILES,  # Index without scoring profiles
    keyCol="id",
    actionCol="SearchAction"
)
# Note: You can add scoring profiles to the index later via Azure Portal/REST API
Other info / logs
No response
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue
What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations