
[BUG] Azure Search: DeserializationException when index contains scoring profiles #2382

@DennizSvens

Description

SynapseML version

1.0.9

System information

  • Language version (e.g. python 3.8, scala 2.12): Python 3.10.12, Scala 2.12.17
  • Spark Version (e.g. 3.2.3): 3.4.3
  • Spark Platform (e.g. Synapse, Databricks): Microsoft Fabric

Describe the problem

When writing data to an existing Azure Search index that contains scoring profiles, the operation fails with a spray.json.DeserializationException. SynapseML's JSON model of the index definition expects each scoring profile to be a plain string, but Azure Search returns each profile as a JSON object.

Root cause:

  • AzureSearchSchemas.scala declares scoringProfiles: Option[Seq[String]], i.e. a list of plain strings
  • The Azure Search REST API, however, returns each scoring profile as an object with properties such as name, text, functions, and functionAggregation
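The mismatch can be reproduced outside Spark. The sketch below is plain Python standing in for the Scala/spray-json code: `parse_profiles_as_strings` imitates what an `Option[Seq[String]]` field accepts, and `ScoringProfile` is a hypothetical object-shaped model using the property names Azure Search returns. Neither is part of SynapseML; an actual fix would presumably replace `Seq[String]` with a case class (plus a matching spray-json format) in AzureSearchSchemas.scala. An index without profiles typically comes back with an empty `scoringProfiles` array, which either model accepts, which is consistent with the workaround below succeeding.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Trimmed index definition (fields omitted) shaped like the example in the
# reproduction code below.
INDEX_JSON = """
{
  "name": "my-index",
  "scoringProfiles": [{
    "name": "freshness_boost",
    "functionAggregation": "sum",
    "functions": [{
      "type": "freshness",
      "boost": 2.0,
      "fieldName": "date",
      "interpolation": "constant",
      "freshness": {"boostingDuration": "P1D"}
    }]
  }]
}
"""


def parse_profiles_as_strings(index_def: dict) -> Optional[list]:
    """Imitates Option[Seq[String]]: every element must be a JSON string."""
    profiles = index_def.get("scoringProfiles")
    if profiles is None:
        return None
    for p in profiles:
        if not isinstance(p, str):
            raise TypeError(f"Expected String, but got {type(p).__name__}")
    return profiles


@dataclass
class ScoringProfile:
    """Hypothetical object-shaped model; optional parts default to empty/None."""
    name: str
    text: Optional[dict] = None
    functions: list = field(default_factory=list)
    functionAggregation: Optional[str] = None


def parse_profiles_as_objects(index_def: dict) -> Optional[list]:
    """Accepts the object shape Azure Search actually returns."""
    profiles = index_def.get("scoringProfiles")
    if profiles is None:
        return None
    return [
        ScoringProfile(
            name=p["name"],
            text=p.get("text"),
            functions=p.get("functions", []),
            functionAggregation=p.get("functionAggregation"),
        )
        for p in profiles
    ]


if __name__ == "__main__":
    index_def = json.loads(INDEX_JSON)
    try:
        parse_profiles_as_strings(index_def)
    except TypeError as err:
        print(err)  # Expected String, but got dict
    print(parse_profiles_as_objects(index_def))
```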

Expected behavior:
Writing data to an index with scoring profiles should work without parsing errors.

Actual behavior:
Operation fails with DeserializationException when trying to parse the index definition.

Impact:
This prevents users from writing data to any Azure Search index that has scoring profiles configured, which is a common production scenario for relevance tuning.

Current workarounds:

  • Create indexes without scoring profiles when using SynapseML
  • Add scoring profiles later via the Azure Portal or REST API, after the data is written
  • Recreate the index without scoring profiles before each write
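The "add scoring profiles later" workaround can be scripted against the REST API: fetch the index definition, append the profile, and PUT the full definition back. The sketch below uses only the standard library and assumes an admin API key and the GA `api-version=2024-07-01`; both are assumptions, not details from this report, and the helper names are hypothetical. Given the root cause described above, a later `writeToAzureSearch` call against the same index would presumably fail again once the profile exists, so this suits one-off loads rather than recurring ones.

```python
import json
import urllib.request

# Assumption: GA REST API version; adjust to the version your service uses.
API_VERSION = "2024-07-01"


def index_url(service_name: str, index_name: str) -> str:
    """URL of the index definition resource."""
    return (f"https://{service_name}.search.windows.net/indexes/"
            f"{index_name}?api-version={API_VERSION}")


def with_scoring_profile(index_def: dict, profile: dict) -> dict:
    """Return a copy of index_def with `profile` added (or replaced by name)."""
    # Drop read-only OData annotations such as @odata.etag / @odata.context.
    updated = {k: v for k, v in index_def.items() if not k.startswith("@odata")}
    others = [p for p in updated.get("scoringProfiles") or []
              if p.get("name") != profile["name"]]
    updated["scoringProfiles"] = others + [profile]
    return updated


def add_scoring_profile(service_name: str, index_name: str,
                        admin_key: str, profile: dict) -> int:
    """GET the index, add the profile, PUT it back; returns the HTTP status."""
    url = index_url(service_name, index_name)
    headers = {"api-key": admin_key, "Content-Type": "application/json"}
    # 1. Fetch the current index definition.
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        index_def = json.load(resp)
    # 2. Send the full definition back with the profile included.
    body = json.dumps(with_scoring_profile(index_def, profile)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers=headers, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, after the SynapseML write has finished, `add_scoring_profile("<your-service-name>", "<your-index-name>", "<your-admin-key>", freshness_profile)` would add a profile such as the `freshness_boost` object shown in the reproduction code.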

Code to reproduce issue

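from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from synapse.ml.services import *  # provides DataFrame.writeToAzureSearch

# In a Fabric notebook this returns the existing session
spark = SparkSession.builder.getOrCreate()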

AZURE_SEARCH_SUBSCRIPTION_KEY = "<your-subscription-key>"
AZURE_SEARCH_SERVICE_NAME = "<your-service-name>"
AZURE_SEARCH_INDEX_NAME = "existing-index-with-scoring-profiles"

# Create simple test DataFrame
test_df = spark.createDataFrame([
    ("TEST01", "item1", "2025-05-15"),
    ("TEST02", "item2", "2025-05-20")
], ["id", "name", "date"])
test_df = test_df.withColumn("SearchAction", F.lit("upload"))

# Assume you have an existing Azure Search index that contains scoring profiles like:
# {
#   "name": "my-index",
#   "fields": [...],
#   "scoringProfiles": [{
#     "name": "freshness_boost",
#     "functionAggregation": "sum",
#     "functions": [{
#       "type": "freshness",
#       "boost": 2.0,
#       "fieldName": "date",
#       "interpolation": "constant", 
#       "freshness": {"boostingDuration": "P1D"}
#     }]
#   }]
# }

# This FAILS with DeserializationException when the index has scoring profiles
try:
    test_df.writeToAzureSearch(
        subscriptionKey=AZURE_SEARCH_SUBSCRIPTION_KEY,
        serviceName=AZURE_SEARCH_SERVICE_NAME,
        indexName=AZURE_SEARCH_INDEX_NAME,  # Index with scoring profiles
        keyCol="id",
        actionCol="SearchAction",
    )
except Exception as e:
    print(f"Error: {e}")
    # Error: spray.json.DeserializationException: Expected String as JsString, but got {complex scoring profile object}

# WORKAROUND: Create/use an index without scoring profiles
AZURE_SEARCH_INDEX_NAME_NO_PROFILES = "same-index-no-scoring-profiles"

# This works when the index has no scoring profiles
test_df.writeToAzureSearch(
    subscriptionKey=AZURE_SEARCH_SUBSCRIPTION_KEY,
    serviceName=AZURE_SEARCH_SERVICE_NAME,
    indexName=AZURE_SEARCH_INDEX_NAME_NO_PROFILES,  # Index without scoring profiles
    keyCol="id",
    actionCol="SearchAction",
)

# Note: You can add scoring profiles to the index later via Azure Portal/REST API

Other info / logs

No response

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations

Metadata

Assignees

No one assigned

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
