Description
I have connected Blob Storage to Azure AI Search through an indexer, creating the required data source, skillset, index, and indexer.
I have used two skills: SplitSkill and AzureOpenAIEmbeddingSkill.
SplitSkill is working properly: I can see in the index that documents are being split into chunks. However, no vector embeddings are being generated, and the vector embedding field remains empty.
What could be the reason? I have checked and verified the embedding model, skillset, and index.
I have used the code from the Azure GitHub samples.
Skillset Code:
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    SearchIndexerIndexProjections,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset,
)
# Create a skillset
skillset_name = f"{index_name}-skillset"
# Use the merged content if OCR is enabled. Otherwise, use the normal document content.
split_skill_text_source = "/document/content" if not use_ocr else "/document/merged_content"
split_skill = SplitSkill(
    description="Split skill to chunk documents",
    text_split_mode="pages",
    context="/document",
    maximum_page_length=2000,
    page_overlap_length=500,
    inputs=[
        InputFieldMappingEntry(name="text", source=split_skill_text_source),
    ],
    outputs=[
        OutputFieldMappingEntry(name="textItems", target_name="pages")
    ],
)
embedding_skill = AzureOpenAIEmbeddingSkill(
    description="Skill to generate embeddings via Azure OpenAI",
    context="/document/pages/*",
    resource_uri=azure_openai_endpoint,
    deployment_id=azure_openai_embedding_deployment,
    model_name=azure_openai_model_name,
    dimensions=dimenson,
    api_key=model_key,
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/pages/*"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="embedding", target_name="content_vector")
    ],
)
index_projections = SearchIndexerIndexProjections(
    selectors=[
        SearchIndexerIndexProjectionSelector(
            target_index_name=index_name,
            parent_key_field_name="parent_id",
            source_context="/document/pages/*",
            mappings=[
                InputFieldMappingEntry(name="content", source="/document/pages/*"),
                InputFieldMappingEntry(name="content_vector", source="/document/pages/*/vector"),
                InputFieldMappingEntry(name="metadata", source="/document/metadata_storage_name"),
            ],
        ),
    ],
    parameters=SearchIndexerIndexProjectionsParameters(
        projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS
    ),
)
skills = [split_skill, embedding_skill]
skillset = SearchIndexerSkillset(
    name=skillset_name,
    description="Skillset to chunk documents and generating embeddings",
    skills=skills,
    index_projections=index_projections
)
client = SearchIndexerClient(endpoint, credential)
client.create_or_update_skillset(skillset)
print(f"{skillset.name} created")
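One thing that may be worth checking in the skillset above: a skill output is written to the enrichment tree at `<context>/<target_name>`, so the embedding skill stores its result under `/document/pages/*/content_vector`, while the index projection reads the `content_vector` field from `/document/pages/*/vector`. The snippet below is only a sketch that compares those two paths offline. It does not call Azure, and the helper `skill_output_path` is a hypothetical name introduced for illustration, not part of the SDK.

```python
# Paths copied from the skillset definition above; nothing here calls Azure.
embedding_context = "/document/pages/*"
embedding_target_name = "content_vector"  # OutputFieldMappingEntry.target_name

# Sources used by the index projection mappings (field name -> source path).
projection_sources = {
    "content": "/document/pages/*",
    "content_vector": "/document/pages/*/vector",
    "metadata": "/document/metadata_storage_name",
}


def skill_output_path(context: str, target_name: str) -> str:
    """Hypothetical helper: a skill output lands at <context>/<target_name>."""
    return f"{context.rstrip('/')}/{target_name}"


produced = skill_output_path(embedding_context, embedding_target_name)
wanted = projection_sources["content_vector"]

print("embedding skill writes to:", produced)
print("projection reads from:    ", wanted)
print("paths match:", produced == wanted)
```

If this turns out to be the cause, making the two paths agree (for example, `target_name="vector"` in the embedding skill, or a projection source of `/document/pages/*/content_vector`) should be enough. This is a guess from reading the code, not a confirmed fix.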