
Commit 66e324c

Merge pull request #724 from microsoft/vNext-Dev
v1.1.1 - Hotfixes for v1.1
2 parents: 79af91d + b8470c8


63 files changed, +3961 -406 lines

.gitignore (+6, -1)

```diff
@@ -395,4 +395,9 @@ terraform.tfstate
 terraform.tfstate.d
 .tfplan.txt
 infra/infoasst*
-infra/sp_config/config.json
+infra/sp_config/config.json
+
+#Upgrade & Migrate Support
+scripts/upgrade_repoint.config.json
+azcopy.tar.gz
+azcopy_dir
```

Makefile (+23, -1)

```diff
@@ -64,5 +64,27 @@ destroy-inf: check-subscription
 functional-tests: extract-env ## Run functional tests to check the processing pipeline is working
 	@./scripts/functional-tests.sh

-run-migration: ## Migrate from BICEP to Terraform
+merge-databases: ## Upgrade from bicep to terraform
+	@figlet "Upgrading in place"
 	python ./scripts/merge-databases.py
+
+import-state: check-subscription ## import state of current services to TF state
+	@./scripts/inf-import-state.sh
+
+prep-upgrade: ## Command to merge databases and import TF state in prep for an upgrade from 1.0 to 1.n
+	@figlet "Upgrading"
+	merge-databases
+	import-state
+
+prep-env: ## Apply role assignments as needed to upgrade
+	@figlet "Preparing Environment"
+	@./scripts/prep-env.sh
+
+prep-migration-env: ## Prepare the environment for migration by assigning required roles
+	@./scripts/prep-migration-env.sh
+
+run-data-migration: ## Run the data migration moving data from one resource group to another
+	python ./scripts/extract-content.py
+
+manual-inf-destroy: ## A command triggered by a user to destroy a resource group, associated resources, and related Entra items
+	@./scripts/inf-manual-destroy.sh
```

README.md (+2, -2)

```diff
@@ -35,7 +35,7 @@

 [![Open in GitHub Codespaces](https://img.shields.io/static/v1?style=for-the-badge&label=GitHub+Codespaces&message=Open&color=brightgreen&logo=github)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=601652366&machine=basicLinux32gb&devcontainer_path=.devcontainer%2Fdevcontainer.json&location=eastus)

-This industry accelerator showcases integration between Azure and OpenAI's large language models. It leverages Azure AI Search for data retrieval and ChatGPT-style Q&A interactions. Using the Retrieval Augmented Generation (RAG) design pattern with Azure Open AI's GPT models, it provides a natural language interaction to discover relevant responses to user queries. Azure AI Search simplifies data ingestion, transformation, indexing, and multilingual translation.
+This industry accelerator showcases integration between Azure and OpenAI's large language models. It leverages Azure AI Search for data retrieval and ChatGPT-style Q&A interactions. Using the Retrieval Augmented Generation (RAG) design pattern with Azure OpenAI's GPT models, it provides a natural language interaction to discover relevant responses to user queries. Azure AI Search simplifies data ingestion, transformation, indexing, and multilingual translation.

 The accelerator adapts prompts based on the model type for enhanced performance. Users can customize settings like temperature and persona for personalized AI interactions. It offers features like explainable thought processes, referenceable citations, and direct content for verification.

@@ -124,7 +124,7 @@ Find out more with Microsoft's [Responsible AI resources](https://www.microsoft.

 ### Content Safety

-Content safety is provided through Azure Open AI service. The Azure OpenAI Service includes a content filtering system that runs alongside the core AI models. This system uses an ensemble of classification models to detect four categories of potentially harmful content (violence, hate, sexual, and self-harm) at four severity levels (safe, low, medium, high).These 4 categories may not be sufficient for all use cases, especially for minors. Please read our [Transaparncy Note](/docs/transparency.md)
+Content safety is provided through Azure OpenAI service. The Azure OpenAI Service includes a content filtering system that runs alongside the core AI models. This system uses an ensemble of classification models to detect four categories of potentially harmful content (violence, hate, sexual, and self-harm) at four severity levels (safe, low, medium, high).These 4 categories may not be sufficient for all use cases, especially for minors. Please read our [Transaparncy Note](/docs/transparency.md)

 By default, the content filters are set to filter out prompts and completions that are detected as medium or high severity for those four harm categories. Content labeled as low or safe severity is not filtered.
```
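The README paragraph above names the Retrieval Augmented Generation pattern: retrieve with Azure AI Search, then ground the model's answer in the results. As a rough illustration only (not this repo's code; the endpoint, key, index, field, and deployment names below are placeholders), a minimal RAG sketch might look like:

```python
# Minimal RAG sketch: retrieve with Azure AI Search, then answer with Azure OpenAI.
# All endpoint/key/index/deployment values are placeholders, not this repo's config.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient("https://<search>.search.windows.net", "<index>",
                      AzureKeyCredential("<search-key>"))
client = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                     api_key="<aoai-key>", api_version="2024-02-01")

def answer(question: str) -> str:
    # 1. Retrieve: top documents for the user's question.
    hits = search.search(search_text=question, top=3)
    sources = "\n".join(doc["content"] for doc in hits)  # assumes a 'content' field
    # 2. Augment and generate: stuff the sources into the prompt.
    completion = client.chat.completions.create(
        model="<chat-deployment>",  # Azure deployment name
        messages=[
            {"role": "system", "content": "Answer only from the provided sources."},
            {"role": "user", "content": f"{question}\nSources:\n{sources}"},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content
```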

app/backend/app.py (+4, -24)

```diff
@@ -126,7 +126,7 @@
     AUTHORITY = AzureAuthorityHosts.AZURE_GOVERNMENT
 else:
     AUTHORITY = AzureAuthorityHosts.AZURE_PUBLIC_CLOUD
-openai.api_version = "2023-12-01-preview"
+openai.api_version = "2024-02-01"
 # Use the current user identity to authenticate with Azure OpenAI, Cognitive Search and Blob Storage (no secrets needed,
 # just use 'az login' locally, and managed identity when deployed on Azure). If you need to use keys, use separate AzureKeyCredential instances with the
 # keys for each service
@@ -295,20 +295,11 @@ async def chat(request: Request):
             return {"error": "unknown approach"}, 400

         if (Approaches(int(approach)) == Approaches.CompareWorkWithWeb or Approaches(int(approach)) == Approaches.CompareWebWithWork):
-            r = await impl.run(json_body.get("history", []), json_body.get("overrides", {}), json_body.get("citation_lookup", {}), json_body.get("thought_chain", {}))
+            r = impl.run(json_body.get("history", []), json_body.get("overrides", {}), json_body.get("citation_lookup", {}), json_body.get("thought_chain", {}))
         else:
-            r = await impl.run(json_body.get("history", []), json_body.get("overrides", {}), {}, json_body.get("thought_chain", {}))
+            r = impl.run(json_body.get("history", []), json_body.get("overrides", {}), {}, json_body.get("thought_chain", {}))

-        response = {
-            "data_points": r["data_points"],
-            "answer": r["answer"],
-            "thoughts": r["thoughts"],
-            "thought_chain": r["thought_chain"],
-            "work_citation_lookup": r["work_citation_lookup"],
-            "web_citation_lookup": r["web_citation_lookup"]
-        }
-
-        return response
+        return StreamingResponse(r, media_type="application/x-ndjson")

     except Exception as ex:
         log.error(f"Error in chat:: {ex}")
@@ -824,17 +815,6 @@ async def stream_agent_response(question: str):
     Raises:
         HTTPException: If an error occurs while processing the question.
     """
-    # try:
-    #     def event_stream():
-    #         data_generator = iter(process_agent_response(question))
-    #         while True:
-    #             try:
-    #                 chunk = next(data_generator)
-    #                 yield chunk
-    #             except StopIteration:
-    #                 yield "data: keep-alive\n\n"
-    #                 time.sleep(5)
-    #     return StreamingResponse(event_stream(), media_type="text/event-stream")
    if question is None:
        raise HTTPException(status_code=400, detail="Question is required")
```
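With this change, `/chat` no longer returns one JSON object; it hands the (now un-awaited) async generator from `impl.run` to a `StreamingResponse` and streams newline-delimited JSON. A hedged sketch of how a client might consume such a stream follows; the URL and request body are illustrative placeholders, and the payload shapes (a metadata object, then `content`/`error` chunks) are taken from the generator changes shown further down.

```python
# Sketch of consuming an application/x-ndjson stream: one JSON document per line.
# URL and request body are illustrative placeholders, not the accelerator's exact contract.
import json
import requests

body = {"history": [{"user": "What is in my data?"}], "overrides": {},
        "approach": 1, "citation_lookup": {}, "thought_chain": {}}

with requests.post("http://localhost:8000/chat", json=body, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():        # each non-empty line is one JSON object
        if not line:
            continue
        event = json.loads(line)
        if "error" in event:              # errors arrive in-band on the stream
            raise RuntimeError(event["error"])
        if "content" in event:            # incremental answer tokens
            print(event["content"] or "", end="", flush=True)
        else:                             # leading metadata: thoughts, citations, ...
            citations = event.get("work_citation_lookup", {})
```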

app/backend/approaches/chatreadretrieveread.py (+116, -85)

```diff
@@ -1,13 +1,16 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.

+import json
 import re
 import logging
 import urllib.parse
 from datetime import datetime, timedelta
-from typing import Any, Sequence
+from typing import Any, AsyncGenerator, Coroutine, Sequence

 import openai
+from openai import AzureOpenAI
+from openai import AsyncAzureOpenAI
 from approaches.approach import Approach
 from azure.search.documents import SearchClient
 from azure.search.documents.models import RawVectorQuery
@@ -128,17 +131,28 @@ def __init__(
         openai.api_base = oai_endpoint
         openai.api_type = 'azure'
         openai.api_key = oai_service_key
+        openai.api_version = "2024-02-01"
+
+        self.client = AsyncAzureOpenAI(
+            azure_endpoint = openai.api_base,
+            api_key=openai.api_key,
+            api_version=openai.api_version)
+

         self.model_name = model_name
         self.model_version = model_version

+
+
+
     # def run(self, history: list[dict], overrides: dict) -> any:
     async def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any], citation_lookup: dict[str, Any], thought_chain: dict[str, Any]) -> Any:

         log = logging.getLogger("uvicorn")
         log.setLevel('DEBUG')
         log.propagate = True

+        chat_completion = None
         use_semantic_captions = True if overrides.get("semantic_captions") else False
         top = overrides.get("top") or 3
         user_persona = overrides.get("user_persona", "")
@@ -170,14 +184,19 @@ async def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]
             self.chatgpt_token_limit - len(user_question)
         )

-        chat_completion = await openai.ChatCompletion.acreate(
-            deployment_id=self.chatgpt_deployment,
-            model=self.model_name,
-            messages=messages,
-            temperature=0.0,
-            # max_tokens=32, # setting it too low may cause malformed JSON
-            max_tokens=100,
-            n=1)
+        try:
+            chat_completion= await self.client.chat.completions.create(
+                model=self.chatgpt_deployment,
+                messages=messages,
+                temperature=0.0,
+                # max_tokens=32, # setting it too low may cause malformed JSON
+                max_tokens=100,
+                n=1)
+
+        except Exception as e:
+            log.error(f"Error generating optimized keyword search: {str(e)}")
+            yield json.dumps({"error": f"Error generating optimized keyword search: {str(e)}"}) + "\n"
+            return

         generated_query = chat_completion.choices[0].message.content
```
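The hunk above swaps the legacy module-level call (`openai.ChatCompletion.acreate(deployment_id=...)`) for the openai-python v1 client built in `__init__`. In v1, configuration lives on the client rather than on the module, and the Azure deployment name is passed as `model=`. A minimal sketch of that pattern, with placeholder endpoint, key, and deployment values:

```python
# openai-python v1 pattern used in this commit: config on the client, deployment as model=.
# Endpoint, key, and deployment names below are placeholders.
import asyncio
from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

async def main() -> None:
    completion = await client.chat.completions.create(
        model="<chat-deployment>",   # v1 takes the Azure deployment name as model=
        messages=[{"role": "user", "content": "Rewrite this as a search query: ..."}],
        temperature=0.0,
        max_tokens=100,
        n=1,
    )
    print(completion.choices[0].message.content)

asyncio.run(main())
```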

```diff
@@ -186,22 +205,33 @@ async def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]
             generated_query = history[-1]["user"]

         thought_chain["work_search_term"] = generated_query
+
         # Generate embedding using REST API
         url = f'{self.embedding_service_url}/models/{self.escaped_target_model}/embed'
         data = [f'"{generated_query}"']
+
         headers = {
             'Accept': 'application/json',
             'Content-Type': 'application/json',
         }

-        response = requests.post(url, json=data,headers=headers,timeout=60)
-        if response.status_code == 200:
-            response_data = response.json()
-            embedded_query_vector =response_data.get('data')
-        else:
-            log.error(f"Error generating embedding:: {response.status_code}")
-            raise Exception('Error generating embedding:', response.status_code)
-
+        embedded_query_vector = None
+        try:
+            response = requests.post(url, json=data,headers=headers,timeout=60)
+            if response.status_code == 200:
+                response_data = response.json()
+                embedded_query_vector =response_data.get('data')
+            else:
+                # Generate an error message if the embedding generation fails
+                log.error(f"Error generating embedding:: {response.status_code}")
+                yield json.dumps({"error": "Error generating embedding"}) + "\n"
+                return # Go no further
+        except Exception as e:
+            # Timeout or other error has occurred
+            log.error(f"Error generating embedding: {str(e)}")
+            yield json.dumps({"error": f"Error generating embedding: {str(e)}"}) + "\n"
+            return # Go no further
+
         #vector set up for pure vector search & Hybrid search & Hybrid semantic
         vector = RawVectorQuery(vector=embedded_query_vector, k=top, fields="contentVector")
@@ -325,17 +355,19 @@ async def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]
             userPersona=user_persona,
             systemPersona=system_persona,
         )
-        # STEP 3: Generate a contextual and content-specific answer using the search results and chat history.
-        #Added conditional block to use different system messages for different models.
-        if self.model_name.startswith("gpt-35-turbo"):
-            messages = self.get_messages_from_history(
-                system_message,
-                self.model_name,
-                history,
-                history[-1]["user"] + "Sources:\n" + content + "\n\n", # 3.5 has recency Bias that is why this is here
-                self.RESPONSE_PROMPT_FEW_SHOTS,
-                max_tokens=self.chatgpt_token_limit - 500
-            )
+
+        try:
+            # STEP 3: Generate a contextual and content-specific answer using the search results and chat history.
+            #Added conditional block to use different system messages for different models.
+            if self.model_name.startswith("gpt-35-turbo"):
+                messages = self.get_messages_from_history(
+                    system_message,
+                    self.model_name,
+                    history,
+                    history[-1]["user"] + "Sources:\n" + content + "\n\n", # 3.5 has recency Bias that is why this is here
+                    self.RESPONSE_PROMPT_FEW_SHOTS,
+                    max_tokens=self.chatgpt_token_limit - 500
+                )

             #Uncomment to debug token usage.
             #print(messages)
@@ -347,66 +379,65 @@ async def run(self, history: Sequence[dict[str, str]], overrides: dict[str, Any]
             #print("System Message Tokens: ", self.num_tokens_from_string(system_message, "cl100k_base"))
             #print("Few Shot Tokens: ", self.num_tokens_from_string(self.response_prompt_few_shots[0]['content'], "cl100k_base"))
             #print("Message Tokens: ", self.num_tokens_from_string(message_string, "cl100k_base"))
-            chat_completion = await openai.ChatCompletion.acreate(
-                deployment_id=self.chatgpt_deployment,
-                model=self.model_name,
-                messages=messages,
-                temperature=float(overrides.get("response_temp")) or 0.6,
-                n=1
-            )
-
-        elif self.model_name.startswith("gpt-4"):
-            messages = self.get_messages_from_history(
-                system_message,
-                # "Sources:\n" + content + "\n\n" + system_message,
-                self.model_name,
-                history,
-                # history[-1]["user"],
-                history[-1]["user"] + "Sources:\n" + content + "\n\n", # GPT 4 starts to degrade with long system messages. so moving sources here
-                self.RESPONSE_PROMPT_FEW_SHOTS,
-                max_tokens=self.chatgpt_token_limit
-            )
+            chat_completion= await self.client.chat.completions.create(
+                model=self.chatgpt_deployment,
+                messages=messages,
+                temperature=float(overrides.get("response_temp")) or 0.6,
+                n=1,
+                stream=True
+            )

-            #Uncomment to debug token usage.
-            #print(messages)
-            #message_string = ""
-            #for message in messages:
-            #    # enumerate the messages and add the role and content elements of the dictoinary to the message_string
-            #    message_string += f"{message['role']}: {message['content']}\n"
-            #print("Content Tokens: ", self.num_tokens_from_string("Sources:\n" + content + "\n\n", "cl100k_base"))
-            #print("System Message Tokens: ", self.num_tokens_from_string(system_message, "cl100k_base"))
-            #print("Few Shot Tokens: ", self.num_tokens_from_string(self.response_prompt_few_shots[0]['content'], "cl100k_base"))
-            #print("Message Tokens: ", self.num_tokens_from_string(message_string, "cl100k_base"))
+            elif self.model_name.startswith("gpt-4"):
+                messages = self.get_messages_from_history(
+                    system_message,
+                    # "Sources:\n" + content + "\n\n" + system_message,
+                    self.model_name,
+                    history,
+                    # history[-1]["user"],
+                    history[-1]["user"] + "Sources:\n" + content + "\n\n", # GPT 4 starts to degrade with long system messages. so moving sources here
+                    self.RESPONSE_PROMPT_FEW_SHOTS,
+                    max_tokens=self.chatgpt_token_limit
+                )

-            chat_completion = await openai.ChatCompletion.acreate(
-                deployment_id=self.chatgpt_deployment,
-                model=self.model_name,
-                messages=messages,
-                temperature=float(overrides.get("response_temp")) or 0.6,
-                max_tokens=1024,
-                n=1
-            )
-        # STEP 4: Format the response
-        msg_to_display = '\n\n'.join([str(message) for message in messages])
-        generated_response=chat_completion.choices[0].message.content
-
-        # # Detect the language of the response
-        response_language = self.detect_language(generated_response)
-        #if response is not in user's language, translate it to user's language
-        if response_language != detectedlanguage:
-            translated_response = self.translate_response(generated_response, detectedlanguage)
-        else:
-            translated_response = generated_response
-        thought_chain["work_response"] = urllib.parse.unquote(translated_response)
+                #Uncomment to debug token usage.
+                #print(messages)
+                #message_string = ""
+                #for message in messages:
+                #    # enumerate the messages and add the role and content elements of the dictoinary to the message_string
+                #    message_string += f"{message['role']}: {message['content']}\n"
+                #print("Content Tokens: ", self.num_tokens_from_string("Sources:\n" + content + "\n\n", "cl100k_base"))
+                #print("System Message Tokens: ", self.num_tokens_from_string(system_message, "cl100k_base"))
+                #print("Few Shot Tokens: ", self.num_tokens_from_string(self.response_prompt_few_shots[0]['content'], "cl100k_base"))
+                #print("Message Tokens: ", self.num_tokens_from_string(message_string, "cl100k_base"))
+
+                chat_completion= await self.client.chat.completions.create(
+                    model=self.chatgpt_deployment,
+                    messages=messages,
+                    temperature=float(overrides.get("response_temp")) or 0.6,
+                    n=1,
+                    stream=True
+
+                )
+            msg_to_display = '\n\n'.join([str(message) for message in messages])

-        return {
-            "data_points": data_points,
-            "answer": f"{urllib.parse.unquote(translated_response)}",
-            "thoughts": f"Searched for:<br>{generated_query}<br><br>Conversations:<br>" + msg_to_display.replace('\n', '<br>'),
-            "thought_chain": thought_chain,
-            "work_citation_lookup": citation_lookup,
-            "web_citation_lookup": {}
-        }
+
+            # Return the data we know
+            yield json.dumps({"data_points": {},
+                              "thoughts": f"Searched for:<br>{generated_query}<br><br>Conversations:<br>" + msg_to_display.replace('\n', '<br>'),
+                              "thought_chain": thought_chain,
+                              "work_citation_lookup": citation_lookup,
+                              "web_citation_lookup": {}}) + "\n"
+
+            # STEP 4: Format the response
+            async for chunk in chat_completion:
+                # Check if there is at least one element and the first element has the key 'delta'
+                if len(chunk.choices) > 0:
+                    yield json.dumps({"content": chunk.choices[0].delta.content}) + "\n"
+        except Exception as e:
+            log.error(f"Error generating chat completion: {str(e)}")
+            yield json.dumps({"error": f"Error generating chat completion: {str(e)}"}) + "\n"
+            return
+

     def detect_language(self, text: str) -> str:
         """ Function to detect the language of the text"""
```
