
Commit 62947b6

SQL-migration-assistant (#238)
1 parent adde807 commit 62947b6


47 files changed: +2434 −2 lines

CODEOWNERS

Lines changed: 1 addition & 0 deletions

@@ -13,6 +13,7 @@ go-libs @nfx
 ip_access_list_analyzer @alexott
 metascan @nfx
 runtime-packages @nfx
+sql_migration_copilot @robertwhiffin
 tacklebox @Jonathan-Choi
 uc-catalog-cloning @esiol-db @vasco-lopes
 .github @nfx

cli.py

Lines changed: 5 additions & 1 deletion

@@ -10,9 +10,13 @@ def ip_access_list_analyzer(**args):
     import ip_access_list_analyzer.ip_acl_analyzer as analyzer
     analyzer.main(args)

+def sql_migration_assistant(**args):
+    from sql_migration_assistant import hello
+    hello()

 MAPPING = {
-    "ip-access-list-analyzer": ip_access_list_analyzer
+    "ip-access-list-analyzer": ip_access_list_analyzer,
+    "sql-migration-assistant": sql_migration_assistant
 }

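The cli.py change follows a simple name-to-function dispatch pattern: each labs.yml command name is a key in `MAPPING`, and each value is a callable taking `**kwargs`. A minimal runnable sketch of the pattern — the `greet` command and the JSON payload convention are illustrative assumptions, not part of this commit:

```python
import json


def greet(**args):
    # Stand-in for a real command such as sql_migration_assistant(**args).
    return f"hello {args.get('name', 'world')}"


# Command-name -> callable, mirroring the MAPPING dict in cli.py.
MAPPING = {"greet": greet}


def dispatch(command, payload="{}"):
    # Flag values arrive as a JSON object (an assumption for this sketch)
    # and are fanned out to the matching callable as keyword arguments.
    kwargs = json.loads(payload)
    return MAPPING[command](**kwargs)
```

Adding a new command is then a two-step change, as the diff above shows: define the function, then register it under its hyphenated CLI name.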

labs.yml

Lines changed: 9 additions & 1 deletion

@@ -1,7 +1,6 @@
 ---
 name: sandbox
 install:
-  min_runtime_version: 13.1
   script: install.py
 description: Databricks Labs Sandbox
 entrypoint: cli.py
@@ -17,3 +16,12 @@ commands:
       - name: apply
         description: "If script should do the changes"
         default: false
+  - name: sql-migration-assistant
+    description: "GenAI enabled SQL Migration tool"
+    flags:
+      - name: json_file
+        description: "Optional JSON file with dump of IP Access Lists"
+        default: ''
+      - name: apply
+        description: "If script should do the changes"
+        default: false
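Each flag declared in labs.yml carries a default, so conceptually the CLI merges those defaults with whatever the user supplied before invoking the command. A hedged sketch of that merge — `merge_flags` is a hypothetical helper for illustration, not part of the Sandbox codebase:

```python
def merge_flags(flag_specs, user_args):
    # Start from the declared defaults, then overlay user-supplied values.
    merged = {spec["name"]: spec.get("default") for spec in flag_specs}
    merged.update(user_args)
    return merged


# The two flags declared for sql-migration-assistant in labs.yml above.
flags = [
    {"name": "json_file", "default": ""},
    {"name": "apply", "default": False},
]
```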

sql_migration_assistant/.gitignore

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+.databrickscfg

sql_migration_assistant/README.md

Lines changed: 68 additions & 0 deletions

@@ -0,0 +1,68 @@
+---
+title: Project Legion - SQL Migration Assistant
+language: python
+author: Robert Whiffin
+date: 2024-08-28
+
+tags:
+- SQL
+- Migration
+- copilot
+- GenAi
+---
+
+# Project Legion - SQL Migration Assistant
+
+Legion is a Databricks field project to accelerate migrations onto Databricks by leveraging the platform's generative AI
+capabilities. It uses an LLM for code conversion and intent summarisation, presented to users in a front-end web
+application.
+
+Legion provides a chatbot interface for translating input code (for example, T-SQL to Databricks SQL) and
+summarising the intent and business purpose of the code. This intent is then embedded and served in a Vector Search
+index for finding similar pieces of code. This presents an opportunity for increased collaboration (find out who is
+working on similar projects), rationalisation (identify duplicates based on intent) and discoverability (semantic search).
+
+Legion is a solution accelerator - it is *not* a fully baked solution. It is something for you, the customer, to take
+on and own. This allows you to run a project to upskill your employees, leverage GenAI for a real use case,
+customise the application to your needs and entirely own the IP.
+
+## Installation Videos
+
+https://github.com/user-attachments/assets/b43372fb-95ea-49cd-9a2c-aec8e0d6700f
+
+https://github.com/user-attachments/assets/fa622f96-a78c-40b8-9eb9-f6671c4d7b47
+
+https://github.com/user-attachments/assets/1a58a1b5-2dcf-4624-b93f-214735162584
+
+
+Setting Legion up is a simple and automated process. Ensure you have the
+[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html) installed and configured for the correct
+workspace, then install the [Databricks Labs Sandbox](https://github.com/databrickslabs/sandbox).
+
+First, navigate to where you have installed the Databricks Labs Sandbox. For example:
+```bash
+cd /Documents/sandbox
+```
+
+You'll need to install the Python requirements from the `requirements.txt` file in the root of the project.
+You may wish to do this in a virtual environment.
+```bash
+pip install -r sql-migration-assistant/requirements.txt -q
+```
+Run the following command to start the installation process, creating all the necessary resources in your workspace.
+```bash
+databricks labs sandbox sql-migration-assistant
+```
+
+### What Legion needs - during the setup above you will create or choose existing resources for the following:
+
+- A no-isolation shared cluster running the ML runtime (tested on DBR 15.0 ML) to host the front-end application.
+- A catalog and schema in Unity Catalog.
+- A table to store the code intent statements and their embeddings.
+- A vector search endpoint and an embedding model: see the docs at
+  https://docs.databricks.com/en/generative-ai/vector-search.html#how-to-set-up-vector-search
+- A chat LLM. Pay-per-token is recommended where available, but the setup will also allow for creation of
+  a provisioned throughput endpoint.
+- A PAT stored in a secret scope chosen by you, under the key `sql-migration-pat`.

sql_migration_assistant/__init__.py

Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+from sql_migration_assistant.utils.initialsetup import SetUpMigrationAssistant
+from databricks.sdk import WorkspaceClient
+from databricks.labs.blueprint.tui import Prompts
+import yaml
+
+
+def hello():
+    w = WorkspaceClient(product="sql_migration_assistant", product_version="0.0.1")
+    p = Prompts()
+    setter_upper = SetUpMigrationAssistant()
+    final_config = setter_upper.setup_migration_assistant(w, p)
+    with open("sql_migration_assistant/config.yml", "w") as f:
+        yaml.dump(final_config, f)
+    setter_upper.upload_files(w)
+    setter_upper.launch_review_app(w, final_config)
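The `hello()` entrypoint gathers answers interactively, persists them to `config.yml`, then uploads files and launches the review app. A minimal sketch of the persist step in isolation — the config dict is hypothetical, and `json` stands in for `yaml` so the sketch needs no third-party dependency:

```python
import json
import os
import tempfile

# Hypothetical config, standing in for the dict returned by
# setup_migration_assistant(); the real code serialises with yaml.dump.
final_config = {"catalog": "main", "schema": "migration", "secret_scope": "legion"}


def persist_config(config, path):
    # Write the merged configuration to disk so later steps
    # (upload_files, launch_review_app) can read it back.
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


path = persist_config(final_config, os.path.join(tempfile.gettempdir(), "config.json"))
```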

sql_migration_assistant/app/__init__.py

Whitespace-only changes.

sql_migration_assistant/app/llm.py

Lines changed: 84 additions & 0 deletions

@@ -0,0 +1,84 @@
+import logging
+
+from databricks.sdk import WorkspaceClient
+from databricks.sdk.service.serving import ChatMessage, ChatMessageRole
+
+w = WorkspaceClient()
+foundation_llm_name = "databricks-meta-llama-3-1-405b-instruct"
+max_token = 4096
+messages = [
+    ChatMessage(role=ChatMessageRole.SYSTEM, content="You are an unhelpful assistant"),
+    ChatMessage(role=ChatMessageRole.USER, content="What is RAG?"),
+]
+
+
+class LLMCalls:
+    def __init__(self, foundation_llm_name, max_tokens):
+        self.w = WorkspaceClient()
+        self.foundation_llm_name = foundation_llm_name
+        self.max_tokens = int(max_tokens)
+
+    def call_llm(self, messages):
+        """
+        Function to call the LLM model and return the response.
+        :param messages: list of messages like
+            messages=[
+                ChatMessage(role=ChatMessageRole.SYSTEM, content="You are an unhelpful assistant"),
+                ChatMessage(role=ChatMessageRole.USER, content="What is RAG?"),
+                ChatMessage(role=ChatMessageRole.ASSISTANT, content="A type of cloth?")
+            ]
+        :return: the response from the model
+        """
+        # Use the instance configuration rather than the module-level globals,
+        # so the values passed to __init__ actually take effect.
+        response = self.w.serving_endpoints.query(
+            name=self.foundation_llm_name, max_tokens=self.max_tokens, messages=messages
+        )
+        message = response.choices[0].message.content
+        return message
+
+    def convert_chat_to_llm_input(self, system_prompt, chat):
+        # Convert the chat list of lists to the required format for the LLM
+        messages = [ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt)]
+        for q, a in chat:
+            messages.extend(
+                [
+                    ChatMessage(role=ChatMessageRole.USER, content=q),
+                    ChatMessage(role=ChatMessageRole.ASSISTANT, content=a),
+                ]
+            )
+        return messages
+
+    ############################################################################
+    # FUNCTION FOR TRANSLATING CODE
+    ############################################################################
+
+    # this is called to actually send a request and receive a response from the
+    # llm endpoint.
+    def llm_translate(self, system_prompt, input_code):
+        messages = [
+            ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
+            ChatMessage(role=ChatMessageRole.USER, content=input_code),
+        ]
+
+        # call the LLM endpoint.
+        llm_answer = self.call_llm(messages=messages)
+        # Extract the code from between the triple backticks (```), since the LLM
+        # often prints the code like this. Also removes the 'sql' prefix always
+        # added by the LLM.
+        translation = llm_answer  # .split("Final answer:\n")[1].replace(">>", "").replace("<<", "")
+        return translation
+
+    def llm_chat(self, system_prompt, query, chat_history):
+        messages = self.convert_chat_to_llm_input(system_prompt, chat_history)
+        messages.append(ChatMessage(role=ChatMessageRole.USER, content=query))
+        # call the LLM endpoint.
+        llm_answer = self.call_llm(messages=messages)
+        return llm_answer
+
+    def llm_intent(self, system_prompt, input_code):
+        messages = [
+            ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
+            ChatMessage(role=ChatMessageRole.USER, content=input_code),
+        ]
+
+        # call the LLM endpoint.
+        llm_answer = self.call_llm(messages=messages)
+        return llm_answer
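`convert_chat_to_llm_input` flattens a chat history of `(question, answer)` pairs into an alternating user/assistant message list behind a system prompt. The same logic with plain dicts instead of the SDK's `ChatMessage`, as an illustration only:

```python
def convert_chat(system_prompt, chat):
    # System prompt first, then each (question, answer) pair becomes
    # a user message followed by an assistant message.
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in chat:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    return messages
```

`llm_chat` then appends the newest user query to this list before calling the endpoint, so the model sees the full conversation each turn.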
Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+from databricks.sdk import WorkspaceClient
+from databricks.labs.lsql.core import StatementExecutionExt
+
+
+class SimilarCode:
+
+    def __init__(
+        self,
+        workspace_client: WorkspaceClient,
+        see: StatementExecutionExt,
+        catalog,
+        schema,
+        code_intent_table_name,
+        VS_index_name,
+        VS_endpoint_name,
+    ):
+        self.w = workspace_client
+        self.see = see
+        self.catalog = catalog
+        self.schema = schema
+        self.code_intent_table_name = code_intent_table_name
+        self.vs_index_name = VS_index_name
+        self.vs_endpoint_name = VS_endpoint_name
+
+    def save_intent(self, code, intent):
+        code_hash = hash(code)
+        _ = self.see.execute(
+            f'INSERT INTO {self.catalog}.{self.schema}.{self.code_intent_table_name} VALUES ({code_hash}, "{code}", "{intent}")',
+        )
+
+    def get_similar_code(self, chat_history):
+        intent = chat_history[-1][1]
+        results = self.w.vector_search_indexes.query_index(
+            index_name=f"{self.catalog}.{self.schema}.{self.vs_index_name}",
+            columns=["code", "intent"],
+            query_text=intent,
+            num_results=1,
+        )
+        docs = results.result.data_array
+        return (docs[0][0], docs[0][1])
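One caveat in `save_intent`: Python's built-in `hash()` is salted per process for strings (see `PYTHONHASHSEED`), so the same code yields a different `code_hash` on every run, which defeats any cross-run deduplication on that column. A stable alternative sketched with `hashlib` — a suggestion, not what the commit ships:

```python
import hashlib


def stable_code_hash(code: str) -> int:
    # SHA-256 of the UTF-8 bytes, truncated to 64 bits. Unlike the built-in
    # hash() for str, this is identical across processes and machines.
    digest = hashlib.sha256(code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")
```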
3 binary files not shown.

sql_migration_assistant/docs/Makefile

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
