Documentation infrastructure #39

Open
wants to merge 155 commits into
base: main
155 commits
0e62be7
Basic skeleton
pramitchoudhary Apr 27, 2023
fe67b99
Initial options
pramitchoudhary May 1, 2023
7156cd8
Local db setup
pramitchoudhary May 2, 2023
4d01712
Initial flow for task generation
pramitchoudhary May 9, 2023
35134db
Support to generate and edit basic tasks
pramitchoudhary May 10, 2023
df620b7
Basic SQL generation using tasks
pramitchoudhary May 10, 2023
81c5b75
Add step to edit generated SQL
pramitchoudhary May 10, 2023
1727a2c
Fix task formatting
pramitchoudhary May 11, 2023
ccb9513
Update with dev installation steps
pramitchoudhary May 11, 2023
120bd88
Cache API key in .env.toml if entered manually
pramitchoudhary May 11, 2023
4f3f453
Merge branch 'main' of github.com:h2oai/sql-sidekick into main
pramitchoudhary May 11, 2023
f26af52
Minor prompt design update; added references
pramitchoudhary May 12, 2023
686336f
Fix pending table name param
pramitchoudhary May 12, 2023
f8f8dc5
Catch ValueError while transpiling
pramitchoudhary May 12, 2023
ae16574
Update README.md
pramitchoudhary May 12, 2023
f9deda3
Prompt format adjustment #1
pramitchoudhary May 15, 2023
af69137
Possible flow to inject missed additional data context #1
pramitchoudhary May 15, 2023
5070ba3
Merge branch 'main' of github.com:h2oai/sql-sidekick into main
pramitchoudhary May 15, 2023
e150cad
Minor prompt design adjusments #1
pramitchoudhary May 16, 2023
7b4abc5
Cache table name for current session #1
pramitchoudhary May 16, 2023
6ffd616
Further adjust the prompts #1
pramitchoudhary May 18, 2023
77ded5d
Add extra context for additional guidance #1
pramitchoudhary May 18, 2023
321641d
Add additional context for better guidance
pramitchoudhary May 18, 2023
08e4cd4
Simple memory to track n update context
pramitchoudhary May 23, 2023
10e1d61
Minor prompt adjustment
pramitchoudhary May 23, 2023
0c8ce8d
Update context query from saved historuy
pramitchoudhary May 24, 2023
e492a72
Use all-MiniLM-L6-v1 embeddings for fast duplicate removal
pramitchoudhary May 25, 2023
669eff3
Sample data format
pramitchoudhary May 25, 2023
c8149fe
Update content extraction strategy
pramitchoudhary May 25, 2023
17b4886
Update context memory option #1
pramitchoudhary May 25, 2023
d219a0d
Use all-MiniLM-L6-v1 embeddings for fast duplicate removal
pramitchoudhary May 25, 2023
e08a0fa
Use all-MiniLM-L6-v1 embeddings for fast duplicate removal
pramitchoudhary May 25, 2023
811d60e
Update dependencies
pramitchoudhary May 25, 2023
5c6da97
Enable ability to add context (query/answer) as input
pramitchoudhary May 25, 2023
f84c848
Check if updated context exists #1
pramitchoudhary May 25, 2023
b7c94b0
Fix duplicate removal #1
pramitchoudhary May 26, 2023
a4ca238
Memory format update #1
pramitchoudhary May 27, 2023
a064450
Extract entity from Qs n fix tracking #1
pramitchoudhary May 30, 2023
4742c5c
Fix context file error #1
pramitchoudhary May 30, 2023
569c8a9
Remove <key><value> tags before persisting #1
pramitchoudhary May 31, 2023
ada9fb5
Update DB config env variables
pramitchoudhary Jun 2, 2023
7bdeab8
Handle in-correct table name better
pramitchoudhary Jun 2, 2023
babd38c
Add missing dependency and update .env.toml
pramitchoudhary Jun 2, 2023
d8a5761
Updated requirements.txt
pramitchoudhary Jun 2, 2023
1b754d7
Improve prompt template
pramitchoudhary Jun 6, 2023
749b82c
Allow user to save generate SQL
pramitchoudhary Jun 6, 2023
063977e
Fix error when attepting to address syntax error dynamically
pramitchoudhary Jun 6, 2023
743d7d0
Add ability to build pkg(.whl) (#6)
pramitchoudhary Jun 9, 2023
100ba22
Update README.md
pramitchoudhary Jun 9, 2023
78c7a8f
Save updated state if generated SQL is modified
pramitchoudhary Jun 14, 2023
a3c4c88
Prompt update and handle exception wen queried table is not found
pramitchoudhary Jun 16, 2023
11efcf0
Enable workflow to regenerate response
pramitchoudhary Jun 19, 2023
08a24a8
Prompt adjustments
pramitchoudhary Jun 19, 2023
46f53ab
Only keep contextually similar samples for few shot learnig
pramitchoudhary Jun 19, 2023
8de3e77
Only keep contextually similar samples for few shot learnig
pramitchoudhary Jun 19, 2023
bb080ad
Add a common logger
pramitchoudhary Jun 19, 2023
0c7cd5b
Bookkeeping for v0.0.2
pramitchoudhary Jun 19, 2023
65cfd1d
Fix data template issue (#7)
pramitchoudhary Jun 21, 2023
3e3ae39
Bookkeeping v0.0.3 n fixed updating table metadata
pramitchoudhary Jun 21, 2023
fc21cfa
Update README.md
pramitchoudhary Jun 21, 2023
f3d8acb
Telemetry examples
pramitchoudhary Jun 21, 2023
bddb67b
More updates to README
pramitchoudhary Jun 21, 2023
1ce7ad5
Minor prompt adjustment
pramitchoudhary Jun 22, 2023
ecf0847
Merge branch 'main' of github.com:h2oai/sql-sidekick into main
pramitchoudhary Jun 22, 2023
be1c879
Parameterize model name n minor prompt adjustments
pramitchoudhary Jun 26, 2023
b33db8d
Code to add samples and execute query (#9)
narasimhard Jul 6, 2023
25a9570
Partial logic for basic UI workflow + More improvements (#10)
pramitchoudhary Jul 7, 2023
6d35db8
Change the default dialect to SQLite (#13)
narasimhard Jul 19, 2023
a0241d5
Initial commit to connect chat with sidekick
narasimhard Jul 28, 2023
1a1e3de
Initial UI to connect chat with sidekick
pramitchoudhary Jul 28, 2023
c1e2cc4
Initial skeleton to support local LLM #4
pramitchoudhary Jul 26, 2023
061822c
Cache column sample values for future use #4
pramitchoudhary Jul 28, 2023
777d63c
Decode and generate #4
pramitchoudhary Jul 28, 2023
0724822
Add InstructorEmbedding as dependency #4
pramitchoudhary Aug 2, 2023
0240036
Initial working version with local LLM #4
pramitchoudhary Aug 2, 2023
83ab1c2
Dependency update #4
pramitchoudhary Aug 3, 2023
dd6ab34
Load quantized version of the model for faster inferrence #4
pramitchoudhary Aug 4, 2023
a7dfae3
Fix initialization #4
pramitchoudhary Aug 7, 2023
9c46ff1
Fix respective config paths #4
pramitchoudhary Aug 8, 2023
a3ee1bf
Update Make rules for demo data #4
pramitchoudhary Aug 8, 2023
d6a38b7
Enable local LLM - Part 1
pramitchoudhary Aug 8, 2023
c7719f5
Upload Datasets
narasimhard Aug 9, 2023
97b2c78
Additional syntax based dialect validation #4
pramitchoudhary Aug 9, 2023
186ea08
Adding code to generate temporary files & dir for uploading datasets
narasimhard Aug 9, 2023
84c094c
Merge pull request #16 from h2oai/initial_gui_dataset_uploads
narasimhard Aug 10, 2023
3f06f6c
Correct make rule for demo dir
pramitchoudhary Aug 10, 2023
f4ecff8
Update README.md
pramitchoudhary Aug 10, 2023
dae4bbd
Updated TypeHint
narasimhard Aug 10, 2023
c0bc110
Updated TypeHint
pramitchoudhary Aug 11, 2023
6108d85
Fix runtime error for CPU #4
pramitchoudhary Aug 11, 2023
979df99
Minor bug fixes
pramitchoudhary Aug 12, 2023
2bdfe11
GUI changes - table select, Progress Bar, UI fixes
narasimhard Aug 12, 2023
ea8f1ad
Added LLMs to load only once
narasimhard Aug 14, 2023
37fd4a2
Fixed table name update error
narasimhard Aug 14, 2023
53066f3
Fix user table select option
narasimhard Aug 14, 2023
3b56360
Merge pull request #22 from h2oai/minor_ui_changes
narasimhard Aug 14, 2023
91a802a
Merge remote-tracking branch 'origin/main' into no_reload_llm
narasimhard Aug 14, 2023
69d3ae3
Minor table_context file fix
narasimhard Aug 14, 2023
d38e3c0
Lowercase table name
narasimhard Aug 14, 2023
2b50259
Removed COLLATE NOCASE
narasimhard Aug 14, 2023
02869ed
Workaround to manage context length #4
pramitchoudhary Aug 15, 2023
1df20c1
Workaround to manage context length #4
pramitchoudhary Aug 15, 2023
e8c460e
Update prompter.py
narasimhard Aug 15, 2023
bea173f
Add few changes to NSQL model load method
narasimhard Aug 15, 2023
5bb1aac
Demoware on cloud for further experimentation (#25)
pramitchoudhary Aug 16, 2023
c1bf188
Merge branch 'main' into no_reload_llm
narasimhard Aug 16, 2023
c00c624
Merge pull request #23 from h2oai/no_reload_llm
narasimhard Aug 16, 2023
a741ee4
adjust python requirements
pramitchoudhary Aug 17, 2023
147edb2
Swap model for llaMa2-7b version #4
pramitchoudhary Aug 17, 2023
8cd47c4
Regenerate Response in UI & CLI
narasimhard Aug 18, 2023
3362b68
Merge pull request #27 from h2oai/regenerate_response
narasimhard Aug 18, 2023
38a6762
Download models during install #4
pramitchoudhary Aug 18, 2023
379a1ec
Log alternate generation possibilities with respective scores #4
pramitchoudhary Aug 19, 2023
32bc877
Enable alternate options in chat #4
pramitchoudhary Aug 20, 2023
80cb1f4
Dynmaically offload state between GPU n CPU as needed #4
pramitchoudhary Aug 22, 2023
427ad8b
Support 4bit quantized model format as default (#28)
pramitchoudhary Aug 30, 2023
3ba31f0
Added support for faster n exhaustive regeneration #4
pramitchoudhary Sep 6, 2023
4689307
Make additional options layout more visible
pramitchoudhary Sep 7, 2023
b23cab3
Update to SQLGenerator initialization
pramitchoudhary Sep 7, 2023
e79a59c
Enable ability to save conversation for future
pramitchoudhary Sep 8, 2023
30d398e
Fix edge case for regeneration/save
pramitchoudhary Sep 11, 2023
8419845
Sort result descending order wen showing multiple options
pramitchoudhary Sep 12, 2023
76287ac
Adjust prompt #29
pramitchoudhary Sep 13, 2023
5e310c8
Update ability to save QnA pairs #29
pramitchoudhary Sep 13, 2023
e5cc258
Update workflow to save qna pair #29
pramitchoudhary Sep 13, 2023
301f26a
Misc updates n fixes
pramitchoudhary Sep 13, 2023
5724f06
Changes to Similarity Model and regenerate option
narasimhard Sep 13, 2023
51e2827
Fix save format issue #29
pramitchoudhary Sep 18, 2023
25e956d
Changes the similarity Model and regenerate option
pramitchoudhary Sep 18, 2023
b130fe8
Fix table name input with spaces #29
pramitchoudhary Sep 19, 2023
ce6e461
Handle basic Describe data query #29
pramitchoudhary Sep 19, 2023
677ae24
Demo update #29
pramitchoudhary Sep 19, 2023
389f606
Demo mode update
pramitchoudhary Sep 19, 2023
a89373a
just dumping column names to a jsonl
robinliubin Sep 20, 2023
7f6ca75
draft version with correct schema
robinliubin Sep 20, 2023
93d7e52
Add separate tag to execute user input SQL code #29
pramitchoudhary Sep 20, 2023
a4c7796
Bookkeeping #29
pramitchoudhary Sep 20, 2023
af37b71
remove Sample Values for NUMERIC and convert whitespace in column nam…
robinliubin Sep 21, 2023
40417ac
Fix column name mismatch
pramitchoudhary Sep 21, 2023
d586118
Upgrade version to v0.0.12
pramitchoudhary Sep 25, 2023
bbae1b2
Basic rules for detecting malicious patterns in generated SQL #33
pramitchoudhary Sep 26, 2023
128d13b
convert notebook to py file
robinliubin Sep 27, 2023
d96cc0d
Auto fix in-correct '“' during generation #33
pramitchoudhary Sep 27, 2023
7727abf
Additional guardrails to flag generic syntax based malicious patterns…
pramitchoudhary Sep 28, 2023
d3330ac
Fix additional string related edge cases #33
pramitchoudhary Sep 28, 2023
e3ee3eb
Basic rules for detecting malicious patterns in generated SQL #33
pramitchoudhary Sep 29, 2023
c2e2811
Fix table name n meta-data mapping
pramitchoudhary Sep 30, 2023
2b6f805
Fix annoyig table name to meta data mapping #29
pramitchoudhary Oct 3, 2023
394a41f
remove csv
robinliubin Oct 3, 2023
61aab98
Merge pull request #36 from h2oai/robin/schema_generator
robinliubin Oct 3, 2023
264e833
Handle special characters in default schema generator & basic insert …
pramitchoudhary Oct 4, 2023
6eceb90
Update CLI params for default schema generation #35
pramitchoudhary Oct 5, 2023
5d53811
Remove un-used logic #35
pramitchoudhary Oct 5, 2023
60bfdde
Update
5675sp Oct 6, 2023
fecf34a
Update
5675sp Oct 6, 2023
17 changes: 17 additions & 0 deletions .appignore
@@ -0,0 +1,17 @@
.DS_Store
venv/
.sidekickvenv/
var/
.git/
.idea/
*/__pycache__/
scripts/
setup_cythonize/
.sh
build/
dist/
tests/
ci/
examples/sleep_eda/
examples/telemetry/
.log
15 changes: 15 additions & 0 deletions .github/ISSUE_TEMPLATE/documentation_issue.md
@@ -0,0 +1,15 @@
---
name: "\U0001F4C3 Documentation"
about: Create a documentation issue or request to help us improve the sql-sidekick repo
title: "[DOCS]"
labels: area/documentation
assignees: '5675sp'
---

### 📃 Documentation issue/request

<!-- A clear and concise description of what the documentation issue/request is -->

### Documentation version

<!-- Documentation version (for a current or future version)? -->
32 changes: 32 additions & 0 deletions .github/workflows/deploy-to-github-pages.yml
@@ -0,0 +1,32 @@
name: Deploy to GitHub pages

on:
  workflow_dispatch:

jobs:
  deploy:
    name: Deploy to GitHub Pages
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          always-auth: true
          registry-url: https://npm.pkg.github.com/
          node-version: 18
          cache: npm
          cache-dependency-path: documentation/package-lock.json

      - name: Install dependencies
        run: cd documentation && npm install --frozen-lockfile
        env:
          NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Build docs
        run: cd documentation && npm run build
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./documentation/tmp/build
          user_name: 5675sp ##swap username out with the username of someone with admin access to the repo
          user_email: [email protected] ##swap email out with the email of someone with admin access to the repo
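
Note on triggering: the workflow declares only `workflow_dispatch`, so it does not run on push and has to be started by hand, either from the Actions tab or through the GitHub REST API. A minimal sketch of the API route, assuming the workflow file keeps the name `deploy-to-github-pages.yml`, the target ref is `main`, and a token allowed to run workflows is available. The `build_dispatch_request` helper and the `<token>` placeholder are hypothetical, and the request is only constructed here, not sent:

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a request that starts a workflow_dispatch run."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # The dispatch endpoint expects the branch or tag to run against.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values; replace "<token>" with a real token before sending.
request = build_dispatch_request(
    "h2oai", "sql-sidekick", "deploy-to-github-pages.yml", "main", "<token>"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; it is left unsent here so the sketch has no side effects.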
2 changes: 1 addition & 1 deletion .gitignore
@@ -157,4 +157,4 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/
5 changes: 5 additions & 0 deletions .vscode/extensions.json
@@ -0,0 +1,5 @@
{
"recommendations": [
"ms-python.python"
]
}
15 changes: 15 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,15 @@
{
"[python]": {
"editor.tabSize": 4,
"editor.defaultFormatter": "ms-python.black-formatter"
},
"files.eol": "\n",
"files.insertFinalNewline": true,
"files.trimFinalNewlines": true,
"files.trimTrailingWhitespace": true,
"python.formatting.provider": "none",
"python.linting.enabled": true,
"python.linting.flake8Enabled": true,
"python.formatting.blackArgs": ["--line-length", "120"],
"python.linting.flake8Args": ["--max-line-length=120"],
}
41 changes: 41 additions & 0 deletions Makefile
@@ -0,0 +1,41 @@
sentence_transformer = s3cmd get --recursive --skip-existing s3://h2o-model-gym/models/nlp/sentence_trasnsformer/all-MiniLM-L6-v2/ ./models/sentence_transformers/sentence-transformers_all-MiniLM-L6-v2
demo_data = s3cmd get --recursive --skip-existing s3://h2o-sql-sidekick/demo/sleepEDA/ ./examples/demo/

.PHONY: download_demo_data

all: download_demo_data

setup: download_demo_data ## Setup
	python3 -m venv .sidekickvenv
	./.sidekickvenv/bin/python3 -m pip install --upgrade pip
	./.sidekickvenv/bin/python3 -m pip install wheel
	./.sidekickvenv/bin/python3 -m pip install -r requirements.txt
	mkdir -p ./db/sqlite
	mkdir -p ./examples/demo/

download_models:
	mkdir -p ./models/sentence_transformers/sentence-transformers_all-MiniLM-L6-v2

download_demo_data:
	mkdir -p ./examples/demo/
	$(demo_data)

cloud_bundle:
	h2o bundle -L debug 2>&1 | tee -a h2o-bundle.log


setup-doc: # Install documentation dependencies
	cd documentation && npm install

run-doc: # Run the doc locally
	cd documentation && npm start

update-documentation-infrastructure:
	cd documentation && npm update @h2oai/makersaurus
	cd documentation && npm ls

build-doc-locally: # Bundles your website into static files for production
	cd documentation && npm run build

serve-doc-locally: # Serves the built website locally
	cd documentation && npm run serve
49 changes: 48 additions & 1 deletion README.md
@@ -1,2 +1,49 @@
# sql-sidekick
A simple sql assistant
A simple SQL assistant (WIP)


# Installation
## Dev
```
1. git clone git@github.com:h2oai/sql-sidekick.git
2. cd sql-sidekick
3. make setup
4. source ./.sidekickvenv/bin/activate
5. python sidekick/prompter.py
```
## Usage
```
Dialect: postgres
- docker pull postgres (will pull the latest version)
- docker run --rm --name pgsql-dev -e POSTGRES_PASSWORD=abc -p 5432:5432 postgres

Default: sqlite
Step:
- Download and install .whl --> s3://sql-sidekick/releases/sql_sidekick-0.0.3-py3-none-any.whl
- python3 -m venv .sidekickvenv
- source .sidekickvenv/bin/activate
- python3 -m pip install sql_sidekick-0.0.3-py3-none-any.whl
```
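
For the postgres dialect, the `docker run` command above exposes the server on port 5432 with the password `abc`. A minimal sketch of the matching connection URL, assuming the image defaults (user `postgres`, database `postgres`) and a client on the same host. `postgres_url` is a hypothetical helper for illustration, not part of sql-sidekick:

```python
from urllib.parse import quote


def postgres_url(password, host="localhost", port=5432, user="postgres", database="postgres"):
    # Percent-encode the credentials so special characters survive in the URL.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database}"


print(postgres_url("abc"))
```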
## Start
```
Welcome to the SQL Sidekick! I am an AI assistant that helps you with SQL
queries. I can help you with the following:

1. Configure a local database(for schema validation and syntax checking):
`sql-sidekick configure db-setup -t "<local_dir_path_to_>/table_info.jsonl"` (e.g., format --> https://github.com/h2oai/sql-sidekick/blob/main/examples/telemetry/table_info.jsonl)

2. Ask a question: `sql-sidekick query -q "avg Gpus" -s "<local_dir_path_to_>/samples.csv"` (e.g., format --> https://github.com/h2oai/sql-sidekick/blob/main/examples/telemetry/samples.csv)

3. Learn contextual query/answer pairs: `sql-sidekick learn add-samples` (optional)

4. Add context as key/value pairs: `sql-sidekick learn update-context` (optional)

Options:
--version Show the version and exit.
--help Show this message and exit.

Commands:
configure Helps in configuring local database.
learn Helps in learning and building memory.
query Asks question and returns SQL
```
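
The `query` command shown in the help text can also be assembled from a script. A minimal sketch, assuming the package is installed so that `sql-sidekick` is on the `PATH`, and that a samples file exists at the illustrative path below. `query_command` is a hypothetical helper, and only the `-q` and `-s` flags from the help text are used:

```python
import shlex


def query_command(question, samples_path):
    # Argument list for: sql-sidekick query -q "<question>" -s "<samples.csv>"
    return ["sql-sidekick", "query", "-q", question, "-s", samples_path]


# Illustrative inputs; the samples path is an assumption.
cmd = query_command("avg Gpus", "./examples/telemetry/samples.csv")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems with questions that contain spaces; the list can be handed to `subprocess.run(cmd, check=True)` once the CLI is installed.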
Empty file added __init__.py
Empty file.
12 changes: 12 additions & 0 deletions about.md
@@ -0,0 +1,12 @@
**App Goal:** Demo-ware web client for SQL-Sidekick

**Target Audience:** Data (Machine Learning) Scientists, Citizen Data Scientists, Data Engineers, Managers, and Business Analysts

**Actively Being Maintained:** Yes (Demo release: _In active RnD_)

**Last Updated:** September, 2023

**Allows uploading and using new model and data:** Yes

**Detailed Description:**
An experimental demo to evaluate text-to-SQL capabilities of large language models (LLMs) to enable QnA for tabular data
23 changes: 23 additions & 0 deletions app.toml
@@ -0,0 +1,23 @@
[App]
name = "ai.h2o.wave.sql-sidekick"
title = "SQL-Sidekick"
description = "QnA with tabular data using NLQ"
LongDescription = "about.md"
Tags = ["DATA_SCIENCE", "MACHINE_LEARNING", "NLP"]
Version = "0.0.13"

[Runtime]
MemoryLimit = "64Gi"
MemoryReservation = "16Gi"
module = "start"
VolumeMount = "/meta_data"
VolumeSize = "100Gi"
ResourceVolumeSize = "64Gi"
GPUCount = 1
RuntimeVersion = "ub2004_cuda114_cudnn8_py38_wlatest_a10g"
RoutingMode = "BASE_URL"
EnableOIDC = true

[[Env]]
Name = "H2O_WAVE_MAX_REQUEST_SIZE"
Value = "20M"
17 changes: 17 additions & 0 deletions documentation/.gitignore
@@ -0,0 +1,17 @@
node_modules
tmp

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
20 changes: 20 additions & 0 deletions documentation/README.md
@@ -0,0 +1,20 @@
# New Documentation Site

What is the purpose of these docs?

## Running this site

This site was generated using Makersaurus, which is a very thin wrapper around Facebook's Docusaurus. You can write documentation in the typical way, using markdown files located in the `docs` folder and registering those files in `sidebars.js`.

Use the following commands to generate the site and run it locally:

```
npx @h2oai/makersaurus@latest gen
cd gen
npm install
npm start
```

## More information

Use the Makersaurus docs to learn how to edit docs, deploy the site, set up versioning, and more.
1 change: 1 addition & 0 deletions documentation/docs/admin-guide/page-1.md
@@ -0,0 +1 @@
# Page 1
1 change: 1 addition & 0 deletions documentation/docs/api-reference-guide/page-1.md
@@ -0,0 +1 @@
# Page 1
Binary file added documentation/docs/application-name-logo.png
1 change: 1 addition & 0 deletions documentation/docs/concepts.md
@@ -0,0 +1 @@
# Concepts
13 changes: 13 additions & 0 deletions documentation/docs/faqs.md
@@ -0,0 +1,13 @@
# FAQs

[application_description]


---

The sections below provide answers to frequently asked questions. If you have additional questions, please send them to <[email protected]>.


## General

