We're thrilled to announce the preview of DiskANN, a leading vector indexing algorithm, on Azure Database for PostgreSQL - Flexible Server! Developed by Microsoft Research and used extensively at Microsoft in global services such as Bing and Microsoft 365, DiskANN enables developers to build highly accurate, performant, and scalable Generative AI applications, surpassing pgvector’s HNSW and IVFFlat in both latency and accuracy. DiskANN also overcomes a long-standing limitation of pgvector in filtered vector search, where pgvector occasionally returns incorrect results.
Sample Website. This sample application shows a search page over a sample Airbnb dataset:
- It illustrates the improved recall of DiskANN compared with HNSW.
- When a filter is applied, you will notice that the HNSW index doesn't return the same number of results as DiskANN or No Index.
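The gap in result counts can be illustrated with a toy model. pgvector's documentation notes that, with approximate indexes, filtering is applied after the index is scanned, so a selective filter can leave fewer rows than requested. The sketch below is only an illustration of that post-filtering effect, not how HNSW or DiskANN are implemented: the listings, the one-dimensional integer "embedding", and all function names are hypothetical.

```python
def top_k_nearest(query, items, k):
    """Return the k items closest to `query` (a stand-in for an index scan)."""
    return sorted(items, key=lambda it: abs(it["embedding"] - query))[:k]


def post_filtered_search(query, items, k, predicate):
    """Fetch k candidates from the 'index' first, then drop rows failing the filter."""
    candidates = top_k_nearest(query, items, k)
    return [it for it in candidates if predicate(it)]


def exact_filtered_search(query, items, k, predicate):
    """Filter first, then rank: what a scan with no index would return."""
    return top_k_nearest(query, [it for it in items if predicate(it)], k)


def is_cheap(listing):
    """Hypothetical filter: only every tenth listing is priced under 100."""
    return listing["price"] < 100


# Hypothetical data: 1000 listings with a 1-D integer "embedding".
listings = [
    {"id": i, "embedding": i, "price": 50 + (i % 10) * 50}
    for i in range(1000)
]

approx = post_filtered_search(500, listings, 10, is_cheap)
exact = exact_filtered_search(500, listings, 10, is_cheap)
print(len(approx), len(exact))  # prints: 1 10
```

Only one of the ten nearest candidates survives the filter, while filtering before ranking still finds ten matching listings.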
Read more on how to use DiskANN in the Microsoft Docs: Docs
Check out the blog: Blog
The first step is to set up the data on the new Postgres database you just created.
Make sure the following tools are installed:
Follow Microsoft documentation for enrolling in DiskANN preview
Follow Microsoft documentation for enabling DiskANN
This demo app will show you how the DiskANN index works better than HNSW.
- In a shell prompt, enter the following to clone the GitHub repo containing exercise resources:
git clone https://github.com/Azure-Samples/DiskANN-demo.git
Enter the repo directory
cd DiskANN-demo
- Update your .env with the following:
AZURE_OPENAI_API_KEY="***sample***key****"
AZURE_OPENAI_ENDPOINT="https://sample.openai.azure.com/"
AZURE_PG_CONNECTION="dbname={DB_NAME} host={HOST} port=5432 sslmode=require user={USER_NAME} password={PASSWORD}"
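A common slip with the connection string is leaving one of the `{...}` placeholders in place. The following is a minimal sketch of a check for that, assuming the simple `key=value` layout shown above with no spaces or quotes inside values. The helper names are hypothetical and not part of the repo.

```python
import re


def parse_conninfo(conninfo):
    """Split a simple 'key=value key=value' connection string into a dict.

    Assumption: values contain no spaces or quotes, as in the sample above.
    """
    return dict(pair.split("=", 1) for pair in conninfo.split())


def unreplaced_placeholders(conninfo):
    """Return the keys whose values still look like {PLACEHOLDER}."""
    settings = parse_conninfo(conninfo)
    return sorted(k for k, v in settings.items() if re.fullmatch(r"\{\w+\}", v))


sample = (
    "dbname={DB_NAME} host={HOST} port=5432 sslmode=require "
    "user={USER_NAME} password={PASSWORD}"
)
print(unreplaced_placeholders(sample))  # prints: ['dbname', 'host', 'password', 'user']
```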
Connect to the database and run the commands in the file setup/sql-scripts/1_PGAI-Demo_setup.sql in psql or your favorite Postgres editor.
NOTE: This command has to be run from the root directory of the repo in order to correctly locate the setup/sql-scripts/1_PGAI-Demo_setup.sql file.
If running from psql:
\i setup/sql-scripts/1_PGAI-Demo_setup.sql
Run the commands in the file setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql in psql or your favorite Postgres editor.
NOTE: You will need to update lines 21-22 of the script with your Azure OpenAI credentials:
select azure_ai.set_setting('azure_openai.endpoint', '');
select azure_ai.set_setting('azure_openai.subscription_key', '');
If running from psql:
\i setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql
TIP: This will take around 2-5 minutes to run.
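As an alternative to pasting credentials into the script, the two `azure_ai.set_setting` calls can be issued from Python with bound parameters. This is a hedged sketch: it only builds the statements, it assumes `%s`-style placeholders as used by psycopg-family drivers, and it assumes the `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` variables from .env hold the values those settings expect. `build_setting_statements` is a hypothetical helper.

```python
def build_setting_statements(env):
    """Return (sql, params) pairs for the two azure_ai.set_setting calls.

    Assumption: `env` maps the .env variable names to their values, and the
    API key is the value wanted by 'azure_openai.subscription_key'.
    """
    sql = "select azure_ai.set_setting(%s, %s);"
    return [
        (sql, ("azure_openai.endpoint", env["AZURE_OPENAI_ENDPOINT"])),
        (sql, ("azure_openai.subscription_key", env["AZURE_OPENAI_API_KEY"])),
    ]


# Example with the sample values from .env; a real run would pass os.environ
# and execute each pair with a database cursor, e.g. cursor.execute(sql, params).
statements = build_setting_statements({
    "AZURE_OPENAI_ENDPOINT": "https://sample.openai.azure.com/",
    "AZURE_OPENAI_API_KEY": "***sample***key****",
})
for sql, params in statements:
    print(sql, params[0])
```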
Run the commands in the file setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql in psql or your favorite Postgres editor.
If running from psql:
\i setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql
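The three scripts can also be run non-interactively. The sketch below builds one `psql` command per script, in order, and fails early if it is not pointed at the repo root. It relies on the standard `psql` options `-f`, `-d` (which accepts a connection string), and `-v ON_ERROR_STOP=1`. The function name is hypothetical, and nothing is executed until the commands are passed to `subprocess.run`.

```python
from pathlib import Path

# Script paths as given in the steps above, relative to the repo root.
SCRIPTS = [
    "setup/sql-scripts/1_PGAI-Demo_setup.sql",
    "setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql",
    "setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql",
]


def build_psql_commands(repo_root, conninfo):
    """Return one psql argument list per setup script, in execution order."""
    root = Path(repo_root)
    missing = [s for s in SCRIPTS if not (root / s).is_file()]
    if missing:
        raise FileNotFoundError(f"Not the repo root? Missing scripts: {missing}")
    return [
        ["psql", "-v", "ON_ERROR_STOP=1", "-d", conninfo, "-f", script]
        for script in SCRIPTS
    ]


# Usage (from the repo root, with AZURE_PG_CONNECTION set in the environment):
#   import os, subprocess
#   for cmd in build_psql_commands(".", os.environ["AZURE_PG_CONNECTION"]):
#       subprocess.run(cmd, cwd=".", check=True)
```

Running with `cwd` set to the repo root keeps the relative script paths valid, which matches the note above about where the commands must be run from.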
Since the local app uses OpenAI models, you should first deploy it for the optimal experience.
- Copy .env.sample into a .env file.
- To use Azure OpenAI, fill in the values of AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY based on the deployed values.
- Fill in the connection string value for AZURE_PG_CONNECTION. You can find this in the Azure Portal.
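Before starting the app, it may help to confirm that all three variables are actually set. The following sketch uses a deliberately simplified parser for `KEY=value` lines. It is not the parser the app itself uses, and the function names are hypothetical.

```python
REQUIRED_KEYS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_PG_CONNECTION",
)


def parse_env_text(text):
    """Parse simple KEY=value lines; skip blanks and # comments, strip quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text):
    """Return the required variables that are absent or empty."""
    values = parse_env_text(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = (
    'AZURE_OPENAI_API_KEY="***sample***key****"\n'
    'AZURE_OPENAI_ENDPOINT="https://sample.openai.azure.com/"\n'
)
print(missing_keys(example))  # prints: ['AZURE_PG_CONNECTION']
```

To check a real file, pass its contents, for example `missing_keys(open(".env").read())`.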
Install the required Python packages and the Streamlit application:
python3 -m venv .diskann
source .diskann/bin/activate
pip install -r requirements.txt
From the root directory:
cd src/app
streamlit run app.py
When running locally, open the website at http://localhost:8501/
Explore Python Notebook