DiskANN Vector Index in Azure Database for PostgreSQL

Deploy to Azure

We're thrilled to announce the preview of DiskANN, a leading vector indexing algorithm, on Azure Database for PostgreSQL - Flexible Server! Developed by Microsoft Research and used extensively at Microsoft in global services such as Bing and Microsoft 365, DiskANN enables developers to build highly accurate, performant, and scalable Generative AI applications, surpassing pgvector’s HNSW and IVFFlat in both latency and accuracy. DiskANN also overcomes a long-standing limitation of pgvector in filtered vector search, where pgvector occasionally returns incorrect results.


Sample Website. This sample application shows a search page over a sample Airbnb dataset:

  • It illustrates the improved recall of DiskANN compared with HNSW.
  • When a filter is applied, you will notice that the HNSW index doesn't return the same number of results as DiskANN or No Index.
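
As a rough illustration of what "recall" means in the comparison above, the sketch below measures how many of the exact top-k results (the No Index case) an approximate index also returned. The function name and the listing IDs are hypothetical and are not taken from the demo app.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that the approximate index also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one result")
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)


# Hypothetical listing IDs, for illustration only.
exact = [101, 102, 103, 104, 105]   # exact scan (No Index)
approx = [101, 103, 105, 110]       # an index that returned only 4 rows

# 3 of the 5 true neighbours were found.
print(recall_at_k(approx, exact, k=5))  # 0.6
```

An index that returns fewer rows than requested, as in the filtered case described above, can never reach a recall of 1.0 under this measure.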


Documentation

Read more on how to use DiskANN in the Microsoft Docs: Docs

Check out the blog: Blog

Getting started

The first step is to set up the data on the new Postgres database you just created.

Make sure the following tools are installed:

Enroll in the pg_diskann Preview Feature

Follow Microsoft documentation for enrolling in DiskANN preview

Enable pg_diskann extension

Follow Microsoft documentation for enabling DiskANN

Set up Seattle Airbnb Data and test DiskANN

This demo app will show you how the DiskANN index works better than HNSW.

0. Set up Git Repo and Update your .env file

  1. In a shell prompt, enter the following to clone the GitHub repo containing exercise resources:

    git clone https://github.com/Azure-Samples/DiskANN-demo.git

    Then enter the repo directory:

    cd DiskANN-demo
    
  2. Update your .env with the following:

    AZURE_OPENAI_API_KEY="***sample***key****"
    AZURE_OPENAI_ENDPOINT="https://sample.openai.azure.com/"
    AZURE_PG_CONNECTION="dbname={DB_NAME} host={HOST} port=5432 sslmode=require user={USER_NAME} password={PASSWORD}"
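
Before moving on, it can help to confirm that every `{...}` placeholder in AZURE_PG_CONNECTION has been replaced. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and using `shlex` to split the string is a simplification of the real libpq parsing rules, not an exact reimplementation.

```python
import re
import shlex

# Keys the template above expects to be filled in.
REQUIRED_KEYS = ("dbname", "host", "user", "password")


def parse_conninfo(conninfo):
    """Split a 'key=value key=value' connection string into a dict."""
    pairs = {}
    for token in shlex.split(conninfo):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        pairs[key] = value
    return pairs


def conninfo_problems(conninfo):
    """Return a list of problems; an empty list means the string looks filled in."""
    params = parse_conninfo(conninfo)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in params]
    problems += [
        f"{key} still contains a placeholder"
        for key, value in params.items()
        if re.fullmatch(r"\{[A-Z_]+\}", value)
    ]
    return problems


template = (
    "dbname={DB_NAME} host={HOST} port=5432 sslmode=require "
    "user={USER_NAME} password={PASSWORD}"
)
for problem in conninfo_problems(template):
    print(problem)
```

Run against the unedited template, this reports the four values that still need to be filled in.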

1. Set up Data

Connect to the database and run the commands in the file setup/sql-scripts/1_PGAI-Demo_setup.sql in psql or your favorite Postgres editor.

NOTE: This command has to be run from the root directory of the repo so that the relative path setup/sql-scripts/1_PGAI-Demo_setup.sql resolves correctly.

If running from psql:

\i setup/sql-scripts/1_PGAI-Demo_setup.sql

2. Set up OpenAI endpoint, embed data and create indexes

Run the commands in the file setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql in psql or your favorite Postgres editor.

NOTE: You will need to update lines 21-22 with your OpenAI credentials.

select azure_ai.set_setting('azure_openai.endpoint', '');
select azure_ai.set_setting('azure_openai.subscription_key', '');

If running from psql:

\i setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql

TIP: This will take around 2-5 minutes to run.

3. Test out sample vector queries

Run the commands in the file setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql in psql or your favorite Postgres editor.

If running from psql:

\i setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql
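
If you would rather script the three steps than type each `\i` command, the sketch below builds one `psql` invocation per script and fails early when the scripts cannot be found, which is what happens when the commands are run from outside the repo root. The helper names are hypothetical. It assumes `psql` is on your PATH and that you have already edited the credentials in the second script.

```python
import subprocess
from pathlib import Path

# Script paths as given in the steps above, relative to the repo root.
SCRIPTS = [
    "setup/sql-scripts/1_PGAI-Demo_setup.sql",
    "setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql",
    "setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql",
]


def psql_commands(conninfo, repo_root="."):
    """Build one psql command per setup script, in order.

    Raises FileNotFoundError if a script is missing under repo_root.
    """
    root = Path(repo_root)
    commands = []
    for script in SCRIPTS:
        if not (root / script).is_file():
            raise FileNotFoundError(
                f"{script} not found under {root.resolve()}; run from the repo root"
            )
        # ON_ERROR_STOP makes psql exit on the first failing statement.
        commands.append(["psql", "-d", conninfo, "-v", "ON_ERROR_STOP=1", "-f", script])
    return commands


def run_setup(conninfo, repo_root="."):
    """Run the scripts in order, stopping at the first one that fails."""
    for command in psql_commands(conninfo, repo_root):
        subprocess.run(command, cwd=repo_root, check=True)
```

For example, calling `run_setup(os.environ["AZURE_PG_CONNECTION"])` from the repo root would run all three scripts with the connection string from your environment.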

Build Sample Application Locally

Setting up the environment file

Since the local app uses OpenAI models, you should first deploy it for the optimal experience.

  1. Copy .env.sample into a .env file.
  2. To use Azure OpenAI, fill in the values of AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY based on the deployed values.
  3. Fill in the connection string value for AZURE_PG_CONNECTION. You can find this in the Azure Portal.

Install dependencies

Install the required Python packages and the Streamlit application:

python3 -m venv .diskann
source .diskann/bin/activate
pip install -r requirements.txt

Running the application

From the root directory of the repo:

cd src/app
streamlit run app.py

When run locally, the website is available at http://localhost:8501/

Explore Indexes with Python Notebook

Explore Python Notebook