An Open-Source Conversational AI and Intelligence App Built on the Snowflake Cloud Environment
## Overview
The **Hyperledger Labs AIFAQ Prototype** is an open-source conversational intelligence system designed to deliver accurate, context-aware answers from enterprise documentation, technical references, and organizational knowledge bases. It integrates the governance strengths of Hyperledger with the scalability of **Snowflake** and the flexibility of **open-source LLMs** to create a secure, multi-tenant, production-grade enterprise knowledge assistant.
The prototype demonstrates a complete pipeline for ingesting, embedding, storing, and querying documents using Snowflake’s native capabilities and external AI inference. It supports open models such as **Llama**, **Mistral**, and **Snowflake Arctic**, offering a modular architecture suitable for production-grade deployments.
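
When a query is routed through Snowflake Cortex, serving one of these open models is a single SQL call. The snippet below is a minimal sketch using the built-in `SNOWFLAKE.CORTEX.COMPLETE` function; the model name and the inlined context are illustrative, and model availability varies by Snowflake region.

```
-- Minimal sketch: answer a question with a Cortex-hosted open model.
-- 'snowflake-arctic' is one of several supported models; a Llama or Mistral
-- variant can be substituted where available.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'snowflake-arctic',
    'Answer using only the context below.\n' ||
    'Context: <retrieved document chunks>\n' ||
    'Question: What does the AIFAQ prototype do?'
) AS answer;
```
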
## Features
- **Multi-User Authentication**
  Secure login and strict data isolation across document sets and chat histories.

- **Hybrid LLM Support**
  Route queries to Snowflake Cortex or to external open-source LLMs through secure external functions.

- **Multi-Document Knowledge Retrieval**
  Supports structured and unstructured data.

- **Persistent Chat Sessions**
  Full session history stored in Snowflake with easy retrieval.

- **Streamlit Frontend**
  Intuitive UI for uploading documents, interacting with the assistant, and browsing past conversations.

- **Snowflake Vector Search**
  High-performance similarity search using Cortex Vector Search and SQL APIs inside the Snowflake cloud environment (see the example query after this list).

- **Automated Pipelines**
  Re-embedding and re-indexing triggered by Snowflake Streams and Tasks when documents update.

- **Enterprise Governance**
  RBAC, row-level security, and masking policies ensure protected data access.
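
As an illustration of the retrieval step behind these features, the query below ranks one user's document chunks by similarity to a query embedding. It is a sketch against the `documents` table defined in the setup section further down; the bind variables are placeholders filled in by the application, and the query embedding is assumed to come from the same 1536-dimensional model used at ingestion.

```
-- Sketch: per-user top-k similarity search over stored chunk embeddings.
SELECT
    doc_name,
    chunk_id,
    chunk_text,
    VECTOR_COSINE_SIMILARITY(embedding, :query_embedding) AS score
FROM llm_chatbot.chatbot.documents
WHERE user_id = :user_id          -- strict per-user isolation
ORDER BY score DESC
LIMIT 5;                          -- top-k chunks used to augment the prompt
```
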
The end-to-end pipeline works as follows:

1. Flexible Document Ingestion: AIFAQ supports various source formats (PDFs, HTML, YouTube transcripts, etc.) ingested into Snowflake via external tables, raw storage, and pipelines using tools like Snowpipe and Lambda-based metadata extractors.
2. Preprocessing & Embedding: Documents are chunked using Snowpark UDFs and embedded using LLM-based models. Embedding vectors are stored in Snowflake, forming the searchable knowledge base alongside metadata.
3. Access Control & Governance: Fine-grained access is enforced through Snowflake's role-based permissions, row-level security, and data masking policies to protect sensitive content.
4. LLM Query Augmentation & Retrieval: User queries are augmented with context by retrieving relevant chunks from the vector database (via Cortex Vector Search or SQL API), then sent to external LLMs (OpenAI, Anthropic) for response generation.
5. Automation & Monitoring: Updates to documents automatically re-trigger embedding pipelines using Snowflake Streams and Tasks (see the sketch after this list), while monitoring tools like Snoopy and event notifications ensure system observability and orchestration.
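
As a sketch of the trigger mechanism in step 5, the statements below wire a stream on a raw documents table to a scheduled task that re-embeds whatever changed. The `raw_documents` table and the `reembed_changed_documents()` procedure are illustrative names, not objects defined in this repository.

```
-- Sketch only: re-run embedding whenever the raw documents table changes.
CREATE OR REPLACE STREAM raw_documents_stream ON TABLE raw_documents;

CREATE OR REPLACE TASK reembed_documents_task
  WAREHOUSE = compute_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_DOCUMENTS_STREAM')
AS
  CALL reembed_changed_documents();   -- hypothetical Snowpark stored procedure

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK reembed_documents_task RESUME;
```
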
This Snowflake branch is a more advanced, optimized version of the prototype, improving modularity and performance. It includes:

- Refined RAG pipeline (improved data ingestion)
- Cortex Vector Search & utilities
- Advanced Role-Based Access Control (RBAC)
- Enhanced logging/observability
- Stronger multi-tenant isolation (see the sketch after this list)
- Updated Streamlit interface
- Multi-cloud ingestion
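
To make the multi-tenant isolation concrete, here is a minimal sketch of a Snowflake row access policy applied to the `documents` and `chat_history` tables created in the setup section below. Keying the policy on `CURRENT_USER()` is an assumption for illustration; the project may instead key on session variables or application-level identifiers.

```
-- Sketch: expose only the caller's own rows (illustrative mapping to CURRENT_USER()).
CREATE OR REPLACE ROW ACCESS POLICY llm_chatbot.chatbot.user_isolation
  AS (user_id STRING) RETURNS BOOLEAN ->
    user_id = CURRENT_USER();

ALTER TABLE llm_chatbot.chatbot.documents
  ADD ROW ACCESS POLICY llm_chatbot.chatbot.user_isolation ON (user_id);

ALTER TABLE llm_chatbot.chatbot.chat_history
  ADD ROW ACCESS POLICY llm_chatbot.chatbot.user_isolation ON (user_id);
```
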
---
## 📝 Setup Instructions (Snowflake Branch)
Follow these steps to configure your Snowflake environment using the provided `setup.sql` script.
1. Set up a role for the chatbot and grant access to required resources:

```
-- Role used by the chatbot application
CREATE OR REPLACE ROLE chatbot_user;

-- Allow the role to use the compute warehouse and the chatbot database
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE chatbot_user;
GRANT USAGE ON DATABASE llm_chatbot TO ROLE chatbot_user;  -- database created in step 2
```

2. Initialize the database and schema for storing documents and chat data:

```
CREATE OR REPLACE DATABASE llm_chatbot;
CREATE OR REPLACE SCHEMA chatbot;
USE SCHEMA llm_chatbot.chatbot;
```

3. Create two core tables, one for document chunks and another for chat history:

```
-- Document chunks and their embedding vectors (one row per chunk)
CREATE OR REPLACE TABLE documents (
    user_id    STRING,
    doc_id     STRING,
    doc_name   STRING,
    chunk_id   STRING,
    chunk_text STRING,
    embedding  VECTOR(FLOAT, 1536)    -- embedding vector used for similarity search
);

-- Per-user, per-session conversation history
CREATE OR REPLACE TABLE chat_history (
    user_id      STRING,
    session_id   STRING,
    doc_id       STRING,
    turn         INT,                 -- position of the exchange within the session
    user_input   STRING,
    bot_response STRING,
    timestamp    TIMESTAMP
);
```
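
For reference, replaying a stored session in order is a single query over `chat_history`; the bind variables are placeholders supplied by the application.

```
-- Illustrative query: retrieve one user's chat session in conversation order.
SELECT turn, user_input, bot_response, timestamp
FROM chat_history
WHERE user_id = :user_id
  AND session_id = :session_id
ORDER BY turn;
```
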
4. External Function – OpenAI: Create an external function to call OpenAI's API:

```
-- Assumes an API integration (my_api_integration) has already been created
-- and authorized to reach the OpenAI endpoint.
CREATE OR REPLACE EXTERNAL FUNCTION openai_complete(prompt STRING)
RETURNS STRING
API_INTEGRATION = my_api_integration
HEADERS = (
    "Authorization" = 'Bearer <OPENAI_API_KEY>',
    "Content-Type" = 'application/json'
)
URL = 'https://api.openai.com/v1/completions'
POST_BODY = '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "' || prompt || '",
    "max_tokens": 200
}';
```

> Replace <OPENAI_API_KEY> with your actual OpenAI API key.
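
Once the API integration and key are in place, an illustrative smoke test of the function is:

```
-- Illustrative smoke test of the external function.
SELECT openai_complete('Summarize what the AIFAQ prototype does in one sentence.');
```
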
5. External Function – Anthropic: Similarly, set up a function to call Anthropic's Claude model:

```
CREATE OR REPLACE EXTERNAL FUNCTION anthropic_complete(prompt STRING)
RETURNS STRING
API_INTEGRATION = my_api_integration
HEADERS = (
    "x-api-key" = '<ANTHROPIC_API_KEY>',
    "Content-Type" = 'application/json'
)
URL = 'https://api.anthropic.com/v1/complete'
POST_BODY = '{
    "model": "claude-3-opus-20240229",
    "prompt": "Human: ' || prompt || '\nAssistant:",
    "max_tokens": 200
}';
```

> Replace <ANTHROPIC_API_KEY> with your actual key.
6. Deploy the chatbot interface using the Streamlit app stored in your project:
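
If the interface is hosted as Streamlit in Snowflake (rather than run locally with `streamlit run`), deployment can be a single DDL statement. The app name, stage path, and main file below are placeholders, not values taken from this repository.

```
-- Sketch: register the Streamlit app inside Snowflake (placeholder names).
CREATE OR REPLACE STREAMLIT llm_chatbot.chatbot.aifaq_app
  ROOT_LOCATION   = '@llm_chatbot.chatbot.app_stage/streamlit'
  MAIN_FILE       = 'app.py'
  QUERY_WAREHOUSE = compute_wh;
```
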
Join our weekly public calls every Monday! See the [Hyperledger Labs Calendar](https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings) for details.