Merged
19 changes: 2 additions & 17 deletions .github/workflows/main.yml → .github/workflows/CI.yml
@@ -10,21 +10,6 @@ on:
- "**"

jobs:
format-frontend:
name: 'Format Frontend'
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v4

- name: Prettify code
uses: creyD/prettier_action@v4.3
with:
prettier_options: --write src/frontend/**/*.{ts,tsx}

- name: Post Formatting Check
run: git diff --exit-code

format-core:
name: 'Format Core'
runs-on: ubuntu-latest
@@ -49,7 +34,7 @@ jobs:

- name: Format Backend
run: |
black src/core
black version_1/ version_2/

- name: Post Formatting Check
run: git diff --exit-code
run: git diff --exit-code
214 changes: 96 additions & 118 deletions README.md
@@ -1,4 +1,3 @@
# Snowflake Branch: Hyperledger Labs AIFAQ prototype

![Hyperledger Labs](https://img.shields.io/badge/Hyperledger-Labs-blue?logo=hyperledger)
![Apache License 2.0](https://img.shields.io/badge/license-Apache%202.0-green.svg)
@@ -7,140 +6,118 @@

[![GitHub Stars](https://img.shields.io/github/stars/hyperledger-labs/aifaq?style=social)](https://github.com/hyperledger-labs/aifaq/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/hyperledger-labs/aifaq?style=social)](https://github.com/hyperledger-labs/aifaq/network/members)
[![Language Stats](https://img.shields.io/github/languages/top/hyperledger-labs/aifaq)](https://github.com/hyperledger-labs/aifaq)
[![Issues](https://img.shields.io/github/issues/hyperledger-labs/aifaq)](https://github.com/hyperledger-labs/aifaq/issues)
[![Pull Requests](https://img.shields.io/github/issues-pr/hyperledger-labs/aifaq)](https://github.com/hyperledger-labs/aifaq/pulls)

![Language Stats](https://img.shields.io/github/languages/count/hyperledger-labs/aifaq)
![Python](https://img.shields.io/badge/Python-85%25-blue?logo=python)
![HTML](https://img.shields.io/badge/HTML-10%25-orange?logo=html5)
![Other](https://img.shields.io/badge/Others-5%25-lightgrey?logo=github)

---
## 🚀 Overview
# Hyperledger Labs AIFAQ Prototype in Snowflake
An open-source conversational AI and intelligence app built on the Snowflake cloud environment

## Overview

The **Hyperledger Labs AIFAQ Prototype** is an open-source conversational intelligence system designed to deliver accurate, context-aware answers from enterprise documentation, technical references, and organizational knowledge bases. It integrates the governance strengths of Hyperledger with the scalability of **Snowflake** and the flexibility of **open-source LLMs** to create a secure, multi-tenant, production-grade enterprise knowledge assistant.

The prototype demonstrates a complete pipeline for ingesting, embedding, storing, and querying documents using Snowflake’s native capabilities and external AI inference. It supports open models such as **Llama**, **Mistral**, and **Snowflake Arctic**, offering a modular architecture suitable for production-grade deployments.

## Features

- **Multi-User Authentication**
Secure login and strict data isolation across document sets and chat histories.

- **Hybrid LLM Support**
Route queries to Snowflake Cortex or external open-source LLMs through secure external functions.

- **Multi-Document Knowledge Retrieval**
Supports structured and unstructured data.

- **Persistent Chat Sessions**
Full session history stored in Snowflake with easy retrieval.

- **Streamlit Frontend**
Intuitive UI for uploading documents, interacting with the assistant, and browsing past conversations.

- **Snowflake Vector Search**
High-performance similarity search using Cortex Vector Search and SQL APIs inside the Snowflake cloud environment.

- **Automated Pipelines**
Re-embedding and re-indexing triggered by Snowflake Streams and Tasks when documents update.

The **Hyperledger Labs AIFAQ Prototype** is an open-source conversational AI tool designed to answer questions from technical documentation, FAQs, and internal knowledge bases with high accuracy and context awareness. This implementation of AIFAQ integrates deeply with **Snowflake**, providing secure multi-user support, persistent chat history, and access to powerful LLMs like OpenAI, Anthropic, and Snowflake Cortex.
- **Enterprise Governance**
RBAC, row-level security, and masking policies ensure protected data access.

👉 Official Wiki Pages:

- [Hyperledger Labs Wiki](https://lf-hyperledger.atlassian.net/wiki/spaces/labs/pages/20290949/AI+FAQ+2025)
## Architecture

👉 Weekly Community Calls:
### 1. Ingestion Layer
- Accepts structured and unstructured formats including PDFs, HTML, plain text, and transcripts.
- Uses Snowflake external tables, stages, Snowpipe, or cloud functions to store and extract metadata.
- All raw inputs move through well-defined staging schemas.
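The staging flow above can be sketched in Snowflake SQL. This is a minimal illustration, not code from this repo: the stage, pipe, and table names are hypothetical, and the file format would depend on your sources:

```sql
-- Internal stage for raw uploads (illustrative name)
CREATE OR REPLACE STAGE raw_docs;

-- Landing table for extracted text and metadata
CREATE OR REPLACE TABLE staged_documents (
    doc_name STRING,
    raw_text STRING,
    uploaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);

-- Snowpipe: auto-load new files from the stage as they arrive
CREATE OR REPLACE PIPE doc_pipe AUTO_INGEST = TRUE AS
    COPY INTO staged_documents (doc_name, raw_text)
    FROM (SELECT METADATA$FILENAME, $1 FROM @raw_docs);
```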

- Every Monday (public) — join via [Hyperledger Labs Calendar](https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings).
### 2. Preprocessing & Embedding
- Snowpark UDFs handle chunking, cleaning, and tokenization.
- Embeddings generated using Cortex or external open-source models.
- Metadata and embedding vectors stored inside Snowflake as the unified knowledge base.
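One way this embedding step could be expressed with Cortex, assuming a hypothetical `staged_chunks` table of pre-chunked text (produced upstream, e.g. by a Snowpark UDF). Note that the embedding dimension must match the target column's `VECTOR` declaration — `EMBED_TEXT_768` produces 768-dimensional vectors, so the column would be declared `VECTOR(FLOAT, 768)`:

```sql
-- Embed each chunk and store it alongside its metadata (illustrative identifiers)
INSERT INTO documents (user_id, doc_id, doc_name, chunk_id, chunk_text, embedding)
SELECT
    CURRENT_USER(),
    'doc-001',
    'fabric_faq.pdf',
    UUID_STRING(),
    chunk_text,
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', chunk_text)
FROM staged_chunks;
```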

### 3. Access Control & Security
- Document and chat isolation enforced via Snowflake roles.
- Row-level security restricts user visibility to their own data.
- Sensitive fields are masked using policy-based governance.
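A sketch of how this isolation might be declared on the `documents` table from the setup section below. The policy names and the admin role are illustrative assumptions:

```sql
-- Row access policy: users see only rows they own
CREATE OR REPLACE ROW ACCESS POLICY user_isolation
    AS (user_id STRING) RETURNS BOOLEAN ->
    user_id = CURRENT_USER();

ALTER TABLE documents ADD ROW ACCESS POLICY user_isolation ON (user_id);

-- Masking policy: hide raw chunk text from non-privileged roles
CREATE OR REPLACE MASKING POLICY mask_chunks AS (val STRING) RETURNS STRING ->
    CASE WHEN CURRENT_ROLE() IN ('CHATBOT_ADMIN') THEN val
         ELSE '***MASKED***' END;

ALTER TABLE documents MODIFY COLUMN chunk_text SET MASKING POLICY mask_chunks;
```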

### 4. Retrieval-Augmented Generation (RAG)
- User query → vector search → relevant context retrieval → model response.
- Hybrid routing selects the best LLM based on user preference and query context.
- Ensures responses are grounded in user-provided documentation.
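The retrieval step can be sketched as a similarity query over the `documents` table, assuming query and stored embeddings share the same model and dimension (the question text and top-k value are illustrative):

```sql
-- Retrieve the most relevant chunks for a user question
WITH q AS (
    SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768(
               'snowflake-arctic-embed-m',
               'How do I join a channel in Fabric?') AS qvec
)
SELECT d.chunk_text,
       VECTOR_COSINE_SIMILARITY(d.embedding, q.qvec) AS score
FROM documents d, q
WHERE d.user_id = CURRENT_USER()
ORDER BY score DESC
LIMIT 5;
```

The retrieved `chunk_text` rows are then concatenated into the context portion of the prompt sent to the selected LLM.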

### 5. Automation & Observability
- Snowflake Streams detect document changes.
- Tasks automate reprocessing and embedding updates.
- Monitoring through Snoopy and event notifications for operational visibility.
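A minimal version of this trigger loop, assuming a `staged_documents` landing table and a hypothetical `reembed_documents()` stored procedure that re-chunks and re-embeds changed documents:

```sql
-- Stream captures inserts/updates to the landing table
CREATE OR REPLACE STREAM doc_changes ON TABLE staged_documents;

-- Task runs on a schedule, but only when the stream has pending changes
CREATE OR REPLACE TASK reembed_task
    WAREHOUSE = compute_wh
    SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('DOC_CHANGES')
AS
    CALL reembed_documents();  -- hypothetical reprocessing procedure

ALTER TASK reembed_task RESUME;
```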

---
## Features

- User Authentication: Secure, multi-user access with isolated document and chat histories
- LLM Integration: Seamless access to Cortex, OpenAI, and Anthropic models via Snowflake external functions
- Multi-Document Support: Upload and query multiple documents per session
- Persistent Chat History: Retrieve and continue conversations across sessions
- Streamlit UI: Intuitive document upload and chat interface
## Getting Started

1. Choose the appropriate implementation folder:
- **version_1** for stable production deployment
- **version_2** for advanced workflows with multi-cloud data ingestion

2. Follow the README inside the selected folder to set up:
- Warehouses
- Stages and schemas
- Pipelines
- External LLM functions
- Streamlit deployment
---

## Folder Descriptions

### `version_1/`
A simplified demonstration build intended for quick Snowflake evaluation and hands-on learning for beginners.

Includes:
- Lightweight ingestion + embedding flow
- Basic Streamlit UI
- Environment dependencies
- Minimal Snowflake setup scripts
- Ingestion and RAG flows

---
## 🛠️ Architecture

![Snowflake integration architecture](./images/snowflake-architecture.png)
### `version_2/`
A more advanced, optimized version improving modularity and performance.

1. Flexible Document Ingestion: AIFAQ supports various source formats (PDFs, HTML, YouTube transcripts, etc.) ingested into Snowflake via external tables, raw storage, and pipelines using tools like Snowpipe and Lambda-based metadata extractors.
2. Preprocessing & Embedding: Documents are chunked using Snowpark UDFs and embedded using LLM-based models. Embedding vectors are stored in Snowflake, forming the searchable knowledge base alongside metadata.
3. Access Control & Governance: Fine-grained access is enforced through Snowflake's role-based permissions, row-level security, and data masking policies to protect sensitive content.
4. LLM Query Augmentation & Retrieval: User queries are augmented with context by retrieving relevant chunks from the vector database (via Cortex Vector Search or SQL API), then sent to external LLMs (OpenAI, Anthropic) for response generation.
5. Automation & Monitoring: Updates to documents automatically re-trigger embedding pipelines using Snowflake Streams and Tasks, while monitoring tools like Snoopy and event notifications ensure system observability and orchestration.
Includes:
- Refined RAG pipeline (improved data ingestion pipeline)
- Cortex Vector Search & utilities
- Advanced Role-Based Access Control (RBAC)
- Enhanced logging/observability
- Stronger multi-tenant isolation
- Updated Streamlit interface
- Multi-cloud ingestion

---
## 📝 Setup Instructions (Snowflake Branch)
Follow these steps to configure your Snowflake environment using the provided `setup.sql` script.

1. Initialize the database and schema for storing documents and chat data:

```sql
CREATE OR REPLACE DATABASE llm_chatbot;
CREATE OR REPLACE SCHEMA chatbot;
USE SCHEMA llm_chatbot.chatbot;
```

2. Set up a role for the chatbot and grant it access to the required resources (the database must exist before usage can be granted on it):

```sql
CREATE OR REPLACE ROLE chatbot_user;

GRANT USAGE ON WAREHOUSE compute_wh TO ROLE chatbot_user;
GRANT USAGE ON DATABASE llm_chatbot TO ROLE chatbot_user;
```
3. Create two core tables, one for document chunks and another for chat history:

```sql
CREATE OR REPLACE TABLE documents (
    user_id STRING,
    doc_id STRING,
    doc_name STRING,
    chunk_id STRING,
    chunk_text STRING,
    embedding VECTOR(FLOAT, 1536)
);

CREATE OR REPLACE TABLE chat_history (
    user_id STRING,
    session_id STRING,
    doc_id STRING,
    turn INT,
    user_input STRING,
    bot_response STRING,
    timestamp TIMESTAMP
);
```
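As a quick sanity check of the `chat_history` schema, a single chat turn might be recorded and replayed like this (the session and document identifiers are illustrative, not values used by this repo):

```sql
-- Record one turn of a conversation
INSERT INTO chat_history
    (user_id, session_id, doc_id, turn, user_input, bot_response, timestamp)
VALUES
    (CURRENT_USER(), 'sess-001', 'doc-001', 1,
     'What is Snowpipe?',
     'Snowpipe is Snowflake''s continuous data ingestion service.',
     CURRENT_TIMESTAMP());

-- Replay the session in order
SELECT turn, user_input, bot_response
FROM chat_history
WHERE user_id = CURRENT_USER() AND session_id = 'sess-001'
ORDER BY turn;
```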
4. External Function – OpenAI: Create an external function to call OpenAI's API:

```sql
CREATE OR REPLACE EXTERNAL FUNCTION openai_complete(prompt STRING)
RETURNS STRING
API_INTEGRATION = my_api_integration
HEADERS = (
    "Authorization" = 'Bearer <OPENAI_API_KEY>',
    "Content-Type" = 'application/json'
)
URL = 'https://api.openai.com/v1/completions'
POST_BODY = '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "' || prompt || '",
    "max_tokens": 200
}';
```
> Replace `<OPENAI_API_KEY>` with your actual OpenAI API key.
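Once the API integration is configured, the function can be invoked directly from SQL like any scalar function (the prompt here is illustrative):

```sql
SELECT openai_complete('Summarize the Hyperledger AIFAQ prototype in one sentence.');
```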

5. External Function – Anthropic: Similarly, set up a function to call Anthropic's Claude model:

```sql
CREATE OR REPLACE EXTERNAL FUNCTION anthropic_complete(prompt STRING)
RETURNS STRING
API_INTEGRATION = my_api_integration
HEADERS = (
    "x-api-key" = '<ANTHROPIC_API_KEY>',
    "Content-Type" = 'application/json'
)
URL = 'https://api.anthropic.com/v1/complete'
POST_BODY = '{
    "model": "claude-3-opus-20240229",
    "prompt": "Human: ' || prompt || '\nAssistant:",
    "max_tokens": 200
}';
```
> Replace `<ANTHROPIC_API_KEY>` with your actual key.

6. Deploy the chatbot interface using the Streamlit app stored in your project:

```sql
CREATE OR REPLACE STREAMLIT chatbot_ui
FROM '/chatbot_app'
MAIN_FILE = '/app.py';
```

---



## 🌐 Open Source License

@@ -159,8 +136,9 @@ We welcome contributions! Please check our [CONTRIBUTING](./docs/CONTRIBUTING.md
Join our weekly public calls every Monday! See the [Hyperledger Labs Calendar](https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings) for details.


## 📢 Stay Connected
## Stay Connected

- [Slack Discussions](https://join.slack.com/t/aifaqworkspace/shared_invite/zt-337k74jsl-tvH_4ct3zLj99dvZaf9nZw)
- [Hyperledger Labs Community](https://lf-hyperledger.atlassian.net/wiki/spaces/labs/pages/20290949/AI+FAQ+2025)
- Official Website: [aifaq.pro](https://aifaq.pro)
- Official Wiki Pages: [Hyperledger Labs Wiki](https://lf-hyperledger.atlassian.net/wiki/spaces/labs/pages/20290949/AI+FAQ+2025)
Binary file removed images/activate_gpu.png
Binary file not shown.
Binary file removed images/compress_files.png
Binary file not shown.
Binary file removed images/copy_paste_code.png
Binary file not shown.
Binary file removed images/curl_results.png
Binary file not shown.
Binary file removed images/move_command.png
Binary file not shown.
Binary file removed images/new_studio.png
Binary file not shown.
Binary file removed images/open_terminal.png
Binary file not shown.
Binary file removed images/prototype_schema_v1.drawio.png
Binary file not shown.
Binary file removed images/remove_command.png
Binary file not shown.
Binary file removed images/rename_studio.png
Binary file not shown.
Binary file removed images/run_api.png
Binary file not shown.
Binary file removed images/run_ingest.png
Binary file not shown.
Binary file removed images/select_L4.png
Binary file not shown.
Binary file removed images/snowflake-architecture.png
Binary file not shown.
Binary file removed images/studio_code.png
Binary file not shown.
Binary file removed images/wget_rtdocs.png
Binary file not shown.
44 changes: 0 additions & 44 deletions src/Dockerfile

This file was deleted.

30 changes: 0 additions & 30 deletions src/Readme.md

This file was deleted.
