14 changes: 14 additions & 0 deletions .env.example
@@ -0,0 +1,14 @@
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Optional: LLM Model
LLM_MODEL=gpt-3.5-turbo

# Optional: Logging Level
LOG_LEVEL=INFO

# Optional: Vector Store Path
VECTOR_STORE_PATH=./vector_store

# Optional: Embedding Model
EMBEDDING_MODEL=all-MiniLM-L6-v2
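
For illustration, the variables in `.env.example` could be read at startup roughly as follows. This is a minimal standard-library sketch, not the project's actual loader: `Settings`, `load_settings`, and `PLACEHOLDER_KEY` are hypothetical names, and the defaults simply mirror the example file. It assumes the values are already in the process environment; in practice `python-dotenv` (pinned in requirements.txt) would typically load them from `.env` first.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Placeholder value shipped in .env.example; treated here as "not configured".
PLACEHOLDER_KEY = "your_openai_api_key_here"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; defaults mirror .env.example."""

    openai_api_key: str
    llm_model: str = "gpt-3.5-turbo"
    log_level: str = "INFO"
    vector_store_path: str = "./vector_store"
    embedding_model: str = "all-MiniLM-L6-v2"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping, falling back to os.environ."""
    source = os.environ if env is None else env

    api_key = source.get("OPENAI_API_KEY", "").strip()
    if not api_key or api_key == PLACEHOLDER_KEY:
        raise ValueError("OPENAI_API_KEY is missing; edit your .env file")

    # Only the key is required; optional variables override the defaults.
    optional = {
        "llm_model": "LLM_MODEL",
        "log_level": "LOG_LEVEL",
        "vector_store_path": "VECTOR_STORE_PATH",
        "embedding_model": "EMBEDDING_MODEL",
    }
    overrides = {field: source[var] for field, var in optional.items() if var in source}
    return Settings(openai_api_key=api_key, **overrides)
```

Passing the mapping explicitly (rather than always reading `os.environ`) is a design choice made here so the loader can be exercised in tests without touching the real environment.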
48 changes: 48 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,48 @@
\# Contributing to GA4GH-RegBot



Thank you for your interest in contributing to GA4GH-RegBot!



\## Getting Started



1\. \*\*Fork the repository\*\* on GitHub

2\. \*\*Clone your fork\*\* locally

3\. \*\*Create a feature branch\*\* for your changes

4\. \*\*Follow the development workflow\*\* below

5\. \*\*Submit a pull request\*\* with a clear description



\## Development Workflow

Comment on lines +9 to +26
Copilot AI Mar 9, 2026

CONTRIBUTING.md currently covers forking/cloning but doesn’t yet include the development workflow details mentioned in the PR description/issue (#17) (branch naming, commit conventions, lint/format tools, code style expectations, PR process, etc.). Either expand this document accordingly or adjust the PR description/scope so expectations match what’s delivered.



\### 1. Fork and Clone



```bash

\# Fork on GitHub, then clone your fork

Comment on lines +1 to +36
Copilot AI Mar 9, 2026

Like the other docs, headings and comments are escaped (e.g., \#, \##, \###, and \# Fork on GitHub...) which will render incorrectly in Markdown and in the fenced code block. Remove the leading backslashes so both the Markdown and shell comments display as intended.

git clone https://github.com/YOUR-USERNAME/GA4GH-RegBot.git

cd GA4GH-RegBot



\# Add upstream remote for staying updated

git remote add upstream https://github.com/ga4gh/GA4GH-RegBot.git



45 changes: 32 additions & 13 deletions README.md
@@ -1,21 +1,40 @@
GA4GH-RegBot: Compliance Assistant
Status: Proposal Stage for GSoC 2026
\# GA4GH-RegBot: Compliance Assistant

Overview
RegBot is an LLM-powered tool designed to help researchers map their consent forms against GA4GH regulatory frameworks. It uses RAG (Retrieval-Augmented Generation) to flag compliance gaps automatically.

Architecture (Planned)
Core: Python

LLM Framework: LangChain / LlamaIndex
\*\*Status:\*\* Proposal Stage for GSoC 2026

Vector Store: ChromaDB / FAISS

UI: Streamlit

Roadmap
Phase 1: Ingest GA4GH "Framework for Responsible Sharing" policy documents.
GA4GH-RegBot is an LLM-powered tool designed to help researchers map their consent forms against GA4GH regulatory frameworks. It uses RAG (Retrieval-Augmented Generation) to flag compliance gaps automatically.



\## Quick Start (5 minutes)
Comment on lines +9 to +13
Copilot AI Mar 9, 2026

This README update adds a Quick Start but does not include the “project structure”/architecture overview referenced in the PR description and issue #17 acceptance criteria, nor a link to the full setup guide. Consider adding a short “Project Structure” section (even a brief tree) and a link to SETUP.md near the Quick Start so the README matches the stated goals.



Comment on lines +1 to +15
Copilot AI Mar 9, 2026

The README markdown is also escaped (e.g., \#, \*\*Status\*\*, \##), which will prevent proper rendering on GitHub. Remove the backslashes so the README displays correctly.


```bash

git clone https://github.com/ga4gh/GA4GH-RegBot.git

cd GA4GH-RegBot

python -m venv venv

venv\\Scripts\\activate # Windows

source venv/bin/activate # Mac/Linux

pip install -r requirements.txt

copy .env.example .env # Windows

cp .env.example .env # Mac/Linux

\# Edit .env with your OpenAI API key

python src/main.py


Phase 2: Build RAG pipeline for clause extraction.

Phase 3: Develop Streamlit frontend for user uploads.
60 changes: 60 additions & 0 deletions SETUP.md
@@ -0,0 +1,60 @@
\# GA4GH-RegBot: Development Setup Guide



Welcome! This guide will help you set up GA4GH-RegBot for local development and testing.



\## Prerequisites



\- \*\*Python 3.8+\*\*
Comment on lines +1 to +13
Copilot AI Mar 9, 2026

The markdown headings/lists are escaped (e.g., \#, \##, \-) so they will render as literal backslashes instead of proper Markdown formatting. Please remove the backslashes so headings and bullets render correctly.
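
One way to apply the fix described in this comment is a small script that strips the backslash in front of Markdown punctuation. The following is a minimal sketch, assuming the escapes are limited to the characters seen in this diff (`#`, `*`, `-`, and the `.` after list numbers). `unescape_markdown` and `fix_file` are hypothetical helpers, not part of the PR, and the pattern would also strip any backslash escapes that were intentional.

```python
import re
from pathlib import Path

# A backslash directly followed by one of the Markdown characters
# that appear escaped in this diff: '#', '*', '-', '.'
_ESCAPED = re.compile(r"\\([#*\-.])")


def unescape_markdown(text: str) -> str:
    """Drop the backslash before escaped Markdown punctuation."""
    return _ESCAPED.sub(r"\1", text)


def fix_file(path: str) -> None:
    """Rewrite one file in place with the escapes removed."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    target.write_text(unescape_markdown(original), encoding="utf-8")
```

Calling `fix_file` on README.md, SETUP.md, and CONTRIBUTING.md would cover the files flagged in this review. Doubled backslashes in the Windows paths (`venv\\Scripts\\activate`) are deliberately left alone by this pattern and would need a separate check.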


\- \*\*pip\*\* (comes with Python)

\- \*\*Git\*\*



\## Quick Start (5 minutes)



```bash

\# 1. Clone the repository

git clone https://github.com/ga4gh/GA4GH-RegBot.git

cd GA4GH-RegBot



\# 2. Create a virtual environment

python -m venv venv

venv\\Scripts\\activate # Windows



\# 3. Install dependencies

pip install -r requirements.txt



\# 4. Set up environment

copy .env.example .env

Comment on lines +35 to +52
Copilot AI Mar 9, 2026

The Quick Start only includes Windows-specific environment activation (venv\\Scripts\\activate) and .env creation (copy ...). Since this guide is meant for general onboarding, add the Mac/Linux equivalents (source venv/bin/activate and cp .env.example .env) or clearly label the section as Windows-only.



\# 5. Verify

python -c "import langchain; import chromadb; print('OK')"



22 changes: 21 additions & 1 deletion requirements.txt
@@ -1,9 +1,29 @@
# LangChain: LLM orchestration and RAG framework
langchain==0.1.0

# LangChain community integrations
langchain-community==0.0.10

# ChromaDB: Vector database for semantic search
chromadb==0.4.22

# OpenAI: LLM provider
openai==1.7.0

# Streamlit: Web UI framework (for Phase 3)
streamlit==1.30.0
pypdf==3.17.4

# Python-dotenv: Environment variable management
python-dotenv==1.0.0

# Tiktoken: Token counting for LLM
tiktoken==0.5.2

# Sentence-Transformers: Embedding models
sentence-transformers==2.2.2

# PyPDF: PDF document processing
PyPDF==3.17.1

# Pydantic: Data validation
pydantic==2.5.0