[docs] Add development setup guide and contribution guidelines #18
base: main

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,14 @@ |
| | | # OpenAI Configuration |
| | | OPENAI_API_KEY=your_openai_api_key_here |
| | | |
| | | # Optional: LLM Model |
| | | LLM_MODEL=gpt-3.5-turbo |
| | | |
| | | # Optional: Logging Level |
| | | LOG_LEVEL=INFO |
| | | |
| | | # Optional: Vector Store Path |
| | | VECTOR_STORE_PATH=./vector_store |
| | | |
| | | # Optional: Embedding Model |
| | | EMBEDDING_MODEL=all-MiniLM-L6-v2 |

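
The variables above are meant to be read at startup, and the requirements pin python-dotenv for that purpose. The sketch below shows one way such settings could be loaded, using the defaults listed in `.env.example`. `Settings` and `load_settings` are hypothetical names, not part of the project's code, and `src/main.py` may handle configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

try:
    # python-dotenv is pinned in requirements.txt; degrade gracefully if it is absent.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

PLACEHOLDER_KEY = "your_openai_api_key_here"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; field defaults mirror .env.example."""
    openai_api_key: str
    llm_model: str = "gpt-3.5-turbo"
    log_level: str = "INFO"
    vector_store_path: str = "./vector_store"
    embedding_model: str = "all-MiniLM-L6-v2"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an explicit mapping, or from a .env file plus os.environ."""
    if env is None:
        if load_dotenv is not None:
            # Uses python-dotenv's default .env search; variables that are
            # already set in the environment are not overridden.
            load_dotenv()
        env = os.environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError("OPENAI_API_KEY is missing or still the placeholder; edit .env")
    return Settings(
        openai_api_key=key,
        llm_model=env.get("LLM_MODEL", Settings.llm_model),
        log_level=env.get("LOG_LEVEL", Settings.log_level).upper(),
        vector_store_path=env.get("VECTOR_STORE_PATH", Settings.vector_store_path),
        embedding_model=env.get("EMBEDDING_MODEL", Settings.embedding_model),
    )
```

You can pass an explicit mapping, for example `load_settings({"OPENAI_API_KEY": "sk-..."})`. That lets you test the function without touching the real environment.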
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,48 @@ |
| | | \# Contributing to GA4GH-RegBot |
| | | |
| | | Thank you for your interest in contributing to GA4GH-RegBot! |
| | | |
| | | \## Getting Started |
| | | |
| | | 1\. \*\*Fork the repository\*\* on GitHub |
| | | 2\. \*\*Clone your fork\*\* locally |
| | | 3\. \*\*Create a feature branch\*\* for your changes |
| | | 4\. \*\*Follow the development workflow\*\* below |
| | | 5\. \*\*Submit a pull request\*\* with a clear description |
| | | |
| | | \## Development Workflow |
| | | |
| | | \### 1. Fork and Clone |
| | | |
| | | ```bash |
| | | \# Fork on GitHub, then clone your fork |
| | | git clone https://github.com/YOUR-USERNAME/GA4GH-RegBot.git |
| | | cd GA4GH-RegBot |
| | | |
| | | \# Add the upstream remote so you can pull in changes from the main repository |
| | | git remote add upstream https://github.com/ga4gh/GA4GH-RegBot.git |

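
After step 1, `git remote -v` should list your fork as `origin` and the main repository as `upstream`. The helper below is a hedged sketch that checks this layout from the command's text output. `parse_remotes` and `check_fork_setup` are illustrative names. The check assumes the HTTPS URL used above, so an SSH clone would need a different comparison.

```python
import re
from typing import Dict, List

# Matches lines such as: "origin\thttps://github.com/alice/GA4GH-RegBot.git (fetch)"
REMOTE_LINE = re.compile(r"^(\S+)\s+(\S+)\s+\((fetch|push)\)$")
UPSTREAM_URL = "https://github.com/ga4gh/GA4GH-RegBot.git"


def parse_remotes(output: str) -> Dict[str, str]:
    """Map each remote name to its fetch URL, given `git remote -v` output."""
    remotes: Dict[str, str] = {}
    for line in output.splitlines():
        match = REMOTE_LINE.match(line.strip())
        if match and match.group(3) == "fetch":
            remotes[match.group(1)] = match.group(2)
    return remotes


def check_fork_setup(output: str) -> List[str]:
    """Return problems with the origin/upstream layout; an empty list means it looks right."""
    remotes = parse_remotes(output)
    problems: List[str] = []
    if "origin" not in remotes:
        problems.append("no 'origin' remote")
    elif remotes["origin"] == UPSTREAM_URL:
        problems.append("'origin' points at the main repository, not your fork")
    if remotes.get("upstream") != UPSTREAM_URL:
        problems.append("'upstream' is missing or does not point at ga4gh/GA4GH-RegBot")
    return problems
```

To check a real clone, pass it the output of `subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout`.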
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,21 +1,40 @@ |
| | | GA4GH-RegBot: Compliance Assistant |
| | | Status: Proposal Stage for GSoC 2026 |
| | | \# GA4GH-RegBot: Compliance Assistant |
| | | |
| | | Overview |
| | | RegBot is an LLM-powered tool designed to help researchers map their consent forms against GA4GH regulatory frameworks. It uses RAG (Retrieval-Augmented Generation) to flag compliance gaps automatically. |
| | | |
| | | Architecture (Planned) |
| | | Core: Python |
| | | |
| | | LLM Framework: LangChain / LlamaIndex |
| | | \*\*Status:\*\* Proposal Stage for GSoC 2026 |
| | | |
| | | Vector Store: ChromaDB / FAISS |
| | | |
| | | UI: Streamlit |
| | | |
| | | Roadmap |
| | | Phase 1: Ingest GA4GH "Framework for Responsible Sharing" policy documents. |
| | | GA4GH-RegBot is an LLM-powered tool designed to help researchers map their consent forms against GA4GH regulatory frameworks. It uses RAG (Retrieval-Augmented Generation) to flag compliance gaps automatically. |
| | | |
| | | \## Quick Start (5 minutes) |
| | | |
| | | ```bash |
| | | git clone https://github.com/ga4gh/GA4GH-RegBot.git |
| | | cd GA4GH-RegBot |
| | | python -m venv venv |
| | | venv\\Scripts\\activate # Windows |
| | | source venv/bin/activate # Mac/Linux |
| | | pip install -r requirements.txt |
| | | copy .env.example .env # Windows |
| | | cp .env.example .env # Mac/Linux |
| | | \# Edit .env with your OpenAI API key |
| | | python src/main.py |
| | | |
| | | Phase 2: Build RAG pipeline for clause extraction. |
| | | |
| | | Phase 3: Develop Streamlit frontend for user uploads. |

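
The overview describes RAG-based gap flagging. For each policy requirement, the tool retrieves the most similar consent-form clause and flags the requirement if nothing matches closely. The sketch below illustrates only that retrieve-and-threshold idea and is not the project's pipeline:

- It swaps the planned embedding model and vector store (ChromaDB/FAISS) for plain bag-of-words cosine similarity, so it runs on the standard library alone.
- It omits the LLM step entirely.
- Its `threshold` of 0.3 is a hypothetical value.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a sentence-transformers embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_gaps(requirements: List[str], clauses: List[str], threshold: float = 0.3) -> List[Dict]:
    """For each requirement, retrieve the closest clause and flag it as a gap if the match is weak."""
    indexed = [(clause, vectorize(clause)) for clause in clauses]  # toy "vector store"
    report = []
    for requirement in requirements:
        req_vec = vectorize(requirement)
        best_clause, best_score = None, 0.0
        for clause, vec in indexed:
            score = cosine(req_vec, vec)
            if score > best_score:
                best_clause, best_score = clause, score
        report.append({
            "requirement": requirement,
            "best_match": best_clause,
            "score": round(best_score, 2),
            "gap": best_score < threshold,
        })
    return report
```

For example, `find_gaps(["Participants may withdraw consent at any time."], ["You can withdraw your consent at any time without penalty."])` matches the requirement to that clause, so nothing is flagged. An embedding model would also match paraphrases that share no words, which is what the planned design adds over word overlap.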
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,60 @@ |
| | | \# GA4GH-RegBot: Development Setup Guide |
| | | |
| | | Welcome! This guide walks you through setting up GA4GH-RegBot for local development and testing. |
| | | |
| | | \## Prerequisites |
| | | |
| | | \- \*\*Python 3.8+\*\* |
| | | \- \*\*pip\*\* (comes with Python) |
| | | \- \*\*Git\*\* |
| | | |
| | | \## Quick Start (5 minutes) |
| | | |
| | | ```bash |
| | | \# 1. Clone the repository |
| | | git clone https://github.com/ga4gh/GA4GH-RegBot.git |
| | | cd GA4GH-RegBot |
| | | |
| | | \# 2. Create and activate a virtual environment |
| | | python -m venv venv |
| | | venv\\Scripts\\activate # Windows |
| | | source venv/bin/activate # Mac/Linux |
| | | |
| | | \# 3. Install dependencies |
| | | pip install -r requirements.txt |
| | | |
| | | \# 4. Set up environment |
| | | copy .env.example .env # Windows |
| | | cp .env.example .env # Mac/Linux |
| | | \# Edit .env with your OpenAI API key |
| | | |
| | | \# 5. Verify |
| | | python -c "import langchain; import chromadb; print('OK')" |

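
Step 5 only confirms that `langchain` and `chromadb` import. The sketch below is a slightly broader check. It also covers the other pinned packages, the Python version, and whether `.env` still holds the placeholder key. It assumes you run it from the repository root after activating the virtual environment. `check_environment` is a hypothetical helper, not a script shipped with the repository.

```python
import importlib.util
import sys
from pathlib import Path
from typing import List, Sequence, Tuple

# Import names for the packages pinned in requirements.txt. Some differ from the
# distribution names: python-dotenv -> dotenv, sentence-transformers -> sentence_transformers.
REQUIRED_MODULES = (
    "langchain", "langchain_community", "chromadb", "openai", "streamlit",
    "dotenv", "tiktoken", "sentence_transformers", "pypdf", "pydantic",
)
PLACEHOLDER_KEY = "your_openai_api_key_here"


def check_environment(
    modules: Sequence[str] = REQUIRED_MODULES,
    env_file: str = ".env",
    min_python: Tuple[int, int] = (3, 8),
) -> List[str]:
    """Return a list of setup problems; an empty list means every check passed."""
    problems: List[str] = []
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for name in modules:
        # find_spec locates a top-level module without importing it; None means not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    env_path = Path(env_file)
    if not env_path.is_file():
        problems.append(f"{env_file} not found; copy it from .env.example")
    elif PLACEHOLDER_KEY in env_path.read_text(encoding="utf-8"):
        problems.append(f"OPENAI_API_KEY in {env_file} is still the placeholder")
    return problems
```

Calling `check_environment()` with no arguments checks the full list against `./.env` and prints nothing by itself, so print or inspect the returned list.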
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,9 +1,29 @@ |
| | | # LangChain: LLM orchestration and RAG framework |
| | | langchain==0.1.0 |
| | | |
| | | # LangChain community integrations |
| | | langchain-community==0.0.10 |
| | | |
| | | # ChromaDB: Vector database for semantic search |
| | | chromadb==0.4.22 |
| | | |
| | | # OpenAI: LLM provider |
| | | openai==1.7.0 |
| | | |
| | | # Streamlit: Web UI framework (for Phase 3) |
| | | streamlit==1.30.0 |
| | | pypdf==3.17.4 |
| | | |
| | | # Python-dotenv: Environment variable management |
| | | python-dotenv==1.0.0 |
| | | |
| | | # Tiktoken: Token counting for LLM |
| | | tiktoken==0.5.2 |
| | | |
| | | # Sentence-Transformers: Embedding models |
| | | sentence-transformers==2.2.2 |
| | | |
| | | # PyPDF: PDF document processing |
| | | PyPDF==3.17.1 |
| | | |
| | | # Pydantic: Data validation |
| | | pydantic==2.5.0 |

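
pip normalizes project names (PEP 503), so `pypdf` and `PyPDF` refer to the same distribution. A requirements file that pins it twice at different versions, such as 3.17.4 and 3.17.1, cannot be installed. Below is a minimal standard-library sketch of a check that could run in CI to catch such duplicates. `find_duplicate_pins` is a hypothetical helper, and it only inspects plain `name==version` style lines (options such as `-r` are skipped).

```python
import re
from typing import Dict, List, Tuple

# Leading project name of a requirement line, e.g. "PyPDF" in "PyPDF==3.17.1".
NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, with runs of '-', '_' and '.' collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicate_pins(text: str) -> List[Tuple[str, int, int]]:
    """Return (normalized name, first line, repeated line) for each package listed twice."""
    first_seen: Dict[str, int] = {}
    duplicates: List[Tuple[str, int, int]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        match = NAME_RE.match(line)
        if not match:
            continue  # blank line, comment-only line, or pip option
        key = normalize(match.group(1))
        if key in first_seen:
            duplicates.append((key, first_seen[key], lineno))
        else:
            first_seen[key] = lineno
    return duplicates
```

Running it on the final `requirements.txt` contents should return an empty list.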
CONTRIBUTING.md currently covers forking/cloning but doesn’t yet include the development workflow details mentioned in the PR description/issue (#17) (branch naming, commit conventions, lint/format tools, code style expectations, PR process, etc.). Either expand this document accordingly or adjust the PR description/scope so expectations match what’s delivered.