
Commit b4aa633

Merge pull request #1 from HBClab/dev

Dev

2 parents: 18c490f + 3b6df12

File tree

2,457 files changed: +104,388 additions, -4,229 deletions


.github/workflows/main.yml

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
name: temp name

on:
  push:
    branches: [ "main", "dev" ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: Run Python script
        run: |
          python code/main_handler.py all
      - name: Commit and push changes to dev
        run: |
          git config --global user.name "GitHub Actions Bot"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"
          git checkout dev
          git add .
          git commit -m "Automated changes from GitHub Actions"
          git push origin dev

README.md

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,4 +28,86 @@
## Relational Database Design Summary for Clinical Trial Cognitive Data

>>Purpose & Scope
• This database will organize and store clinical trial cognitive data.
• Each participant completes 13 cognitive tasks, with two runs each.
• Data will be ingested daily from a prewritten backend.
• The database will integrate with a frontend using Python and Azure.
• Expected data volume: hundreds to thousands of participants.

>>Core Entities & Relationships

1. Participants (participants)
• Stores participant identifiers, their assigned study type (observation/intervention), and their site location.
• Each participant completes 26 runs total (13 tasks × 2 runs).
• Relationships:
  • Linked to sites (site_id)
  • Linked to study_types (study_id)
  • Has many runs

2. Study Types (study_types)
• Defines whether a participant is in the Intervention or Observation group.

3. Sites (sites)
• Stores the location each participant is from.
• Explicitly defined in the directory structure.

4. Tasks (tasks)
• Stores the 13 predefined tasks in a static table.

5. Runs (runs)
• Stores each task run per participant (26 runs per participant).
• Each run is linked to a participant and a task.
• Can store a timestamp (nullable, extracted from CSVs).

6. Results (results)
• Stores raw cognitive task data extracted from CSV files.
• CSV contents are stored directly in the database (not just file paths).
• Linked to runs via run_id.

7. Reports (reports)
• Stores 1-2 PNG files per run as binary blobs (not file paths).
• Linked to runs via run_id.
• Has a missing_png_flag to track if files are absent.

>>Constraints & Data Integrity
• Primary Keys (PKs) & Foreign Keys (FKs):
  • participant_id → primary key in participants
  • task_id → primary key in tasks
  • run_id → primary key in runs; foreign keys link to participants & tasks
  • result_id → primary key in results; foreign key links to runs
  • report_id → primary key in reports; foreign key links to runs
• Data Rules & Validation:
  • All 13 tasks must be associated with each participant (26 runs total).
  • missing_png_flag tracks missing PNG files.
  • csv_data is stored as structured data (likely JSON or table format).

>>Indexing & Optimization
• Indexes on:
  • participant_id (for quick retrieval of participant data)
  • task_id (for filtering task-based results)
  • study_id (for intervention vs. observation analysis)
  • site_id (for location-based analysis)
• Storage Considerations:
  • CSV data stored as structured content (JSON or column format).
  • PNG files stored as binary blobs.
• Query Optimization:
  • JOINs will be used for participant-level queries.
  • Materialized views can be considered for frequently used summaries.

>>Security & Access Control
• Currently there is a single database user, so permissions are simple.
• Future security measures:
  • Row-level security for multiple users.
  • Encryption for sensitive participant records.

>>Backup & Recovery
• Daily backups of database storage + binary files.
• Azure Blob Storage or PostgreSQL Large Objects for efficient handling of PNG & CSV files.

Next Step: SQL Schema Implementation

Would you like the SQL schema to be written for PostgreSQL, MySQL, or another database system?
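The entities and keys summarized above can be sketched as a concrete schema. The following is a minimal, illustrative version using SQLite so it is self-contained and runnable; the production target would be PostgreSQL, and all column names beyond those named in the summary (e.g. `run_number`, `png_blob`) are assumptions, not the final schema.

```python
import sqlite3

# Self-contained sketch of the summarized schema, using an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study_types (
    study_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE          -- 'observation' or 'intervention'
);
CREATE TABLE sites (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE participants (
    participant_id INTEGER PRIMARY KEY,
    study_id       INTEGER NOT NULL REFERENCES study_types(study_id),
    site_id        INTEGER NOT NULL REFERENCES sites(site_id)
);
CREATE TABLE tasks (
    task_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE           -- the 13 predefined tasks
);
CREATE TABLE runs (
    run_id         INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participants(participant_id),
    task_id        INTEGER NOT NULL REFERENCES tasks(task_id),
    run_number     INTEGER NOT NULL CHECK (run_number IN (1, 2)),
    run_timestamp  TEXT,                   -- nullable, extracted from CSVs
    UNIQUE (participant_id, task_id, run_number)
);
CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES runs(run_id),
    csv_data  TEXT NOT NULL                -- raw CSV contents (or JSON)
);
CREATE TABLE reports (
    report_id        INTEGER PRIMARY KEY,
    run_id           INTEGER NOT NULL REFERENCES runs(run_id),
    png_blob         BLOB,                 -- PNG stored as a binary blob
    missing_png_flag INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_runs_participant ON runs(participant_id);
CREATE INDEX idx_runs_task ON runs(task_id);
""")

# Smoke test: one participant, one task, two runs.
conn.execute("INSERT INTO study_types (name) VALUES ('observation')")
conn.execute("INSERT INTO sites (name) VALUES ('site-A')")
conn.execute("INSERT INTO participants (study_id, site_id) VALUES (1, 1)")
conn.execute("INSERT INTO tasks (name) VALUES ('flanker')")
conn.execute("INSERT INTO runs (participant_id, task_id, run_number) VALUES (1, 1, 1)")
conn.execute("INSERT INTO runs (participant_id, task_id, run_number) VALUES (1, 1, 2)")
n_runs = conn.execute(
    "SELECT COUNT(*) FROM runs WHERE participant_id = 1").fetchone()[0]
print(n_runs)  # 2
```

In PostgreSQL the `INTEGER PRIMARY KEY` columns would become `SERIAL`/`IDENTITY`, `png_blob` would be `BYTEA` (or a large object), and `csv_data` could be `JSONB`, matching the storage considerations above.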

app/__pycache__/app.cpython-39.pyc

-2.04 KB
Binary file not shown.
-3.49 KB
Binary file not shown.
-1.61 KB
Binary file not shown.

app/app.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@ def serve_data_file(subpath):
 def create_app():
     app = Flask(__name__)
-    app.config['DATA_FOLDER'] = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'data', 'test'))
+    app.config['DATA_FOLDER'] = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'data'))
     app.config['ALLOWED_EXTENSIONS'] = {'csv', 'txt', 'png'}

     # Ensure the data folder exists
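The one-line change above points `DATA_FOLDER` at the parent `data/` directory instead of `data/test/`. A quick sketch of how the two expressions resolve, with a hypothetical module location standing in for `os.path.dirname(__file__)`:

```python
import os

# Hypothetical location of app/app.py, for illustration only.
module_dir = "/srv/app"

old_path = os.path.abspath(os.path.join(module_dir, "..", "data", "test"))
new_path = os.path.abspath(os.path.join(module_dir, "..", "data"))
print(old_path)  # /srv/data/test
print(new_path)  # /srv/data
```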

app/db.py

Lines changed: 148 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
import os
import logging

import psycopg

from main.update_db import DatabaseUtils


# Database connection setup
def connect_to_db(db_name, user, password, host="localhost", port=5432):
    return psycopg.connect(dbname=db_name, user=user, password=password, host=host, port=port)


# Initialize database schema
def initialize_schema(connection):
    try:
        with connection.cursor() as cursor:
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS study (
                    id SERIAL PRIMARY KEY,
                    name VARCHAR(50) UNIQUE NOT NULL
                );

                CREATE TABLE IF NOT EXISTS site (
                    id SERIAL PRIMARY KEY,
                    name VARCHAR(50) NOT NULL,
                    study_id INT REFERENCES study(id) ON DELETE CASCADE
                );

                CREATE TABLE IF NOT EXISTS subject (
                    id SERIAL PRIMARY KEY,
                    name VARCHAR(50) NOT NULL,
                    site_id INT REFERENCES site(id) ON DELETE CASCADE
                );

                CREATE TABLE IF NOT EXISTS task (
                    id SERIAL PRIMARY KEY,
                    name VARCHAR(50) NOT NULL,
                    subject_id INT REFERENCES subject(id) ON DELETE CASCADE
                );

                CREATE TABLE IF NOT EXISTS session (
                    id SERIAL PRIMARY KEY,
                    session_name VARCHAR(50) NOT NULL,
                    category INT NOT NULL,
                    csv_path TEXT,
                    plot_paths TEXT[],
                    task_id INT REFERENCES task(id) ON DELETE CASCADE,
                    date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
                );
            """)
        connection.commit()
    except Exception as e:
        logging.error(f"Error initializing schema: {e}")
        connection.rollback()
    # Note: the connection is left open so callers can keep using it
    # (populate_database below runs on the same connection).


def _get_or_insert(cursor, insert_sql, insert_params, select_sql, select_params):
    """Insert a row if it is new, then return its id either way."""
    cursor.execute(insert_sql, insert_params)
    row = cursor.fetchone()
    if row is None:  # row already existed; the INSERT was a no-op
        cursor.execute(select_sql, select_params)
        row = cursor.fetchone()
    return row[0]


# Populate the database from the folder structure
def populate_database(connection, data_folder):
    for study_name in os.listdir(data_folder):
        study_path = os.path.join(data_folder, study_name)
        if not os.path.isdir(study_path):
            continue

        with connection.cursor() as cursor:
            study_id = _get_or_insert(
                cursor,
                "INSERT INTO study (name) VALUES (%s) ON CONFLICT (name) DO NOTHING RETURNING id;",
                (study_name,),
                "SELECT id FROM study WHERE name = %s;",
                (study_name,),
            )

        for site_name in os.listdir(study_path):
            site_path = os.path.join(study_path, site_name)
            if not os.path.isdir(site_path):
                continue

            with connection.cursor() as cursor:
                site_id = _get_or_insert(
                    cursor,
                    "INSERT INTO site (name, study_id) VALUES (%s, %s) ON CONFLICT DO NOTHING RETURNING id;",
                    (site_name, study_id),
                    "SELECT id FROM site WHERE name = %s AND study_id = %s;",
                    (site_name, study_id),
                )

            for subject_name in os.listdir(site_path):
                subject_path = os.path.join(site_path, subject_name)
                if not os.path.isdir(subject_path):
                    continue

                with connection.cursor() as cursor:
                    subject_id = _get_or_insert(
                        cursor,
                        "INSERT INTO subject (name, site_id) VALUES (%s, %s) ON CONFLICT DO NOTHING RETURNING id;",
                        (subject_name, site_id),
                        "SELECT id FROM subject WHERE name = %s AND site_id = %s;",
                        (subject_name, site_id),
                    )

                for task_name in os.listdir(subject_path):
                    task_path = os.path.join(subject_path, task_name)
                    if not os.path.isdir(task_path):
                        continue

                    with connection.cursor() as cursor:
                        task_id = _get_or_insert(
                            cursor,
                            "INSERT INTO task (name, subject_id) VALUES (%s, %s) ON CONFLICT DO NOTHING RETURNING id;",
                            (task_name, subject_id),
                            "SELECT id FROM task WHERE name = %s AND subject_id = %s;",
                            (task_name, subject_id),
                        )

                    for folder in ["data", "plot"]:
                        folder_path = os.path.join(task_path, folder)
                        if not os.path.exists(folder_path):
                            continue

                        if folder == "data":
                            for file in os.listdir(folder_path):
                                if file.endswith(".csv"):
                                    # Filenames encode the session and category,
                                    # e.g. <prefix>_ses-<name>_cat-<n>.csv
                                    parts = file.split("_")
                                    session_name = parts[1].split("-")[1]
                                    category = int(parts[2].split("-")[1].split(".")[0])

                                    with connection.cursor() as cursor:
                                        cursor.execute("""
                                            INSERT INTO session (session_name, category, csv_path, task_id)
                                            VALUES (%s, %s, %s, %s)
                                            ON CONFLICT DO NOTHING;
                                        """, (session_name, category, os.path.join(folder_path, file), task_id))

                        elif folder == "plot":
                            plots = [os.path.join(folder_path, f)
                                     for f in os.listdir(folder_path)
                                     if f.endswith(".png")]

                            with connection.cursor() as cursor:
                                cursor.execute("""
                                    UPDATE session
                                    SET plot_paths = %s
                                    WHERE task_id = %s;
                                """, (plots, task_id))

    connection.commit()


# Main entry point
if __name__ == "__main__":
    db_name = "boostbeh"
    user = "zakg04"
    password = "*mIloisfAT23*123*"  # TODO: move credentials out of source control
    data_folder = "../data"
    connection = connect_to_db(db_name, user, password)
    util_instance = DatabaseUtils(connection, data_folder)
    util_instance.update_database()

    """conn = connect_to_db(db_name, user, password)
    try:
        initialize_schema(conn)
        populate_database(conn, data_folder)
        print("Database initialized and populated successfully.")
    finally:
        conn.close()"""
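The populate logic in db.py repeatedly needs "insert the row if it is new, then fetch its id either way". A minimal standalone sketch of that pattern, using SQLite so it runs anywhere (the helper name `get_or_create` and the `study` table here are illustrative; the real code targets PostgreSQL, where `INSERT ... ON CONFLICT DO NOTHING RETURNING id` followed by a fallback `SELECT` does the same job):

```python
import sqlite3

def get_or_create(cursor, table, name):
    """Return the id of `name` in `table`, inserting it first if absent.
    Table name is interpolated directly, so it must be a trusted constant."""
    cursor.execute(f"SELECT id FROM {table} WHERE name = ?", (name,))
    row = cursor.fetchone()
    if row is not None:
        return row[0]
    cursor.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    return cursor.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
cur = conn.cursor()
a = get_or_create(cur, "study", "BOOST")
b = get_or_create(cur, "study", "BOOST")  # second call finds the same row
print(a == b)  # True
```

Wrapping the two-step check in one helper keeps the id-lookup logic in a single place instead of repeating it at every nesting level of the directory walk.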
4.82 KB
Binary file not shown.
-3.13 KB
Binary file not shown.
