Commit 9d8145e
"how did everything delete"
1 parent 11963ed

2,420 files changed: +128,775 additions, 0 deletions

.DS_Store

6 KB
Binary file not shown.

.github/workflows/main.yml

Lines changed: 32 additions & 0 deletions

```yaml
name: temp name

on:
  push:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: Run Python script
        run: |
          python code/main_handler.py all
      - name: Commit and push changes to dev
        run: |
          git config --global user.name "GitHub Actions Bot"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"
          git checkout dev
          git add .
          git commit -m "Automated changes from GitHub Actions"
          git push origin dev
```
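One caveat with the final step: `git commit` exits non-zero when there is nothing to commit, which would fail the job on no-op runs. A guarded variant of that step (a sketch, not part of this commit) might look like:

```yaml
      - name: Commit and push changes to dev
        run: |
          git config --global user.name "GitHub Actions Bot"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"
          git checkout dev
          git add .
          # Commit only if the index actually changed
          git diff --cached --quiet || git commit -m "Automated changes from GitHub Actions"
          git push origin dev
```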

.qcrc

Lines changed: 6 additions & 0 deletions

```json
{
    "DATADIR": "./data",
    "RAWDATADIR": "./data/raw"
}
```

(Note: a comma between the two keys is required for this to parse as valid JSON; it was missing in the committed file.)
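`.qcrc` appears to be a small JSON config mapping names to data directories. A minimal sketch of loading it (assuming valid JSON with these two keys; the inline string stands in for the file contents):

```python
import json

# Minimal sketch: parse the .qcrc config (assumed to be plain JSON).
# In the repo this would be read from the .qcrc file instead of a literal.
qcrc_text = """
{
    "DATADIR": "./data",
    "RAWDATADIR": "./data/raw"
}
"""

config = json.loads(qcrc_text)
print(config["DATADIR"])     # ./data
print(config["RAWDATADIR"])  # ./data/raw
```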

README.md

Lines changed: 113 additions & 0 deletions

# Holistic QC rewrite

> A more modular approach to QC that alleviates the headaches of the old pipeline.

## Requirements

- process batches of txt files by converting them into csv
- perform basic QCs
- apply scoring criteria
- plot information on a graph
- save the data in the correct location
- upload the data to git and the server
- add plots and scoring to GitHub Pages

## Plan

- `pull_handler` returns a list of txt files
- `utils` contains commonly used functions, such as converting a txt file to csv
- each domain has its own QC file with different methods for QCing by task
  - takes a list of files as an argument and processes them, returning the usability score and logging any problems

## Tasks

- [x] finish CC algos
- [x] test
- [ ] start WL/DWL algos -> separate class from mem

## Relational Database Design Summary for Clinical Trial Cognitive Data

### Purpose & Scope

- This database will organize and store clinical trial cognitive data.
- Each participant completes 13 cognitive tasks, with two runs each.
- The data will be ingested daily from a prewritten backend.
- The database will integrate with a frontend using Python and Azure.
- Expected data volume: hundreds to thousands of participants.

### Core Entities & Relationships

1. Participants (`participants`)
   - Stores participant identifiers, their assigned study type (observation/intervention), and their site location.
   - Each participant completes 26 runs total (13 tasks × 2 runs).
   - Relationships: linked to `sites` (`site_id`), linked to `study_types` (`study_id`), has many runs.
2. Study Types (`study_types`)
   - Defines whether a participant is in the Intervention or Observation group.
3. Sites (`sites`)
   - Stores the location each participant is from.
   - Explicitly defined in the directory structure.
4. Tasks (`tasks`)
   - Stores the 13 predefined tasks in a static table.
5. Runs (`runs`)
   - Stores each task run per participant (26 runs per participant).
   - Each run is linked to a participant and a task.
   - Can store a timestamp (nullable, extracted from CSVs).
6. Results (`results`)
   - Stores raw cognitive task data extracted from CSV files.
   - CSV contents will be stored directly in the database (not just file paths).
   - Linked to `runs` via `run_id`.
7. Reports (`reports`)
   - Stores 1-2 PNG files per run as binary blobs (not file paths).
   - Linked to `runs` via `run_id`.
   - Has a `missing_png_flag` to track if files are absent.

### Constraints & Data Integrity

- Primary keys (PKs) & foreign keys (FKs):
  - `participant_id` -> primary key in `participants`
  - `task_id` -> primary key in `tasks`
  - `run_id` -> primary key in `runs`; foreign keys link to `participants` & `tasks`
  - `result_id` -> primary key in `results`; foreign key links to `runs`
  - `report_id` -> primary key in `reports`; foreign key links to `runs`
- Data rules & validation:
  - All 13 tasks must be associated with each participant (26 runs total).
  - `missing_png_flag` will track missing PNG files.
  - `csv_data` will be stored as structured data (likely JSON or table format).

### Indexing & Optimization

- Indexes on:
  - `participant_id` (for quick retrieval of participant data)
  - `task_id` (for filtering task-based results)
  - `study_id` (for intervention vs. observation analysis)
  - `site_id` (for location-based analysis)
- Storage considerations:
  - CSV data stored as structured content (JSON or column format).
  - PNG files stored as binary blobs.
- Query optimization:
  - JOINs will be used for participant-level queries.
  - Materialized views can be considered for frequently used summaries.

### Security & Access Control

- Currently a single user, so permissions are simple.
- Future security measures:
  - Row-level security for multiple users.
  - Encryption for sensitive participant records.

### Backup & Recovery

- Daily backups of database storage + binary files.
- Azure Blob Storage or PostgreSQL Large Objects for efficient handling of PNG & CSV files.

Next step: SQL schema implementation.

Would you like the SQL schema to be written for PostgreSQL, MySQL, or another database system?
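The entity layout above can be sketched as an actual schema. The sketch below uses SQLite purely so it is self-contained (the stated production target is PostgreSQL on Azure); table and column names follow the summary, and the `run_number` column with its `CHECK` is an assumption introduced here to distinguish the two runs per task:

```python
import sqlite3

# Illustrative schema for the design summary above. SQLite syntax is used for a
# self-contained sketch; the real deployment would target PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study_types (study_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE participants (
    participant_id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study_types(study_id),
    site_id INTEGER NOT NULL REFERENCES sites(site_id)
);
CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE runs (
    run_id INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participants(participant_id),
    task_id INTEGER NOT NULL REFERENCES tasks(task_id),
    run_number INTEGER NOT NULL CHECK (run_number IN (1, 2)),  -- assumption
    run_timestamp TEXT,                    -- nullable, extracted from CSVs
    UNIQUE (participant_id, task_id, run_number)
);
CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES runs(run_id),
    csv_data TEXT NOT NULL                 -- structured CSV contents (e.g. JSON)
);
CREATE TABLE reports (
    report_id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES runs(run_id),
    png_blob BLOB,                         -- PNG stored as a binary blob
    missing_png_flag INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_runs_participant ON runs(participant_id);
CREATE INDEX idx_runs_task ON runs(task_id);
""")
```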

app/app.py

Lines changed: 66 additions & 0 deletions

```python
import os
from flask import Flask, send_from_directory
from main.utils import construct_master_list


def update_png_paths_and_create_serve_function(app):
    """
    Updates the PNG file paths in MASTER_LIST to use the new served directory
    structure and creates a Flask route to serve the files.
    """
    master_list = app.config["MASTER_LIST"]
    data_folder = app.config["DATA_FOLDER"]

    # Create a new directory structure for serving
    for subject_id, subject_data in master_list.items():
        for task_name, task_data in subject_data.get("tasks", {}).items():
            new_png_paths = []
            for file_path in task_data.get("png_paths", []):
                # Extract relative path: 'subject/task/file'
                relative_path = os.path.relpath(file_path, data_folder)
                new_png_paths.append(f"data/{relative_path}")
            # Update the master list with the new paths
            task_data["png_paths"] = new_png_paths

    # Add a route to serve the updated files
    @app.route("/data/<path:subpath>")
    def serve_data_file(subpath):
        """Serve files from the data directory using the new structure."""
        file_path = os.path.join(data_folder, subpath)
        if not os.path.exists(file_path):
            return f"File not found: {file_path}", 404

        directory, filename = os.path.split(file_path)
        return send_from_directory(directory, filename)


def create_app():
    app = Flask(__name__)
    app.config['DATA_FOLDER'] = os.path.abspath(
        os.path.join(os.path.dirname(__file__), '..', 'data'))
    app.config['ALLOWED_EXTENSIONS'] = {'csv', 'txt', 'png'}

    # Ensure the data folder exists
    if not os.path.exists(app.config['DATA_FOLDER']):
        raise FileNotFoundError(f"Data folder not found at {app.config['DATA_FOLDER']}")

    # Construct the master list and store it in the app config
    app.config['MASTER_LIST'] = construct_master_list(app.config['DATA_FOLDER'])

    # Update paths in the master list and add the serve route
    with app.app_context():
        update_png_paths_and_create_serve_function(app)

    # Register blueprints
    from feed_blueprint import feed_print
    from home_blueprint import home_blueprint
    app.register_blueprint(feed_print)
    app.register_blueprint(home_blueprint)

    return app


if __name__ == '__main__':
    # Initialize and run the Flask app
    app = create_app()
    app.run(debug=True)
```
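The path rewrite in `update_png_paths_and_create_serve_function` is just `os.path.relpath` against the data folder, mapping absolute PNG paths to URLs under the `/data` route. A standalone sketch of that step (the paths here are hypothetical):

```python
import os

# Sketch of the path rewrite done in update_png_paths_and_create_serve_function:
# absolute PNG paths under DATA_FOLDER become paths under the served /data route.
data_folder = "/srv/app/data"                        # hypothetical DATA_FOLDER
file_path = "/srv/app/data/subject1/task1/plot.png"  # hypothetical PNG path

relative_path = os.path.relpath(file_path, data_folder)
served_path = f"data/{relative_path}"
print(served_path)  # data/subject1/task1/plot.png
```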

app/db 2.py

Lines changed: 167 additions & 0 deletions

```python
import os
import logging

import psycopg

from main.update_db import DatabaseUtils


# Database connection setup
def connect_to_db(db_name, user, password, host="localhost", port=5432):
    return psycopg.connect(dbname=db_name, user=user, password=password,
                           host=host, port=port)


# Initialize database schema
def initialize_schema(connection):
    try:
        with connection.cursor() as cursor:
            cursor.execute("""
                -- Drop existing tables in reverse dependency order.
                DROP TABLE IF EXISTS session CASCADE;
                DROP TABLE IF EXISTS task CASCADE;
                DROP TABLE IF EXISTS subject CASCADE;
                DROP TABLE IF EXISTS site CASCADE;
                DROP TABLE IF EXISTS study CASCADE;

                -- Create table "study"
                CREATE TABLE study (
                    id SERIAL PRIMARY KEY,
                    name TEXT NOT NULL UNIQUE
                );

                -- Create table "site"
                CREATE TABLE site (
                    id SERIAL PRIMARY KEY,
                    name TEXT NOT NULL,
                    study_id INTEGER NOT NULL,
                    UNIQUE (name, study_id),
                    FOREIGN KEY (study_id) REFERENCES study(id) ON DELETE CASCADE
                );

                -- Create table "subject"
                CREATE TABLE subject (
                    id SERIAL PRIMARY KEY,
                    name TEXT NOT NULL,
                    site_id INTEGER NOT NULL,
                    UNIQUE (name, site_id),
                    FOREIGN KEY (site_id) REFERENCES site(id) ON DELETE CASCADE
                );

                -- Create table "task"
                CREATE TABLE task (
                    id SERIAL PRIMARY KEY,
                    name TEXT NOT NULL,
                    subject_id INTEGER NOT NULL,
                    UNIQUE (name, subject_id),
                    FOREIGN KEY (subject_id) REFERENCES subject(id) ON DELETE CASCADE
                );

                -- Create table "session"
                CREATE TABLE session (
                    id SERIAL PRIMARY KEY,
                    session_name TEXT NOT NULL,
                    category INTEGER NOT NULL,
                    csv_path TEXT NOT NULL,
                    task_id INTEGER NOT NULL,
                    date TIMESTAMP,
                    plot_paths TEXT[],
                    FOREIGN KEY (task_id) REFERENCES task(id) ON DELETE CASCADE,
                    UNIQUE (session_name, category, csv_path, task_id)
                );
            """)
        connection.commit()
    except Exception as e:
        logging.error(f"Error initializing schema: {e}")
        connection.rollback()
    # Leave the connection open; the caller owns and closes it.


def _get_or_create_id(cursor, insert_sql, select_sql, params):
    """Insert a row (ignoring conflicts) and return its id, selecting it if it
    already existed. Fixes the original one-liner, which returned a tuple
    instead of the id."""
    cursor.execute(insert_sql, params)
    row = cursor.fetchone()
    if row is None:
        cursor.execute(select_sql, params)
        row = cursor.fetchone()
    return row[0]


# Populate the database from the folder structure
def populate_database(connection, data_folder):
    for study_name in os.listdir(data_folder):
        study_path = os.path.join(data_folder, study_name)
        if not os.path.isdir(study_path):
            continue

        with connection.cursor() as cursor:
            study_id = _get_or_create_id(
                cursor,
                "INSERT INTO study (name) VALUES (%s) ON CONFLICT (name) DO NOTHING RETURNING id;",
                "SELECT id FROM study WHERE name = %s;",
                (study_name,))

        for site_name in os.listdir(study_path):
            site_path = os.path.join(study_path, site_name)
            if not os.path.isdir(site_path):
                continue

            with connection.cursor() as cursor:
                site_id = _get_or_create_id(
                    cursor,
                    "INSERT INTO site (name, study_id) VALUES (%s, %s) ON CONFLICT DO NOTHING RETURNING id;",
                    "SELECT id FROM site WHERE name = %s AND study_id = %s;",
                    (site_name, study_id))

            for subject_name in os.listdir(site_path):
                subject_path = os.path.join(site_path, subject_name)
                if not os.path.isdir(subject_path):
                    continue

                with connection.cursor() as cursor:
                    subject_id = _get_or_create_id(
                        cursor,
                        "INSERT INTO subject (name, site_id) VALUES (%s, %s) ON CONFLICT DO NOTHING RETURNING id;",
                        "SELECT id FROM subject WHERE name = %s AND site_id = %s;",
                        (subject_name, site_id))

                for task_name in os.listdir(subject_path):
                    task_path = os.path.join(subject_path, task_name)
                    if not os.path.isdir(task_path):
                        continue

                    with connection.cursor() as cursor:
                        task_id = _get_or_create_id(
                            cursor,
                            "INSERT INTO task (name, subject_id) VALUES (%s, %s) ON CONFLICT DO NOTHING RETURNING id;",
                            "SELECT id FROM task WHERE name = %s AND subject_id = %s;",
                            (task_name, subject_id))

                    for folder in ["data", "plot"]:
                        folder_path = os.path.join(task_path, folder)
                        if not os.path.exists(folder_path):
                            continue

                        if folder == "data":
                            for file in os.listdir(folder_path):
                                if file.endswith(".csv"):
                                    parts = file.split("_")
                                    session_name = parts[1].split("-")[1]
                                    category = int(parts[2].split("-")[1].split(".")[0])

                                    with connection.cursor() as cursor:
                                        cursor.execute("""
                                            INSERT INTO session (session_name, category, csv_path, task_id)
                                            VALUES (%s, %s, %s, %s)
                                            ON CONFLICT DO NOTHING;
                                        """, (session_name, category,
                                              os.path.join(folder_path, file), task_id))

                        elif folder == "plot":
                            plots = []
                            for file in os.listdir(folder_path):
                                if file.endswith(".png"):
                                    plots.append(os.path.join(folder_path, file))

                            with connection.cursor() as cursor:
                                cursor.execute("""
                                    UPDATE session
                                    SET plot_paths = %s
                                    WHERE task_id = %s;
                                """, (plots, task_id))
    connection.commit()


# Main entry point
if __name__ == "__main__":
    db_name = "boost-beh"
    user = "zakg04"
    # Read the password from the environment; the committed file hardcoded a
    # plaintext credential here, which should be rotated and removed.
    password = os.environ.get("DB_PASSWORD", "")
    data_folder = "../data"
    connection = connect_to_db(db_name, user, password)
    try:
        initialize_schema(connection)
    finally:
        connection.close()
    '''
    util_instance = DatabaseUtils(connection, data_folder)
    util_instance.update_database()
    '''
```
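The session metadata in `populate_database` is recovered purely from each CSV filename via `split`. The exact naming scheme is not shown in this commit, but the parse implies something like `<task>_<key>-<session>_<key>-<category>.csv`; the filename below is a hypothetical example that matches the split logic:

```python
# Sketch of the filename parsing in populate_database. "CC_sess-A1_cat-3.csv"
# is a hypothetical example; the real naming scheme is not in this commit.
file = "CC_sess-A1_cat-3.csv"

parts = file.split("_")                               # ['CC', 'sess-A1', 'cat-3.csv']
session_name = parts[1].split("-")[1]                 # 'A1'
category = int(parts[2].split("-")[1].split(".")[0])  # 3
print(session_name, category)  # A1 3
```

Note that any CSV not following this pattern raises an `IndexError` or `ValueError`, so the real pipeline may want to guard this parse.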
