
Commit 0002650

updated with new group plots
Includes *many* changes to logic, including better QC handling (thanks, Codex), efficiency changes, and group plots. Almost ready for the JS implementation in the React app.
1 parent 199a136 commit 0002650


5,915 files changed: +311,466 −22,749 lines changed


README.md

Lines changed: 7 additions & 109 deletions
@@ -2,112 +2,10 @@
 
 > Contains a more modular approach to QCing, alleviates the headaches of old stuff
 
-
-## Requirements
-- process batches of txt files by turning them into csv
-- requires basic qcs
-- requires scoring criteria
-- plot information on a graph
-- save the data in the correct location
-- uploads the data to git and server
-- adds plots and scoring to github pages
-
-
-## Plan
-
-- `pull_handler` returns a list of txt files
-- `utils` contains commonly used functions like converting txt file to csv
-- each domain has its own qc file with diff methods for qcing by task
-- takes in a list of files as an arg and processes them, returning the usability score and logging any problems
-
-
-## Tasks
-- [x] finish cc algos
-- [x] test
-- [ ] start WL/DWL algos -> separate class from mem
-
-
-
-## Relational Database Design Summary for Clinical Trial Cognitive Data
-
->>Purpose & Scope
-• This database will organize and store clinical trial cognitive data.
-• Each participant completes 13 cognitive tasks over two runs each.
-• The data will be ingested daily from a prewritten backend.
-• The database will integrate with a frontend using Python and Azure.
-• Expected data volume: Hundreds to thousands of participants.
-
->>Core Entities & Relationships
-
-1. Participants (participants)
-• Stores participant identifiers, their assigned study type (observation/intervention), and their site location.
-• Each participant completes 26 runs total (13 tasks × 2 runs).
-• Relationships:
-• Linked to sites (site_id)
-• Linked to study_types (study_id)
-• Has many runs
-
-2. Study Types (study_types)
-• Defines whether a participant is in the Intervention or Observation group.
-
-3. Sites (sites)
-• Stores the location each participant is from.
-• Explicitly defined in the directory structure.
-
-4. Tasks (tasks)
-• Stores the 13 predefined tasks in a static table.
-
-5. Runs (runs)
-• Stores each task run per participant (26 runs per participant).
-• Each run is linked to a participant and a task.
-• Can store a timestamp (nullable, extracted from CSVs).
-
-6. Results (results)
-• Stores raw cognitive task data extracted from CSV files.
-• CSV contents will be stored directly in the database (not just file paths).
-• Linked to runs via run_id.
-
-7. Reports (reports)
-• Stores 1-2 PNG files per run as binary blobs (not file paths).
-• Linked to runs via run_id.
-• Has a missing_png_flag to track if files are absent.
-
-Constraints & Data Integrity
-• Primary Keys (PKs) & Foreign Keys (FKs):
-• participant_id → Primary key in participants
-• task_id → Primary key in tasks
-• run_id → Primary key in runs, foreign key links to participants & tasks
-• result_id → Primary key in results, foreign key links to runs
-• report_id → Primary key in reports, foreign key links to runs
-• Data Rules & Validation:
-• All 13 tasks must be associated with each participant (26 runs total).
-• missing_png_flag will track missing PNG files.
-• csv_data will be stored as structured data (likely JSON or table format).
-
->>Indexing & Optimization
-
-• Indexes on:
-• participant_id (for quick retrieval of participant data)
-• task_id (for filtering task-based results)
-• study_id (for intervention vs. observation analysis)
-• site_id (for location-based analysis)
-• Storage Considerations:
-• CSV data stored as structured content (JSON or column format).
-• PNG files stored as binary blobs.
-• Query Optimization:
-• JOINs will be used for participant-level queries.
-• Materialized views can be considered for frequently used summaries.
-
->>Security & Access Control
-• Currently, only you will use the database, so permissions are simple.
-• Future security measures:
-• Row-level security for multiple users.
-• Encryption for sensitive participant records.
-
->>Backup & Recovery
-• Daily backups of database storage + binary files.
-• Azure Blob Storage or PostgreSQL Large Objects for efficient handling of PNG & CSV files.
-
-Next Step: SQL Schema Implementation
-
-Would you like the SQL schema to be written for PostgreSQL, MySQL, or another database system?
+## NF
+'NF': {
+'A': [978, 980],
+'B': [979, 981],
+'C': [977, 982]
+},
+"jap_5ThOJ14yf7z1EPEUpAoZYMWoETZcmJk305719"
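The deleted design summary still reads as a usable spec. As a rough illustration, the entities, keys, and indexes it describes map to DDL along these lines. This is only a sketch: SQLite syntax is used so it is self-contained (the README left PostgreSQL vs. MySQL open), and details such as column types and the `run_number` column are assumptions, not part of the original summary.

```python
import sqlite3

# Sketch of the schema described in the deleted design summary.
# Table and column names follow the summary; types are illustrative.
DDL = """
CREATE TABLE study_types (
    study_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL CHECK (name IN ('Observation', 'Intervention'))
);
CREATE TABLE sites (
    site_id  INTEGER PRIMARY KEY,
    location TEXT NOT NULL
);
CREATE TABLE participants (
    participant_id INTEGER PRIMARY KEY,
    site_id        INTEGER NOT NULL REFERENCES sites(site_id),
    study_id       INTEGER NOT NULL REFERENCES study_types(study_id)
);
CREATE TABLE tasks (
    task_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE        -- the 13 predefined tasks
);
CREATE TABLE runs (
    run_id         INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participants(participant_id),
    task_id        INTEGER NOT NULL REFERENCES tasks(task_id),
    run_number     INTEGER NOT NULL CHECK (run_number IN (1, 2)),  -- assumed column
    recorded_at    TEXT,                -- nullable timestamp from the CSVs
    UNIQUE (participant_id, task_id, run_number)  -- 13 tasks x 2 runs each
);
CREATE TABLE results (
    result_id INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES runs(run_id),
    csv_data  TEXT NOT NULL             -- structured CSV contents, e.g. JSON
);
CREATE TABLE reports (
    report_id        INTEGER PRIMARY KEY,
    run_id           INTEGER NOT NULL REFERENCES runs(run_id),
    png_blob         BLOB,              -- 1-2 PNGs per run stored inline
    missing_png_flag INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_runs_participant ON runs(participant_id);
CREATE INDEX idx_runs_task ON runs(task_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

For Postgres, `png_blob` would become `BYTEA` (or a large object), `csv_data` could be `JSONB`, and the `study_id`/`site_id` indexes from the summary would be added the same way.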

app/app.py

Lines changed: 0 additions & 68 deletions
This file was deleted.

app/db.py

Lines changed: 0 additions & 167 deletions
This file was deleted.

0 commit comments
