Refactor logging in orchestrator.py to use warnings for feature export errors and update test_orchestrator.py to remove commented-out tests

alperkent-cmi · alperkent-cmi · commit eac9c4538b02 · 2025-07-30T13:43:11.000-04:00
diff --git a/README.md b/README.md
@@ -13,60 +13,6 @@ A Python toolkit for analysis of graphomotor data collected via Curious.
 
 Welcome to `graphomotor`, a specialized Python library for analyzing graphomotor data collected via [Curious](https://www.gettingcurious.com/). This toolkit provides comprehensive tools for processing, analyzing, and visualizing data from various graphomotor assessment tasks including spiral drawing, trails making, alphabetic writing, digit symbol substitution, and the Rey-Osterrieth Complex Figure Test.
 
-## Development Progress
-
-⚠️ **This package is under active development.** Currently, the focus is on the Spiral task. After finalizing feature extraction, the next steps will involve implementing both preprocessing and visualization for this task. Once these parts are in place, we plan to extend support to other tasks.
-
-| Task | Preprocessing | Feature Extraction | Visualization |
-| :--- | :---: | :---: | :---: |
-| Spiral | ![Spiral: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Spiral: Feature Extraction In Progress](https://img.shields.io/badge/in_progress-yellow) | ![Spiral: Visualization Pending](https://img.shields.io/badge/pending-red) |
-| Rey-Osterrieth Complex Figure | ![Rey-Osterrieth: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Rey-Osterrieth: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Rey-Osterrieth: Visualization Pending](https://img.shields.io/badge/pending-red) |
-| Alphabetic Writing | ![Alphabetic Writing: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Alphabetic Writing: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Alphabetic Writing: Visualization Pending](https://img.shields.io/badge/pending-red) |
-| Digit Symbol Substitution | ![Digit Symbol Substitution: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Digit Symbol Substitution: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Digit Symbol Substitution: Visualization Pending](https://img.shields.io/badge/pending-red) |
-| Trails Making | ![Trails Making: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Trails Making: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Trails Making: Visualization Pending](https://img.shields.io/badge/pending-red) |
-
-## Data Format Requirements
-
-⚠️ **This implementation requires data to adhere to a specific format matching the standard output from [Curious drawing responses](https://mindlogger.atlassian.net/servicedesk/customer/portal/3/article/859242501).**
-
-When exporting drawing data from Curious, you typically receive the following files:
-
-- **report.csv**: Contains the participants' actual responses.
-- **activity_user_journey.csv**: Logs the entire journey through the activity, including button actions like "Next", "Skip", "Back", and "Undo", regardless of whether a response was provided.
-- **drawing-responses-{date}.zip**: A ZIP archive with raw drawing response CSV files for each participant (e.g., `drawing-responses-Mon May 29 2023.zip`).
-- **media-responses-{date}.zip**: A ZIP archive containing SVG files for the drawing responses (e.g., `media-responses-Mon May 29 2023.zip`).
-- **trails-responses-{date}.zip**: A ZIP archive with raw trail making response CSV files (if there are any) for each participant (e.g., `trails-responses-Mon May 29 2023.zip`).
-
-For Spiral tasks, the toolkit uses only the CSV files from the drawing responses ZIP. Support for additional tasks will be added in future releases.
-
-### File Naming Convention
-
-Your spiral data files must follow this naming convention:
-
-```text
-[5123456]a7f3b2e9-d4c8-f1a6-e5b9-c2d7f8a3e6b4-spiral_trace1_Dom.csv
-```
-
-Where:
-
-- **Participant ID**: Must be enclosed in brackets `[]` and be a 7-digit number starting with `5` (e.g., `[5123456]`) that matches the `target_secret_id` column in the **report.csv** file.
-- **Activity Submission ID**: Must be a 32-character hexadecimal string (e.g., `18f2-45ea-a1e4-2334e07cc706`) that matches the `id` column in the **report.csv** file.
-- **Task**: Must be one of the following that matches the `item` column in the **report.csv** file:
-  - `spiral_trace1_Dom` through `spiral_trace5_Dom` (dominant hand tracing tasks)
-  - `spiral_trace1_NonDom` through `spiral_trace5_NonDom` (non-dominant hand tracing tasks)
-  - `spiral_recall1_Dom` through `spiral_recall3_Dom` (dominant hand recall tasks)
-  - `spiral_recall1_NonDom` through `spiral_recall3_NonDom` (non-dominant hand recall tasks)
-
-### Data Format
-
-Your spiral data CSV file must contain the following columns:
-
-```text
-line_number, x, y, UTC_Timestamp, seconds, epoch_time_in_seconds_start
-```
-
-This format represents the standard output from [Curious drawing responses data dictionary](https://mindlogger.atlassian.net/servicedesk/customer/portal/3/article/596082739).
-
 ## Feature Extraction Capabilities
 
 The toolkit extracts clinically relevant metrics from digitized drawing data. Currently implemented features include:
@@ -92,6 +38,8 @@ pip install git+https://github.com/childmindresearch/graphomotor
 
 ## Quick Start
 
+> **⚠️ This implementation requires data to adhere to a specific format matching the standard output from [Curious drawing responses](https://mindlogger.atlassian.net/servicedesk/customer/portal/3/article/859242501).**
+
 Currently, `graphomotor` is available as an importable Python library. CLI functionality is planned for future releases.
 
 ### Extracting Features from Spiral Drawing Data
@@ -110,32 +58,35 @@ features_df = orchestrator.run_pipeline(
     input_path=input_file
 )
 
+# Features are returned as a pandas DataFrame with source file as index
+print(f"Extracted features: {list(features_df.columns)}")
+
+# Access the single file's data (features_df has one row)
+file_path = features_df.index[0]
+print(f"File: {file_path}")
+print(f"Participant: {features_df.loc[file_path, 'participant_id']}")
+print(f"Task: {features_df.loc[file_path, 'task']}")
+print(f"Hand: {features_df.loc[file_path, 'hand']}")
+print(f"Duration: {features_df.loc[file_path, 'duration']}")
+```
+
+```python
 # Option 2: Save to a directory with auto-generated filename
 # Creates a CSV file with auto-generated name in the specified directory
 # Format: {participant_id}_{task}_{hand}_features_{YYYYMMDD_HHMM}.csv
 features_df = orchestrator.run_pipeline(
     input_path=input_file,
     output_path="path/to/output/directory"
 )
+```
 
+```python
 # Option 3: Save to a specific CSV file
 # Features will be saved to the specified file path
 features_df = orchestrator.run_pipeline(
     input_path=input_file,
     output_path="path/to/features.csv"
 )
-
-# Features are returned as a pandas DataFrame with source file as index
-print(f"Successfully processed {len(features_df)} file")
-print(f"Extracted features: {list(features_df.columns)}")
-
-# Access the single file's data (features_df has one row)
-file_path = features_df.index[0]
-print(f"File: {file_path}")
-print(f"Participant: {features_df.loc[file_path, 'participant_id']}")
-print(f"Task: {features_df.loc[file_path, 'task']}")
-print(f"Hand: {features_df.loc[file_path, 'hand']}")
-print(f"Duration: {features_df.loc[file_path, 'duration']}")
 ```
 
 #### Batch Processing
@@ -152,42 +103,92 @@ features_df = orchestrator.run_pipeline(
     input_path=input_dir,
 )
 
+# Features are returned as a pandas DataFrame with source files as index
+# Columns include: participant_id, task, hand, start_time, and calculated features
+print(f"Successfully processed {len(features_df)} files")
+
+# Access metadata and features for a specific file
+for file_path in features_df.index:
+    print(f"File: {file_path}")
+    print(f"Participant: {features_df.loc[file_path, 'participant_id']}")
+    print(f"Task: {features_df.loc[file_path, 'task']}")
+    print(f"Hand: {features_df.loc[file_path, 'hand']}")
+    print(f"Duration: {features_df.loc[file_path, 'duration']}")
+
+```
+
+```python
 # Option 2: Save to a directory with auto-generated filename
 # Creates a single consolidated CSV file with auto-generated name
 # Format: batch_features_{YYYYMMDD_HHMM}.csv
 features_df = orchestrator.run_pipeline(
     input_path=input_dir,
     output_path="path/to/output/directory"
 )
+```
 
+```python
 # Option 3: Save to a specific CSV file (single consolidated file)
 # All features will be written to one specified file
 features_df = orchestrator.run_pipeline(
     input_path=input_dir,
     output_path="path/to/consolidated_features.csv"
 )
+```
 
-# Features are returned as a pandas DataFrame with source files as index
-# Columns include: participant_id, task, hand, start_time, and calculated features
-print(f"Successfully processed {len(features_df)} files")
+For detailed configuration options and additional parameters, refer to the [`run_pipeline` documentation](https://childmindresearch.github.io/graphomotor/graphomotor/core/orchestrator.html#run_pipeline).
 
-# Access metadata and features for a specific file
-for file_path in features_df.index:
-    print(f"File: {file_path}")
-    print(f"Participant: {features_df.loc[file_path, 'participant_id']}")
-    print(f"Task: {features_df.loc[file_path, 'task']}")
-    print(f"Hand: {features_df.loc[file_path, 'hand']}")
-    print(f"Duration: {features_df.loc[file_path, 'duration']}")
+## Development Progress
+
+⚠️ **This package is under active development.** Currently, the focus is on the Spiral task. After finalizing feature extraction, the next steps will involve implementing both preprocessing and visualization for this task. Once these parts are in place, we plan to extend support to other tasks.
+
+| Task | Preprocessing | Feature Extraction | Visualization |
+| :--- | :---: | :---: | :---: |
+| Spiral | ![Spiral: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Spiral: Feature Extraction In Progress](https://img.shields.io/badge/in_progress-yellow) | ![Spiral: Visualization Pending](https://img.shields.io/badge/pending-red) |
+| Rey-Osterrieth Complex Figure | ![Rey-Osterrieth: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Rey-Osterrieth: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Rey-Osterrieth: Visualization Pending](https://img.shields.io/badge/pending-red) |
+| Alphabetic Writing | ![Alphabetic Writing: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Alphabetic Writing: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Alphabetic Writing: Visualization Pending](https://img.shields.io/badge/pending-red) |
+| Digit Symbol Substitution | ![Digit Symbol Substitution: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Digit Symbol Substitution: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Digit Symbol Substitution: Visualization Pending](https://img.shields.io/badge/pending-red) |
+| Trails Making | ![Trails Making: Preprocessing Pending](https://img.shields.io/badge/pending-red) | ![Trails Making: Feature Extraction Pending](https://img.shields.io/badge/pending-red) | ![Trails Making: Visualization Pending](https://img.shields.io/badge/pending-red) |
+
+## Data Format Requirements
+
+When exporting drawing data from Curious, you typically receive the following files:
+
+- **report.csv**: Contains the participants' actual responses.
+- **activity_user_journey.csv**: Logs the entire journey through the activity, including button actions like "Next", "Skip", "Back", and "Undo", regardless of whether a response was provided.
+- **drawing-responses-{date}.zip**: A ZIP archive with raw drawing response CSV files for each participant (e.g., `drawing-responses-Mon May 29 2023.zip`).
+- **media-responses-{date}.zip**: A ZIP archive containing SVG files for the drawing responses (e.g., `media-responses-Mon May 29 2023.zip`).
+- **trails-responses-{date}.zip**: A ZIP archive with raw trail making response CSV files (if there are any) for each participant (e.g., `trails-responses-Mon May 29 2023.zip`).
+
+For Spiral tasks, the toolkit uses only the CSV files from the drawing responses ZIP. Support for additional tasks will be added in future releases.
+
+### File Naming Convention
 
-# Or work with the DataFrame directly
-print(f"Mean duration across all files: {features_df['duration'].astype(float).mean()}")
-print(f"Spiral with highest linear velocity: {features_df['linear_velocity_median'].astype(float).idxmax()}")
+Your spiral data files must follow this naming convention:
 
-# Easy filtering and grouping by metadata
-print(f"Files with dominant hand: {len(features_df[features_df['hand'] == 'Dom'])}")
+```text
+[5123456]a7f3b2e9-d4c8-f1a6-e5b9-c2d7f8a3e6b4-spiral_trace1_Dom.csv
 ```
 
-For detailed configuration options and additional parameters, refer to the [`run_pipeline` documentation](https://childmindresearch.github.io/graphomotor/graphomotor/core/orchestrator.html#run_pipeline).
+Where:
+
+- **Participant ID**: Must be enclosed in brackets `[]` and be a 7-digit number starting with `5` (e.g., `[5123456]`) that matches the `target_secret_id` column in the **report.csv** file.
+- **Activity Submission ID**: Must be a 32-character hexadecimal string (e.g., `18f2-45ea-a1e4-2334e07cc706`) that matches the `id` column in the **report.csv** file.
+- **Task**: Must be one of the following that matches the `item` column in the **report.csv** file:
+  - `spiral_trace1_Dom` through `spiral_trace5_Dom` (dominant hand tracing tasks)
+  - `spiral_trace1_NonDom` through `spiral_trace5_NonDom` (non-dominant hand tracing tasks)
+  - `spiral_recall1_Dom` through `spiral_recall3_Dom` (dominant hand recall tasks)
+  - `spiral_recall1_NonDom` through `spiral_recall3_NonDom` (non-dominant hand recall tasks)
+
+### Data Format
+
+Your spiral data CSV file must contain the following columns:
+
+```text
+line_number, x, y, UTC_Timestamp, seconds, epoch_time_in_seconds_start
+```
+
+This format represents the standard output from [Curious drawing responses data dictionary](https://mindlogger.atlassian.net/servicedesk/customer/portal/3/article/596082739).
 
 ## Future Directions
 
diff --git a/src/graphomotor/core/orchestrator.py b/src/graphomotor/core/orchestrator.py
@@ -146,13 +146,13 @@ def export_features_to_csv(
         results_df.to_csv(output_file)
         logger.debug(f"Features saved successfully to {output_file}")
     except Exception as e:
-        logger.error(f"Failed to save features to {output_file}: {str(e)}")
+        logger.warning(f"Failed to save features to {output_file}: {str(e)}")
 
 
 def _run_file(
     input_path: pathlib.Path,
     feature_categories: list[FeatureCategories],
-    spiral_config: config.SpiralConfig | None,
+    spiral_config: config.SpiralConfig,
 ) -> dict[str, str]:
     """Process a single file for feature extraction.
 
@@ -180,7 +180,7 @@ def _run_file(
 def _run_directory(
     input_path: pathlib.Path,
     feature_categories: list[FeatureCategories],
-    spiral_config: config.SpiralConfig | None,
+    spiral_config: config.SpiralConfig,
 ) -> list[dict[str, str]]:
     """Process all CSV files in a directory and its subdirectories.
 
@@ -228,7 +228,7 @@ def _run_directory(
             results.append(features)
             logger.debug(f"Successfully processed {csv_file.name}")
         except Exception as e:
-            logger.error(f"Failed to process {csv_file.name}: {str(e)}")
+            logger.warning(f"Failed to process {csv_file.name}: {str(e)}")
             failed_files.append(csv_file.name)
             continue
 
diff --git a/tests/unit/test_orchestrator.py b/tests/unit/test_orchestrator.py
@@ -66,9 +66,6 @@ def test_validate_feature_categories_mixed(caplog: pytest.LogCaptureFixture) ->
     assert "meaning_of_life" in caplog.text
 
 
-# Tests for extract_features()
-
-
 @pytest.mark.parametrize(
     "feature_categories, expected_feature_number",
     [
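
Note on the `orchestrator.py` hunks: both changed `except` blocks keep their control flow and only lower the log level from `error` to `warning`, so a failed export or a failed file is still reported but is labeled as recoverable. The sketch below illustrates that pattern using only the standard library. It is not the project's code: `export_rows_to_csv`, the logger name `graphomotor_sketch`, the in-memory handler, and the sample row are all hypothetical stand-ins chosen for illustration.

```python
import csv
import logging
import pathlib
import tempfile

# Hypothetical logger name; the real module's logger may be configured differently.
logger = logging.getLogger("graphomotor_sketch")


def export_rows_to_csv(rows: list[dict[str, str]], output_file: pathlib.Path) -> None:
    """Hypothetical stand-in for export_features_to_csv: warn instead of raising."""
    try:
        with output_file.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        logger.debug(f"Features saved successfully to {output_file}")
    except Exception as e:
        # The failure is reported at WARNING level and the caller keeps running.
        logger.warning(f"Failed to save features to {output_file}: {str(e)}")


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Illustrative row only; the values are made up.
rows = [{"participant_id": "5123456", "task": "spiral_trace1", "hand": "Dom"}]

with tempfile.TemporaryDirectory() as tmp:
    # The parent directory does not exist, so opening the file fails
    # and the except branch runs.
    bad_path = pathlib.Path(tmp) / "missing_dir" / "features.csv"
    export_rows_to_csv(rows, bad_path)

levels = [record.levelname for record in handler.records]
print(levels)
```

In a pytest suite the same check would typically go through the `caplog` fixture already used in `test_orchestrator.py`. Any existing assertion that filters captured records by `ERROR` level for these two messages would need to look for `WARNING` instead.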
