mc2-center · aditigopalan · May 27, 2025 · Mar 14, 2025 · Mar 14, 2025 · Mar 14, 2025
@@ -0,0 +1,20 @@
+# Define maintainers for key parts of the repository
+
+# Default owner (listed first: the last matching pattern takes precedence)
+* @aditigopalan
+
+# Core Workflow Files
+/main.nf @aditigopalan
+/modules/ @aditigopalan
+
+# Documentation
+/README.md @aditigopalan
+
+# Tests
+/tests/ @aditigopalan
+
+# Configuration
+/nextflow.config @aditigopalan
+
+# Scripts
+/scripts/ @aditigopalan
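GitHub resolves CODEOWNERS by taking the last pattern that matches a path, so the position of the catch-all `*` rule matters as soon as different owners are involved. The sketch below illustrates that rule with a deliberately simplified matcher (only `*`, root-anchored files, and directories). The names `matches` and `owner_for` and the handles `@default-owner` and `@scripts-owner` are hypothetical and not part of this PR.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (pattern, owner)


def matches(pattern: str, path: str) -> bool:
    """Simplified matcher: supports '*', root-anchored files, and directories."""
    if pattern == "*":
        return True
    anchored = pattern.lstrip("/")
    if anchored.endswith("/"):
        # A trailing slash means "everything under this directory".
        return path.startswith(anchored)
    return path == anchored


def owner_for(path: str, rules: List[Rule]) -> Optional[str]:
    """Return the owner from the last rule that matches, or None."""
    owner = None
    for pattern, candidate in rules:
        if matches(pattern, path):
            owner = candidate  # later rules override earlier ones
    return owner


# Hypothetical owners, used only to make the ordering effect visible.
catch_all_first = [("*", "@default-owner"), ("/scripts/", "@scripts-owner")]
catch_all_last = [("/scripts/", "@scripts-owner"), ("*", "@default-owner")]

print(owner_for("scripts/analyze.py", catch_all_first))
print(owner_for("scripts/analyze.py", catch_all_last))
```

With a single maintainer on every line, as in this file, both orderings resolve to the same owner. The ordering only starts to matter once a second owner is added.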
@@ -0,0 +1,21 @@
+# Pull Request
+
+## Description
+Please provide a brief description of the changes made in this PR.
+
+## Changes Made
+- [ ] Change 1
+- [ ] Change 2
+- [ ] Change 3
+
+## Related Issues
+Fixes #<issue_number>
+
+## Checklist
+- [ ] Code follows the project's coding standards.
+- [ ] Tests have been added or updated to cover the changes.
+- [ ] Documentation has been updated to reflect the changes.
+- [ ] All tests pass locally.
+
+## Additional Notes
+Any additional information or context that might be helpful for reviewers. 
@@ -0,0 +1,53 @@
+# Contributing to Cancer Complexity Toolkit Workflow
+
+We love your input! We want to make contributing to the Cancer Complexity Toolkit Workflow as easy and transparent as possible, whether it's:
+
+- Reporting a bug
+- Discussing the current state of the code
+- Submitting a fix
+- Proposing new features
+
+## We Develop with GitHub
+We use GitHub to host code, track issues and feature requests, and accept pull requests.
+
+## We Use [Nextflow](https://www.nextflow.io/)
+We use Nextflow for workflow management. Make sure you have Nextflow installed and are familiar with its syntax before contributing.
+
+## Development Process
+We use the `main` branch as the primary development branch. All changes should be made through pull requests.
+
+1. Fork the repo and create your branch from `main`.
+2. If you've added code that should be tested, add tests.
+3. If you've changed APIs, update the documentation.
+4. Ensure the test suite passes.
+5. Make sure your code lints.
+6. Issue that pull request!
+
+## Any contributions you make will be under the MIT Software License
+In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project. Feel free to contact the maintainers if that's a concern.
+
+## Report bugs using GitHub's [issue tracker](https://github.com/yourusername/cckp-toolkit-workflow/issues)
+We use GitHub issues to track public bugs. Report a bug by [opening a new issue](https://github.com/yourusername/cckp-toolkit-workflow/issues/new); it's that easy!
+
+## Write bug reports with detail, background, and sample code
+
+**Great Bug Reports** tend to have:
+
+- A quick summary and/or background
+- Steps to reproduce
+  - Be specific!
+  - Give sample code if you can.
+- What you expected would happen
+- What actually happens
+- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)
+
+## Use a Consistent Coding Style
+
+* Use 2 spaces for indentation rather than tabs
+* Keep line length under 100 characters
+* Follow the Nextflow style guide for workflow files
+* Use meaningful variable names
+* Add comments for complex logic
+
+## License
+By contributing, you agree that your contributions will be licensed under the project's MIT License.
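Two of the coding-style rules in CONTRIBUTING.md (no tabs for indentation, lines under 100 characters) can be checked mechanically before opening a pull request. The following is a minimal sketch under that assumption. It does not implement the Nextflow style guide, and `style_problems` and `check_file` are hypothetical helper names rather than tooling provided by this repository.

```python
from pathlib import Path
from typing import List

MAX_LINE_LENGTH = 100  # "Keep line length under 100 characters"


def style_problems(text: str) -> List[str]:
    """Return human-readable messages for lines that break the two rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace only; tabs inside the line body are ignored.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab used for indentation")
        if len(line) >= MAX_LINE_LENGTH:
            problems.append(f"line {number}: {len(line)} characters")
    return problems


def check_file(path: Path) -> List[str]:
    """Apply the same checks to a file on disk."""
    return style_problems(path.read_text(encoding="utf-8"))


sample = "process FOO {\n\tscript:\n  echo hi\n}\n"
for message in style_problems(sample):
    print(message)
```

A check like this could run locally or in CI alongside whichever linter the project adopts.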
@@ -1,77 +1,166 @@
-# CCKP Toolkit Workflow
+# Cancer Complexity Toolkit Workflow
+
+![CCT Logo](cct-logo.png)
 
 ## Description
 
-This Nextflow workflow (`main.nf`) performs quality and metadata checks on software tools by running a series of checks:
+The Cancer Complexity Toolkit Workflow is a scalable infrastructure framework to promote sustainable tool development. It performs multiple levels of analysis:
+
+1. **Basic Repository Checks**
+   - Repository cloning and validation
+   - README file verification
+   - Dependency file detection
+   - Test suite presence
+
+2. **Advanced Analysis**
+   - [Software Gardening Almanack](https://github.com/software-gardening/almanack) analysis
+   - JOSS (Journal of Open Source Software) criteria evaluation
+   - AI-powered repository analysis (optional, requires Synapse agent ID)
+   - Test execution and coverage
 
-- **CloneRepository**: Clones the repository.
-- **CheckReadme**: Verifies the existence of a README file.
-- **CheckDependencies**: Looks for dependency files (e.g., `requirements.txt`, `Pipfile`, `setup.py`, etc.).
-- **CheckTests**: Checks for the presence of test directories or test files.
-- **CheckAlmanack**: Runs the [Software Gardening Almanack](https://github.com/software-gardening/almanack) analysis.
+3. **Optional Synapse Integration**
+   - Results upload to Synapse platform
+   - Metadata management
 
-The final output is a **consolidated CSV report** where each row represents a tool (i.e., a repository) with the following columns:
+## Requirements
 
-```Tool, CloneRepository, CheckReadme, CheckDependencies, CheckTests, Almanack```
+### Core Dependencies
+- **Nextflow** (version 24.04.3 or later): Install from [Nextflow's official website](https://www.nextflow.io/). Installation instructions are below.
+- **Docker** (required for containerized execution): Install from [Docker's official website](https://www.docker.com/get-started).
+- **Python 3.8+**: Install from [Python's official website](https://www.python.org/downloads/).
+- **Git**
 
-Each column shows the status (`PASS`/`FAIL`) for the respective check.
+> [!IMPORTANT]
+> Docker is required to run this workflow. The toolkit uses containerized processes to ensure consistent execution environments across different systems.
 
-## Running the Workflow
-You can execute the workflow in one of two ways:
-- Analyze a single tool by specifying its repository URL.
-- Analyze multiple tools using a sample sheet (CSV file) that includes a repo_url header.
+### Optional Dependencies
+For Synapse integration:
+- Synapse Python client
+- Synapse authentication token
+- Synapse configuration file
 
-### Install Nextflow 
-Follow the official installation guide [here](https://www.nextflow.io/docs/latest/install.html) or use the command below:
+## Installation
 
+1. **Install Nextflow**
 ```bash
 curl -s https://get.nextflow.io | bash
 ```
 
-### Run with a Single Repository URL
+2. **Install Python Dependencies**
 ```bash
-nextflow run main.nf --repo_url https://github.com/example/repo.git
+pip install -r requirements.txt
 ```
 
-### Run with a Sample Sheet
-Prepare a CSV file (e.g., example-input.csv) with a header repo_url and one URL per row, then run:
+3. **Configure Synapse** (Optional)
+```bash
+# Create Synapse config file
+mkdir -p ~/.synapse
+touch ~/.synapseConfig
+```
+
+> [!NOTE]
+> To use Synapse features, you'll need to:
+> 1. Create a personal access token from your [Synapse Account Settings](https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens)
+> 2. Add the token to your `~/.synapseConfig` file:
+>    ```
+>    [authentication]
+>    username = your_username
+>    authtoken = your_personal_access_token
+>    ```
+> 3. Set the token as a Nextflow secret:
+>    ```bash
+>    nextflow secrets set SYNAPSE_AUTH_TOKEN your_personal_access_token
+>    ```
+
+## Usage
+
+### Input Format
 
+The workflow accepts input in two formats:
+
+1. **Single Repository URL**
 ```bash
-nextflow run main.nf --sample_sheet <samplesheet>
+nextflow run main.nf --repo_url https://github.com/example/repo.git
 ```
 
-## Output
-After the workflow completes, you'll find a consolidated CSV report (consolidated_report.csv) in your output directory (by default, under the results folder). Each row in this report represents a tool and its corresponding check statuses.
+2. **Sample Sheet (CSV)**
+
+Example `input.csv`:
+```csv
+repo_url,description
+https://github.com/PythonOT/POT.git,Python Optimal Transport Library
+https://github.com/RabadanLab/TARGet.git,TARGet Analysis Tool
+```
 
-## Optional: Uploading Results to Synapse
-To upload results to Synapse, run the workflow with the following parameters:
+### Running the Workflow
 
+#### Basic Analysis
+```bash
+nextflow run main.nf --repo_url https://github.com/example/repo.git
+```
+
+#### With AI Analysis
 ```bash
 nextflow run main.nf \
     --repo_url https://github.com/example/repo.git \
-    --upload_to_synapse true\
-    --synapse_folder_id syn64626421
+    --synapse_agent_id LOWYSX3QSQ
+```
+
+#### With Sample Sheet
+```bash
+nextflow run main.nf --sample_sheet input.csv
 ```
-Ensure your Synapse credentials are properly set up (e.g., by mounting your .synapseConfig file).
 
-## Tools You Can Test With
+> [!NOTE]
+> When using AI Analysis or Synapse integration, ensure you have:
+> - Valid Synapse authentication token
+> - Proper Synapse configuration
+> - Synapse agent ID for AI analysis (e.g., LOWYSX3QSQ)
+> - Correct folder ID with write permissions (for upload)
+
+## Output
+
+The workflow generates several output files in the `results` directory:
+
+- `<repo_name>_ai_analysis.html`: AI-powered qualitative summary and recommendations (final report)
+- `almanack_results.json`: Detailed metrics from Almanack analysis
+- `joss_report_<repo_name>.json`: JOSS criteria evaluation metrics
+- `test_results_<repo_name>.json`: Test execution results and coverage metrics
+
+> [!NOTE]
+> The AI analysis report provides a high-level qualitative summary and actionable recommendations. For detailed metrics and specific measurements, refer to the other output files.
+
+## Development Status
+
+> [!WARNING]
+> The AI Analysis component is currently in beta. Results may vary and the interface is subject to change.
+
+> [!IMPORTANT]
+> Synapse integration requires proper authentication and permissions setup.
+
+## Example Repositories
+
+| Repository | Description | Expected Status |
+|------------|-------------|----------------|
+| [PythonOT/POT](https://github.com/PythonOT/POT) | Python Optimal Transport Library | All checks pass |
+| [RabadanLab/TARGet](https://github.com/RabadanLab/TARGet) | TARGet Analysis Tool | Fails dependency and test checks |
+| [arjunrajlaboratory/memSeqASEanalysis](https://github.com/arjunrajlaboratory/memSeqASEanalysis) | memSeq ASE Analysis | Fails dependency and test checks |
+
+## Configuration
+
+### Synapse Configuration
 
-1. **Python Optimal Transport Library**  
-   - Synapse: [POT](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=POT)  
-   - GitHub: [PythonOT/POT](https://github.com/PythonOT/POT)  
-   - Note: Should pass all tests
+**Authentication Token**
+   - Set as Nextflow secret:
+   ```bash
+   nextflow secrets set SYNAPSE_AUTH_TOKEN your_personal_access_token
+   ```
 
-2. **TARGet**  
-   - Synapse: [TARGet](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=TARGet)  
-   - GitHub: [RabadanLab/TARGet](https://github.com/RabadanLab/TARGet/tree/master)  
-   - Note: Fails CheckDependencies, CheckTests
+## Contributing
 
-3. **memSeqASEanalysis**  
-   - Synapse: [memSeqASEanalysis](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=memSeqASEanalysis)  
-   - GitHub: [arjunrajlaboratory/memSeqASEanalysis](https://github.com/arjunrajlaboratory/memSeqASEanalysis)  
-   - Note: Fails CheckDependencies, CheckTests
+> [!NOTE]
+> We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
 
-**Subset of tools to test**: Any from [this list](https://cancercomplexity.synapse.org/Explore/Tools) with a GitHub repository.
+## License
 
-## Notes
-- Ensure Nextflow and Docker are installed 
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 
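The README describes the sample sheet as a CSV with a `repo_url` header and one repository per row. A sheet can be sanity-checked before launching the workflow with something like the sketch below. It assumes only what the README states (the `repo_url` column must exist and be non-empty), and `read_repo_urls` is a hypothetical helper, not part of the workflow.

```python
import csv
import io
from typing import List


def read_repo_urls(sheet_text: str) -> List[str]:
    """Return the repo_url values from sample-sheet text, validating the header."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    if reader.fieldnames is None or "repo_url" not in reader.fieldnames:
        raise ValueError("sample sheet must have a 'repo_url' header")
    urls = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        url = (row["repo_url"] or "").strip()
        if not url:
            raise ValueError(f"row {row_number}: empty repo_url")
        urls.append(url)
    return urls


example = """repo_url,description
https://github.com/PythonOT/POT.git,Python Optimal Transport Library
https://github.com/RabadanLab/TARGet.git,TARGet Analysis Tool
"""
print(read_repo_urls(example))
```

Extra columns such as `description` are ignored by this check.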
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+
+import json
+import os
+import sys
+from synapseclient import Synapse
+from synapseclient.models import Agent, AgentSession
+
+def call_synapse_agent(agent_id, prompt):
+    """
+    Call the Synapse agent with the given prompt and return its response.
+
+    Args:
+        agent_id (str): The ID of the Synapse agent to use
+        prompt (str): The prompt to send to the agent
+
+    Returns:
+        str: The agent's response
+
+    Raises:
+        Exception: If there's an error during agent communication
+    """
+    syn = Synapse()
+    syn.login(authToken=os.environ['SYNAPSE_AUTH_TOKEN'])
+    agent = Agent(cloud_agent_id=agent_id)
+    agent.register(synapse_client=syn)
+    session = agent.start_session(synapse_client=syn)
+    response = session.prompt(
+        prompt=prompt,
+        enable_trace=True,
+        print_response=False,
+        synapse_client=syn
+    )
+    return response.response
+
+if __name__ == "__main__":
+    repo_name = sys.argv[1]
+    repo_url = sys.argv[2]
+    almanack_results_file = sys.argv[3]
+    joss_report_file = sys.argv[4]
+    agent_id = sys.argv[5]
+
+    try:
+        # Read input files
+        with open(almanack_results_file, 'r') as f:
+            almanack_results = json.load(f)
+        with open(joss_report_file, 'r') as f:
+            joss_report = json.load(f)
+
+        # Prepare input for agent
+        agent_input = {
+            "repository_url": repo_url,
+            "almanack_results": almanack_results,
+            "joss_report": joss_report
+        }
+
+        # Call Synapse agent and treat response as HTML
+        response_html = call_synapse_agent(agent_id, json.dumps(agent_input))
+
+        # Write the HTML response directly to file
+        os.makedirs("results", exist_ok=True)
+        output_file = f"{repo_name}_ai_analysis.html"
+        with open(output_file, 'w') as f:
+            f.write(response_html)
+    except Exception as e:
+        print(f"[ERROR] Analysis failed: {str(e)}")
+        print(f"[ERROR] Exception type: {type(e)}")
+        import traceback
+        print(f"[ERROR] Traceback:\n{traceback.format_exc()}")
+        os.makedirs("results", exist_ok=True)
+        output_file = f"{repo_name}_ai_analysis.html"
+        with open(output_file, 'w') as f:
+            f.write(f"<html><body><h1>Error in AI Analysis</h1><pre>{str(e)}</pre></body></html>")
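One possible refactor of the script above, offered as a sketch rather than as the PR's implementation: moving the payload construction and the fallback page into pure functions lets them be tested without Synapse credentials or network access. The function names `build_agent_input` and `render_error_html` are hypothetical. Escaping the exception message with `html.escape` is an addition the original script does not make.

```python
import html
import json
from pathlib import Path


def build_agent_input(repo_url: str, almanack_path: Path, joss_path: Path) -> str:
    """Load both JSON reports and serialize the payload sent to the agent."""
    payload = {
        "repository_url": repo_url,
        "almanack_results": json.loads(almanack_path.read_text(encoding="utf-8")),
        "joss_report": json.loads(joss_path.read_text(encoding="utf-8")),
    }
    return json.dumps(payload)


def render_error_html(error: Exception) -> str:
    """Build the fallback page, escaping the message so '<' or '&' cannot break the markup."""
    message = html.escape(str(error))
    return f"<html><body><h1>Error in AI Analysis</h1><pre>{message}</pre></body></html>"


print(render_error_html(ValueError("unexpected token '<' in report")))
```

The call to the Synapse agent stays in the script's main block, so only that part needs credentials.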