Merged
Changes from 63 commits
87 commits
9e8d94f
Adding Joss modules to main.nf
aditigopalan Mar 14, 2025
f3733b0
Create AnalyzeJOSSCriteria.nf
aditigopalan Mar 14, 2025
d20f2aa
Create InterepretWithGPT.nf
aditigopalan Mar 14, 2025
c586586
Update GenerateReport.nf
aditigopalan Mar 14, 2025
bb908b4
Update RunAlmanack.nf
aditigopalan Mar 14, 2025
cd1e55f
Merge pull request #38 from mc2-center/main
aditigopalan Mar 14, 2025
1fe2824
Update main.nf with new modules
aditigopalan Mar 14, 2025
d5d01b9
Update AnalyzeJOSSCriteria.nf
aditigopalan Mar 14, 2025
a379209
Adding final report and csv functionallity
aditigopalan Mar 14, 2025
ac1d8ae
Updating GPT response
aditigopalan Mar 14, 2025
67f6884
Updating tuple
aditigopalan Mar 14, 2025
dac393c
Setting GPT analysis as optional
aditigopalan Mar 14, 2025
cb695a7
Update consolidated_report.csv
aditigopalan Mar 14, 2025
2e96950
Setting better criteria for JOSS Review
aditigopalan Mar 14, 2025
a2bcf68
Adding tests as a check
aditigopalan Mar 14, 2025
cd0d970
Update JOSSCriteria to take output from toolkit as well
aditigopalan Mar 14, 2025
30398c3
Adding example output for JOSS
aditigopalan Mar 14, 2025
02122c9
fix: Improve error handling and file reading in AnalyzeJOSSCriteria m…
aditigopalan May 6, 2025
55cfcea
Update main workflow to include test execution and improved data flow…
aditigopalan May 6, 2025
beded55
Update JOSS criteria analysis to handle test results and improve scoring
aditigopalan May 6, 2025
702ac18
Add GPT interpretation module for detailed analysis of JOSS results
aditigopalan May 6, 2025
98c0f89
Update repository processing to include test detection
aditigopalan May 6, 2025
5e3c7bf
Update Almanack analysis to improve status reporting
aditigopalan May 6, 2025
539f4ef
Update Nextflow config with new process containers and parameters
aditigopalan May 6, 2025
19f096f
Add new TestExecutor module for running and analyzing repository tests
aditigopalan May 6, 2025
5f34012
Add CONTRIBUTING.md file
aditigopalan May 21, 2025
90c5818
Update README.md
aditigopalan May 21, 2025
f6c9aee
refactor: main.nf
aditigopalan May 21, 2025
8eeaa2b
feat: add AIAnalysis module
aditigopalan May 21, 2025
a73393c
update: enhance JOSS criteria analysis
aditigopalan May 21, 2025
bb6d6a0
Delete redundant modules
aditigopalan May 21, 2025
01f97da
refactor: update process configuration for AIAnalysis
aditigopalan May 21, 2025
c5feaff
Update main.nf.test
aditigopalan May 21, 2025
c01ef9e
Deleting nf reports
aditigopalan May 22, 2025
cacccca
delete consolidated_report.csv
aditigopalan May 22, 2025
641d140
test: update assertions to handle AIAnalysis process failures
aditigopalan May 22, 2025
456766c
adding aianalysis changes to main
aditigopalan May 22, 2025
8f23f2b
fix: ensure AIAnalysis process correctly references analyze.py
aditigopalan May 22, 2025
47d9067
feat: implement AI analysis for repository evaluation
aditigopalan May 22, 2025
c23c53f
Updating nextflow.config
aditigopalan May 22, 2025
05a9a1f
Adding logo
aditigopalan May 22, 2025
a12f648
Configuring CODEOWNERS
aditigopalan May 22, 2025
088fa62
Specify download location for dependencies
aditigopalan May 23, 2025
c948ad7
Update nextflow secrets
aditigopalan May 23, 2025
a92ab85
Remove synapse config option
aditigopalan May 23, 2025
29a8f92
move to bin + add type hints and docstring
aditigopalan May 23, 2025
d99589c
include invalid repo url in error message
aditigopalan May 23, 2025
7c4474d
refactor: use named parameters in channel combinations
aditigopalan May 23, 2025
48e5972
refactor: use named parameters in AI input channel combination
aditigopalan May 23, 2025
450e750
refactor: provide analyze.py as process input
aditigopalan May 23, 2025
3e6508a
fix: pin synapsepythonclient container version
aditigopalan May 23, 2025
89370f4
refactor: move JOSS analysis logic to separate Python script
aditigopalan May 23, 2025
f6de88e
refactor: add type hints and docstrings to metric handling
aditigopalan May 23, 2025
d846a45
refactor: move test execution logic to separate Python script
aditigopalan May 23, 2025
e4b0444
Clean up nf config
aditigopalan May 23, 2025
c1a40e0
removing debug statements
aditigopalan May 27, 2025
d40cdcc
update: add comprehensive joss analysis
aditigopalan May 27, 2025
16a103a
feat: test execution support
aditigopalan May 27, 2025
c90ad86
update: bringing back array indices
aditigopalan May 27, 2025
8d0f3bb
Updating container name
aditigopalan May 27, 2025
8f7846c
Update: updating input channels
aditigopalan May 27, 2025
c778c8e
Removing debug mode
aditigopalan May 27, 2025
6d39d35
Update main.nf.test
aditigopalan May 27, 2025
a220593
Update bin/analyze_joss.py
aditigopalan May 27, 2025
8a2246e
Minimize nesting bin/analyze_joss.py
aditigopalan May 27, 2025
d189c25
Update bin/run_tests.py
aditigopalan May 27, 2025
95077e6
Update modules/AIAnalysis.nf
aditigopalan May 27, 2025
a9bfc92
Update modules/AnalyzeJOSSCriteria.nf
aditigopalan May 27, 2025
263b8ee
Update modules/TestExecutor.nf
aditigopalan May 27, 2025
d665c0c
Update modules/TestExecutor.nf
aditigopalan May 27, 2025
5423752
Fixing indentation error
aditigopalan May 27, 2025
dc70d6f
Updating main.nf to remove bin paths
aditigopalan May 27, 2025
bca71a6
Removing bin path from file
aditigopalan May 27, 2025
653b899
Removing unused AgentSession
aditigopalan May 27, 2025
fd44c40
Adding type hints
aditigopalan May 27, 2025
f972dd1
Update return type hints
aditigopalan May 27, 2025
c00c9da
Adding docstring
aditigopalan May 27, 2025
2b3e25f
Adding docstring
aditigopalan May 27, 2025
3babf55
Removing unused variables
aditigopalan May 27, 2025
9089d6e
Breaking up analyze_joss_criteria into helper functions
aditigopalan May 27, 2025
af15f0f
Removing bin path
aditigopalan May 27, 2025
4354547
Defining strings as enums
aditigopalan May 27, 2025
70e42db
Updating string > enum
aditigopalan May 27, 2025
4ff3b66
Adding type hints/ docstrings
aditigopalan May 27, 2025
c3277f1
Removing unused results
aditigopalan May 27, 2025
2c786b0
refactor(run_tests): improve test result pattern matching
aditigopalan May 27, 2025
3688fa1
String > enum for needs improvement
aditigopalan May 27, 2025
20 changes: 20 additions & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Define maintainers for key parts of the repository

# Default owner (later, more specific patterns take precedence)
* @aditigopalan

# Core Workflow Files
/main.nf @aditigopalan
/modules/ @aditigopalan

# Documentation
/README.md @aditigopalan

# Tests
/tests/ @aditigopalan

# Configuration
/nextflow.config @aditigopalan

# Scripts
/scripts/ @aditigopalan
21 changes: 21 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,21 @@
# Pull Request

## Description
Please provide a brief description of the changes made in this PR.

## Changes Made
- [ ] Change 1
- [ ] Change 2
- [ ] Change 3

## Related Issues
Fixes #<issue_number>

## Checklist
- [ ] Code follows the project's coding standards.
- [ ] Tests have been added or updated to cover the changes.
- [ ] Documentation has been updated to reflect the changes.
- [ ] All tests pass locally.

## Additional Notes
Any additional information or context that might be helpful for reviewers.
53 changes: 53 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,53 @@
# Contributing to Cancer Complexity Toolkit Workflow

We love your input! We want to make contributing to the Cancer Complexity Toolkit Workflow as easy and transparent as possible, whether it's:

- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features

## We Develop with GitHub
We use GitHub to host code, track issues and feature requests, and accept pull requests.

## We Use [Nextflow](https://www.nextflow.io/)
We use Nextflow for workflow management. Make sure you have Nextflow installed and are familiar with its syntax before contributing.

## Development Process
We use the `main` branch as the primary development branch. All changes should be made through pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. Issue that pull request!

## Any contributions you make will be under the MIT Software License
In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project. Feel free to contact the maintainers if that's a concern.

## Report bugs using GitHub's [issue tracker](https://github.com/yourusername/cckp-toolkit-workflow/issues)
We use GitHub issues to track public bugs. Report a bug by [opening a new issue](https://github.com/yourusername/cckp-toolkit-workflow/issues/new); it's that easy!

## Write bug reports with detail, background, and sample code

**Great Bug Reports** tend to have:

- A quick summary and/or background
- Steps to reproduce
  - Be specific!
  - Give sample code if you can.
- What you expected would happen
- What actually happens
- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)

## Use a Consistent Coding Style

* Use 2 spaces for indentation rather than tabs
* Keep line length under 100 characters
* Follow the Nextflow style guide for workflow files
* Use meaningful variable names
* Add comments for complex logic

## License
By contributing, you agree that your contributions will be licensed under the project's MIT License.
177 changes: 133 additions & 44 deletions README.md
@@ -1,77 +1,166 @@
# CCKP Toolkit Workflow
# Cancer Complexity Toolkit Workflow

![CCT Logo](cct-logo.png)

## Description

This Nextflow workflow (`main.nf`) performs quality and metadata checks on software tools by running a series of checks:
The Cancer Complexity Toolkit Workflow is a scalable infrastructure framework to promote sustainable tool development. It performs multiple levels of analysis:

1. **Basic Repository Checks**
- Repository cloning and validation
- README file verification
- Dependency file detection
- Test suite presence

2. **Advanced Analysis**
- [Software Gardening Almanack](https://github.com/software-gardening/almanack) analysis
- JOSS (Journal of Open Source Software) criteria evaluation
- AI-powered repository analysis (optional, requires Synapse agent ID)
- Test execution and coverage

- **CloneRepository**: Clones the repository.
- **CheckReadme**: Verifies the existence of a README file.
- **CheckDependencies**: Looks for dependency files (e.g., `requirements.txt`, `Pipfile`, `setup.py`, etc.).
- **CheckTests**: Checks for the presence of test directories or test files.
- **CheckAlmanack**: Runs the [Software Gardening Almanack](https://github.com/software-gardening/almanack) analysis.
3. **Optional Synapse Integration**
- Results upload to Synapse platform
- Metadata management
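
As a rough illustration of what the basic repository checks amount to, the sketch below inspects a local checkout for a README, a dependency file, and a test directory or test file. The function names, the list of dependency files, and the result keys are assumptions made for this example; they are not taken from the workflow's modules, which may apply different rules.

```python
from pathlib import Path
from typing import Dict

# Assumed list of dependency files; extend it to match your ecosystem.
DEPENDENCY_FILES = ("requirements.txt", "Pipfile", "setup.py", "pyproject.toml", "environment.yml")


def _status(ok: bool) -> str:
    """Map a boolean onto the PASS/FAIL strings used in this sketch."""
    return "PASS" if ok else "FAIL"


def check_readme(repo: Path) -> str:
    """PASS if a top-level file name starts with 'readme' (case-insensitive)."""
    found = any(p.is_file() and p.name.lower().startswith("readme") for p in repo.iterdir())
    return _status(found)


def check_dependencies(repo: Path) -> str:
    """PASS if at least one known dependency file sits at the repository root."""
    return _status(any((repo / name).is_file() for name in DEPENDENCY_FILES))


def check_tests(repo: Path) -> str:
    """PASS if a tests/ or test/ directory exists, or any test_*.py file is found."""
    has_dir = any((repo / name).is_dir() for name in ("tests", "test"))
    has_file = next(repo.rglob("test_*.py"), None) is not None
    return _status(has_dir or has_file)


def basic_checks(repo_dir: str) -> Dict[str, str]:
    """Run all three checks against a local checkout."""
    repo = Path(repo_dir)
    return {
        "CheckReadme": check_readme(repo),
        "CheckDependencies": check_dependencies(repo),
        "CheckTests": check_tests(repo),
    }
```

Calling `basic_checks("path/to/clone")` on a cloned repository returns one status per check, which is convenient for a quick local look before running the full workflow.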

The final output is a **consolidated CSV report** where each row represents a tool (i.e., a repository) with the following columns:
## Requirements

```Tool, CloneRepository, CheckReadme, CheckDependencies, CheckTests, Almanack```
### Core Dependencies
- **Nextflow** (version 24.04.3 or later): Install from [Nextflow's official website](https://www.nextflow.io/); installation instructions are below.
- **Docker** (required for containerized execution): Install from [Docker's official website](https://www.docker.com/get-started).
- **Python 3.8+**: Install from [Python's official website](https://www.python.org/downloads/).
- **Git**

Each column shows the status (`PASS`/`FAIL`) for the respective check.
> [!IMPORTANT]
> Docker is required to run this workflow. The toolkit uses containerized processes to ensure consistent execution environments across different systems.

## Running the Workflow
You can execute the workflow in one of two ways:
- Analyze a single tool by specifying its repository URL.
- Analyze multiple tools using a sample sheet (CSV file) that includes a repo_url header.
### Optional Dependencies
For Synapse integration:
- Synapse Python client
- Synapse authentication token
- Synapse configuration file

### Install Nextflow
Follow the official installation guide [here](https://www.nextflow.io/docs/latest/install.html) or use the command below:
## Installation

1. **Install Nextflow**
```bash
curl -s https://get.nextflow.io | bash
```

### Run with a Single Repository URL
2. **Install Python Dependencies**
```bash
nextflow run main.nf --repo_url https://github.com/example/repo.git
pip install -r requirements.txt
```

### Run with a Sample Sheet
Prepare a CSV file (e.g., example-input.csv) with a header repo_url and one URL per row, then run:
3. **Configure Synapse** (Optional)
```bash
# Create Synapse config file
mkdir -p ~/.synapse
touch ~/.synapseConfig
```

> [!NOTE]
> To use Synapse features, you'll need to:
> 1. Create a personal access token from your [Synapse Account Settings](https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens)
> 2. Add the token to your `~/.synapseConfig` file:
> ```
> [authentication]
> username = your_username
> authtoken = your_personal_access_token
> ```
> 3. Set the token as a Nextflow secret:
> ```bash
> nextflow secrets set SYNAPSE_AUTH_TOKEN your_personal_access_token
> ```
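
Before launching a run that depends on Synapse, it can help to confirm that a token is discoverable at all. The sketch below is one way to do that, assuming the token is exposed through the `SYNAPSE_AUTH_TOKEN` environment variable or stored in the `[authentication]` section of `~/.synapseConfig`. The helper name `find_synapse_token` and the option names it looks for are assumptions for illustration; the function only checks that a value is present, not that Synapse accepts it.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def find_synapse_token(config_path: Optional[Path] = None) -> Optional[str]:
    """Return a token from SYNAPSE_AUTH_TOKEN or the config file, else None."""
    token = os.environ.get("SYNAPSE_AUTH_TOKEN")
    if token:
        return token

    path = config_path or Path.home() / ".synapseConfig"
    parser = configparser.ConfigParser()
    try:
        parser.read(path)  # a missing file is ignored and leaves the parser empty
    except configparser.Error:
        return None  # the file exists but is not valid INI

    # configparser lower-cases option names, so "apiKey" is read back as "apikey".
    for key in ("authtoken", "apikey"):
        value = parser.get("authentication", key, fallback=None)
        if value:
            return value
    return None


if __name__ == "__main__":
    print("token found" if find_synapse_token() else "no Synapse token configured")
```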

## Usage

### Input Format

The workflow accepts input in two formats:

1. **Single Repository URL**
```bash
nextflow run main.nf --sample_sheet <samplesheet>
nextflow run main.nf --repo_url https://github.com/example/repo.git
```

## Output
After the workflow completes, you'll find a consolidated CSV report (consolidated_report.csv) in your output directory (by default, under the results folder). Each row in this report represents a tool and its corresponding check statuses.
2. **Sample Sheet (CSV)**

Example `input.csv`:
```csv
repo_url,description
https://github.com/PythonOT/POT.git,Python Optimal Transport Library
https://github.com/RabadanLab/TARGet.git,TARGet Analysis Tool
```
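
A sample sheet can be sanity-checked before it is handed to Nextflow. The following sketch reads the CSV, requires the `repo_url` header, and rejects URLs that do not look like the examples above. The function name `read_repo_urls` and the URL rule (HTTPS, ending in `.git`) are assumptions for this example; the workflow's own validation may differ.

```python
import csv
from typing import List


def read_repo_urls(sample_sheet: str) -> List[str]:
    """Return the non-empty repo_url values from a sample sheet."""
    with open(sample_sheet, newline="") as handle:
        reader = csv.DictReader(handle)
        if not reader.fieldnames or "repo_url" not in reader.fieldnames:
            raise ValueError(f"{sample_sheet}: missing required 'repo_url' header")
        urls = [(row.get("repo_url") or "").strip() for row in reader]

    urls = [url for url in urls if url]  # skip rows with an empty repo_url

    # Assumed rule, modelled on the example rows: HTTPS URLs ending in .git.
    invalid = [url for url in urls if not (url.startswith("https://") and url.endswith(".git"))]
    if invalid:
        raise ValueError(f"invalid repository URL(s): {', '.join(invalid)}")
    return urls
```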

## Optional: Uploading Results to Synapse
To upload results to Synapse, run the workflow with the following parameters:
### Running the Workflow

#### Basic Analysis
```bash
nextflow run main.nf --repo_url https://github.com/example/repo.git
```

#### With AI Analysis
```bash
nextflow run main.nf \
--repo_url https://github.com/example/repo.git \
--upload_to_synapse true \
--synapse_folder_id syn64626421 \
--synapse_agent_id LOWYSX3QSQ
```

#### With Sample Sheet
```bash
nextflow run main.nf --sample_sheet input.csv
```
Ensure your Synapse credentials are properly set up (e.g., by mounting your .synapseConfig file).

## Tools You Can Test With
> [!NOTE]
> When using AI Analysis or Synapse integration, ensure you have:
> - Valid Synapse authentication token
> - Proper Synapse configuration
> - Synapse agent ID for AI analysis (e.g., LOWYSX3QSQ)
> - Correct folder ID with write permissions (for upload)

## Output

The workflow generates several output files in the `results` directory:

- `<repo_name>_ai_analysis.html`: AI-powered qualitative summary and recommendations (final report)
- `almanack_results.json`: Detailed metrics from Almanack analysis
- `joss_report_<repo_name>.json`: JOSS criteria evaluation metrics
- `test_results_<repo_name>.json`: Test execution results and coverage metrics

> [!NOTE]
> The AI analysis report provides a high-level qualitative summary and actionable recommendations. For detailed metrics and specific measurements, refer to the other output files.
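
To inspect the JSON reports after a run, a small loader such as the one below may be enough. It is a sketch that assumes the reports sit directly in the `results` directory; it makes no assumption about their internal schema and simply lists the top-level keys of each file. The function name `load_json_reports` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def load_json_reports(results_dir: str = "results") -> Dict[str, object]:
    """Load every *.json file in results_dir, keyed by file name."""
    root = Path(results_dir)
    reports: Dict[str, object] = {}
    if not root.is_dir():
        return reports
    for path in sorted(root.glob("*.json")):
        try:
            reports[path.name] = json.loads(path.read_text())
        except json.JSONDecodeError as err:
            # Keep going so that one malformed report does not hide the others.
            reports[path.name] = {"error": f"invalid JSON: {err}"}
    return reports


if __name__ == "__main__":
    for name, content in load_json_reports().items():
        summary = sorted(content) if isinstance(content, dict) else type(content).__name__
        print(name, summary)
```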

## Development Status

> [!WARNING]
> The AI Analysis component is currently in beta. Results may vary and the interface is subject to change.

> [!IMPORTANT]
> Synapse integration requires proper authentication and permissions setup.

## Example Repositories

| Repository | Description | Expected Status |
|------------|-------------|----------------|
| [PythonOT/POT](https://github.com/PythonOT/POT) | Python Optimal Transport Library | All checks pass |
| [RabadanLab/TARGet](https://github.com/RabadanLab/TARGet) | TARGet Analysis Tool | Fails dependency and test checks |
| [arjunrajlaboratory/memSeqASEanalysis](https://github.com/arjunrajlaboratory/memSeqASEanalysis) | memSeq ASE Analysis | Fails dependency and test checks |

## Configuration

### Synapse Configuration

1. **Python Optimal Transport Library**
- Synapse: [POT](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=POT)
- GitHub: [PythonOT/POT](https://github.com/PythonOT/POT)
- Note: Should pass all tests
**Authentication Token**
- Set as Nextflow secret:
```bash
nextflow secrets set SYNAPSE_AUTH_TOKEN your_personal_access_token
```

2. **TARGet**
- Synapse: [TARGet](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=TARGet)
- GitHub: [RabadanLab/TARGet](https://github.com/RabadanLab/TARGet/tree/master)
- Note: Fails CheckDependencies, CheckTests
## Contributing

3. **memSeqASEanalysis**
- Synapse: [memSeqASEanalysis](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=memSeqASEanalysis)
- GitHub: [arjunrajlaboratory/memSeqASEanalysis](https://github.com/arjunrajlaboratory/memSeqASEanalysis)
- Note: Fails CheckDependencies, CheckTests
> [!NOTE]
> We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

**Subset of tools to test**: Any from [this list](https://cancercomplexity.synapse.org/Explore/Tools) with a GitHub repository.
## License

## Notes
- Ensure Nextflow and Docker are installed
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
73 changes: 73 additions & 0 deletions bin/analyze.py
@@ -0,0 +1,73 @@
#!/usr/bin/env python3
"""Send Almanack and JOSS results to a Synapse agent and save its HTML response."""

import json
import os
import sys
import traceback

from synapseclient import Synapse
from synapseclient.models import Agent


def call_synapse_agent(agent_id: str, prompt: str) -> str:
    """
    Call the Synapse agent with the given prompt and return its response.

    Args:
        agent_id (str): The ID of the Synapse agent to use
        prompt (str): The prompt to send to the agent

    Returns:
        str: The agent's response

    Raises:
        Exception: If there's an error during agent communication
    """
    syn = Synapse()
    # Authenticate with the token from the SYNAPSE_AUTH_TOKEN environment variable
    syn.login(authToken=os.environ['SYNAPSE_AUTH_TOKEN'])
    agent = Agent(cloud_agent_id=agent_id)
    agent.register(synapse_client=syn)
    session = agent.start_session(synapse_client=syn)
    response = session.prompt(
        prompt=prompt,
        enable_trace=True,
        print_response=False,
        synapse_client=syn
    )
    return response.response


if __name__ == "__main__":
    # Positional arguments: repo name, repo URL, Almanack results, JOSS report, agent ID
    repo_name = sys.argv[1]
    repo_url = sys.argv[2]
    almanack_results_file = sys.argv[3]
    joss_report_file = sys.argv[4]
    agent_id = sys.argv[5]

    try:
        # Read input files
        with open(almanack_results_file, 'r') as f:
            almanack_results = json.load(f)
        with open(joss_report_file, 'r') as f:
            joss_report = json.load(f)

        # Prepare input for agent
        agent_input = {
            "repository_url": repo_url,
            "almanack_results": almanack_results,
            "joss_report": joss_report
        }

        # Call Synapse agent and treat response as HTML
        response_html = call_synapse_agent(agent_id, json.dumps(agent_input))

        # Write the HTML response to a file in the current working directory
        os.makedirs("results", exist_ok=True)
        output_file = f"{repo_name}_ai_analysis.html"
        with open(output_file, 'w') as f:
            f.write(response_html)
    except Exception as e:
        print(f"[ERROR] Analysis failed: {str(e)}")
        print(f"[ERROR] Exception type: {type(e)}")
        print(f"[ERROR] Traceback:\n{traceback.format_exc()}")
        # Note: the error report goes under results/, whereas the success path
        # above writes to the current working directory.
        os.makedirs("results", exist_ok=True)
        output_file = f"results/{repo_name}_ai_analysis.html"
        with open(output_file, 'w') as f:
            f.write(f"<html><body><h1>Error in AI Analysis</h1><pre>{str(e)}</pre></body></html>")