diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 0000000..c0ef0b7 --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1,20 @@ +# Define maintainers for key parts of the repository + +# Core Workflow Files +/main.nf @aditigopalan +/modules/ @aditigopalan + +# Documentation +/README.md @aditigopalan + +# Tests +/tests/ @aditigopalan + +# Configuration +/nextflow.config @aditigopalan + +# Scripts +/scripts/ @aditigopalan + +# Other +* @aditigopalan \ No newline at end of file diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 0000000..7314f53 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,21 @@ +# Pull Request + +## Description +Please provide a brief description of the changes made in this PR. + +## Changes Made +- [ ] Change 1 +- [ ] Change 2 +- [ ] Change 3 + +## Related Issues +Fixes # + +## Checklist +- [ ] Code follows the project's coding standards. +- [ ] Tests have been added or updated to cover the changes. +- [ ] Documentation has been updated to reflect the changes. +- [ ] All tests pass locally. + +## Additional Notes +Any additional information or context that might be helpful for reviewers. \ No newline at end of file diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..f0c81c9 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,53 @@ +# Contributing to Cancer Complexity Toolkit Workflow + +We love your input! We want to make contributing to the Cancer Complexity Toolkit Workflow as easy and transparent as possible, whether it's: + +- Reporting a bug +- Discussing the current state of the code +- Submitting a fix +- Proposing new features + +## We Develop with GitHub +We use GitHub to host code, to track issues and feature requests, as well as accept pull requests. + +## We Use [Nextflow](https://www.nextflow.io/) +We use Nextflow for workflow management. Make sure you have Nextflow installed and are familiar with its syntax before contributing. + +## Development Process +We use the `main` branch as the primary development branch. All changes should be made through pull requests. + +1. Fork the repo and create your branch from `main`. +2. If you've added code that should be tested, add tests. +3. If you've changed APIs, update the documentation. +4. Ensure the test suite passes. +5. Make sure your code lints. +6. Issue that pull request! + +## Any contributions you make will be under the MIT Software License +In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project. Feel free to contact the maintainers if that's a concern. + +## Report bugs using GitHub's [issue tracker](https://github.com/yourusername/cckp-toolkit-workflow/issues) +We use GitHub issues to track public bugs. Report a bug by [opening a new issue](https://github.com/yourusername/cckp-toolkit-workflow/issues/new); it's that easy! + +## Write bug reports with detail, background, and sample code + +**Great Bug Reports** tend to have: + +- A quick summary and/or background +- Steps to reproduce + - Be specific! + - Give sample code if you can. 
+- What you expected would happen +- What actually happens +- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work) + +## Use a Consistent Coding Style + +* Use 2 spaces for indentation rather than tabs +* Keep line length under 100 characters +* Follow the Nextflow style guide for workflow files +* Use meaningful variable names +* Add comments for complex logic + +## License +By contributing, you agree that your contributions will be licensed under its MIT License. \ No newline at end of file diff --git a/README.md b/README.md index 9a3b436..4ebe2bb 100644 --- a/README.md +++ b/README.md @@ -1,77 +1,166 @@ -# CCKP Toolkit Workflow +# Cancer Complexity Toolkit Workflow + +![CCT Logo](cct-logo.png) ## Description -This Nextflow workflow (`main.nf`) performs quality and metadata checks on software tools by running a series of checks: +The Cancer Complexity Toolkit Workflow is a scalable infrastructure framework to promote sustainable tool development. It performs multiple levels of analysis: + +1. **Basic Repository Checks** + - Repository cloning and validation + - README file verification + - Dependency file detection + - Test suite presence + +2. **Advanced Analysis** + - [Software Gardening Almanack](https://github.com/software-gardening/almanack) analysis + - JOSS (Journal of Open Source Software) criteria evaluation + - AI-powered repository analysis (optional, requires Synapse agent ID) + - Test execution and coverage -- **CloneRepository**: Clones the repository. -- **CheckReadme**: Verifies the existence of a README file. -- **CheckDependencies**: Looks for dependency files (e.g., `requirements.txt`, `Pipfile`, `setup.py`, etc.). -- **CheckTests**: Checks for the presence of test directories or test files. -- **CheckAlmanack**: Runs the [Software Gardening Almanack](https://github.com/software-gardening/almanack) analysis. +3. **Optional Synapse Integration** + - Results upload to Synapse platform + - Metadata management -The final output is a **consolidated CSV report** where each row represents a tool (i.e., a repository) with the following columns: +## Requirements -```Tool, CloneRepository, CheckReadme, CheckDependencies, CheckTests, Almanack``` +### Core Dependencies +- **Nextflow** (version 24.04.3 or later): Install from [Nextflow's official website]. Install instructions below (https://www.nextflow.io/). +- **Docker** (required for containerized execution): Install from [Docker's official website](https://www.docker.com/get-started). +- **Python 3.8+**: Install from [Python's official website](https://www.python.org/downloads/). +- **Git** -Each column shows the status (`PASS`/`FAIL`) for the respective check. +> [!IMPORTANT] +> Docker is required to run this workflow. The toolkit uses containerized processes to ensure consistent execution environments across different systems. -## Running the Workflow -You can execute the workflow in one of two ways: -- Analyze a single tool by specifying its repository URL. -- Analyze multiple tools using a sample sheet (CSV file) that includes a repo_url header. +### Optional Dependencies +For Synapse integration: +- Synapse Python client +- Synapse authentication token +- Synapse configuration file -### Install Nextflow -Follow the official installation guide [here](https://www.nextflow.io/docs/latest/install.html) or use the command below: +## Installation +1. **Install Nextflow** ```bash curl -s https://get.nextflow.io | bash ``` -### Run with a Single Repository URL +2. 
**Install Python Dependencies** ```bash -nextflow run main.nf --repo_url https://github.com/example/repo.git +pip install -r requirements.txt ``` -### Run with a Sample Sheet -Prepare a CSV file (e.g., example-input.csv) with a header repo_url and one URL per row, then run: +3. **Configure Synapse** (Optional) +```bash +# Create Synapse config file +mkdir -p ~/.synapse +touch ~/.synapseConfig +``` + +> [!NOTE] +> To use Synapse features, you'll need to: +> 1. Create a personal access token from your [Synapse Account Settings](https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens) +> 2. Add the token to your `~/.synapseConfig` file: +> ``` +> [authentication] +> username = your_username +> apiKey = your_personal_access_token +> ``` +> 3. Set the token as a Nextflow secret: +> ```bash +> nextflow secrets set SYNAPSE_AUTH_TOKEN your_personal_access_token +> ``` + +## Usage + +### Input Format +The workflow accepts input in two formats: + +1. **Single Repository URL** ```bash -nextflow run main.nf --sample_sheet +nextflow run main.nf --repo_url https://github.com/example/repo.git ``` -## Output -After the workflow completes, you'll find a consolidated CSV report (consolidated_report.csv) in your output directory (by default, under the results folder). Each row in this report represents a tool and its corresponding check statuses. +2. **Sample Sheet (CSV)** + +Example `input.csv`: +```csv +repo_url,description +https://github.com/PythonOT/POT.git,Python Optimal Transport Library +https://github.com/RabadanLab/TARGet.git,TARGet Analysis Tool +``` -## Optional: Uploading Results to Synapse -To upload results to Synapse, run the workflow with the following parameters: +### Running the Workflow +#### Basic Analysis +```bash +nextflow run main.nf --repo_url https://github.com/example/repo.git +``` + +#### With AI Analysis ```bash nextflow run main.nf \ --repo_url https://github.com/example/repo.git \ - --upload_to_synapse true\ - --synapse_folder_id syn64626421 + --synapse_agent_id LOWYSX3QSQ +``` + +#### With Sample Sheet +```bash +nextflow run main.nf --sample_sheet input.csv ``` -Ensure your Synapse credentials are properly set up (e.g., by mounting your .synapseConfig file). -## Tools You Can Test With +> [!NOTE] +> When using AI Analysis or Synapse integration, ensure you have: +> - Valid Synapse authentication token +> - Proper Synapse configuration +> - Synapse agent ID for AI analysis (e.g., LOWYSX3QSQ) +> - Correct folder ID with write permissions (for upload) + +## Output + +The workflow generates several output files in the `results` directory: + +- `_ai_analysis.json`: AI-powered qualitative summary and recommendations (final report) +- `almanack_results.json`: Detailed metrics from Almanack analysis +- `joss_report_.json`: JOSS criteria evaluation metrics +- `test_results_.json`: Test execution results and coverage metrics + +> [!NOTE] +> The AI analysis report provides a high-level qualitative summary and actionable recommendations. For detailed metrics and specific measurements, refer to the other output files. + +## Development Status + +> [!WARNING] +> The AI Analysis component is currently in beta. Results may vary and the interface is subject to change. + +> [!IMPORTANT] +> Synapse integration requires proper authentication and permissions setup. 
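+
+As a minimal sketch (assuming the `--upload_to_synapse` and `--synapse_folder_id` parameters defined in `main.nf`; the folder ID and token below are placeholders), a run that combines AI analysis with a Synapse upload might look like:
+
+```bash
+# Hypothetical end-to-end run: AI analysis plus upload of results to Synapse.
+# Replace <syn_folder_id> with a Synapse folder you have write access to,
+# and <your_personal_access_token> with your own token.
+nextflow secrets set SYNAPSE_AUTH_TOKEN <your_personal_access_token>
+
+nextflow run main.nf \
+  --repo_url https://github.com/example/repo.git \
+  --synapse_agent_id LOWYSX3QSQ \
+  --upload_to_synapse true \
+  --synapse_folder_id <syn_folder_id>
+```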
+ +## Example Repositories + +| Repository | Description | Expected Status | +|------------|-------------|----------------| +| [PythonOT/POT](https://github.com/PythonOT/POT) | Python Optimal Transport Library | All checks pass | +| [RabadanLab/TARGet](https://github.com/RabadanLab/TARGet) | TARGet Analysis Tool | Fails dependency and test checks | +| [arjunrajlaboratory/memSeqASEanalysis](https://github.com/arjunrajlaboratory/memSeqASEanalysis) | memSeq ASE Analysis | Fails dependency and test checks | + +## Configuration + +### Synapse Configuration -1. **Python Optimal Transport Library** - - Synapse: [POT](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=POT) - - GitHub: [PythonOT/POT](https://github.com/PythonOT/POT) - - Note: Should pass all tests +**Authentication Token** + - Set as Nextflow secret: + ```bash + nextflow secrets set SYNAPSE_AUTH_TOKEN your_personal_access_token + ``` -2. **TARGet** - - Synapse: [TARGet](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=TARGet) - - GitHub: [RabadanLab/TARGet](https://github.com/RabadanLab/TARGet/tree/master) - - Note: Fails CheckDependencies, CheckTests +## Contributing -3. **memSeqASEanalysis** - - Synapse: [memSeqASEanalysis](https://cancercomplexity.synapse.org/Explore/Tools/DetailsPage?toolName=memSeqASEanalysis) - - GitHub: [arjunrajlaboratory/memSeqASEanalysis](https://github.com/arjunrajlaboratory/memSeqASEanalysis) - - Note: Fails CheckDependencies, CheckTests +> [!NOTE] +> We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details. -**Subset of tools to test**: Any from [this list](https://cancercomplexity.synapse.org/Explore/Tools) with a GitHub repository. +## License -## Notes -- Ensure Nextflow and Docker are installed \ No newline at end of file +This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. \ No newline at end of file diff --git a/bin/analyze.py b/bin/analyze.py new file mode 100755 index 0000000..885f4d5 --- /dev/null +++ b/bin/analyze.py @@ -0,0 +1,74 @@ +#!/usr/bin/env python3 + +import json +import os +import sys +from synapseclient import Synapse +from synapseclient.models import Agent +from typing import Dict, Any + +def call_synapse_agent(agent_id: str, prompt: str) -> str: + """ + Call the Synapse agent with the given prompt and return its response. 
+ + Args: + agent_id (str): The ID of the Synapse agent to use + prompt (str): The prompt to send to the agent + + Returns: + str: The agent's response + + Raises: + Exception: If there's an error during agent communication + """ + syn = Synapse() + syn.login(authToken=os.environ['SYNAPSE_AUTH_TOKEN']) + agent = Agent(cloud_agent_id=agent_id) + agent.register(synapse_client=syn) + session = agent.start_session(synapse_client=syn) + response = session.prompt( + prompt=prompt, + enable_trace=True, + print_response=False, + synapse_client=syn + ) + return response.response + +if __name__ == "__main__": + repo_name = sys.argv[1] + repo_url = sys.argv[2] + almanack_results_file = sys.argv[3] + joss_report_file = sys.argv[4] + agent_id = sys.argv[5] + + try: + # Read input files + with open(almanack_results_file, 'r') as f: + almanack_results = json.load(f) + with open(joss_report_file, 'r') as f: + joss_report = json.load(f) + + # Prepare input for agent + agent_input = { + "repository_url": repo_url, + "almanack_results": almanack_results, + "joss_report": joss_report + } + + # Call Synapse agent and treat response as HTML + response_html = call_synapse_agent(agent_id, json.dumps(agent_input)) + + # Write the HTML response directly to file + os.makedirs("results", exist_ok=True) + output_file = f"{repo_name}_ai_analysis.html" + with open(output_file, 'w') as f: + f.write(response_html) + except Exception as e: + print(f"[ERROR] Analysis failed: {str(e)}") + print(f"[ERROR] Exception type: {type(e)}") + import traceback + print(f"[ERROR] Traceback:\n{traceback.format_exc()}") + os.makedirs("results", exist_ok=True) + output_file = f"results/{sys.argv[1]}_ai_analysis.html" + with open(output_file, 'w') as f: + f.write(f"
<h1>Error in AI Analysis</h1>
<p>{str(e)}</p>
") \ No newline at end of file diff --git a/bin/analyze_joss.py b/bin/analyze_joss.py new file mode 100755 index 0000000..71c8a11 --- /dev/null +++ b/bin/analyze_joss.py @@ -0,0 +1,546 @@ +#!/usr/bin/env python3 + +import json +import sys +import os +import csv +from typing import Dict, Any, List, Union, Optional +from enum import Enum, auto + +class Status(Enum): + """Enum for status values used in criteria evaluation.""" + NEEDS_IMPROVEMENT = "needs improvement" + OK = "ok" + GOOD = "good" + UNKNOWN = "UNKNOWN" + +class Details(Enum): + """Enum for detail messages used in criteria evaluation.""" + NOT_ANALYZED = "Not analyzed" + MISSING_README = "Missing README with statement of need" + MISSING_INSTALL = "Missing installation instructions" + MISSING_USAGE = "Missing example usage" + MISSING_GUIDELINES = "Missing community guidelines" + FOUND_COMPREHENSIVE_NEED = "Found comprehensive statement of need in README" + FOUND_NEED_IMPROVEMENT = "Found README but statement of need needs improvement" + FOUND_COMPREHENSIVE_INSTALL = "Found comprehensive installation instructions" + FOUND_INSTALL_IMPROVEMENT = "Found documentation but installation instructions need improvement" + FOUND_COMPREHENSIVE_USAGE = "Found comprehensive example usage" + FOUND_USAGE_IMPROVEMENT = "Found documentation but example usage needs improvement" + FOUND_BOTH_GUIDELINES = "Found both contributing guidelines and code of conduct" + FOUND_PARTIAL_GUIDELINES = "Found partial community guidelines" + +class Criteria(Enum): + """Enum for JOSS criteria names.""" + STATEMENT_OF_NEED = "Statement of Need" + INSTALLATION_INSTRUCTIONS = "Installation Instructions" + EXAMPLE_USAGE = "Example Usage" + COMMUNITY_GUIDELINES = "Community Guidelines" + TESTS = "Tests" + +# Constants for scoring +SCORE_GOOD = 1.0 +SCORE_OK = 0.7 +SCORE_NEEDS_IMPROVEMENT = 0.3 +SCORE_NONE = 0.0 + +# Constants for test thresholds +TEST_PASS_RATE_GOOD = 0.9 +TEST_PASS_RATE_OK = 0.7 + +def get_metric_value(metrics: Union[List[Dict[str, Any]], Dict[str, Any]], metric_name: str) -> Union[None, str, int, float, bool]: + """ + Extract a metric value from either JSON or CSV formatted metrics data. + + Args: + metrics: Either a list of metric dictionaries (JSON format) or a dictionary of metrics (CSV format) + metric_name: Name of the metric to extract + + Returns: + The value of the metric if found, None otherwise + + Examples: + >>> metrics_json = [{"name": "test", "result": "pass"}] + >>> get_metric_value(metrics_json, "test") + 'pass' + >>> metrics_csv = {"test": "pass"} + >>> get_metric_value(metrics_csv, "test") + 'pass' + """ + if isinstance(metrics, list): + # JSON format + for metric in metrics: + if metric.get("name") == metric_name: + return metric.get("result") + elif isinstance(metrics, dict): + # CSV format converted to dict + return metrics.get(metric_name) + return None + +def read_status_file(status_file: str) -> Dict[str, str]: + """ + Read and parse the status file containing repository processing status information (CloneRepo, HasRepo, HasDependencies, HasTests). 
+ + Returns: + Dict[str, str]: Dictionary containing status information with keys: + - clone_status: Status of repository cloning + - dep_status: Status of dependency installation + - tests_status: Status of test execution + If the file cannot be read or is malformed, all statuses default to 'UNKNOWN' + """ + try: + with open(status_file, 'r') as f: + reader = csv.reader(f) + row = next(reader) # Read the first row + return { + 'clone_status': row[1] if len(row) > 1 else Status.UNKNOWN.value, + 'dep_status': row[2] if len(row) > 2 else Status.UNKNOWN.value, + 'tests_status': row[3] if len(row) > 3 else Status.UNKNOWN.value + } + except (FileNotFoundError, IndexError): + return { + 'clone_status': Status.UNKNOWN.value, + 'dep_status': Status.UNKNOWN.value, + 'tests_status': Status.UNKNOWN.value + } + +def analyze_readme_content(repo_dir: str) -> Dict[str, bool]: + """ + Analyze README content for key components required for JOSS submission. + + Args: + repo_dir (str): Path to the repository directory containing the README.md file. + + Returns: + Dict[str, bool]: Dictionary containing boolean flags for key README components: + - statement_of_need: True if README contains problem statement, target audience, and related work + - installation: True if README contains installation instructions + - example_usage: True if README contains example usage or quick start guide + Returns all False if README.md is not found + """ + readme_path = os.path.join(repo_dir, "README.md") + if not os.path.exists(readme_path): + return { + "statement_of_need": False, + "installation": False, + "example_usage": False + } + + with open(readme_path, 'r', encoding='utf-8') as f: + content = f.read().lower() + + # Check for statement of need components + has_problem_statement = any(phrase in content for phrase in [ + "problem", "solve", "purpose", "aim", "goal", "objective" + ]) + has_target_audience = any(phrase in content for phrase in [ + "audience", "users", "intended for", "designed for" + ]) + has_related_work = any(phrase in content for phrase in [ + "related", "similar", "compared to", "alternative" + ]) + + # Check for installation instructions + has_installation = any(phrase in content for phrase in [ + "install", "setup", "dependencies", "requirements", "pip install" + ]) + + # Check for example usage + has_examples = any(phrase in content for phrase in [ + "example", "usage", "how to use", "quick start", "getting started" + ]) + + return { + "statement_of_need": all([has_problem_statement, has_target_audience, has_related_work]), + "installation": has_installation, + "example_usage": has_examples + } + +def analyze_dependencies(repo_dir: str) -> Dict[str, Any]: + """ + Analyze dependency files for quality and completeness. 
+ """ + dependency_files = { + 'python': [ + 'requirements.txt', + 'setup.py', + 'Pipfile', + 'pyproject.toml' + ], + 'node': [ + 'package.json', + 'package-lock.json', + 'yarn.lock' + ], + 'java': [ + 'pom.xml', + 'build.gradle', + 'settings.gradle' + ], + 'r': [ + 'DESCRIPTION', + 'renv.lock', + 'packrat/packrat.lock' + ], + 'rust': [ + 'Cargo.toml', + 'Cargo.lock' + ], + 'ruby': [ + 'Gemfile', + 'Gemfile.lock' + ], + 'go': [ + 'go.mod', + 'go.sum' + ] + } + + def check_python_requirements(file_path: str) -> Dict[str, Any]: + try: + with open(file_path, 'r') as f: + lines = f.readlines() + + deps = [] + issues = [] + + for line in lines: + line = line.strip() + if not line or line.startswith('#'): + continue + + # Check for basic formatting + if '==' in line: + deps.append(line) + elif '>=' in line or '<=' in line: + deps.append(line) + issues.append(f"Loose version constraint: {line}") + else: + issues.append(f"No version constraint: {line}") + + return { + "has_dependencies": len(deps) > 0, + "total_dependencies": len(deps), + "issues": issues, + "status": "good" if len(issues) == 0 else "ok" if len(issues) < len(deps) else "needs improvement" + } + except Exception as e: + return { + "has_dependencies": False, + "total_dependencies": 0, + "issues": [f"Error reading file: {str(e)}"], + "status": Status.NEEDS_IMPROVEMENT.value + } + + def check_package_json(file_path: str) -> Dict[str, Any]: + try: + with open(file_path, 'r') as f: + data = json.load(f) + + deps = [] + issues = [] + + # Check dependencies + for dep_type in ['dependencies', 'devDependencies']: + if dep_type not in data: + continue + for dep, version in data[dep_type].items(): + deps.append(f"{dep}:{version}") + if version.startswith('^') or version.startswith('~'): + issues.append(f"Loose version constraint: {dep} {version}") + elif version == '*': + issues.append(f"No version constraint: {dep}") + + return { + "has_dependencies": len(deps) > 0, + "total_dependencies": len(deps), + "issues": issues, + "status": "good" if len(issues) == 0 else "ok" if len(issues) < len(deps) else "needs improvement" + } + except Exception as e: + return { + "has_dependencies": False, + "total_dependencies": 0, + "issues": [f"Error reading file: {str(e)}"], + "status": Status.NEEDS_IMPROVEMENT.value + } + + results = { + "found_files": [], + "analysis": {}, + "overall_status": Status.NEEDS_IMPROVEMENT.value + } + + # Check for dependency files + for _, files in dependency_files.items(): + for file in files: + file_path = os.path.join(repo_dir, file) + if os.path.exists(file_path): + results["found_files"].append(file) + + # Analyze based on file type + if file.endswith('.txt'): + results["analysis"][file] = check_python_requirements(file_path) + elif file == 'package.json': + results["analysis"][file] = check_package_json(file_path) + # Add more file type checks as needed + + # Determine overall status + if not results["found_files"]: + results["overall_status"] = Status.NEEDS_IMPROVEMENT.value + else: + statuses = [analysis["status"] for analysis in results["analysis"].values()] + if "good" in statuses: + results["overall_status"] = Status.GOOD.value + elif "ok" in statuses: + results["overall_status"] = Status.OK.value + else: + results["overall_status"] = Status.NEEDS_IMPROVEMENT.value + + return results + +def analyze_test_results(test_results: Dict[str, Any]) -> Dict[str, Any]: + """ + Analyze test execution results and return criteria evaluation. 
+ + Args: + test_results (Dict[str, Any]): Results from test execution + + Returns: + Dict[str, Any]: Dictionary containing test criteria evaluation with status, score, and details + """ + criteria = { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + } + + if test_results: + total_tests = test_results.get('total_tests', 0) + passed_tests = test_results.get('passed', 0) + + if total_tests > 0: + pass_rate = passed_tests / total_tests + if pass_rate >= TEST_PASS_RATE_GOOD: + criteria["status"] = Status.GOOD.value + criteria["score"] = SCORE_GOOD + elif pass_rate >= TEST_PASS_RATE_OK: + criteria["status"] = Status.OK.value + criteria["score"] = SCORE_OK + else: + criteria["status"] = Status.NEEDS_IMPROVEMENT.value + criteria["score"] = SCORE_NEEDS_IMPROVEMENT + else: + criteria["status"] = Status.NEEDS_IMPROVEMENT.value + criteria["score"] = SCORE_NONE + + criteria["details"] = "\n".join([ + f"Framework: {test_results.get('framework', 'Unknown')}", + f"Total Tests: {total_tests}", + f"Passed: {passed_tests}", + f"Failed: {test_results.get('failed', 0)}", + f"Error: {test_results.get('error', '')}" + ]).strip() + + return criteria + +def analyze_almanack_results(almanack_results: List[Dict[str, Any]], repo_dir: str) -> Dict[str, Dict[str, Any]]: + """ + Analyze Almanack results and return criteria evaluations. + + Args: + almanack_results (List[Dict[str, Any]]): Results from Almanack analysis + repo_dir (str): Path to the repository directory + + Returns: + Dict[str, Dict[str, Any]]: Dictionary containing criteria evaluations for: + - Statement of Need + - Installation Instructions + - Example Usage + - Community Guidelines + """ + criteria = { + Criteria.STATEMENT_OF_NEED.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.INSTALLATION_INSTRUCTIONS.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.EXAMPLE_USAGE.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.COMMUNITY_GUIDELINES.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + } + } + + if almanack_results: + # Extract relevant metrics + has_readme = get_metric_value(almanack_results, "repo-includes-readme") + has_contributing = get_metric_value(almanack_results, "repo-includes-contributing") + has_code_of_conduct = get_metric_value(almanack_results, "repo-includes-code-of-conduct") + has_docs = get_metric_value(almanack_results, "repo-includes-common-docs") + + # Check for statement of need + if has_readme: + readme_content = analyze_readme_content(repo_dir) + if readme_content["statement_of_need"]: + criteria[Criteria.STATEMENT_OF_NEED.value]["status"] = Status.GOOD.value + criteria[Criteria.STATEMENT_OF_NEED.value]["score"] = SCORE_GOOD + criteria[Criteria.STATEMENT_OF_NEED.value]["details"] = Details.FOUND_COMPREHENSIVE_NEED.value + else: + criteria[Criteria.STATEMENT_OF_NEED.value]["status"] = Status.OK.value + criteria[Criteria.STATEMENT_OF_NEED.value]["score"] = SCORE_OK + criteria[Criteria.STATEMENT_OF_NEED.value]["details"] = Details.FOUND_NEED_IMPROVEMENT.value + else: + criteria[Criteria.STATEMENT_OF_NEED.value]["status"] = Status.NEEDS_IMPROVEMENT.value + criteria[Criteria.STATEMENT_OF_NEED.value]["score"] = SCORE_NEEDS_IMPROVEMENT + 
criteria[Criteria.STATEMENT_OF_NEED.value]["details"] = Details.MISSING_README.value + + # Check for installation instructions + if has_readme and has_docs: + readme_content = analyze_readme_content(repo_dir) + if readme_content["installation"]: + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["status"] = Status.GOOD.value + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["score"] = SCORE_GOOD + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["details"] = Details.FOUND_COMPREHENSIVE_INSTALL.value + else: + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["status"] = Status.OK.value + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["score"] = SCORE_OK + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["details"] = Details.FOUND_INSTALL_IMPROVEMENT.value + else: + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["status"] = Status.NEEDS_IMPROVEMENT.value + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["score"] = SCORE_NEEDS_IMPROVEMENT + criteria[Criteria.INSTALLATION_INSTRUCTIONS.value]["details"] = Details.MISSING_INSTALL.value + + # Check for example usage + if has_readme and has_docs: + readme_content = analyze_readme_content(repo_dir) + if readme_content["example_usage"]: + criteria[Criteria.EXAMPLE_USAGE.value]["status"] = Status.GOOD.value + criteria[Criteria.EXAMPLE_USAGE.value]["score"] = SCORE_GOOD + criteria[Criteria.EXAMPLE_USAGE.value]["details"] = Details.FOUND_COMPREHENSIVE_USAGE.value + else: + criteria[Criteria.EXAMPLE_USAGE.value]["status"] = Status.OK.value + criteria[Criteria.EXAMPLE_USAGE.value]["score"] = SCORE_OK + criteria[Criteria.EXAMPLE_USAGE.value]["details"] = Details.FOUND_USAGE_IMPROVEMENT.value + else: + criteria[Criteria.EXAMPLE_USAGE.value]["status"] = Status.NEEDS_IMPROVEMENT.value + criteria[Criteria.EXAMPLE_USAGE.value]["score"] = SCORE_NEEDS_IMPROVEMENT + criteria[Criteria.EXAMPLE_USAGE.value]["details"] = Details.MISSING_USAGE.value + + # Check for community guidelines + if has_contributing and has_code_of_conduct: + criteria[Criteria.COMMUNITY_GUIDELINES.value]["status"] = Status.GOOD.value + criteria[Criteria.COMMUNITY_GUIDELINES.value]["score"] = SCORE_GOOD + criteria[Criteria.COMMUNITY_GUIDELINES.value]["details"] = Details.FOUND_BOTH_GUIDELINES.value + elif has_contributing or has_code_of_conduct: + criteria[Criteria.COMMUNITY_GUIDELINES.value]["status"] = Status.OK.value + criteria[Criteria.COMMUNITY_GUIDELINES.value]["score"] = SCORE_OK + criteria[Criteria.COMMUNITY_GUIDELINES.value]["details"] = Details.FOUND_PARTIAL_GUIDELINES.value + else: + criteria[Criteria.COMMUNITY_GUIDELINES.value]["status"] = Status.NEEDS_IMPROVEMENT.value + criteria[Criteria.COMMUNITY_GUIDELINES.value]["score"] = SCORE_NEEDS_IMPROVEMENT + criteria[Criteria.COMMUNITY_GUIDELINES.value]["details"] = Details.MISSING_GUIDELINES.value + + return criteria + +def analyze_joss_criteria(almanack_results: List[Dict[str, Any]], test_results: Dict[str, Any], repo_dir: str) -> Dict[str, Any]: + """ + Analyze repository against JOSS criteria based on Almanack and test results. 
+ + Args: + almanack_results (List[Dict[str, Any]]): Results from Almanack analysis + test_results (Dict[str, Any]): Results from test execution + repo_dir (str): Path to the repository directory + + Returns: + Dict[str, Any]: Dictionary containing JOSS criteria evaluation with overall scores + """ + # Initialize criteria dictionary + criteria = { + Criteria.STATEMENT_OF_NEED.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.INSTALLATION_INSTRUCTIONS.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.EXAMPLE_USAGE.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.COMMUNITY_GUIDELINES.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + }, + Criteria.TESTS.value: { + "status": Status.NEEDS_IMPROVEMENT.value, + "score": SCORE_NONE, + "details": Details.NOT_ANALYZED.value + } + } + + # Analyze test results + test_criteria = analyze_test_results(test_results) + criteria[Criteria.TESTS.value] = test_criteria + + # Analyze Almanack results + almanack_criteria = analyze_almanack_results(almanack_results, repo_dir) + criteria.update(almanack_criteria) + + # Calculate overall score + total_score = sum(criterion["score"] for criterion in criteria.values()) + max_score = len(criteria) + overall_score = total_score / max_score if max_score > 0 else 0 + + return { + "criteria": criteria, + "overall_score": overall_score, + "total_score": total_score, + "max_score": max_score + } + +if __name__ == "__main__": + print(f"[DEBUG] sys.argv: {sys.argv}") + if len(sys.argv) != 5: + print("Usage: python analyze_joss.py ") + sys.exit(1) + + repo_name = sys.argv[1] + almanack_results_file = sys.argv[2] + test_results_file = sys.argv[3] + repo_dir = sys.argv[4] + + try: + # Read input files + with open(almanack_results_file, 'r') as f: + almanack_results = json.load(f) + with open(test_results_file, 'r') as f: + test_results = json.load(f) + + # Analyze JOSS criteria + joss_analysis = analyze_joss_criteria(almanack_results, test_results, repo_dir) + + # Write the analysis to a JSON file + output_file = f"joss_report_{repo_name}.json" + with open(output_file, 'w') as f: + json.dump(joss_analysis, f, indent=2) + print(f"[DEBUG] JOSS analysis written to {output_file}") + + except Exception as e: + print(f"[ERROR] JOSS analysis failed: {str(e)}") + sys.exit(1) \ No newline at end of file diff --git a/bin/run_tests.py b/bin/run_tests.py new file mode 100755 index 0000000..9769f64 --- /dev/null +++ b/bin/run_tests.py @@ -0,0 +1,302 @@ +#!/usr/bin/env python3 + +import json +import os +import subprocess +import sys +import re +from typing import Dict, Any + +def install_dependencies(repo_dir: str) -> bool: + """ + Install project dependencies before running tests. 
+ + Args: + repo_dir (str): Path to the repository directory + + Returns: + bool: True if dependencies were installed successfully, False otherwise + + Note: + Attempts to install dependencies from requirements.txt and setup.py if they exist + """ + try: + # Try to install requirements.txt if it exists + req_file = os.path.join(repo_dir, 'requirements.txt') + if os.path.exists(req_file): + subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', req_file], + cwd=repo_dir, check=True, capture_output=True) + + # Try to install setup.py if it exists + setup_file = os.path.join(repo_dir, 'setup.py') + if os.path.exists(setup_file): + subprocess.run([sys.executable, '-m', 'pip', 'install', '-e', '.'], + cwd=repo_dir, check=True, capture_output=True) + + return True + except subprocess.CalledProcessError as e: + print(f"Error installing dependencies: {e.stderr.decode()}", file=sys.stderr) + return False + +def detect_project_type(repo_dir: str) -> str: + """ + Detect project type based on characteristic files. + + Args: + repo_dir (str): Path to the repository directory + + Returns: + str: Project type identifier ('python', 'node', 'java-maven', 'java-gradle', 'r', 'rust', 'go', or 'unknown') + + Note: + Checks for characteristic files like requirements.txt, package.json, pom.xml, etc. + """ + project_files = { + 'python': ['requirements.txt', 'setup.py', 'pyproject.toml'], + 'node': ['package.json'], + 'java-maven': ['pom.xml'], + 'java-gradle': ['build.gradle'], + 'r': ['DESCRIPTION'], + 'rust': ['Cargo.toml'], + 'go': ['go.mod'] + } + + def file_exists(filename: str) -> bool: + return os.path.exists(os.path.join(repo_dir, filename)) + + for project_type, files in project_files.items(): + if any(file_exists(f) for f in files): + return project_type + + return 'unknown' + +def run_python_tests(repo_dir: str) -> Dict[str, Any]: + """ + Run Python tests using pytest or unittest. 
+ + Args: + repo_dir (str): Path to the repository directory + + Returns: + Dict[str, Any]: Dictionary containing test results with keys: + - framework: Test framework used ('pytest' or 'unittest') + - status: Overall test status ('PASS' or 'FAIL') + - total_tests: Total number of tests run + - passed: Number of passed tests + - failed: Number of failed tests + - output: Test output + - error: Error message if any + """ + results = { + "framework": "unknown", + "status": "FAIL", + "total_tests": 0, + "passed": 0, + "failed": 0, + "skipped": 0, + "xfailed": 0, + "xpassed": 0, + "output": "", + "error": "" + } + + try: + # Install dependencies first + if not install_dependencies(repo_dir): + results["error"] = "Failed to install dependencies" + return results + + # Try pytest first + if os.path.exists(os.path.join(repo_dir, 'pytest.ini')) or \ + os.path.exists(os.path.join(repo_dir, 'conftest.py')) or \ + os.path.exists(os.path.join(repo_dir, 'tests')): + results["framework"] = "pytest" + cmd = [sys.executable, "-m", "pytest", "-v"] + else: + # Fall back to unittest + results["framework"] = "unittest" + cmd = [sys.executable, "-m", "unittest", "discover", "-v"] + + process = subprocess.run( + cmd, + cwd=repo_dir, + capture_output=True, + text=True + ) + + results["output"] = process.stdout + results["error"] = process.stderr + + # Parse test results for pytest + collected_re = re.compile(r'collected (\d+) items') + + # Define test result patterns and their corresponding counters + test_patterns = { + ('PASSED', 'XPASS'): 'passed', # PASSED but not XPASS + ('FAILED', 'XFAIL'): 'failed', # FAILED but not XFAIL + ('SKIPPED',): 'skipped', + ('XFAIL',): 'xfailed', + ('XPASS',): 'xpassed' + } + + for line in process.stdout.split('\n'): + # Get total tests from 'collected N items' + m = collected_re.search(line) + if m: + results["total_tests"] = int(m.group(1)) + + # Count test result lines using pattern mapping + for patterns, counter in test_patterns.items(): + if len(patterns) == 1: + if patterns[0] in line: + results[counter] += 1 + else: + # Handle cases where we need to check for inclusion and exclusion + include, exclude = patterns + if include in line and exclude not in line: + results[counter] += 1 + + # If total_tests is still 0, try to infer from sum of all counted + counted = results["passed"] + results["failed"] + results["skipped"] + results["xfailed"] + results["xpassed"] + if results["total_tests"] == 0 and counted > 0: + results["total_tests"] = counted + + # Update status based on results + if results["failed"] > 0: + results["status"] = "FAIL" + elif results["total_tests"] > 0: + results["status"] = "PASS" + + # If we still have no results, try to infer from return code + if results["total_tests"] == 0: + results["status"] = "PASS" if process.returncode == 0 else "FAIL" + + except Exception as e: + results["error"] = str(e) + + # Remove extra fields for compatibility + results.pop("skipped", None) + results.pop("xfailed", None) + results.pop("xpassed", None) + return results + +def run_node_tests(repo_dir: str) -> Dict[str, Any]: + """ + Run Node.js tests using npm or yarn. 
+ + Args: + repo_dir (str): Path to the repository directory + + Returns: + Dict[str, Any]: Dictionary containing test results with keys: + - framework: Test framework used ('npm' or 'yarn') + - status: Overall test status ('PASS' or 'FAIL') + - total_tests: Total number of tests run + - passed: Number of passed tests + - failed: Number of failed tests + - output: Test output + - error: Error message if any + """ + results = { + "framework": "unknown", + "status": "FAIL", + "total_tests": 0, + "passed": 0, + "failed": 0, + "output": "", + "error": "" + } + + try: + # Check for package.json + package_json = os.path.join(repo_dir, 'package.json') + if not os.path.exists(package_json): + results["error"] = "No package.json found" + return results + + # Install dependencies + subprocess.run(["npm", "install"], cwd=repo_dir, check=True, capture_output=True) + + # Try npm test + process = subprocess.run( + ["npm", "test"], + cwd=repo_dir, + capture_output=True, + text=True + ) + + results["output"] = process.stdout + results["error"] = process.stderr + + if process.returncode == 0: + results["status"] = "PASS" + # Parse test results (basic parsing) + for line in process.stdout.split('\n'): + if "passing" in line.lower(): + results["passed"] += 1 + results["total_tests"] += 1 + elif "failing" in line.lower(): + results["failed"] += 1 + results["total_tests"] += 1 + + except Exception as e: + results["error"] = str(e) + + return results + +def execute_tests(repo_dir: str) -> Dict[str, Any]: + """ + Execute tests based on project type. + + Args: + repo_dir (str): Path to the repository directory + + Returns: + Dict[str, Any]: Dictionary containing test results with keys: + - framework: Test framework used + - status: Overall test status ('PASS' or 'FAIL') + - total_tests: Total number of tests run + - passed: Number of passed tests + - failed: Number of failed tests + - output: Test output + - error: Error message if any + + Note: + Automatically detects project type and runs appropriate test framework + """ + project_type = detect_project_type(repo_dir) + + if project_type == 'python': + return run_python_tests(repo_dir) + elif project_type == 'node': + return run_node_tests(repo_dir) + else: + return { + "framework": "unknown", + "status": "FAIL", + "total_tests": 0, + "passed": 0, + "failed": 0, + "output": "", + "error": f"Unsupported project type: {project_type}" + } + +if __name__ == "__main__": + if len(sys.argv) != 3: + print("Usage: run_tests.py ") + sys.exit(1) + + repo_name = sys.argv[1] + repo_dir = sys.argv[2] + + try: + # Execute tests + test_results = execute_tests(repo_dir) + + # Write results to file + with open(f"test_results_{repo_name}.json", 'w') as f: + json.dump(test_results, f, indent=2) + + except Exception as e: + print(f"Error running tests: {str(e)}") + sys.exit(1) \ No newline at end of file diff --git a/cct-logo.png b/cct-logo.png new file mode 100644 index 0000000..d27c693 Binary files /dev/null and b/cct-logo.png differ diff --git a/main.nf b/main.nf index 7b7ec9d..db3d841 100644 --- a/main.nf +++ b/main.nf @@ -7,96 +7,132 @@ nextflow.enable.dsl=2 * This workflow processes GitHub repositories to: * 1. Clone and perform initial checks (ProcessRepo) * 2. Run Almanack analysis (RunAlmanack) - * 3. Generate a consolidated report (GenerateReport) - * 4. Optionally upload results to Synapse (UploadToSynapse) + * 3. Analyze JOSS criteria (AnalyzeJOSSCriteria) + * 4. Analyze with AI agent (AIAnalysis) + * 5. 
Optionally upload results to Synapse (UploadToSynapse) */ - -// Global parameters -params.upload_to_synapse = false // default is false; override at runtime -params.sample_sheet = params.sample_sheet ?: null // CSV file with header "repo_url" -params.repo_url = params.repo_url ?: null // fallback for a single repo URL -params.output_dir = params.output_dir ?: 'results' // base output directory - -// Parameter validation -if (params.upload_to_synapse && !params.synapse_folder_id) { - throw new IllegalArgumentException("ERROR: synapse_folder_id must be provided when --upload_to_synapse is true.") -} - -// Validate repository URL format -def validateRepoUrl = { url -> - if (!url) return false - def validUrlPattern = ~/^https:\/\/github\.com\/[^\/]+\/[^\/]+\.git$/ - return url ==~ validUrlPattern -} - -// Extract repository name from URL -def getRepoName = { url -> - def urlStr = url instanceof List ? url[0] : url - urlStr.tokenize('/')[-1].replace('.git','') -} + +// Global parameters with defaults +params.upload_to_synapse = false +params.sample_sheet = null +params.repo_url = null +params.output_dir = 'results' +params.synapse_agent_id = null // Include required modules -include { ProcessRepo } from './modules/ProcessRepo.nf' -include { RunAlmanack } from './modules/RunAlmanack.nf' -include { GenerateReport } from './modules/GenerateReport.nf' -include { UploadToSynapse } from './modules/UploadToSynapse.nf' +include { ProcessRepo } from './modules/ProcessRepo' +include { RunAlmanack } from './modules/RunAlmanack' +include { AnalyzeJOSSCriteria } from './modules/AnalyzeJOSSCriteria' +include { AIAnalysis } from './modules/AIAnalysis' +include { UploadToSynapse } from './modules/UploadToSynapse' +include { TestExecutor } from './modules/TestExecutor' workflow { - // Build a channel from either a sample sheet or a single repo URL - def repoCh - if (params.sample_sheet) { - // First read and validate the sample sheet - def sampleSheetFile = file(params.sample_sheet) - def firstLine = sampleSheetFile.readLines()[0] - def headers = firstLine.split(',').collect { it.trim() } - if (!headers.contains('repo_url')) { - throw new IllegalArgumentException("Sample sheet must contain a 'repo_url' column") - } - - // Now create the channel and process it - repoCh = Channel.fromPath(params.sample_sheet) - .splitCsv(header:true) - .map { row -> row.repo_url } - .filter { url -> - if (!validateRepoUrl(url)) { - log.warn "Skipping invalid repository URL: ${url}" - return false - } - return true - } - } else if (params.repo_url) { - if (!validateRepoUrl(params.repo_url)) { - throw new IllegalArgumentException("Invalid repository URL format. 
Expected: https://github.com/username/repo.git") + // Load environment variables from .env file if it exists + def loadEnvFile = { envFile -> + if (file(envFile).exists()) { + file(envFile).readLines().each { line -> + if (line && !line.startsWith('#')) { + def parts = line.split('=') + if (parts.size() == 2) { + System.setProperty(parts[0].trim(), parts[1].trim()) + } + } + } } - repoCh = Channel.value(params.repo_url) - } else { - throw new IllegalArgumentException("Provide either a sample_sheet or repo_url parameter") } - - // Map each repository URL to a tuple: (repo_url, repo_name, out_dir) - def repoTuples = repoCh.map { repo_url -> - def repo_name = repo_url.tokenize('/')[-1].replace('.git','') - def out_dir = "${params.output_dir}/${repo_name}" - tuple(repo_url, repo_name, out_dir) + + // Load .env file + loadEnvFile('.env') + + // Parameter validation + if (!params.repo_url && !params.sample_sheet) { + throw new IllegalArgumentException("ERROR: Provide either a sample_sheet or repo_url parameter") + } + + if (params.upload_to_synapse && !params.synapse_folder_id) { + throw new IllegalArgumentException("ERROR: synapse_folder_id must be provided when --upload_to_synapse is true.") + } + + if (!params.synapse_agent_id) { + throw new IllegalArgumentException("ERROR: synapse_agent_id must be provided.") + } + + // Validate repository URL format + def validateRepoUrl = { url -> + if (!url) return false + def validUrlPattern = ~/^https:\/\/github\.com\/[^\/]+\/[^\/]+\.git$/ + return url ==~ validUrlPattern + } + + // Extract repository name from URL + def getRepoName = { url -> + def urlStr = url instanceof List ? url[0] : url + return urlStr.tokenize('/')[-1].replace('.git','') } - - // Process each repository with ProcessRepo (clones repo and performs initial checks) - def repoOutputs = repoTuples | ProcessRepo - - // Run the Almanack analysis on each repository - def almanackOutputs = repoOutputs | RunAlmanack - - // Collect all unique status files into one list - almanackOutputs - .map { repo_url, repo_name, out_dir, status_file -> status_file } - .collect() - .set { allStatusFiles } - - // Generate the consolidated report from all status files - allStatusFiles | GenerateReport + + // Create a channel of repo URLs + Channel.from( + params.sample_sheet ? + file(params.sample_sheet).readLines().drop(1).collect { it.trim() }.findAll { it } : + [params.repo_url] + ).set { repo_urls } + + // Validate and process each repo + repo_urls.map { repo_url -> + if (!validateRepoUrl(repo_url)) { + throw new IllegalArgumentException("ERROR: Invalid repository URL format: '${repo_url}'. 
Expected format: https://github.com/username/repo.git") + } + def repo_name = getRepoName(repo_url) + tuple(repo_url, repo_name, params.output_dir) + }.set { repo_tuples } + + // Process repository + ProcessRepo(repo_tuples) + + // Run Almanack + RunAlmanack(ProcessRepo.out) + + // Execute tests + TestExecutor(ProcessRepo.out) + + // Combine outputs for JOSS analysis + ProcessRepo.out + .combine(RunAlmanack.out, by: [0,1]) + .combine(TestExecutor.out, by: [0,1]) + .map { it -> + tuple( + it[0], // repo_url + it[1], // repo_name + it[2], // repo_dir from ProcessRepo + it[3], // out_dir + it[4], // status_file + it[8], // almanack_results + it[9] // test_results + ) + } + .set { joss_input } + + // Analyze JOSS criteria + AnalyzeJOSSCriteria(joss_input) + + // Analyze with AI agent + RunAlmanack.out + .combine(AnalyzeJOSSCriteria.out, by: [0,1]) + .map { repo_url, repo_name, _almanack_meta, _almanack_dir, _almanack_status, almanack_results, joss_report -> + tuple( + repo_url, // repo_url + repo_name, // repo_name + almanack_results, // almanack_results.json from RunAlmanack + joss_report // joss_report_.json from AnalyzeJOSSCriteria + ) + } + .set { ai_input } + + AIAnalysis(ai_input) // Optionally upload results to Synapse if enabled if (params.upload_to_synapse) { - almanackOutputs | UploadToSynapse + UploadToSynapse(RunAlmanack.out) } } \ No newline at end of file diff --git a/main.nf.test b/main.nf.test index db3ff49..5420295 100644 --- a/main.nf.test +++ b/main.nf.test @@ -9,6 +9,7 @@ nextflow_pipeline { params { repo_url = "https://github.com/PythonOT/POT.git" output_dir = "test_results" + synapse_agent_id = "LOWYSX3QSQ" } } @@ -16,7 +17,9 @@ nextflow_pipeline { assert workflow.success assert workflow.trace.tasks().size() > 0 assert workflow.trace.succeeded().size() > 0 - assert workflow.trace.failed().size() == 0 + assert workflow.trace.tasks().collect { it.name }.any { it.startsWith("ProcessRepo") } + assert workflow.trace.tasks().collect { it.name }.any { it.startsWith("RunAlmanack") } + assert workflow.trace.tasks().collect { it.name }.any { it.startsWith("TestExecutor") } } } @@ -25,6 +28,7 @@ nextflow_pipeline { params { sample_sheet = "${projectDir}/tests/fixtures/example-input.csv" output_dir = "test_results" + synapse_agent_id = "LOWYSX3QSQ" } } @@ -32,7 +36,10 @@ nextflow_pipeline { assert workflow.success assert workflow.trace.tasks().size() > 0 assert workflow.trace.succeeded().size() > 0 - assert workflow.trace.failed().size() == 0 + def processCounts = workflow.trace.tasks().collect { it.name.split(" ")[0] }.countBy { it } + assert processCounts["ProcessRepo"] == 2 + assert processCounts["RunAlmanack"] == 2 + assert processCounts["TestExecutor"] == 2 } } @@ -41,12 +48,13 @@ nextflow_pipeline { params { repo_url = "invalid-url" output_dir = "error_test_results" + synapse_agent_id = "LOWYSX3QSQ" } } then { assert !workflow.success - assert workflow.stdout.contains("ERROR ~ Invalid repository URL format. 
Expected: https://github.com/username/repo.git") + assert workflow.stdout.any { it.contains("Invalid repository URL format") } } } @@ -55,12 +63,13 @@ nextflow_pipeline { params { sample_sheet = "${projectDir}/tests/fixtures/invalid-sample-sheet.csv" output_dir = "error_test_results" + synapse_agent_id = "LOWYSX3QSQ" } } then { assert !workflow.success - assert workflow.stdout.contains("ERROR ~ Sample sheet must contain a 'repo_url' column") + assert workflow.stdout.any { it.contains("Invalid repository URL format") } } } @@ -68,12 +77,13 @@ nextflow_pipeline { when { params { output_dir = "error_test_results" + synapse_agent_id = "LOWYSX3QSQ" } } then { assert !workflow.success - assert workflow.stdout.contains("ERROR ~ Provide either a sample_sheet or repo_url parameter") + assert workflow.stdout.any { it.contains("Provide either a sample_sheet or repo_url parameter") } } } } \ No newline at end of file diff --git a/modules/AIAnalysis.nf b/modules/AIAnalysis.nf new file mode 100644 index 0000000..f757e88 --- /dev/null +++ b/modules/AIAnalysis.nf @@ -0,0 +1,29 @@ +#!/usr/bin/env nextflow + +/** + * Process: AIAnalysis + * + * Uses Synapse agent to analyze JOSS and Almanack results. + * The process: + * 1. Takes the final report JSON as input + * 2. Sends it to the Synapse agent for analysis + * 3. Generates a detailed analysis with improvement suggestions in Markdown format + */ + +process AIAnalysis { + container 'ghcr.io/sage-bionetworks/synapsepythonclient:v4.8.0' + errorStrategy 'ignore' + publishDir "${params.output_dir}", mode: 'copy', pattern: '*.html' + secret 'SYNAPSE_AUTH_TOKEN' + + input: + tuple val(repo_url), val(repo_name), path(almanack_results), path(joss_report) + + output: + tuple val(repo_url), val(repo_name), path("${repo_name}_ai_analysis.html"), emit: ai_analysis + + script: + """ + analyze.py "${repo_name}" "${repo_url}" "${almanack_results}" "${joss_report}" "${params.synapse_agent_id}" + """ +} \ No newline at end of file diff --git a/modules/AnalyzeJOSSCriteria.nf b/modules/AnalyzeJOSSCriteria.nf new file mode 100644 index 0000000..ee38b02 --- /dev/null +++ b/modules/AnalyzeJOSSCriteria.nf @@ -0,0 +1,61 @@ +#!/usr/bin/env nextflow +nextflow.enable.dsl = 2 + +/** + * Process: AnalyzeJOSSCriteria + * + * Analyzes repository against JOSS criteria using Almanack and test results. + * The process: + * 1. Takes Almanack results and test results as input + * 2. Analyzes them against JOSS criteria + * 3. 
Generates a JSON report with criteria evaluation + */ + +process AnalyzeJOSSCriteria { + tag "${repo_name}" + label 'joss' + container 'python:3.8-slim' + errorStrategy 'ignore' + publishDir "${params.output_dir}", mode: 'copy', pattern: '*.json' + + input: + tuple val(repo_url), val(repo_name), val(repo_dir), val(out_dir), val(status_file), path(almanack_results), path(test_results) + + output: + tuple val(repo_url), val(repo_name), path("joss_report_${repo_name}.json"), emit: joss_report + + script: + """ + #!/bin/bash + set -euxo pipefail + echo "Analyzing JOSS criteria for: ${repo_name}" >&2 + echo "Repository URL: ${repo_url}" >&2 + echo "Repository directory: ${repo_dir}" >&2 + echo "Almanack results file: ${almanack_results}" >&2 + # Create output directory if it doesn't exist + mkdir -p "${out_dir}" + + # Run JOSS analysis script + analyze_joss.py "${repo_name}" "${almanack_results}" "${test_results}" "${repo_dir}" + """ +} + +workflow { + // Define channels for input + repo_data_ch = Channel.fromPath(params.repo_data) + .map { it -> + def data = it.text.split(',') + tuple( + data[0], // repo_url + data[1], // repo_name + file(data[2]), // repo_dir + data[3], // out_dir + file(data[4]), // status_file + file(data[5]), // almanack_results + file(data[6]) // test_results + ) + } + + // Run the analysis process + AnalyzeJOSSCriteria(repo_data_ch) +} \ No newline at end of file diff --git a/modules/GenerateReport.nf b/modules/GenerateReport.nf deleted file mode 100644 index 5606383..0000000 --- a/modules/GenerateReport.nf +++ /dev/null @@ -1,42 +0,0 @@ -#!/usr/bin/env nextflow - -/** - * Process: GenerateReport - * - * Aggregates all status files into a single consolidated CSV report. - * The report includes the following columns: - * - Tool: Repository name - * - CloneRepository: Status of repository cloning - * - CheckReadme: Status of README check - * - CheckDependencies: Status of dependencies check - * - CheckTests: Status of tests check - * - Almanack: Status of Almanack analysis - */ - -process GenerateReport { - publishDir path: "${params.output_dir}", mode: 'copy' - - input: - path status_files - - output: - path "consolidated_report.csv" - - script: - """ - #!/bin/bash - set -euo pipefail - - # Write header with column names - echo "Tool,CloneRepository,CheckReadme,CheckDependencies,CheckTests,Almanack" > consolidated_report.csv - - # Append each status row from all files - for f in ${status_files}; do - if [ -f "\$f" ]; then - cat "\$f" >> consolidated_report.csv - else - echo "Warning: File \$f not found" >&2 - fi - done - """ -} \ No newline at end of file diff --git a/modules/ProcessRepo.nf b/modules/ProcessRepo.nf index cc61084..0b4907d 100644 --- a/modules/ProcessRepo.nf +++ b/modules/ProcessRepo.nf @@ -46,16 +46,29 @@ process ProcessRepo { ############################### # Check Dependencies Step ############################### - if find repo -maxdepth 1 -type f -iname '*requirements*' | grep -q .; then + # Python dependencies + if find repo -maxdepth 1 -type f -iname '*requirements*' | grep -q . 
|| \ + [ -f repo/setup.py ] || [ -f repo/Pipfile ] || [ -f repo/pyproject.toml ]; then DEP_STATUS="PASS" - elif [ -f repo/Pipfile ] || [ -f repo/Pipfile.lock ] || \ - [ -f repo/setup.py ] || [ -f repo/pyproject.toml ] || \ - [ -f repo/package.json ] || [ -f repo/package-lock.json ] || \ - [ -f repo/yarn.lock ] || [ -f repo/pom.xml ] || \ - [ -f repo/build.gradle ] || [ -f repo/settings.gradle ] || \ - [ -f repo/DESCRIPTION ] || [ -f repo/renv.lock ] || \ + # Node.js dependencies + elif [ -f repo/package.json ] || [ -f repo/package-lock.json ] || [ -f repo/yarn.lock ]; then + DEP_STATUS="PASS" + # Java dependencies + elif [ -f repo/pom.xml ] || [ -f repo/build.gradle ] || [ -f repo/settings.gradle ]; then + DEP_STATUS="PASS" + # R dependencies + elif [ -f repo/DESCRIPTION ] || [ -f repo/renv.lock ] || \ ( [ -d repo/packrat ] && [ -f repo/packrat/packrat.lock ] ); then DEP_STATUS="PASS" + # Rust dependencies + elif [ -f repo/Cargo.toml ] || [ -f repo/Cargo.lock ]; then + DEP_STATUS="PASS" + # Ruby dependencies + elif [ -f repo/Gemfile ] || [ -f repo/Gemfile.lock ]; then + DEP_STATUS="PASS" + # Go dependencies + elif [ -f repo/go.mod ] || [ -f repo/go.sum ]; then + DEP_STATUS="PASS" fi ############################### diff --git a/modules/RunAlmanack.nf b/modules/RunAlmanack.nf index ef1c05f..2dd6d18 100644 --- a/modules/RunAlmanack.nf +++ b/modules/RunAlmanack.nf @@ -16,27 +16,27 @@ * - repo_name: Repository name * - repo_dir: Path to cloned repository * - out_dir: Output directory - * - status_repo.txt: Previous status file + * - status_file: Path to previous status file * * Output: Tuple containing: * - repo_url: GitHub repository URL * - repo_name: Repository name + * - repo_dir: Path to cloned repository * - out_dir: Output directory * - status_almanack_.txt: Updated status file with Almanack results + * - almanack_results.json: JSON file with Almanack analysis results */ process RunAlmanack { container 'python:3.11' errorStrategy 'ignore' + publishDir "${params.output_dir}", mode: 'copy', pattern: '*.{json,txt}' input: - // Expects a 5-element tuple: - // (repo_url, repo_name, path(repo_dir), val(out_dir), path("status_repo.txt")) - tuple val(repo_url), val(repo_name), path(repo_dir), val(out_dir), path("status_repo.txt") + tuple val(repo_url), val(repo_name), path(repo_dir), val(out_dir), path(status_file) output: - // Emits a tuple: (repo_url, repo_name, out_dir, file("status_almanack_${repo_name}.txt")) - tuple val(repo_url), val(repo_name), val(out_dir), file("status_almanack_${repo_name}.txt") + tuple val(repo_url), val(repo_name), path(repo_dir), val(out_dir), path("status_almanack_${repo_name}.txt"), path("almanack_results.json") script: """ @@ -63,13 +63,9 @@ process RunAlmanack { fi echo "Extracted GIT_USERNAME: \${GIT_USERNAME}" >&2 - # Define output file name - OUTPUT_FILE="${out_dir}/\${GIT_USERNAME}_${repo_name}_almanack-results.json" - echo "Output file: \${OUTPUT_FILE}" >&2 - # Run Almanack analysis echo "Running Almanack analysis..." 
>&2 - if python3 -c "import json, almanack; result = almanack.table(repo_path='/tmp/repo'); print(json.dumps(result, indent=2))" > "\${OUTPUT_FILE}"; then + if python3 -c "import json, almanack; result = almanack.table(repo_path='/tmp/repo'); print(json.dumps(result, indent=2))" > almanack_results.json; then ALMANACK_STATUS="PASS" echo "Almanack analysis completed successfully" >&2 else @@ -78,7 +74,7 @@ process RunAlmanack { fi # Append Almanack status to the previous summary - PREV_STATUS=\$(cat status_repo.txt) + PREV_STATUS=\$(cat ${status_file}) echo "\${PREV_STATUS},\${ALMANACK_STATUS}" > "status_almanack_${repo_name}.txt" """ } \ No newline at end of file diff --git a/modules/TestExecutor.nf b/modules/TestExecutor.nf new file mode 100644 index 0000000..2112e80 --- /dev/null +++ b/modules/TestExecutor.nf @@ -0,0 +1,52 @@ +#!/usr/bin/env nextflow +nextflow.enable.dsl = 2 + +/** + * Process: TestExecutor + * + * Executes tests for the repository and generates a detailed report. + * The process: + * 1. Detects the project type and test framework + * 2. Sets up the appropriate environment + * 3. Runs the tests + * 4. Generates a detailed report + * + * Input: Tuple containing: + * - repo_url: GitHub repository URL + * - repo_name: Repository name + * - repo_dir: Repository directory + * - out_dir: Output directory + * - status_file: Status file path + * + * Output: Tuple containing: + * - repo_url: GitHub repository URL + * - repo_name: Repository name + * - test_results: JSON file with test execution results + */ + +process TestExecutor { + container 'python:3.11' // Default container, can be overridden based on project type + errorStrategy 'ignore' + publishDir "${params.output_dir}", mode: 'copy', pattern: '*.json' + + input: + tuple val(repo_url), val(repo_name), path(repo_dir), val(out_dir), path(status_file) + + output: + tuple val(repo_url), val(repo_name), path("test_results_${repo_name}.json") + + script: + """ + #!/bin/bash + set -euo pipefail + + echo "Executing tests for: ${repo_name}" >&2 + echo "Repository URL: ${repo_url}" >&2 + + # Installing test dependencies + python3 -m pip install pytest pytest-cov coverage + + # Run the Python script + run_tests.py "${repo_name}" "${repo_dir}" + """ +} \ No newline at end of file diff --git a/nextflow.config b/nextflow.config index 99b7edc..daa51b6 100644 --- a/nextflow.config +++ b/nextflow.config @@ -2,16 +2,13 @@ params { output_dir = 'results' upload_to_synapse = false synapse_folder_id = null + use_gpt = false } process { withName: ProcessRepo { container = 'bitnami/git:2.44.0' } - - withName: GenerateReport { - container = 'ubuntu:22.04' - } } workDir = 'work' @@ -19,3 +16,8 @@ workDir = 'work' docker { enabled = true } + +executor { + cpus = 4 + memory = '16 GB' +} \ No newline at end of file diff --git a/results/consolidated_report.csv b/results/consolidated_report.csv deleted file mode 100644 index 86ff7ae..0000000 --- a/results/consolidated_report.csv +++ /dev/null @@ -1,3 +0,0 @@ -Tool,CloneRepository,CheckReadme,CheckDependencies,CheckTests,Almanack -TARGet,PASS,FAIL,FAIL,PASS -POT,PASS,PASS,PASS,PASS