Deal with non-essential files #260

@JaeAeich

Title: Proposal: Structured Workflow File Categories in /files Endpoint or support .trsignore

Summary
The current /files endpoint in TRS returns all files associated with a workflow version—including non-essential files like .gitignore, .svg, README assets, and more. This leads to unnecessary bloat when trying to retrieve only the core files required to run the workflow.

It would be helpful to have a standard way to deal with this. Off the top of my head:

  1. A .trsignore mechanism (similar to .gitignore) to exclude irrelevant files.
  2. Structured categorization of files (e.g., core workflow files, test files, and others).
  3. A streamlined way to download categorized ZIP archives directly.

Problem

Calling the /tools/{id}/versions/{version_id}/files endpoint often returns all files from the repository or archive, regardless of their relevance to executing the tool/workflow. For example:

[
  {
    "path": "main.nf",
    "file_type": "PRIMARY_DESCRIPTOR"
  },
  {
    "path": ".gitignore",
    "file_type": "OTHER"
  },
  {
    "path": "assets/logo.svg",
    "file_type": "OTHER"
  },
  {
    "path": "test/test_input.csv",
    "file_type": "TEST_FILE"
  }
]

From a consumer’s perspective (e.g., a WES client or a CLI tool trying to fetch a runnable workflow), these unrelated files introduce confusion and unnecessary data transfer.

Proposed Improvements

1. .trsignore File Support

Allow tool authors to include a .trsignore file in the workflow source (like .gitignore) to explicitly list patterns of files to exclude from the /files endpoint.

Example .trsignore:

.gitignore
assets/
*.svg
docs/

This would give authors control over what gets published as part of the TRS /files endpoint.
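To make the idea concrete, here is a minimal sketch of how a registry could apply such a file to a /files listing. It is only an illustration: `parse_trsignore` and `is_ignored` are hypothetical names, and the matching is a deliberately simplified subset of .gitignore semantics (patterns ending in `/` are treated as directory names, everything else as a glob; no negation or anchoring).

```python
from fnmatch import fnmatchcase


def parse_trsignore(text):
    """Return the non-empty, non-comment patterns from a .trsignore body."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """Simplified matching (assumption): not full .gitignore semantics."""
    parts = path.split("/")
    name = parts[-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore files with that directory as a parent.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(path, pattern) or fnmatchcase(name, pattern):
            # Glob pattern: match the full path or just the file name.
            return True
    return False


trsignore = """\
.gitignore
assets/
*.svg
docs/
"""

files = [
    {"path": "main.nf", "file_type": "PRIMARY_DESCRIPTOR"},
    {"path": ".gitignore", "file_type": "OTHER"},
    {"path": "assets/logo.svg", "file_type": "OTHER"},
    {"path": "test/test_input.csv", "file_type": "TEST_FILE"},
]

patterns = parse_trsignore(trsignore)
published = [f for f in files if not is_ignored(f["path"], patterns)]
```

With the example listing from the Problem section, only `main.nf` and `test/test_input.csv` would remain in `published`.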

2. Categorize Files in the API

Extend the /files endpoint or introduce a new one (e.g., /structured-files) to return files grouped by their usage:

{
  "core": [
    { "path": "main.nf", "file_type": "PRIMARY_DESCRIPTOR" },
    { "path": "modules/align.nf", "file_type": "SECONDARY_DESCRIPTOR" }
  ],
  "tests": [
    { "path": "tests/input.csv", "file_type": "TEST_FILE" }
  ],
  "other": [
    { "path": ".gitignore", "file_type": "OTHER" },
    { "path": "assets/logo.svg", "file_type": "OTHER" }
  ]
}
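The grouping could be derived from the existing `file_type` values. A rough sketch follows, assuming a mapping of my own choosing (`CATEGORY_BY_FILE_TYPE` and `categorize` are hypothetical names); any type not listed, including `CONTAINERFILE`, falls into "other" here, and where it really belongs is open for discussion.

```python
# Assumed mapping from file_type to the proposed categories.
CATEGORY_BY_FILE_TYPE = {
    "PRIMARY_DESCRIPTOR": "core",
    "SECONDARY_DESCRIPTOR": "core",
    "TEST_FILE": "tests",
}


def categorize(files):
    """Group flat /files entries into core / tests / other."""
    grouped = {"core": [], "tests": [], "other": []}
    for entry in files:
        # Unknown or missing file_type values default to "other".
        category = CATEGORY_BY_FILE_TYPE.get(entry.get("file_type"), "other")
        grouped[category].append(entry)
    return grouped


files = [
    {"path": "main.nf", "file_type": "PRIMARY_DESCRIPTOR"},
    {"path": "modules/align.nf", "file_type": "SECONDARY_DESCRIPTOR"},
    {"path": "tests/input.csv", "file_type": "TEST_FILE"},
    {"path": ".gitignore", "file_type": "OTHER"},
    {"path": "assets/logo.svg", "file_type": "OTHER"},
]

structured = categorize(files)
```

This shows the structured response can be computed from data the registry already has, so the change would mostly be about the response shape rather than new metadata.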

3. Download Bundles by Category

It would be helpful to support endpoints like:

  • /tools/.../versions/.../files/core.zip
  • /tools/.../versions/.../files/tests.zip
  • /tools/.../versions/.../files/all.zip

This would avoid having to:

  • Call /files,
  • Filter files manually, and
  • Use /tools/.../versions/.../files/{path} N times to fetch them individually.
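For illustration, a category bundle could be assembled roughly like this. It is a sketch under assumptions: `build_bundle` is a hypothetical helper, and `fetch` stands in for whatever retrieves one file's bytes (a registry would read from its own storage; a client today would make one HTTP request per path).

```python
import io
import zipfile


def build_bundle(entries, fetch):
    """Pack the given file entries into an in-memory ZIP archive.

    `fetch` is a hypothetical callable: path -> bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for entry in entries:
            # Keep the repository-relative path inside the archive.
            archive.writestr(entry["path"], fetch(entry["path"]))
    return buffer.getvalue()


# Stand-in for real file contents; the data here is made up for the example.
fake_storage = {
    "main.nf": b"workflow {}",
    "modules/align.nf": b"process ALIGN {}",
}
core_entries = [{"path": path} for path in fake_storage]
core_zip = build_bundle(core_entries, fake_storage.__getitem__)
```

Doing this server-side would replace N client round trips with a single download.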

Benefits

  • Reduces bloat when importing workflows into other services (e.g., WES).
  • Clarifies workflow structure for both humans and tools.
  • Provides a better developer experience for both authors and consumers.
  • Sets a standard that aligns with common practices like .gitignore.

Related Use Case

When ingesting workflows via TRS, I want to download only the minimal required files. Currently, I have to filter manually or rely on heuristics, which is error-prone.

┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Issue Number: TRS-72
