Add a URL for constructing a hub.txt file for the genome browser #421

Open: wants to merge 1 commit into main

u8sand (Contributor) commented Feb 20, 2025

Based on:
https://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#UseOneFile

This page essentially defines a hub.txt file for any file in the C2M2,
but it only works for files that:

  1. have a file_format of VCF, bigBed, or bigWig
  2. have an attached subject with NCBI taxonomy ID 9606 (human)
  3. have a publicly accessible access_url or persistent_id that points to the actual data
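A minimal sketch of these three eligibility checks, in Python. The record shape (a dict with file_format, subject_taxonomies, access_url, and persistent_id keys) and the "NCBI:txid9606" taxonomy form are assumptions for illustration; real C2M2 rows may store formats as EDAM term IDs and link subjects to taxonomies through separate association tables.

```python
from urllib.parse import urlparse

# Assumption: formats are compared by name; real C2M2 rows may use EDAM IDs.
SUPPORTED_FORMATS = {"vcf", "bigbed", "bigwig"}
HUMAN_TAXON = "NCBI:txid9606"  # assumed C2M2-style taxonomy identifier


def is_genome_browser_compatible(record: dict) -> bool:
    """Return True if a (hypothetical) file record meets all three conditions."""
    # 1. file_format must be VCF, bigBed, or bigWig (case-insensitive).
    fmt = str(record.get("file_format", "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        return False
    # 2. an attached subject must carry the human taxonomy.
    if HUMAN_TAXON not in record.get("subject_taxonomies", []):
        return False
    # 3. prefer access_url, fall back to persistent_id; it must be plain HTTP(S).
    url = record.get("access_url") or record.get("persistent_id") or ""
    return urlparse(url).scheme in {"http", "https"}


example = {
    "file_format": "bigWig",
    "subject_taxonomies": ["NCBI:txid9606"],
    "access_url": "https://example.org/signal.bw",
}
print(is_genome_browser_compatible(example))  # True
```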

The hub uses genome hg19, but this may not be accurate;
it's unclear how we would determine the correct assembly at this stage.
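For reference, here is a sketch of the kind of single-file hub described on the UCSC help page linked above: a hub stanza with useOneFile on, a genome stanza, and one track stanza, separated by blank lines. The function name, the placeholder email address, and the mapping of VCF to the vcfTabix track type (which expects a bgzip-compressed, tabix-indexed file) are assumptions, not necessarily what this PR emits.

```python
# Assumed mapping from the supported formats to UCSC track types; "vcfTabix"
# expects a bgzip-compressed VCF with a tabix (.tbi) index next to it.
TRACK_TYPES = {"vcf": "vcfTabix", "bigbed": "bigBed", "bigwig": "bigWig"}


def build_hub_txt(name: str, label: str, file_format: str, data_url: str,
                  genome: str = "hg19", email: str = "help@example.org") -> str:
    """Return the text of a single-file (useOneFile) hub with one track.

    The default email is a hypothetical placeholder; UCSC expects a real contact.
    """
    track_type = TRACK_TYPES[file_format.lower()]
    short = label[:17]  # UCSC recommends short labels of 17 characters or fewer
    stanzas = [
        # Hub stanza: "useOneFile on" lets genome and track stanzas live here too.
        [f"hub {name}", f"shortLabel {short}", f"longLabel {label}",
         "useOneFile on", f"email {email}"],
        # Genome stanza: defaults to hg19, mirroring the hard-coded choice.
        [f"genome {genome}"],
        # Track stanza pointing directly at the remote data file.
        [f"track {name}", f"bigDataUrl {data_url}", f"shortLabel {short}",
         f"longLabel {label}", f"type {track_type}", "visibility full"],
    ]
    # Stanzas are separated by a blank line.
    return "\n\n".join("\n".join(s) for s in stanzas) + "\n"


print(build_hub_txt("c2m2_example", "Example signal", "bigWig",
                    "https://example.org/signal.bw"))
```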

A hub can thus be assembled by pointing to the URL:
https://cfde.cloud/data/c2m2/file/{id_namespace}/{local_id}/genome-browser/hub.txt
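One detail worth handling when building that URL: id_namespace values often contain characters such as ":" and "/", so each path segment probably needs percent-encoding. The sketch below assumes the route accepts encoded segments. The ucsc_link helper is hypothetical and assumes the hgTracks hubUrl query parameter, which UCSC uses for loading a hub from a link.

```python
from urllib.parse import quote, urlencode

BASE = "https://cfde.cloud/data/c2m2/file"


def hub_url(id_namespace: str, local_id: str) -> str:
    """Build the hub.txt URL, encoding every reserved character in each segment."""
    # safe="" makes quote() encode "/" and ":" as well, so a namespace such as
    # "tag:example.org,2024:" stays a single path segment.
    ns = quote(id_namespace, safe="")
    lid = quote(local_id, safe="")
    return f"{BASE}/{ns}/{lid}/genome-browser/hub.txt"


def ucsc_link(hub: str, genome: str = "hg19") -> str:
    """Hypothetical helper: a UCSC Genome Browser link that loads the hub."""
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(
        {"db": genome, "hubUrl": hub})


url = hub_url("tag:example.org,2024:", "file 1")
print(url)
print(ucsc_link(url))
```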

Some caveats, if they weren't already obvious:

  • We don't have any access_url values yet, and the only files that could ever work don't define suitable URLs in the persistent_id field.
  • Even when we do have access_url values, they may be DRS URIs, which the genome browser doesn't appear to be able to handle.
  • The current method of choosing the genome leaves much to be desired:
    • other organisms are missing
    • there's no guarantee that hg19 is the right assembly
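The caveats above could be surfaced per file with a check like the following. This is a hypothetical helper, assuming taxonomy IDs in "NCBI:txid<N>" form: the single-entry assembly table mirrors the current hard-coded hg19 choice, and no check of this kind can catch the case where hg19 is simply the wrong assembly for a human file.

```python
from urllib.parse import urlparse

# Only human is mapped, mirroring the hard-coded hg19 choice; other organisms
# have no entry (assumption: taxonomy IDs look like "NCBI:txid9606").
ASSEMBLY_BY_TAXON = {"NCBI:txid9606": "hg19"}


def caveats(url: str, taxon: str) -> list[str]:
    """List the reasons a file could not (yet) be shown in the genome browser."""
    problems = []
    scheme = urlparse(url).scheme
    if scheme == "drs":
        # A DRS URI would first have to be resolved to a downloadable URL.
        problems.append("DRS URI must be resolved to an HTTP(S) URL first")
    elif scheme not in {"http", "https"}:
        problems.append("no usable data URL")
    if taxon not in ASSEMBLY_BY_TAXON:
        problems.append("no assembly mapping for this organism")
    return problems


print(caveats("drs://example.org/obj1", "NCBI:txid9606"))
```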

Because of these caveats, we need to think more about how to obtain the additional information this use case requires. We'll need to:

  • specify the genome assembly in use when providing VCF, bigBed, or bigWig files
  • provide an access URL specific to those files

Thoughts: Since this is rather specific to these file types, it would belong in an independent table anyway, one that simply lists all genome-browser-compatible files. It feels a bit out of scope for the current C2M2 and would be more useful as an independent effort to gather "tracks" from DCCs, perhaps as another element of the data matrix.
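If such an independent table were created, a row might look something like the sketch below. All names here (GenomeBrowserTrack, stanza, group_by_assembly) are hypothetical; the point is that each row carries its own declared assembly and a direct HTTP(S) URL, so hubs could be grouped per assembly instead of assuming hg19.

```python
from dataclasses import dataclass

# Assumed format-to-track-type mapping, as used by UCSC hubs.
TRACK_TYPES = {"vcf": "vcfTabix", "bigbed": "bigBed", "bigwig": "bigWig"}


@dataclass(frozen=True)
class GenomeBrowserTrack:
    """One row of a hypothetical, independent 'tracks' table."""
    id_namespace: str
    local_id: str
    file_format: str  # "VCF", "bigBed", or "bigWig"
    assembly: str     # declared by the DCC, e.g. "hg19" or "hg38"
    track_url: str    # direct HTTP(S) URL to the data, not a DRS URI

    def stanza(self, track_name: str) -> str:
        """Render this row as a UCSC track stanza."""
        return "\n".join([
            f"track {track_name}",
            f"bigDataUrl {self.track_url}",
            f"shortLabel {self.local_id[:17]}",
            f"longLabel {self.id_namespace} {self.local_id}",
            f"type {TRACK_TYPES[self.file_format.lower()]}",
        ])


def group_by_assembly(rows):
    """Group rows so that one hub (or genome stanza) can be built per assembly."""
    groups: dict[str, list[GenomeBrowserTrack]] = {}
    for row in rows:
        groups.setdefault(row.assembly, []).append(row)
    return groups


row = GenomeBrowserTrack("ns", "sample1", "bigWig", "hg38",
                         "https://example.org/s1.bw")
print(row.stanza("t1"))
```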

u8sand requested a review from AviMaayan on February 20, 2025 at 17:54