feature/documentation #3
base: main
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
docs/README.md:9
- [nitpick] Consider adding a space after the numeral and period (e.g., '0. [README-comparison]') for more consistent and readable list formatting.
0.[README-comparison](README-comparison.md)
docs/README-comparison.md:12
- The citation marker 'citeturn0search0' appears to be an unintended placeholder; consider removing or replacing it with a proper citation if needed.
- **Pointer-Based Storage:** Replaces large files in a Git repo with lightweight text pointers, while storing the actual file contents on a remote server. citeturn0search0
Note to self: @lbeckman314 🔮
initial draft Co-authored-by: Copilot <[email protected]>
568c2a2 to 31723c0
@kellrott Does this capture intent? Comparison of git-gen3 vs Git LFS
README-associating-biomedical-entities.md

# 📄 Associating Files with Biomedical Entities

## Overview

In genomics, imaging, and clinical research, it is essential to associate data files with key biomedical entities such as:
## How to Associate Files

When adding a file with `git drs add`, pass the identifiers as options:

```bash
git drs add path/to/file.vcf \
  --patient-id "Patient-12345" \
  --specimen-id "Specimen-67890" \
  --assay-id "ServiceRequest-ABCDE"
```
This command records the tags alongside the file metadata.

### Example Metadata Entry

```json
{
  "path": "path/to/file.vcf",
  "etag": "a7c1c0...",
  "size": 4534678,
  "remote": "s3://bucket/path/to/file.vcf",
  "patient_id": "Patient-12345",
  "specimen_id": "Specimen-67890",
  "assay_id": "ServiceRequest-ABCDE"
}
```

## Tagging Fields
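Each `--*-id` option maps onto one key of the entry (`--patient-id` becomes `patient_id`, and so on). As a minimal sketch of that shape, here is a hypothetical `build_metadata_entry` shell helper (not part of `git drs`); it assumes the values need no JSON escaping and simply omits any identifier that was not supplied.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of git drs): print one metadata entry
# shaped like the example entry.
# Args: path etag size remote [patient_id] [specimen_id] [assay_id]
# Assumes no value contains characters that need JSON escaping.
build_metadata_entry() {
  local path=$1 etag=$2 size=$3 remote=$4
  local patient_id=${5:-} specimen_id=${6:-} assay_id=${7:-}

  printf '{\n'
  printf '  "path": "%s",\n' "$path"
  printf '  "etag": "%s",\n' "$etag"
  printf '  "size": %s,\n' "$size"
  printf '  "remote": "%s"' "$remote"
  # Optional identifiers: append a key only when a value was supplied.
  if [ -n "$patient_id" ]; then
    printf ',\n  "patient_id": "%s"' "$patient_id"
  fi
  if [ -n "$specimen_id" ]; then
    printf ',\n  "specimen_id": "%s"' "$specimen_id"
  fi
  if [ -n "$assay_id" ]; then
    printf ',\n  "assay_id": "%s"' "$assay_id"
  fi
  printf '\n}\n'
}
```

For example, `build_metadata_entry path/to/file.vcf a7c1c0 4534678 s3://bucket/path/to/file.vcf Patient-12345` prints an entry with `patient_id` but no `specimen_id` or `assay_id` keys; pass an empty string to skip an identifier in the middle.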
All fields are optional but highly recommended for structured datasets.

## Bulk Association

To associate large numbers of files efficiently, see:

## Best Practices

## Future Features

## ✅ Summary

Tagging files with biomedical entity identifiers using
README-bulk-association.md

# 📄 Bulk Tagging Files with Biomedical Identifiers

## Overview

In large research projects, it is often necessary to associate hundreds or thousands of files with biomedical identifiers such as Patient, Specimen, or Assay ( To streamline this process, This document describes how to prepare, import, and manage bulk file associations.

## Preparing a Bulk Manifest

The manifest must be a CSV file with the following columns:
### Example Manifest

## Importing the Manifest

Use the `git drs import-manifest` command:

```bash
git drs import-manifest path/to/manifest.csv
```

This will:

## Notes
## Example Workflow

```bash
# 1. Track your files as usual
git drs add path/to/data1.vcf
git drs add path/to/data2.vcf

# 2. Prepare a manifest.csv linking files to biomedical IDs

# 3. Import the manifest
git drs import-manifest manifest.csv

# 4. (Optional) Verify updates
git drs ls --patient-id Patient-12345
```

## Future Enhancements (Planned)
| Function | Description |
|-----------------------------|-------------|
| `fetch_github_teams()` | Get org teams, members, and slugs |
| `map_to_gen3_roles()` | Transform GitHub teams → Gen3 roles |
Not sure what you mean by this. Are you implying that all data repos in GitHub would be under one organization, with per-user permissions managed in GitHub and then synced to equivalent Gen3 terms?
Given the typical GitHub repository roles:
- Read: Recommended for non-code contributors who want to view or discuss your project
- Triage: Recommended for contributors who need to proactively manage issues, discussions, and pull requests without write access
- Write: Recommended for contributors who actively push to your project
- Maintain: Recommended for project managers who need to manage the repository without access to sensitive or destructive actions
- Admin: Recommended for people who need full access to the project, including sensitive and destructive actions like managing security or deleting a repository
Mapping:
- Read, Triage, Maintain: mapped to Gen3 read-only access
- Admin, Write: mapped to Gen3 submitter and sower access
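The proposed mapping is small enough to sketch as a single shell function. This is a minimal illustration, assuming a hypothetical `map_to_gen3_role` helper (distinct from the `map_to_gen3_roles()` listed in the table) and using `read-only` and `submitter sower` as placeholder labels rather than actual Gen3 policy names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the GitHub -> Gen3 role mapping proposed above.
# "read-only" and "submitter sower" are placeholder labels, not actual
# Gen3 policy names.
map_to_gen3_role() {
  # Normalize case so "Admin" and "admin" are treated the same.
  local role
  role=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$role" in
    read|triage|maintain) echo "read-only" ;;
    write|admin) echo "submitter sower" ;;
    *)
      # Fail closed: an unrecognized role grants nothing.
      echo "unknown GitHub role: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `map_to_gen3_role Maintain` prints `read-only`. Returning an error for unknown roles (rather than defaulting to some access level) keeps a new or misspelled role from silently granting Gen3 permissions.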
```yaml
projects:
  project-xyz:
```
This makes it seem like each Gen3 project is a GitHub organization.
Exact mapping is TBD, but point taken: does `github.organization` map to `gen3.program`? 🤔
| | |
+------------+---------------+------------------------------+

v
Looking at this without much background knowledge, what is unclear to me is what the RoleSourceAdapter interface aims to do.
At a high level, GitLab, Bitbucket, and Git Enterprise have different APIs. The RoleSourceAdapter would adapt them to a standard interface.
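To make the adapter idea concrete: each provider reports permissions in its own vocabulary (GitLab uses numeric access levels, Bitbucket uses `read`/`write`/`admin`), so one adapter per provider translates them into a single standard role set before any Gen3 mapping happens. A minimal shell sketch follows; the function names and the per-provider mappings (for example GitLab Reporter to `triage`) are assumptions, not a settled design.

```bash
#!/usr/bin/env bash
# Hypothetical RoleSourceAdapter sketch: one adapter function per provider
# translates its native permission into the standard role vocabulary
# (read/triage/write/maintain/admin). The mappings are assumptions.

adapt_github_role() {
  # GitHub repository roles already use the standard vocabulary.
  case "$1" in
    read|triage|write|maintain|admin) echo "$1" ;;
    *) echo "unknown GitHub role: $1" >&2; return 1 ;;
  esac
}

adapt_gitlab_role() {
  # GitLab access levels: 10 Guest, 20 Reporter, 30 Developer,
  # 40 Maintainer, 50 Owner.
  case "$1" in
    10) echo "read" ;;
    20) echo "triage" ;;
    30) echo "write" ;;
    40) echo "maintain" ;;
    50) echo "admin" ;;
    *) echo "unknown GitLab access level: $1" >&2; return 1 ;;
  esac
}

adapt_bitbucket_role() {
  # Bitbucket repository permissions: read, write, admin.
  case "$1" in
    read|write|admin) echo "$1" ;;
    *) echo "unknown Bitbucket permission: $1" >&2; return 1 ;;
  esac
}

# Dispatch a provider-specific role to the matching adapter.
normalize_role() {
  local provider=$1 native_role=$2
  case "$provider" in
    github) adapt_github_role "$native_role" ;;
    gitlab) adapt_gitlab_role "$native_role" ;;
    bitbucket) adapt_bitbucket_role "$native_role" ;;
    *) echo "unsupported provider: $provider" >&2; return 1 ;;
  esac
}
```

For example, `normalize_role gitlab 40` prints `maintain`, so a single downstream step such as `map_to_gen3_roles()` only has to understand one vocabulary; supporting a new host means adding one adapter function and one dispatch line.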
### 📦 Track Remote File
```bash
lfs-meta track-remote s3://my-bucket/data/foo.vcf \
```
I would simplify this further if possible. I understand that `path` is a subset of the full bucket path `s3://my-bucket/data/foo.vcf`, but why is it needed?
### 🧬 Generate FHIR Metadata
```bash
lfs-meta init-meta \
```
This looks like a cool pattern. I'm guessing it would keep our existing pattern of pushing metadata from the META directory backwards compatible?
Yes. At the end of the day, I don't anticipate many (any?) changes to publish.
exit 0
fi

lfs-meta validate --file .lfs-meta/metadata.json || {
I missed it: what is the `validate` command doing here?
Same as `g3t meta validate`: is the current metadata complete?
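A completeness check of this kind can be sketched in a few lines of shell. This is a minimal illustration, not the implementation of `lfs-meta validate` or `g3t meta validate`: the `check_metadata_complete` function is hypothetical, the required keys are assumed to be the ones in the example metadata entry (`path`, `etag`, `size`, `remote`), and it only checks key presence with `grep`, not JSON validity or value types.

```bash
#!/usr/bin/env bash
# Hypothetical completeness check: report which required keys are missing
# from a metadata JSON file. Key presence only (plain grep); no JSON parsing.
# The required-key list is an assumption based on the example entry.
REQUIRED_KEYS=(path etag size remote)

check_metadata_complete() {
  local file=$1 key missing=0
  if [ ! -f "$file" ]; then
    echo "metadata file not found: $file" >&2
    return 2
  fi
  for key in "${REQUIRED_KEYS[@]}"; do
    # Look for "key" followed by optional whitespace and a colon.
    if ! grep -q "\"$key\"[[:space:]]*:" "$file"; then
      echo "missing required key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a pre-commit hook, `check_metadata_complete .lfs-meta/metadata.json || exit 1` would abort the commit when keys are missing, since Git cancels the commit whenever `pre-commit` exits non-zero.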
@@ -0,0 +1,301 @@

# Overview `git-sync`
I'm guessing this is another server-side micro-service running in a pod. I understand there would be some sort of sync operation that takes GitHub-style inputs and reflects them in Gen3; I'd be curious to see a simplified OpenAPI spec of exactly what this micro-service would look like.
This PR: git-sync (similar to synapse-sync used for bridge2ai)