v0.1.0

@JoshuaHarris391 released this 15 Jan 02:18
· 20 commits to main since this release

📦 gen3schemadev Release - Archival Milestone

This release archives the current state of the gen3schemadev repository as it transitions to focus exclusively on data modeling and schema development. Previously, the repository also supported data generation, data upload, synthetic file generation, and API-based data manipulation. These capabilities are no longer the focus of the repository.


🔧 Release Details

🗂️ Repository Summary

This repository provides tools for automating processes in the Gen3 ecosystem, specifically:

  • 📘 Data dictionary creation
  • 📊 Data simulation
  • ✅ Metadata validation
  • 📤 Data submission

🔑 Key Features

1️⃣ gen3schemadev: Object-Relational Mapper for Gen3 Schemas

  • Converts spreadsheets into YAML files for building Gen3 Data Dictionaries.
  • Example tool: sheet2yaml.py
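To make the spreadsheet-to-YAML idea concrete, here is a minimal sketch that turns a few spreadsheet rows into a YAML schema for one node. It is not the logic of sheet2yaml.py: the column names (`property`, `type`, `description`, `required`), the `subject` node, and the helpers `rows_to_schema` and `to_yaml` are all assumptions made for illustration, and the output layout only loosely resembles a dictionary schema file. It uses the standard library only, relying on the fact that double-quoted JSON scalars are also valid YAML.

```python
import csv
import io
import json

# Hypothetical sheet export: one row per property of a single node.
SHEET_CSV = """property,type,description,required
submitter_id,string,Unique identifier for the record,yes
age_at_enrolment,integer,Age in years at enrolment,no
"""


def rows_to_schema(node_id, rows):
    """Build a schema-like dict for one node from spreadsheet rows."""
    properties = {}
    required = []
    for row in rows:
        name = row["property"].strip()
        properties[name] = {
            "type": row["type"].strip(),
            "description": row["description"].strip(),
        }
        if row["required"].strip().lower() == "yes":
            required.append(name)
    return {"id": node_id, "required": required, "properties": properties}


def to_yaml(mapping, indent=0):
    """Render a nested dict as block-style YAML text."""
    pad = " " * indent
    lines = []
    for key, item in mapping.items():
        if isinstance(item, dict) and item:
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(item, indent + 2))
        elif isinstance(item, list) and item:
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {json.dumps(entry)}" for entry in item)
        else:
            # json.dumps yields a quoted scalar (or [] / {}), all valid YAML.
            lines.append(f"{pad}{key}: {json.dumps(item)}")
    return "\n".join(lines)


rows = list(csv.DictReader(io.StringIO(SHEET_CSV)))
schema = rows_to_schema("subject", rows)
print(to_yaml(schema))
```

A real workflow would more likely use a YAML library and the repository's own template columns; the hand-written emitter here only keeps the sketch dependency-free.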

2️⃣ Workflow for Editing Project Dictionaries

  • Edits are made in Google Sheets.
  • YAML schemas are generated and validated locally.
  • Simulated data is created, validated, and uploaded to Gen3.
  • Indexing services are configured to integrate the new dictionaries.

3️⃣ sheet2yaml-CLI.py: Command-Line Tool

  • Generates schemas from Google Sheets/tabs formatted according to the provided template.

4️⃣ Plausible Data Generator

  • Enhances simulated data by replacing random values with plausible ones based on defined distributions.
  • Input: JSON files and a CSV or Google Sheet describing plausible values.
  • Output: Edited JSONs and optional dummy sequencing/lipid files.

Example Usage:

python3 plausible_data_gen.py --path <PATH_TO_SIM_DATA> [--values <PATH_TO_CSV> | --gurl <GOOGLE_SHEET_URL>] --generate-files --file-types aligned_reads 
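The core idea of replacing random values with plausible ones can be sketched as weighted sampling. The snippet below is an illustration only, not the code in plausible_data_gen.py: the `PLAUSIBLE` table, the property names, and the `make_plausible` helper are hypothetical, and the real tool reads its plausible values from a CSV or Google Sheet rather than from an inline dictionary.

```python
import json
import random

# Hypothetical plausible-value table: property -> (values, weights).
PLAUSIBLE = {
    "sex": (["female", "male"], [0.5, 0.5]),
    "smoking_status": (["never", "former", "current"], [0.6, 0.3, 0.1]),
}


def make_plausible(records, plausible, rng):
    """Return copies of records with listed properties resampled by weight."""
    updated = []
    for record in records:
        new_record = dict(record)  # leave the input record untouched
        for prop, (values, weights) in plausible.items():
            if prop in new_record:
                new_record[prop] = rng.choices(values, weights=weights, k=1)[0]
        updated.append(new_record)
    return updated


# Stand-in for a simulated JSON file containing random placeholder values.
simulated = json.loads(
    '[{"submitter_id": "s1", "sex": "xq9", "smoking_status": "zz"}]'
)
rng = random.Random(0)  # seeded so repeated runs give the same result
print(json.dumps(make_plausible(simulated, PLAUSIBLE, rng), indent=2))
```

Properties that do not appear in the table, such as `submitter_id` here, pass through unchanged.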

5️⃣ Metadata Validator

  • Validates metadata against defined schemas.
  • Includes a user guide and Jupyter notebook example.
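As a rough illustration of what checking metadata against a schema involves, the sketch below tests required properties, basic types, and allowed values for a single record. It is a simplified stand-in, not the repository's validator: `SCHEMA`, its property names, and `validate_record` are assumptions, and a real validator would typically rely on a full JSON Schema implementation.

```python
# Hypothetical, heavily simplified schema for one node.
SCHEMA = {
    "required": ["submitter_id"],
    "properties": {
        "submitter_id": {"type": "string"},
        "sex": {"type": "string", "enum": ["female", "male"]},
        "age": {"type": "integer"},
    },
}

TYPE_MAP = {"string": str, "integer": int, "number": (int, float)}


def validate_record(record, schema):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required property: {name}")
    for name, value in record.items():
        rules = schema["properties"].get(name)
        if rules is None:
            problems.append(f"unknown property: {name}")
            continue
        expected = TYPE_MAP.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"{name}: expected {rules['type']}")
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name}: {value!r} not in allowed values")
    return problems


print(validate_record({"submitter_id": "s1", "sex": "unknown", "age": "42"}, SCHEMA))
```

Collecting every problem in a list, rather than stopping at the first one, makes it easier to fix a whole record in one pass.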

6️⃣ Gen3 Data Submitter

  • Automates data submission to Gen3 with detailed usage instructions.

📌 Supported Workflows

  • 🛠️ Schema Development: YAML generation from spreadsheets.
  • 🎛️ Data Simulation: Plausible dataset creation and refinement.
  • 📑 Metadata Validation: Schema compliance checks.
  • 🚀 Data Submission: Automated upload and indexing in Gen3.

🔮 Moving Forward

The repository will now focus exclusively on data modeling and schema development. Other functionalities will no longer be maintained or supported.

What's Changed

Full Changelog: https://github.com/AustralianBioCommons/gen3schemadev/commits/v0.1.0