UoMResearchIT/UMFRAD-GUI

GUI for Enhancing Data Accessibility and Security through Innovative Data Synthesis (EDASIDA) project

Requirements

Python 3.7-3.11
Git
A Command Line Interface (CLI)

On Windows, download and install Git from https://git-scm.com/downloads/win.

GUI Development Setup

1. Clone the repo

To clone the repository, click <> Code above the list of files on the GitHub page of this repo, and copy the URL under your choice of clone method (HTTPS / SSH Key / GitHub CLI).

In the CLI (Git Bash / Terminal):
git clone --recursive <URL>

Change into the repo directory for the following steps.
cd UMFRAD-GUI

If the repository was not cloned recursively (i.e. the UMFRAD folder is empty), use the following command to fetch the submodule.
git submodule update --init --recursive
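
As a quick way to tell whether the submodule needs fetching, the check described above (an empty UMFRAD folder) can be sketched in Python. This is a minimal illustration, not part of the repository: the helper name submodule_is_populated is hypothetical, and it assumes the script is run from the root of the cloned repo.

```python
from pathlib import Path


def submodule_is_populated(path="UMFRAD"):
    """Return True if the folder exists and contains at least one entry.

    Hypothetical helper: an empty or missing folder suggests that the
    repository was cloned without --recursive.
    """
    folder = Path(path)
    return folder.is_dir() and any(folder.iterdir())


if __name__ == "__main__":
    if submodule_is_populated():
        print("UMFRAD submodule looks populated.")
    else:
        print("UMFRAD is empty: run 'git submodule update --init --recursive'.")
```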

2. Set up the development environment

Install the correct version of Python

The initial development was done using Python 3.11.10. If the version of Python 3 on your device is not between 3.7 and 3.11, install a compatible version and replace python3 in the commands below with the exact Python version (e.g. python3.11 instead of python3).
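
To confirm which interpreter a given command resolves to, a short script along the following lines can be run with that command (e.g. python3 check_version.py). It is only a sketch: the file name and the is_supported helper are illustrative, and the version bounds are copied from the Requirements section.

```python
import sys

# Supported range taken from the Requirements section (Python 3.7-3.11).
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 11)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is within the supported range."""
    return MIN_VERSION <= tuple(version_info[:2]) <= MAX_VERSION


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported() else "not supported"
    print(f"Python {major}.{minor} is {status} by this GUI")
```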

Linux / WSL

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
sudo apt install python3.11-venv

Windows

Download from python.org or the Microsoft Store.

macOS

Install the Xcode Command Line Tools:
xcode-select --install

This currently installs Python 3.9.6, which is sufficient for this GUI.

Install packages

The packages will be installed in a virtual environment.

Linux / WSL

python3 -m venv <virtual_environment_name>
source <virtual_environment_name>/bin/activate
python -m pip install -r requirements.core.txt
python -m pip install -r requirements.dev.txt

Some Linux distributions require additional packages to run the GUI. To install them, run the following command:

sudo apt install libxkbcommon-x11-0 ffmpeg libsm6 libxext6 libxcb-cursor0

Windows

python3 -m venv <virtual_environment_name>
<virtual_environment_name>\Scripts\activate
python -m pip install -r requirements.core.txt
python -m pip install -r requirements.dev.txt

macOS

python3 -m venv <virtual_environment_name>
source <virtual_environment_name>/bin/activate
python -m pip install -r requirements.core.txt
python -m pip install -r requirements.dev.txt

3. Run the GUI

python main.py

4. Run the tests

You can run the test suite using either of the following commands. The first command runs the tests and shows the detailed output, while the second command also tracks which lines of the code are executed during testing and generates a .coverage file.

  • Using pytest: pytest --pyargs gui -v
  • Using pytest with coverage report: coverage run -m pytest --pyargs gui -v

To check the generated coverage report:
coverage report -i

During Development

Pulling upstream changes from the project remote

Incorporate changes from a remote repository into the current branch:

git pull

The command above recursively fetches submodule changes, but does not update the submodules. If there are submodule changes, run the following command:

git submodule update

Changing the commit that the submodule points to

To pull changes from a branch (e.g. development) in the submodule's remote repository, run the following commands:

cd UMFRAD
git pull origin <branch>

Alternatively, from the root of this repository:

git submodule foreach git pull origin <branch>

Application Packaging Instructions

Create a GitHub Release

Navigate to the Releases page of this repo and click the "Draft a new release" button. Create a new tag (e.g. v1.0.0), select a target branch/commit and click the "Publish release" button to create a new release.

This action triggers a GitHub Actions workflow to build the distributable.

The distributable, EDASIDAGUI.zip, can be found in the Assets of the created release when the workflow run ends.

Run the build-executable action manually (alternative method)

From the GitHub repository, navigate to Actions → build-executable. Click the "Run workflow" dropdown, select the target branch, and click the "Run workflow" button to start the workflow run.

The distributable and installer, EDASIDAGUI and EDASIDAGUISetup, can be found in the Artifacts section of the workflow run summary.

GitHub Docs: Downloading workflow artifacts

Enhancing Data Accessibility and Security through Innovative Data Synthesis (EDASIDA)

Project background and aims

The project aims to develop a machine learning model that can generate synthetic data sets from summary statistics of restricted access data sets. The synthetic data will be used for teaching and testing and can be safely shared with students and researchers.

Usage

The code takes as input the descriptors of the restricted data set - these may be regression coefficients, probabilities of values, or other summary statistics. It then generates synthetic data sets and scores them on how well they match the descriptors of the original data set, giving each a mean square error (MSE). The best data sets are used as parents, which are then mutated and crossed over to produce the next generation. This process is repeated until the required number of generations has been reached.
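
The loop described above can be illustrated with a small, self-contained sketch. It is not the project's implementation: the function names, the choice of mean and variance as descriptors, the single-point crossover, the Gaussian mutation and all default parameters are assumptions made for this example only.

```python
import random


def descriptors(data):
    """Summary statistics used as descriptors in this sketch: mean and variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return [mean, var]


def mse(candidate, target):
    """Mean square error between a candidate's descriptors and the target descriptors."""
    got = descriptors(candidate)
    return sum((g - t) ** 2 for g, t in zip(got, target)) / len(target)


def crossover(a, b, rng):
    """Single-point crossover of two parent data sets of equal length."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(data, rng, rate=0.1, scale=0.5):
    """Perturb each value with probability `rate` using Gaussian noise."""
    return [x + rng.gauss(0, scale) if rng.random() < rate else x for x in data]


def evolve(target, size=50, population=40, parents=10, generations=100, seed=0):
    """Return the best synthetic data set found and its MSE against `target`."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0, 10) for _ in range(size)] for _ in range(population)]
    for _ in range(generations):
        # Score every data set and keep the best ones as parents.
        pop.sort(key=lambda d: mse(d, target))
        best = pop[:parents]
        # Cross over and mutate the parents to fill the next generation.
        children = [
            mutate(crossover(rng.choice(best), rng.choice(best), rng), rng)
            for _ in range(population - parents)
        ]
        pop = best + children  # parents survive unchanged
    pop.sort(key=lambda d: mse(d, target))
    return pop[0], mse(pop[0], target)


if __name__ == "__main__":
    # Hypothetical target descriptors: mean 5.0 and variance 4.0.
    data, score = evolve([5.0, 4.0])
    print(f"best MSE after evolution: {score:.6f}")
```

Because the parents are carried over unchanged in this sketch, the best score can never get worse from one generation to the next; whether the real code keeps its parents in this way is not stated here.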

How to install

To install the project, you need to clone the repository and install the dependencies. If you are running multiple Python projects, it is good practice to keep your environments separate by creating a virtual environment:

python3 -m venv venv # Create a virtual environment
source venv/bin/activate # Activate the virtual environment

The prompt will then show (venv) to confirm that the environment is active. To install the dependencies, run the following command:

pip install -r requirements.txt

The Jupyter notebooks can then be run using the kernel venv (Python 3), which corresponds to the virtual environment.

How to run

The Jupyter notebooks contain examples of how to run the code with different use cases and descriptions of the outputs.

The inputs are the summary statistics of the restricted access data sets. These can be regression coefficients, probabilities of values, or other summary statistics.

The results of the code are the synthetic data sets, and the scores of the best data sets. The scores are based on how well the synthetic data sets match the summary statistics of the original data set.

Full details of the inputs and outputs can be found in the documentation.
