
mleap.shared.FilesIO

ViktorKaz edited this page Dec 3, 2017 · 43 revisions

Notes

Main interface class for communicating with files on disk. Currently, the functionality mainly covers interactions with an HDF5 database.

class mleap.shared.FilesIO(hdf5_filename, mode='a')

Parameters:

  • hdf5_filename (String) - Path on disk to the HDF5 file.

  • mode (String) - Mode for opening the HDF5 file. Available options are: “w”, “r”, “r+”, “a”, “w-”. Default is “a”.
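To make the mode options concrete, here is a small stdlib-only sketch of how h5py interprets each mode string. It assumes FilesIO passes `mode` straight through to `h5py.File`, which this page implies but does not state; `describe_hdf5_mode` is a hypothetical helper, not part of mleap.

```python
def describe_hdf5_mode(mode: str, file_exists: bool) -> str:
    """Describe what opening an HDF5 file with `mode` does (h5py semantics).

    Hypothetical helper for illustration; assumes FilesIO forwards `mode`
    unchanged to h5py.File.
    """
    if mode == "r":  # read-only, file must already exist
        return "open read-only" if file_exists else "error: file not found"
    if mode == "r+":  # read/write, file must already exist
        return "open read/write" if file_exists else "error: file not found"
    if mode == "w":  # always start from an empty file
        return "truncate and open for writing" if file_exists else "create and open for writing"
    if mode in ("w-", "x"):  # h5py also accepts "x" as an alias for "w-"
        return "error: file already exists" if file_exists else "create and open for writing"
    if mode == "a":  # read/write if present, create otherwise
        return "open read/write" if file_exists else "create and open for writing"
    raise ValueError(f"unsupported mode: {mode!r}")
```

The default `'a'` is the forgiving choice for an experiment database: it creates the file on first use and never truncates existing results, whereas `'w'` would wipe a previously populated file.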

Methods

def check_file_exists(dataset_name, file_type):

1. Parameters:

  • dataset_name (string) - Name of the dataset; the method checks whether the corresponding file exists on disk.

  • file_type (string - flag) - Acceptable flags are ML Model or Prediction. Depending on the flag, the corresponding path is combined with dataset_name to locate the file.

2. Return:

  • The unpickled file if it exists, False otherwise
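The lookup described above can be sketched with `os` and `pickle`. The flag-to-directory mapping and the `base_dir` parameter are hypothetical: this page does not say which directory names mleap uses or how exactly the path is built.

```python
import os
import pickle

# Hypothetical flag -> directory mapping; the real names are not documented here.
FILE_TYPE_DIRS = {
    "ML Model": "trained_models",
    "Prediction": "predictions",
}


def check_file_exists(dataset_name, file_type, base_dir="."):
    """Return the unpickled object stored for `dataset_name`, or False if absent."""
    if file_type not in FILE_TYPE_DIRS:
        raise ValueError(f"file_type must be one of {list(FILE_TYPE_DIRS)}")
    # Combine the flag-specific directory with the dataset name.
    path = os.path.join(base_dir, FILE_TYPE_DIRS[file_type], dataset_name)
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Returning `False` as the "missing" sentinel means a caller should test with `is False` rather than truthiness, since an unpickled empty list would also be falsy.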


def check_prediction_exists(dataset_name)

1. Parameters:

  • dataset_name (string) - Name of the dataset; the method checks whether predictions on its test set exist.

2. Return:

  • Boolean - True if the predictions exist, False otherwise

def load_predictions_for_dataset(dataset_name)

1. Parameters

  • dataset_name (string) - Path to dataset in HDF5

2. Return

  • Boolean

def save_trained_models_to_disk(trained_models, dataset_name)

1. Parameters

  • trained_models (Array) - Array of trained models; it is saved to disk as a pickle.

  • dataset_name (String) - Name of the dataset on which the models were trained.

2. Return

  • Nothing
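Saving an array of models "as a pickle" can be sketched as a single `pickle.dump` of the whole list. The `models_dir` parameter and the one-file-per-dataset layout are assumptions for illustration, not documented mleap behaviour.

```python
import os
import pickle


def save_trained_models_to_disk(trained_models, dataset_name, models_dir="trained_models"):
    """Pickle the whole list of trained models to <models_dir>/<dataset_name>.

    `models_dir` is a hypothetical parameter; the real save location is not
    documented on this page.
    """
    os.makedirs(models_dir, exist_ok=True)  # create the directory on first use
    path = os.path.join(models_dir, dataset_name)
    with open(path, "wb") as fh:
        pickle.dump(trained_models, fh)
```

Pickling the list as one object keeps the models and their order together, but loading it later requires the models' classes to be importable in the reading process.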


def save_predictions_to_db(predictions, dataset_name)

1. Parameters

  • predictions (Array) - Array of predictions on the test split

  • dataset_name (String) - Name of the dataset on which the predictions were made

2. Return

  • Nothing

def save_array_hdf5(group, datasets, array_names, array_meta)

1. Parameters

  • group (String) - HDF5 group in which the data is saved

  • datasets (Array) - Data to save

  • array_names (String) - Names of the datasets

2. Return

  • Nothing

def save_prediction_accuracies_to_db(model_accuracies)

1. Parameters

  • model_accuracies (Array) - Array of the form [Strategy Name, Prediction accuracy] for each learning strategy. The method appends the accuracies, per strategy and per dataset, to the HDF5 file.

def get_prediction_accuracies_per_strategy():

1. Parameters

  • None

2. Return

  • Dictionary with {key: name of strategy; value: accuracy of the strategy per dataset}
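Taken together, the two accuracy methods above reshape per-dataset [Strategy Name, Prediction accuracy] pairs into a per-strategy dictionary. The in-memory sketch below shows that reshaping without HDF5; `collect_accuracies` is a hypothetical helper, and representing "accuracy per dataset" as an inner dict keyed by dataset name is an assumption.

```python
from collections import defaultdict


def collect_accuracies(accuracies_by_dataset):
    """Group [strategy_name, accuracy] pairs by strategy across datasets.

    `accuracies_by_dataset` maps dataset name -> one `model_accuracies` array,
    i.e. a list of [strategy_name, accuracy] pairs. Hypothetical helper; the
    inner {dataset_name: accuracy} layout is an assumption.
    """
    per_strategy = defaultdict(dict)
    for dataset_name, model_accuracies in accuracies_by_dataset.items():
        for strategy_name, accuracy in model_accuracies:
            per_strategy[strategy_name][dataset_name] = accuracy
    return dict(per_strategy)
```

Keying the inner values by dataset name (rather than a plain list) keeps each accuracy traceable to its dataset even if some strategies were not run on every dataset.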


def save_ml_strategy_timestamps(timestamps_df, dataset_name)

1. Parameters

  • timestamps_df (DataFrame) - DataFrame with the strategy name and the begin and end times of the experiment
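Storing begin and end times per strategy makes run times easy to derive. The sketch below uses plain `(name, begin, end)` tuples as a stand-in for the rows of `timestamps_df`, since the DataFrame's column names are not documented here; `strategy_durations` is a hypothetical helper.

```python
from datetime import datetime


def strategy_durations(rows):
    """Return {strategy_name: elapsed seconds} from (name, begin, end) rows.

    `rows` stands in for timestamps_df: one row per strategy with its begin
    and end time as datetime objects (hypothetical representation).
    """
    return {name: (end - begin).total_seconds() for name, begin, end in rows}
```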

def list_datasets(hdf5_group):

1. Parameters

  • hdf5_group (String) - Path to group in HDF5

2. Return

  • (String) - Names of the elements in the group
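With h5py, listing a group's members is typically `list(h5file[hdf5_group].keys())`, since h5py groups behave like dictionaries. The stdlib-only sketch below mimics that lookup on a nested dictionary standing in for the HDF5 file, so it runs without h5py. This `list_datasets` is a hypothetical stand-in that takes the tree as an extra argument and returns a list of names.

```python
def list_datasets(hdf5_tree, hdf5_group):
    """Return member names of `hdf5_group` in a nested-dict stand-in for HDF5.

    Each path segment ("/datasets/iris" -> "datasets", "iris") indexes one
    level deeper, the way h5py resolves a group path.
    """
    node = hdf5_tree
    for part in hdf5_group.strip("/").split("/"):
        if part:  # skip the empty segment produced by the root path "/"
            node = node[part]
    return list(node.keys())
```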

def load_dataset_h5(dataset_name):

1. Parameters

  • dataset_name (String) - Path to the dataset in HDF5

2. Return

  • NumPy array

def load_dataset(dataset_name):

1. Parameters

  • dataset_name (String) - Full path to the dataset in HDF5

2. Returns

  • Dataset (DataFrame), Metadata (Dictionary)

def save_datasets(datasets, datasets_save_paths, dts_metadata, verbose=False)

1. Parameters

  • datasets (Array of DataFrame objects) - Datasets stored in an array as pandas DataFrame objects

  • datasets_save_paths (Array, String) - Full save paths

  • dts_metadata (Dictionary) - Metadata; at a minimum it must contain: 1. the name of the dataset; 2. the name of the column with the label

2. Return

  • Nothing
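Because `dts_metadata` has required minimum values, a caller can check it before saving. The sketch below validates a single metadata dictionary; the key names `dataset_name` and `label_column` are hypothetical, since this page lists which values are required but not what the keys are called.

```python
# Hypothetical key names: the page documents the required values, not the keys.
REQUIRED_METADATA_KEYS = ("dataset_name", "label_column")


def validate_metadata(dts_metadata):
    """Raise ValueError if a required metadata entry is missing."""
    missing = [key for key in REQUIRED_METADATA_KEYS if key not in dts_metadata]
    if missing:
        raise ValueError(f"dts_metadata is missing: {', '.join(missing)}")
```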

def split_dataset(dataset_path, test_size=0.33)

1. Parameters

  • dataset_path (String) - Full path to the saved dataset in HDF5

  • test_size (Float) - Fraction of the data allocated to the test set. Default is 0.33 (roughly one third)

2. Returns

  • DataFrame: X_train, X_test, y_train, y_test
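In Python projects this kind of split is commonly done with scikit-learn's `train_test_split`, but this page does not say what mleap uses internally. The stdlib-only sketch below shows the technique itself: shuffle the row indices, then take the first `ceil(test_size * n)` rows as the test set. `split_rows` is a hypothetical helper operating on lists rather than DataFrames, returning the four parts in the documented order.

```python
import math
import random


def split_rows(X, y, test_size=0.33, seed=None):
    """Shuffle rows and split X, y into (X_train, X_test, y_train, y_test)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_test = math.ceil(test_size * len(X))  # round the test count up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Shuffling indices once and reusing them for both X and y keeps every feature row aligned with its label.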
