mleap.shared.FilesIO

Notes

Main interface class for communicating with files on HDD. Currently, the functionality mainly covers interactions with an HDF5 database.

class mleap.shared.FilesIO(hdf5_filename, mode='a')

Parameters:

hdf5_filename (String) - Location of the HDF5 file on HDD
mode (String) - Mode for opening the HDF5 file. Available options are: “w”, “r”, “r+”, “a”, “w-”.
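
The mode strings listed above are the same ones accepted by h5py's File constructor, which suggests (the source does not confirm it) that the value is forwarded unchanged. As a rough guide, the sketch below records what each mode means in h5py and adds a hypothetical validate_mode helper that is not part of FilesIO:

```python
# Meaning of each mode as documented by h5py. Whether FilesIO forwards the
# value unchanged to h5py is an assumption, not something the source states.
HDF5_MODES = {
    "r": "read only; the file must exist",
    "r+": "read/write; the file must exist",
    "w": "create the file; truncate it if it exists",
    "w-": "create the file; fail if it exists",
    "a": "read/write if the file exists; create it otherwise",
}


def validate_mode(mode="a"):
    """Hypothetical helper: reject mode strings outside the documented set."""
    if mode not in HDF5_MODES:
        raise ValueError(
            f"Unsupported mode {mode!r}; expected one of {sorted(HDF5_MODES)}"
        )
    return mode
```

Under these semantics the default 'a' is the least destructive choice for repeated runs, while 'w' discards any existing file contents.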

Methods

def check_file_exists(dataset_name, file_type):

1. Parameters:

dataset_name (string) - Name of the dataset whose file is looked up on HDD
file_type (string - flag) - Acceptable flags are ML Model or Prediction. Depending on the flag, the path is appended to dataset_name

2. Return:

Unpickled file if it exists, False otherwise
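
The "unpickled object or False" return convention can be illustrated with the standard library alone. This is a minimal sketch, not the actual implementation: load_pickle_if_exists and the file names are hypothetical, and the way FilesIO builds the path from dataset_name and file_type is not shown.

```python
import os
import pickle
import tempfile


def load_pickle_if_exists(path):
    """Return the unpickled object stored at `path`, or False if no file is there."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage: a missing file yields False, an existing one yields its contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = load_pickle_if_exists(os.path.join(tmp_dir, "absent.pkl"))
    saved_path = os.path.join(tmp_dir, "present.pkl")
    with open(saved_path, "wb") as handle:
        pickle.dump({"strategy": "example"}, handle)
    loaded = load_pickle_if_exists(saved_path)
```

With this convention, callers are safer testing `result is False` than relying on truthiness, since a stored empty list or dictionary would also be falsy.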

def check_prediction_exists(dataset_name)

1. Parameters:

dataset_name (string) - Name of the dataset to check for existing predictions on the test set

2. Return:

Boolean

def load_predictions_for_dataset(dataset_name)

1. Parameters

dataset_name (string) - Path to dataset in HDF5

2. Return

Boolean

def save_trained_models_to_disk(trained_models, dataset_name)

1. Parameters

trained_models (Array) - Array of trained models, saved to disk as a pickle.
dataset_name (String) - name of dataset on which the models were trained.

2. Return

Nothing
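
Saving an array of models as a single pickle could look roughly like the sketch below. The helper name save_models, the base_dir argument, and the <dataset_name>.pkl file layout are assumptions made for illustration; unlike the documented method, the helper returns the path so that the round trip can be checked.

```python
import os
import pickle
import tempfile


def save_models(trained_models, dataset_name, base_dir):
    """Pickle `trained_models` to <base_dir>/<dataset_name>.pkl and return the path."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{dataset_name}.pkl")
    with open(path, "wb") as handle:
        pickle.dump(trained_models, handle)
    return path


# Plain dictionaries stand in for fitted estimators here.
models = [
    {"strategy": "A", "params": [0.1, 0.2]},
    {"strategy": "B", "params": [0.3]},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_models(models, "example_dataset", tmp_dir)
    with open(saved_path, "rb") as handle:
        restored = pickle.load(handle)
```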

def save_predictions_to_db(predictions, dataset_name)

1. Parameters

predictions (Array) - Array with predictions on test split
dataset_name (String) - name of dataset on which the predictions were made

2. Return

Nothing

def save_array_hdf5(group, datasets, array_names, array_meta)

1. Parameters

group (String) - location to save the data
datasets (Array) - information to save
array_names (String) - names of datasets

2. Return

Nothing

def save_prediction_accuracies_to_db(model_accuracies)

1. Parameters

model_accuracies (Array) - Array in the form of [Strategy Name, Prediction accuracy] for each learning strategy. The method appends the accuracies per strategy and per dataset to the HDF5 database.

def get_prediction_accuracies_per_strategy():

1. Parameters

None

2. Return

Dictionary with {key: Name of strategy; value: accuracy of strategy per dataset}
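
One plausible shape for this dictionary is a list of accuracies per strategy, one entry per dataset. The sketch below builds such a dictionary from in-memory records; group_accuracies, the record layout, and the strategy and dataset names are all hypothetical, and the real method reads its data from the HDF5 database instead.

```python
from collections import defaultdict


def group_accuracies(records):
    """Group (dataset, strategy, accuracy) records into {strategy: [accuracies]}."""
    per_strategy = defaultdict(list)
    for _dataset, strategy, accuracy in records:
        per_strategy[strategy].append(accuracy)
    return dict(per_strategy)


records = [
    ("dataset_1", "StrategyA", 0.91),
    ("dataset_1", "StrategyB", 0.87),
    ("dataset_2", "StrategyA", 0.78),
]
accuracies = group_accuracies(records)
# accuracies == {"StrategyA": [0.91, 0.78], "StrategyB": [0.87]}
```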

def save_ml_strategy_timestamps(timestamps_df, dataset_name)

1. Parameters

timestamps_df (DataFrame) - DataFrame with strategy name, begin time of experiment, and end time of experiment

def list_datasets(hdf5_group):

1. Parameters

hdf5_group (String) - Path to group in HDF5

2. Return

(String) of the elements in the group

def load_dataset_h5(dataset_name):

1. Parameters

dataset_name (String) - Location of dataset in hdf5

2. Return

NumPy array

def load_dataset(dataset_name):

1. Parameters

dataset_name (String) - full path to dataset in HDF5

2. Returns

Dataset (DataFrame), Metadata (Dictionary)

def save_datasets(datasets, datasets_save_paths, dts_metadata, verbose = False)

1. Parameters

datasets (Array of DataFrame objects) - datasets stored in an array as pandas DataFrame objects
datasets_save_paths (Array, String) - full save paths
dts_metadata (Dictionary) - minimum required values are: 1. name of dataset; 2. name of column with label

2. Return

Nothing

def split_dataset(dataset_path, test_size=0.33)

1. Parameters

dataset_path (String) - full path to saved dataset in HDF5
test_size (Float) - fraction of the dataset allocated to the test set. Default value is 0.33 (roughly 1/3)

2. Returns

DataFrame: X_train, X_test, y_train, y_test
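
The effect of test_size can be illustrated without any third-party library. The parameter name and default resemble scikit-learn's train_test_split, but the source does not say which routine is used, so the sketch below is a standard-library stand-in. The split_rows helper, the seed argument, and rounding the test size up are assumptions, and it returns two lists of rows rather than the four DataFrames listed above.

```python
import math
import random


def split_rows(rows, test_size=0.33, seed=0):
    """Shuffle `rows` and hold out a `test_size` fraction as the test set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Number of held-out rows, rounded up (an assumption of this sketch).
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = split_rows(range(100))
# 100 rows with test_size=0.33 give 67 training rows and 33 test rows.
```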
