
mleap.shared.FilesIO

Viktor Kazakov edited this page Dec 2, 2017 · 43 revisions

Notes

Main interface class for communicating with files on the HDD. Currently the functionality mainly covers interactions with the HDF5 database.

Methods

def check_file_exists(dataset_name, file_type):

1. Parameters:

dataset_name (string) - name of the dataset whose file is checked for existence on the HDD

file_type (string - flag) - acceptable flags are ML Model or Prediction. Depending on the flag, the corresponding path is appended to dataset_name

2. Return:

The unpickled file if it exists, False otherwise
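A minimal sketch of the check-then-unpickle behaviour described above, assuming the file is a pickle stored at a path built by appending a flag-dependent suffix to dataset_name. The suffix mapping `_FLAG_SUFFIXES` is hypothetical; the actual paths FilesIO uses are not documented on this page.

```python
import os
import pickle

# Hypothetical mapping from the file_type flag to the suffix appended to
# dataset_name; the real FilesIO path convention may differ.
_FLAG_SUFFIXES = {
    "ML Model": "_models.pkl",
    "Prediction": "_predictions.pkl",
}


def check_file_exists(dataset_name, file_type):
    """Return the unpickled object if the file exists, False otherwise."""
    if file_type not in _FLAG_SUFFIXES:
        raise ValueError(f"Unknown file_type flag: {file_type!r}")
    path = dataset_name + _FLAG_SUFFIXES[file_type]
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return pickle.load(f)
```

Because False doubles as the "missing" signal, callers should test the result with `is False` rather than by truthiness: an unpickled empty list is also falsy.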


def check_prediction_exists(dataset_name)

1. Parameters:

dataset_name (string) - name of the dataset for which the method checks whether predictions on the test set exist

2. Return:

Boolean


def load_predictions_for_dataset(dataset_name)

1. Parameters

dataset_name (string) - Path to dataset in HDF5

2. Return

Boolean


def save_trained_models_to_disk(trained_models, dataset_name)

1. Parameters

trained_models (Array) - array of trained models; saved to disk as a pickle

dataset_name (String) - name of the dataset on which the models were trained

2. Return

Nothing
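The pickling step can be sketched as follows, assuming the models are written to a single pickle file whose path is dataset_name plus a suffix. The `suffix` parameter and its default are hypothetical stand-ins for whatever path convention FilesIO actually uses.

```python
import os
import pickle


def save_trained_models_to_disk(trained_models, dataset_name, suffix="_models.pkl"):
    """Pickle the array of trained models to dataset_name + suffix.

    suffix is a hypothetical parameter; the real method only takes
    trained_models and dataset_name. Returns nothing.
    """
    path = dataset_name + suffix
    directory = os.path.dirname(path)
    if directory:
        # Create missing parent directories so the write does not fail.
        os.makedirs(directory, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(list(trained_models), f)
```

Every model in the array must be picklable; most fitted estimator objects are, but lambdas and open file handles are not.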


def save_predictions_to_db(predictions, dataset_name)

1. Parameters

predictions (Array) - array with predictions on the test split

dataset_name (String) - name of the dataset on which the predictions were made

2. Return

Nothing


def save_array_hdf5(group, datasets, array_names, array_meta)

1. Parameters

group (String) - location to save the data

datasets (Array) - information to save

array_names (String) - names of datasets

2. Return

Nothing


def save_prediction_accuracies_to_db(model_accuracies)

1. Parameters

model_accuracies (Array) - array of [Strategy Name, Prediction accuracy] pairs, one per learning strategy. The method appends the accuracies per strategy and per dataset to the HDF5 database


def get_prediction_accuracies_per_strategy():

1. Parameters

None

2. Return

Dictionary with {key: name of strategy; value: accuracy of strategy per dataset}
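The regrouping from per-dataset [Strategy Name, Prediction accuracy] pairs into a per-strategy dictionary can be sketched like this. The real method takes no parameters and reads from HDF5; here a plain dict `stored_accuracies` (hypothetical) stands in for the HDF5 contents, and the nested `{strategy: {dataset: accuracy}}` value shape is an assumption about what "accuracy per dataset" looks like.

```python
from collections import defaultdict


def get_prediction_accuracies_per_strategy(stored_accuracies):
    """Regroup per-dataset accuracies by strategy.

    stored_accuracies: hypothetical in-memory stand-in for the HDF5 contents,
    mapping dataset name -> list of [strategy_name, accuracy] pairs.
    Returns {strategy_name: {dataset_name: accuracy}}.
    """
    per_strategy = defaultdict(dict)
    for dataset_name, pairs in stored_accuracies.items():
        for strategy_name, accuracy in pairs:
            per_strategy[strategy_name][dataset_name] = accuracy
    # Convert to a plain dict so missing strategies raise KeyError on lookup.
    return dict(per_strategy)
```

Keying the inner value by dataset name keeps each accuracy tied to its dataset even when a strategy was not run on every dataset.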


def save_ml_strategy_timestamps(timestamps_df, dataset_name)

1. Parameters

timestamps_df (DataFrame) - DataFrame with the strategy name, the begin time and the end time of the experiment
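One way to collect the rows for timestamps_df is to record wall-clock times around each strategy run. This is a sketch under assumptions: `time_strategies` and its `run` callable are hypothetical helpers, not part of FilesIO, and the rows would still need to be wrapped in a pandas DataFrame (for example `pd.DataFrame(rows, columns=[...])`, column names of your choosing) before calling save_ml_strategy_timestamps.

```python
from datetime import datetime


def time_strategies(strategies, run):
    """Record begin and end times for each strategy.

    strategies: iterable of strategy names.
    run: hypothetical callable that runs one strategy given its name.
    Returns a list of [strategy_name, begin_time, end_time] rows.
    """
    rows = []
    for name in strategies:
        begin = datetime.now()
        run(name)
        end = datetime.now()
        rows.append([name, begin, end])
    return rows
```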


def list_datasets(hdf5_group):

1. Parameters

hdf5_group (String) - Path to group in HDF5

2. Return

String of the elements in the group


def load_dataset(dataset_name):

1. Parameters

dataset_name (String) - full path to dataset in HDF5

2. Returns

Dataset (DataFrame), Metadata (Dictionary)


def save_datasets(datasets, datasets_save_paths, dts_metadata, verbose = False)

1. Parameters

datasets (Array of DataFrame objects) - datasets stored in an array as pandas DataFrame objects

datasets_save_paths (Array of String) - full save paths, one per dataset

dts_metadata (Dictionary) - minimum required values are: 1. name of the dataset; 2. name of the label column

2. Return

Nothing
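Before writing anything, the inputs can be checked for consistency: one save path per dataset, and metadata containing at least the dataset name and the label column. A minimal sketch; the helper `plan_dataset_saves` and the key names in `REQUIRED_METADATA_KEYS` are hypothetical, since this page does not say which dictionary keys FilesIO expects.

```python
# Hypothetical key names; the wiki only says the metadata must contain the
# dataset name and the name of the label column.
REQUIRED_METADATA_KEYS = ("dataset_name", "class_name")


def plan_dataset_saves(datasets, datasets_save_paths, dts_metadata):
    """Validate inputs and return (save_path, dataset) pairs.

    Raises ValueError if the numbers of datasets and save paths differ or if
    dts_metadata lacks one of the required keys.
    """
    if len(datasets) != len(datasets_save_paths):
        raise ValueError(
            f"{len(datasets)} datasets but {len(datasets_save_paths)} save paths"
        )
    missing = [k for k in REQUIRED_METADATA_KEYS if k not in dts_metadata]
    if missing:
        raise ValueError(f"dts_metadata is missing required keys: {missing}")
    return list(zip(datasets_save_paths, datasets))
```

Checking lengths explicitly matters because `zip` silently stops at the shorter input, which would otherwise drop datasets without any error.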


def split_dataset(dataset_path, test_size=0.33)

1. Parameters

dataset_path (String) - full path to the saved dataset in HDF5

test_size (Float) - fraction of the dataset allocated to the test set. Default value is 0.33 (roughly 1/3)

2. Returns

DataFrame: X_train, X_test, y_train, y_test
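The split itself can be sketched with the standard library. Plain lists stand in for DataFrames here, and `split_rows` and its `seed` parameter are hypothetical; whether FilesIO delegates to scikit-learn is not stated on this page. The return order X_train, X_test, y_train, y_test matches scikit-learn's `train_test_split`, and the test set gets ceil(test_size * n) rows, which is how scikit-learn rounds a float test_size.

```python
import math
import random


def split_rows(X, y, test_size=0.33, seed=None):
    """Shuffle and split X and y into train and test parts.

    Returns X_train, X_test, y_train, y_test. The test set gets
    ceil(test_size * n) rows; the remaining rows go to the train set.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    indices = list(range(len(X)))
    # A seeded Random instance makes the split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * len(X))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Shuffling one index list and applying it to both X and y keeps every feature row aligned with its label; with DataFrames the same indices would be applied through `.iloc`.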


def split_and_save(dataset_paths, save_loc, test_size=0.33)
