mleap.shared.FilesIO
Main interface class for communicating with files on the HDD. Currently the functionality mainly covers interactions with an HDF5 database.
class mleap.shared.FilesIO(hdf5_filename, mode='a')
Parameters:
- hdf5_filename (String) - location of the HDF5 file on the HDD
- mode (String) - mode for opening the HDF5 file. Available options are: "w", "r", "r+", "a", "w-".
def check_file_exists(dataset_name, file_type)
1. Parameters:
- dataset_name (String) - name of the dataset whose file is checked for existence on the HDD
- file_type (String, flag) - accepted flags are ML Model or Prediction. Depending on the flag, the corresponding path is appended to dataset_name
2. Return:
- Unpickled file if it exists, False otherwise
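The check-then-unpickle behaviour described above can be sketched with the standard library alone. This is not the mleap implementation: the function name `load_pickle_if_exists`, the flag strings and the suffix each flag maps to are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical mapping from flag to the path suffix appended to dataset_name.
# The real flag values and paths used by FilesIO may differ.
SUFFIXES = {"ml_model": "_models.pkl", "prediction": "_predictions.pkl"}


def load_pickle_if_exists(base_dir, dataset_name, file_type):
    """Return the unpickled object if the file exists, False otherwise."""
    path = os.path.join(base_dir, dataset_name + SUFFIXES[file_type])
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage: one dataset has a stored prediction, the other does not.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "iris_predictions.pkl"), "wb") as handle:
    pickle.dump([0, 1, 1], handle)

found = load_pickle_if_exists(demo_dir, "iris", "prediction")
missing = load_pickle_if_exists(demo_dir, "wine", "prediction")
```

One caveat of a "value or False" return convention: an empty list is also falsy, so callers are safer testing `result is False` than `not result`.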
def check_prediction_exists(dataset_name)
1. Parameters:
- dataset_name (String) - name of the dataset for which to check whether a prediction on the test set exists
2. Return:
- Boolean
def load_predictions_for_dataset(dataset_name)
1. Parameters
- dataset_name (String) - path to the dataset in HDF5
2. Return
- Boolean
def save_trained_models_to_disk(trained_models, dataset_name)
1. Parameters
- trained_models (Array) - array of trained models, saved to disk as a pickle
- dataset_name (String) - name of the dataset on which the models were trained
2. Return
- Nothing
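As a rough illustration of storing an array of models as a single pickle, the sketch below uses only the standard library. The helper name `save_models`, the output directory argument and the `_models.pkl` file naming are hypothetical, and unlike the documented method it returns the path, purely so the example can read the file back.

```python
import os
import pickle
import tempfile


def save_models(trained_models, dataset_name, out_dir):
    """Pickle the whole list of models into one file named after the dataset."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, dataset_name + "_models.pkl")  # hypothetical naming
    with open(path, "wb") as handle:
        pickle.dump(trained_models, handle)
    return path


# Plain dicts stand in for fitted estimators here.
models = [{"strategy": "svm", "C": 1.0}, {"strategy": "random_forest", "n_trees": 10}]
saved_path = save_models(models, "iris", tempfile.mkdtemp())

# Read the file back to confirm the round trip.
with open(saved_path, "rb") as handle:
    restored = pickle.load(handle)
```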
def save_predictions_to_db(predictions, dataset_name)
1. Parameters
- predictions (Array) - array with predictions on the test split
- dataset_name (String) - name of the dataset on which the predictions were made
2. Return
- Nothing
def save_array_hdf5(group, datasets, array_names, array_meta)
1. Parameters
- group (String) - location to save the data
- datasets (Array) - information to save
- array_names (String) - names of datasets
2. Return
- Nothing
def save_prediction_accuracies_to_db(model_accuracies)
1. Parameters
- model_accuracies (Array) - array in the form of [Strategy Name, Prediction accuracy] for each learning strategy. The method appends the accuracies per strategy and per dataset in the HDF5 database
def get_prediction_accuracies_per_strategy()
1. Parameters
- None
2. Return
- Dictionary with {key: name of strategy; value: accuracy of strategy per dataset}
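To make the shape of that dictionary concrete, here is a minimal sketch of grouping [Strategy Name, Prediction accuracy] pairs by strategy. It assumes the value is a list with one accuracy per dataset, which the description above suggests but does not state outright; the function name and the numbers are made up.

```python
from collections import defaultdict


def accuracies_per_strategy(results_by_dataset):
    """Collect, for every strategy, its accuracy on each dataset in input order."""
    per_strategy = defaultdict(list)
    for model_accuracies in results_by_dataset:
        for strategy_name, accuracy in model_accuracies:
            per_strategy[strategy_name].append(accuracy)
    return dict(per_strategy)


# Two datasets, each with [Strategy Name, Prediction accuracy] pairs (made-up numbers).
results = [
    [["svm", 0.91], ["random_forest", 0.88]],
    [["svm", 0.75], ["random_forest", 0.80]],
]
summary = accuracies_per_strategy(results)
```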
def save_ml_strategy_timestamps(timestamps_df, dataset_name)
1. Parameters
- timestamps_df (DataFrame) - DataFrame with the strategy name, begin time of the experiment and end time of the experiment
def list_datasets(hdf5_group)
1. Parameters
- hdf5_group (String) - path to the group in HDF5
2. Return
- (String) of the elements in the group
def load_dataset_h5(dataset_name)
1. Parameters
- dataset_name (String) - location of the dataset in HDF5
2. Return
- NumPy array
def load_dataset(dataset_name)
1. Parameters
- dataset_name (String) - full path to the dataset in HDF5
2. Returns
- Dataset (DataFrame), Metadata (Dictionary)
def save_datasets(datasets, datasets_save_paths, dts_metadata, verbose = False)
1. Parameters
- datasets (Array of DataFrame objects) - datasets stored in an array as pandas DataFrame objects
- datasets_save_paths (Array, String) - full save paths
- dts_metadata (Dictionary) - must contain at least: 1. name of the dataset; 2. name of the column with the label
2. Return
- Nothing
def split_dataset(dataset_path, test_size=0.33)
1. Parameters
- dataset_path (String) - full path to the saved dataset in HDF5
- test_size (Float) - fraction of the data allocated to the test set. Default value is 0.33 (roughly 1/3)
2. Returns
- DataFrame: X_train, X_test, y_train, y_test
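The four-way return value can be illustrated with a small standard-library sketch. The signature and return order resemble scikit-learn's `train_test_split`, which the method may well wrap, but that is an assumption. Here plain lists stand in for DataFrames, the helper name `split_rows` and its `seed` parameter are hypothetical, and the test-set size is simply rounded up.

```python
import math
import random


def split_rows(features, labels, test_size=0.33, seed=0):
    """Shuffle row indices and split them into train and test parts."""
    n_rows = len(features)
    n_test = math.ceil(n_rows * test_size)  # number of rows in the test set
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Nine rows with two features each; the label is the parity of the first feature.
features = [[i, i * 2] for i in range(9)]
labels = [i % 2 for i in range(9)]
X_train, X_test, y_train, y_test = split_rows(features, labels)
```

With nine rows and the default `test_size`, three rows go to the test set and six to the training set, and each label stays aligned with its feature row.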