mleap.shared.FilesIO
Main interface class for communicating with files on the HDD. The current functionality mainly covers interactions with the HDF5 database.
def check_file_exists(dataset_name, file_type):
1. Parameters:
dataset_name (string) - Name of the dataset whose file is looked up on the HDD
file_type (string - flag) - Acceptable flags are ML Model or Prediction. Depending on the flag, the matching path is appended to dataset_name
2. Return:
The unpickled file contents if the file exists, False otherwise
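The lookup described above can be sketched as follows. This is a minimal illustration, not the mleap implementation: the `SUBPATHS` mapping from flag to sub-path, the `base_dir` argument, and the `.pkl` file name are all assumptions made for the example.

```python
import os
import pickle
import tempfile

# Hypothetical mapping from the file_type flag to a sub-path; the real
# locations used by mleap are not documented here.
SUBPATHS = {"ML Model": "trained_models", "Prediction": "predictions"}


def check_file_exists(dataset_name, file_type, base_dir="."):
    """Return the unpickled object if the file exists, False otherwise."""
    if file_type not in SUBPATHS:
        raise ValueError(f"Unknown file_type flag: {file_type!r}")
    path = os.path.join(base_dir, SUBPATHS[file_type], dataset_name + ".pkl")
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage: write a placeholder "model" to a temporary directory, then look it up.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "trained_models"))
    with open(os.path.join(tmp, "trained_models", "iris.pkl"), "wb") as handle:
        pickle.dump({"strategy": "placeholder"}, handle)

    found = check_file_exists("iris", "ML Model", base_dir=tmp)
    missing = check_file_exists("iris", "Prediction", base_dir=tmp)
```

Because the return value is either the stored object or False, callers are safer testing `result is False` than relying on truthiness: an empty list or dictionary loaded from disk would also evaluate as falsy.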
def check_prediction_exists(dataset_name)
1. Parameters:
dataset_name (string) - Name of the dataset for which to check whether a prediction on the test set exists
2. Return:
Boolean
def load_predictions_for_dataset(dataset_name)
1. Parameters
dataset_name (string) - Path to dataset in HDF5
2. Return
Boolean
def save_trained_models_to_disk(trained_models, dataset_name)
1. Parameters
trained_models (Array) - Array of trained models, saved to disk as a pickle
dataset_name (String) - Name of the dataset on which the models were trained
2. Return
Nothing
def save_predictions_to_db(predictions, dataset_name)
1. Parameters
predictions (Array) - Array with predictions on test split
dataset_name (String) - name of dataset on which the predictions were made
2. Return
Nothing
def save_array_hdf5(group, datasets, array_names, array_meta)
1. Parameters
group (String) - location to save the data
datasets (Array) - information to save
array_names (String) - names of datasets
2. Return
Nothing
def save_prediction_accuracies_to_db(model_accuracies)
1. Parameters
model_accuracies (Array) - Array of [Strategy Name, Prediction accuracy] pairs, one per learning strategy. The method appends the accuracies per strategy and per dataset to the HDF5 database
def get_prediction_accuracies_per_strategy():
1. Parameters
None
2. Return
Dictionary with {key: name of strategy; value: accuracy of strategy per dataset}
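To illustrate the shape of the returned dictionary, here is a small sketch that groups [Strategy Name, Prediction accuracy] pairs by strategy. It works on in-memory lists rather than the HDF5 database; the helper name `group_accuracies_by_strategy`, the strategy names, and the accuracy values are invented for the example.

```python
from collections import defaultdict


def group_accuracies_by_strategy(accuracies_per_dataset):
    """Collect [strategy name, accuracy] pairs into {strategy: [accuracies]}."""
    per_strategy = defaultdict(list)
    for model_accuracies in accuracies_per_dataset:
        for strategy_name, accuracy in model_accuracies:
            per_strategy[strategy_name].append(accuracy)
    return dict(per_strategy)


# Made-up accuracies for two datasets, in [Strategy Name, Prediction accuracy] form.
dataset_a = [["RandomForest", 0.91], ["SVM", 0.88]]
dataset_b = [["RandomForest", 0.84], ["SVM", 0.90]]

result = group_accuracies_by_strategy([dataset_a, dataset_b])
```

Each value is a list with one accuracy per dataset, in the order the datasets were supplied.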
def save_ml_strategy_timestamps(timestamps_df, dataset_name)
1. Parameters
timestamps_df (DataFrame) - DataFrame with the strategy name, begin time of the experiment, and end time of the experiment
def list_datasets(hdf5_group):
1. Parameters
hdf5_group (String) - Path to group in HDF5
2. Return
String of the elements in the group
def load_dataset(dataset_name):
1. Parameters
dataset_name (String) - full path to dataset in HDF5
2. Returns
Dataset (DataFrame), Metadata (Dictionary)
def save_datasets(datasets, datasets_save_paths, dts_metadata, verbose = False)
1. Parameters
datasets (Array of DataFrame objects) - datasets stored in an array as pandas DataFrame objects
datasets_save_paths (Array, String) - full save paths
dts_metadata (Dictionary) - minimum required entries are: 1. name of dataset; 2. name of column with label
2. Return
Nothing
def split_dataset(dataset_path, test_size=0.33)
1. Parameters
dataset_path (String) - full path to saved dataset in HDF5
test_size (Float) - fraction of the data allocated to the test set. Default value is 0.33 (roughly one third)
2. Returns
DataFrame: X_train, X_test, y_train, y_test
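The effect of `test_size` and the order of the four returned pieces can be sketched with plain Python lists standing in for the DataFrame. This is an illustration under stated assumptions, not the mleap implementation: the helper name `split_rows`, the `seed` argument, and the rounding rule are hypothetical. The return order matches that of scikit-learn's `train_test_split`, though whether mleap relies on it is not stated here.

```python
import random


def split_rows(features, labels, test_size=0.33, seed=0):
    """Shuffle row indices and split them into train and test parts."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    # Assumed rounding rule: nearest whole number of test rows.
    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Twelve toy rows; the first feature doubles as the row id.
features = [[i, i * 2] for i in range(12)]
labels = [i % 2 for i in range(12)]
X_train, X_test, y_train, y_test = split_rows(features, labels)
```

With twelve rows and the default `test_size`, four rows go to the test set and eight to the training set, and each label stays aligned with its feature row.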