Machine Learning Model Input Files
This page discusses an example of the ML model file, which can be provided with the premium version of D-SPACE4Cloud. This file usually speeds up the optimization significantly by providing a promising initial solution to the Hill Climbing algorithm implemented in the tool. The JSON object it contains provides all the parameters needed to apply a support vector regression (SVR) model.
The listing below contains a dictionary that maps each query (here Q1, see DICE Deliverable D3.9) to its ML profile. Each profile, in turn, contains all the parameters needed to apply a linear SVR model: the mean (mu) and standard deviation (sigma) used to normalize each feature and the response time to be predicted (mu_t and sigma_t), plus the constant term (b) and the coefficients (w) of the regression line.
```json
{
  "Q1": {
    "b": -0.0054638,
    "mu_t": 432230.122,
    "sigma_t": 170134.948,
    "mlFeatures": {
      "avgTask_S0": {
        "w": 0.012694,
        "mu": 1632.769,
        "sigma": 159.488
      },
      "avgTask_S1": {
        "w": -0.016927,
        "mu": 1527.671,
        "sigma": 206.061
      },
      "x": {
        "w": 0.71995,
        "mu": 0.042528,
        "sigma": 0.021299
      },
      "h": {
        "w": 0.12854,
        "mu": 2511.447,
        "sigma": 409.724
      }
    }
  }
}
```
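The listing can be read with Python's standard `json` module. The sketch below is a minimal illustration, not D-SPACE4Cloud's own loader: it turns each query profile into small dataclasses. The names `FeatureParams`, `MLProfile`, and `load_profiles` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class FeatureParams:
    """SVR coefficient and Z-score parameters of a single feature."""
    w: float
    mu: float
    sigma: float


@dataclass
class MLProfile:
    """Linear SVR profile of one query (hypothetical container type)."""
    b: float
    mu_t: float
    sigma_t: float
    features: dict  # feature name -> FeatureParams


def load_profiles(text: str) -> dict:
    """Parse the ML model JSON into a {query name: MLProfile} dictionary."""
    raw = json.loads(text)
    profiles = {}
    for query, p in raw.items():
        features = {
            name: FeatureParams(f["w"], f["mu"], f["sigma"])
            for name, f in p["mlFeatures"].items()
        }
        profiles[query] = MLProfile(p["b"], p["mu_t"], p["sigma_t"], features)
    return profiles


# Compact copy of the Q1 profile from the listing.
SAMPLE = """
{"Q1": {"b": -0.0054638, "mu_t": 432230.122, "sigma_t": 170134.948,
  "mlFeatures": {
    "avgTask_S0": {"w": 0.012694, "mu": 1632.769, "sigma": 159.488},
    "avgTask_S1": {"w": -0.016927, "mu": 1527.671, "sigma": 206.061},
    "x": {"w": 0.71995, "mu": 0.042528, "sigma": 0.021299},
    "h": {"w": 0.12854, "mu": 2511.447, "sigma": 409.724}}}}
"""

profiles = load_profiles(SAMPLE)
q1 = profiles["Q1"]
print(sorted(q1.features))  # ['avgTask_S0', 'avgTask_S1', 'h', 'x']
```

Keeping the per-feature parameters together in one record mirrors the nesting of the JSON, so each feature's w, mu, and sigma travel as a unit.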
Recall that the goal of SVR is to fit a regression line of the form:
$$t = w^{T} \zeta + b$$
so that most data points lie within a stripe centered around it, while minimizing both the magnitude of the line coefficients and the distance of the remaining points from the stripe. Here t is the response time to predict, whereas ζ is the vector of all the features described in the mlFeatures dictionary. In this example, both w and ζ have four elements.
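As a worked illustration, the Q1 profile expands the regression line into four terms. This assumes, consistently with the normalization requirement discussed in the next paragraph, that the listed w and b act on Z-scored features and response time (marked here with a tilde):

$$\tilde{t} = 0.012694\,\tilde{\zeta}_{\mathrm{avgTask\_S0}} - 0.016927\,\tilde{\zeta}_{\mathrm{avgTask\_S1}} + 0.71995\,\tilde{\zeta}_{x} + 0.12854\,\tilde{\zeta}_{h} - 0.0054638$$

Because the inputs are standardized, the coefficients are directly comparable: in this profile, x carries by far the largest weight.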
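Moreover, for reasons of numerical stability, it is advisable to normalize both the features and the predicted variable when using SVR. D-SPACE4Cloud obtains normalized values via Z scores:
$$\tilde{\zeta}_i = \frac{\zeta_i - \mu_i}{\sigma_i}, \qquad \tilde{t} = \frac{t - \mu_t}{\sigma_t}$$
which is why the JSON stores a mean and a standard deviation for every feature and for the response time.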
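Putting the pieces together, a prediction might be computed as in the sketch below. It assumes, without confirmation from this page, that the model is evaluated entirely in normalized space. Each raw feature is Z-scored with its mu and sigma. The linear SVR line then gives a normalized response time, which mu_t and sigma_t map back to the original scale. The function names `z_score` and `predict_response_time` are hypothetical.

```python
# Q1 profile from the listing, as a Python dict.
Q1 = {
    "b": -0.0054638,
    "mu_t": 432230.122,
    "sigma_t": 170134.948,
    "mlFeatures": {
        "avgTask_S0": {"w": 0.012694, "mu": 1632.769, "sigma": 159.488},
        "avgTask_S1": {"w": -0.016927, "mu": 1527.671, "sigma": 206.061},
        "x": {"w": 0.71995, "mu": 0.042528, "sigma": 0.021299},
        "h": {"w": 0.12854, "mu": 2511.447, "sigma": 409.724},
    },
}


def z_score(value: float, mu: float, sigma: float) -> float:
    """Normalize a raw value: (value - mu) / sigma."""
    return (value - mu) / sigma


def predict_response_time(profile: dict, raw_features: dict) -> float:
    """Apply a linear SVR profile to raw (un-normalized) feature values.

    Assumption: w and b act on Z-scored features and a Z-scored response time.
    """
    t_norm = profile["b"]
    for name, params in profile["mlFeatures"].items():
        zeta_norm = z_score(raw_features[name], params["mu"], params["sigma"])
        t_norm += params["w"] * zeta_norm
    # Undo the Z score on the response time: t = mu_t + sigma_t * t_norm
    return profile["mu_t"] + profile["sigma_t"] * t_norm


# At the feature means every normalized feature is 0, so t_norm equals b.
at_means = {name: p["mu"] for name, p in Q1["mlFeatures"].items()}
print(predict_response_time(Q1, at_means))  # about 431300.54
```

Evaluating at the feature means is a convenient sanity check, since the prediction then reduces to mu_t + sigma_t * b.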
Since the main purpose of the optimizer is to determine the optimal concurrency level and resource allocation, the mlFeatures dictionary must provide at least the feature h, which captures the contribution of concurrency, and the feature x, which captures the influence of additional CPU cores on the execution time.
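Because h and x are mandatory, a loader might reject incomplete profiles before the optimizer runs. The check below is a hypothetical sketch: the function name `check_required_features` and the error message are illustrative, not part of D-SPACE4Cloud.

```python
REQUIRED_FEATURES = ("h", "x")  # concurrency and CPU-core features


def check_required_features(profiles: dict) -> None:
    """Raise ValueError if any query profile lacks a mandatory feature.

    `profiles` maps query names (e.g. "Q1") to dicts with an "mlFeatures" key,
    mirroring the JSON layout of the ML model file.
    """
    for query, profile in profiles.items():
        features = profile.get("mlFeatures", {})
        missing = [f for f in REQUIRED_FEATURES if f not in features]
        if missing:
            raise ValueError(f"{query}: mlFeatures is missing {', '.join(missing)}")


ok = {"Q1": {"mlFeatures": {"h": {}, "x": {}, "avgTask_S0": {}}}}
check_required_features(ok)  # passes silently

bad = {"Q2": {"mlFeatures": {"avgTask_S0": {}}}}
try:
    check_required_features(bad)
except ValueError as err:
    print(err)  # Q2: mlFeatures is missing h, x
```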
Copyright © 2017 Politecnico di Milano