Optimization Tool Usage Scenarios
Marco Ieni edited this page Aug 31, 2017
The DICE Optimisation tool allows designing public or private cloud clusters to support data-intensive applications based on Hadoop, Spark, or Storm technologies. Two main analyses can be conducted:
- Public Cloud Analysis - In this scenario the ARCHITECT wants to analyze the case in which the whole Big Data cluster is provisioned on a public cloud. The first consequence of this choice is that the virtualized resources (i.e., VMs) available to provision the cluster can be considered practically infinite for our purposes. Under the common assumption that rejecting a job has a much higher cost than leasing VMs, no job rejection policy is applied in this case. Consequently, the concurrency level for each job, i.e., the number of concurrent users running the job, can be set arbitrarily high, since it is always (theoretically) possible to provision a cluster able to handle the load.
In this scenario, the ARCHITECT may want to know which machine type to select, and how many instances of it, in order to execute the application at a certain concurrency level, i.e., with several similar applications running in the cluster at the same time. She/he might also want to know which cloud provider is cheapest, considering that providers also have different pricing models. For this reason, she/he has to feed the tool with the DTSM diagrams, a list of providers, and a list of VM types for each provider. Note that the information required in the profile of the previous release of D-SPACE4Cloud (see DICE Deliverable D3.8), e.g., the number of maps, number of reducers, average map time, average reduce time, etc., is now available within the DTSMs (in particular, every DTSM includes this information for each candidate VM type).
- Private Cloud Analysis - In this case the cluster is provisioned in house, which radically changes the problem. The resources available to build a cluster are generally limited by the hardware on hand. Therefore, the resource provisioning problem must account for the possibility of exhausting the computational capacity (memory and CPUs) before a cluster capable of satisfying a given concurrency level and deadlines can be provisioned. In such a situation the ARCHITECT can consider two sub-scenarios:
- Allowing job rejection, that is, accepting that a certain number of jobs may be rejected (consequently lowering the concurrency level), i.e., introducing an Admission Control (AC) mechanism. In this case, since the overall capacity is limited, the system reacts to excess load by rejecting jobs; this has an impact on execution costs, as it is fair to assume that pecuniary penalties are associated with rejection.
- Denying job rejection, that is, imposing that a certain concurrency level must be respected. This translates into a strong constraint that may not be satisfiable with the resources at hand.
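As a toy illustration of the trade-off described above, the sketch below brute-forces the cheapest (provider, VM type, count) for a target concurrency level, and falls back to admitting fewer jobs when a VM cap (the private-cloud case) is hit. The catalog, prices, and per-VM concurrency estimates are invented placeholders, not D-SPACE4Cloud's actual optimization model.

```python
# Toy sketch: pick the cheapest (provider, VM type, count) able to sustain
# a target concurrency level. All figures below are invented placeholders,
# NOT real provider prices or the actual D-SPACE4Cloud model.
from math import ceil

# (provider, vm_type) -> (hourly_cost, jobs_each_vm_can_sustain)
CATALOG = {
    ("ProviderA", "small"):  (0.10, 2),
    ("ProviderA", "large"):  (0.35, 8),
    ("ProviderB", "medium"): (0.20, 5),
}

def cheapest_cluster(concurrency, max_vms=None):
    """Return the cheapest configuration meeting `concurrency`.

    If `max_vms` is set (private-cloud case) and no configuration fits,
    some jobs are rejected (Admission Control) and the configuration
    serving the most jobs at the lowest cost wins.
    """
    best = None  # (sort_key, provider, vm_type, n_vms, jobs_served)
    for (prov, vm), (price, per_vm_capacity) in CATALOG.items():
        n = ceil(concurrency / per_vm_capacity)
        if max_vms is not None and n > max_vms:
            n = max_vms                              # capacity exhausted
        served = min(concurrency, n * per_vm_capacity)  # jobs admitted
        key = (-served, n * price)                   # max served, then min cost
        if best is None or key < best[0]:
            best = (key, prov, vm, n, served)
    _, prov, vm, n, served = best
    return prov, vm, n, served

print(cheapest_cluster(10))              # public cloud: VMs unlimited
print(cheapest_cluster(10, max_vms=1))   # private cloud: jobs get rejected
```

In the unconstrained call every configuration serves all 10 jobs, so price alone decides; with `max_vms=1` the search instead maximizes the jobs admitted, mimicking the AC mechanism described above.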
Another aspect not to be underestimated is that data centers are made of physical machines, each with a certain amount of memory and cores. D-SPACE4Cloud considers the linear relaxation of the problem and estimates the total available capacity as the sum of the memory and the number of CPUs across all machines.
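A minimal sketch of that relaxation (machine sizes below are invented): the data center is treated as one pool whose capacity is the plain sum over physical machines, ignoring how VMs would actually be packed onto individual hosts.

```python
# Toy sketch of the linear relaxation: total capacity is the sum of memory
# and cores over all physical machines; per-host packing constraints are
# ignored. Machine sizes are invented for illustration.

machines = [
    {"cores": 16, "memory_gb": 64},
    {"cores": 32, "memory_gb": 128},
    {"cores": 16, "memory_gb": 64},
]

total_cores = sum(m["cores"] for m in machines)        # 64
total_memory = sum(m["memory_gb"] for m in machines)   # 256

def fits(vm_cores, vm_memory_gb, n_vms):
    """Under the relaxation, n_vms VMs fit iff their aggregate demand fits."""
    return (n_vms * vm_cores <= total_cores
            and n_vms * vm_memory_gb <= total_memory)

print(fits(4, 16, 16))   # aggregate demand exactly fills the pool
print(fits(4, 16, 17))   # exceeds the pool, so provisioning fails
```

Note that a real scheduler might reject the first case too (e.g., if no single host has 16 GB free), which is exactly the approximation the relaxation accepts.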
Copyright © 2017 Politecnico di Milano