
\section{Conclusion} \label{sec:conclusion}

In this paper we have presented the \textit{Batch Task Migration} approach for distributed global rescheduling.
It balances system load by migrating multiple work units from a source to the same destination, which is intended to preserve task communication locality.
Communication efficiency is thus maintained, whereas other workload-aware strategies perform rescheduling without considering task locality.

Our approach also mitigates communication costs during the execution of the rescheduling algorithm itself.
We achieve this by transmitting information about multiple migrations at a time, in \textit{batches}.
As a result, our novel scheduler, \textit{PackDrop} (presented in Section~\ref{sec:algo:main}), performs better on platforms with high communication overhead, as discussed in Section~\ref{sec:cluster}.

We have evaluated our strategy in two different execution environments. 
The first was a cluster with high communication costs and $4$ cores per node, on which we executed over $32$ cores.
In this scenario, \textit{PackDrop} achieved rescheduling speedups of up to $3.75$ and $1.15$ over centralized and distributed approaches, respectively (Section~\ref{sec:cluster}).

The second scenario was a tightly coupled cluster with low communication overhead and $24$ cores per node.
We executed our experiments varying the platform size from $16$ to $32$ nodes.
In this scenario, the rescheduling times of \textit{PackDrop} and \textit{Distributed} were very similar, and both were up to $3$ orders of magnitude faster than any centralized approach. %Add some data to reinforce this
This reinforces the relevance of work in the distributed scheduling domain, and of approaches such as our \textit{Batch Task Migration}.

\subsection{Future Work}

Future work on this theme includes the use of \textit{Batch Task Migration} in the communication-aware domain.
Since our approach already has locality-based benefits, combining it with communication pattern information may yield even greater performance gains in applications~\cite{Unat2017localitysurvey,commaware}.
We believe a novel strategy focused on the \textit{Stencil} programming model is worth considering, one that prioritizes the migration of edges among PEs instead of random parts of the stencil~\cite{stenciltiling}.

We also plan further work to improve performance in heterogeneous clusters.
These may have heterogeneous processing capacities and network capabilities, which significantly increases the complexity of load balancing~\cite{Beri2015hetws,Cheriere2015hetdist}.
In this scenario, improving the rescheduling decision process may be crucial to ensure gains in application performance.
Finally, we also intend to evaluate the impact of fine-tuning the $ps$ factor in different computing platforms.