HotI draft
scottatchley edited this page Apr 28, 2011
%%% This is a skeleton file demonstrating the use of IEEEtran.cls
%% (requires IEEEtran.cls version 1.7 or later) with an IEEE conference paper.
\documentclass[conference]{IEEEtran}
% If IEEEtran.cls has not been installed into the LaTeX system files,
% manually specify the path to it like:
% \documentclass[conference]{../sty/IEEEtran}
% Some very useful LaTeX packages include:
% (uncomment the ones you want to load)
\usepackage{cite}
% cite.sty was written by Donald Arseneau
% V1.6 and later of IEEEtran pre-defines the format of the cite.sty package
% \cite{} output to follow that of IEEE. Loading the cite package will
% result in citation numbers being automatically sorted and properly
% "compressed/ranged". e.g., [1], [9], [2], [7], [5], [6] without using
% cite.sty will become [1], [2], [5]--[7], [9] using cite.sty. cite.sty's
% \cite will automatically add leading space, if needed. Use cite.sty's
% noadjust option (cite.sty V3.8 and later) if you want to turn this off.
% cite.sty is already installed on most LaTeX systems. Be sure and use
% version 4.0 (2003-05-27) and later if using hyperref.sty. cite.sty does
% not currently provide for hyperlinked citations.
% The latest version can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/cite/
% The documentation is contained in the cite.sty file itself.
\usepackage{url}
% url.sty was written by Donald Arseneau. It provides better support for
% handling and breaking URLs. url.sty is already installed on most LaTeX
% systems. The latest version can be obtained at:
% http://www.ctan.org/tex-archive/macros/latex/contrib/misc/
% Read the url.sty source comments for usage information. Basically,
% \url{my_url_here}.
% *** Do not adjust lengths that control margins, column widths, etc. ***
% *** Do not use packages that alter fonts (such as pslatex). ***
% There should be no need to do such things with IEEEtran.cls V1.6 and later.
% (Unless specifically asked to do so by the journal or conference you plan
% to submit to, of course. )
% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\begin{document}
%
\title{CCI: Common Communication Interface}
%
\author{\IEEEauthorblockN{Scott Atchley\IEEEauthorrefmark{1},
David Dillow\IEEEauthorrefmark{1},
Galen Shipman\IEEEauthorrefmark{1},
Patrick Geoffray\IEEEauthorrefmark{2} and
Jeff Squyers\IEEEauthorrefmark{3}}
\IEEEauthorblockA{\IEEEauthorrefmark{1}Oak Ridge National Laboratory, Oak Ridge, TN}
\IEEEauthorblockA{\IEEEauthorrefmark{2}Myricom, Inc., Arcadia, CA}
\IEEEauthorblockA{\IEEEauthorrefmark{3}Cisco Systems, Inc., San Jose, CA}}
% make the title area
\maketitle
\begin{abstract}
%\boldmath
The abstract goes here.
\end{abstract}
% IEEEtran.cls defaults to using nonbold math in the Abstract.
% This preserves the distinction between vectors and scalars. However,
% if the conference you are submitting to favors bold math in the abstract,
% then you can use LaTeX's standard command \boldmath at the very start
% of the abstract to achieve this. Many IEEE journals/conferences frown on
% math in the abstract anyway.
% no keywords
% For peer review papers, you can put extra information on the cover
% page as needed:
% \ifCLASSOPTIONpeerreview
% \begin{center} \bfseries EDICS Category: 3-BBND \end{center}
% \fi
%
% For peerreview papers, this IEEEtran command inserts a page break and
% creates the second title. It will be ignored for other modes.
\IEEEpeerreviewmaketitle
\section{Introduction}
% no \IEEEPARstart
We will talk about the various communication interfaces, including sockets, the *M family
(AM, FM, GM, PM, etc.), Gamma, Portals, MX, and VIA/Verbs.
\section{Basic Elements of Communication}
For two endpoints to communicate, they must agree on some basic elements including whether
to acknowledge receipt (reliability), whether the receiver must enforce in-order delivery
even if messages arrive out-of-order, whether direct placement of data is possible (remote
memory access or RMA), and how to handle small, unexpected messages.
\subsection{Reliability}
When establishing communications, the two endpoints must agree on whether they will acknowledge
receipt (or \emph{ACK}) of each other's messages and retransmit messages that are lost.
Without ACKs, the sender never knows if the message arrived successfully at the receiver.
Providing ACKs, however, imposes costs in complexity and performance. The complexity is
due to the need to manage state about which messages have been sent but not yet ACKed. The
performance impact is due to the need to service the ACKs as well as the additional
network traffic consumed by them and the retransmitted messages.
Not all applications need guaranteed delivery. Examples include video streaming, where new
data will immediately replace the missing data, and financial trading software, which is
most interested in the most recent bid.
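
The sender-side state that ACKs require can be sketched in C as follows. This is only an
illustration under stated assumptions, not part of CCI: the \verb,tx_state, structure, the
\verb,tx_send(),, \verb,tx_ack(), and \verb,tx_tracked(), functions, and the 64-message
window are all hypothetical. Retransmission timers are left out.
\small
\begin{verbatim}
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 64u /* max msgs in flight */

/* Hypothetical sender state; not CCI. */
typedef struct {
    uint32_t base;  /* oldest un-ACKed seq */
    uint32_t next;  /* next seq to assign */
    uint64_t acked; /* bit i: base+i ACKed */
} tx_state;

/* Assign a sequence number to a message.
 * Returns false if the window is full. */
bool tx_send(tx_state *tx, uint32_t *seq)
{
    assert(tx != NULL && seq != NULL);
    if (tx->next - tx->base >= WINDOW)
        return false;
    *seq = tx->next++;
    return true;
}

/* Record an ACK, then slide the window
 * past each leading ACKed message. */
void tx_ack(tx_state *tx, uint32_t seq)
{
    uint32_t off = seq - tx->base;

    if (off >= tx->next - tx->base)
        return; /* stale or never sent */
    tx->acked |= (uint64_t)1 << off;
    while (tx->acked & 1) {
        tx->acked >>= 1;
        tx->base++;
    }
}

/* Sequence numbers still being tracked. */
uint32_t tx_tracked(const tx_state *tx)
{
    return tx->next - tx->base;
}
\end{verbatim}
\normalsize
An unreliable connection needs none of this state, which is where the savings in
complexity and performance come from.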
\subsection{Order}
In addition to reliability, two endpoints must agree on whether messages that arrive
out-of-order may be delivered out-of-order or must be delivered in-order. In-order
delivery is not an issue on simple networks (e.g. a rack of machines with a single
interface each connected to a single switch) or networks that enforce strict ordering on
the wire (e.g. InfiniBand).
More complex networks, however, are becoming more common. Examples include hosts with
multiple interfaces connected to multiple switches for fault tolerance as well as
high-radix switches that provide multiple paths between each pair of endpoints. These more
complex networks trade in-order delivery for higher bandwidth and/or congestion avoidance.
If an application needs in-order delivery, the communications layer must buffer messages
until all preceding messages have arrived. If an earlier message is lost, the layer must
wait for the retransmission of the lost message to arrive before delivering later ones.
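
The buffering needed for in-order delivery can be sketched in C as follows. This is a
minimal illustration rather than CCI code: the \verb,rx_state, structure, the
\verb,rx_push(), and \verb,rx_pop(), functions, and the eight-slot window are
hypothetical names and values chosen for the example.
\small
\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 8u /* reorder window size */

/* Hypothetical receiver state; not CCI. */
typedef struct {
    uint32_t expect; /* next seq to deliver */
    const void *slot[SLOTS]; /* held msgs */
} rx_state;

/* Hold an arriving message. Returns 0, or
 * -1 if seq was already delivered or is
 * too far ahead to buffer. */
int rx_push(rx_state *rx, uint32_t seq,
            const void *msg)
{
    assert(rx != NULL && msg != NULL);
    if (seq - rx->expect >= SLOTS)
        return -1;
    rx->slot[seq % SLOTS] = msg;
    return 0;
}

/* Return the next in-order message, or
 * NULL if it has not arrived yet. */
const void *rx_pop(rx_state *rx)
{
    uint32_t i;
    const void *msg;

    assert(rx != NULL);
    i = rx->expect % SLOTS;
    msg = rx->slot[i];
    if (msg != NULL) {
        rx->slot[i] = NULL;
        rx->expect++;
    }
    return msg;
}
\end{verbatim}
\normalsize
A message that arrives early stays in its slot until every earlier sequence number has
been popped, so one lost message stalls delivery of everything behind it.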
\subsection{Remote Memory Access}
To achieve maximum performance, some network APIs \cite{GM, MX, Verbs} provide remote
memory access (RMA), which places incoming data directly in the application buffer and so
avoids both copying and the kernel stack. RMA typically requires memory registration
(i.e. pinning) to prevent the pages from being swapped out during the transfer. It also
requires prior communication between the endpoints to exchange RMA handles so that the
read or write has both source and destination handles.
\subsection{Small, Unexpected Messages}
As mentioned above, RMA transfers require the prior exchange of the RMA handles. This
leads to a circular dependency if the exchange itself requires RMA. Many APIs instead
support the exchange of small, unexpected messages \cite{AM, MX} that do not require RMA,
while others treat all data as a series of smaller messages (e.g. SOCK\_STREAM).
\section{CCI API}
We have three goals for the API:
\begin{itemize}
\item Simplicity: small API (easier than Verbs, as simple as Sockets)
\item Portability: support multiple different underlying transports
\item Performance: faster than TCP sockets, scalable, fault-tolerant,
relaxed network constraints
\end{itemize}
\subsection{Initialization of Devices}
CCI can support multiple drivers (e.g. sock, Portals, Verbs, or hardware native)
concurrently, each of which may have multiple devices. A device structure includes name
and info strings, a NULL-terminated array of key=value pairs, a maximum send size, a data
rate in bits per second, and a PCI struct to use when registering memory.
Before calling any other CCI function, the application must call the init function:
\small
\begin{verbatim}
int cci_init(uint32_t abi_ver,
uint32_t flags,
uint32_t *caps);
\end{verbatim}
\normalsize
After init, the application will want to get the array of available devices:
\small
\begin{verbatim}
int cci_get_devices(cci_device_t const
*** const devices);
\end{verbatim}
\normalsize
The user or system administrator provides a config file with a device name, a driver, and
optional driver-specific arguments (e.g. ip=10.0.120.5). The arguments are
key=value pairs and are returned in the device struct. A device specification may look
like:
\small
\begin{verbatim}
[eth1]
driver = sock
ip = 10.0.120.5
mtu = 9000
\end{verbatim}
\normalsize
In the above example, CCI will create a device called \emph{eth1} that uses the sock
driver, has an IP address of 10.0.120.5, and uses an MTU of 9000.
Users can also add the \emph{default=1} pair to designate a device as the default. Each
device may also set the \emph{priority=N} pair, where N is between 0 and 100. If the
priority is not set, the device is assigned a priority of 50.
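
An application could read such pairs from the device's NULL-terminated argument array
along the following lines. This is a sketch under stated assumptions: the helper names
\verb,conf_get(), and \verb,conf_priority(), are hypothetical, each entry is assumed to be
a single string of the form key=value with no spaces, and falling back to 50 for a
malformed or out-of-range priority is a choice made for the example.
\small
\begin{verbatim}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_PRIORITY 50

/* Find key in a NULL-terminated array of
 * "key=value" strings. Returns the value,
 * or NULL if the key is absent. */
const char *conf_get(
    const char *const *conf,
    const char *key)
{
    size_t klen;

    assert(conf != NULL && key != NULL);
    klen = strlen(key);
    for (; *conf != NULL; conf++) {
        if (!strncmp(*conf, key, klen) &&
            (*conf)[klen] == '=')
            return *conf + klen + 1;
    }
    return NULL;
}

/* Priority in [0, 100]. Returns 50 if the
 * pair is absent; using 50 for bad values
 * is an assumption of this sketch. */
int conf_priority(const char *const *conf)
{
    const char *v;
    char *end;
    long n;

    v = conf_get(conf, "priority");
    if (v == NULL)
        return DEFAULT_PRIORITY;
    n = strtol(v, &end, 10);
    if (end == v || *end != '\0' ||
        n < 0 || n > 100)
        return DEFAULT_PRIORITY;
    return (int)n;
}
\end{verbatim}
\normalsize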
When shutting down all communication, the application must call
\small\verb,cci_free_devices(),\normalsize. There is no separate finish call.
\subsection{Endpoints}
CCI considers an endpoint to be a set of resources associated with a single NUMA locality.
These resources include buffers and an event completion queue. Endpoints are ``thread safe''
by default; multiple threads can call functions on an endpoint simultaneously and
doing so is \emph{safe}. No guarantees are made about serialization or concurrency.
The endpoint struct contains only an unsigned 32-bit
\small\verb,max_recv_buffer_count,\normalsize element. This is the maximum number of
receive buffers for this endpoint that can be loaned to the application. When this number
of buffers has been loaned to the application, incoming messages may be dropped.
To open an endpoint, the application calls:
\small
\begin{verbatim}
int cci_create_endpoint(cci_device_t *device,
int flags,
cci_endpoint_t **endpoint,
cci_os_handle_t *fd);
\end{verbatim}
\normalsize
where the device is the device over which it will communicate. The flags argument is
currently unused. The endpoint and an OS-specific handle are OUT parameters. The OS handle
can be passed to \small\verb,select(),\normalsize, \small\verb,poll(),\normalsize, etc. to
check if events are ready and optionally to block while doing so.
To close an endpoint, the application uses \small\verb,cci_destroy_endpoint(),\normalsize.
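
Waiting on the OS handle returned when the endpoint is opened might look like the
following sketch. It assumes that on a POSIX system the handle is an ordinary file
descriptor that becomes readable when events are pending; the draft does not state this.
The function name \verb,wait_readable(), is hypothetical, and only the standard
\verb,poll(), call is used.
\small
\begin{verbatim}
#include <assert.h>
#include <poll.h>

/* Wait up to timeout_ms milliseconds for
 * fd to become readable. A timeout of 0
 * checks without blocking; a negative
 * timeout blocks indefinitely.
 * Returns 1 if readable, 0 on timeout,
 * and -1 on error (including EINTR). */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc > 0 && (pfd.revents & POLLIN))
        return 1;
    return 0;
}
\end{verbatim}
\normalsize
Once the handle is reported readable, the application would retrieve the pending events
through the CCI event interface rather than reading from the handle directly.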
\subsection{Connections}
Because different applications need different guarantees from the communication layer, CCI
has an explicit client/server connection scheme similar to sockets. A connection is one of
the following types:
\begin{itemize}
\item Unreliable, Unordered (UU)
\item Reliable, Unordered (RU)
\item Reliable, Ordered (RO)
\end{itemize}
\subsubsection{Connection Types}
Talk about UU, RU, and RO in order of least guarantees to most.
\subsubsection{Connection Handshake}
Briefly discuss the connection handshake between client and server.
\subsection{Data Transfer}
Subsection text here.
\section{Suitability of CCI Interface}
Building other APIs on top of CCI primitives:
\subsection{Sockets}
TCP and UDP
\subsection{Active Messages}
AM
\subsection{Portals}
Portals
\subsection{MyrinetExpress (MX)}
MX
\section{Overhead of CCI over Portals}
Review performance of CCI over Portals with native Portals for pingpong, stream, and
naive alltoall.
\section{Conclusion}
The conclusion goes here.
% use section* for acknowledgement
\section*{Acknowledgment}
The authors would like to thank...
% trigger a \newpage just before the given reference
% number - used to balance the columns on the last page
% adjust value as needed - may need to be readjusted if
% the document is modified later
%\IEEEtriggeratref{8}
% The "triggered" command can be changed if desired:
%\IEEEtriggercmd{\enlargethispage{-5in}}
% references section
% can use a bibliography generated by BibTeX as a .bbl file
% BibTeX documentation can be easily obtained at:
% http://www.ctan.org/tex-archive/biblio/bibtex/contrib/doc/
% The IEEEtran BibTeX style support page is at:
% http://www.michaelshell.org/tex/ieeetran/bibtex/
%\bibliographystyle{IEEEtran}
% argument is your BibTeX string definitions and bibliography database(s)
%\bibliography{IEEEabrv,../bib/paper}
%
% that's all folks
\end{document}