Skip to content

Latest commit

 

History

History
36 lines (24 loc) · 7.13 KB

File metadata and controls

36 lines (24 loc) · 7.13 KB

Introduction

This guide provides a practical overview of setting up and running parallel computing R and Python scripts on the Institute for CyberScience Advanced CyberInfrastructure (ICS-ACI). ICS-ACI is the high-performance computing (HPC) at Penn State. For an introduction and further background on ICS-ACI, please see the following resources:

The guide consists of three sections:

  • Overview of parallel computing and common parallel computing programming models
  • Overview of the standard software required to submit and execute parallel computing applications on HPC systems like ICS-ACI
  • Specific code examples for R and Python, including Portable Batch System (PBS) examples for submitting jobs

What is parallel computing?

Parallel computing is the “simultaneous use of multiple compute resources to solve a computational problem” (Barney 2017). A workload is split into separate tasks, the tasks are executed concurrently on different computing resources, and then the results of the tasks are brought back together. There are many different parallel programming paradigms (see Barney 2017 for a full review), but for practical purposes, it’s useful to think of two different models: (1) a shared memory multithreaded model; and (2) a distributed memory multiprocess model. This guide focuses on the latter, but we describe both models for context.

Shared memory multithreaded model

Under the shared memory multithreaded model, a parent process has offshoots of concurrent execution called threads that all have access to the same shared memory. The threads can simultaneously run across multiple different processor cores, thus enabling parallelization. Because the threads share the same data structures, it is easy for them to communicate. However, this means proper thread-safe code must be implemented to avoid race conditions and data inconsistencies. For instance, if multiple threads can change the value of a variable at the same time, unexpected results can occur. Typically, shared memory multithreaded implementations can only be executed on a single computer. They can be run on HPC systems like ICS-ACI, but can only execute on a single node (i.e. computer) and cannot be used to execute a workload across multiple different nodes. In scientific computing, the most popular implementation of shared memory multithreading is the OpenMP application programming interface for C/C++ and Fortran. Multithreading libraries are also available in Python and R. It is important to note that while the Python threading library can be useful in specific cases, it is limited by the Global Interpreter Lock.

Distributed memory multiprocess model

In the distributed memory multiprocess model, each parallel line of execution is a true independent process with its own memory space. Because the processes do not share memory, they coordinate and exchange data via messages. Although the message passing can occur added latency and decreased performance relative to a multithreaded model, the overhead required to ensure thread-safe code is eliminated. Workload processes can exist on the same node or across nodes, but multiprocessing libraries often only support execution on a single node (e.g. Python multiprocessing package). In the next section, we will discuss the Message Passing Interface (MPI), the industry standard for executing a distributed memory multiprocess model workload across nodes on a HPC system.

Standard Software used on HPC Systems

Message Passing Interface (MPI)

The Message Passing Interface (MPI) is the industry standard for executing a distributed memory multiprocess model workload across nodes on a HPC system. MPI itself is not a software library, but an interface specification standard that defines what interface and features any MPI implementation must have. The two most popular MPI implementations are Open MPI (not to be confused with OpenMP) and MPICH. Both are available on ICS-ACI. Although Open MPI and MPICH only provide native library interfaces for C and Fortran, MPI bindings are provided for Python by the mpi4py package and for R by the Rmpi package. It is important to note that while we focus on MPI within a distributed memory multiprocess model, it can also be used for shared memory and hybrid shared-distributed memory parallel programming.

Portable Batch System (PBS)

Because HPC systems like ICS-ACI serve many users, resource management software is required to effectively manage and utilize the computing resources. The Portable Batch System (PBS) is the most widely used HPC resource management software. ICS-ACI uses TORQUE, a version of PBS, in combination with the Moab workload manager to manage and utilize the cluster's computing resources. For the purposes of this tutorial, we will refer to the TORQUE/MOAB software suite simply as "PBS". Users submit requests for computational resources to PBS via job scripts that specify the computing resources required (e.g. number of processes and nodes, memory, compute time, etc.) and the command for the parallel processing application to be executed. The user requests are placed on a queue and PBS automatically allocates resources for the different jobs. When resources become available for a particular job, PBS gives the job exclusive use of the compute resources requested and then executes the job.

Code Examples