\newcommand{\myExpOne}{
\item Designing and implementing tools for distributed GPU programming with Kokkos, specifically (1) inter-process profiling and adaptivity via PMPI and (2) job-level monitoring and feedback via LDMS and DCGM.
\item Augmenting Kokkos Tools runtime auto-tuning with low-level metrics (e.g., from PAPI and CUPTI) instead of timings, reducing the time to converge to optimal Kokkos tuning parameters by 32\%.
\item Developing energy-aware optimization for Kokkos via NVML, achieving 4x energy reduction.
\item Released a Spack package for Kokkos Tools, resulting in 16 new users of Kokkos Tools.
\item Developed AI-assisted HPC tools through LLMs (coderosetta.com) and auto-tuning (TAU+APEX) for Kokkos applications run on NVIDIA GPUs, resulting in a poster presentation at GTC 2025.
\item Research and pathfinding on the use of AI chips with dataflow architectures, e.g., Cerebras WSE-3, for science simulations.
}
\subsection*{Technical Leadership and Industry-grade Open-source HPC Software}
\textbf{Sandia National Laboratories}\\
{Principal Member of Technical Staff II} \hfill \textit{July 2024 - Present}
%\vspace{-0.02in}
\noindent
\begin{itemize}[itemsep=-0.1em]\onlyitems[include={1,2}]
\myExpOne
\end{itemize}
\newcommand{\myExpTwo}{
\item Developed and maintained Kokkos Tools, covering the CMake build system, low tooling overhead, CI/CD, auto-tuning, and NVTX/ROCTx/VTune integration, leading to 15 merged GitHub PRs.
\item Developed a debugging tool that detected 7 common Kokkos user bugs by analyzing LLVM IR of Kokkos programs via symbolic execution, leading to a paper at SC24's Correctness workshop.
\item Implemented 5 new loop transformation features in LLVM OpenMP, leading to a 1.7x speedup for a Kokkos-OpenMP+CUDA benchmark using the index set split construct, 3 accepted OpenMP 6.0 features, and 11 feature proposals in OpenMP 7.0.
\item Implemented OpenMP multi-GPU parallelism features in a prototype library for LLVM OpenMP, leading to a 1.3x speedup over the corresponding CUDA-aware MPI approach, and 9 new OpenMP multi-GPU feature proposals for OpenMP 7.0.
}
\noindent
{Senior Member of Technical Staff} \hfill \textit{August 2022 - July 2024}
%\vspace{-0.02in}
\begin{itemize}[itemsep=-0.1em]
\myExpTwo
\end{itemize}
\newcommand{\myExpThree}{
% \item Contributed to developing an LLVM OpenMP implementation, specifically the OpenMP implementation's compiler and its runtime, targeted for Department of Energy's upcoming Exascale Supercomputer platforms.
\item Implemented OpenMP user-defined multi-GPU scheduling for LLVM, offering a 2.1x speedup over MPI parallelization, leading to papers at IWOMP 2020 and BCB 2021.
\item Implemented performance optimizations in LLVM for OpenMP asynchronous GPU offloading that achieved a 1.2x speedup, leading to a paper at SC22's HiPar workshop.
\item Developed performance benchmarks that evaluated 5 major vendor OpenMP GPU implementations, leading to an ACM journal paper and an IWOMP 2021 workshop paper.
% \item Developed benchmarks and evaluating OpenMP implementations, e.g., LLVM's OpenMP, NVIDIA's OpenMP, on Exascale Supercomputers.
\item Demonstrated technical leadership as technical project manager for the ECP SOLLVE project, submitting 12 ECP milestone reports, organizing 7 GPU hackathons, and defining 3 project KPIs.
%and voting in 5 OpenMP Committee meetings.
}
\noindent
\textbf{Brookhaven National Laboratory}\hfill
{Assistant Computational Scientist} \hfill \textit{May 2019 - August 2022}
\begin{itemize}[itemsep=-0.1em]
\myExpThree
\end{itemize}
\subsection*{HPC Software Development and Performance Engineering}
\newcommand{\myExpFour}{
\item Implemented User-defined Loop Schedules (UDS) for OpenMP and RAJA via a prototype library for LLVM and GCC, leading to a paper at IWOMP 2018 and 3 GitHub PRs merged in Charm++.
\item Analyzed and optimized the performance of MPI+CUDA scientific applications on NVIDIA GPUs via CUPTI and auto-tuning, leading to a 1.4x speedup of an application for computer chip design.
\item Developed novel and efficient multi-level loop schedulers in Charm++, leading to a 1.2x speedup on the PRK particle-in-cell benchmark code and a Best Poster Candidate at SC18.
%\item Added the UDS feature to RAJA and Charm++'s CkLoop, with 1 github PR merged in Charm++.
}
\noindent
\textbf{USC/ISI + Charmworks, Inc.}\hfill
{Software Engineer} \hfill \textit{Dec 2015 - May 2019}
\vspace{-0.0in}
\begin{itemize}[itemsep=-0.1em]
\myExpFour
\end{itemize}
\comments{
\newcommand{\myExpFive}{
\item Analyzed and optimized the performance of a 3-D image reconstruction application on NVIDIA GPUs via CUPTI and auto-tuning, leading to a performance-enhanced CUDA version of the application.
\item Developed tuning support for coordinated loop scheduling and load balancing in Charm++, leading to a 1.2x speedup on a particle-in-cell benchmark code and a Best Poster Candidate at SC18.
}
\noindent
\textbf{USC - Information Sciences Institute}\hfill
\textit{Computer Scientist} \hfill \textit{Dec 2016 - May 2018}
\begin{itemize}[itemsep=-0.1em]
\myExpFive
\end{itemize}
% accomplished x as measured by y, by doing z
\newcommand{\myExpSix}{
\item Extended Charm++ to offer a novel runtime system capability of coordinating inter-node load balancing and intra-node loop scheduling, leading to 2 GitHub PRs merged in Charm++.
}
\noindent
\textbf{Charmworks, Inc.}\hfill
\textit{Software Developer} \hfill \textit{Jan 2016 - Dec 2016}
%\vspace*{-0.02in}
\begin{itemize}
\myExpSix
\end{itemize}
\noindent
\textbf{University of Illinois}\hfill
\textit{Postdoctoral Associate} \hfill \textit{Jul 2015 - Dec 2015}
\begin{itemize}[itemsep=-0.1em]
\item Sped up a plasma-physics Fortran MPI+OpenACC code by 1.2x via a combination of GPU offload optimizations and loop transformations on an NVIDIA K80 GPU.
\end{itemize}
}
\noindent
\textbf{LLNL + UIUC}\hfill {Researcher} \hfill \textit{Jun 2010 – Dec 2015}
\vspace*{-0.0in}
\begin{itemize}[itemsep=-0.1em]
%\item Measured MPI communication delays for micro-benchmarks codes run on supercomputers and worked to find tools to measure dequeue overheads of OpenMP loop schedulers.
%\item Created a software system for automated performance optimization and application programmer usability of low-overhead hybrid scheduling
%strategies.
\item Implemented a ROSE-based compiler pass and PMPI-based runtime system for MPI+OpenMP applications to use loop scheduling techniques, leading to a 1.4x speedup on a multicore cluster.
\item Implemented shared memory extensions for MPICH, leading to a paper with 140+ citations.
\item Implemented multicore and GPU performance optimizations for applications in linear algebra, blood flow, fusion, and combustion, leading to 2 papers at IPDPS.
%\item Assessed further opportunities for performance improvement of low-overhead schedulers, including improvement of spatial locality
%of low-overhead schedulers.
\end{itemize}
\subsection*{General Software Development}
\noindent
\textbf{Proteus Technologies + Wolfram} \hfill {Software Developer} \hfill \textit{Aug 2007 – Sep 2008}
\vspace*{-0.0in}
\begin{itemize}[itemsep=-0.1em]
\item Developed and tested service-oriented software to monitor the health of a large-scale distributed system for the US government, leading to an internal white paper and software package.
\item Implemented functionality in Mathematica for users to send emails from within a Mathematica evaluation kernel, via sendmail and TLS, leading to a new software feature in Mathematica.
%\item Developed company standards for software development through system requirements specifications, Design Documentation.
%\item Designed and implemented algorithms for power management of clusters, leading to a white paper
\end{itemize}