Publications
This page contains some of the research papers associated with the ROSE project over the last several years. For numerous reasons, we feel that the latest papers are the best papers; this is likely typical of any ambitious project, but we have included everything for completeness. We hope that each paper makes clear the underlying goal of supporting the use of high-level abstractions, together with our attempts to address the performance issues that arise when such abstractions are used within scientific computing.
To cite ROSE, use the following paper:
- The ROSE source-to-source compiler infrastructure, Cetus users and compiler infrastructure workshop, 2011.
@inproceedings{quinlan2011rose,
title={The {ROSE} source-to-source compiler infrastructure},
author={Quinlan, Dan and Liao, Chunhua},
booktitle={Cetus users and compiler infrastructure workshop, in conjunction with PACT},
volume={2011},
pages={1},
year={2011},
organization={Citeseer}
}
Peter Pirkelbauer, Pei-Hung Lin, Tristan Vanderbruggen, Chunhua Liao, XPlacer: Automatic Analysis of CPU/GPU Access Patterns, IPDPS 2020 (accepted), LLNL-CONF-795057
Anjia Wang, Alok Mishra, Chunhua Liao, Yonghong Yan, Barbara Chapman, FreeCompilerCamp.org: Training for OpenMP Compiler Development from Cloud, Sixth SC Workshop on Best Practices for HPC Training and Education: BPHTE19, 2019 LLNL-CONF-791339
Anjia Wang, Yaying Shi, Xinyao Yi, Yonghong Yan, Chunhua Liao and Bronis R. de Supinski, ompparser: A Standalone and Unified OpenMP Parser, Fifteenth International Workshop on OpenMP (IWOMP 2019), Auckland, New Zealand, September 11–13, 2019. LLNL-CONF-774801.
Yonghong Yan, Anjia Wang, Chunhua Liao, Tom Scogland and Bronis R. de Supinski, Extending OpenMP Metadirective Semantics for Runtime Adaptation, Fifteenth International Workshop on OpenMP (IWOMP 2019), Auckland, New Zealand, September 11–13, 2019. LLNL-CONF-774899
Larisa Stoltzfus, Murali Emani, Pei-Hung Lin, and Chunhua Liao. 2018. Data Placement Optimization in GPU Memory Hierarchy using Predictive Modeling. In Proceedings of the Workshop on Memory Centric High Performance Computing (MCHPC'18). ACM, New York, NY, USA, 45-49.
M. Bari, L. Stoltzfus, P. Lin, C. Liao, M. Emani, B. Chapman, Is Data Placement Optimization Still Relevant on Newer GPUs?, The 9th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems (PMBS18), Dallas, TX, Nov 12th, 2018 LLNL-CONF-757796
Chunhua Liao, Pei-Hung Lin, Markus Schordan and Ian Karlin, A Semantics-Driven Approach to Improving DataRaceBench's OpenMP Standard Coverage, IWOMP 2018: 14th International Workshop on OpenMP, Barcelona, Spain, September 26-28, 2018, Proceedings. 189-202. 10.1007/978-3-319-98521-3_13. LLNL-CONF-750770
Chunhua Liao, Pei-Hung Lin, Joshua Asplund, Markus Schordan and Ian Karlin, DataRaceBench: A Benchmark Suite for Systematic Evaluation of Data Race Detection Tools, The International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, Nov. 12-17, 2017 (Best Paper Nominee).
Tristan Vanderbruggen, John Cavazos, Chunhua Liao, Daniel J. Quinlan: Directive-based tile abstraction to distribute loops on accelerators. GPGPU@PPoPP 2017: 53-62
Pei-Hung Lin, Qing Yi, Daniel Quinlan, Chunhua Liao and Yongqing Yan, Automatically Optimizing Stencil Computations on Many-core NUMA Architectures, The 29th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2016) September 28-30, 2016 Rochester NY, USA
Pedro Diniz, Chunhua Liao, Daniel Quinlan and Robert Lucas, Pragma-controlled Source-to-Source Code Transformations for Robust Application Execution, August 24-26, Euro-Par 2016
C. Liao, P. Lin, D. J. Quinlan, Y. Zhao, and X. Shen, “Enhancing domain specific language implementations through ontology,” in Proceedings of the 5th international workshop on domain-specific languages and high-level frameworks for high performance computing, New York, NY, USA, 2015, p. 3:1–3:9.
P. Lin, C. Liao, D. J. Quinlan, and S. Guzik, “Experiences of using the openmp accelerator model to port DOE stencil applications,” in Openmp: heterogenous execution and data movements – 11th international workshop on openmp, IWOMP 2015, aachen, germany, october 1-2, 2015, proceedings, 2015, pp. 45-59.
Y. Yan, P. Lin, C. Liao, B. R. de Supinski, and D. J. Quinlan, “Supporting multiple accelerators in high-level programming models,” in Proceedings of the sixth international workshop on programming models and applications for multicores and manycores, New York, NY, USA, 2015, pp. 170-180.
Verification of polyhedral optimizations with constant loop bounds in finite state space computations
M. Schordan, P. Lin, D. Quinlan, and L. Pouchet, “Verification of polyhedral optimizations with constant loop bounds in finite state space computations,” in Leveraging applications of formal methods, verification and validation. specialized techniques and applications, T. Margaria and B. Steffen, Eds., Springer Berlin Heidelberg, 2014, vol. 8803, pp. 493-508.
In this paper, we examine the newly released accelerator directives and create an initial reference implementation, referred to as HOMP (Heterogeneous OpenMP). Focused on targeting NVIDIA GPUs, our work is based on an existing OpenMP implementation in the ROSE source-to-source compiler infrastructure. HOMP includes extensions to parse the new constructs and to represent them in the AST, along with other compiler translation details. Further, we provide initial runtime support. For our evaluation, we have adapted a few existing OpenMP codes to use the accelerator model directives and present preliminary performance results. Finally, we critique the accelerator model in terms of its impact on developers and compiler writers and suggest possible improvements.
C. Liao, Y. Yan, B. R. de Supinski, D. J. Quinlan, and B. Chapman, “Early experiences with the openmp accelerator model,” in Openmp in the era of low power devices and accelerators, Springer, 2013, pp. 84-98.
This paper presents a novel technique to detect data races and deadlocks of OpenMP programs, using hybrid program analysis. Specifically, we use an SMT-solver based static analysis to analyze OpenMP source code. Then we use a dynamic analysis to confirm, or rule out, the potential errors. The static analysis narrows down the code regions and events that need to be monitored, significantly reducing the overhead of the dynamic analysis. Our experiments show that OpenMP-Checker is more scalable and accurate at pinpointing concurrency errors within a set of chosen benchmarks, compared to the two commercial tools, Sun Thread Analyzer and Intel Thread Checker.
H. Ma, Q. Chen, L. Wang, C. Liao, and D. Quinlan, “Openmp-checker: detecting concurrency errors of openmp programs using hybrid program analysis,” in Poster paper, ICPP’12, the 41st international conference on parallel processing, 2012.
This paper presents a compiler-based transformation released in ROSE and demonstrates the use of Triple Modular Redundancy as an approach to provide HPC software with fault tolerance against transient faults, as we expect them to manifest themselves on future Exascale architectures. The paper presents performance results showing that, for a randomly selected subset of benchmarks, the overhead of this extra layer of support is about 20%. We expect that this may be competitive with future approaches to fault tolerance using checkpoint-restart, which may be much more expensive or maybe even intractable for Exascale. This work is released as a framework within ROSE to support research work in this area by ourselves and collaborators.
J. Lidman, D. J. Quinlan, C. Liao, and S. A. McKee, “ROSE::FTTransform - a source-to-source translation framework for exascale fault-tolerance research,” in Dependable systems and networks workshops (DSN-W), 2012 IEEE/IFIP 42nd international conference on, 2012, pp. 1-6.
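As a rough illustration of the Triple Modular Redundancy idea summarized in the entry above, the following sketch runs a computation three times and takes a majority vote. It is a hand-written example, not output of the ROSE transformation; the names `tmr_vote` and `tmr_execute` are hypothetical and do not correspond to any ROSE API.

```cpp
#include <cassert>
#include <functional>

// Majority vote over three redundant results (hypothetical helper).
// Returns the value that at least two copies agree on. If all three
// differ, `ok` is set to false and the first value is returned as-is.
template <typename T>
T tmr_vote(const T& a, const T& b, const T& c, bool& ok) {
    ok = true;
    if (a == b || a == c) return a;  // a is part of a majority
    if (b == c) return b;            // a is the outlier
    ok = false;                      // no majority: fault cannot be masked
    return a;
}

// Executes the same computation three times and votes on the results.
// A transient fault that corrupts a single run is masked by the vote.
template <typename T>
T tmr_execute(const std::function<T()>& compute, bool& ok) {
    assert(compute && "compute must hold a callable");
    const T r0 = compute();
    const T r1 = compute();
    const T r2 = compute();
    return tmr_vote(r0, r1, r2, ok);
}
```

A caller would wrap a statement or kernel in a callable, pass it to `tmr_execute`, and check `ok` before using the result. The tripled execution is where the runtime overhead of this style of fault tolerance comes from.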
This paper presents an auto-scoping algorithm to work with OpenMP tasks. (Auto-scoping is the process of automatically determining the data-sharing attributes of variables in OpenMP programs.) Tasks make this a much more complex challenge because of the uncertainty about when a task will be executed, which makes it harder to determine what parts of the program will run concurrently. We also introduce an implementation of the algorithm and results with several benchmarks showing that the algorithm is able to correctly scope a large percentage of the variables appearing in them.
S. Royuela, A. Duran, C. Liao, and D. J. Quinlan, “Auto-scoping for openmp tasks,” in Openmp in a heterogeneous world, Springer, 2012, pp. 29-43.
Studying the impact of application-level optimizations on the power consumption of multi-core architectures
This paper presents an extensive study of the impact of application level optimizations on both the performance and power efficiencies of applications from a wide range of scientific and embedded systems domains. We observe that application-level optimizations often have a much larger impact on performance than on power consumption. However, optimizing for performance does not necessarily lead to better power consumption, and vice versa. Compared to sequential applications, multithreaded applications give more room for performance and power improvements. Additionally, a number of optimizations, including loop and thread affinity optimizations, have shown great potential in supporting collective enhancement of both performance and power efficiency. Our experimental results provide several insights to help exploit these optimizations effectively.
S. M. F. Rahman, J. Guo, A. Bhat, C. Garcia, M. H. Sujon, Q. Yi, C. Liao, and D. Quinlan, “Studying the impact of application-level optimizations on the power consumption of multi-core architectures,” in Proceedings of the 9th conference on computing frontiers, 2012, pp. 123-132.
T. Nguyen, P. Cicotti, E. Bylaska, D. Quinlan, and S. B. Baden, “Bamboo: translating mpi applications to a latency-tolerant, data-driven form,” in Proceedings of the international conference on high performance computing, networking, storage and analysis, 2012, p. 39.
This paper presents work combining the LBL node simulator, the SNL network simulator, and the ROSE compiler to demonstrate analysis of software and the workflow required for such tools to analyze the power requirements of HPC code, using autotuning to define optimal points in the design space. The paper lays out an approach to co-design at the start of work that is part of the CoDEX project led by LBL and including both SNL and LLNL.
J. Shalf, D. Quinlan, and C. Janssen, “Rethinking hardware-software codesign for exascale systems,” Computer, vol. 44, iss. 11, pp. 22-30, 2011.
D. Quinlan and C. Liao, “The rose source-to-source compiler infrastructure,” in Cetus users and compiler infrastructure workshop, in conjunction with pact 2011, 2011.
This paper presents an OpenCL code generator leveraging the semantics of the F90 array constructs. Such GPU work is expected to be an important part of future Exascale programming environments. This work demonstrates how ROSE is used to support the analysis of the input code, and the translation and code generation required to generate OpenCL code for GPUs.
M. J. Sottile, C. E. Rasmussen, W. N. Weseloh, R. W. Robey, D. Quinlan, and J. Overbey, “Foropencl: transformations exploiting array syntax in fortran for accelerator programming,” in 2nd international workshop on gpus and scientific applications (gpusca 2011), 2011, p. 23.
This paper presents work to define a dynamic analysis for correctness of UPC usage and leverages the RTED test suite from Iowa State University. This work is released in ROSE and shows how to build a dynamic analysis level of support to catch errors as represented by test codes in the RTED test suite for UPC. The correctness of using programming models is an important aspect of the design of future programming models for Exascale. This paper shows how to design dynamic analysis-based tools to evaluate correct use of the UPC programming model.
P. Pirkelbauer, C. Liao, T. Panas, and D. Quinlan, “Runtime detection of c-style errors in upc code,” in Proceedings of fifth conference on partitioned global address space programming models, pgas, 2011.
C. Liao, D. J. Quinlan, T. Panas, and B. R. de Supinski, “A rose-based openmp 3.0 research compiler supporting multiple runtime libraries,” in Beyond loop level parallelism in openmp: accelerators, tasking and more, Springer, 2010, pp. 15-28.
C. Liao, D. J. Quinlan, J. J. Willcock, and T. Panas, “Semantic-aware automatic parallelization of modern applications using high-level abstractions,” International journal of parallel programming, vol. 38, iss. 5-6, pp. 361-378, 2010.
Towards an abstraction-friendly programming model for high productivity and high performance computing
C. Liao, D. Quinlan, and T. Panas, “Towards an abstraction-friendly programming model for high productivity and high performance computing,” Lawrence Livermore National Laboratory (LLNL), Livermore, CA 2009.
This paper describes our work of using ROSE to build an effective source-to-source outliner in order to support whole program empirical optimization (also called autotuning). The ROSE outliner addresses the problem of extracting tunable kernels out of large scale applications, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. In particular, the outliner can generate kernels which preserve performance characteristics of tuning targets which can be easily handled by other tools. This work also demonstrates how one can use ROSE’s compiler analyses to enhance the quality of source-to-source translation.
C. Liao, D. J. Quinlan, R. Vuduc, and T. Panas, “Effective source-to-source outlining to support whole program empirical optimization,” in Languages and compilers for parallel computing, Springer, 2010, pp. 308-322.
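To make the outlining idea in the entry above concrete, the sketch below shows a loop before and after being extracted into a standalone kernel function. It is a hand-written illustration under simple assumptions, not code generated by the ROSE outliner; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before outlining: the tunable loop is embedded in a larger function.
double scaled_sum_inline(const std::vector<double>& x, double alpha) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += alpha * x[i];
    }
    return sum;
}

// After outlining: the loop lives in its own function. Variables that the
// loop only reads are passed by value or as pointer-to-const; the variable
// it writes (sum) is passed by pointer so the caller sees the update.
void outlined_kernel(const double* x, std::size_t n, double alpha, double* sum) {
    assert(sum != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        *sum += alpha * x[i];
    }
}

// The original function now only sets up arguments and calls the kernel.
double scaled_sum_outlined(const std::vector<double>& x, double alpha) {
    double sum = 0.0;
    outlined_kernel(x.data(), x.size(), alpha, &sum);
    return sum;
}
```

Once the kernel is a separate function, it can be compiled, timed, and tuned in isolation, which is what turns a whole-program tuning problem into a set of smaller kernel tuning tasks.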
A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su, “Detecting code clones in binary executables,” in Proceedings of the eighteenth international symposium on software testing and analysis, 2009, pp. 117-128.
T. Panas and D. Quinlan, “Techniques for software quality analysis of binaries: applied to windows and linux,” Defects, vol. 9, pp. 6-10, 2009.
This paper describes an approach to extending automatic parallelization to optimize applications written using high-level abstractions. This work exemplifies a typical usage of ROSE and is an initial work by us on the general subject of how to leverage the semantics associated with high-level abstractions to enable more optimizations.
C. Liao, D. J. Quinlan, J. J. Willcock, and T. Panas, “Extending automatic parallelization to optimize high-level abstractions for multicore,” in Evolving openmp in an age of extreme parallelism, Springer, 2009, pp. 28-41.
T. Panas, “Signature visualization of software binaries,” in Proceedings of the 4th acm symposium on software visualization, 2008, pp. 185-188.
D. J. Quinlan, G. Barany, and T. Panas, “Towards distributed memory parallel program analysis,” in Scalable program analysis, Dagstuhl, Germany, 2008.
Shared and distributed memory parallel security analysis of large-scale source code and binary applications
D. Quinlan, G. Barany, and T. Panas, “Shared and distributed memory parallel security analysis of large-scale source code and binary applications,” Lawrence Livermore National Laboratory (LLNL), Livermore, CA 2007.
T. Panas, T. Epperly, D. Quinlan, A. Saebjornsen, and R. Vuduc, “Communicating software architecture using a unified single-view visualization,” in Engineering complex computer systems, 2007. 12th ieee international conference on, 2007, pp. 217-228.
D. J. Quinlan, R. W. Vuduc, and G. Misherghi, “Techniques for specifying bug patterns,” in Proceedings of the 2007 acm workshop on parallel and distributed systems: testing and debugging, 2007, pp. 27-35.
T. Panas, D. Quinlan, and R. Vuduc, “Analyzing and visualizing whole program architectures,” in Icse workshop on aerospace software engineering (aerose), minneapolis, mn, 2007.
T. Panas, D. Quinlan, and R. Vuduc, “Tool support for inspecting the code quality of hpc applications,” in Proceedings of the 3rd international workshop on software engineering for high performance computing applications, 2007, p. 2.
R. Vuduc, M. Schulz, D. Quinlan, B. De Supinski, and A. Sæbjørnsen, “Improving distributed memory applications testing by message perturbation,” in Proceedings of the 2006 workshop on parallel and distributed systems: testing and debugging, 2006, pp. 27-36.
D. Quinlan, R. Vuduc, T. Panas, J. Härdtlein, and A. Sæbjørnsen, “Support for whole-program analysis and the verification of the one-definition rule in c++,” Paul E. Black, Helen Gill, and W. Bradley Martin (co-chairs), vol. 500, p. 27, 2006.
This paper is about the optimization of unstructured grid applications and represents preparatory work for future automated transformations specific to unstructured grid applications within DOE using ROSE.
B. S. White, S. A. McKee, B. R. de Supinski, B. Miller, D. Quinlan, and M. Schulz, “Improving the computational intensity of unstructured mesh applications,” in Proceedings of the 19th annual international conference on supercomputing, 2005, pp. 341-350.
Applying loop optimizations to object-oriented abstractions through general classification of array semantics
This paper outlines an approach to the optimization of user-defined abstractions. This work represents a substantial goal for ROSE and an initial work by us on the general subject of how to write code at a very high level of abstraction and have the lower-level code required to get good performance be automatically generated. This paper covers the details of optimizing object-oriented abstractions using ROSE. Unfortunately, ROSE is not mentioned anywhere in the paper, a ridiculous oversight, but oh well. The subject is the optimization, not the ROSE compiler infrastructure.
Q. Yi and D. Quinlan, “Applying loop optimizations to object-oriented abstractions through general classification of array semantics,” in Languages and compilers for high performance computing, Springer, 2005, pp. 253-267.
This paper is a general introduction to recent work in the ROSE project.
D. Quinlan, M. Schordan, Q. Yi, and A. Saebjornsen, Classification and utilization of abstractions for optimization, Springer, 2006.
This paper covers the architecture of ROSE as a project.
Schordan M., Quinlan D., “A Source-To-Source Architecture for User-Defined Optimizations”, Joint Modular Languages Conference held in conjunction with EuroPar’03, Austria, August 2003
This paper is the informal proceedings version and demonstrates the optimization of generalized container abstractions and is related to Active Library research (or so I understand). It is also related to Telescoping Language research. The paper demonstrates a few of the newest features in ROSE and has served as an introduction for the authors into the optimization of the STL library more generally.
Daniel J. Quinlan, Markus Schordan, Qing Yi, Bronis R. de Supinski: Semantic-Driven Parallelization of Loops Operating on User-Defined Containers. LCPC 2003: 524-538
This paper demonstrates the use of ROSE to recognize OpenMP pragmas and, using the Nanos OpenMP runtime library, build a subset of an OpenMP specific compiler for C++.
Daniel J. Quinlan, Markus Schordan, Qing Yi, Bronis R. de Supinski: A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives. WOMPAT 2003: 13-25
This paper is specific to compile-time optimization of array classes. It demonstrates what was at the time the most current work on the compile-time optimization of an array class library. ROSE is more general, but this paper is very specific to the optimization of a single library.
Quinlan, D. J., Miller, B., Philip, B., and Schordan, M. 2002. Treating a User-Defined Parallel Library as a Domain-Specific Language. In Proceedings of the 16th international Parallel and Distributed Processing Symposium (April 15 – 19, 2002). IEEE Computer Society, Washington, DC, 324
This is one of the first papers on ROSE, presented at CPC2001 and later updated for publication in the journal Concurrency: Practice and Experience.
Quinlan, D. Schordan, M. Philip, B. Kowarschik, M. “Parallel Object-Oriented Framework Optimization”, Special Issue of Concurrency: Practice and Experience (2003), also in Proceedings of Conference on Parallel Compilers (CPC2001), Edinburgh, Scotland, June 2001.
The Specification of Source-To-Source Transformations for the Compile-Time Optimization of Parallel Object-Oriented Scientific Applications
This was a paper which specified some elements of what later became the string based AST rewrite mechanism used in ROSE.
Quinlan, D., Schordan, M. Philip, B. Kowarschik, M. “The Specification of Source-To-Source Transformations for the Compile-Time Optimization of Parallel Object-Oriented Scientific Applications”, Submitted to Parallel Processing Letters, also in Proceedings of 14th Workshop on Languages and Compilers for Parallel Computing (LCPC2001), Cumberland Falls, KY, August 1-3 2001.
ROSETTA: The Compile-Time Recognition of Object-Oriented Library Abstractions and Their Use Within User Applications
This paper describes the development of a tool, ROSETTA, which builds object-oriented Intermediate Representations (IRs) for compilers. It is a tool used within ROSE to build the SAGE III IR which we use internally with the EDG front-end. It is specific to details of the internal ROSE compiler infrastructure.
D. Quinlan and B. Philip, “ROSETTA: The Compile-Time Recognition of Object-Oriented Library Abstractions and Their Use Within User Applications”, in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), 2001
This paper was an introduction to the work being done at the time on ROSE complete with a more detailed motivation for compile-time optimization of specific libraries.
Quinlan, D., “ROSE: Compiler Support for Object-Oriented Frameworks” Proceedings of Conference on Parallel Compilers (CPC2000), Aussois, France, January 2000. Also published in special issue of Parallel Processing Letters, Vol. 10.
This paper presents preliminary work on the compile-time optimization of array class libraries.
Kei Davis and Dan Quinlan, ROSE II: An Optimizing Code Transformer for C++ Object-Oriented Array Class Libraries, World Multiconference on Systemics, Cybernetics and Informatics and 5th International Conference on Information Systems Analysis and Synthesis Vol.5: Computer Science and Engineering, Jul 31-Aug 4, 1999, Orlando, Florida
This paper discusses the different approaches to the optimization of array class libraries. Optimization of array class libraries led to the development of ROSE as a project, though ROSE is not at all specific to array class libraries and addresses the optimization of libraries generally. This paper can be helpful in understanding what work was done using language template features within C++ before attempting to address the optimization issues more generally at compile time. Prior work started on ROSE had been abandoned because of the perceived significant advantages of template meta-programming techniques for scientific computing. Several papers on the details of template use were written; this is the most complete of them. It is included with these papers to provide a bit of perspective (currently historical).
F. Bassetti, K. Davis, D. Quinlan, “C++ Expression Templates Performance Issues in Scientific Computing,” ipps, pp.0635, 12th. International Parallel Processing Symposium, 1998
R. Parsons and D. Quinlan, “A++/P++ array classes for architecture independent finite difference computations,” in Proceedings of the second annual object-oriented numerics conference (oonski’94), 1994.
P++, a c++ virtual shared grids based programming environment for architecture-independent development of structured grid applications
M. Lemke and D. Quinlan, “P++, a c++ virtual shared grids based programming environment for architecture-independent development of structured grid applications,” in Proceedings of the conpar/vapp v, 1992.
D. Brown, W. Henshaw, and D. Quinlan, “Overture: a framework for the complex geometries,” in Proceedings of the iscope’99 conference, 1999.