Commit c625560

committed
rewriting abstract
1 parent 06285cc commit c625560

File tree

3 files changed (+15, -17 lines)


tex/ *Minibuf-1*

Lines changed: 0 additions & 1 deletion
This file was deleted.

tex/main.tex

Lines changed: 8 additions & 10 deletions
@@ -54,17 +54,15 @@
 \maketitle
 
 \begin{abstract}
-This paper explores two condensed-space interior-point methods, HyKKT and LiftedKKT, designed to solve large-scale nonlinear programs on graphics processing units (GPUs).
-Interior-point methods (IPM) require solving a sequence of symmetric indefinite linear systems, known as Karush-Kuhn-Tucker (KKT) systems, which become increasingly ill-conditioned near the solution.
-Traditional sparse factorization techniques for KKT systems rely on numerical pivoting, posing challenges for parallel execution on GPUs.
-A viable alternative is to transform the KKT system into a symmetric positive-definite matrix and solve it using Cholesky factorization, a method that maintains numerical stability without the need for numerical pivoting.
-Despite the potential for greater ill-conditioning in the condensed systems, we demonstrate that their inherent structures effectively mitigate the loss of accuracy in the IPM.
-We implement both methods on GPUs using MadNLP.jl, an optimization solver interfaced with NVIDIA's sparse linear solver cuDSS and the GPU-accelerated modeling framework ExaModels.jl.
-Our experiments on the PGLIB and CUTEst benchmarks show that
+This paper presents a theoretical analysis of the numerical properties of two variants of condensed-space interior-point methods designed for GPUs---HyKKT and LiftedKKT---and examines their real-world performance with comprehensive numerical experiments.
+Conventional implementations of interior-point methods (IPMs) rely on repeatedly solving indefinite augmented KKT systems, typically carried out with direct sparse solvers based on LBL$^\top$ factorization equipped with sophisticated numerical pivoting strategies.
+While this approach achieves high performance and robustness on CPUs, the serial nature of numerical pivoting poses challenges for effective implementation on GPUs.
+Recently, various condensed-space IPM strategies have emerged to address this issue by transforming the KKT system into a symmetric positive-definite matrix, enabling solution via Cholesky factorization with static pivoting on GPUs, but their numerical properties have not been thoroughly analyzed.
+In this paper, we show through a numerical error analysis that although the condensed systems may exhibit increased ill-conditioning, the intrinsic structures of the condensed KKT system effectively mitigate any potential accuracy loss in the IPM, explaining the observed numerical stability.
+Additionally, we present numerical results that showcase the capabilities of a fully GPU-resident nonlinear programming framework, provided by MadNLP.jl (a filter line-search IPM solver), cuDSS (a direct sparse solver utilizing Cholesky factorization), and ExaModels.jl (a modeling framework).
 \add{
-(i) condensed-psace approach on GPUs can achieve up to a tenfold speedup over single-threaded CPUs with the classical augmented KKT system formulation on large-scale OPF instances,
-(ii) the raw performance and robustness are more mitigated on the CUTEst benchmark, the two condensed-space methods being penalized for
-certain classes of problems where the condensed matrix has a large number of nonzeros.
+Benchmark results against the pglib-opf and CUTEst libraries indicate that condensed-space methods hold promise for large-scale nonlinear programming on GPUs, although further work is required to improve their robustness and performance across a broader range of problem types:
+a speedup of up to tenfold over single-threaded CPUs can be achieved for extremely large and sparse optimal power flow (OPF) instances, but GPU solvers often exhibit reduced robustness and limited speedups for various CUTEst instances.
 }
 \end{abstract}
 
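For context on the change above: the new abstract refers to transforming the augmented KKT system into a symmetric positive-definite matrix. A minimal, generic sketch of that condensation is given below; the notation (W for the Lagrangian Hessian, \Sigma for the diagonal barrier term, J for the constraint Jacobian, \gamma for a penalty parameter) is assumed for illustration only, and the exact HyKKT and LiftedKKT formulations used in the paper may differ.

% Sketch only; assumed generic notation, requires amsmath/amssymb.
% At each IPM iteration, the step is computed from the symmetric
% indefinite augmented KKT system:
\begin{equation*}
  \begin{bmatrix} W + \Sigma & J^\top \\ J & 0 \end{bmatrix}
  \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
  = -\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}.
\end{equation*}
% Condensed-space methods replace this indefinite system with a
% symmetric positive-definite one, for instance in the Golub--Greif
% style used by HyKKT-type approaches:
\begin{equation*}
  K_\gamma = W + \Sigma + \gamma\, J^\top J \succ 0, \qquad \gamma > 0,
\end{equation*}
% which admits a Cholesky factorization with static pivoting on the GPU,
% at the price of a potentially larger condition number of K_\gamma.

The appeal, as the rewritten abstract notes, is that no dynamic numerical pivoting is needed during the factorization, removing the serial bottleneck that limits LBL$^\top$-based solvers on GPUs.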

tex/sections/introduction.tex

Lines changed: 7 additions & 6 deletions
@@ -1,11 +1,12 @@
 \section{Introduction}
-Graphics Processing Units (GPUs) have become a cornerstone of scientific computing, with their most prominent success being in the training and deployment of large-scale artificial intelligence (AI) models.
-GPUs offer two main advantages: (1) massive parallel computing capabilities, especially for applications that exploit coarse-grain parallelism and high memory bandwidth, and (2) superior power efficiency through the use of ``Single Instruction, Multiple Data'' (SIMD) execution.
+Graphics Processing Units (GPUs) have become a cornerstone of scientific computing, with their most notable achievement being the training and deployment of large-scale artificial intelligence (AI) models.
+GPUs provide two primary advantages: (1) significant parallel computing capabilities, particularly for applications that leverage coarse-grain parallelism and high memory bandwidth, and (2) enhanced power efficiency through the implementation of ``Single Instruction, Multiple Data'' (SIMD) execution.
+
+Despite their success in machine learning, GPUs have seen limited adoption in mathematical programming.
+This is primarily due to the dependence of second-order optimization methods on direct linear algebra for computing Newton directions, a task that is challenging to parallelize effectively.
+Sparse matrix factorizations, which are central to these methods, present significant difficulties on SIMD architectures.
+However, several recent developments are beginning to shift this landscape:
 
-Despite their success in machine learning, GPUs have seen limited adoption in the mathematical programming community.
-This is largely due to the reliance of second-order optimization methods on direct linear algebra to compute Newton directions, a task that remains difficult to parallelize efficiently.
-Sparse matrix factorizations, central to these methods, pose significant challenges on SIMD architectures.
-However, several recent developments are shifting this landscape:
 
 \begin{enumerate}
 \item \textbf{Faster sparse matrix operations}: Improvements in CUDA libraries and the integration of tensor cores in modern GPUs have significantly accelerated sparse matrix operations~\cite{markidis2018nvidia}.
