chore: compile postgres statically to reduce the PLT function call #1064

gongxun0928
Contributor

When building postgres, link all the *.o files directly into the binary instead of linking libpostgres.so.
This approach reduces the overhead of PLT function calls. Based on our internal performance tests,
this change improves the performance of a 1TB TPC-DS benchmark by 5-8%.

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 8ff9d04 to 097d18c Compare April 27, 2025 07:31
@my-ship-it my-ship-it requested a review from gfphoenix78 April 28, 2025 05:18
@my-ship-it my-ship-it force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 097d18c to 42e6c83 Compare May 9, 2025 05:42
@avamingli avamingli force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 42e6c83 to e47a61a Compare May 12, 2025 06:49
@gongxun0928 gongxun0928 requested a review from gfphoenix78 May 12, 2025 16:19
@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from dc0a421 to f57317e Compare May 16, 2025 16:17
@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch 2 times, most recently from 1df9a1d to 03b5cf8 Compare May 20, 2025 04:42
@gongxun0928 gongxun0928 requested a review from gfphoenix78 May 20, 2025 04:43
@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 03b5cf8 to e1244e6 Compare May 20, 2025 16:45
@tuhaihe
Member

tuhaihe commented May 22, 2025

Can wait for #1081 to be merged.

@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from e1244e6 to 76a530e Compare May 27, 2025 07:34
Contributor

@gfphoenix78 gfphoenix78 left a comment

LGTM

@yjhjstz
Member

yjhjstz commented May 27, 2025

ic-cbdb-parallel (pull_request) still failed.

@gfphoenix78 gfphoenix78 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch 2 times, most recently from 9c60438 to 890474c Compare May 28, 2025 04:29
@gongxun0928
Contributor Author

> ic-cbdb-parallel (pull_request) still failed.

The job failed with the following error log. It looks like the GitHub CI runner has no space left on the device:

System.IO.IOException: No space left on device : '/home/runner/runners/2.324.0/_diag/Worker_20250528-090814-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.324.0/_diag/Worker_20250528-090814-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.324.0/_diag/Worker_20250528-090814-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.Tracing.Dispose(Boolean disposing)
   at GitHub.Runner.Common.Tracing.Dispose()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)

When compiling postgres, statically linking all object files causes the postgres binary to grow from less than 100KB
to over 180MB. As a result, the binary data generated during compilation increases by approximately 180MB. I'm not
sure if this is the root cause.

@edespino @my-ship-it could someone help check the disk space limit of the runner?
I suspect the remaining space might be less than 180MB. Thanks!
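
For anyone who wants to check this on the runner directly, here is a minimal sketch of a free-space check. It assumes Python is available in the CI job; the helper name `has_free_space` and the 180MB threshold (taken from the binary size estimate above) are illustrative, not part of this PR.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= required_bytes


# Hypothetical threshold: roughly the extra 180MB produced by the static link.
REQUIRED_BYTES = 180 * 1024 * 1024

free_mib = shutil.disk_usage(".").free // (1024 * 1024)
print(f"free space: {free_mib} MiB, enough for the build: {has_free_space('.', REQUIRED_BYTES)}")
```

Running this as an early step and again just before the test phase would show whether the build output or the test data directories consume the space.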

@my-ship-it
Contributor

@tuhaihe @edespino Could you please help adjust the space limit? Thanks!

@my-ship-it my-ship-it force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 890474c to c0b56a7 Compare May 29, 2025 01:30
@yjhjstz
Member

yjhjstz commented May 29, 2025

> The job failed with the following error log, it looks like the github-ci runner has no space left on device

Could any core dump files be taking up space?

@tuhaihe
Member

tuhaihe commented May 29, 2025

The default storage is 14GB (https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories). Maybe we can try specifying a docker option for larger storage, like:

~~~diff
diff --git a/.github/workflows/build-cloudberry.yml b/.github/workflows/build-cloudberry.yml
index a6e659596b5..0faa72e8ff1 100644
--- a/.github/workflows/build-cloudberry.yml
+++ b/.github/workflows/build-cloudberry.yml
@@ -887,6 +887,7 @@ jobs:
         --ulimit core=-1
         --cgroupns=host
         -v /sys/fs/cgroup:/sys/fs/cgroup:rw
+        --storage-opt size=20G
~~~

I'm not sure whether it works, but we can try it (it doesn't work). Alternatively, we can clean up the tmp files.

Update:

@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from f4632a3 to 14d9f83 Compare May 29, 2025 02:41
@gfphoenix78 gfphoenix78 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 1f63146 to b293297 Compare May 29, 2025 04:38
@edespino
Contributor

@gongxun0928 For diagnostic purposes, I am going to try to reproduce this issue (using your branch). I will let you know what I find.

Obviously, please do not commit these changes until we identify the root cause.

@gongxun0928
Contributor Author

@edespino I added an option --link-postgres-with-shared. When --link-postgres-with-shared=no is used together with --enable-shared-postgres-backend, the ic-cbdb-parallel test triggers a disk space warning and make test fails:
##[warning]You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 0 MB.

From previous debugging information, the available disk space during the execution of ic-cbdb-parallel was only 23GB. On my local machine, when running ic-cbdb-parallel, the filesystem usage for datadirs during the test exceeds 23GB.

@gongxun0928
Contributor Author

gongxun0928 commented Jun 4, 2025

Here is the 1TB TPC-DS performance comparison before and after the PR

Hardware and Operating System

| CPU (Core Number) | Memory (GB) | Disk (TB) | Operating System | Disk Count | Server Type |
|---|---|---|---|---|---|
| 96 | 376 | 15 | Oracle Linux 9 x86 | 1 | Physical Server |

Database Cluster Configuration

| Number of Nodes | Segment Configuration | Database System Version |
|---|---|---|
| 4 | 96 segments | PostgreSQL 14.4 (Apache Cloudberry 2.0.0+c4ad1602 build 107528) (HashData Lightning 2.0.0+c4ad1602 build 107528) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7.7.0.2), 64-bit compiled on Apr 21 2025 17:30:17 |

GUC Configuration

| Name | Value | Master Only |
|---|---|---|
| optimizer | on | No |
| optimizer_analyze_root_partition | on | Yes |
| gp_autostats_mode | none | Yes |
| default_statistics_target | 100 | No |
| gp_vmem_protect_limit | 15000 | No |
| max_statement_mem | 14GB | No |
| statement_mem | 1GB | No |
| gp_fts_probe_timeout | 600 | No |
| gp_fts_probe_interval | 600 | No |
| gp_segment_connect_timeout | 1800 | No |
| gp_interconnect_queue_depth | 20 | No |
| gp_interconnect_snd_queue_depth | 8 | No |
| gp_interconnect_min_retries_before_timeout | 100 | No |
| autovacuum | off | No |
| max_connections | 500 | No |
| max_prepared_transactions | 100 | No |
| work_mem | 512MB | No |
| gp_interconnect_type | tcp | No |
| gp_interconnect_tcp_listener_backlog | 2048 | No |

Query Performance Results

| Query / Report | main (link libpostgres.so) Query Duration (Seconds) | With PR (link obj files) Query Duration (Seconds) |
|---|---|---|
| Total Time | 2404 | 2286 |
| Data Load | 517 | 510 |
| Query 101 | 1 | 1 |
| Query 102 | 20 | 17 |
| Query 103 | 8 | 7 |
| Query 104 | 94 | 85 |
| Query 105 | 18 | 16 |
| Query 106 | 8 | 9 |
| Query 107 | 13 | 12 |
| Query 108 | 8 | 8 |
| Query 109 | 30 | 28 |
| Query 110 | 9 | 9 |
| Query 111 | 47 | 43 |
| Query 112 | 3 | 3 |
| Query 113 | 14 | 14 |
| Query 114 | 86 | 75 |
| Query 115 | 5 | 4 |
| Query 116 | 11 | 10 |
| Query 117 | 17 | 17 |
| Query 118 | 9 | 8 |
| Query 119 | 9 | 9 |
| Query 120 | 5 | 4 |
| Query 121 | 6 | 7 |
| Query 122 | 14 | 13 |
| Query 123 | 254 | 246 |
| Query 124 | 128 | 121 |
| Query 125 | 16 | 15 |
| Query 126 | 7 | 7 |
| Query 127 | 13 | 12 |
| Query 128 | 16 | 15 |
| Query 129 | 15 | 15 |
| Query 130 | 2 | 2 |
| Query 131 | 12 | 11 |
| Query 132 | 8 | 7 |
| Query 133 | 12 | 11 |
| Query 134 | 11 | 10 |
| Query 135 | 13 | 12 |
| Query 136 | 14 | 13 |
| Query 137 | 7 | 6 |
| Query 138 | 21 | 19 |
| Query 139 | 27 | 24 |
| Query 140 | 6 | 5 |
| Query 141 | 0 | 0 |
| Query 142 | 8 | 7 |
| Query 143 | 11 | 11 |
| Query 144 | 7 | 7 |
| Query 145 | 4 | 3 |
| Query 146 | 13 | 12 |
| Query 147 | 19 | 18 |
| Query 148 | 14 | 13 |
| Query 149 | 12 | 12 |
| Query 150 | 13 | 11 |
| Query 151 | 20 | 19 |
| Query 152 | 8 | 7 |
| Query 153 | 9 | 9 |
| Query 154 | 17 | 16 |
| Query 155 | 8 | 7 |
| Query 156 | 11 | 12 |
| Query 157 | 11 | 10 |
| Query 158 | 10 | 9 |
| Query 159 | 153 | 156 |
| Query 160 | 12 | 11 |
| Query 161 | 13 | 11 |
| Query 162 | 5 | 5 |
| Query 163 | 9 | 8 |
| Query 164 | 34 | 32 |
| Query 165 | 29 | 28 |
| Query 166 | 8 | 8 |
| Query 167 | 300 | 290 |
| Query 168 | 13 | 12 |
| Query 169 | 9 | 8 |
| Query 170 | 25 | 24 |
| Query 171 | 15 | 13 |
| Query 172 | 100 | 98 |
| Query 173 | 9 | 9 |
| Query 174 | 37 | 36 |
| Query 175 | 21 | 27 |
| Query 176 | 6 | 5 |
| Query 177 | 12 | 12 |
| Query 178 | 67 | 64 |
| Query 179 | 15 | 14 |
| Query 180 | 16 | 16 |
| Query 181 | 3 | 3 |
| Query 182 | 16 | 14 |
| Query 183 | 3 | 2 |
| Query 184 | 2 | 3 |
| Query 185 | 4 | 5 |
| Query 186 | 5 | 5 |
| Query 187 | 20 | 18 |
| Query 188 | 36 | 33 |
| Query 189 | 10 | 9 |
| Query 190 | 4 | 4 |
| Query 191 | 1 | 1 |
| Query 192 | 6 | 5 |
| Query 193 | 12 | 12 |
| Query 194 | 10 | 10 |
| Query 195 | 100 | 93 |
| Query 196 | 8 | 7 |
| Query 197 | 24 | 23 |
| Query 198 | 7 | 7 |
| Query 199 | 10 | 9 |
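
To read the totals as a percentage, a small sketch of the arithmetic follows. The helper `percent_reduction` is illustrative only; the input figures are copied from the Total Time and Data Load rows of the results. On those two rows it works out to roughly 4.9% for Total Time and about 1.4% for Data Load.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Relative reduction in duration, as a percentage of the 'before' value."""
    return (before_s - after_s) / before_s * 100.0


# Figures taken from the results: main (link libpostgres.so) vs. with PR (link obj files).
total_time = percent_reduction(2404, 2286)
data_load = percent_reduction(517, 510)

print(f"Total Time reduction: {total_time:.1f}%")
print(f"Data Load reduction:  {data_load:.1f}%")
```

The same helper can be applied per query; a negative result means the query ran slower with the PR, as for Query 175 (21s to 27s).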

Contributor

@gfphoenix78 gfphoenix78 left a comment

LGTM

@gfphoenix78 gfphoenix78 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 4b00031 to b16daa3 Compare June 9, 2025 02:29
@gfphoenix78 gfphoenix78 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from dfd774c to 09a2eab Compare June 19, 2025 03:14
@my-ship-it
Contributor

my-ship-it commented Jun 20, 2025

Two commits also need to be cherry-picked:
greenplum-db/gpdb-archive@b3ad725
greenplum-db/gpdb-archive@e7b1f88

@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch 5 times, most recently from 6275817 to 3632053 Compare June 25, 2025 04:11
Member

@tuhaihe tuhaihe left a comment

Left some comments, FYI.

Also, the configure file needs to be regenerated with the autoconf command.

@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from f5e8a8f to 00c6b06 Compare July 1, 2025 16:47
@avamingli avamingli force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 00c6b06 to 0cb0c1f Compare July 2, 2025 03:51
@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from 0cb0c1f to f61616b Compare July 3, 2025 01:43
During TPC-DS testing, we observed that linking postgres against
libpostgres.so introduces PLT function call overhead for some functions.
By linking object files (*.o) directly instead, we achieved a 5-8%
performance improvement in the TPC-DS 1TB benchmark.

This commit adds an option, enable_link_postgres_with_shared, to
link libpostgres.so when compiling postgres. The default value
is false, which, as in Greenplum, statically links all object files
when compiling postgres.

Additionally, this update fixes a minor bug: the pax extension has a
dependency on libpostgres.so. Now, when enabling the pax extension, we
check that enable_shared_postgres_backend is set to 'yes' to ensure
proper functionality.

The ic-cbdb-parallel test has also been migrated to use the release
version instead of the debug version.

This change was made because running the test on the debug version
caused disk space issues. When both libpostgres.so and postgres are
compiled in the debug version, disk usage increases by several hundred
megabytes compared to the release version. As a result, the
ic-cbdb-parallel test failed due to insufficient disk space.

By switching to the release version, this issue is resolved, and the
test runs faster as well.
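
The pax dependency check described in the commit message can be modelled roughly as below. This is only a sketch of the logic in Python, not the project's build code: the real check presumably lives in the configure script, and the key `enable_pax` and the function name `check_build_options` are hypothetical. Only `enable_shared_postgres_backend` is a name taken from the commit message.

```python
def check_build_options(opts: dict) -> None:
    """Raise ValueError if pax is enabled without the shared postgres backend.

    `opts` maps option names to booleans. 'enable_pax' is a hypothetical key
    used for illustration; 'enable_shared_postgres_backend' is the option
    named in the commit message.
    """
    pax_enabled = opts.get("enable_pax", False)
    shared_backend = opts.get("enable_shared_postgres_backend", False)
    if pax_enabled and not shared_backend:
        raise ValueError(
            "pax depends on libpostgres.so: "
            "enable_shared_postgres_backend must be 'yes' when pax is enabled"
        )


# A valid combination passes silently.
check_build_options({"enable_pax": True, "enable_shared_postgres_backend": True})
# Building without pax does not require the shared backend.
check_build_options({"enable_pax": False, "enable_shared_postgres_backend": False})
```

Failing at configure time, rather than at link time, reports the missing library before any compilation has been done.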
@gongxun0928 gongxun0928 force-pushed the chore/build-postgres-in-static-and-build-libpostgres.so-for-other-extension-which-write-unittests branch from f61616b to f925e2b Compare July 3, 2025 09:56
@my-ship-it my-ship-it dismissed edespino’s stale review July 4, 2025 06:02

As discussed with the engineers, we have addressed most of the review comments, and there has been no further response for a long time.

@my-ship-it my-ship-it merged commit 8793ec0 into apache:main Jul 4, 2025
26 checks passed
@github-project-automation github-project-automation bot moved this from Awaiting Feedback to Done in Apache Cloudberry (Incubating) 2.1.0 Jul 4, 2025
7 participants