Commit 96b100f

bujjibabukatta, S4srihari, and kukushking authored
fix: reduce Lambda layer size by building Arrow without ICU (#3336)

* fix: reduce Lambda layer size by building Arrow without ICU (#3331)

Arrow 22 links against libicu when it is present in the build environment. Dockerfile.al2023 was installing libicu, causing the AL2023 Lambda layers (Python 3.12/3.13/3.14) to bundle libicudata.so.67 (~30 MB), libicui18n.so.67, and libicuuc.so.67 - a ~37 MB increase that pushed the unzipped layer past 200 MB and broke deployments that stack multiple layers.

The AL2 Dockerfile (Python 3.9-3.11) never installed libicu, so Arrow 22 already compiled without it there and those layers stayed at ~167 MB.

Fix:
- Remove libicu from Dockerfile.al2023 dnf install.
- Add -DARROW_WITH_ICU=OFF to the Arrow cmake invocation.
- Drop libicudata.so.67, libicui18n.so.67, libicuuc.so.67 from the bundle loop.
- Strip the bundled lib/*.so files to recover a few extra MB.

Expected layer size after this fix: ~167 MB (back to pre-3.16.1 levels).

Co-Authored-By: Bujji Babu Katta & Srihari Ponakala

* Strip debug info from shared libraries

Added a step to strip symbol tables and debug info from shared libraries to reduce binary size before zipping.

---------

Co-authored-by: S4srihari <srihariponakala@gmail.com>
Co-authored-by: Anton Kukushkin <kukushkin.anton@gmail.com>
1 parent 493a539 commit 96b100f
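
The size arithmetic in the commit message can be sketched as a small budget check. This is an illustration only, not part of the commit: the helper names `total_mb` and `fits_in_limit` are hypothetical, the 60 MB second layer is an invented example, and the 250 MB figure is Lambda's documented unzipped quota for a function plus all of its layers. The 167 MB and 204 MB values come from the ~167 MB baseline and the ~37 MB ICU increase described above.

```shell
# Hypothetical helpers: sum unzipped sizes (in whole MB) and compare to a limit.
LIMIT_MB=250  # Lambda quota for unzipped function code plus all layers

total_mb() {
    # Add up every size passed as an argument and print the total.
    local total=0 size
    for size in "$@"; do
        total=$((total + size))
    done
    echo "${total}"
}

fits_in_limit() {
    # Succeed only when the combined size stays within LIMIT_MB.
    [ "$(total_mb "$@")" -le "${LIMIT_MB}" ]
}

# 167 MB layer (without ICU) stacked with an assumed 60 MB layer.
fits_in_limit 167 60 && echo "167 + 60 MB fits"
# 204 MB layer (with ICU bundled) stacked with the same assumed layer.
fits_in_limit 204 60 || echo "204 + 60 MB exceeds the limit"
```

With these example numbers, the ICU-free layer leaves room for a second layer while the ICU-bundled one does not, which matches the stacking failure the commit describes.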

2 files changed

Lines changed: 7 additions & 2 deletions

building/lambda/Dockerfile.al2023

Lines changed: 0 additions & 1 deletion
@@ -8,7 +8,6 @@ RUN dnf install -y \
     jemalloc-devel \
     libxml2-devel \
     libxslt-devel \
-    libicu \
     libatomic \
     bison \
     make \

building/lambda/build-lambda-layer.sh

Lines changed: 7 additions & 1 deletion
@@ -47,6 +47,7 @@ cmake \
     -DARROW_WITH_LZ4=OFF \
     -DARROW_WITH_BROTLI=OFF \
     -DARROW_BUILD_TESTS=OFF \
+    -DARROW_WITH_ICU=OFF \
     -GNinja \
     ..
 
@@ -110,8 +111,10 @@ find python -regex '^.*\(__pycache__\|\.py[co]\)$' -delete
 # libatomic for pyarrow 22+). Lambda extracts layers to /opt/ and /opt/lib
 # is on LD_LIBRARY_PATH. Search ldconfig cache first, then fall back to a
 # filesystem search (libatomic under gcc10 lives in /usr/lib/gcc/*).
+# ICU libraries are intentionally excluded: Arrow is built with -DARROW_WITH_ICU=OFF
+# so pyarrow has no ICU dependency, keeping the layer size minimal.
 mkdir -p lib
-for libfile in libxslt.so.1 libexslt.so.0 libatomic.so.1 libicudata.so.67 libicui18n.so.67 libicuuc.so.67; do
+for libfile in libxslt.so.1 libexslt.so.0 libatomic.so.1; do
     src=$(ldconfig -p 2>/dev/null | awk -v lib="${libfile}" '$1 == lib { print $NF; exit }')
     if [ -z "${src}" ] || [ ! -e "${src}" ]; then
         src=$(find /usr/lib /usr/lib64 -name "${libfile}" -print -quit 2>/dev/null)
@@ -124,6 +127,9 @@ for libfile in libxslt.so.1 libexslt.so.0 libatomic.so.1 libicudata.so.67 libicu
     fi
 done
 
+# Strip symbol tables and debug info to reduce binary size
+find lib -name '*.so*' -type f -exec strip "{}" \;
+
 zip -r9 "${FILENAME}" ./python ./lib
 mv "${FILENAME}" dist/
 
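
A guard against ICU libraries slipping back into the bundle could look like the sketch below. It is not part of this commit: the function names `find_icu_libs` and `assert_no_icu` are hypothetical, and the `libicu*.so*` pattern is an assumption based on the three file names removed from the bundle loop above.

```shell
# Hypothetical guard: fail if any ICU shared object is present in a directory.
find_icu_libs() {
    # Print every libicu* shared object (files or symlinks) under "$1".
    find "$1" -name 'libicu*.so*' -print 2>/dev/null
}

assert_no_icu() {
    # Return non-zero and list the offenders when ICU libraries are found.
    local hits
    hits=$(find_icu_libs "$1")
    if [ -n "${hits}" ]; then
        echo "unexpected ICU libraries bundled:" >&2
        echo "${hits}" >&2
        return 1
    fi
    return 0
}
```

One possible placement is `assert_no_icu lib || exit 1` just before the `zip` step, so that a regression fails the build instead of silently growing the layer.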
