Changes from 2 commits
7 changes: 4 additions & 3 deletions build/base/entrypoint.sh
@@ -10,17 +10,18 @@ function resolve_host() {
   check="nslookup $host"
   max_retry=10
   counter=0
-  backoff=0.1
+  backoff=1
Member

Suggested change
-  backoff=1
+  backoff=3

@Syed-Suhaan It seems 1 sec is not enough: https://github.com/kubeflow/mpi-operator/actions/runs/20128945073/job/57765870629?pr=757

Let's increase it to 3 sec.

Member

@Syed-Suhaan Hmm, 3 sec still does not seem to be enough. Did 3 sec work well in your local environment?
If not, we might need to increase it to 5 sec.

Author

Yes, 3s works locally, but my local environment resolves instantly so it doesn't reproduce the race condition. Since the CI logs show the environment is noticeably slower, I've updated it to 5s as suggested to ensure we clear that race condition reliably.
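
To put the 1 s / 3 s / 5 s discussion in numbers, here is a rough sketch that lists the sleeps a doubling backoff performs and their sum. It assumes the host never resolves, ignores the time spent in `nslookup` itself, and uses a hypothetical helper `backoff_schedule` that is not part of `entrypoint.sh`.

```shell
#!/usr/bin/env bash
# Sketch: list the sleeps performed by a doubling backoff loop and their sum.
# backoff_schedule is a hypothetical helper, not part of entrypoint.sh.
backoff_schedule() {
  local backoff=$1      # initial sleep in whole seconds
  local max_retry=$2    # number of sleeps before the loop gives up
  local total=0
  local counter
  for ((counter = 0; counter < max_retry; counter++)); do
    echo "retry $((counter + 1)): sleep ${backoff}s"
    total=$((total + backoff))
    backoff=$((backoff + backoff))   # same doubling as in the diff
  done
  echo "total: ${total}s"
}

backoff_schedule 5 10
```

Under these assumptions the initial value scales the whole schedule: with `max_retry=10` the worst-case sleep time is 1023 times the initial backoff, so 1023 s for an initial value of 1 and 5115 s for an initial value of 5.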

   until $check > /dev/null
   do
     if [ $counter -eq $max_retry ]; then
       echo "Couldn't resolve $host"
       return
     fi
+    echo "Couldn't resolve $host. Sleeping ${backoff}s before retry..."
     sleep $backoff
-    echo "Couldn't resolve $host... Retrying"
+    echo "Retrying resolution of $host..."
     ((counter++))
-    backoff=$(echo - | awk "{print $backoff + $backoff}")
+    backoff=$((backoff + backoff))
   done
   echo "Resolved $host"
 }
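
The move from `awk` to `$((backoff + backoff))` relies on Bash arithmetic expansion, which handles integers only, so a fractional start such as `0.1` would no longer work with it; that is presumably why the initial value becomes a whole number in the same change. Below is a minimal sketch of the same loop shape, not the PR's code: `retry_with_backoff` and `SLEEP_CMD` are hypothetical names, and the check and sleep commands are injectable so the loop can be exercised without DNS or real waiting.

```shell
#!/usr/bin/env bash
# Sketch of the retry loop from the diff, generalized for illustration.
# retry_with_backoff and SLEEP_CMD are hypothetical names, not from entrypoint.sh.
SLEEP_CMD=${SLEEP_CMD:-sleep}

retry_with_backoff() {
  local max_retry=$1 backoff=$2
  shift 2                       # remaining arguments are the command to retry
  local counter=0
  until "$@" > /dev/null 2>&1; do
    if [ "$counter" -eq "$max_retry" ]; then
      echo "Giving up after $max_retry retries"
      return 1                  # assumption: signal failure to the caller
    fi
    echo "Sleeping ${backoff}s before retry..."
    "$SLEEP_CMD" "$backoff"
    counter=$((counter + 1))
    backoff=$((backoff + backoff))   # integer-only: backoff=0.1 would fail here
  done
  echo "Succeeded after $counter retries"
}

# Example: a command that always fails, with sleeping stubbed out by `true`.
SLEEP_CMD=true retry_with_backoff 3 1 false || true
```

Unlike the function in the diff, this sketch returns a non-zero status when it gives up, which is a design choice made here so callers can react to the failure; the original simply returns after printing the message.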
2 changes: 1 addition & 1 deletion build/base/intel-builder.Dockerfile
Member

Could you use the previous indent?

Author

Fixed the indentation to match the previous style.

@@ -18,7 +18,7 @@ RUN apt update \
     && apt install -y --no-install-recommends \
         libstdc++-12-dev binutils procps clang \
         intel-oneapi-compiler-dpcpp-cpp \
-        intel-oneapi-mpi-devel-2021.13 \
+        intel-oneapi-mpi-devel-2021.14 \
     && apt remove -y gnupg2 ca-certificates apt-transport-https \
     && apt autoremove -y \
     && rm -rf /var/lib/apt/lists/*
2 changes: 1 addition & 1 deletion build/base/intel.Dockerfile
Member

Ditto: could you use the previous indent here as well?

Author

Fixed the indentation to match the previous style.

@@ -19,7 +19,7 @@ RUN apt update \
     && apt update \
     && apt install -y --no-install-recommends \
         dnsutils \
-        intel-oneapi-mpi-2021.13 \
+        intel-oneapi-mpi-2021.14 \
     && apt remove -y gnupg2 ca-certificates \
     && apt autoremove -y \
     && rm -rf /var/lib/apt/lists/*