Skip to content

Fix #5 unzip overwrite failure and add cache reuse#6

Open
vishal24367 wants to merge 2 commits into
mainfrom
5-fix-unzip-overwrite-on-cache-restore
Open

Fix #5 unzip overwrite failure and add cache reuse#6
vishal24367 wants to merge 2 commits into
mainfrom
5-fix-unzip-overwrite-on-cache-restore

Conversation

@vishal24367
Copy link
Copy Markdown
Contributor

Summary

  • Root cause: unzip -q awscli.zip (without -o) prompts to overwrite existing files when the cache is restored. In non-interactive builds it reads EOF, treats as "None", exits non-zero → "Failed to extract AWS CLI"
  • Fix 1: Add -o to unzip so it overwrites without prompting
  • Fix 2: Add cache reuse — if $AWS_CLI_BIN exists, runs, and was built for the same architecture (checked via cached TOML metadata), skip the 64MB download + install entirely
  • Also adds --update to aws/install for the case where the aws-cli/ directory already exists, and removes the zip after install to clean up

Behavior change

Scenario Before After
First build (no cache) Download + install Download + install
Subsequent build (cache restored, same arch) Crash (EOF in unzip) Skip (reuse cache)
Subsequent build (arch changed) Crash Download + install

Test plan

  • Deploy slug-compiler-web — build should succeed (was failing continuously before this fix)
  • Confirm log shows "Reusing cached AWS CLI for arm64" on second build
  • Confirm no 64MB download on second build

🤖 Generated with Claude Code

When the awscli cache layer is restored by the lifecycle, the layer
directory already contains the aws/ tree from the previous install.
unzip without -o prompts to replace existing files; in non-interactive
builds it reads EOF, chooses "None", and exits non-zero, triggering the
"Failed to extract AWS CLI" error.

Two fixes:
- Add -o flag to unzip so it overwrites without prompting
- Add cache reuse: if the cached binary exists and runs for the same
  architecture, skip the 64MB download+install entirely

Also pass --update to aws/install to handle the case where the previous
aws-cli directory already exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@neeto-bugwatch neeto-bugwatch Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 NeetoBugWatch Review

I've analyzed your pull request and found some potential improvements that might be worth considering. Please review the specific line comments below for helpful suggestions.

💡 To trigger a re-review of this PR, add a comment with neeto-bugwatch-run on its own line (not inside a code block or blockquote).

Warning

Verdict: Needs review — 3 comments found, including major issues

Click to copy prompt for AI agent
Verdict: review_required

The following are code review comments from an AI reviewer that only has access
to the PR diff, not the full codebase. Treat every comment as a hypothesis,
not a fact. For each comment:

1. Investigate the verification questions against the full codebase first.
2. Go beyond the provided questions. Use your full codebase access to ask your
   own questions, check related code paths, and consider edge cases that the
   reviewer could not have seen from the diff alone.
3. Only implement the suggested fix if your investigation confirms the issue is real.
4. If the issue is not real, skip it entirely. Do not implement speculative fixes.

Prioritize by severity (critical > major > minor > trivial) and confidence score.
Only attend to comments with confidence below 0.5 if your investigation proves them valid.

[
  {
    "file_path": "bin/build",
    "line_number": 93,
    "comment": "The installation process leaves the extracted `aws/` installer directory inside `TARGET_DIR`. Since `TARGET_DIR` acts as the buildpack layer directory, leaving this large installer directory (150MB+) will bloat the cached layer and the final container image size.",
    "confidence": 0.95,
    "severity": "major",
    "suggested_fix": "Remove the extracted installer directory by appending `rm -rf \"$TARGET_DIR/aws\"` when removing the zip file.",
    "verification_questions": [
      "Does `unzip awscli.zip` extract an `aws/` directory inside `TARGET_DIR`?",
      "Are the files inside `TARGET_DIR/aws/` needed after the `TARGET_DIR/aws/install` script completes?"
    ]
  },
  {
    "file_path": "bin/build",
    "line_number": 68,
    "comment": "The `curl` command lacks the `-f` (or `--fail`) flag. If the URL responds with an HTTP error (like 404), `curl` will still succeed (exit code 0) but save the HTML error page into `awscli.zip`. This obscures the root cause, as the script will confusingly fail at the `unzip` step instead.",
    "confidence": 0.95,
    "severity": "minor",
    "suggested_fix": "Add the `-f` flag to `curl` (e.g., `curl -fL`) so that it exits with a non-zero status code on HTTP errors.",
    "verification_questions": [
      "What is the current behavior if `awscli_arch` evaluates to an invalid architecture resulting in a 404?",
      "Would debugging be easier if the script failed immediately at the download step with an explicit HTTP error?"
    ]
  },
  {
    "file_path": "bin/build",
    "line_number": 61,
    "comment": "Parsing TOML with `grep` and `sed` can be fragile. `grep 'architecture'` might accidentally match descriptions or comments containing the word 'architecture' before the actual key. Also, the `sed` regex strictly requires double quotes; if single quotes are used, it will fail to capture the value and fallback to the entire matched line, causing unnecessary cache misses.",
    "confidence": 0.85,
    "severity": "minor",
    "suggested_fix": "Consider anchoring the grep search (e.g., `grep -E '^\\s*architecture\\s*='`) to match only the key, and relaxing the `sed` regex to handle optional single quotes, or use a tool like `yj` or `jq` if available.",
    "verification_questions": [
      "Could the TOML file contain the word 'architecture' in preceding lines, such as descriptions or comments?",
      "Does the buildpack lifecycle ever write TOML values using single quotes instead of double quotes?"
    ]
  }
]

Comment thread bin/build
else
echo " ! awscli: Binary not found after installation"
exit 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The installation process leaves the extracted aws/ installer directory inside TARGET_DIR. Since TARGET_DIR acts as the buildpack layer directory, leaving this large installer directory (150MB+) will bloat the cached layer and the final container image size.

Click to see review metadata
{
  "confidence": 0.95,
  "severity": "major",
  "suggested_fix": "Remove the extracted installer directory by appending `rm -rf \"$TARGET_DIR/aws\"` when removing the zip file.",
  "verification_questions": [
    "Does `unzip awscli.zip` extract an `aws/` directory inside `TARGET_DIR`?",
    "Are the files inside `TARGET_DIR/aws/` needed after the `TARGET_DIR/aws/install` script completes?"
  ]
}

Comment thread bin/build
fi

if [[ "$cached_arch" == "$target_arch" ]] && [[ -f "$AWS_CLI_BIN" ]] && "$AWS_CLI_BIN" --version > /dev/null 2>&1; then
echo " awscli: Reusing cached AWS CLI for ${target_arch}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The curl command lacks the -f (or --fail) flag. If the URL responds with an HTTP error (like 404), curl will still succeed (exit code 0) but save the HTML error page into awscli.zip. This obscures the root cause, as the script will confusingly fail at the unzip step instead.

Click to see review metadata
{
  "confidence": 0.95,
  "severity": "minor",
  "suggested_fix": "Add the `-f` flag to `curl` (e.g., `curl -fL`) so that it exits with a non-zero status code on HTTP errors.",
  "verification_questions": [
    "What is the current behavior if `awscli_arch` evaluates to an invalid architecture resulting in a 404?",
    "Would debugging be easier if the script failed immediately at the download step with an explicit HTTP error?"
  ]
}

Comment thread bin/build
if [ -f "$AWS_CLI_BIN" ]; then
# Reuse cached layer if binary exists and was built for the same architecture.
# Reading cached_arch from the TOML written by a previous build is reliable
# because the lifecycle restores it from the cache image before bin/build runs.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing TOML with grep and sed can be fragile. grep 'architecture' might accidentally match descriptions or comments containing the word 'architecture' before the actual key. Also, the sed regex strictly requires double quotes; if single quotes are used, it will fail to capture the value and fallback to the entire matched line, causing unnecessary cache misses.

Click to see review metadata
{
  "confidence": 0.85,
  "severity": "minor",
  "suggested_fix": "Consider anchoring the grep search (e.g., `grep -E '^\\s*architecture\\s*='`) to match only the key, and relaxing the `sed` regex to handle optional single quotes, or use a tool like `yj` or `jq` if available.",
  "verification_questions": [
    "Could the TOML file contain the word 'architecture' in preceding lines, such as descriptions or comments?",
    "Does the buildpack lifecycle ever write TOML values using single quotes instead of double quotes?"
  ]
}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant