Base image based on Alpine #99

@dragospopa420

Description

Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/core

Feature

The base image is built on Debian, which has a much larger footprint than Alpine Linux.
So I was thinking the included Dockerfile could be based on Alpine Linux instead, for faster deployment and testing.
The apify/actor-node-puppeteer-chrome image is 2.53 GB; my version is 698 MB.

Motivation

I'm building an infrastructure of spiders based on Crawlee and I want the fastest possible deployment time.

Ideal solution or implementation, and any additional constraints

FROM node:current-alpine

# Set workdir
WORKDIR /usr/src/app

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Change rights for package-lock.json
RUN chmod 744 package-lock.json

# Install Chromium and its dependencies; nodejs is listed as well to make sure it is up to date
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      nodejs \
      yarn

# Point Puppeteer at the system Chromium so it does not need to download Chrome again
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Required for Crawlee
ENV CRAWLEE_CHROME_EXECUTABLE_PATH=/usr/bin/chromium-browser
RUN chmod 744 /usr/bin/chromium-browser

# Run the image.
CMD npm start
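
For anyone trying the Dockerfile above, here is a minimal sketch of how a crawler could pick up the Chromium path it sets. `chromiumLaunchOptions` and `ALPINE_CHROMIUM_PATH` are hypothetical names made up for illustration and are not part of Crawlee or Puppeteer. The `--no-sandbox` flags are an assumption based on the container running as root, and passing the path explicitly may be redundant if Crawlee already reads `CRAWLEE_CHROME_EXECUTABLE_PATH` in your setup.

```typescript
// Hypothetical helper (not part of Crawlee or Puppeteer): builds launch
// options that point at the Chromium installed by `apk add chromium`.
type Env = Record<string, string | undefined>;

interface ChromiumLaunchOptions {
    executablePath: string;
    args: string[];
}

// Assumed fallback: the same path the Dockerfile exports in its ENV lines.
const ALPINE_CHROMIUM_PATH = '/usr/bin/chromium-browser';

function chromiumLaunchOptions(env: Env): ChromiumLaunchOptions {
    // Prefer the Crawlee variable, then the Puppeteer one, then the fallback.
    const executablePath =
        env.CRAWLEE_CHROME_EXECUTABLE_PATH ||
        env.PUPPETEER_EXECUTABLE_PATH ||
        ALPINE_CHROMIUM_PATH;

    return {
        executablePath,
        // Assumption: the container runs as root, and Chromium will not
        // start as root unless its sandbox is disabled.
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
    };
}

// Usage sketch (needs the crawlee and puppeteer packages installed):
//
// import { PuppeteerCrawler } from 'crawlee';
//
// const crawler = new PuppeteerCrawler({
//     launchContext: { launchOptions: chromiumLaunchOptions(process.env) },
//     requestHandler: async ({ page, request }) => {
//         console.log(request.url, await page.title());
//     },
// });
```

Keeping the path in environment variables means the same crawler code can run against the Debian-based image and the Alpine-based one without changes.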

Alternative solutions or implementations

No response

Other context

No response
