Scraping Flow Dependencies | Possible to set Docker security profile for Flow running in a container? #3444
Unanswered
aaronjbecker asked this question in Q&A
Replies: 0 comments
I would like to use Prefect to build a Flow in Python that begins with scraping a website using Playwright. As I understand it, I should be able to manage dependencies for this task by specifying a custom Dockerfile (e.g. based on this one) for the DockerStorage and running it using the Docker agent. So far so good?
In order for Chromium to support sandboxing, which the Playwright dev team highly recommends for scraping tasks, the Docker container needs to be run with a custom seccomp security profile. The security profile is usually supplied as a path to a JSON file via the --security-opt seccomp=... argument to docker run, but it can also be set through the security_opt argument of the Python Docker SDK's create_host_config function. Chromium sandboxes also work if you run the container with --cap-add=SYS_ADMIN, but that grants more capabilities than are needed and would still require customizing the container's host config.
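For reference, here is a minimal sketch of what setting the profile through the Docker SDK might look like. The helper names (seccomp_security_opt, build_host_config) and the profile path are made up for illustration. It also assumes that, unlike the docker CLI, the SDK does not read the file for you, so the JSON contents are inlined into the option string.

```python
import json
from pathlib import Path


def seccomp_security_opt(profile_path):
    """Build a security_opt entry from a seccomp profile JSON file.

    Assumption: when going through the SDK, the profile *contents* are
    passed rather than the file path, so the file is read and inlined here.
    """
    profile = json.loads(Path(profile_path).read_text())
    # Re-serialize compactly so the option stays on a single line.
    return ["seccomp=" + json.dumps(profile, separators=(",", ":"))]


def build_host_config(profile_path, base_url="unix://var/run/docker.sock"):
    """Create a host config that applies the given seccomp profile.

    Hypothetical helper: requires the `docker` package, imported lazily so
    that seccomp_security_opt stays usable without it.
    """
    import docker

    client = docker.APIClient(base_url=base_url)
    return client.create_host_config(
        security_opt=seccomp_security_opt(profile_path)
    )
```

The resulting host config would then be handed to create_container via its host_config argument, which is the part the agent currently controls.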
Is there currently a way to have the DockerAgent run a container with a specified security profile? If not, would adding one make sense conceptually, or would it break any abstractions?
I think one could implement this by adding some additional fields (something like extra_host_config_args) to DockerStorage, and then passing those along to create_host_config in the DockerAgent’s deploy_flow function… does that make sense?
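To make the idea concrete, here is a rough sketch of the merging step, not Prefect's actual code. extra_host_config_args is the hypothetical field proposed above, merge_host_config_args is an invented helper, and agent_defaults stands in for whatever keyword arguments the agent already builds before calling create_host_config.

```python
def merge_host_config_args(agent_defaults, extra_host_config_args=None):
    """Combine the agent's host config kwargs with user-supplied extras.

    Hypothetical helper: list-valued options (such as security_opt) are
    concatenated, and any other user-supplied value overrides the default.
    """
    merged = dict(agent_defaults)  # copy so the agent's defaults are not mutated
    for key, value in (extra_host_config_args or {}).items():
        if isinstance(merged.get(key), list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


# Example: the flow's storage carries one extra option for the agent to apply.
# The profile JSON is elided here; a real value would inline the full profile.
extras = {"security_opt": ["seccomp=<profile JSON>"]}
host_config_kwargs = merge_host_config_args({"network_mode": "bridge"}, extras)
# host_config_kwargs could then be unpacked into create_host_config(**...)
```

Letting user values override the agent's defaults is a design choice with obvious risks (for example, overriding networking the agent depends on), so an allow-list of permitted keys might be safer.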
I’m very new to Prefect so apologies if I’m not seeing broader implications or unintended side-effects.