Currently the Spark distribution / Hadoop libs in the image are installed using conda / pip, which has a few implications:
- Because pip is being used, some parts of the distribution are left out (such as the `start-thriftserver.sh` script).
- The distribution ends up in an odd location, inside the conda directory (`/opt/miniconda3/lib/python3.8/site-packages/pyspark`); see the sketch below.
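For reference, a minimal sketch of what installing the full distribution could look like, assuming Spark 3.1.2 built for Hadoop 3.2, a Debian-based Python base image, and `/opt/spark` as the install location (all of these are assumptions, not what the current Dockerfile does):

```dockerfile
# Install the full Apache Spark distribution instead of `pip install pyspark`,
# so sbin scripts such as start-thriftserver.sh are included and the install
# lands in a predictable location.
FROM python:3.8-slim

ARG SPARK_VERSION=3.1.2
ARG HADOOP_VERSION=3.2

# Java is required to run Spark; curl is only needed to fetch the tarball.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates openjdk-11-jre-headless \
    && rm -rf /var/lib/apt/lists/*

# Download and unpack the official distribution into /opt/spark.
RUN curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
    | tar -xz -C /opt \
    && mv "/opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}" /opt/spark

ENV SPARK_HOME=/opt/spark
ENV PATH="${SPARK_HOME}/bin:${SPARK_HOME}/sbin:${PATH}"
# The PySpark sources ship with the distribution; the py4j zip name should match
# whatever the chosen release bundles.
ENV PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.10.9-src.zip"
```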
Other findings:
- Environment variables like `SPARK_HOME` aren't set.
- The root user is being used.
- A multi-stage build could be used to reduce the image size and avoid uninstalling dependencies in the Dockerfile (see the sketch after this list).
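On the multi-stage point, a rough sketch of how the build could be split so that download tooling never reaches the final image, while also setting `SPARK_HOME` and dropping root (the base images, version, and UID are assumptions):

```dockerfile
# Builder stage: fetch and unpack Spark. Nothing from this stage ends up in the
# final image except the /opt/spark directory copied out below.
FROM debian:bullseye-slim AS builder
ARG SPARK_VERSION=3.1.2
ARG HADOOP_VERSION=3.2
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
        | tar -xz -C /opt \
    && mv "/opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}" /opt/spark

# Runtime stage: just a JRE plus the unpacked distribution, so there is nothing
# to uninstall afterwards.
FROM eclipse-temurin:11-jre
COPY --from=builder /opt/spark /opt/spark

# Set the variables the current image is missing.
ENV SPARK_HOME=/opt/spark
ENV PATH="${SPARK_HOME}/bin:${SPARK_HOME}/sbin:${PATH}"

# Run as a dedicated non-root user instead of root.
RUN useradd --uid 1001 --create-home spark
USER spark
WORKDIR /home/spark
```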
It might also be an idea to use a Spark base image, like https://github.com/bitnami/bitnami-docker-spark, which improves on all of these points.
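If that base image were adopted, the Dockerfile could shrink to roughly the following; the tag, the availability of pip inside the base image, and the requirements.txt step are assumptions about this project:

```dockerfile
# bitnami/spark already ships the full distribution, sets SPARK_HOME and runs as
# a non-root user (UID 1001) by default.
FROM bitnami/spark:3.1.2

# Temporarily become root to layer project-specific Python packages on top,
# then drop back to the image's default non-root user.
USER root
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
USER 1001
```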