With OVHcloud Data Processing, you can run your processing jobs in the cloud in a fast, easy and cost-efficient way. Everything is managed and secured by OVHcloud; you only need to define how many resources you would like. You can learn more about OVHcloud Data Processing here.
This repository provides a spark-submit wrapper to run Spark jobs on OVHcloud Data Processing.
Create an OVHcloud token by visiting https://eu.api.ovh.com/createToken/ and grant it the GET/POST/PUT rights on the endpoint /cloud/project/*/dataProcessing/*.
If you want to use auto upload, you also need to set the storage parameters.
Supported storage protocol:
- swift (OVHcloud Object Storage with Keystone v3 authentication)
Then create the configuration file configuration.ini in the same directory, as shown below:
[ovh]
; configuration specific to 'ovh-eu' endpoint
endpoint=ovh-eu
application_key=my_app_key
application_secret=my_application_secret
consumer_key=my_consumer_key
; configuration specific to the swift protocol (OVHcloud Object Storage with Keystone v3 authentication)
[swift]
user_name=openstack_user_name
password=openstack_password
auth_url=openstack_auth_url
domain=openstack_auth_url_domain
region=openstack_region
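Since configuration.ini holds your API and OpenStack credentials, a sensible precaution (not required by the tool) is to make it readable only by you:
chmod 600 configuration.ini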
Build the project:
make init
make release
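Assuming the build drops an ovh-spark-submit binary in the working directory (as in the examples below), you can quickly check it:
./ovh-spark-submit --help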
Usage:
ovh-spark-submit [--jobname JOBNAME] [--region REGION] --projectid PROJECTID [--spark-version SPARK-VERSION] [--upload UPLOAD] [--class CLASS] --driver-cores DRIVER-CORES --driver-memory DRIVER-MEMORY [--driver-memoryOverhead DRIVER-MEMORYOVERHEAD] --executor-cores EXECUTOR-CORES --num-executors NUM-EXECUTORS --executor-memory EXECUTOR-MEMORY [--executor-memoryOverhead EXECUTOR-MEMORYOVERHEAD] FILE [PARAMETERS [PARAMETERS ...]]
Positional arguments:
FILE Main application file, for example a jar in your object storage such as swift://odp/spark-examples.jar
PARAMETERS Arguments passed to the application
Options:
--jobname JOBNAME Job name (can be set with the env var JOB_NAME)
--region REGION OpenStack region of the job (can be set with the env var OS_REGION) [default: GRA]
--projectid PROJECTID
OpenStack project ID (can be set with the env var OS_PROJECT_ID)
--spark-version SPARK-VERSION
Version of Spark (can be set with the env var SPARK_VERSION) [default: 2.4.3]
--upload UPLOAD Comma-delimited list of file/dir paths to upload before running the job (can be set with the env var UPLOAD)
--class CLASS Main class of the application
--driver-cores DRIVER-CORES
Number of cores allocated to the driver
--driver-memory DRIVER-MEMORY
Driver memory in (gibi/mebi)bytes (e.g. "10G")
--driver-memoryOverhead DRIVER-MEMORYOVERHEAD
Driver memory overhead in (gibi/mebi)bytes (e.g. "10G")
--executor-cores EXECUTOR-CORES
Number of cores allocated to each executor
--num-executors NUM-EXECUTORS
Number of executors
--executor-memory EXECUTOR-MEMORY
Executor memory in (gibi/mebi)bytes (e.g. "10G")
--executor-memoryOverhead EXECUTOR-MEMORYOVERHEAD
Executor memory overhead in (gibi/mebi)bytes (e.g. "10G")
--packages PACKAGES Comma-delimited list of Maven coordinates
--repositories REPOSITORIES
Comma-delimited list of additional repositories (or resolvers in SBT)
--properties-file Read properties from the given file
--ttl Maximum time to live of this job, as an RFC3339 duration (e.g. "P1DT30H4S"), after which it will be automatically terminated
--help, -h display this help and exit
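Most flags can also be supplied through the environment variables listed above; a minimal sketch (the project ID is the one from the examples below, the job name is illustrative):
# OS_PROJECT_ID and OS_REGION replace --projectid/--region; JOB_NAME replaces --jobname
export OS_PROJECT_ID=1377b21260f05b410e4652445ac7c95b
export OS_REGION=GRA
export JOB_NAME=sparkpi-demo
./ovh-spark-submit --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000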
Without Auto Upload:
OS_PROJECT_ID=1377b21260f05b410e4652445ac7c95b ./ovh-spark-submit --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000
With Auto Upload:
OS_PROJECT_ID=1377b21260f05b410e4652445ac7c95b ./ovh-spark-submit --upload ./spark-examples.jar --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000
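Optional flags compose the same way; for example, a sketch naming the job and capping its lifetime with --ttl (the job name and duration are illustrative):
OS_PROJECT_ID=1377b21260f05b410e4652445ac7c95b ./ovh-spark-submit --jobname sparkpi-demo --ttl P1D --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000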
Once your job has been executed, the CLI prints out job information:
- the job's log location
- the job's status and exit code
The CLI then exits with the job's exit code.
2022-10-07 09:01:09,693 - deploy - INFO - End of job cc5724d1-bdce-4e99-a72f-xxxx with status 0
2022/10/07 11:01:12 You can download your logs at https://storage.gra.cloud.ovh.net/v1/AUTH_4beb99ff282e4d16b215375xxxx/odp-logs?prefix=cc5724d1-bdce-4e99-a72f-xxxx
2022/10/07 11:01:12 Job status is : COMPLETED
2022/10/07 11:01:12 Job exit code : 0
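Since the CLI exits with the job's exit code, you can chain it directly in shell scripts or CI pipelines; a minimal sketch:
# The wrapper's exit status mirrors the Spark job's exit code
if OS_PROJECT_ID=1377b21260f05b410e4652445ac7c95b ./ovh-spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --driver-cores 1 --driver-memory 4G \
    --executor-cores 1 --executor-memory 4G --num-executors 1 \
    swift://odp/spark-examples.jar 1000; then
  echo "Spark job completed"
else
  echo "Spark job failed with exit code $?" >&2
  exit 1
fi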