Simple setup for it-hadoop-client #25

Just by following the instructions at https://hadoop-user-guide.web.cern.ch/hadoop-user-guide/gettingstarted_md.html, I was able to submit this minimal job:
```python
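from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

# Run on the YARN cluster under a recognizable application name
conf = SparkConf().setMaster("yarn").setAppName("CMS Working Set")
sc = SparkContext(conf=conf)
spark = SparkSession(sc)

# Avro reader provided by the spark-avro package given to spark-submit via --packages
readavro = spark.read.format("com.databricks.spark.avro")
# Load the WMArchive fwjr Avro files for 2017-2019 into a DataFrame
fwjr = readavro.load("/cms/wmarchive/avro/fwjr/201[789]/*/*/*.avro")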
```
with
```sh
spark-submit --packages com.databricks:spark-avro_2.11:4.0.0 test.py
```
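For anyone unsure what the `201[789]` part of the load path selects: it is a character class, so it picks up the 2017, 2018 and 2019 directories. Below is a rough local sketch of that using Python's `fnmatch`, which is not the Hadoop glob implementation (for one thing, its `*` also matches `/`), but which behaves the same way for this simple case. The sample file paths and the `matching_paths` helper are made up for illustration.

```python
from fnmatch import fnmatchcase

# Same pattern as in the load() call above.
PATTERN = "/cms/wmarchive/avro/fwjr/201[789]/*/*/*.avro"

# Hypothetical file paths, invented for illustration only.
candidates = [
    "/cms/wmarchive/avro/fwjr/2016/12/31/part-0.avro",
    "/cms/wmarchive/avro/fwjr/2017/01/01/part-0.avro",
    "/cms/wmarchive/avro/fwjr/2019/06/15/part-0.avro",
    "/cms/wmarchive/avro/fwjr/2020/01/01/part-0.avro",
]


def matching_paths(pattern, paths):
    """Return the paths that match the glob pattern (case-sensitive)."""
    return [p for p in paths if fnmatchcase(p, pattern)]


selected = matching_paths(PATTERN, candidates)
for path in selected:
    print(path)
```

Only the 2017 and 2019 sample paths are printed; 2016 and 2020 fall outside the `[789]` class.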

Perhaps this would be a gentler introduction than starting with the complexity of RDDs? Also, there seem to be options for working from lxplus.
