Adapt code and dependencies to Spark 3.0.1 #12
Conversation
| override def getReaderForRange[K, C]( |
Unclear how to specify a range for CrailShuffleReader.
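For context, a minimal self-contained sketch of what the range hook looks like, using stand-in types rather than Spark's real `ShuffleManager`/`ShuffleReader` traits: in Spark 3.0, `getReaderForRange` carries `startMapIndex`/`endMapIndex` bounds (used by adaptive query execution) in addition to the partition range. The names `ReaderSketch`, `ManagerSketch`, and `records` below are illustrative assumptions, not part of the plugin.

```scala
// Stand-in for Spark's ShuffleReader; only the read() shape matters here.
trait ReaderSketch[K, C] { def read(): Iterator[(K, C)] }

class ManagerSketch {
  // Mirrors the parameter shape of Spark 3.0's getReaderForRange with
  // stand-in types. This sketch honors the map-index range by slicing the
  // in-memory records; a reader that ignores the range would over-read
  // when only a subrange of map outputs is requested.
  def getReaderForRangeSketch[K, C](records: Seq[Seq[(K, C)]],
                                    startMapIndex: Int,
                                    endMapIndex: Int,
                                    startPartition: Int,
                                    endPartition: Int): ReaderSketch[K, C] =
    new ReaderSketch[K, C] {
      def read(): Iterator[(K, C)] =
        records.slice(startMapIndex, endMapIndex).flatten.iterator
    }
}
```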
| overhead = 100/initRatio |
| logInfo("shuffler writer: initTime " + initTime + ", runTime " + runTime + ", initRatio " + initRatio + ", overhead " + overhead) |
| return Some(MapStatus(blockManager.shuffleServerId, sizes)) |
| return Some(MapStatus(blockManager.shuffleServerId, sizes, context.taskAttemptId())) |
Unclear what a good value would be. It also works with a constant value.
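A minimal sketch of the signature change under discussion, using a stand-in case class rather than Spark's real `MapStatus`: in Spark 3.0 `MapStatus.apply` takes a third map-task-id argument, which is why the diff now passes `context.taskAttemptId()`. Any long value satisfies the shape, which may be why a constant also appears to work in simple runs. The names `MapStatusSketch` and `writerStopSketch` are illustrative assumptions.

```scala
// Stand-in for Spark's MapStatus; in Spark 3.0 the real apply() gained a
// third argument identifying the map task that produced the output.
final case class MapStatusSketch(shuffleServerId: String,
                                 sizes: Array[Long],
                                 mapTaskId: Long)

// Mirrors the updated return statement in the diff, with stand-in types:
// the writer's stop() now threads the task attempt id into the status.
def writerStopSketch(shuffleServerId: String,
                     sizes: Array[Long],
                     taskAttemptId: Long): Option[MapStatusSketch] =
  Some(MapStatusSketch(shuffleServerId, sizes, taskAttemptId))
```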
asqasq
left a comment
Added two comments in the code, please have a look and let me know your thoughts.
| </properties> |
| </profile> |
| <profile> |
| <id>spark-3.0.1</id> |
Do the 2.X profiles still work, or should we remove them?
| <groupId>org.apache.crail</groupId> |
| <artifactId>crail-client</artifactId> |
| <version>1.2-incubating-SNAPSHOT</version> |
| <version>1.3-incubating-SNAPSHOT</version> |
Should we depend on a non-snapshot? Or are there changes that require the latest and greatest? Since we do not push the SNAPSHOT to Maven Central, this would require compiling from the Crail source.
Feel free to add yourself as an author in https://github.com/zrlio/crail-spark-io/blob/master/AUTHORS
Adapt the plugin to Spark 3.0. The version for Spark 2.2.0 is under a new branch spark_2_2_0 so that we can
keep the newest plugin version for the newest Spark version in master.
I have tested the plugin with Spark 3.0.1, Hadoop 2.7, Apache Crail 1.3 and Crail Spark Terasort
with 1GB, 4GB, 16GB and 64GB, and validated the correct sorting with and without this
plugin. I did not run into problems or incorrect sorting.
Please have a look at the code.