Adapt code and dependencies to Spark 3.0.1 #12
Conversation
| override def getReaderForRange[K, C]( |
Unclear how to specify a range for CrailShuffleReader.
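For context, a minimal self-contained sketch of what the range hook looks like, using stand-in types rather than Spark's real `ShuffleManager`/`ShuffleReader` traits: in Spark 3.0, `getReaderForRange` carries `startMapIndex`/`endMapIndex` bounds (used by adaptive query execution) in addition to the partition range. The names `ReaderSketch`, `ManagerSketch`, and `records` below are illustrative assumptions, not part of the plugin.

```scala
// Stand-in for Spark's ShuffleReader; only the read() shape matters here.
trait ReaderSketch[K, C] { def read(): Iterator[(K, C)] }

class ManagerSketch {
  // Mirrors the parameter shape of Spark 3.0's getReaderForRange with
  // stand-in types. This sketch honors the map-index range by slicing the
  // in-memory records; a reader that ignores the range would over-read
  // when only a subrange of map outputs is requested.
  def getReaderForRangeSketch[K, C](records: Seq[Seq[(K, C)]],
                                    startMapIndex: Int,
                                    endMapIndex: Int,
                                    startPartition: Int,
                                    endPartition: Int): ReaderSketch[K, C] =
    new ReaderSketch[K, C] {
      def read(): Iterator[(K, C)] =
        records.slice(startMapIndex, endMapIndex).flatten.iterator
    }
}
```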
| overhead = 100/initRatio |
| logInfo("shuffler writer: initTime " + initTime + ", runTime " + runTime + ", initRatio " + initRatio + ", overhead " + overhead) |
| return Some(MapStatus(blockManager.shuffleServerId, sizes)) |
| return Some(MapStatus(blockManager.shuffleServerId, sizes, context.taskAttemptId())) |
Unclear what a good value would be. It also works with a constant value.
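A minimal sketch of the signature change under discussion, using a stand-in case class rather than Spark's real `MapStatus`: in Spark 3.0 `MapStatus.apply` takes a third map-task-id argument, which is why the diff now passes `context.taskAttemptId()`. Any long value satisfies the shape, which may be why a constant also appears to work in simple runs. The names `MapStatusSketch` and `writerStopSketch` are illustrative assumptions.

```scala
// Stand-in for Spark's MapStatus; in Spark 3.0 the real apply() gained a
// third argument identifying the map task that produced the output.
final case class MapStatusSketch(shuffleServerId: String,
                                 sizes: Array[Long],
                                 mapTaskId: Long)

// Mirrors the updated return statement in the diff, with stand-in types:
// the writer's stop() now threads the task attempt id into the status.
def writerStopSketch(shuffleServerId: String,
                     sizes: Array[Long],
                     taskAttemptId: Long): Option[MapStatusSketch] =
  Some(MapStatusSketch(shuffleServerId, sizes, taskAttemptId))
```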
asqasq
left a comment
Added two comments in the code, please have a look and let me know your thoughts.
| </properties> |
| </profile> |
| <profile> |
| <id>spark-3.0.1</id> |
Do the 2.X profiles still work, or should we remove them?
| <groupId>org.apache.crail</groupId> |
| <artifactId>crail-client</artifactId> |
| <version>1.2-incubating-SNAPSHOT</version> |
| <version>1.3-incubating-SNAPSHOT</version> |
Should we depend on a non-snapshot? Or are there changes that require the latest and greatest? Since we do not push the SNAPSHOT to Maven Central, this would require compiling from the Crail source.
Feel free to add yourself as an author in https://github.com/zrlio/crail-spark-io/blob/master/AUTHORS
Adapt the plugin to Spark 3.0. The version for Spark 2.2.0 is under a new branch spark_2_2_0 so that we can
keep the newest plugin version for the newest Spark version in master.
I have tested the plugin with Spark 3.0.1, Hadoop 2.7, Apache Crail 1.3 and Crail Spark Terasort
with 1GB, 4GB, 16GB and 64GB, and validated the correct sorting with and without this
plugin. I did not run into problems or incorrect sorting.
Please have a look at the code.