Skip to content

Question: How to use DataFrame API to achieve the function equivalent to map/reduce in spark.net #1156

Open
@JunweiSUN

Description

@JunweiSUN

Hi,
we have a scenario that need to use the map/reduce function in spark.net,
For example, we want to call

public IEnumerable<object[]> MapCallback(IEnumerable<Row> input)
{
   // do something with `IEnumerable<Row> input`
}

df.Rdd.MapPartitions(MapCallback, true)

The thing is, we need this IEnumerable input so that we can do some operation on the row level.
In Mobius, we can access all the Rdd-related APIs, but according to this issue seems all the Rdd-related APIs are no longer accessible.

So we have the following questions:

  1. Is there any API in current Spark.Net that can implement the function that exactly equivalent to Rdd.Map, Rdd.Reduce and other mapreduce related function? Note that we need to deal with a IEnumerable with arbitrary number of elements in one row, i.e., we may not know how many elements (columns) in a row until runtime.
  2. If the answer of 1 is false, can we just download the source code, change the visibility of Rdd-related APIs to public, and build a private bits to use?
  3. Any other related suggestions will be really appreciated.

Looking forward to your answer! Thanks a lot!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions