Hadoop DSL -> Jar compiler #96

@convexquad

Description

This is a visionary idea that will require more technical depth and work, but it would be a pretty awesome enhancement!

Internally at LinkedIn, the Photon Plugin extends the Hadoop DSL with syntax to declare re-usable Hadoop DSL workflows. I intend to extend the Hadoop DSL language directly to encompass these features in the near future.
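The Photon syntax itself is internal to LinkedIn, so purely as a hypothetical sketch, a re-usable, parameterized workflow declaration in the Hadoop DSL might look something like this (the `reusableWorkflow` and `parameters` keywords are invented for illustration and do not exist in the open-source Hadoop DSL today):

```groovy
// Hypothetical sketch only: reusableWorkflow and parameters are invented names.
// The idea is a parameterized workflow definition that consuming teams can
// instantiate with their own paths and settings.
reusableWorkflow('trainModelFlow') {
  // Values the consumer must supply when instantiating the workflow
  parameters 'inputPath', 'outputPath'

  hadoopJavaJob('trainModel') {
    uses 'com.example.ml.TrainModelJob'   // assumed job class, for illustration
    reads files: ['input': '${inputPath}']
    writes files: ['output': '${outputPath}']
  }
  targets 'trainModel'
}
```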

However, that addresses only one side of the problem. The other side (which the Photon Plugin does not itself solve) is how to package re-usable Hadoop DSL workflows so that other teams can instantiate them.

The way to do this is to set up a new Hadoop DSL compiler. Instead of compiling to Azkaban or Oozie, this compiler would build a jar that encodes the structure of the declared Hadoop DSL workflows. Users could then add the jar to their buildscript classpath, and a special Hadoop DSL method would read the encoded workflow structure back from the jar.
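To make the round trip concrete, here is a minimal sketch, assuming the compiled workflow graph is serialized as JSON into a resource inside the jar (the class name, method names, and the `META-INF/hadoopDsl/workflows.json` resource path are all assumptions for illustration, not anything that exists in the Hadoop Plugin today):

```groovy
// Hypothetical sketch of the jar encode/decode round trip; none of these names
// exist in the Hadoop Plugin today.
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

class HadoopDslJarCodec {
  // Producer side: the new compiler walks the compiled Hadoop DSL and serializes
  // the workflow graph, which the build then writes into the jar as a resource,
  // e.g. META-INF/hadoopDsl/workflows.json.
  static String encode(Map workflowGraph) {
    return JsonOutput.toJson(workflowGraph)
  }

  // Consumer side: the new Hadoop DSL method locates the resource on the
  // buildscript classpath and rebuilds the in-memory DSL objects from it.
  static Map decode(ClassLoader loader, String resourcePath) {
    loader.getResourceAsStream(resourcePath).withStream { stream ->
      (Map) new JsonSlurper().parse(stream)
    }
  }
}
```

JSON is just one candidate encoding; anything the compiler can write during the producer's build and read back from the consumer's buildscript classpath would work.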

This feature would enable various teams (like the ML-Algorithms Team or even UMP) to declare re-usable Hadoop DSL workflows and distribute them as multiproduct artifacts, and it would let other teams invoke and reuse those workflows. This could be a giant win for LinkedIn and would be a highly-visible technical accomplishment.
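On the consuming side, reuse might look roughly like the following build.gradle fragment (the artifact coordinates and the `loadHadoopDslFromJar` method are illustrative assumptions; no such method exists in the Hadoop DSL today):

```groovy
// Hypothetical consumer-side build.gradle fragment; coordinates and method names
// are illustrative only.
buildscript {
  repositories {
    mavenCentral()
  }
  dependencies {
    // Jar published by the producing team containing the encoded Hadoop DSL structure
    classpath 'com.example.mlalgorithms:training-workflows:1.0.0'
  }
}

// Proposed new Hadoop DSL method that reads the encoded workflow structure from the
// buildscript classpath and instantiates it in this project, supplying values for
// the workflow's declared parameters.
loadHadoopDslFromJar('trainModelFlow') {
  inputPath  '/data/myteam/training/input'
  outputPath '/data/myteam/training/output'
}
```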

This is probably a 1-2 month effort (perhaps even a quarter-long effort) for a single developer. @nntnag17 @akshayrai @pranayhasan @rajagopr, keep this in mind as a potential larger and more technical task. I would be glad to provide design ideas and technical feedback for this enhancement.
