More Dataset Support #341
-
Hello, I was wondering if there is a plan to add more support for Datasets. Essentially, I am trying to answer questions like: which field in the input data ended up in which field in the output data? I wrote the small program below and dug through the ArangoDB database to try to figure out how I could trace that field-to-field mapping.

```scala
package za.co.absa.spline.example.batch

import org.apache.spark.sql._
import za.co.absa.spline.SparkApp
import za.co.absa.spline.harvester.SparkLineageInitializer._

object ErinExample1Job extends SparkApp("Erin Example 1") {

  case class Person(
    first_name: String,
    last_name: String
  )

  case class NewPerson(
    first_name_new: String,
    last_name_new: String
  )

  case class CSVModel(
    d_code: String,
    d_name: String,
    people: Seq[Person]
  )

  case class NewCSVModel(
    d_code_new: String,
    d_name_new: String,
    people_new: Seq[NewPerson]
  )

  // Maps an input Person to an output NewPerson, field by field.
  def PersonMaker(p: Person): NewPerson =
    NewPerson(
      first_name_new = p.first_name.concat(" TEST"),
      last_name_new = p.last_name.concat(" TEST")
    )

  spark.enableLineageTracking()

  val encoder = org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[CSVModel]

  val ds: Dataset[CSVModel] = spark.read
    .option("inferSchema", "true")
    .json("data/input/batch/test.json")
    .as(encoder)

  ds.map { row =>
      NewCSVModel(
        d_code_new = row.d_code.concat("123"),
        d_name_new = row.d_name.concat("123"),
        people_new = row.people.map(PersonMaker)
      )
    }
    .write
    .mode(SaveMode.Overwrite)
    .json("data/output/batch/erin_test_results")
}
```
-
Hello,
even though Spline recognizes the lineage at the operation level, the extraction of attribute-level lineage for the commands used here is not yet implemented.
I created a ticket for that: #342
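
For comparison, here is a minimal, hypothetical sketch of the same job written against the untyped column-expression API instead of a typed `Dataset.map`. The object name `ErinExample1ColumnsJob`, the output path, and the use of `functions.transform` (available since Spark 3.0) are illustrative choices, not part of the original example. The point is only that when every output column is built from column expressions, the mapping from input fields to output fields is visible in the logical plan rather than hidden inside an opaque lambda.

```scala
package za.co.absa.spline.example.batch

import org.apache.spark.sql.{Column, SaveMode}
import org.apache.spark.sql.functions._
import za.co.absa.spline.SparkApp
import za.co.absa.spline.harvester.SparkLineageInitializer._

// Hypothetical column-expression variant of the job above (functions.transform needs Spark 3.x).
// Every output column is derived through Catalyst expressions, so the field-to-field mapping is
// part of the logical plan instead of being buried inside a typed lambda.
object ErinExample1ColumnsJob extends SparkApp("Erin Example 1 (columns)") {

  spark.enableLineageTracking()

  spark.read
    .json("data/input/batch/test.json")
    .select(
      concat(col("d_code"), lit("123")).as("d_code_new"),
      concat(col("d_name"), lit("123")).as("d_name_new"),
      // rebuild each element of the nested `people` array with column expressions
      transform(col("people"), (p: Column) => struct(
        concat(p.getField("first_name"), lit(" TEST")).as("first_name_new"),
        concat(p.getField("last_name"), lit(" TEST")).as("last_name_new")
      )).as("people_new")
    )
    .write
    .mode(SaveMode.Overwrite)
    .json("data/output/batch/erin_test_results_columns")
}
```

Whether attribute-level lineage is actually extracted for this form depends on the Spline agent version, so treat it purely as a sketch of the difference between expression-based and lambda-based transformations.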