This issue has been migrated from deeplearning4j/deeplearning4j#5213
Original author @AlexDBlack
https://github.com/deeplearning4j/DataVec/issues/355
Issue Description
As a total newcomer to DataVec transform processes, I found it very difficult to work out how to build a full transform pipeline. Apart from @tomthetrainer's easy-to-follow video, I could not find guidance on how to work with CSV data properly.
This issue is to address the need for an example that does the following:
- Works with a complex CSV of different types (categorical, integers, doubles, strings)
- Shows how to use custom conditions and transforms to replace null values
- Shows how to do advanced transformations, including running custom code on string values (for example, the CSV contains human-entered text and we want to classify it before passing it further down the pipeline)
- Shows how to save to a Hadoop map file, then load it to train an MLP.
- Shows how to do advanced schemas for complex CSVs (what happens when you have 5,000 columns!?).
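To make the first two bullets concrete, here is a rough plain-Java sketch of the per-row logic such an example would need: replacing missing values with defaults and mapping a categorical label to an integer index. This deliberately does not use the DataVec API; the requested example would express the same steps through DataVec's `Schema` and `TransformProcess`. The column layout (`color,count,price,comment`), the category list, the default values, and the class and method names are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Library-agnostic sketch; NOT the DataVec API.
class CsvCleaningSketch {

    // Hypothetical category list for the "color" column.
    static final List<String> COLORS = Arrays.asList("red", "green", "blue");

    /**
     * Cleans one CSV line with the assumed layout: color,count,price,comment.
     * Missing (empty) fields are replaced with assumed defaults.
     * Note: a plain split does not handle quoted fields that contain commas.
     */
    static List<Object> cleanRow(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] f = line.split(",", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields, got " + f.length);
        }
        String color = f[0].trim();
        String count = f[1].trim();
        String price = f[2].trim();
        String comment = f[3].trim();

        List<Object> out = new ArrayList<>();
        // Categorical -> integer index; -1 marks an unknown or missing label.
        out.add(COLORS.indexOf(color));
        // Integer column: missing values default to 0.
        out.add(count.isEmpty() ? 0 : Integer.parseInt(count));
        // Double column: missing values default to 0.0.
        out.add(price.isEmpty() ? 0.0 : Double.parseDouble(price));
        // String column: missing values default to a placeholder.
        out.add(comment.isEmpty() ? "unknown" : comment);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cleanRow("green,,4.5,"));
    }
}
```

In this sketch the defaults are hard-coded per column. A real example would more likely take them from the schema so that the same logic scales to very wide files.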
Version Information
Please indicate relevant versions, including, if relevant:
Master, current, etc.
Contributing
I'm very happy to contribute here. First, I'd like to find out whether there's anything else this example should cover before I continue.