If you want to explore Spark or SQL, or write tests, IMHO it is cumbersome to write a dataframe definition in a text editor. A table calculation tool provides a better user experience for that. This tool bridges the gap: it generates dataframe or SQL VALUES statements from a copy/paste out of such a tool.
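For illustration, this is the kind of dataframe definition you would otherwise hand-write in an editor. The snippet is only a sketch assuming a PySpark target and made-up values; it is not output of the tool:

```python
# Hand-writing a small test dataframe: every row, quote and comma has to be
# typed out manually, which is the tedious part this tool takes over.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.createDataFrame(
    [("2022-02-01", "max", 1), ("2022-02-01", "john", 3)],
    ["date", "name", "debt"],
)
df.show()
```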
- draw some dataframe in Excel (excel.new) or Google Sheets (spreadsheet.new)
- it should look like the one below:
  - has a header line containing the column names (line one)
  - has data rows/lines (starting from line 2)
  - can have empty values
- example:

| date       | name | debt | remark |
| ---------- | ---- | ---- | ------ |
| 2022-02-01 | max  | 1    |        |
| 2022-02-02 | max  | 2    |        |
| 2022-02-03 | max  | 3    |        |
| 2022-02-04 | max  | 4    |        |
| 2022-02-01 | john | 3    |        |
- copy the range, including the header
- paste the range into the `input.tsv` file
- run the script via `make generate` or `python main.py`, or by running it on repl.it
- get your code pieces from either of the `output.*` files (a rough sketch of what they contain follows below)
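To make the last step concrete, the generated pieces in the `output.*` files amount to something like the following. This is only a sketch of the idea, assuming a PySpark target; the exact code the script emits (and whether it is Python, Scala, or a SQL VALUES statement) may differ:

```python
# Rough sketch of a generated dataframe piece for the example table above
# (PySpark assumed; not the script's literal output).
from pyspark.sql import SparkSession

# additional comment: create a Spark session/context first
spark = SparkSession.builder.master("local[*]").getOrCreate()

rows = [
    ("2022-02-01", "max", 1, None),   # empty remark cell becomes None/null
    ("2022-02-02", "max", 2, None),
    ("2022-02-03", "max", 3, None),
    ("2022-02-04", "max", 4, None),
    ("2022-02-01", "john", 3, None),
]

# explicit schema, since the all-empty remark column has nothing to infer from
df = spark.createDataFrame(rows, "date string, name string, debt int, remark string")
df.show()
```

The SQL flavour of the same data would typically be an inline `VALUES (...), (...)` list wrapped in a `SELECT`.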
- will detect constants (à la `colnames.date`) and not string-quote them in the statements
- will replace empty string values with `null` or `None`
- is smart enough to figure out where the data values end
- adds comments for creating a Spark context
- if fields have non-scalar types, the script will still string-quote them (see the quoting sketch below)
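The quoting rules above boil down to something like the following sketch. This is not the script's actual implementation; the `render_value` helper and its heuristics are made up for illustration:

```python
# Illustration of the quoting rules listed above (hypothetical helper).
def render_value(raw: str, target: str = "python") -> str:
    """Turn one cell of the pasted range into a code literal."""
    if raw == "":                       # empty cells become None/null
        return "None" if target == "python" else "null"
    if raw.replace(".", "", 1).replace("-", "", 1).isdigit():
        return raw                      # plain numbers stay unquoted
    if raw.startswith("colnames."):     # constants a la colnames.date stay unquoted
        return raw
    return f'"{raw}"'                   # everything else, including non-scalar
                                        # looking values, gets string-quoted


print(render_value(""))               # None
print(render_value("3"))              # 3
print(render_value("colnames.date"))  # colnames.date
print(render_value("max"))            # "max"
```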
The repository is on GitHub.