@@ -48,7 +48,7 @@ The actions are:
4848- ` convert ` : Convert a register SAS file (or multiple) to Parquet.
4949- ` list ` : List files in a directory, e.g., SAS or Parquet files.
5050- ` read ` : Read a Parquet register into R as a DuckDB table.
51- - ` use ` : Use a template in the current project.
51+ - ` use ` : Set up ` _targets.R ` and a Quarto report template.
5252- ` get ` : Get or guess some information, e.g., the project ID, workdata
5353 directory, or rawdata directory from the current working directory.
5454
@@ -144,28 +144,73 @@ within the same register contain exactly the same columns and data
144144types, the conversion report helps identify any differences between
145145these files.
146146
147- It addresses the following question:
148-
149- - Are the column names, number of columns, and data types the same
150- across all Parquet files within a register?
151-
152- The report is produced by the following functions with distinct
153- responsibilities:
154-
155- - ` get_parquet_info() ` : Reads the Parquet parts for a single file and
156- returns the column names and data types.
157- - ` create_report() ` : Assembles the report by taking the outputs of
158- ` get_parquet_info() ` for all files in a register and returns a
159- structured list with a comparison and basic check of the files. It
160- reports the findings of the check for the register as either OK (all
161- column names, data types, and number of columns are the same across
162- all Parquet files) or DIFF (one or more files have different column
163- names, number of columns, or data types).
164-
165147::: callout-note
166148Discrepancies (different columns or incompatible data types) between
167149files within the same register do not stop the conversion, but will be
168150noted in the report.
169151:::
170152
171- The results are written to a txt file to allow for later inspection.
153+ ` convert_file() ` returns a metadata tibble with one row per written
154+ chunk. This can be queried with ` dplyr ` directly or rendered into a
155+ Quarto report.
156+
157+ ### Return value of ` convert_file() `
158+
159+ ` convert_file() ` returns a tibble with one row per written chunk:
160+
161+ | Column | Description |
162+ | ----------------| ----------------------------------------------|
163+ | ` input_path ` | Path to the source SAS file |
164+ | ` output_path ` | Path to the written Parquet part file |
165+ | ` row_count ` | Number of rows in the chunk |
166+ | ` column_count ` | Number of columns in the chunk |
167+ | ` columns ` | Nested tibble with column ` name ` and ` type ` |
168+
169+ The information is derived from the chunk already in memory, not by
170+ reading the Parquet file back.
171+
172+ ``` r
173+ # Before repeat loop.
174+ chunk_info_list <- list()
175+
176+ # Inside the repeat loop, after writing.
177+ chunk_info_list <- c(chunk_info_list, list(tibble::tibble(
178+   input_path = path,
179+   output_path = fs::path(file_path),
180+   row_count = nrow(chunk),
181+   column_count = ncol(chunk),
182+   columns = list(tibble::tibble(
183+     name = colnames(chunk),
184+     type = purrr::map_chr(chunk, \(col) class(col)[[1]]) # First class only: map_chr() needs one string per column.
185+   ))
186+ )))
187+
188+ # After the loop, bind all chunk tibbles.
189+ dplyr::bind_rows(chunk_info_list)
190+ ```
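
As a rough illustration of querying the returned metadata with `dplyr`, the sketch below unnests the `columns` list-column and flags columns whose type differs between chunks or that are missing from some chunks. The sample data, the file names, and the `schema_diff` name are hypothetical, and all chunks are assumed to belong to a single register; only the column layout follows the table above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical metadata for two chunks, shaped like the return value above.
chunk_info <- tibble(
  input_path = c("example_2020.sas7bdat", "example_2021.sas7bdat"),
  output_path = c("example/part-0.parquet", "example/part-1.parquet"),
  row_count = c(100L, 120L),
  column_count = c(2L, 2L),
  columns = list(
    tibble(name = c("id", "sex"), type = c("character", "integer")),
    tibble(name = c("id", "sex"), type = c("character", "character"))
  )
)

n_chunks <- nrow(chunk_info)

# One row per (chunk, column); keep columns that have more than one type
# across chunks or that do not appear in every chunk.
schema_diff <- chunk_info |>
  select(output_path, columns) |>
  unnest(columns) |>
  group_by(name) |>
  summarise(
    n_types = n_distinct(type),
    n_present = n_distinct(output_path),
    types = paste(sort(unique(type)), collapse = ", "),
    .groups = "drop"
  ) |>
  filter(n_types > 1 | n_present < n_chunks)

schema_diff
```

An empty `schema_diff` would correspond to the "all files agree" case; any remaining rows name the columns that need a closer look.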
191+
192+ ### Quarto report template
193+
194+ ` use_fastreg_template() ` copies both ` _targets.R ` and
195+ ` conversion_report.qmd ` into the current working directory. The Quarto
196+ doc reads ` chunk_info ` via ` targets::tar_read() ` and produces an HTML or
197+ PDF report for review.
198+
199+ <!-- TODO: What should the default format be? -->
200+
201+ ``` r
202+ chunk_info <- targets::tar_read(chunk_info)
203+
204+ # Nice overview of the info + schema comparison within registers.
205+ ...
206+ ```
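
One possible shape for the overview part of the report is sketched below. It is not the template's actual content: the sample `chunk_info` stands in for the result of `targets::tar_read(chunk_info)`, and the file names and the `overview` name are hypothetical. It summarises the number of chunks, total rows, and distinct column counts per source file.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for targets::tar_read(chunk_info).
chunk_info <- tibble(
  input_path = c("a.sas7bdat", "a.sas7bdat", "b.sas7bdat"),
  output_path = c("a/part-0.parquet", "a/part-1.parquet", "b/part-0.parquet"),
  row_count = c(1000L, 250L, 400L),
  column_count = c(5L, 5L, 6L)
)

# One row per source file: chunk count, total rows, and the distinct
# column counts seen across that file's chunks.
overview <- chunk_info |>
  group_by(input_path) |>
  summarise(
    chunks = n(),
    total_rows = sum(row_count),
    column_counts = paste(sort(unique(column_count)), collapse = ", "),
    .groups = "drop"
  )

overview
```

More than one value in `column_counts` for a source file would indicate that its chunks were not written with the same number of columns.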
207+
208+ The report is added to the targets pipeline as a last target:
209+
210+ ``` r
211+ tar_target(
212+   name = report,
213+   command = quarto::quarto_render("conversion_report.qmd"),
214+   deployment = "main"
215+ )
216+ ```