| 1 | +# Write `create_result_table.py` script |
| 2 | + |
| 3 | +## Initial prompt |
| 4 | + |
| 5 | +I need a Python script which collects evaluation results for a number of datasets |
| 6 | +and cases, computes averages, and writes LaTeX code for a table containing these |
| 7 | +average results. |
| 8 | + |
| 9 | +* I started a script in `~/git/keys_values/keys_values/scripts/create_result_table.py`. |
| 10 | + Extend this file by adding the `main` function. |
| 11 | +* For each pair `(dataset, case)` with `dataset in datasets` and |
| 12 | + `case in [x[0] for x in cases]`, there is a CSV |
| 13 | + file `base_path / dataset / case / EVAL_METRICS_ALL_FILENAME`. |
| 14 | + The result file has columns `idx, task, sub_exact_match`. The `sub_exact_match` |
| 15 | + column contains metric values: average them over different `idx` for the same |
| 16 | + `task`. |
| 17 | +* The result table has one column for each `dataset`, one row for each `case`. |
| 18 | +* A cell can have several rows, one for each distinct `task` value contained in |
| 19 | + the result file for `(dataset, case)`. These `task` values can be different for |
| 20 | + every cell. |
| 21 | +* The rows in a cell for `(dataset, case)` should be |
| 22 | + `f"{short_task}:{avg_sub_exact_match:.4f}"`, where `avg_sub_exact_match` is the |
| 23 | + average `sub_exact_match` value for some `task`, and `short_task = "fin"` if |
| 24 | + `task == "final"`, `short_task = "120"` if `task == "step-000120"` (the last 3 |
| 25 | + digits). |
| 26 | +* If there is no result file for a combination `(dataset, case)`, the corresponding |
| 27 | + cell is left empty. |
| 28 | +* Label the columns with the `dataset` name, stripping off "helmet_". |
| 29 | +* Label the rows with `x[1]` for the tuple `x` in `cases`. |
| 30 | +* The `main` function should write LaTeX code for the table into the file |
| 31 | + `result_path`. |
| 32 | + |
| 33 | +### Answer |
| 34 | + |
| 35 | +Done. The main function: |
| 36 | + |
| 37 | +* Derives `base_path` from `result_path.parent` |
| 38 | +* For each `(dataset, case)` pair, reads the CSV if it exists, groups by `task`, averages `sub_exact_match`, and formats each row as `short_task:avg` |
| 39 | +* Maps task names via `_short_task`: "final" → "fin", others → last 3 digits |
| 40 | +* Uses `\makecell{...}` for cells with multiple task rows (requires `\usepackage{makecell}` in the LaTeX preamble) |
| 41 | +* Writes the full `tabular` environment to `result_path` |
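
The per-cell computation described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names `short_task` and `average_by_task` are hypothetical, the standard `csv` module is an assumption (the real script may use another reader), and the sample values are made up.

```python
import csv
import io
from collections import defaultdict


def short_task(task):
    """Map "final" to "fin"; otherwise keep the last 3 characters."""
    return "fin" if task == "final" else task[-3:]


def average_by_task(csv_file):
    """Average `sub_exact_match` over all `idx` rows that share a `task`."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        values[row["task"]].append(float(row["sub_exact_match"]))
    return {task: sum(v) / len(v) for task, v in values.items()}


# In-memory stand-in for one result file (made-up numbers).
sample = io.StringIO(
    "idx,task,sub_exact_match\n"
    "0,step-000120,0.5\n"
    "1,step-000120,0.25\n"
    "0,final,1.0\n"
)
averages = average_by_task(sample)
cell_rows = [f"{short_task(t)}:{avg:.4f}" for t, avg in averages.items()]
print(cell_rows)
```

A missing result file would simply skip this step and leave the cell empty, as the prompt requires.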
| 42 | + |
| 43 | + |
| 44 | +## Comment |
| 45 | + |
| 46 | +The column and row labels may contain `_`, which you need to change to `\_` in LaTeX. Could you please fix this? |
| 47 | + |
| 48 | +### Answer |
| 49 | + |
| 50 | +Fixed: underscores in both column and row labels are now escaped as `\_`. |
| 51 | + |
| 52 | + |
| 53 | +## Comment |
| 54 | + |
| 55 | +I'd like to have a few changes: |
| 56 | + |
| 57 | +* Rows (for `case` values) should be separated by horizontal lines. The top row |
| 58 | + with labels should be separated from the others by a double horizontal line, and |
| 59 | + some space below this double line. |
| 60 | +* For the column labels, please also strip away `f"_{dataset_size}"` from the right. |
| 61 | +* The multi-row content of a cell should not be centered, but aligned on top, the |
| 62 | + same for the row labels. |
| 63 | +* The different rows in a cell should be aligned. Both values before and after ":" |
| 64 | + should be aligned on the left. Also, the row with "fin" should always be last, and |
| 65 | + others should be sorted by the number before ":". |
| 66 | +* Please use a smaller fontsize for text in cells than for the row and column labels. |
| 67 | + |
| 68 | +### Answer |
| 69 | + |
| 70 | +Here's what changed: |
| 71 | + |
| 72 | +* Row separators: each data row is followed by `\hline`; the header is separated by `\hline\hline` with `\noalign{\smallskip}` below it for extra space. |
| 73 | +* Column label stripping: uses `rsplit("_", 1)[0]` to remove the trailing `_{dataset_size}` suffix before also removing the `helmet_` prefix. |
| 74 | +* Top alignment: row labels use `\makecell[lt]{...}` (top-left); multi-row cells use `\begin{tabular}[t]{...}` (top-aligned). |
| 75 | +* Cell alignment: multi-row cells render as a nested tabular with two columns (`@{}l@{}l@{}`), so the short task keys and the `:value` parts are left-aligned independently. |
| 76 | +* Sort order: non-fin entries are sorted numerically by their 3-digit suffix; `fin` is always last. |
| 77 | +* Font size: cell content is wrapped in `{\small ...}` while labels remain at normal size. |
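
The label stripping and sort order from this answer might look roughly like the sketch below. The function names `column_label` and `sort_key` and the dataset name `helmet_json_kv_16k` are hypothetical; the sketch assumes every dataset name ends in a `_{dataset_size}` suffix.

```python
def column_label(dataset):
    """Strip the trailing "_{dataset_size}" and the "helmet_" prefix, then escape "_"."""
    name = dataset.rsplit("_", 1)[0]  # drop the size suffix
    if name.startswith("helmet_"):
        name = name[len("helmet_"):]
    return name.replace("_", r"\_")  # LaTeX needs "\_" for a literal underscore


def sort_key(short_task):
    """Numeric order for step entries; "fin" always sorts last."""
    return (1, 0) if short_task == "fin" else (0, int(short_task))


print(column_label("helmet_json_kv_16k"))           # json\_kv
print(sorted(["fin", "120", "040"], key=sort_key))  # ['040', '120', 'fin']
```

Using a tuple key keeps the two rules (numeric order, `fin` last) in a single `sorted` call rather than special-casing `fin` afterwards.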
| 78 | + |
| 79 | + |
| 80 | +## Comment |
| 81 | + |
| 82 | +Just a few extra changes: |
| 83 | + |
| 84 | +* Add some space above and below each horizontal line. |
| 85 | +* Multiply each average value (number after ":") by 100, so they become percentages. |
| 86 | + The format should be `f"{avg_sub_exact_match:.2f}"`. |
| 87 | +* For the rows in cells, insert a small space before and after the ":" |
| 88 | + |
| 89 | +### Answer |
| 90 | + |
| 91 | +Three changes made: |
| 92 | + |
| 93 | +* Spacing around `\hline`: every rule is now wrapped with `\noalign{\smallskip}` on both sides; the double rule after the header keeps the same pattern. |
| 94 | +* Percentages: values are multiplied by 100 and formatted as `:.2f`. |
| 95 | +* Space around `:`: single-entry cells use `" : "`, and multi-row cells use `{ : }` (braces prevent LaTeX from treating it as a column separator inside the nested tabular). |
| 96 | + |
| 97 | + |
| 98 | +## Comment |
| 99 | + |
| 100 | +One more change would be great. |
| 101 | + |
| 102 | +Each cell has rows such as "fin : 45.10" or "100 : 9.95". |
| 103 | + |
| 104 | +* I'd like the first values (before ":") to be left-aligned, the second values |
| 105 | + (after ":") to be right-aligned. |
| 106 | +* Can you ensure that these alignments extend across cells? So that ALL first |
| 107 | + values in all cells of a column are left-aligned, and ALL second values in all |
| 108 | + cells of a column are right-aligned? |
| 109 | + |
| 110 | +### Answer |
| 111 | + |
| 112 | +The approach: |
| 113 | + |
| 114 | +* Each dataset column is now two outer-table columns (`l` for task, `r` for value), with the header spanned via `\multicolumn{2}{c}{...}`. |
| 115 | +* Each (case, dataset) cell's task rows become separate outer-table rows, so l/r alignment is shared across all cells in the same column. |
| 116 | +* Case labels spanning multiple sub-rows use `\multirow{n}{*}{...}` (requires `\usepackage{multirow}`). |
| 117 | +* The `:` separator is rendered as `\,:\,` (thin spaces) inside each value cell, keeping it visually between the two aligned columns. |
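
To make this layout concrete, here is a rough sketch of how one case could be expanded into outer-table rows. It is an illustration under assumptions, not the script's code: `case_rows`, the labels, and the values are hypothetical, and rules, spacing, and font-size commands are left out.

```python
def case_rows(label, cells):
    """Render one case as outer-table rows.

    `cells` holds one list of (short_task, value) pairs per dataset; an empty
    list stands for a missing result file.
    """
    n_rows = max(1, max(len(c) for c in cells))
    lines = []
    for i in range(n_rows):
        # The case label appears once, spanning all sub-rows of this case.
        first = rf"\multirow{{{n_rows}}}{{*}}{{{label}}}" if i == 0 else ""
        parts = [first]
        for cell in cells:
            if i < len(cell):
                task, value = cell[i]
                parts += [task, rf"\,:\,{value}"]  # l column, then r column
            else:
                parts += ["", ""]  # pad shorter or empty cells
        lines.append(" & ".join(parts) + r" \\")
    return lines


datasets = [r"json\_kv", "recall"]  # hypothetical column labels
header = " & ".join(
    [""] + [rf"\multicolumn{{2}}{{c}}{{{d}}}" for d in datasets]
) + r" \\"
print(r"\begin{tabular}{l" + "lr" * len(datasets) + "}")
print(header)
for line in case_rows("baseline", [[("100", "9.95"), ("fin", "45.10")], []]):
    print(line)
print(r"\end{tabular}")
```

Because every task row is a real row of the outer `tabular`, the `l` and `r` column specifications apply to all cells of a dataset at once, which is what gives the alignment across cells.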
| 118 | + |
| 119 | + |
| 120 | +## Comment |
| 121 | + |
| 122 | +* Now, rows "fin : 45.10" are formatted by grouping "fin" and ":45.10". Can you |
| 123 | + change this so that the grouping is "fin:" and "45.10", so the space is between |
| 124 | + ":" and the right value? |
| 125 | +* Also, please shrink the amount of space between first and second values in a cell. |
| 126 | + It should be quite a bit less than the space between different columns. |
| 127 | + |
| 128 | +### Answer |
| 129 | + |
| 130 | +The `:` moves to the end of the left cell (`st:`) and the right cell starts with `\!` (a negative thin space), pulling the number close to the colon while the normal inter-column gap still separates adjacent dataset pairs. |
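
The regrouping can be illustrated with a small helper. The name `cell_pair` is hypothetical, and the sketch assumes the percentage formatting requested earlier (multiply by 100, two decimals):

```python
def cell_pair(short_task, avg):
    """Return the (left, right) entries for the l and r columns of one dataset."""
    left = f"{short_task}:"        # the colon now belongs to the left cell
    right = rf"\!{100 * avg:.2f}"  # negative thin space, then the percentage
    return left, right


print(" & ".join(cell_pair("fin", 0.451)))  # fin: & \!45.10
```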
| 131 | + |