Skip to content

Commit b908cc5

Browse files
authored
PLU-469: add fix for excel get table row implementation (#960)
## Problem Excel `getTopNTableRows` is buggy because `usedRange` includes all non-empty cells (not limited to a table) It can accidentally include: - Stray formatting - Other tables if you resize too far There's no way to limit `usedRange` to a table specifically — it works on worksheet or relative range context ## Solution Considered using the `rows` API, but it involves pagination so the query takes around 5s for a table of 50k rows Also considered using the `columns` API, but it does not return the `headerSheetRowIndex` address so another API query has to be made to obtain it Eventually settled on using `range` which gives both the header row and data rows as well as the `headerSheetRowIndex`, keeping the API call to still be 1 ## Tests `getTableRow` action: - [ ] Works for multiple found rows (the first row found is returned) - [ ] Values outside the table will not be retrieved `updateTableRow` action: - [ ] Works for multiple found rows (the first row found is returned) - [ ] Values outside the table will not be retrieved
1 parent f6948b7 commit b908cc5

File tree

1 file changed

+63
-0
lines changed

1 file changed

+63
-0
lines changed

packages/backend/src/apps/m365-excel/actions/get-table-row/implementation/get-top-n-table-rows.ts

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,13 +16,76 @@ const msGraphResponseSchema = z
1616
headerSheetRowIndex: response.rowIndex,
1717
}))
1818

19+
// Can get more info e.g. formulas, numberFormat, text for possible date manipulation but too complex for now
20+
const msGraphRangeResponseSchema = z
21+
.object({
22+
rowCount: z.number(),
23+
rowIndex: z.number(),
24+
values: z.array(z.array(z.coerce.string())), // first row is the header
25+
})
26+
.transform((response) => ({
27+
rowCount: response.rowCount,
28+
headerSheetRowIndex: response.rowIndex,
29+
headerRowValues: response.values[0],
30+
rowValues: response.values.slice(1),
31+
}))
32+
1933
interface GetTopNTableRowsResult {
2034
columns: string[]
2135
rows: string[][]
2236
headerSheetRowIndex: number
2337
}
2438

39+
/**
40+
* Using range endpoint to get both the header row and data rows
41+
* instead of headerRowRange and rows to avoid extra API call
42+
* Also, rows endpoint uses pagination so the query takes longer for large tables
43+
*
44+
* Considered using columns API but there is no headerSheetRowIndex returned so
45+
* an extra API call has to be made to retrieve it
46+
* Reference: https://learn.microsoft.com/en-us/graph/api/table-range?view=graph-rest-1.0&tabs=http
47+
*/
2548
export default async function getTopNTableRows(
49+
$: IGlobalVariable,
50+
session: WorkbookSession,
51+
tableId: string,
52+
n: number,
53+
): Promise<GetTopNTableRowsResult> {
54+
const rangeParseResult = msGraphRangeResponseSchema.safeParse(
55+
(await session.request(`/tables/${tableId}/range`, 'get')).data,
56+
)
57+
58+
if (rangeParseResult.success === false) {
59+
throw new StepError(
60+
'Invalid table range',
61+
'Check your Excel file and try again',
62+
$.step.position,
63+
$.app.name,
64+
)
65+
}
66+
67+
const { rowCount, headerSheetRowIndex, headerRowValues, rowValues } =
68+
rangeParseResult.data
69+
70+
// rowCount includes the header row
71+
if (rowCount > n) {
72+
throw new StepError(
73+
`Your Excel table has more than ${n.toLocaleString()} rows.`,
74+
`Reduce the number of rows and try again.`,
75+
$.step.position,
76+
$.app.name,
77+
)
78+
}
79+
80+
return {
81+
columns: headerRowValues,
82+
rows: rowValues,
83+
headerSheetRowIndex,
84+
}
85+
}
86+
87+
// Old method in case we need to rollback fast
88+
export async function getTopNTableRowsOld(
2689
// Typically, we should avoid making $ viral though the codebase, but this is
2790
// an exception because getTopNTableRows is not a common helper function.
2891
$: IGlobalVariable,

0 commit comments

Comments
 (0)