Skip to content

Commit 59cc38d

Browse files
authored
Refactor workflow helper hotspots from semantic clustering audit (#34144)
1 parent 99ffd28 commit 59cc38d

14 files changed

Lines changed: 436 additions & 349 deletions
Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
# ADR-34144: Cluster Config Preprocessing Helpers and Co-locate Workflow Helper Hotspots
2+
3+
**Date**: 2026-05-23
4+
**Status**: Draft
5+
**Deciders**: pelikhan, Copilot
6+
7+
---
8+
9+
## Part 1 — Narrative (Human-Friendly)
10+
11+
### Context
12+
13+
A semantic clustering audit of `pkg/workflow` surfaced several organizational hotspots: three near-identical `addSafeOutput*GitHubTokenForConfig` methods on `Compiler` duplicated branching logic for GitHub App token resolution; generic raw-config normalizers (`preprocessBoolFieldAsString`, `preprocessIntFieldAsString`, `preprocessStringArrayFieldAsTemplatable`, `preprocessProtectedFilesField`, `preprocessExpiresField`) were scattered across `templatables.go`, `parse_helpers.go`, and `config_helpers.go`; the structural `runs-on` shape validator `validateRunsOnValue` lived in `frontmatter_parsing.go` rather than next to `validateRunsOn` in `runs_on_validation.go`; and `git_helpers_wasm.go` retained `RunGit` and `GetCurrentGitTag` stubs whose non-WASM counterparts no longer exist. Two call sites also shadowed the validator name `validateRunsOnValue` with a local variable, creating a small but real readability hazard. ADR-29336 established that `preprocessProtectedFilesField` belongs in `parse_helpers.go`, so any move away from that file is an explicit revision of that prior decision rather than an accidental drift.
14+
15+
### Decision
16+
17+
We will (1) extract a shared `addResolvedSafeOutputGitHubTokenForConfig` resolver and re-express the three per-target token methods as thin wrappers around it, preserving the behavioral split that only standard and Copilot handlers may use GitHub App tokens while assign-to-agent bypasses them; (2) introduce `pkg/workflow/config_preprocessing.go` as the single home for generic raw-config normalization helpers, superseding the relevant clause of ADR-29336 for `preprocessProtectedFilesField`; (3) move `validateRunsOnValue` into `runs_on_validation.go` so structural and semantic `runs-on` validation live together and rename the two shadowing local variables to `formattedRunsOn`; (4) delete the unused `RunGit` and `GetCurrentGitTag` WASM stubs and the matching stale doc references. The driving principle is the file-naming convention established in ADR-27325 and reinforced in ADR-28282/29336: a file's name should accurately communicate the semantic domain of every function inside it, and duplicate logic should be collapsed when the behavioral variation can be expressed as a parameter.
18+
19+
### Alternatives Considered
20+
21+
#### Alternative 1: Keep all preprocessors in `parse_helpers.go` (status quo per ADR-29336)
22+
23+
Leave generic preprocessors where ADR-29336 placed them and only consolidate the safe-output token resolvers. This was rejected because `parse_helpers.go` had already accumulated a mix of pure coercion helpers (`coerceStringOrArrayField`, `parseStringSliceAny`) and stateful preprocessors that mutate `configData` in place — two distinguishable responsibilities. Splitting them gives each file a single clear charter: `parse_helpers.go` for coercion, `config_preprocessing.go` for normalization-with-side-effects. The cost of revising ADR-29336 is low because the prior decision was itself a relocation pass and inherits a "follow naming intent" rationale that this PR extends.
24+
25+
#### Alternative 2: Pass a flag instead of a resolver function
26+
27+
Replace the three safe-output token methods with a single method that accepts a token-target enum (`safeOutput`, `copilot`, `agent`) and switches internally. This was rejected because it pushes the per-target token resolver dispatch into the consolidated function, recreating the branching that the refactor is trying to eliminate. Passing the resolver function directly keeps `addResolvedSafeOutputGitHubTokenForConfig` agnostic of which token target it serves and lets each wrapper own its own resolver choice.
28+
29+
#### Alternative 3: Leave the WASM stubs in place as future-proofing
30+
31+
Keep `RunGit` and `GetCurrentGitTag` as WASM-only stubs in case future code paths need them. This was rejected because their non-WASM counterparts have already been removed, so the stubs are guaranteed-unreachable code that misleads readers about which Git surface area is supported. Reintroducing them later if needed is a one-line change.
32+
33+
### Consequences
34+
35+
#### Positive
36+
- The three safe-output token methods now share one resolver path, eliminating ~60 lines of duplicated branching and centralizing the GitHub App token precedence rule.
37+
- All generic raw-config preprocessors are co-located in `config_preprocessing.go`, making it obvious where to add a new templatable-or-literal preprocessor.
38+
- `validateRunsOnValue` is next to `validateRunsOn` in `runs_on_validation.go`, and a focused unit test (`TestValidateRunsOnValue`) now covers its shape-validation branches directly.
39+
- Removing the dead WASM stubs makes the WASM build surface match the non-WASM one — readers see only the Git operations actually available.
40+
- Renaming local `validateRunsOnValue` variables to `formattedRunsOn` removes the function-vs-variable shadowing hazard that made the two call sites harder to grep.
41+
42+
#### Negative
43+
- Revises ADR-29336's normative requirement that `preprocessProtectedFilesField` reside in `parse_helpers.go`. Future contributors must consult this ADR alongside 29336 to know the current placement rule.
44+
- Adds one new file (`config_preprocessing.go`) to `pkg/workflow/`, marginally increasing file count in a directory that is already large.
45+
- The shared `addResolvedSafeOutputGitHubTokenForConfig` introduces an `allowGitHubApp bool` parameter; readers must trace into the function to see which wrappers pass which value, where previously the behavior was visible at each call site.
46+
47+
#### Neutral
48+
- No public API surface changes; all moved functions retain their signatures and package-level visibility.
49+
- `git blame` for the moved functions will surface this PR rather than original authorship without `--follow`.
50+
- Header comments in `templatables.go`, `parse_helpers.go`, and `validation_helpers.go` are updated to point readers at `config_preprocessing.go`; this is documentation-only churn.
51+
52+
---
53+
54+
## Part 2 — Normative Specification (RFC 2119)
55+
56+
> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).
57+
58+
### Config Preprocessing Helpers (`pkg/workflow`)
59+
60+
1. The following functions **MUST** reside in `pkg/workflow/config_preprocessing.go`:
61+
- `preprocessBoolFieldAsString`
62+
- `preprocessIntFieldAsString`
63+
- `preprocessStringArrayFieldAsTemplatable`
64+
- `preprocessProtectedFilesField`
65+
- `preprocessExpiresField`
66+
2. New generic raw-config normalization helpers that mutate a `map[string]any` in place before YAML unmarshaling **MUST** be added to `config_preprocessing.go` and **MUST NOT** be placed in `parse_helpers.go`, `templatables.go`, `config_helpers.go`, or `validation_helpers.go`.
67+
3. `parse_helpers.go` **MUST NOT** contain stateful preprocessors that mutate `configData`; its scope **MUST** be limited to pure coercion helpers that return new values without side effects.
68+
4. `templatables.go` **MUST NOT** contain raw-config preprocessors; its scope **MUST** be limited to templatable field types and their YAML-emission helpers.
69+
5. Schedule-specific preprocessing **MAY** remain in its existing file when it depends on schedule-specific types or context that would not naturally fit a generic preprocessor file.
70+
6. This section supersedes the clause in [ADR-29336](29336-relocate-misplaced-functions-to-semantic-homes.md) that placed `preprocessProtectedFilesField` in `parse_helpers.go`.
71+
72+
### Safe-Output GitHub Token Resolution (`pkg/workflow`)
73+
74+
1. The three per-target safe-output token methods (`addSafeOutputGitHubTokenForConfig`, `addSafeOutputCopilotGitHubTokenForConfig`, `addSafeOutputAgentGitHubTokenForConfig`) **MUST** delegate to a single shared resolver function rather than each duplicate the GitHub App token precedence logic.
75+
2. The shared resolver **MUST** accept an `allowGitHubApp` boolean (or equivalent) parameter that gates whether `data.SafeOutputs.GitHubApp` is considered.
76+
3. The assign-to-agent token method **MUST** pass `allowGitHubApp = false` so that GitHub App installation tokens are never used for the Copilot assignment API.
77+
4. New safe-output token resolution paths **SHOULD** be expressed as additional thin wrappers around the shared resolver rather than as new copies of the precedence logic.
78+
79+
### `runs-on` Validation Co-location (`pkg/workflow`)
80+
81+
1. `validateRunsOnValue` **MUST** reside in `pkg/workflow/runs_on_validation.go` alongside `validateRunsOn` and `extractRunnerLabels`.
82+
2. New `runs-on` shape or semantic validators **MUST** be added to `runs_on_validation.go` and **MUST NOT** be placed in `frontmatter_parsing.go`.
83+
3. Local variables holding a formatted `runs-on` string **MUST NOT** be named `validateRunsOnValue` (or any other name that shadows an exported validator); `formattedRunsOn` **SHOULD** be used instead.
84+
85+
### WASM Git Surface (`pkg/workflow`)
86+
87+
1. `git_helpers_wasm.go` **MUST NOT** export Git helper functions whose non-WASM counterparts do not exist.
88+
2. Doc comments in `git_helpers.go` **MUST NOT** reference Git helpers that have been removed.
89+
90+
### Conformance
91+
92+
An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement — in particular, adding a new generic preprocessor outside `config_preprocessing.go`, reintroducing duplicated safe-output token resolver branching, placing a `runs-on` validator outside `runs_on_validation.go`, or shipping a WASM Git stub without a matching non-WASM definition — constitutes non-conformance.
93+
94+
---
95+
96+
*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/26319884249) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*

pkg/workflow/config_helpers.go

Lines changed: 0 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -139,40 +139,6 @@ func parseTargetRepoWithValidation(configMap map[string]any) (string, bool) {
139139
// to consolidate all time parsing logic in a single location. These functions are used
140140
// for parsing expiration configurations in safe output jobs.
141141

142-
// preprocessExpiresField handles the common expires field preprocessing pattern.
143-
// This function:
144-
// 1. Parses the expires value through parseExpiresFromConfig (handles integers, strings, and boolean false)
145-
// 2. Handles explicit disablement when expires=false (returns -1)
146-
// 3. Normalizes the value to hours and updates configData["expires"] in place
147-
// 4. Logs the parsed value with the provided logger
148-
//
149-
// Returns true if expires was explicitly disabled with false, false otherwise.
150-
// This helper consolidates duplicate preprocessing logic used in parseCreateIssuesConfig and parseCreateDiscussionsConfig.
151-
func preprocessExpiresField(configData map[string]any, debugLog *logger.Logger) bool {
152-
expiresDisabled := false
153-
if configData != nil {
154-
if expires, exists := configData["expires"]; exists {
155-
// Always parse the expires value through parseExpiresFromConfig
156-
// This handles: integers (days), strings (time specs like "48h"), and boolean false
157-
expiresInt := parseExpiresFromConfig(configData)
158-
if expiresInt == -1 {
159-
// Explicitly disabled with false
160-
expiresDisabled = true
161-
configData["expires"] = 0
162-
} else if expiresInt > 0 {
163-
configData["expires"] = expiresInt
164-
} else {
165-
// Invalid or missing - set to 0
166-
configData["expires"] = 0
167-
}
168-
if debugLog != nil {
169-
debugLog.Printf("Parsed expires value %v to %d hours (disabled=%t)", expires, expiresInt, expiresDisabled)
170-
}
171-
}
172-
}
173-
return expiresDisabled
174-
}
175-
176142
// ParseBoolFromConfig is a generic helper that extracts and validates a boolean value from a map.
177143
// Returns the boolean value, or false if not present or invalid.
178144
// If log is provided, it will log the extracted value for debugging.
Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
package workflow
2+
3+
import (
4+
"fmt"
5+
"strconv"
6+
"strings"
7+
8+
"github.com/github/gh-aw/pkg/logger"
9+
)
10+
11+
// preprocessBoolFieldAsString converts the value of a boolean config field
12+
// to a string before YAML unmarshaling. This lets struct fields typed as
13+
// *string accept both literal boolean values (true/false) and GitHub Actions
14+
// expression strings (e.g. "${{ inputs.draft-prs }}").
15+
//
16+
// If the value is a bool it is converted to "true" or "false".
17+
// If the value is a string it must be a GitHub Actions expression (starts
18+
// with "${{" and ends with "}}"); any other free-form string is rejected
19+
// and an error is returned.
20+
func preprocessBoolFieldAsString(configData map[string]any, fieldName string, debugLog *logger.Logger) error {
21+
if configData == nil {
22+
return nil
23+
}
24+
if val, exists := configData[fieldName]; exists {
25+
switch v := val.(type) {
26+
case bool:
27+
if v {
28+
configData[fieldName] = "true"
29+
} else {
30+
configData[fieldName] = "false"
31+
}
32+
if debugLog != nil {
33+
debugLog.Printf("Converted %s bool to string before unmarshaling", fieldName)
34+
}
35+
case string:
36+
if !isExpression(v) {
37+
return fmt.Errorf("field %q must be a boolean or a GitHub Actions expression (e.g. '${{ inputs.flag }}'), got string %q", fieldName, v)
38+
}
39+
}
40+
}
41+
return nil
42+
}
43+
44+
// preprocessIntFieldAsString converts the value of an integer config field
45+
// to a string before YAML unmarshaling. This lets struct fields typed as
46+
// *string accept both literal integer values and GitHub Actions expression
47+
// strings (e.g. "${{ inputs.max-issues }}").
48+
//
49+
// If the value is an int, int64, float64, or uint64 it is converted to its
50+
// decimal string representation.
51+
// If the value is a string it must be a GitHub Actions expression (starts
52+
// with "${{" and ends with "}}"); any other free-form string is rejected
53+
// and an error is returned.
54+
func preprocessIntFieldAsString(configData map[string]any, fieldName string, debugLog *logger.Logger) error {
55+
if configData == nil {
56+
return nil
57+
}
58+
if val, exists := configData[fieldName]; exists {
59+
switch v := val.(type) {
60+
case int:
61+
configData[fieldName] = strconv.Itoa(v)
62+
if debugLog != nil {
63+
debugLog.Printf("Converted %s int to string before unmarshaling", fieldName)
64+
}
65+
case int64:
66+
configData[fieldName] = strconv.FormatInt(v, 10)
67+
if debugLog != nil {
68+
debugLog.Printf("Converted %s int64 to string before unmarshaling", fieldName)
69+
}
70+
case float64:
71+
configData[fieldName] = strconv.Itoa(int(v))
72+
if debugLog != nil {
73+
debugLog.Printf("Converted %s float64 to string before unmarshaling", fieldName)
74+
}
75+
case uint64:
76+
configData[fieldName] = strconv.FormatUint(v, 10)
77+
if debugLog != nil {
78+
debugLog.Printf("Converted %s uint64 to string before unmarshaling", fieldName)
79+
}
80+
case string:
81+
if !isExpression(v) {
82+
return fmt.Errorf("field %q must be an integer or a GitHub Actions expression (e.g. '${{ inputs.max }}'), got string %q", fieldName, v)
83+
}
84+
}
85+
}
86+
return nil
87+
}
88+
89+
// preprocessStringArrayFieldAsTemplatable handles a string-array config field that also
90+
// accepts a GitHub Actions expression string (e.g. "${{ inputs.labels }}").
91+
//
92+
// When the field value is an expression string it is wrapped in a single-element []string
93+
// so that existing YAML struct-unmarshal code (which expects []string) continues to work
94+
// unchanged. The handler config builder then detects this single-element expression slice
95+
// and stores it as a JSON string rather than a JSON array, allowing GitHub Actions to
96+
// evaluate the expression at runtime before the config.json file is written.
97+
//
98+
// Free-form strings that are not GitHub Actions expressions are rejected with an error.
99+
// Array values ([]string, []any) are left untouched for the normal YAML unmarshal path.
100+
func preprocessStringArrayFieldAsTemplatable(configData map[string]any, fieldName string, debugLog *logger.Logger) error {
101+
if configData == nil {
102+
return nil
103+
}
104+
if val, exists := configData[fieldName]; exists {
105+
if s, ok := val.(string); ok {
106+
if !isExpression(s) {
107+
var exampleExpr string
108+
if strings.Contains(fieldName, "-") {
109+
exampleExpr = fmt.Sprintf("${{ inputs['%s'] }}", fieldName)
110+
} else {
111+
exampleExpr = fmt.Sprintf("${{ inputs.%s }}", fieldName)
112+
}
113+
return fmt.Errorf("field %q must be an array of strings or a GitHub Actions expression (e.g. '%s'), got string %q", fieldName, exampleExpr, s)
114+
}
115+
configData[fieldName] = []string{s}
116+
if debugLog != nil {
117+
debugLog.Printf("Wrapped %s expression string in single-element array before unmarshaling", fieldName)
118+
}
119+
}
120+
}
121+
return nil
122+
}
123+
124+
// preprocessProtectedFilesField preprocesses the "protected-files" field in configData,
125+
// handling both the legacy string-enum form and the new object form.
126+
//
127+
// String form (unchanged): "blocked", "allowed", or "fallback-to-issue".
128+
// Object form: { policy: "blocked", exclude: ["AGENTS.md"] }
129+
// - policy is optional; when missing or empty, this preprocessing step treats it as absent
130+
// and leaves downstream default handling to apply (the "protected-files" key is deleted)
131+
// - exclude is a list of filenames/path-prefixes to remove from the default protected set
132+
//
133+
// When the object form is encountered the field is normalised in-place:
134+
// - "protected-files" is replaced with the extracted policy string, or deleted when policy is absent/empty
135+
// - The extracted exclude slice is returned so callers can store it in the config struct
136+
//
137+
// When the string form is encountered the field is left unchanged and nil is returned.
138+
// The debugLog parameter is optional; pass nil to suppress debug output.
139+
func preprocessProtectedFilesField(configData map[string]any, debugLog *logger.Logger) []string {
140+
if configData == nil {
141+
return nil
142+
}
143+
raw, exists := configData["protected-files"]
144+
if !exists || raw == nil {
145+
return nil
146+
}
147+
pfMap, ok := raw.(map[string]any)
148+
if !ok {
149+
return nil
150+
}
151+
if policy, ok := pfMap["policy"].(string); ok && policy != "" {
152+
configData["protected-files"] = policy
153+
if debugLog != nil {
154+
debugLog.Printf("protected-files object form: policy=%s", policy)
155+
}
156+
} else {
157+
delete(configData, "protected-files")
158+
if debugLog != nil {
159+
debugLog.Print("protected-files object form: no policy, using default")
160+
}
161+
}
162+
return parseStringSliceAny(pfMap["exclude"], debugLog)
163+
}
164+
165+
// preprocessExpiresField handles the common expires field preprocessing pattern.
166+
// This function:
167+
// 1. Parses the expires value through parseExpiresFromConfig (handles integers, strings, and boolean false)
168+
// 2. Handles explicit disablement when expires=false (returns -1)
169+
// 3. Normalizes the value to hours and updates configData["expires"] in place
170+
// 4. Logs the parsed value with the provided logger
171+
//
172+
// Returns true if expires was explicitly disabled with false, false otherwise.
173+
// This helper consolidates duplicate preprocessing logic used in parseCreateIssuesConfig and parseCreateDiscussionsConfig.
174+
func preprocessExpiresField(configData map[string]any, debugLog *logger.Logger) bool {
175+
expiresDisabled := false
176+
if configData != nil {
177+
if expires, exists := configData["expires"]; exists {
178+
expiresInt := parseExpiresFromConfig(configData)
179+
if expiresInt == -1 {
180+
expiresDisabled = true
181+
configData["expires"] = 0
182+
} else if expiresInt > 0 {
183+
configData["expires"] = expiresInt
184+
} else {
185+
configData["expires"] = 0
186+
}
187+
if debugLog != nil {
188+
debugLog.Printf("Parsed expires value %v to %d hours (disabled=%t)", expires, expiresInt, expiresDisabled)
189+
}
190+
}
191+
}
192+
return expiresDisabled
193+
}

0 commit comments

Comments
 (0)