Add validate subcommand for configuration validation#250
Add validate subcommand for configuration validation#250
validate subcommand for configuration validation#250Conversation
|
@copilot that script change looks fine - can you update the system-agent core code to give a better error message if the rancher2_connection_info.json file isn't found? |
Updated the error handling in main.go (commit b5b1ef3). The agent now provides clear messages when the connection info file is not found or contains invalid JSON:
Both error messages guide operators to check if the agent was installed correctly and if the file contains valid JSON content. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot this mostly looks good - but I'm curious if you can research any better ways to validate that the config file is json. we probably can't rely on jq being on the host unfortunately - what would that look like? feel free to just comment and let me choose rather than another commit. |
I researched JSON validation methods that don't require jq. Here are the options: Current approach (first character check):
Option 1: Enhanced shell-only validation (recommended): validate_json_structure() {
file="$1"
[ -s "$file" ] || return 1
FIRST=$(sed -n '/[^[:space:]]/p' "$file" | head -c 1 2>/dev/null)
LAST=$(sed 's/[[:space:]]*$//' "$file" | tail -c 2 | head -c 1 2>/dev/null)
if [ "$FIRST" = "{" ] && [ "$LAST" = "}" ]; then
OPEN=$(tr -cd '{' < "$file" | wc -c)
CLOSE=$(tr -cd '}' < "$file" | wc -c)
[ "$OPEN" -eq "$CLOSE" ] || return 1
grep -q '"' "$file" && grep -q ':' "$file" || return 1
return 0
elif [ "$FIRST" = "[" ] && [ "$LAST" = "]" ]; then
return 0
fi
return 1
}
Option 2: Python fallback (best validation): validate_json() {
file="$1"
if command -v python3 >/dev/null 2>&1; then
python3 -c "import json; json.load(open('$file'))" 2>/dev/null && return 0
fi
validate_json_structure "$file" # fallback
}
Why not jq? Not available on minimal/embedded systems. My recommendation: Option 1 (enhanced shell) is sufficient without dependencies. Option 2 adds perfect validation when python3 is available. Let me know which approach you prefer. |
|
@copilot i think we're good actually - if the response code isn't a 200 the file won't be moved anyway and the validation shouldn't occur. |
|
Why don't you just build a function into the system-agent and use it to validate it? You should have the binary by the point you download the connection info. |
|
Basic |
|
We also probably want it to stage the connection info as a |
|
The current changes in this PR already stage through a temp file, are you suggesting the check command also handle moving the temp file into place if it works? |
|
Nope, just that we should keep the current changes re: temp files, but only move it if the check succeeds. |
check subcommand for configuration validation
jakefhyde
left a comment
There was a problem hiding this comment.
Can you write a test for this command?
|
@copilot i need you to implement these PR comments, notably:
|
b8c8a4c to
a6c6070
Compare
|
ah ok - same thing it looks like. the arm64 build is failing because its using the ubuntu-latest runner which is amd64. will try to fix in another PR. |
b7f1c33 to
8523678
Compare
36d18bf to
676e804
Compare
install.sh
Outdated
| 200) | ||
| info "Successfully downloaded Rancher connection information" | ||
| # Validate using the system-agent validate command | ||
| if ! "${CATTLE_AGENT_BIN_DIR}/rancher-system-agent" validate "${TEMP_CONNECTION_INFO}" 2>&1; then |
There was a problem hiding this comment.
defaultConfigFile is "/etc/rancher/agent/config.yaml", but you're calling this with "rancher2_connection_info.json.tmp". I don't think this is validating the right thing.
There was a problem hiding this comment.
yeah that is the default fallback for when we're actually running the system-agent, the positional arg on the validate command validates whatever we ask it to though, falling back to the default. ex:
$> go build && sudo ./system-agent validate bad.json
INFO[0000] Rancher System Agent version dev (HEAD) - Configuration Validation
INFO[0000] Validating configuration file: bad.json
INFO[0000] Configuration file exists
FATA[0000] Fatal error running: failed to parse configuration file: file bad.json had permission 0644 which was not expected 0600
$> chmod 600 bad.json
$> sudo ./system-agent validate bad.json
INFO[0000] Rancher System Agent version dev (HEAD) - Configuration Validation
INFO[0000] Validating configuration file: bad.json
INFO[0000] Configuration file exists
FATA[0000] Fatal error running: failed to parse configuration file: file bad.json was not owned by uid=0 gid=0
$> sudo chown root:root bad.json
$> go build && sudo ./system-agent validate bad.json
INFO[0000] Rancher System Agent version dev (HEAD) - Configuration Validation
INFO[0000] Validating configuration file: bad.json
INFO[0000] Configuration file exists
INFO[0000] Configuration file parsed successfully
FATA[0000] Fatal error running: neither local nor remote watching is enabled There was a problem hiding this comment.
wait I'm confused, why is it complaining about permissions on go.mod if you asked it to validate bad.json?
otherwise this looks good, we want it to validate the temp file before using it.
There was a problem hiding this comment.
yeah that is the default fallback for when we're actually running the system-agent, the positional arg on the validate command validates whatever we ask it to though, falling back to the default.
Well no, because in one case (no args supplied) you validate the config file, in the install you validate the connection info file, but the code looks like this:
configFile := c.Args().First()
if configFile == "" {
configFile = os.Getenv(cattleAgentConfigEnv)
}
if configFile == "" {
configFile = defaultConfigFile
}
we don't want to validate the connection info file as the config file, we probably want a flag to validate a connection info file.
There was a problem hiding this comment.
lol - copy/paste error i had tried it on go.mod for fun then changed the filename. updated with some more output. we always check to make sure we're running as root here - is that something we want to change as well?
There was a problem hiding this comment.
Two nits:
Fatal error running:could perhaps beValidation failed:?- The permissions and ownership should not be reported as parser errors; the syntax may be valid but we haven't gotten that far yet.
There was a problem hiding this comment.
ok cool - thanks guys. went through this again and:
- this has not been a good litmus test as my first "ai-first" PR lol, faith in ai to do the right thing has went down a lot even if it "looks" right.
- separated the validation into two distinct args:
validate-connectionandvalidate-configthat do what they say: validate the argument passed for connection info or system-agent configuration - updated the error message since that'll be a little easier to understand.
install.sh
Outdated
| 200) | ||
| info "Successfully downloaded Rancher connection information" | ||
| # Validate using the system-agent validate command | ||
| if ! "${CATTLE_AGENT_BIN_DIR}/rancher-system-agent" validate "${TEMP_CONNECTION_INFO}" 2>&1; then |
There was a problem hiding this comment.
yeah that is the default fallback for when we're actually running the system-agent, the positional arg on the validate command validates whatever we ask it to though, falling back to the default.
Well no, because in one case (no args supplied) you validate the config file, in the install you validate the connection info file, but the code looks like this:
configFile := c.Args().First()
if configFile == "" {
configFile = os.Getenv(cattleAgentConfigEnv)
}
if configFile == "" {
configFile = defaultConfigFile
}
we don't want to validate the connection info file as the config file, we probably want a flag to validate a connection info file.
b0eb9b7 to
7b2919f
Compare
7b2919f to
5bfc3f0
Compare
|
@thatmidwesterncoder Almost there, what's up with CI? @HarrisonWAffel I know we've had issues with the arm64 tests before, do you think we should just disable it for this test as well? I feel like if we do, we should create an issue for this, I'd rather the tests work for arm and we only run a subset rather than have them be untestable. Also tagging @alexander-demicev for visibility. |
|
@jakefhyde afaik this is the "old" test and its always failed just due to the fact its running on the wrong architecture. even after rebasing the test failure hasn't gone away :\ I updated it to run on the right arch in #254 but that was kind of superceded by #256. Not sure why GH seems to not want to forget that run. |
|
@jakefhyde Yeah I think we just need to pull in the changes that Jacob tried to introduce in #254, we're running arm tests on amd boxes - This is something I missed when introducing the initial tests too, seems obvious now lol. Interestingly, I don't get the same error when running |
|
@jakefhyde @HarrisonWAffel i cherry-picked my changes on top - and we still have the old builds pending - which is kinda funny. i assume its how it is set up here: I assume they'll go away when this is merged? |
|
Can you squash now, or were you planning to do that when merging? |
Add JSON validation to retrieve_connection_info function Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Fix variable quoting in cleanup code Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Remove content preview to avoid exposing sensitive info Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Improve error messages for connection info file parsing failures Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Refine error messages to be more concise and clear Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Validate JSON in install script and improve agent error messages for rancher2_connection_info.json Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Implement check subcommand for configuration validation Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> Refactor check to validate command with improvements per PR feedback Co-authored-by: thatmidwesterncoder <135059506+thatmidwesterncoder@users.noreply.github.com> use strings.Contains instead of re-inventing the wheel during tests misses from first pass slight refactor about uid/gid checks change pathOwnedByRoot -> pathOwnedByCurrentUser remove -s validation + let the command do everything go mod tidy in ./test/ error it up when missing ns/secretName fields in config refactor into multiple args for connection info vs configuration validation change up error message to be more friendly fix ARM ci runs
1e55568 to
a155f80
Compare
|
@brandond was planning to do on merge - but squashed it quick ahead of time! |
|
Does EIO need to remove the old required tests from this repo's config or something? I don't see where those are even coming from. |
|
@brandond I think you're right - it might be in the repo settings since those are showing up as "required" even though there physically isn't a workflow file that will spawn those jobs anymore. Will ping EIO monday. |
Implements a pre-flight validation command that verifies agent configuration without starting the agent. Operators can run
rancher-system-agent validateto diagnose installation and configuration issues.Changes
New CLI Command
validatesubcommand that validates:Command Usage
The
validatecommand accepts the config file path as a positional argument, with fallback options:Validation Logic
config.Parse()to catch permission and syntax errorsTesting
main_test.gowith 8 test cases covering:Output Example
Success case:
Error case (early return):
The validation command detects corrupted configuration files (e.g., webhook errors written to connection info file) before agent startup, providing clear, actionable error messages.
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.