Skip to content

Changes error message when using invalid endpoint.url #8603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

lucas-a-martins
Copy link
Collaborator

Description

When validating the endpoint.url global setting, such as when using the createKubernetesCluster API, we encounter the exception message Global setting endpoint.url has to be set to the Management Server's API endpoint. This message lacks clarity about the problem to be solved and exposes information about the environment. Upon further analysis, it was found that this same message was used across several classes, repeating the same code in each one of them.

This pull request addresses this issue by replacing the exception with a generic one that does not expose any information about the environment. Also, it introduces a more detailed and explicit message to the logs, including the current value of endpoint.url. Furthermore, a new method has been implemented to validate the endpoint.url whenever necessary.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I tested by using CloudMonkey and tried to create a Kubernetes cluster. I implemented unit tests for the new method too.

Exception before changes:

(lucas) :cat: > create kubernetescluster name=test description=test zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01 kubernetesversionid=ea7dc03b-40d0-4cfe-9144-4a9044cf88bd serviceofferingid=3eb8e0f1-d909-4bfe-aa55-2b9329920349 size=1
:see_no_evil: Error: (HTTP 530, error code 9999) Global setting endpoint.url has to be set to the Management Server's API end point

Exception after changes:

  • New exception:
(lucas) :cat: > create kubernetescluster name=test description=test zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01 kubernetesversionid=ea7dc03b-40d0-4cfe-9144-4a9044cf88bd serviceofferingid=3eb8e0f1-d909-4bfe-aa55-2b9329920349 size=1
:see_no_evil: Error: (HTTP 530, error code 9999) Unable to complete this operation. Contact your cloud admin.
  • New log:
2024-01-31 19:16:41,219 ERROR [o.a.c.c.ApiServiceConfiguration] (qtp775386112-17:ctx-ff635773 ctx-96fc2faf) (logid:a8dc806e) Global setting endpoint.url cannot contain localhost or be blank. Current value: http://localhost:8080/client/api

@sureshanaparti
Copy link
Contributor

@blueorangutan package

Copy link

codecov bot commented Feb 5, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (49cecae) 30.37% compared to head (6b86329) 30.76%.
Report is 30 commits behind head on main.

Files Patch % Lines
...bernetes/cluster/KubernetesClusterManagerImpl.java 0.00% 3 Missing ⚠️
...loud/network/lb/LoadBalancingRulesManagerImpl.java 0.00% 2 Missing ⚠️
...r/actionworkers/KubernetesClusterActionWorker.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8603      +/-   ##
============================================
+ Coverage     30.37%   30.76%   +0.39%     
- Complexity    32633    33078     +445     
============================================
  Files          5352     5353       +1     
  Lines        374419   374605     +186     
  Branches      54609    54633      +24     
============================================
+ Hits         113719   115244    +1525     
+ Misses       245523   244085    -1438     
- Partials      15177    15276      +99     
Flag Coverage Δ
simulator-marvin-tests 24.61% <7.14%> (+0.46%) ⬆️
uitests 4.38% <ø> (-0.01%) ⬇️
unit-tests 16.44% <57.14%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8536

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall code lgtm
left 2 minor comments

@lucas-a-martins
Copy link
Collaborator Author

lucas-a-martins commented Feb 6, 2024

@weizhouapache, thanks for your review. I've made changes to incorporate your suggestions in the latest commit (4cf74fc), so if you could provide another review, I would greatly appreciate it!

@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@weizhouapache
Copy link
Member

@weizhouapache, thanks you for your review. I've made changes to incorporate your suggestions in the latest commit (4cf74fc), so if you could provide another review, I would greatly appreciate it!

@lucas-a-martins
thanks for the update
Looks good

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8556

@blueorangutan
Copy link

[SF] Trillian test result (tid-9096)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 48814 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8603-t9096-kvm-centos7.zip
Smoke tests completed. 122 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@JoaoJandre
Copy link
Contributor

Hey, @lucas-a-martins please change your logger imports to use log4j2

@DaanHoogland
Copy link
Contributor

@lucas-a-martins do you want to target this for 4.20?

@lucas-a-martins
Copy link
Collaborator Author

lucas-a-martins commented Feb 9, 2024

@lucas-a-martins do you want to target this for 4.20?

@DaanHoogland I do. I'm just testing the code with log4j2.

@rohityadavcloud rohityadavcloud added this to the 4.20.0.0 milestone Feb 13, 2024
…guration.java

Co-authored-by: Daniel Augusto Veronezi Salvador <[email protected]>
@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8694

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-9288)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42901 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8603-t9288-kvm-centos7.zip
Smoke tests completed. 129 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Copy link
Collaborator

@bernardodemarco bernardodemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I manually tested the proposed changes in a local environment.

  • CloudMonkey:
(lab) 🐱 > createKubernetesCluster name=dummy-kubernetes-cluster zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01
🙈 Error: (HTTP 530, error code 9999) Unable to complete this operation. Contact your cloud admin.
  • Management server logs:
2024-04-16T20:22:30,881 ERROR [o.a.c.c.ApiServiceConfiguration] (qtp1759482496-19:[ctx-9ff4e912, ctx-4928e0cd]) (logid:6f2b30d3) Global setting [endpoint.url] cannot contain localhost or be blank. Current value: http://localhost:8080/client/api

@weizhouapache
Copy link
Member

To be honest, I think the old error message makes more sense for root admin

@lucas-a-martins
Copy link
Collaborator Author

To be honest, I think the old error message makes more sense for root admin

@weizhouapache,

One of the major issues with the old exception message is its lack of clarity. Even for root admins, it would be difficult to discern what action needs to be taken.

The new exception is more generic compared to the old one, as the intention is to give no environment information. To give any direction to operators, this PR adds a new log that is more objective, indicating to the operator where and what the problem is. IMO, the new message can also help prevent operators from making the same mistake again, as they will now know that endpoint.url can't be blank or localhost. The old message states endpoint.url has to be set, but if the endpoint.url is localhost, then it is already set, albeit with an invalid value, and the old message would not make sense.

@weizhouapache
Copy link
Member

To be honest, I think the old error message makes more sense for root admin

@weizhouapache,

One of the major issues with the old exception message is its lack of clarity. Even for root admins, it would be difficult to discern what action needs to be taken.

The new exception is more generic compared to the old one, as the intention is to give no environment information. To give any direction to operators, this PR adds a new log that is more objective, indicating to the operator where and what the problem is. IMO, the new message can also help prevent operators from making the same mistake again, as they will now know that endpoint.url can't be blank or localhost. The old message states endpoint.url has to be set, but if the endpoint.url is localhost, then it is already set, albeit with an invalid value, and the old message would not make sense.

I know @lucas-a-martins

but the new error message Unable to complete this operation. Contact your cloud admin. is much more unclear

            LOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl);
            throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");

@lucas-a-martins
Copy link
Collaborator Author

I know @lucas-a-martins

but the new error message Unable to complete this operation. Contact your cloud admin. is much more unclear

            LOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl);
            throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");

@weizhouapache,

The exception being unclear is intentional, as this is what users will see. I understand what you mean by showing an exclusive message to ROOT admins, but is this really necessary since they have access to the logs with a clear message? Especially considering that the current exception is unclear, so this wouldn't change. I can't see a benefit in adding a "less unclear but not clear enough" exception. Also, is there any case of an exception in the UI being different between Users and Root Admins in the same situation?

@weizhouapache
Copy link
Member

@weizhouapache,

The exception being unclear is intentional, as this is what users will see. I understand what you mean by showing an exclusive message to ROOT admins, but is this really necessary since they have access to the logs with a clear message? Especially considering that the current exception is unclear, so this wouldn't change. I can't see a benefit in adding a "less unclear but not clear enough" exception. Also, is there any case of an exception in the UI being different between Users and Root Admins in the same situation?

@lucas-a-martins
I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.

However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.

you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.

Copy link
Contributor

@GutoVeronezi GutoVeronezi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLGTM

Thanks, @lucas-a-martins. The previous message was misleading; the new log is way more clear.

@lucas-a-martins
Copy link
Collaborator Author

@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.

However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.

you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.

Hey, @weizhouapache

Considering these scenarios, I agree that having an exception to Root Admins makes sense and could be applied to all exceptions; however, this goes out of the scope of this PR. We have a discussion about patterns about logging (#8746) that could address these cases as well; we can discuss that there.

@weizhouapache
Copy link
Member

@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.
However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.
you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.

Hey, @weizhouapache

Considering these scenarios, I agree that having an exception to Root Admins makes sense and could be applied to all exceptions; however, this goes out of the scope of this PR. We have a discussion about patterns about logging (#8746) that could address these cases as well; we can discuss that there.

ok @lucas-a-martins
let's hold on this PR, and continue when we have a consensus in #8746 ?

@GutoVeronezi
Copy link
Contributor

ok @lucas-a-martins let's hold on this PR, and continue when we have a consensus in #8746 ?

I do not see why we need to wait that discussion to improve a log message. The previous message was misleading; the new log is way more clear. The idea of presenting different messages for operators and final users is interesting; however, is a different context from this PR. If we define something later, we can create another PR to address the concept.

@weizhouapache
Copy link
Member

weizhouapache commented May 9, 2024

ok @lucas-a-martins let's hold on this PR, and continue when we have a consensus in #8746 ?

I do not see why we need to wait that discussion to improve a log message. The previous message was misleading; the new log is way more clear. The idea of presenting different messages for operators and final users is interesting; however, is a different context from this PR. If we define something later, we can create another PR to address the concept.

@GutoVeronezi
why do you think the new log is way more clear......

For root admins, no information is given by the new error message, it makes no sense.

As a workaround, can you display the error message on UI for regular users? If your customers use UI not API. @lucas-a-martins

@GutoVeronezi
Copy link
Contributor

@GutoVeronezi why do you think the new log is way more clear......

The previous message says: Global setting endpoint.url has to be set to the Management Server's API endpoint; however, it validates if the setting has localhost in the name or it is blank. Without the change, operators will have to check the code to know that.

@weizhouapache
Copy link
Member

@GutoVeronezi why do you think the new log is way more clear......

The previous message says: Global setting endpoint.url has to be set to the Management Server's API endpoint; however, it validates if the setting has localhost in the name or it is blank. Without the change, operators will have to check the code to know that.

@GutoVeronezi
what I disagree is not the new error in the log (actually I like it), see below

        LOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl);

What I disagree is the message in the exception, see below

        throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");

My suggestion is

  • update the description of the global setting to emphasize it should not be empty or contain localhost
  • update the message in the exception to include more information, for example "Unable to xxx because endpoint url is not configured correctly, please xxxx"

@JoaoJandre JoaoJandre modified the milestones: 4.20.0.0, 4.21.0.0 Sep 10, 2024
@kiranchavala
Copy link
Contributor

@lucas-a-martins any update on this

Could you please take a look at @weizhouapache suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants