Fix ConfigurationVO load exception after schema change #10485

Open
wants to merge 10 commits into
base: main
from

Conversation

abh1sar
Collaborator

@abh1sar abh1sar commented Feb 28, 2025

Description

This PR fixes #10480

The configuration table schema was changed in PR #10300, but this causes a problem if the ConfigurationVO class structure was cached with the old fields.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?


codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.

Project coverage is 16.57%. Comparing base (8e4fe1c) to head (04dbc7e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ain/java/com/cloud/utils/db/TransactionLegacy.java | 0.00% | 6 Missing ⚠️ |
| ...src/main/java/com/cloud/upgrade/dao/DbUpgrade.java | 0.00% | 3 Missing ⚠️ |
| ...ava/com/cloud/upgrade/dao/Upgrade42010to42100.java | 0.00% | 3 Missing ⚠️ |
| ...java/com/cloud/upgrade/DatabaseUpgradeChecker.java | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10485      +/-   ##
============================================
- Coverage     16.57%   16.57%   -0.01%     
  Complexity    13988    13988              
============================================
  Files          5745     5746       +1     
  Lines        510847   510862      +15     
  Branches      62140    62142       +2     
============================================
- Hits          84696    84695       -1     
- Misses       416677   416692      +15     
- Partials       9474     9475       +1     

| Flag | Coverage Δ |
|---|---|
| uitests | 3.91% <ø> (ø) |
| unittests | 17.47% <0.00%> (-0.01%) ⬇️ |

@abh1sar abh1sar changed the title Fix ConfigurationVO load exception on fresh install Fix ConfigurationVO load exception after schema change Feb 28, 2025
@DaanHoogland DaanHoogland added this to the 4.21.0 milestone Mar 3, 2025
@abh1sar
Collaborator Author

abh1sar commented Mar 3, 2025

@blueorangutan package

@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 12634

@abh1sar
Collaborator Author

abh1sar commented Mar 3, 2025

@blueorangutan package

@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12636

@abh1sar
Collaborator Author

abh1sar commented Mar 3, 2025

@blueorangutan test

@blueorangutan

@abh1sar a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-12539)

@rohityadavcloud
Member

@blueorangutan test

@blueorangutan

@rohityadavcloud a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-12546)

@abh1sar
Collaborator Author

abh1sar commented Mar 4, 2025

@blueorangutan package

@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

[SF] Trillian Build Failed (tid-12553)

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12648

@blueorangutan

[SF] Trillian Build Failed (tid-12555)

@JoaoJandre
Contributor

Instead of changing how we get the value, shouldn't we normalize the database data so that it works with the current way of getting the values?

Otherwise, if someone in the future creates a method to get the value the old way and only tests on a new install, it might introduce a bug for people running old installs.

@abh1sar
Collaborator Author

abh1sar commented Mar 11, 2025

@JoaoJandre We identified the issue with how the BackupDaoImpl class caches the columns of the table. Even though both the configurations table and the ConfigurationVO code have the new schema, the ConfigurationsDao._allColumns field still had the older schema from before the upgrade. That's why, after a management server restart, ConfigurationsDaoImpl._allColumns was regenerated with the correct fields.
I have reverted the older commit and added a commit to regenerate ConfigurationsDaoImpl._allColumns when the Configurations table schema is changed.
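
The caching problem described in this comment can be pictured with a small sketch. It is only an illustration under assumptions: `ColumnCache`, `columns()` and `invalidate()` are hypothetical names, not CloudStack classes or the code in this PR. It shows a column list that stays stale after a simulated schema change until it is explicitly regenerated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a DAO's cached column list; not CloudStack code.
class ColumnCache {
    private final Supplier<List<String>> loader; // reads the current column list
    private List<String> columns;                // cached copy, null until loaded

    ColumnCache(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    // Returns the cached columns, loading them on first use or after invalidate().
    synchronized List<String> columns() {
        if (columns == null) {
            columns = List.copyOf(loader.get());
        }
        return columns;
    }

    // Would be called by whatever step changes the table schema.
    synchronized void invalidate() {
        columns = null;
    }

    public static void main(String[] args) {
        List<String> schema = new ArrayList<>(List.of("name", "value", "scope"));
        ColumnCache cache = new ColumnCache(() -> schema);
        System.out.println(cache.columns()); // loaded from the original schema
        schema.add("options");               // simulated schema change
        System.out.println(cache.columns()); // still the old, cached list
        cache.invalidate();
        System.out.println(cache.columns()); // reloaded, now includes "options"
    }
}
```

Without the `invalidate()` call, the only way to pick up the new column in this sketch is to create a new cache, which mirrors the observation that a management server restart made the error disappear.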

@abh1sar abh1sar self-assigned this Mar 11, 2025
@abh1sar
Collaborator Author

abh1sar commented May 19, 2025

> @abh1sar based on your testing and discussion this would need changes.
>
> I was able to avoid the error by evicting old connections when HikariCP is used for pooling.

Thanks @shwstppr, I was able to verify the suggested change.

> I'm not sure about DBCP. I've not reproduced the issue there yet and based on my little research it may need closing the datasource. Also, we need to remove that markForColumnsRefresh logic as that doesn't seem to be the problem.

I don't see the issue with DBCP; I have tried it multiple times. I am not sure why, but its caching behaviour seems to differ a bit from HikariCP's. So, I have not added any code to handle DBCP.

Sorry, it took a while to come back. I was busy with some other deliverables.
Please check @shwstppr @JoaoJandre .
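
To make "evicting old connections" concrete, here is a toy sketch. Everything in it (`ToyPool`, `Conn`, `upgradeSchema`) is hypothetical and stands in for a real pool. With HikariCP itself, the corresponding operation is `HikariPoolMXBean.softEvictConnections()`, reachable through `HikariDataSource.getHikariPoolMXBean()`; whether this PR uses exactly that call is not stated in the thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy pool illustrating eviction after a schema change; all names are hypothetical.
class ToyPool {
    // Stand-in for a pooled connection that remembers the schema version it was opened against.
    static final class Conn {
        final int schemaVersionSeen;

        Conn(int schemaVersionSeen) {
            this.schemaVersionSeen = schemaVersionSeen;
        }
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    private int schemaVersion = 1;

    // Reuses an idle connection if one exists, otherwise opens a fresh one.
    Conn borrow() {
        Conn c = idle.poll();
        return c != null ? c : new Conn(schemaVersion);
    }

    void giveBack(Conn c) {
        idle.push(c);
    }

    // Simulates a schema upgrade; optionally drops idle connections opened before it.
    void upgradeSchema(boolean evictIdle) {
        schemaVersion++;
        if (evictIdle) {
            idle.clear();
        }
    }

    public static void main(String[] args) {
        ToyPool stale = new ToyPool();
        stale.giveBack(stale.borrow());
        stale.upgradeSchema(false);
        System.out.println(stale.borrow().schemaVersionSeen); // old connection is reused

        ToyPool evicted = new ToyPool();
        evicted.giveBack(evicted.borrow());
        evicted.upgradeSchema(true);
        System.out.println(evicted.borrow().schemaVersionSeen); // fresh connection is opened
    }
}
```

The point of the sketch is only that state tied to a connection opened before the upgrade survives as long as that connection is handed out again.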

Contributor

@shwstppr shwstppr left a comment

code LGTM.

@abh1sar
Collaborator Author

abh1sar commented May 19, 2025

@blueorangutan package

@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13441

@JoaoJandre
Contributor

Could you post the test steps you took to reproduce the issue and verify that this PR fixes it?

@abh1sar
Collaborator Author

abh1sar commented May 20, 2025

> Could you post the test steps you took to reproduce the issue and verify that this PR fixes it?

@JoaoJandre The first time the management server is started after running the cloudstack-setup-database commands, it logs many exceptions like the one below in management-server.log.

I tested the same steps with these changes using both HikariCP and Apache DBCP. I observed no exceptions, and the management server came up as expected.

2025-02-27 13:26:26,070 ERROR [o.a.c.f.c.d.ConfigurationDaoImpl] (VMSchedulerPollTask:[ctx-f804ce4a]) (logid:30a4a0a1) DB Exception on: HikariProxyPreparedStatement@1499732297 wrapping com.mysql.cj.jdbc.ServerPreparedStatement[782]: SELECT configuration.instance, configuration.component, configuration.name, configuration.value, configuration.default_value, configuration.description, configuration.category, configuration.is_dynamic, configuration.scope, configuration.updated, configuration.group_id, configuration.subgroup_id, configuration.parent, configuration.display_text, configuration.kind, configuration.options FROM configuration WHERE configuration.name = x'6472732e706c616e2e6578706972652e696e74657276616c' java.sql.SQLDataException: Cannot determine value type from string ''
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:115)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:98)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:90)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:64)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:74)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:96)
	at com.mysql.cj.jdbc.result.ResultSetImpl.getObject(ResultSetImpl.java:1431)
	at com.mysql.cj.jdbc.result.ResultSetImpl.getInt(ResultSetImpl.java:830)
	at com.zaxxer.hikari.pool.HikariProxyResultSet.getInt(HikariProxyResultSet.java)
	at com.cloud.utils.db.GenericDaoBase.setField(GenericDaoBase.java:594)
	at com.cloud.utils.db.GenericDaoBase.setField(GenericDaoBase.java:2059)
	at com.cloud.utils.db.GenericDaoBase.toEntityBean(GenericDaoBase.java:1919)
	at com.cloud.utils.db.GenericDaoBase.toEntityBean(GenericDaoBase.java:1880)
	at com.cloud.utils.db.GenericDaoBase.findById(GenericDaoBase.java:1075)
	at com.cloud.utils.db.GenericDaoBase.lockRow(GenericDaoBase.java:1050)
	at com.cloud.utils.db.GenericDaoBase.findById(GenericDaoBase.java:996)
	at jdk.internal.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569) 
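
When reading this log line, it helps to know which configuration key the query was looking up. The `x'...'` value in the WHERE clause is a hex literal, and the sketch below decodes it. `HexLiteral` is a hypothetical helper written for this note, not something from CloudStack or the MySQL driver.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading the hex literal in the logged query.
class HexLiteral {
    // Decodes the hex digits between x' and ' into a UTF-8 string.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hex digits copied from the WHERE clause of the logged statement.
        System.out.println(decode("6472732e706c616e2e6578706972652e696e74657276616c"));
        // prints drs.plan.expire.interval
    }
}
```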

@abh1sar
Collaborator Author

abh1sar commented May 21, 2025

@blueorangutan package

@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13470

@abh1sar
Collaborator Author

abh1sar commented May 21, 2025

@blueorangutan test

@blueorangutan

@abh1sar a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13371)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 48522 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10485-t13371-kvm-ol8.zip
Smoke tests completed. 133 look OK, 1 have errors, 7 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestIpv6Vpc>:setup | Error | 0.00 | test_vpc_ipv6.py |
| all_test_vpc_redundant | Skipped | --- | test_vpc_redundant.py |
| all_test_vpc_router_nics | Skipped | --- | test_vpc_router_nics.py |
| all_test_vpc_vpn | Skipped | --- | test_vpc_vpn.py |
| all_test_webhook_delivery | Skipped | --- | test_webhook_delivery.py |
| all_test_webhook_lifecycle | Skipped | --- | test_webhook_lifecycle.py |
| all_test_host_maintenance | Skipped | --- | test_host_maintenance.py |
| all_test_hostha_kvm | Skipped | --- | test_hostha_kvm.py |

@shwstppr
Contributor

shwstppr commented Jul 9, 2025

@abh1sar @sureshanaparti I think we should try to get this in 4.21

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14115

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13737)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 59320 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10485-t13737-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Contributor

@abh1sar et al, please advise on testing. It seems to me that at least an upgrade test is needed.

@abh1sar
Collaborator Author

abh1sar commented Jul 10, 2025

@DaanHoogland
The exceptions are seen every time the management server is started after running the cloudstack-setup-database command. So, the fix can be verified that way.
Both HikariCP and Apache DBCP should be tested.

Upgrade should also be tested for any regressions, and logs reviewed after the upgrade to check that the exceptions are not there.
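
The log review mentioned above could be partly automated. The following is a rough sketch, assuming the two marker strings from the exception quoted earlier in this thread appear on the same log line, as they do in that excerpt. `ConfigLoadErrorCheck` is a hypothetical helper, and the log file path is supplied by the caller rather than assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for checking a management server log; not part of CloudStack.
class ConfigLoadErrorCheck {
    // Counts log lines that carry both markers from the exception quoted in this thread.
    static long countLoadErrors(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("DB Exception on:"))
                .filter(line -> line.contains("Cannot determine value type from string"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // The log path is passed as the first argument; no default location is assumed.
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        long errors = countLoadErrors(lines);
        System.out.println(errors == 0
                ? "no matching load errors found"
                : errors + " matching load error(s) found");
    }
}
```

A zero count after a fresh setup and after an upgrade, with both HikariCP and Apache DBCP, would match the verification described in this comment; it does not replace reading the log for other regressions.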

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[DB] Exceptions logged on fresh management server start
8 participants