corosync: Implement double ring #5
base: stable/sap/3.0

`@@ -0,0 +1,48 @@`

```bash
#!/bin/bash

# Script to reload the corosync configuration on a crowbar-managed pacemaker
# cluster.
# This script can be used when you switch from a single corosync ring to a
# dual corosync ring.
# After you have applied the modified proposal successfully, run this script
# on crowbar.
# Please stop all chef-client runs while running this script.
```

> Can't this be automated, or at least checked before proceeding?

```bash
set -eux

CLUSTER_PROPOSAL_NAME="$1"
```

> Please do some input validation on this.
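
One way to address the input-validation request is a small guard before the assignment. This is a minimal sketch: the function name `validate_proposal_name` is hypothetical, and the allowed character set (letters, digits, `_` and `-`) is an assumption rather than Crowbar's documented naming rule. The existing `crowbarctl proposal show` call would still be needed to catch names that are well-formed but do not exist.

```bash
#!/bin/bash
# Sketch only: validate_proposal_name is a hypothetical helper, and the allowed
# character set [A-Za-z0-9_-] is an assumption, not Crowbar's documented rule.
validate_proposal_name() {
  local name=${1-}
  if [[ -z $name ]]; then
    echo "usage: $(basename "$0") <pacemaker-proposal-name>" >&2
    return 1
  fi
  # Reject anything outside the assumed character set (spaces, ';', '$', ...).
  if [[ ! $name =~ ^[A-Za-z0-9_-]+$ ]]; then
    echo "invalid proposal name: '$name'" >&2
    return 1
  fi
}

# Intended use near the top of the script ("${1-}" keeps it safe under set -u):
#   validate_proposal_name "${1-}" || exit 1
#   CLUSTER_PROPOSAL_NAME="$1"
```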

```bash
# Make sure that the proposal exists by asking for its details.
# If this command fails, the script exits.
crowbarctl proposal show pacemaker "$CLUSTER_PROPOSAL_NAME"

NODES=$(crowbarctl \
  proposal show pacemaker "$CLUSTER_PROPOSAL_NAME" \
  --format=plain \
  --filter "deployment.pacemaker.elements.pacemaker-cluster-member" |
  cut -d " " -f 2)
```

> On `NODES=$(crowbarctl \`: Tiny nit (take it or leave it): stylistically, with multi-line

```bash
# Get the first node - it will be used to issue crm commands
for node in $NODES; do
  FIRST_NODE="$node"
  break
done

# Print out initial configuration
ssh "$FIRST_NODE" corosync-cfgtool -s
```

> If this fails, shouldn't it bail?
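
To make the status check something the script can act on rather than only print, the `corosync-cfgtool -s` output could be parsed. This is a minimal sketch, assuming the corosync 2.x output format in which each fault-free ring reports a status line containing `active with no faults`; the helper name `require_healthy_rings` is hypothetical.

```bash
#!/bin/bash
# Sketch only: assumes the corosync 2.x `corosync-cfgtool -s` format, where each
# fault-free ring prints a status line containing "active with no faults".
# require_healthy_rings is a hypothetical helper name.
require_healthy_rings() {
  local expected=$1 healthy
  # Count matching lines on stdin; grep -c prints 0 (and exits 1) when nothing
  # matches, so "|| true" keeps the 0 without failing the function.
  healthy=$(grep -c "active with no faults" || true)
  if (( healthy != expected )); then
    echo "expected $expected fault-free ring(s), found $healthy" >&2
    return 1
  fi
}
```

For example, the script might run `ssh "$FIRST_NODE" corosync-cfgtool -s | require_healthy_rings 1 || exit 1` before the restart and expect `2` afterwards. Because the original script does not set `pipefail`, a failed `ssh` would still surface here as zero healthy rings.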

```bash
# Put the cluster in maintenance mode
ssh "$FIRST_NODE" crm --wait configure property maintenance-mode=true
```

> Same here; this should check for success.
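
Besides relying on exit statuses, the script could read a property back after setting it and stop if the value is not the expected one. This is a sketch with a hypothetical `verify_property` helper that accepts any command printing the current value; the `crm_attribute` flags shown in the comment are an assumption to check against the installed Pacemaker version.

```bash
#!/bin/bash
# Sketch only: verify_property is a hypothetical helper. Everything after the
# first two arguments is a command that prints the property's current value.
verify_property() {
  local name=$1 expected=$2 actual
  shift 2
  # Fail if the value cannot be read at all.
  if ! actual=$("$@"); then
    echo "could not read $name" >&2
    return 1
  fi
  # Fail if it was read but does not match.
  if [[ $actual != "$expected" ]]; then
    echo "$name is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Possible use after entering maintenance mode (the crm_attribute flags are an
# assumption to check against the installed Pacemaker version):
#   verify_property maintenance-mode true \
#     ssh "$FIRST_NODE" crm_attribute --type crm_config \
#       --name maintenance-mode --query --quiet || exit 1
```

The same call with `false` as the expected value would cover leaving maintenance mode later in the script.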

```bash
# Restart corosync on all nodes
for node in $NODES; do
  ssh "$node" systemctl restart corosync
done
```

> On `ssh "$node" systemctl restart corosync`: This worries me quite a bit. Firstly, won't it be hugely disruptive to the cloud if everything goes down at the same time? So at the very least there should be a big fat warning first. Secondly, isn't it open to some kind of race condition? E.g. what if one node takes ages to shut down, so that another node comes back up with two rings before the first node has shut down?
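
For the first concern above (every node restarting at the same moment), one option is to restart nodes one at a time and wait for each before moving on. This is a sketch with hypothetical hooks `restart_node` and `node_ready`; the default readiness check (`systemctl is-active --quiet corosync`) only confirms that the unit is running, not that its rings are healthy. It does not settle the second concern, namely whether nodes with one ring and nodes with two rings can safely coexist during the transition.

```bash
#!/bin/bash
# Sketch only: restart_node and node_ready are hypothetical hooks. The ssh
# defaults are illustrative; node_ready only checks that the unit is active.
restart_node() { ssh "$1" systemctl restart corosync; }
node_ready() { ssh "$1" systemctl is-active --quiet corosync; }

# Usage: rolling_restart TIMEOUT_SECONDS NODE...
# Restarts one node, waits up to TIMEOUT_SECONDS for it to be ready, then moves
# on; stops at the first node that fails, leaving the remaining nodes untouched.
rolling_restart() {
  local timeout=$1 node waited
  shift
  for node in "$@"; do
    echo "restarting corosync on $node" >&2
    restart_node "$node" || return 1
    waited=0
    until node_ready "$node"; do
      if (( waited >= timeout )); then
        echo "corosync on $node not ready after ${timeout}s; stopping here" >&2
        return 1
      fi
      sleep 1
      waited=$((waited + 1))
    done
  done
}
```

In the script this might replace the loop with `rolling_restart 120 $NODES`, left unquoted on purpose so the node list is split into words as in the original loop.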

```bash
# Give some time for corosync to stand up
sleep 30
```

> Yeah ... no :-) No guessing of the time it takes, please - let's poll in a loop here waiting for whatever condition gives us confidence it's safe to proceed.
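
Following the suggestion to poll instead of sleeping, a generic helper could retry a condition until it holds or a deadline passes. This is a minimal sketch: `wait_for` and the condition shown in the comments are hypothetical, and which readiness condition is the right one (ring status, Pacemaker membership, or both) is left open here.

```bash
#!/bin/bash
# Sketch only: wait_for is a hypothetical helper.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds; gives up once TIMEOUT_SECONDS have elapsed.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical condition for the script: every node reports two fault-free
# rings (assumes the corosync 2.x cfgtool output format).
#   all_nodes_dual_ring() {
#     local node
#     for node in $NODES; do
#       [[ $(ssh "$node" corosync-cfgtool -s | grep -c "active with no faults") -eq 2 ]] || return 1
#     done
#   }
#   wait_for 300 5 all_nodes_dual_ring || exit 1
```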

```bash
# Exit from maintenance mode
ssh "$FIRST_NODE" crm --wait configure property maintenance-mode=false
```

> Again there should be checks here.

```bash
# Print out new configuration
ssh "$FIRST_NODE" corosync-cfgtool -s
```

`@@ -0,0 +1,13 @@`

```ruby
def upgrade ta, td, a, d
  a["corosync"]["ring_mode"] = \
    ta["corosync"]["ring_mode"]
  a["corosync"]["second_ring_network"] = \
    ta["corosync"]["second_ring_network"]
  return a, d
end

def downgrade ta, td, a, d
  a["corosync"].delete("ring_mode")
  a["corosync"].delete("second_ring_network")
  return a, d
end
```

`@@ -369,6 +369,28 @@ def apply_role_pre_chef_call(old_role, role, all_nodes)`

```ruby
role.default_attributes["corosync"]["members"] = member_nodes.map{ |n| n.get_network_by_type("admin")["address"] }
role.default_attributes["corosync"]["transport"] = role.default_attributes["pacemaker"]["corosync"]["transport"]

# Set up the second ring if dual_ring mode configured
ring_mode = role.default_attributes["pacemaker"]["corosync"]["ring_mode"]

if ring_mode == "dual_ring"
  second_ring_network = role.default_attributes["pacemaker"]["corosync"]["second_ring_network"]
  second_ring_net = Chef::DataBag.load("crowbar/#{second_ring_network}_network")

  net_svc = NetworkService.new @logger
  second_ring_members = member_nodes.map do |member_node|
    allocated_ip_response = net_svc.allocate_ip "default", second_ring_network, "host", member_node.name
    allocated_ip_response[1]["address"]
  end

  role.default_attributes["corosync"]["second_ring_used"] = true
  role.default_attributes["corosync"]["second_ring_bind_addr"] = second_ring_net["network"]["subnet"]
  role.default_attributes["corosync"]["second_ring_mcast_addr"] = role.default_attributes["pacemaker"]["corosync"]["mcast_addr"]
  role.default_attributes["corosync"]["second_ring_mcast_port"] = role.default_attributes["pacemaker"]["corosync"]["mcast_port"]
  role.default_attributes["corosync"]["second_ring_members"] = second_ring_members
else
  role.default_attributes["corosync"]["second_ring_used"] = false
end

role.default_attributes["drbd"] ||= {}
role.default_attributes["drbd"]["common"] ||= {}
role.default_attributes["drbd"]["common"]["net"] ||= {}
```

> On `# Set up the second ring if dual_ring mode configured`: Please can you add this large chunk of code as a separate method? Long methods really harm readability and maintainability.

> On `role.default_attributes["corosync"]["second_ring_members"] = second_ring_members`: Is it just me or do you now have both

`@@ -626,6 +648,35 @@ def validate_proposal_after_save proposal`

```ruby
  )
end

ring_mode = proposal["attributes"][@bc_name]["corosync"]["ring_mode"]
unless %w(single_ring dual_ring).include?(ring_mode)
  validation_error I18n.t(
    "barclamp.#{bc_name}.validation.ring_mode_value",
    ring_mode: ring_mode
  )
end

if ring_mode == "dual_ring"
  unless transport == "udpu"
    validation_error I18n.t(
      "barclamp.#{bc_name}.validation.second_ring_only_udpu"
    )
  end
  second_ring_network = proposal["attributes"][@bc_name]["corosync"]["second_ring_network"]
  if second_ring_network == "admin"
    validation_error I18n.t(
      "barclamp.#{bc_name}.validation.second_ring_network_must_differ_from_admin"
    )
  end
  second_ring_net = Chef::DataBag.load("crowbar/#{second_ring_network}_network") rescue nil
  unless second_ring_net
    validation_error I18n.t(
      "barclamp.#{bc_name}.validation.second_ring_network_value",
      second_ring_network: second_ring_network
    )
  end
end

no_quorum_policy = proposal["attributes"][@bc_name]["crm"]["no_quorum_policy"]
unless %w(ignore freeze stop suicide).include?(no_quorum_policy)
  validation_error I18n.t(
```

`@@ -5,6 +5,11 @@`

```haml
.panel-body
  = select_field %w(corosync transport), :collection => :transport_for_pacemaker

  = select_field %w(corosync ring_mode), :collection => :ring_modes_for_corosync, "data-showit" => "dual_ring", "data-showit-target" => "#dual_ring_container", "data-showit-direct" => "true"

  %div{ :id => "dual_ring_container" }
    = string_field %w(corosync second_ring_network)

  = select_field %w(crm no_quorum_policy), :collection => :no_quorum_policy_for_pacemaker
  %span.help-block
    = t('.crm.no_quorum_policy_hint_html')
```

> On the `corosync ring_mode` `select_field` line: Line is too long. [192/150]

> I agree with the dog.

> On `%div{ :id => "dual_ring_container" }`: The hound is probably right here too:

`@@ -60,6 +60,11 @@ en:`

```yaml
    down or rebooted; you will have to manually start it after having
    fixed the issue. "Automatic" means that this setting will be set to
    "true" for two-nodes cluster, and to "false" otherwise.'
  ring_mode: 'Ring configuration'
  ring_modes:
    single_ring: 'Single ring (only admin network)'
    dual_ring: 'Dual ring'
  second_ring_network: 'Network to use as second ring'
crm:
  no_quorum_policy: 'Policy when cluster does not have quorum'
  no_quorum_policy_hint_html: 'Refer to the <a href="http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/_available_cluster_options.html">pacemaker documentation</a> for a description of each value.'
```

`@@ -123,6 +128,10 @@ en:`

```yaml
drbd: 'Setting up DRBD requires a cluster of two nodes.'
hae_repo: 'The HAE repositories have not been setup.'
transport_value: 'Invalid transport value: %{transport}.'
ring_mode_value: 'Invalid ring mode: %{ring_mode}.'
second_ring_network_value: 'Network not found for setting up second corosync ring: %{second_ring_network}.'
```

> Nit: I prefer

```yaml
second_ring_only_udpu: 'Dual corosync ring is only supported with Unicast (UDPU).'
second_ring_network_must_differ_from_admin: 'The second network must not be the admin network.'
quorum_policy: 'Invalid no-quorum-policy value: %{no_quorum_policy}.'
platform: 'All nodes in proposal must have the same platform.'
pacemaker_proposal: 'Nodes cannot be part of multiple Pacemaker proposals, but %{other_member} is already part of proposal \"%{p_name}\".'
```

> Why do we need both `:ring_mode` and `:second_ring_used`? Is there a circumstance in which `:ring_mode` would be `"double_ring"` and `:second_ring_used` would be `false`?