Description
Hi All,
We are running a two-node Pacemaker cluster in AWS and use the "awsvip" resource type to configure the VIP. The configuration is below:
pcs resource show privip_node1
Resource: privip_node1 (class=ocf provider=heartbeat type=awsvip)
  Attributes: secondary_private_ip=10.x.x.x
  Operations: migrate_from interval=0s timeout=30s (privip_node1-migrate_from-interval-0s)
              migrate_to interval=0s timeout=30s (privip_node1-migrate_to-interval-0s)
              monitor interval=20s timeout=30s (privip_node1-monitor-interval-20s)
              start interval=0s timeout=30s (privip_node1-start-interval-0s)
              stop interval=0s timeout=30s (privip_node1-stop-interval-0s)
              validate interval=0s timeout=10s (privip_node1-validate-interval-0s)
pcs resource show node1_vip
Resource: node1_vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.x.x.x
  Operations: monitor interval=10s timeout=20s (node1_vip-monitor-interval-10s)
              start interval=0s timeout=20s (node1_vip-start-interval-0s)
              stop interval=0s timeout=20s (node1_vip-stop-interval-0s)
The EC2 instance is configured to use IMDSv2. The fence_aws agent and the resource-agents package have also been upgraded to the most recent versions, which support IMDSv2. Additionally, the resource is set up to use the IAM profile credentials.
fence-agents-aws-4.2.1-41.el7_9.3.x86_64
python-s3transfer-0.1.13-1.0.1.el7.noarch
resource-agents-4.1.1-61.el7_9.15.x86_64
pip list | grep -i boto
boto3 (1.10.0)
botocore (1.13.50)
aws --version
aws-cli/2.9.4 Python/3.9.11 Linux/3.10.0-1160.80.1.0.1.el7.x86_64 exe/x86_64.oracle.7 prompt/off
pip3 list | grep -i boto
boto3 1.23.10
botocore 1.26.10
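For reference, the IMDSv2 lookup can be reproduced by hand outside Pacemaker. This is a minimal sketch, assuming curl is installed and the default metadata endpoint is in use. The names `IMDS_URL`, `imds_token` and `imds_get` are illustrative only and are not taken from the agents.

```shell
#!/bin/sh
# Sketch only: fetch metadata through IMDSv2 (token first, then the request).
# IMDS_URL, imds_token and imds_get are illustrative names, not agent code.
IMDS_URL="${IMDS_URL:-http://169.254.169.254}"

imds_token() {
    # IMDSv2 requires a session token obtained with a PUT request.
    curl -sf -m 5 -X PUT "$IMDS_URL/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60"
}

imds_get() {
    # $1 is a path under /latest/meta-data/, e.g. "instance-id".
    token=$(imds_token) || return 1
    curl -sf -m 5 -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS_URL/latest/meta-data/$1"
}

# Example (run as root on each node): imds_get instance-id
```

A non-zero return from `imds_get instance-id` would point at the metadata service itself rather than at the AWS CLI or boto versions.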
The privip resource consistently fails, with different errors:
pengine: warning: unpack_rsc_op_failure: Processing failed monitor of privip_node2 on node2: unknown error | rc=1
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000:109357 - timed out after 30000ms
Jun 16 10:01:43 node2 lrmd[36967]: notice: privip_node2_monitor_20000:13042:stderr [ Unable to locate credentials. You can configure credentials by running "aws configure". ]
Jun 16 10:01:43 node2 crmd[36970]: notice: privip_node2_monitor_20000:91 [ % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 359 100 359 0 0 37513 0 --:--:-- --:--:-- --:--:-- 39888\n\nUnable to locate credentials. You can configure credentials by running "aws configure".\n ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ An error occurred (MissingParameter) when calling the DescribeInstances operation: The request must contain the parameter InstanceId ]
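The MissingParameter error looks like what would happen if DescribeInstances were called with an empty instance ID after the metadata lookup failed. Below is a rough sketch of a guarded check, assuming the AWS CLI can resolve a region and credentials. `has_secondary_ip` is a hypothetical helper, not the agent's actual monitor code, and the instance ID in the example is a placeholder.

```shell
#!/bin/sh
# Sketch only: check whether an instance holds a given secondary private IP.
# has_secondary_ip is a hypothetical helper, not the agent's implementation.
has_secondary_ip() {
    instance_id=$1
    ip=$2
    # An empty ID would likely explain the MissingParameter error, so stop early.
    if [ -z "$instance_id" ]; then
        echo "instance ID is empty; metadata lookup failed?" >&2
        return 2
    fi
    ips=$(aws ec2 describe-instances --instance-ids "$instance_id" \
        --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress' \
        --output text) || return 1
    # --output text separates values with tabs; compare them one per line.
    printf '%s\n' "$ips" | tr '\t ' '\n\n' | grep -Fxq "$ip"
}

# Example (placeholder ID): has_secondary_ip i-0123456789abcdef0 10.x.x.x
```

Return code 2 separates "no instance ID" from "IP not attached" (1), which the single rc=1 in the logs above does not.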
Failed Resource Actions:
- privip_node1_start_0 on node1 'not running' (7): call=250, status=complete, exitreason='instance_id not found. Is this a EC2 instance?',
last-rc-change='Fri May 26 07:27:46 2023', queued=0ms, exec=6597ms
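Since the errors are spread over several dates, the failures may be intermittent. One way to measure that is to probe the metadata service at the monitor interval and count how often the token request fails. This is a minimal sketch, assuming curl and the default endpoint. `probe_imds` and its arguments are hypothetical names.

```shell
#!/bin/sh
# Sketch only: probe the metadata service repeatedly and count failures.
# probe_imds and its arguments are hypothetical, not part of any agent.
IMDS_URL="${IMDS_URL:-http://169.254.169.254}"

probe_imds() {
    attempts=$1   # how many probes to run
    delay=$2      # seconds to wait between probes
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # A probe succeeds only if an IMDSv2 token can be fetched.
        if ! curl -sf -m 5 -o /dev/null -X PUT "$IMDS_URL/latest/api/token" \
                -H "X-aws-ec2-metadata-token-ttl-seconds: 60"; then
            failures=$((failures + 1))
            echo "$(date '+%F %T') probe $((i + 1)) failed" >&2
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$failures/$attempts probes failed"
}

# Example: probe_imds 180 20   # roughly one hour at the 20s monitor interval
```

The timestamps of failed probes can then be compared against the lrmd log entries for the monitor operation.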
Any advice would be great.