Skip to content

AWS Pacemaker awsvip failing with different errors #1876

Open
@jayan800

Description

Hi All,

We are running a two node pacemaker cluster in AWS and we use "awsvip" resource type to configure the vip IP. Below is the conf

pcs resource show privip_node1

Resource: privip_node1 (class=ocf provider=heartbeat type=awsvip)
Attributes: secondary_private_ip=10.x.x.x
Operations: migrate_from interval=0s timeout=30s (privip_node1-migrate_from-interval-0s)
migrate_to interval=0s timeout=30s (privip_node1-migrate_to-interval-0s)
monitor interval=20s timeout=30s (privip_node1-monitor-interval-20s)
start interval=0s timeout=30s (privip_node1-start-interval-0s)
stop interval=0s timeout=30s (privip_node1-stop-interval-0s)
validate interval=0s timeout=10s (privip_node1-validate-interval-0s)

pcs resource show node1_vip

Resource: node1_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.x.x.x
Operations: monitor interval=10s timeout=20s (node1_vip-monitor-interval-10s)
start interval=0s timeout=20s (node1_vip-start-interval-0s)
stop interval=0s timeout=20s (node1_vip-stop-interval-0s)

The EC2 instance is configured to use IMDSV2.The fence_aws agent and resource-agent have also been upgraded to the most recent versions, which support imdsv2. Additionally, the resource is set up to use the IAM Profile credentials.

fence-agents-aws-4.2.1-41.el7_9.3.x86_64
python-s3transfer-0.1.13-1.0.1.el7.noarch
resource-agents-4.1.1-61.el7_9.15.x86_64

pip list | grep -i boto
boto3 (1.10.0)
botocore (1.13.50)

aws --version
aws-cli/2.9.4 Python/3.9.11 Linux/3.10.0-1160.80.1.0.1.el7.x86_64 exe/x86_64.oracle.7 prompt/off

pip3 list | grep -i boto
boto3 1.23.10
botocore 1.26.10

The privip resource consistently fails with the different errors:

pengine: warning: unpack_rsc_op_failure: Processing failed monitor of privip_node2 on node2: unknown error | rc=1
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000:109357 - timed out after 30000ms

Jun 16 10:01:43 node2 lrmd[36967]: notice: privip_node2_monitor_20000:13042:stderr [ Unable to locate credentials. You can configure credentials by running "aws configure". ]
Jun 16 10:01:43 node2 crmd[36970]: notice: privip_node2_monitor_20000:91 [ % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 359 100 359 0 0 37513 0 --:--:-- --:--:-- --:--:-- 39888\n\nUnable to locate credentials. You can configure credentials by running "aws configure".\n ]

Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ An error occurred (MissingParameter) when calling the DescribeInstances operation: The request must contain the parameter InstanceId ]

Failed Resource Actions:

  • privip_node1_start_0 on node1 'not running' (7): call=250, status=complete, exitreason='instance_id not found. Is this a EC2 instance?',
    last-rc-change='Fri May 26 07:27:46 2023', queued=0ms, exec=6597ms

Any advice would be great.

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions