Description
See https://jenkins.scylladb.com/job/scylla-2025.1/job/alternator/job/longevity-alternator-3h-test/7/pipeline-console/log?nodeId=162 for a run that failed during provisioning because no i4i.4xlarge capacity was available in us-east-1c. But why would it need such a hefty instance size for Manager?
[2025-03-27T14:51:15.667Z] [us-east-1] Creating {count} on-demand instances using AMI id 'ami-0ccb7e0118ad5e03c' with following parameters:
[2025-03-27T14:51:15.667Z] {'ImageId': 'ami-0ccb7e0118ad5e03c', 'KeyName': 'scylla_test_id_ed25519', 'InstanceType': 'i4i.4xlarge', 'UserData': 'Content-Type: multipart/mixed; boundary="===============2748462579806779269=="\nMIME-Version: 1.0\n\n--===============2748462579806779269==\nContent-Type: x-scylla/json\nMIME-Version: 1.0\nContent-Disposition: attachment; filename="scylla_machine_image.json"\n\n{\n "cluster_name": "alternator-3h-2025-1-db-cluster-fd548c9f",\n "data_device": "instance_store",\n "raid_level": 0,\n "scylla_yaml": {\n "cluster_name": "alternator-3h-2025-1-db-cluster-fd548c9f"\n },\n "start_scylla_on_first_boot": false\n}\n--===============2748462579806779269==\nContent-Type: text/cloud-config\nMIME-Version: 1.0\nContent-Disposition: attachment; filename="cloud-config.txt"\n\n\n #cloud-config\n cloud_final_modules:\n - [scripts-user, always]\n \n--===============2748462579806779269==\nContent-Type: text/x-shellscript\nMIME-Version: 1.0\nContent-Disposition: attachment; filename="user-script.txt"\n\n#!/bin/bash\nset -x\nwhile ! systemctl status cloud-init.service | grep "active (exited)"; do sleep 1; done\n\nwrite_syslog_ng_destination() {\n disk_buffer_option=""\n if syslog-ng -V | grep -q disk; then\n disk_buffer_option="disk-buffer(\n mem-buf-size(1048576)\n disk-buf-size(104857600)\n reliable(yes)\n dir(\\"/var/log\\")\n )"\n fi\n\ncat <<EOF >/etc/syslog-ng/conf.d/remote_sct.conf\ndestination remote_sct {\n syslog(\n "10.12.8.220"\n transport("tcp")\n port(32768)\n throttle(10000)\n $disk_buffer_option\n );\n};\nEOF\n}\n\nif [ -f /var/lib/sct/cloud-init/done ]; then\n write_syslog_ng_destination\n sudo systemctl restart syslog-ng\n exit 0\nfi\nif apt-get --help >/dev/null 2>&1 ; then\n if [ ! 
-f /tmp/disable_daily_apt_triggers_done ]; then\n rm -f /etc/apt/apt.conf.d/*unattended-upgrades /etc/apt/apt.conf.d/*auto-upgrades || true\n rm -f /etc/apt/apt.conf.d/*periodic /etc/apt/apt.conf.d/*update-notifier || true\n systemctl stop apt-daily.timer apt-daily-upgrade.timer apt-daily.service apt-daily-upgrade.service || true\n systemctl disable apt-daily.timer apt-daily-upgrade.timer apt-daily.service apt-daily-upgrade.service || true\n apt-get remove -o DPkg::Lock::Timeout=300 -y unattended-upgrades update-manager || true\n touch /tmp/disable_daily_apt_triggers_done\n fi\nfi\nSYSLOG_NG_INSTALLED=""\nif yum --help 2>/dev/null 1>&2 ; then\n if rpm -q syslog-ng ; then\n rm /etc/syslog-ng/syslog-ng.conf # Make sure we have default syslog-ng.conf\n yum reinstall -y syslog-ng\n SYSLOG_NG_INSTALLED=1\n else\n yum install -y epel-release\n for n in 1 2 3 4 5 6 7 8 9; do # cloud-init is running it with set +o braceexpand\n if yum install -y --downloadonly syslog-ng; then\n break\n fi\n done\n\n for n in 1 2 3; do # cloud-init is running it with set +o braceexpand\n if yum install -y syslog-ng; then\n SYSLOG_NG_INSTALLED=1\n break\n fi\n sleep 10\n done\n fi\nelif apt-get --help 2>/dev/null 1>&2 ; then\n if dpkg-query --show syslog-ng ; then\n rm /etc/syslog-ng/syslog-ng.conf # Make sure we have default syslog-ng.conf\n apt-get purge -o DPkg::Lock::Timeout=300 -y syslog-ng*\n DPKG_FORCE=confmiss apt-get --reinstall -o DPkg::Lock::Timeout=300 -y install syslog-ng\n SYSLOG_NG_INSTALLED=1\n else\n cat /etc/apt/sources.list\n for n in 1 2 3 4 5 6 7 8 9; do # cloud-init is running it with set +o braceexpand\n if apt-get -y update ; then\n break\n fi\n sleep 0.5\n done\n\n for n in 1 2 3; do # cloud-init is running it with set +o braceexpand\n DEBIAN_FRONTEND=noninteractive apt-get install -o DPkg::Lock::Timeout=300 -y syslog-ng || true\n if dpkg-query --show syslog-ng ; then\n SYSLOG_NG_INSTALLED=1\n break\n fi\n done\n fi\nelse\n echo "Unsupported 
distro"\nfi\n\nsource_name=`cat /etc/syslog-ng/syslog-ng.conf | tr -d "\\n" | tr -d "\\r" | sed -r "s/\\};/\\};\\n/g; s/source /\\nsource /g" | grep -P "^source.*system\\(\\)" | cut -d" " -f2`\n\nif grep -P "keep-timestamp\\([^)]+\\)" /etc/syslog-ng/syslog-ng.conf; then\n sed -i -r "s/keep-timestamp([ ]*yes[ ]*)/keep-timestamp(no)/g" /etc/syslog-ng/syslog-ng.conf\nelse\n sed -i -r "s/([ \t]*options[ \t]*\\\\{)/\\\\1\\n keep-timestamp(no);\\n/g" /etc/syslog-ng/syslog-ng.conf\nfi\n\nwrite_syslog_ng_destination\n\nif ! grep -P "log {.*destination\\\\(remote_sct\\\\)" /etc/syslog-ng/syslog-ng.conf; then\n echo "\nfilter filter_sct {\n # filter audit out\n not program(\\"^audit\\");\n};\n " >> /etc/syslog-ng/syslog-ng.conf\n echo "log { source($source_name); filter(filter_sct); destination(remote_sct); };" >> /etc/syslog-ng/syslog-ng.conf\nfi\n\nif [ ! -z "" ]; then\n if grep "rewrite r_host" /etc/syslog-ng/syslog-ng.conf; then\n sed -i -r "s/rewrite r_host \\{ set\\(\\"[^\\"]+\\"/rewrite r_host { set(\\"\\"/" /etc/syslog-ng/syslog-ng.conf\n else\n echo "rewrite r_host { set(\\"\\", value(\\"HOST\\")); };" >> /etc/syslog-ng/syslog-ng.conf\n sed -i -r "s/destination\\(remote_sct\\);[ \\t]*\\};/destination\\(remote_sct\\); rewrite\\(r_host\\); \\};/" /etc/syslog-ng/syslog-ng.conf\n fi\nfi\nsystemctl restart syslog-ng || true\ncurl -L -O https://github.com/brandond/syslog_ng_exporter/releases/download/0.1.0/syslog_ng_exporter\nchmod +x syslog_ng_exporter\nmv syslog_ng_exporter /usr/local/bin\n\nif [ -e /etc/systemd/system/syslog_ng_exporter.service ]; then\n rm /etc/systemd/system/syslog_ng_exporter.service\nfi\n\ncat <<EOM >> /etc/systemd/system/syslog_ng_exporter.service\n[Unit]\nDescription=Syslog-ng metrics Exporter\nWants=network.target network-online.target\nAfter=network.target 
network-online.target\n\n[Service]\nType=simple\nExecStart=/usr/local/bin/syslog_ng_exporter\nStandardOutput=journal\nStandardError=journal\nRestart=on-failure\n\n[Install]\nWantedBy=multi-user.target\nEOM\n\nsystemctl daemon-reload\nsystemctl enable syslog_ng_exporter.service\nsystemctl start syslog_ng_exporter.service\n\nif [ -f "/etc/security/limits.d/20-nproc.conf" ]; then\n sed -i -e "s/^\\*[[:blank:]]*soft[[:blank:]]*nproc[[:blank:]]*.*/*\t\tsoft\tnproc\t\tunlimited/" /etc/security/limits.d/20-nproc.conf || true\nelse\n echo "* hard nproc unlimited" > /etc/security/limits.d/20-nproc.conf || true\nfi\n\nsed -i "s/#MaxSessions \\(.*\\)$/MaxSessions 1000/" /etc/ssh/sshd_config || true\nsed -i "s/#MaxStartups \\(.*\\)$/MaxStartups 60/" /etc/ssh/sshd_config || true\nsed -i "s/#LoginGraceTime \\(.*\\)$/LoginGraceTime 15s/" /etc/ssh/sshd_config || true\nsed -i "s/#ClientAliveInterval \\(.*\\)$/ClientAliveInterval 60/" /etc/ssh/sshd_config || true\nsed -i "s/#ClientAliveCountMax \\(.*\\)$/ClientAliveCountMax 10/" /etc/ssh/sshd_config || true\nsystemctl restart sshd || systemctl restart ssh || true\nmkdir -p /var/lib/sct/cloud-init && touch /var/lib/sct/cloud-init/done\n--===============2748462579806779269==--\n', 'NetworkInterfaces': [{'DeviceIndex': 0, 'SubnetId': 'subnet-090ce5c775e0dbc19', 'Groups': ['sg-0feef3370ee8305ac']}], 'IamInstanceProfile': {'Name': 'qa-scylla-manager-backup-instance-profile'}, 'BlockDeviceMappings': [{'DeviceName': '/dev/sda1', 'Ebs': {'VolumeType': 'gp3', 'VolumeSize': 30}}], 'Placement': {'AvailabilityZone': 'us-east-1c'}}
[2025-03-27T14:51:25.381Z] Traceback (most recent call last):
[2025-03-27T14:51:25.381Z] File "/home/ubuntu/scylla-cluster-tests/./sct.py", line 1890, in <module>
[2025-03-27T14:51:25.381Z] cli.main(prog_name="hydra")
[2025-03-27T14:51:25.381Z] File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
[2025-03-27T14:51:25.381Z] rv = self.invoke(ctx)
[2025-03-27T14:51:25.381Z] File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
[2025-03-27T14:51:25.381Z] return _process_result(sub_ctx.command.invoke(sub_ctx))
[2025-03-27T14:51:25.381Z] File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
[2025-03-27T14:51:25.381Z] return ctx.invoke(self.callback, **ctx.params)
[2025-03-27T14:51:25.381Z] File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
[2025-03-27T14:51:25.381Z] return __callback(*args, **kwargs)
[2025-03-27T14:51:25.381Z] File "/home/ubuntu/scylla-cluster-tests/./sct.py", line 236, in provision_resources
[2025-03-27T14:51:25.381Z] layout.provision()
[2025-03-27T14:51:25.381Z] File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_provision/aws/layout.py", line 34, in provision
[2025-03-27T14:51:25.381Z] self.db_cluster.provision()
[2025-03-27T14:51:25.381Z] File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_provision/aws/cluster.py", line 334, in provision
[2025-03-27T14:51:25.381Z] instances = self.provision_plan(region_id, self._azs[az_id]).provision_instances(
[2025-03-27T14:51:25.381Z] File "/home/ubuntu/scylla-cluster-tests/sdcm/provision/common/provision_plan.py", line 40, in provision_instances
[2025-03-27T14:51:25.381Z] if instances := self.provisioner.provision(
[2025-03-27T14:51:25.382Z] File "/home/ubuntu/scylla-cluster-tests/sdcm/provision/aws/provisioner.py", line 74, in provision
[2025-03-27T14:51:25.382Z] return self._provision_on_demand_instances(
[2025-03-27T14:51:25.382Z] File "/home/ubuntu/scylla-cluster-tests/sdcm/provision/aws/provisioner.py", line 120, in _provision_on_demand_instances
[2025-03-27T14:51:25.382Z] instances = ec2_services[provision_parameters.region_name].create_instances(
[2025-03-27T14:51:25.382Z] File "/usr/local/lib/python3.10/site-packages/boto3/resources/factory.py", line 580, in do_action
[2025-03-27T14:51:25.382Z] response = action(self, *args, **kwargs)
[2025-03-27T14:51:25.382Z] File "/usr/local/lib/python3.10/site-packages/boto3/resources/action.py", line 88, in __call__
[2025-03-27T14:51:25.382Z] response = getattr(parent.meta.client, operation_name)(*args, **params)
[2025-03-27T14:51:25.382Z] File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 534, in _api_call
[2025-03-27T14:51:25.382Z] return self._make_api_call(operation_name, kwargs)
[2025-03-27T14:51:25.382Z] File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 976, in _make_api_call
[2025-03-27T14:51:25.382Z] raise error_class(parsed_response, operation_name)
[2025-03-27T14:51:25.382Z] botocore.exceptions.ClientError: An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient i4i.4xlarge capacity in the Availability Zone you requested (us-east-1c). Our system will be working on provisioning additional capacity. You can currently get i4i.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1b, us-east-1d, us-east-1f.
[2025-03-27T14:51:25.382Z] Cleaning SSH agent
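The error message itself suggests retrying in another Availability Zone. One possible shape for such a fallback is sketched below; this is not how SCT's provisioner works today. The names `provision_with_az_fallback`, `error_code`, and the `launch` callable are hypothetical, and the only thing assumed about the exception is that it carries a botocore-style `response["Error"]["Code"]` like the `ClientError` in the traceback.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Error code taken from the traceback above.
CAPACITY_ERROR = "InsufficientInstanceCapacity"


def error_code(exc: Exception) -> Optional[str]:
    """Return the error code of a botocore-style ClientError, or None."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def provision_with_az_fallback(launch: Callable[[str], T], zones: Sequence[str]) -> T:
    """Call launch(zone) for each zone in order (hypothetical helper).

    Only a capacity error moves on to the next zone; any other
    exception is re-raised immediately.
    """
    last_error: Optional[Exception] = None
    for zone in zones:
        try:
            return launch(zone)
        except Exception as exc:
            if error_code(exc) != CAPACITY_ERROR:
                raise
            last_error = exc
    if last_error is None:
        raise ValueError("no availability zones given")
    # Every zone was out of capacity: surface the last capacity error.
    raise last_error
```

With boto3, `launch` could wrap the same `create_instances` call with `Placement={'AvailabilityZone': zone}`. Note that the `SubnetId` in the request is tied to a single Availability Zone, so it would have to change along with the zone.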