
drzee99 commented Jan 10, 2026

Fix DNS resolution performance regression during cloud-init local

Summary

This PR addresses critical DNS resolution performance issues during the early cloud-init local stage that cause boot delays of 2+ minutes, particularly with systemd version 259 and later.

Problem

  • Boot delays: 2+ minutes (up from <30 seconds) during cloud-init local
  • Root cause: DNS redirect detection issues DNS queries even when the metadata URL host is an IP address
  • Systemd 259 regression: Recent systemd changes make DNS resolution significantly slower during early boot
  • Legacy URL: Hardcoded DNS-dependent metadata URL that's no longer documented by AWS

Solution

1. Optimize IP address handling in util.py

  • Move the IP address check to the start of is_resolvable() so IP addresses bypass all DNS operations
  • Remove the now-redundant IP check that previously ran after the expensive DNS queries
  • IP addresses now completely avoid DNS redirect detection
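The reordering described above can be sketched as follows. This is a simplified, hypothetical stand-in for is_resolvable() in cloudinit/util.py, not the PR's actual diff: the function name is_resolvable_sketch, the helper names, and the probe hostnames are assumptions made for illustration.

```python
import ipaddress
import socket
from typing import Optional, Set
from urllib.parse import urlparse

# Hypothetical cache of addresses that a redirecting resolver returns for
# names that should not exist. None means "not probed yet".
_DNS_REDIRECT_IPS: Optional[Set[str]] = None

# Illustrative probe names; a well-behaved resolver fails to resolve them.
_PROBE_NAMES = ("does-not-exist.example.com.", "example.invalid.")


def _is_ip_literal(host: str) -> bool:
    """Return True if host is an IPv4 or IPv6 address literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def _detect_redirect_ips() -> Set[str]:
    """Resolve names that should fail; any answers indicate DNS redirection."""
    bad: Set[str] = set()
    for name in _PROBE_NAMES:
        try:
            for *_, sockaddr in socket.getaddrinfo(name, None):
                bad.add(sockaddr[0])
        except OSError:
            pass  # lookup failed, which is the expected outcome
    return bad


def is_resolvable_sketch(url: str) -> bool:
    global _DNS_REDIRECT_IPS
    host = urlparse(url).hostname  # brackets around IPv6 are stripped
    if not host:
        return False
    # Early return: an IP literal needs no DNS, so skip both the redirect
    # probes and the lookup itself.
    if _is_ip_literal(host):
        return True
    if _DNS_REDIRECT_IPS is None:
        _DNS_REDIRECT_IPS = _detect_redirect_ips()
    try:
        addr = socket.getaddrinfo(host, None)[0][4][0]
    except OSError:
        return False
    return addr not in _DNS_REDIRECT_IPS
```

The point of the ordering is that the IP-literal check is a local parse, while each redirect probe waits on the resolver, which the PR identifies as the slow step during early boot.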

2. Remove legacy DNS-dependent URL from DataSourceEc2.py

  • Remove http://instance-data.:8773, which is not in the current AWS IMDS documentation
  • Keep only IP-based endpoints that work without DNS resolution
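One way to see which defaults survive is a small filter that keeps only URLs whose host is an IP literal, that is, endpoints reachable without any DNS lookup. The helper dns_free_urls is hypothetical; the PR removes the URL from the list directly rather than filtering it.

```python
import ipaddress
from typing import List
from urllib.parse import urlparse


def dns_free_urls(urls: List[str]) -> List[str]:
    """Keep only URLs whose host is an IPv4 or IPv6 literal."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # a name such as "instance-data." would need DNS
        kept.append(url)
    return kept


# The default EC2 list as shown in the review excerpt of this PR.
defaults = [
    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",
]
print(dns_free_urls(defaults))
# prints ['http://169.254.169.254', 'http://[fd00:ec2::254]']
```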

Changes

  • cloudinit/util.py: Early return for IP addresses in is_resolvable()
  • cloudinit/sources/DataSourceEc2.py: Remove legacy DNS-dependent metadata URL

Testing

  • IP addresses return immediately from is_resolvable()
  • Cloud-init local completes in <30 seconds (down from 2+ minutes)
  • IMDS access works without DNS resolution
  • No functional regressions observed

Related Issues

Fixes #6641 - Systemd version 259 slows down DNS check during cloud-init local

Backward Compatibility

  • ✅ No breaking changes
  • ✅ Maintains all existing functionality
  • ✅ Uses only documented AWS IMDS endpoints

Fixes DNS queries for IP addresses that cause 2+ minute boot delays,
particularly with systemd 259+. Moves IP detection earlier in
is_resolvable() and removes legacy DNS-dependent metadata URL.

Fixes canonical#6641
holmanb self-assigned this Jan 12, 2026
holmanb (Member) left a comment


Thanks for the contribution. Please make sure that you have read the contribution guide.

metadata_urls = [
    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",
]
holmanb (Member):

The EC2 datasource is used by various clouds besides EC2, and unfortunately not all of those clouds are known, so this change poses a risk.

drzee99 (Author) replied Jan 12, 2026


That is confusing. There should be a separate Data Source implementation for each provider, even if some are the same as or similar to others, to allow for future changes a cloud provider may implement.

That being said, it is possible to override metadata_urls (ref: https://cloudinit.readthedocs.io/en/latest/reference/datasources/ec2.html).

IMHO, the defaults should be the URLs provided by the named Data Source (in this case EC2). If this breaks other cloud providers that rely on the same datasource, they should add a config setting for metadata_urls, as per the documentation, listing the relevant URLs.
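The override the author points to is the metadata_urls key under datasource / Ec2 in cloud-init configuration, as described in the linked EC2 datasource documentation. The sketch below illustrates the fallback this argument relies on, assuming the configuration has already been parsed into a plain dict; effective_metadata_urls and DEFAULT_METADATA_URLS are hypothetical names, not cloud-init internals.

```python
from typing import Any, Dict, List

# Hypothetical defaults as proposed in this PR (legacy DNS-based URL removed).
DEFAULT_METADATA_URLS = ["http://169.254.169.254", "http://[fd00:ec2::254]"]


def effective_metadata_urls(cfg: Dict[str, Any]) -> List[str]:
    """Return configured metadata_urls if set and non-empty, else defaults."""
    ds_cfg = cfg.get("datasource", {}).get("Ec2", {})
    urls = ds_cfg.get("metadata_urls")
    return list(urls) if urls else list(DEFAULT_METADATA_URLS)


# A provider that still depends on the legacy name could ship a config like
# this; the dict mirrors the YAML keys datasource -> Ec2 -> metadata_urls.
provider_cfg = {
    "datasource": {
        "Ec2": {"metadata_urls": ["http://instance-data.:8773"]},
    },
}
print(effective_metadata_urls(provider_cfg))  # ['http://instance-data.:8773']
print(effective_metadata_urls({}))  # the two IP-based defaults
```

Treating an empty metadata_urls list as unset is a choice made for this sketch only.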

drzee99 (Author) commented Jan 12, 2026

I have approved the CLA.
