
[processor/dnslookup] Initial implementation #39642



@kaisecheng (Contributor) commented Apr 24, 2025

Description

This PR adds a new DNS Lookup processor that resolves hostnames to IP addresses (forward lookup) and IP addresses to hostnames (reverse DNS lookup).

Resolution Order and Retry Mechanism

The processor implements a chain of resolvers with the following resolution order:

  1. Hostfiles - if the hostname is not found in any hostfile, the processor continues to the next resolver in the chain. For duplicate entries, later hostfiles in the list take precedence over earlier ones.
  2. Custom DNS servers - "no resolution" is a valid result and stops the chain. Temporary failures on a DNS server trigger retries against that server. The resolver prefers IPv4 addresses over IPv6 when both are available.
  3. System resolver - behaves like (2) and uses the operating system's default DNS configuration as the final fallback.

Caching

  • Successful resolutions are cached in a "hit cache" with configurable TTL
  • Failed lookups are cached in a "miss cache" to avoid repeated expensive failures
  • Both caches implement LRU eviction policies

Config

  dnslookup:
    # Forward DNS resolution configuration (hostname to IP)
    # Default: enabled
    resolve:
      enabled: true
      # Context for attributes: "resource" or "record". Default: "resource"
      context: "record"
      # List of attributes to check for hostnames. The first valid hostname is used. Default: ["source.address"]
      attributes: ["source.address"]
      # Attribute to store the resolved IP. Default: "source.ip"
      resolved_attribute: "source.ip"

    # Reverse DNS resolution configuration (IP to hostname)
    # Default: disabled
    reverse:
      enabled: true
      # Context for attributes: "resource" or "record". Default: "resource"
      context: "record"
      # List of attributes to check for IPs. The first valid IP is used. Default: ["source.ip"]
      attributes: ["server.ip"]
      # Attribute to store the resolved hostname. Default: "source.address"
      resolved_attribute: "server.address"

    # Maximum number of failed resolutions to cache. Default: 1000
    miss_cache_size: 1000
    # Time-to-live (seconds) for failed resolution cache entries. Default: 60
    miss_cache_ttl: 60
    # Maximum number of successful resolutions to cache. Default: 10000
    hit_cache_size: 10000 
    # Time-to-live (seconds) for successful resolution cache entries. Default: 300
    hit_cache_ttl: 300
    # Maximum number of retry attempts for DNS lookups. Default: 1
    max_retries: 1
    # Timeout (seconds) for individual DNS lookups. Default: 0.5
    timeout: 0.5
    
    # Path to custom host files. Default: []
    hostfiles: 
      - "/path/to/host/file"
      - "/path/to/other/file"
    
    # Addresses of custom DNS servers. Default: []
    nameservers:
      - 8.8.8.8
      - 1.1.1.1
    
    # Enable the system resolver. Default: true
    enable_system_resolver: true

Link to tracking issue

#34398

TODO

  • Tests for factory and processor
  • readme
  • changelog

Testing

Added unit tests for all resolvers.

Documentation

cc @edmocosta

Move retry from chain resolver to nameserver resolver.
No resolution is considered as success and stored in hit cache.
Handle hostname/IP not found
Handle non retryable error
- Add test dependency
- Add Lookup interface for mock
- Add validation to nameserver address
This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions bot added the Stale label May 17, 2025
The golang-lru library has a known issue where its expirable LRU cache spawns a background goroutine for TTL eviction that cannot be properly stopped.
This commit replaces it with go-freelru, which provides similar functionality without leaking goroutines.
@github-actions bot removed the Stale label May 24, 2025
@edmocosta (Contributor)

Hi @kaisecheng! I'm wondering whether we should handle the configuration per signal, as other processors do, so that we wouldn't need one processor per signal whenever the source or target attribute configuration changes. For example, with the current config format I can't set different resolved_attribute names per signal, or choose exactly where the result attribute is added, since record is hard-coded. For traces, for example, I might want to add it to the span rather than the span event:

resolve:
  enabled: true
  context: "record"
  attributes: ["source.address"]
  resolved_attribute: "source.ip"

I can think of a few alternatives (shown per signal as examples). The first two are already possible; the last one would depend on an OTTL change:

resolve:
    logs: # signal
        source_context: log # context from where values should be taken
        source_values: [body["ip"], attributes["ip"], resource.attributes["ip"]] # OTTL value expressions
        target_context: "resource" # context where the result attribute should be set
        target_attribute: "bar" # new attribute name
    metrics: # signal
        sources: [datapoint.attributes["ip"], resource.attributes["ip"]] # Value expressions (possible) + context inference (PR open)
        target_context: "datapoint" # context where the result attribute should be set
        target_attribute: "bar" # new attribute name
    traces: # signal    
        sources: [span.attributes["ip"], resource.attributes["ip"]] # Value expressions (possible) + context inference (PR open)
        target: "span.attributes[\"bar\"]" # use OTTL for parsing and setting the path's value (not currently supported)

Thoughts? @andrzej-stencel WDYT?

@kaisecheng (Contributor, Author)

After a discussion, we've agreed to retain the current design, aligning it with the configuration of the GeoIPProcessor and accepting the same limitations. We believe that supporting OTTL value expressions with context inference for getting and setting attributes offers the greatest flexibility to users, and hopefully we will get there in the future.

If further discussion is needed on how to resolve attributes and context, I will start with an initial version that supports only the resource context and iterate on it later.


This PR was marked stale due to lack of activity. It will be closed in 14 days.

@kaisecheng (Contributor, Author)

This draft will be split into smaller PRs.

@kaisecheng kaisecheng closed this Jun 24, 2025