Skip to content

DNS: Reviewed #4611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

DNS: Reviewed #4611

wants to merge 6 commits into from

Conversation

patterniha
Copy link
Contributor

@patterniha patterniha commented Apr 13, 2025

#4560 (comment)

After reviewing DNS-codes, I found and fixed some bugs, optimized the code and added some new features.

Fixed bugs:

  1. When the cache is disabled (disableCache = true), instead of not using the cache and sending a new IP-query, it both uses the cache and sends new IP-Query!
    This is because the order of the codes in nameserver_xxx-QueryIP function is wrong and the ips, ttl, err := s.findIPsForDomain(fqdn, option) should be after select-case code. link
    also we should not use for loop for this part of code.

  2. Suppose dialing a dns-server is unsuccessful(for tcp/quic base DNS)[for example receiving rst-ack after sending syn or receiving http-error response for doh], Instead of immediately returning an error and trying the next DNS-server-fallback, it waits until the timeout ends and then tries the next fallback!

  3. When ipOption.IPv4Enable and ipOption.IPv6Enable are both true, two IP-Query(A, AAAA) are sent and it waits for both responses to be received, then it merges the responses and returns, but suppose only AAAA-response is received and in the meantime, another request comes, while the first request is still waiting for A-response, For the second request, since AAAA-record is in the cache, it uses the cache and incorrectly only v6-IPs is returned for the second request! while the second request must wait for A-response like the first request, more details: DNS: Reviewed #4611 (comment)

  4. when IP-record-until-expire-time is less than 1 seconds and the IP-list is not empty, it returns IPs with TTL = 0 but we shouldn't set record-TTL to 0. The number 0 is not defined in the standard, and it may cause DNS information to be ignored or rejected,
    so after converting to uint32, It should be rounded up, not down. link

  5. the IsOwnLink function in "app > dns > dns.go" not updated after adding tag for each DNS-server.

  6. instead of creating new GeoIPMatcherContainer in "dns.go" we should use GlobalGeoIPContainer to reduce memory usage. GeoIPMatcherContainer is implemented to reduce memory usage, so we should only have one instance of GeoIPMatcherContainer in the entire code.

  7. when no IPs match expectedIPs, we should return ErrEmptyResponse instead of errExpectedIPNonMatch,
    otherwise, DNS-proxy send no response to client/browser DNS-query.

  8. in multi_error.go > AllEqual errors compare with == instead of errors.Is !

New features:

disableCache for each DNS-Server-Object:

currently we have only one global disableCache option that affects all DNS-servers, but we may want to disable the cache only for a specific DNS-Server.

///

priorIPs for each DNS-Server-Object:

allowUnexpectedIPs option is a bit confusing and when it is true expectedIPs act as priorIPs,
so it is better that we have priorIPs option instead, which is more clear.

also, we may have an expectedIPs separate from priorIPs.

for example suppose that we can only handle cloudflare and cloudfront IPs, but we prefer cloudflare IPs, so we should set:

expectedIPs: ["geoip:cloudflare", "geoip:cloudfront"],
priorIPs: ["geoip:cloudflare"]

so we may need seperate priorIPs option.

as a result, I add priorIPs option and remove allowUnexpectedIPs option.

///

finalQuery for DNS-Server-Object:**

Suppose you want to use DNS-Server-A for "youtube.com", but use DNS-Server-B for other google sites and use DNS-Server-C for others, so you should set:

"dns":{
    "servers": [
      {
       "address": "server-A",
       "domains": ["youtube.com"]
      },
      {
       "address": "server-B",
       "domains": ["geosite:google"]
      },
      "server-C"
    ]
}

But for whatever reason, server-A may be unavailable for a while(for example, the network may be unreachable for a while) so it uses server-B for "youtube.com", but we don't want this to happen.

Currently, there is no mechanism to prevent using server-B for "youtube.com", this is due to strange behavior of skipFallback (except creating custom-geosite where "youtube.com" is removed from "google" list, but this is not possible for all users)

but now we can set finalQuery= true for server-A, so any result from server-A return as a final-result and no other DNS-server will be performed.

"dns":{
    "servers": [
      {
       "address": "server-A",
       "domains": ["youtube.com"],
       "finalQuery": true
      },
      {
       "address": "server-B",
       "domains": ["geosite:google"]
      },
      "server-C"
    ]
}

///

unexpectedIPs for DNS-Server-Object:**

Suppose we want no IP to be in a IP-range-A, and if all IPs in IP-range-A, the next-dns-fallback performed.
for example for Serverless-for-Iran anti-sanction-version, i want to use a anti-sanction DNS, but goverment-run-anti-sanction-DNS only bypass sanctions and not filter.
IRGFW return 10.10.34.0/24, 2001:4188:2:600:10:10:34:0/120 range for blocked domain, so if the return-IPs is in these ranges, the fallback-DNS should be performed.

one way to achieve this goal is creating custom-geosite and then using ! sign, but this is not possible for all users.
another way is to calculate reverse-CIDR-list, for example using online-tools to calculate reverse-CIDR-list, but the reverse-CIDR-list is long and it causes the configuration to be messy.

as a result i add unexpectedIPs option, and a IP is matched if and only it does not match any of the IP-ranges in the unexpectedIPs list, in other words:
expectedIPs = [0.0.0.0/0, ::/0] - unexpectedIPs.

also, we may need all IPs to be in range-A, and no IP to be in range B, so we need both expectedIPS and unexpectedIPs: expectedIPS=[range-A], unexpectedIPs=[range-B]

"dns":{
    "servers": [
      {
       "address": "anti-sanction-dns",
       "unexpectedIPs": ["10.10.34.0/24", "2001:4188:2:600:10:10:34:0/120"]
      },
      "fallback-dns"
    ]
}

///

unpriorIPs(notPreferredIPs) for DNS-Server-Object:**

reverse form of priorIPs, Similar to unexpectedIPs, we also need unpriorIPs(notPreferredIPs) , and these are the IPs we don't prefer to have.

(Currently the names priorIPs and unpriorIPs have been chosen, but we can also name preferredIPs and notPreferredIPs )
///

Add to Documentation:

disableCache: bool

* The cache is disabled, if disableCache = true in DNS-settings or this DNS-Server-settings.

///

priorIPs(preferredIPs): [string]

* same as expectedIPs, but if all IPs do not meet the condition and are filtered, the IPs will still be returned.

///

finalQuery:bool

* when true, the result is returned in any case(even when IP-list is empty) and no other fallback will be performed.

///
unexpectedIPs: [string]

* reverse form of expectedIPs, a IP is matched if and only it does not match any of the IP-ranges in the list, in other words: 
`expectedIPs = [0.0.0.0/0, ::/0] - unexpectedIPs.`

///
unpriorIPS(notPreferredIPs): [string]

* reverse form of priorIPs(preferredIPs), these are the IPs we don't prefer to have

///

allowUnexpectedIPs -> removed.

optimization:

Also, some optimizations have been made which are clear in the code and do not need to be explained.

@Fangliding
Copy link
Member

1 好像确实是个BUG 挪个位置就行了
2 主要是为了和UDP DNS保持一致 他们都是改过来的 改了行为不一样但是能快速回退可能更好但是我必须说这套fallback很大程度上是为了执行IP过滤而不是故障回退
3 我看到了大段 query strategy 的改动 似乎是因为 3 个人觉得与其巨幅修改直接让它最开始执行两种query就行了 反正是异步是没什么额外开支 逻辑变成 请求IP→按option过滤返回的IP
4 是个小egde case 动不动都随便
5 是我疏忽了 虽然没注意到
这后面的新加字段还是那句老话 病态的需求产生病态的配置 allowUnexpectedIPs 确实confusing 这新加的几个一样confusing甚至有过之无不及 不过rprx似乎很挺你们的mitm的事业那算了吧我说了不算
以及test炸了

@Meo597
Copy link
Contributor

Meo597 commented Apr 13, 2025

finalQuery的确有用,这个开关给用户更多自由度
在我的用例中,dns分流需求有时会导致降级但没任何办法阻止
这正是我需要的,谢谢

2和3有歧义,没看懂这个改动是否会等待更长时间,如果会变慢:
在真实世界彻底移除v4还有十几年,到时候墙长什么样还在不在都不知道
因此我认为不应该过于标准地支持v6导致速度变慢,这是不可接受的

dns和routing使用同一个geoip instance也是我需要的,我用了商业ip库,启动xray就吃掉了2g内存

最后,我不看好unexpectedIPs,因为IRGFW可以轻松改为正版GFW那样把Facebook等IP响应给你

@patterniha
Copy link
Contributor Author

patterniha commented Apr 13, 2025

priorIPs logic is simple: act same as expectedIPs but if no ip matched, the original-IPs still returned without any change, it is useful for workers/MitM configs.

unexpectedIPs and unpriorIPs are just reverse-form of expected/priorIPs, and i explain why we need reverse-form too.

I saw a lot of changes in query strategy

most codes in nameserver files are duplicated, QueryIP, findIPsForDomain, updateIP, ...
i intended to redesign the code so that each code is written only once, but i didn't do that, because localhost and fakedns behave differently and also golang doesn't support inheritance well.

anyway, queryStrategy-code repeated in all nameservers, even localhost and fakedns, so i move it to (nameserver)Client struct (as ipOption) and now it's code is written only once.

///

also, you write tag code in QueryIP function, but that is wrong, because tag is a fixed-option and should be set once, not each time QueryIP executed.

///

anyway, these are only minor-optimizations and don't matter much, what matters is the bugs that are fixed.

also, no logic has changed and everything is the same as before.

///

in addition, test explodes is related to xhttp and not to me.

@Fangliding
Copy link
Member

related to xhttp

image

related to xhttp?

@patterniha patterniha force-pushed the main branch 2 times, most recently from 28245fd to fbe2bb9 Compare April 13, 2025 20:47
@patterniha
Copy link
Contributor Author

patterniha commented Apr 13, 2025

related to xhttp?

This was related to 3 other bugs in xray-core:

  1. in multi_error.go > AllEqual errors compare with == instead of errors.Is !

  2. in nameserver_tcp/doh_test.go > TestTCP/DOHLocalNameServerWithCache you had set disableCache = true and you expected the answers to be the same ! this test should have failed before, but it was passed incorrectly because of bug 1 in first-comment!

  3. in app>dns>dns_test.go you didn't define an answer for "notexist.google.com" type-A.
    so for type-A we were receiving EmptyResponse-error, but for type-AAAA were receiving NXDOMAIN-error.
    Previously, if we had two errors, only one of them would be returned, so the test was passed incorrectly.
    but now we compare the errors first, If they are equal, one of them will be returned, otherwise combine them and returned.

@patterniha
Copy link
Contributor Author

patterniha commented Apr 14, 2025

2 and 3 are ambiguous. I don't understand whether this change will take longer to wait.

3 is a critical bug, suppose we receive only AAAA-response and the A-response dropped for a request,

So for all subsequent requests and for 600 seconds we only have ipv6!!!
and suppose your network is IPv4 only, so for 600 seconds we cannot access internet!!!

///

this problem affect internal usage of built-in-DNS (for example domainStrategy = "UseIP" or domainStrategy = "IPOnDemand" for routing) but not affect client/browser request, because for client/browser IP-request we have two distinct request for IPv4 and IPv6 but for UseIP/IPOnDemand requests we have one merged request, and this bug only affect merged requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants