
[SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks #50173

Closed

Conversation

cnauroth (Contributor) commented Mar 5, 2025

What changes were proposed in this pull request?

While verifying Spark 4.0.0 RC2, I consistently saw YARN test AmIpFilterSuite#testProxyUpdate failing in my environment. The test is written to eventually expect a ServletException from getProxyAddresses after 5 seconds of retries, but I never received this exception.

This test and the corresponding AmIpFilter were introduced in SPARK-48238 as a fork of the Hadoop implementation to resolve a dependency conflict. However, it seems this test had a small bug in the way it was adapted into the Spark codebase. The AmIpFilter#getProxyAddresses() logic may either return an empty set or throw a ServletException if it can't find any valid configured proxies. The Hadoop test's assertion allows for either of these conditions:

https://github.com/apache/hadoop/blob/rel/release-3.4.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java#L212-L222

    // waiting for configuration update
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        try {
          return filter.getProxyAddresses().isEmpty();
        } catch (ServletException e) {
          return true;
        }
      }
    }, 500, updateInterval);

The version in Spark strictly requires an exception to be thrown:

https://github.com/apache/spark/blob/v4.0.0-rc2/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala#L163-L168

    // waiting for configuration update
    eventually(timeout(5.seconds), interval(500.millis)) {
      assertThrows[ServletException] {
        filter.getProxyAddresses.isEmpty
      }
    }

The test involves updating the proxy configuration to use "unknownhost" as an invalid proxy. In my network, there is actually a host named "unknownhost", but it only has an IPv6 address, and I only have an IPv4 address. This causes a "network unreachable" error instead of "unknown host", resulting in an empty set instead of an exception.

This pull request changes the Spark test to be consistent with the Hadoop test, allowing either condition to succeed.
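
For reference, here is a sketch of the corrected check, reconstructed from the review discussion below (the merged code may differ in detail); it mirrors the Hadoop version by treating both the empty set and the exception as success:

    // waiting for configuration update
    eventually(timeout(5.seconds), interval(500.millis)) {
      try {
        // success once the proxy set becomes empty...
        assert(filter.getProxyAddresses.isEmpty)
      } catch {
        // ...or once resolution fails outright; any other exception
        // propagates and fails the test
        case e: ServletException => // do nothing
      }
    }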

Why are the changes needed?

Maintain consistency with the intent of the original Hadoop test and ensure it can pass in any network setup.

Does this PR introduce any user-facing change?

No. The changes are in tests only.

How was this patch tested?

Existing tests pass in my environment after this change.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the YARN label Mar 5, 2025
cnauroth (Contributor, Author) commented Mar 5, 2025

@pan3793 and @LuciferYang, this relates to code you added/reviewed in #46611. Would you please review this test fix? Thank you.

dongjoon-hyun (Member) left a comment:

Thank you for taking a look at this.

    @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {
         assert(!filter.getProxyAddresses.isEmpty)
         // waiting for configuration update
         eventually(timeout(5.seconds), interval(500.millis)) {
    -      assertThrows[ServletException] {
    +      try {
             filter.getProxyAddresses.isEmpty
Member commented:

should it be assert(filter.getProxyAddresses.isEmpty)?

cnauroth (Contributor, Author) replied:

No, there are two possible outcomes, and both must be considered valid for the assertion.

  1. "unknownhost" does not exist on the network. The logic of AmIpFilter detects this as an UnknownHostException, and the end result is an empty set of viable proxy addresses. AmIpFilter then throws ServletException.

  2. "unknownhost" does exist on the network. It does resolve through DNS. However, there is no routable path to the host. In my case, as mentioned in the original description, that's because I only have an IPv4 address and the "unknownhost" on my network only has an IPv6 address. In this case, no exception is thrown, and the expectation is for getProxyAddresses() to return an empty set.

This logic is perhaps a little unexpected, but I believe the intent of SPARK-48238 was to match the pre-defined behavior of Hadoop.

Member replied:

I understand that; my point is that unlike GenericTestUtils.waitFor, which checks whether the body returns true or false, eventually checks whether the block throws an exception.
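
As a minimal sketch of that distinction (names taken from the test above; illustrative only): eventually succeeds as soon as the block completes without throwing, so the value of a bare boolean expression is ignored.

    // This "passes" even when the expression evaluates to false,
    // because eventually only retries on exceptions:
    eventually(timeout(5.seconds), interval(500.millis)) {
      filter.getProxyAddresses.isEmpty // result discarded
    }

    // The boolean must be turned into a pass/fail signal explicitly:
    eventually(timeout(5.seconds), interval(500.millis)) {
      assert(filter.getProxyAddresses.isEmpty)
    }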

cnauroth (Contributor, Author) replied:

Thank you, I understand the feedback now. As it turns out, the result is a non-empty set of addresses, containing the IPv6 address. eventually is allowing this to pass, because any return value is considered successful.

I'll investigate this more and update the patch.

cnauroth (Contributor, Author) commented:

After another look, this definitely should be wrapped in an assert. I have pushed up the change.

I also changed the invalid "unknownhost" name, adding some randomization to increase the likelihood that it refers to a host that does not exist. For this part of the patch, I am also planning to commit it back to the original Hadoop test code that Spark forked.
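
As an illustration only (not the exact patch), the randomization could take a form like the following, where the random suffix makes an accidental DNS match very unlikely:

    // Hypothetical sketch: append a random suffix so the fake proxy
    // host is unlikely to resolve in any network.
    val invalidProxyHost = "unknownhost" + java.util.UUID.randomUUID()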

Thanks for the great code review, @pan3793!

      filter.getProxyAddresses.isEmpty
    } catch {
      case e: ServletException => true
pan3793 (Member) commented Mar 6, 2025:

Suggested change:

    - case e: ServletException => true
    + case e: ServletException => // do nothing

cnauroth (Contributor, Author) replied Mar 6, 2025:

It sounds like the intent of this proposed change is to succeed specifically if there is a ServletException, but fail for other exception types. I don't think we need code changes to achieve this though. Exceptions different from ServletException propagate from the test and cause a test failure.

To confirm this, I just tried introducing a throw new RuntimeException() in my local copy of AmIpFilter.java. As I expected, the test failed.
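
As a minimal sketch of why that holds (illustrative, not the patch itself): the catch arm matches only ServletException, so anything else escapes the block and fails the enclosing test.

    try {
      throw new RuntimeException("injected fault, for illustration")
    } catch {
      // RuntimeException does not match this case, so it propagates
      // out of the test body and fails the test
      case e: ServletException => // do nothing
    }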

Member replied:

Yes, correct, but the true is unnecessary.

cnauroth (Contributor, Author) replied:

I have pushed up changes to incorporate this suggestion. Thank you.

pan3793 (Member) left a comment:

LGTM, @cnauroth, thank you for fixing it!

LuciferYang (Contributor) left a comment:

+1, LGTM

LuciferYang pushed a commit that referenced this pull request Mar 6, 2025
Closes #50173 from cnauroth/SPARK-51408.

Authored-by: Chris Nauroth <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 69d8ece)
Signed-off-by: yangjie01 <[email protected]>
LuciferYang (Contributor) commented:

Merged into master and branch-4.0. Thanks @cnauroth @pan3793 and @dongjoon-hyun

Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
Closes apache#50173 from cnauroth/SPARK-51408.

Authored-by: Chris Nauroth <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
cnauroth (Contributor, Author) commented Mar 6, 2025

Thank you, @LuciferYang !

cnauroth (Contributor, Author) commented Mar 6, 2025

FYI, apache/hadoop#7478 is a follow-up that keeps the original Hadoop test in sync with the changes here to randomize the fake host name.
