Lower _conj_copy and alias operation. #8686
Conversation
Update: I'm currently investigating this odd CI failure when functionalization is disabled.
Force-pushed from 75f6b7d to 4488c47.
Update: I have found the solution to the previous error. This PR should pass all tests now.
Force-pushed from 4488c47 to 4c61272.
Update: tests are all passing now. @tengyifei @pgmoka @lsy323, could you review it whenever you have some time?
Apologies for the delay on this review. I mainly have questions around tests.
test/test_operations.py (Outdated)
@@ -2384,6 +2384,19 @@ def test_cummax_0_sized_dimension(self):
     self.assertEqual(actual, expected)

+  def test_conj_no_fallback(self):
It seems to me that this test is checking two behaviors:
- Is the lowered operation being used?
- Is the value of the lowered operation what is expected?
I think these two should be two separate tests for ease of debugging.
What do you think of adding an error message instead?
I think having two different tests is preferable. It is easier to debug unit tests when each case individually encompasses a singular behavior.
With #8725 as a future bug, I think it also makes sense to keep these tests separate, as the operation-lowering behavior tests might be refactored in the future. Having this be a separate test will provide an easier example case to base the refactor on.
I have 2 reasons to prefer only 1 test:
- Both tests would have almost the same code, except for the self.assertEqual(...). Duplicating it would make it harder to maintain.
- Although this test also checks the actual output, it does so just to make sure the results are equal. The main thing being tested here is whether the lowerings are actually used. The tests for the actual operation can be found in test_ops.py.
Let me know if you still think I should split this test.
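For context, a rough sketch of the combined test shape being discussed here; the class name, the metric check, and the exact op call are assumptions for illustration, not the code actually in this PR:

```python
import unittest

import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met


class ConjLoweringTest(unittest.TestCase):

  def test_conj_no_fallback(self):
    # Hypothetical sketch, not the PR's actual test body.
    device = xm.xla_device()
    met.clear_all()

    x = torch.randn(3, 3, dtype=torch.complex64)
    expected = torch.conj(x) + 1            # eager CPU reference
    actual = torch.conj(x.to(device)) + 1   # goes through the XLA lowering
    xm.mark_step()

    # Behavior 1: the lowering is actually used, i.e. no aten:: fallback counter fired.
    fallbacks = [c for c in met.counter_names() if c.startswith('aten::')]
    self.assertEqual(fallbacks, [], f'unexpected fallbacks: {fallbacks}')

    # Behavior 2: the lowered result matches the eager reference.
    self.assertTrue(torch.allclose(actual.cpu(), expected))
```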
- Both tests would have almost the same code, except for the self.assertEqual(...). Duplicating it would make it harder to maintain.
I wouldn't be too worried about the code duplication here, as we are looking at a small amount of code, and the meat of it lives in a helper method that could easily be factored out and shared.
Although this test also tests for the actual output, it does so just to make sure the results are equal. The main thing being tested here is whether the lowerings are actually used. The tests for the actual operation can be found in test_ops.py.
I think this is a good point, and a reasonable justification for not splitting the tests. Since we already have a test, there is no reason to duplicate.
Overall, given that justification, I think adding an error message here should be sufficient.
I couldn't find a test file for this module. Should we create a bug for this?
That sounds good. But, in the end, I think this would already get tested in the Python operations above.
I think my opinion here would be based on test coverage. Some of these operations seem complex enough to warrant a small unit test. Something like parametrized tests could be a clean way to add them and would make additional tests easier to implement; see the sketch below.
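A hedged sketch of what that parametrized approach could look like, using standard unittest subTests; the op list, class name, and the fallback-counter check are assumptions for illustration:

```python
import unittest

import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met


class LoweringCoverageTest(unittest.TestCase):

  # Hypothetical list of lowered ops to cover; extend as new lowerings land.
  OPS = [
      ('conj', torch.conj),
      ('alias', torch.ops.aten.alias),
  ]

  def test_no_fallback_per_op(self):
    device = xm.xla_device()
    for name, fn in self.OPS:
      with self.subTest(op=name):
        met.clear_all()
        x = torch.randn(4, 4, dtype=torch.complex64).to(device)
        _ = (fn(x) + 1).cpu()  # force the traced graph to execute
        fallbacks = [c for c in met.counter_names() if c.startswith('aten::')]
        self.assertEqual(fallbacks, [], f'{name} fell back: {fallbacks}')
```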
(Title changed from "Lower _conj_copy operation" to "Lower _conj_copy and alias operation".)
This TPU CI failure looks unrelated to me. Let me undo my changes and see if the issue persists.
torch_xla/csrc/tensor_methods.cpp (Outdated)
@@ -1229,8 +1229,12 @@ XLATensorPtr clamp(const XLATensorPtr& input,
       Clamp(input->GetIrValue(), min_max.min, min_max.max));
 }

-XLATensorPtr clone(const XLATensorPtr& input) {
-  XLATensorPtr cloned = input->CreateFrom(input->GetIrValue());
+XLATensorPtr clone(const XLATensorPtr& input, bool is_conj) {
Why does the clone method implementation depend on conj? This seems like a layering violation.
Because that's where the conjugate is supposed to be materialized. PyTorch's conjugation is implemented via a dispatch key, which makes it lazy. It is only reflected in the actual data on clone() calls.
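For reference, a small stock-PyTorch example of that lazy behavior (CPU only, no XLA involved); per the comment above, this clone() is where torch_xla has to materialize the conjugation:

```python
import torch

x = torch.tensor([1 + 2j, 3 - 4j])

y = torch.conj(x)   # lazy: y is a view of x with only the conjugate bit set
print(y.is_conj())  # True
print(x.is_conj())  # False

z = y.clone()       # clone() materializes the conjugation into real data
print(z.is_conj())  # False
print(z)            # tensor([1.-2.j, 3.+4.j])
```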
I see. Could we factor the ir = torch::lazy::Value(torch_xla::MakeNode<ConjCopy>(ir)); into a separate conj_copy function instead of adding an is_conj flag to clone? I understand the need to match PyTorch semantics, but it would still be good to avoid mixing concerns.
I left some questions. Apologies if they don't make sense. I'm just trying to understand this PR.
@tengyifei Thank you for your review. In fact, you saw lots of comments because I was trying to figure out whether the TPU CI failure was this PR's fault or not. I will ping you for another review once I have that figured out.
Force-pushed from d1025b0 to 67c574d.
OK. This flakiness issue was kind of frustrating. TPU CI seems to be okay with my changes now.
I will rebase these changes and wait for CI once more.
Force-pushed from 67c574d to 916112d.
I'm pretty sure this error is unrelated to this PR.
That's my bad: #8747
No problem. But I think this only further highlights #8745.
Force-pushed from 916112d to 0adc64f.
CI is green.
LGTM minus final comments. Once addressed, feel free to @ me again or CC me for a final look. I would also be curious to see an LGTM from one of the other folks on the PR.
@tengyifei @pgmoka, if you have no more change requests/observations, could you stamp this PR, please?
LGTM. Please address the other comments before submitting.
Sorry, we have some flaky tests again.
Approved
Fix: #3070
This PR adds a lowering for _conj_copy. This operation is called by torch.conj, and was previously being executed through the fallback path. With this PR, torch.conj and its decomposed functions no longer fall back.
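As a rough way to see the effect, one can dump the lazy IR of a conjugated tensor. This sketch uses the private _get_xla_tensors_text debug helper; the exact node name that appears in the dump is not something it asserts:

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(4, dtype=torch.complex64).to(device)

# clone() materializes the lazy conjugation, which is where the new lowering applies.
y = torch.conj(x).clone()

# With the lowering in place, the IR should contain a dedicated conjugate node
# instead of the operation routing through the aten:: fallback.
print(torch_xla._XLAC._get_xla_tensors_text([y]))
```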