[gpu]: ISTFT basic impl. #29030
base: master
Conversation
```cpp
@@ -54,6 +54,7 @@ class OPENVINO_API ISTFT : public Op {
    void set_center(const bool center);

    bool get_normalized() const;
    void set_normalized(const bool normalized);
```
Can be removed. It's not required, right? Shape inference does not depend on this attribute.
This method is needed to properly construct the op on the GPU side (ctor params are not supported in the GPU plugin...). Otherwise you would construct a partially undefined op, which would work at this point only because you and I know that shape inference currently does not depend on that state.
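To illustrate the situation, a minimal hypothetical sketch of rebuilding the op attribute-by-attribute on the plugin side when constructor arguments cannot be forwarded (the `v16` namespace and the default constructor here are my assumptions, not taken from this PR):

```cpp
// Hypothetical sketch: the plugin reconstructs the op via setters, so a
// matching getter is needed for every attribute it must carry over.
auto copy = std::make_shared<ov::op::v16::ISTFT>();  // namespace assumed
copy->set_center(op->get_center());
// Without get_normalized(), this attribute would stay default-initialized
// and the copy would be only partially defined.
copy->set_normalized(op->get_normalized());
```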
OK for the core part.
@wilson-seok please review, or assign someone to review
```cpp
@@ -0,0 +1,107 @@
// Copyright (C) 2018-2025 Intel Corporation
```
We have new (and better) infrastructure for adding kernels: ocl_v2. Could you implement the kernel on that instead of ocl?
Here's an example of a migration: #29901
Well, I can do it in a separate PR if really needed - this one has already passed all the tests (which took a week due to unrelated CI failures...).
I'm voting for this approach
```cpp
stream& stream = instance.get_network().get_stream();
// This is needed to clear the output memory before executing the kernel for static-shape models.
// The ref kernel assumes that output memory is already cleared.
instance.output_memory(0).fill(stream, false);
```
For proper synchronization, this call should either be blocking, or the resulting event should be added to the kernel's dependencies.
So the kernel may not run on the same stream?
No, they are guaranteed to be executed on the same stream, but depending on the queue type there might be a problem:
- In case of an `in_order` queue, all submitted tasks (memory fill and kernel enqueue) are executed in the order they are enqueued.
- In case of an `out_of_order` queue, the driver considers dependencies between tasks when executing them. If there are no proper dependencies (via barriers or events), the driver does not guarantee that the memory will be reset before the kernel launches - in theory, kernel execution might start in parallel with, or even before, the memory reset.

Additionally, considering that memory can be reused between primitives, we might end up resetting the memory of some predecessor primitive right during its execution. Therefore, for an `out_of_order` queue, an additional barrier is required before this `fill()` call as well.
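To make that concrete, a minimal sketch in plain OpenCL (not the cldnn wrappers used in this plugin) of the event-based variant: the fill's completion event is passed in the kernel's wait list, so even an out-of-order queue cannot reorder the two. The names (`queue`, `out_buf`, `kernel`, `out_bytes`, `gws`) are assumed to be set up by the caller.

```cpp
#include <CL/cl.h>

cl_int enqueue_with_cleared_output(cl_command_queue queue,
                                   cl_mem out_buf, size_t out_bytes,
                                   cl_kernel kernel, size_t gws) {
    const cl_uint zero = 0;  // out_bytes assumed to be a multiple of the pattern size
    cl_event fill_done = NULL;

    // Reset the output buffer; the returned event marks its completion.
    cl_int err = clEnqueueFillBuffer(queue, out_buf, &zero, sizeof(zero),
                                     0, out_bytes, 0, NULL, &fill_done);
    if (err != CL_SUCCESS)
        return err;

    // The wait list makes the dependency explicit: on an out-of-order
    // queue the kernel may not start until the fill has finished.
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gws, NULL,
                                 1, &fill_done, NULL);
    clReleaseEvent(fill_done);
    return err;
}
```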
```cpp
if (inputs.size() == 4) {
    auto prim = cldnn::ISTFT(layer_type_name_ID(op), inputs[0], inputs[1], inputs[2], inputs[3], op->get_center(), op->get_normalized());
    p.add_primitive(*op, prim);
} else {
    auto prim = cldnn::ISTFT(layer_type_name_ID(op), inputs[0], inputs[1], inputs[2], inputs[3], inputs[4], op->get_center(), op->get_normalized());
    p.add_primitive(*op, prim);
}
```
I think it's not actually needed to enumerate all the inputs; we can just slightly update ISTFT like this:

```cpp
ISTFT(const primitive_id& id,
      const std::vector<input_info>& inputs,
      const bool center,
      const bool normalized)
    : primitive_base(id, inputs),
      center(center),
      normalized(normalized) {}
```
And simplify the primitive creation:

```diff
-if (inputs.size() == 4) {
-    auto prim = cldnn::ISTFT(layer_type_name_ID(op), inputs[0], inputs[1], inputs[2], inputs[3], op->get_center(), op->get_normalized());
-    p.add_primitive(*op, prim);
-} else {
-    auto prim = cldnn::ISTFT(layer_type_name_ID(op), inputs[0], inputs[1], inputs[2], inputs[3], inputs[4], op->get_center(), op->get_normalized());
-    p.add_primitive(*op, prim);
-}
+auto prim = cldnn::ISTFT(layer_type_name_ID(op), inputs, op->get_center(), op->get_normalized());
+p.add_primitive(*op, prim);
```
```cpp
@@ -0,0 +1,27 @@
// Copyright (C) 2018-2025 Intel Corporation
```
nit:

```diff
-// Copyright (C) 2018-2025 Intel Corporation
+// Copyright (C) 2025 Intel Corporation
```
Basic implementation of ISTFT for GPU.
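For background (standard textbook form, not taken from this PR): the inverse STFT reconstructs the time-domain signal by inverse-transforming each frame and overlap-adding with window normalization. With hop size $H$, window $w$, and $x_m$ the inverse DFT of frame $m$:

```latex
% Standard weighted overlap-add reconstruction; notation assumed, not from the PR.
x[n] = \frac{\sum_m w[n - mH]\, x_m[n - mH]}{\sum_m w^2[n - mH]}
```

The accumulation in the numerator is consistent with the fill-before-kernel step discussed above: overlap-add presumably sums frame contributions into pre-cleared output memory.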
Details:
Tickets: