Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "vo_gpu: remove --scaler-lut-size" #13291

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hegdenischay
Copy link

This reverts commit 44cf628.

Fixes issue: #13273
scaler-lut-size=8 causes a performance regression on my machine, therefore it'd be a good idea to keep the default at 6 and add it to profile=high-quality

@hegdenischay hegdenischay marked this pull request as ready for review January 16, 2024 00:14
Copy link

Download the artifacts for this pull request:

Windows

@llyyr
Copy link
Contributor

llyyr commented Jan 16, 2024

The default should probably be 2^8 for the baseline and set to 2^6 for profile=fast instead. Based on some benchmarks done on an 11th gen Intel NUC iirc and my own 13th gen iGPU, the difference was insignificant which is why it was hardcoded to 8 in the first place. It's possible though that this matters for ancient iGPUs.

@sfan5
Copy link
Member

sfan5 commented Jan 17, 2024

It'd help if you described what "your machine" looks like.

@llyyr
Copy link
Contributor

llyyr commented Jan 17, 2024

It'd help if you described what "your machine" looks like.

From the linked issue:

GPU: Intel HD Graphics 4400 (Intel i5-4300U CPU)

@sfan5
Copy link
Member

sfan5 commented Jan 18, 2024

That's not totally ancient so I think bringing back the option makes sense.

@sfan5 sfan5 requested a review from haasn January 18, 2024 11:14
@haasn
Copy link
Member

haasn commented Jan 18, 2024

The default should probably be 2^8 for the baseline and set to 2^6 for profile=fast instead. Based on some benchmarks done on an 11th gen Intel NUC iirc and my own 13th gen iGPU, the difference was insignificant which is why it was hardcoded to 8 in the first place. It's possible though that this matters for ancient iGPUs.

+1

@kasper93
Copy link
Contributor

kasper93 commented Jan 18, 2024

and set to 2^6 for profile=fast instead

This doesn't make sense. profile=fast does not use any luts in part to avoid any ancient iGPU slowness. It is meant to be "dumb" by design. It uses direct sampling.

@llyyr
Copy link
Contributor

llyyr commented Jan 18, 2024

This doesn't make sense. profile=fast does not use any luts in part to avoid any ancient iGPU slowness. It is meant to be "dumb" by design. It uses direct sampling.

Yeah, sorry I forgot about that. Just bringing it back and defaulting to 8 is good enough then.

@kasper93
Copy link
Contributor

kasper93 commented Jan 18, 2024

keep the default at 6 and add it to profile=high-quality
but it only happens with scale=ewa_lanczossharp. cscale=ewa_lanczossharp

Since we are talking about polar scalers here, I wonder if it is related to https://code.videolan.org/videolan/libplacebo/-/issues/310 on older Intel iGPU noncompute version can be over 3 times as fast, and on your gpu the difference may be even higher.

@hegdenischay: would you mind doing some benchmarks, just so we have whole picture?

@sfan5
Copy link
Member

sfan5 commented Jan 18, 2024

his doesn't make sense. profile=fast does not use any luts in part to avoid any ancient iGPU slowness

It's not far-fetched that someone would start with profile=fast and enable more demanding options as needed. I think it still makes sense.

@kasper93
Copy link
Contributor

It's not far-fetched that someone would start with profile=fast and enable more demanding options as needed. I think it still makes sense.

It is not, but in the same time you can say the same about many other options and adding dummy options to profile only bloat the profile itself. Also this is related to some over decade old gpu.

@hegdenischay
Copy link
Author

Hi, sorry for not commenting earlier. Was kinda swamped at work. I'm mainly interested in bringing it back as an option, since I currently can't use this scaler without messing up the video. The part about adding it to profile=high-quality is just me being lazy with the commit.

I don't mind running benchmarks.

@hegdenischay
Copy link
Author

The relevant parts (vulkaninfo says my GPU is a HSW GT2):

'polar':	 43 frames in 1.383055 seconds => 32.164070 ms/frame (31.09 FPS), gpu time: 23.259074 ms
'polar_nocompute':	 33 frames in 1.520992 seconds => 46.090654 ms/frame (21.70 FPS), gpu time: 30.763520 ms

I have also attached a full log:
testlog.txt

I'm not sure why polar_nocompute is slower, essentially the opposite of what was seen in that issue.

@kasper93
Copy link
Contributor

Interesting, makes it little bit more complex to handle if we were to "smartly" enable compute shader.

Just out of curiosity, what is the performance on this benchmark when you reduce the size. (can also try 128 for completeness)

diff --git a/src/shaders/sampling.c b/src/shaders/sampling.c
index fc10f80b..e754ea00 100644
--- a/src/shaders/sampling.c
+++ b/src/shaders/sampling.c
@@ -559,7 +559,7 @@ struct sh_sampler_obj {
     pl_shader_obj pass2; // for pl_shader_sample_ortho
 };
 
-#define SCALER_LUT_SIZE     256
+#define SCALER_LUT_SIZE     64
 #define SCALER_LUT_CUTOFF   1e-3f
 
 static void sh_sampler_uninit(pl_gpu gpu, void *ptr)

@hegdenischay
Copy link
Author

with SCALER_LUT_SIZE = 64:

'polar':	 42 frames in 1.383859 seconds => 32.949020 ms/frame (30.35 FPS), gpu time: 23.302960 ms
'polar_nocompute':	 34 frames in 1.513501 seconds => 44.514732 ms/frame (22.46 FPS), gpu time: 30.203909 ms

with SCALER_LUT_SIZE = 128:

'polar':	 43 frames in 1.400678 seconds => 32.573914 ms/frame (30.70 FPS), gpu time: 23.413177 ms
'polar_nocompute':	 34 frames in 1.477449 seconds => 43.454386 ms/frame (23.01 FPS), gpu time: 29.327269 ms

Logs:
testlog-lut-size-128.txt
testlog-lut-size-64.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants