The documentation states that loudness_zwst_perseg takes a parameter noverlap that represents "Number of points to overlap between segments."; in reality, however, it represents hop_size. Example: loudness_zwst_perseg with nperseg = 4096 and noverlap = 3072, for 94 seconds of audio at sr = 48 kHz, outputs 1474 frames. The time axis shows that consecutive frames are ~64 milliseconds apart (94 s / 1474 frames ≈ 0.064 s).
Thus, the hop_size is time_step × sr = 0.064 × 48000 = 3072 samples, which equals noverlap and not nperseg - noverlap (4096 - 3072 = 1024 samples).
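
To make the arithmetic easy to check, here is a small sketch that counts the frames each interpretation would produce. It does not call the library. The helper n_frames is hypothetical, and it assumes that only full segments are kept (no padding) and that the audio is exactly 94 s long, so the counts are approximate rather than an exact match for the 1474 frames observed.

```python
def n_frames(n_samples: int, nperseg: int, hop: int) -> int:
    """Count full segments of length nperseg taken every hop samples.

    Hypothetical helper: assumes no padding and that incomplete
    trailing segments are dropped.
    """
    if n_samples < nperseg:
        return 0
    return (n_samples - nperseg) // hop + 1


sr = 48_000
n_samples = 94 * sr            # assumes the audio is exactly 94 s
nperseg, noverlap = 4096, 3072
observed_frames = 1474         # value reported above

# Interpretation A: noverlap is actually used as the hop size.
frames_if_hop_is_noverlap = n_frames(n_samples, nperseg, hop=noverlap)

# Interpretation B: documented behaviour, hop = nperseg - noverlap.
frames_if_documented = n_frames(n_samples, nperseg, hop=nperseg - noverlap)

print("hop = noverlap:          ", frames_if_hop_is_noverlap)  # 1468
print("hop = nperseg - noverlap:", frames_if_documented)       # 4403
print("observed:                ", observed_frames)
```

Under these assumptions, the observed 1474 frames is within a few frames of the hop = noverlap count and roughly a factor of three below the count the documented behaviour would give.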