Description
Regarding this script:
https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/data/get_uniform_subsegments.py
- the description for "max-remaining-duration" says:
parser.add_argument("--max-remaining-duration", type=float,
default=10, help="""Segment is not split
if the left-over duration is more than this
many seconds""")
Shouldn't that be "less than"?
- it mentions that "constant-duration" overrides "max-remaining-duration":
parser.add_argument("--constant-duration", type=bool,
default=False, help="""Final segment is given
a start time max-segment-duration before the
end to force a constant segment duration. This
overrides the max-remaining-duration parameter""")
but the code is conditional and does not always give a constant duration:
if (dur < args.max_remaining_duration):
start = max(end_time - args.max_segment_duration, start_time)
It looks to me like line 96 is likely a bug, i.e. it should say args.max_segment_duration and not args.max_remaining_duration.
Then it would be more consistent with the usage message. And yes, it looks to me like max_remaining_duration is really functioning as a min_remaining_duration. I suppose the usage message could be clarified, but changing the variable name would require changing all calling code, and it would break recipes outside the Kaldi repo that copy the existing code. You could perhaps make a PR with the fix(es).
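To make the difference concrete, here is a minimal sketch that isolates only the two quoted lines, assuming the quoted `if` is the line in question. The helper `final_start`, its `threshold` parameter, and the example numbers are hypothetical and not taken from the script; the rest of the loop in get_uniform_subsegments.py is not reproduced.

```python
def final_start(start, start_time, end_time, max_segment_duration, threshold):
    """Start time of the final sub-segment under the quoted condition.

    Hypothetical helper, not part of the Kaldi script: `threshold` stands in
    for whichever argument the `if` compares against.
    """
    dur = end_time - start  # duration left over after the earlier sub-segments
    if dur < threshold:
        # Move the start back so the final piece spans max_segment_duration,
        # without going earlier than the original segment start.
        start = max(end_time - max_segment_duration, start_time)
    return start


# Assumed example: a 27 s segment, 10 s sub-segments, so 7 s are left at t=20.
max_segment_duration = 10.0
max_remaining_duration = 5.0

# Comparing against max_remaining_duration (as quoted): 7 < 5 is false,
# so the start is unchanged and the final piece is only 7 s long.
as_quoted = final_start(20.0, 0.0, 27.0, max_segment_duration, max_remaining_duration)

# Comparing against max_segment_duration (the suggested fix): 7 < 10 is true,
# so the start moves back to 17 s and the final piece is a full 10 s.
as_fixed = final_start(20.0, 0.0, 27.0, max_segment_duration, max_segment_duration)

print(as_quoted, as_fixed)  # 20.0 17.0
```

With these assumed values, any left-over duration between max_remaining_duration and max_segment_duration keeps its shorter length under the quoted comparison, which is why the result is not a constant duration.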