Errors in data/get_uniform_subsegments.py #4652

Open
@kkm000


From kaldi-help

regarding this:
https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/data/get_uniform_subsegments.py

  1. the description for "max-remaining-duration" says:
    parser.add_argument("--max-remaining-duration", type=float,
    default=10, help="""Segment is not split
    if the left-over duration is more than this
    many seconds""")

    shouldn't that be "less than"?

  2. it mentions that "constant-duration" overrides "max-remaining-duration":
    parser.add_argument("--constant-duration", type=bool,
    default=False, help="""Final segment is given
    a start time max-segment-duration before the
    end to force a constant segment duration. This
    overrides the max-remaining-duration parameter""")

    but the code is conditional and does not always give a constant duration:

    if (dur < args.max_remaining_duration):
        start = max(end_time - args.max_segment_duration, start_time)
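The first point can be illustrated with a minimal sketch. It is not the script's code: `split_segment` is a hypothetical helper, overlap handling is left out, the 30-second `max_segment_duration` is an illustrative value, and the split threshold (one full segment plus the allowed left-over) is an assumption about how the parameter is used. Under those assumptions, a left-over shorter than `max_remaining_duration` is kept in the last piece, which matches the "less than" reading.

```python
def split_segment(start_time, end_time, max_segment_duration=30.0,
                  max_remaining_duration=10.0):
    """Hypothetical, simplified splitter (no overlap handling)."""
    # Assumption: keep splitting only while what remains exceeds one full
    # segment plus the allowed left-over.
    threshold = max_segment_duration + max_remaining_duration
    pieces = []
    start = start_time
    while end_time - start > threshold:
        pieces.append((start, start + max_segment_duration))
        start += max_segment_duration
    # Whatever remains becomes the final piece.
    pieces.append((start, end_time))
    return pieces


# 35 s: the left-over beyond one full segment is 5 s (< 10 s), so no split.
print(split_segment(0.0, 35.0))  # [(0.0, 35.0)]
# 45 s: the left-over is 15 s (>= 10 s), so the segment is split.
print(split_segment(0.0, 45.0))  # [(0.0, 30.0), (30.0, 45.0)]
```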

@danpovey:

It looks to me like line 96 is likely a bug, i.e. should say args.max_segment_duration and not args.max_remaining_duration.
Then it would be more consistent with the usage message. And yes, it looks to me like max_remaining_duration is really functioning as a min_remaining_duration. I suppose the usage message could be clarified, but changing the variable name would require changing all calling code (plus others may have recipes that copy existing code that are not in the Kaldi repo, which we'd break). You could perhaps make a PR with the fix(es).
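To make the suggested change concrete, here is a rough sketch that wraps the two quoted lines in a hypothetical helper, `final_segment`, with the comparison value passed in as `threshold`. The helper, its signature, and the way `dur` is computed are assumptions for illustration, not the script's actual structure. The example uses an 80-second utterance whose last piece would start at 60 seconds, with a 30-second `max_segment_duration`.

```python
def final_segment(start, start_time, end_time, max_segment_duration, threshold):
    """Return (start, end) of the last piece, using the quoted conditional."""
    # Assumption: dur is the time left between the last start and the end.
    dur = end_time - start
    if dur < threshold:
        # Pull the start back so the last piece is a full segment,
        # but never before the beginning of the utterance.
        start = max(end_time - max_segment_duration, start_time)
    return (start, end_time)


# Current condition: compare against max_remaining_duration (10 s).
current = final_segment(60.0, 0.0, 80.0, 30.0, threshold=10.0)
# Suggested condition: compare against max_segment_duration (30 s).
proposed = final_segment(60.0, 0.0, 80.0, 30.0, threshold=30.0)

print(current)   # (60.0, 80.0): last piece is only 20 s long
print(proposed)  # (50.0, 80.0): last piece is a full 30 s
```

With the current comparison, any left-over between 10 and 30 seconds produces a shorter final piece, which is the inconsistency with the usage message described above.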

Labels: bug, stale, waiting-for-feedback
