
Assessment API scores are inaccurate #1

@mywsat

Description

  1. I trained with the repository's train script. The final training results were:
    Train Epoch 36/40: 100%|########| 40/40 [00:10<00:00, 3.78it/s, loss=1.7837, acc=45.9%, batch=40/40]
    Training loss: 1.7121
    Training accuracy: 45.91%
    Validation Epoch 36/40: 100%|#############################################| 40/40 [00:09<00:00, 4.28it/s]
    Validation loss: 1.9522
    Validation accuracy: 39.94%

    Hard-phoneme accuracy:
    L : 18.4% (3428/18658)
    R : 23.9% (3875/16223)
    OW : 31.3% (4741/15167)
    AW : 11.9% (843/7076)
    TH : 0.1% (8/5615)
    DH : 29.8% (2794/9365)
    V : 15.1% (1130/7492)
    W : 60.2% (8848/14700)
    ⚠️ Validation loss did not improve (5/5)
    What accuracy does the model need to reach to meet the algorithm's requirements?

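  2. Assessment returned by /api/pronunciation/assess
    The audio file was generated with Kokoro 82M, yet the scores are very poor. Do you have any suggestions for improving this? In particular, is there an alternative alignment approach for detecting insertions (extra words read) and omissions (words skipped)?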
    {
      "overallScore": 47.1,
      "accuracyScore": 45.8,
      "fluencyScore": 64,
      "completenessScore": 25,
      "duration": 1.1,
      "wordCount": 1,
      "phonemeCount": 4,
      "words": [
        {
          "word": "hello",
          "score": 45.8,
          "confidence": 0.4,
          "startTime": 0.3,
          "endTime": 1.1,
          "duration": 0.8,
          "errorType": "Mispronunciation",
          "phonemes": [
            {
              "phoneme": "HH",
              "score": 88.1,
              "confidence": 0.8,
              "startTime": 0.3,
              "endTime": 0.5,
              "duration": 0.2,
              "gopScore": 1.4,
              "targetProb": 0.6,
              "confusionProb": 0.1,
              "errorType": "None",
              "nbestPhonemes": [
                { "phoneme": "HH", "score": 100 },
                { "phoneme": "T", "score": 35.4 },
                { "phoneme": "K", "score": 23.4 },
                { "phoneme": "P", "score": 19.3 },
                { "phoneme": "AE", "score": 7.8 }
              ]
            },
            {
              "phoneme": "AH0",
              "score": 38.2,
              "confidence": 0.3,
              "startTime": 0.5,
              "endTime": 0.5,
              "duration": 0.1,
              "gopScore": -0.7,
              "targetProb": 0.2,
              "confusionProb": 0.4,
              "errorType": "Mispronunciation",
              "nbestPhonemes": [
                { "phoneme": "AW", "score": 89.9 },
                { "phoneme": "AH", "score": 45.2 },
                { "phoneme": "AA", "score": 31.9 },
                { "phoneme": "AE", "score": 18.7 },
                { "phoneme": "OW", "score": 14.6 }
              ]
            },
            {
              "phoneme": "L",
              "score": 17.1,
              "confidence": 0.2,
              "startTime": 0.5,
              "endTime": 0.7,
              "duration": 0.2,
              "gopScore": -1.7,
              "targetProb": 0.1,
              "confusionProb": 0.4,
              "errorType": "Mispronunciation",
              "nbestPhonemes": [
                { "phoneme": "W", "score": 100 },
                { "phoneme": "OW", "score": 59.3 },
                { "phoneme": "L", "score": 19.5 },
                { "phoneme": "AH", "score": 11.3 },
                { "phoneme": "N", "score": 10.4 }
              ]
            },
            {
              "phoneme": "OW1",
              "score": 39.7,
              "confidence": 0.4,
              "startTime": 0.7,
              "endTime": 1.1,
              "duration": 0.4,
              "gopScore": -0.4,
              "targetProb": 0.2,
              "confusionProb": 0.2,
              "errorType": "Mispronunciation",
              "nbestPhonemes": [
                { "phoneme": "AH", "score": 61.4 },
                { "phoneme": "OW", "score": 41.4 },
                { "phoneme": "L", "score": 27.1 },
                { "phoneme": "ER", "score": 13.6 },
                { "phoneme": "AA", "score": 12.4 }
              ]
            }
          ]
        }
      ]
    }

  3. In practice, alignment takes far too long.

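  4. 16:16:04,549 - Starting MFA alignment
    16:17:22,431 - MFA alignment succeeded (elapsed: 77.88 s)
    16:17:22,442 - Starting phoneme scoring
    16:17:22,473 - Scoring complete (elapsed: 0.03 s)
    Total elapsed: 79.42 s
    Finally, thank you for the analysis; it gave me some ideas.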
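The hard-phoneme counts in item 1 can be summarized in two ways: pooled (every token weighted equally) or macro-averaged (every phoneme weighted equally). Below is a minimal sketch that recomputes both from the logged (correct, total) pairs. The helper names are hypothetical and not part of the repository, and the log does not say whether the counts are frames or phoneme tokens.

```python
# (correct, total) counts copied from the validation log in item 1.
HARD_PHONEMES = {
    "L": (3428, 18658),
    "R": (3875, 16223),
    "OW": (4741, 15167),
    "AW": (843, 7076),
    "TH": (8, 5615),
    "DH": (2794, 9365),
    "V": (1130, 7492),
    "W": (8848, 14700),
}


def per_phoneme_accuracy(counts):
    """Accuracy in percent for each phoneme's (correct, total) pair."""
    return {p: 100.0 * c / t for p, (c, t) in counts.items()}


def pooled_accuracy(counts):
    """Accuracy over all listed tokens combined, so frequent phonemes weigh more."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100.0 * correct / total


def macro_accuracy(counts):
    """Unweighted mean of the per-phoneme accuracies."""
    accs = per_phoneme_accuracy(counts)
    return sum(accs.values()) / len(accs)


if __name__ == "__main__":
    for phoneme, acc in per_phoneme_accuracy(HARD_PHONEMES).items():
        print(f"{phoneme:>3}: {acc:5.1f}%")
    print(f"pooled: {pooled_accuracy(HARD_PHONEMES):.1f}%")
    print(f"macro:  {macro_accuracy(HARD_PHONEMES):.1f}%")
```

Run as-is, it reproduces the per-phoneme percentages in the log and prints a pooled figure of 27.2% and a macro average of 23.8%. Both are well below the 39.94% overall validation accuracy.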
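On item 2's question about insertions and omissions: a forced aligner such as MFA receives the reference transcript and assumes it was read exactly, so by construction it cannot report extra or skipped words. A common alternative is a two-step approach. First, decode what was actually said with unconstrained recognition, for example a free phoneme or word recognizer. Then align that hypothesis to the reference with minimum edit distance and read insertions and deletions off the backtrace. Below is a minimal sketch of the alignment step. It assumes such a recognized sequence is already available; `align` is a hypothetical helper and not part of the repository or its API.

```python
from typing import List, Optional, Sequence, Tuple

# (operation, reference token, recognized token); None marks the missing side.
AlignOp = Tuple[str, Optional[str], Optional[str]]


def align(ref: Sequence[str], hyp: Sequence[str]) -> List[AlignOp]:
    """Unit-cost minimum-edit-distance alignment of the expected sequence
    `ref` against the recognized sequence `hyp`.

    op is 'match', 'substitution' (misread), 'deletion' (a ref token the
    speaker skipped) or 'insertion' (an extra token the speaker added).
    """
    n, m = len(ref), len(hyp)

    def sub_cost(i: int, j: int) -> int:
        return 0 if ref[i - 1] == hyp[j - 1] else 1

    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(i, j),  # match / substitution
                cost[i - 1][j] + 1,                   # deletion (skipped)
                cost[i][j - 1] + 1,                   # insertion (extra)
            )

    # Backtrace from the full-sequence corner, preferring the diagonal on ties.
    ops: List[AlignOp] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub_cost(i, j):
            kind = "match" if sub_cost(i, j) == 0 else "substitution"
            ops.append((kind, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    # Expected phonemes of "hello" (stress marks dropped) vs. a hypothetical
    # recognizer output in which the L was not heard.
    for op in align(["HH", "AH", "L", "OW"], ["HH", "AH", "OW"]):
        print(op)
```

With unit costs, ties between equally cheap alignments go to the diagonal (match or substitution). A real system would likely weight substitutions by phonetic similarity so that near-miss phonemes are not mistaken for an omission plus an insertion.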
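On item 3's timing: scoring took 0.03 s, while MFA alignment took 77.88 s for a 1.1 s clip. Nearly all of the latency therefore comes from the aligner, and for an input this short that is plausibly fixed per-run setup work rather than the alignment search itself. Suppose the acoustic model already loaded for scoring produces frame-level CTC posteriors (an assumption; the issue does not say how the model is trained). In that case, the expected phoneme sequence could be aligned in-process with a Viterbi pass over the CTC lattice, keeping MFA out of the request path. Below is a minimal pure-Python sketch. `ctc_forced_align` is a hypothetical helper, and blank id 0 is an assumed convention.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    blank: int = 0,
) -> List[Tuple[int, int, int]]:
    """Viterbi forced alignment of `targets` through a CTC output lattice.

    log_probs: T x C per-frame log-posteriors (assumed CTC model output).
    targets:   ids of the expected phonemes, without blanks.
    Returns one (label_id, start_frame, end_frame_exclusive) per target.
    """
    T = len(log_probs)
    if T == 0:
        raise ValueError("no frames")
    # CTC topology: a blank before, between and after every target label.
    ext = [blank]
    for y in targets:
        ext += [y, blank]
    S = len(ext)

    dp = [[NEG_INF] * S for _ in range(T)]  # best log-score in state s at frame t
    back = [[0] * S for _ in range(T)]      # predecessor state for the backtrace
    dp[0][0] = log_probs[0][ext[0]]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [(dp[t - 1][s], s)]                  # stay in the same state
            if s >= 1:
                cands.append((dp[t - 1][s - 1], s - 1))  # advance one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1][s - 2], s - 2))  # skip the blank between two different labels
            best, prev = max(cands)
            if best > NEG_INF:
                dp[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev

    # A valid path ends in the last label or in the trailing blank.
    s = max((k for k in (S - 1, S - 2) if k >= 0), key=lambda k: dp[T - 1][k])
    if dp[T - 1][s] == NEG_INF:
        raise ValueError("too few frames for the target sequence")

    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]

    # Merge consecutive frames spent in the same (non-blank) label state.
    segments: List[List[int]] = []
    for t, st in enumerate(path):
        if ext[st] == blank:
            continue
        if segments and segments[-1][0] == st and segments[-1][2] == t:
            segments[-1][2] = t + 1
        else:
            segments.append([st, t, t + 1])
    return [(ext[st], start, end) for st, start, end in segments]


if __name__ == "__main__":
    # Toy lattice: 6 frames, classes {0: blank, 1, 2} (ids are illustrative).
    hi, lo = math.log(0.8), math.log(0.1)
    lattice = [[hi, lo, lo], [lo, hi, lo], [lo, hi, lo],
               [hi, lo, lo], [lo, lo, hi], [lo, lo, hi]]
    print(ctc_forced_align(lattice, [1, 2]))  # → [(1, 1, 3), (2, 4, 6)]
```

The cost is O(T·S) for T frames and S = 2L+1 lattice states, which is negligible for clips a few seconds long. To convert frame indices to seconds, multiply by the model's frame shift, which the issue does not state.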
