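Assessment result from calling /api/pronunciation/assess
The audio file was generated with Kokoro 82m, yet the scores are far too low. Any suggestions for improving this? In particular, for detecting insertions (extra words read) and omissions (skipped words), is there an alternative to the current alignment approach?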
{
"overallScore": 47.1,
"accuracyScore": 45.8,
"fluencyScore": 64,
"completenessScore": 25,
"duration": 1.1,
"wordCount": 1,
"phonemeCount": 4,
"words": [
{
"word": "hello",
"score": 45.8,
"confidence": 0.4,
"startTime": 0.3,
"endTime": 1.1,
"duration": 0.8,
"errorType": "Mispronunciation",
"phonemes": [
{
"phoneme": "HH",
"score": 88.1,
"confidence": 0.8,
"startTime": 0.3,
"endTime": 0.5,
"duration": 0.2,
"gopScore": 1.4,
"targetProb": 0.6,
"confusionProb": 0.1,
"errorType": "None",
"nbestPhonemes": [
{
"phoneme": "HH",
"score": 100
},
{
"phoneme": "T",
"score": 35.4
},
{
"phoneme": "K",
"score": 23.4
},
{
"phoneme": "P",
"score": 19.3
},
{
"phoneme": "AE",
"score": 7.8
}
]
},
{
"phoneme": "AH0",
"score": 38.2,
"confidence": 0.3,
"startTime": 0.5,
"endTime": 0.5,
"duration": 0.1,
"gopScore": -0.7,
"targetProb": 0.2,
"confusionProb": 0.4,
"errorType": "Mispronunciation",
"nbestPhonemes": [
{
"phoneme": "AW",
"score": 89.9
},
{
"phoneme": "AH",
"score": 45.2
},
{
"phoneme": "AA",
"score": 31.9
},
{
"phoneme": "AE",
"score": 18.7
},
{
"phoneme": "OW",
"score": 14.6
}
]
},
{
"phoneme": "L",
"score": 17.1,
"confidence": 0.2,
"startTime": 0.5,
"endTime": 0.7,
"duration": 0.2,
"gopScore": -1.7,
"targetProb": 0.1,
"confusionProb": 0.4,
"errorType": "Mispronunciation",
"nbestPhonemes": [
{
"phoneme": "W",
"score": 100
},
{
"phoneme": "OW",
"score": 59.3
},
{
"phoneme": "L",
"score": 19.5
},
{
"phoneme": "AH",
"score": 11.3
},
{
"phoneme": "N",
"score": 10.4
}
]
},
{
"phoneme": "OW1",
"score": 39.7,
"confidence": 0.4,
"startTime": 0.7,
"endTime": 1.1,
"duration": 0.4,
"gopScore": -0.4,
"targetProb": 0.2,
"confusionProb": 0.2,
"errorType": "Mispronunciation",
"nbestPhonemes": [
{
"phoneme": "AH",
"score": 61.4
},
{
"phoneme": "OW",
"score": 41.4
},
{
"phoneme": "L",
"score": 27.1
},
{
"phoneme": "ER",
"score": 13.6
},
{
"phoneme": "AA",
"score": 12.4
}
]
}
]
}
]
}
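To make the insertion/omission question concrete, here is a minimal sketch of one alternative that is often used alongside forced alignment: run a free phoneme recognizer, then align its output to the expected phoneme sequence with edit distance and label each step. The function name `align_phonemes` and the idea that a `recognized` list is available (for example from greedy decoding of the acoustic model) are assumptions for illustration, not part of the repository. With a recognizer at the accuracy reported in this post the labels would likely be noisy, so this only shows the alignment logic.

```python
from typing import List, Tuple

def align_phonemes(expected: List[str], recognized: List[str]) -> List[Tuple[str, str, str]]:
    """Levenshtein-align two phoneme sequences (hypothetical helper).

    Returns (label, expected_phoneme, recognized_phoneme) tuples, where
    label is Match, Substitution, Omission, or Insertion.
    """
    n, m = len(expected), len(recognized)
    # cost[i][j] = minimum edits to turn expected[:i] into recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # everything expected was skipped
    for j in range(1, m + 1):
        cost[0][j] = j  # everything recognized is extra
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (expected[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the edit operations.
    ops: List[Tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (expected[i - 1] != recognized[j - 1]):
            label = "Match" if expected[i - 1] == recognized[j - 1] else "Substitution"
            ops.append((label, expected[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("Omission", expected[i - 1], "-"))  # expected but not spoken
            i -= 1
        else:
            ops.append(("Insertion", "-", recognized[j - 1]))  # spoken but not expected
            j -= 1
    ops.reverse()
    return ops

# Hypothetical recognizer outputs for the word "hello" (HH AH L OW).
skipped = align_phonemes(["HH", "AH", "L", "OW"], ["HH", "L", "OW"])
extra = align_phonemes(["HH", "AH", "L", "OW"], ["HH", "AH", "L", "OW", "Z"])
print(skipped)
print(extra)
```

The same routine works at word level if the two lists hold words instead of phonemes, which is usually where "extra word read" and "word skipped" are decided before any phoneme-level scoring.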
I trained with the train script from the repository. The final training result was:
Train Epoch 36/40: 100%|########| 40/40 [00:10<00:00, 3.78it/s, loss=1.7837, acc=45.9%, batch=40/40]
Training loss: 1.7121
Training accuracy: 45.91%
Validation Epoch 36/40: 100%|#############################################| 40/40 [00:09<00:00, 4.28it/s]
Validation loss: 1.9522
Validation accuracy: 39.94%
Accuracy on difficult phonemes:
⚠️ Validation loss did not improve (5/5)
L : 18.4% (3428/18658)
R : 23.9% (3875/16223)
OW : 31.3% (4741/15167)
AW : 11.9% (843/7076)
TH : 0.1% (8/5615)
DH : 29.8% (2794/9365)
V : 15.1% (1130/7492)
W : 60.2% (8848/14700)
What accuracy does the model need to reach to meet the algorithm's requirements?
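In practice, alignment takes far too long:
16:16:04,549 - Starting MFA alignment
16:17:22,431 - MFA alignment succeeded (elapsed: 77.88 s)
16:17:22,442 - Starting phoneme scoring
16:17:22,473 - Scoring complete (elapsed: 0.03 s)
Total elapsed: 79.42 s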
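A quick breakdown of the numbers in this log shows where the time goes. The stage durations are copied from the log lines; the "other" entry is simply whatever part of the total is not covered by the two logged stages, which is my own bookkeeping and not something the service reports.

```python
# Stage durations copied from the log, in seconds.
total = 79.42
stages = {"MFA alignment": 77.88, "phoneme scoring": 0.03}

# Time not attributed to either logged stage (hypothetical "other" bucket).
stages["other"] = round(total - sum(stages.values()), 2)

# Share of the total request time per stage, in percent.
shares = {name: round(100 * secs / total, 1) for name, secs in stages.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Alignment accounts for roughly 98% of the request, so any speed-up has to come from the alignment step rather than from scoring.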
Finally, thank you for the analysis. It gave me some ideas.