Problems when running this project on Chinese and Japanese audio

I ran this project on the English test audio, and the output looks exactly as expected:

[english.txt](https://github.com/user-attachments/files/23561402/english.txt)

But with Chinese and Japanese speech, punctuation marks are removed entirely, a space is inserted between every character, and the speech of multiple speakers is all merged together:

[chinese.txt](https://github.com/user-attachments/files/23561410/chinese.txt)
[english.txt](https://github.com/user-attachments/files/23561411/english.txt)
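
To make the spacing symptom concrete, here is a rough Python sketch of the kind of post-processing that would undo the extra spaces. It is only an illustration: `collapse_cjk_spaces` is a hypothetical helper, not part of this project or of ctc-forced-aligner, and the Unicode ranges are an assumption that covers common Chinese and Japanese text rather than every CJK block. It does not restore the missing punctuation or fix the speaker assignment.

```python
import re

# Assumed character ranges: CJK punctuation, Hiragana, Katakana,
# CJK ideographs (incl. Extension A), and full-width forms.
_CJK = r"\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef"

# Whitespace that sits between two CJK characters. Lookarounds are used so
# the neighbouring characters are not consumed and every gap is matched.
_CJK_GAP = re.compile(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])")


def collapse_cjk_spaces(text: str) -> str:
    """Remove whitespace between adjacent CJK characters (hypothetical helper)."""
    return _CJK_GAP.sub("", text)


if __name__ == "__main__":
    print(collapse_cjk_spaces("你 好 世 界"))        # 你好世界
    print(collapse_cjk_spaces("こ ん に ち は"))      # こんにちは
    print(collapse_cjk_spaces("hello world"))        # hello world
```

Spaces between Latin words are left alone, so mixed-language transcripts keep their normal English spacing.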

The ctc-forced-aligner and romanize steps in the project seem to be causing problems, and the NeMo framework is also unable to distinguish the speakers correctly.
Do titanet_large and the other pre-trained models used by the NeMo framework only support English?
I would like to know how to solve this problem.
I am using machine translation, so please bear with any misunderstandings.

The issues that occurred in this project in Chinese and Japanese environments #348
