Commit db1971b

iftaken authored and yt605155624 committed
update readme and add aistudio demo, test=doc (#2270)
1 parent 9203007 commit db1971b

2 files changed, with 162 additions and 29 deletions

README.md

Lines changed: 158 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -180,69 +180,200 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
180180
## Installation
181181

182182
We strongly recommend installing PaddleSpeech on **Linux** with *python>=3.7* and *paddlepaddle>=2.3.1*.
183-
Up to now, **Linux** supports the CLI for all of our tasks, while **Mac OSX** and **Windows** only support the PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
183+
184+
### **Dependency Introduction**
185+
186+
+ gcc >= 4.8.5
187+
+ paddlepaddle >= 2.3.1
188+
+ python >= 3.7
189+
+ OS support: Linux (recommended), Windows, Mac OSX
190+
191+
PaddleSpeech depends on paddlepaddle. To install it, please refer to the official [paddlepaddle](https://www.paddlepaddle.org.cn/en) website and choose the build that matches your machine. Here is an example for the CPU version.
192+
193+
```bash
194+
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
195+
```
196+
197+
There are two quick ways to install PaddleSpeech: with pip, or by compiling from source (recommended).
198+
### pip install
199+
200+
```shell
201+
pip install pytest-runner
202+
pip install paddlespeech
203+
```
204+
205+
### source code compilation
206+
207+
```shell
208+
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
209+
cd PaddleSpeech
210+
pip install pytest-runner
211+
pip install .
212+
```
213+
214+
For other installation issues, such as conda environments, librosa dependencies, gcc problems, or kaldi installation, you can refer to this [installation document](./docs/source/install.md). If you run into problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and look for related issues there.
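
The minimum versions listed above can be checked before installing. The following is a minimal sketch and not part of PaddleSpeech: the helper name `meets_minimum` is hypothetical, and it assumes plain dotted version strings such as `2.3.1`.

```python
import sys


def meets_minimum(version, minimum):
    """Return True if a dotted version string is at least `minimum`.

    `version` is a string such as "2.3.1"; `minimum` is a tuple of ints.
    Only the leading numeric components are compared, so a suffix such as
    ".post1" is ignored (an assumption made to keep the sketch simple).
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad with zeros so that "2.4" compares correctly against (2, 3, 1).
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts) >= tuple(minimum)


# The README asks for python >= 3.7.
python_ok = sys.version_info[:2] >= (3, 7)
print("python >= 3.7:", python_ok)
```

Once paddlepaddle is installed, the same helper could be applied to its version string, for example `meets_minimum(paddle.__version__, (2, 3, 1))`.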
184215

185216

186217
<a name="quickstart"></a>
187218
## Quick Start
188219

189-
Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text.
220+
Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md) or from Python. Change `--input` to test your own audio or text; 16k wav format audio is supported.
221+
222+
**You can also try it quickly in AI Studio 👉🏻 [PaddleSpeech API Demo](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660876445786)**
223+
224+
225+
Download the test audio samples:
190226

191-
**Audio Classification**
192227
```shell
193-
paddlespeech cls --input input.wav
228+
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
229+
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
194230
```
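
Before passing your own recording to `--input`, it can help to confirm that it is a 16k wav file. The sketch below uses only Python's standard `wave` module; the function names are hypothetical, and it assumes that "16k" refers to a 16000 Hz sample rate in an uncompressed PCM wav file.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM wav file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16k_wav(path):
    """Return True if `path` is a readable PCM wav file sampled at 16000 Hz."""
    try:
        return wav_sample_rate(path) == 16000
    except (wave.Error, EOFError, OSError):
        # Not a PCM wav file the standard library can parse, or unreadable.
        return False


def write_silent_wav(path, sample_rate, num_frames=160):
    """Write a short mono 16-bit silent wav file (handy for trying the check)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
```

For example, `is_16k_wav("zh.wav")` reports whether the downloaded sample matches the expected rate; files at other rates would need resampling first.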
195231

196-
**Speaker Verification**
232+
### Automatic Speech Recognition
233+
234+
<details><summary>&emsp;(Click to expand) Open Source Speech Recognition</summary>
235+
236+
**command line experience**
237+
238+
```shell
239+
paddlespeech asr --lang zh --input zh.wav
197240
```
198-
paddlespeech vector --task spk --input input_16k.wav
241+
242+
**Python API experience**
243+
244+
```python
245+
>>> from paddlespeech.cli.asr.infer import ASRExecutor
246+
>>> asr = ASRExecutor()
247+
>>> result = asr(audio_file="zh.wav")
248+
>>> print(result)
249+
我认为跑步最重要的就是给我带来了身体健康
199250
```
251+
</details>
252+
253+
### Text-to-Speech
254+
255+
<details><summary>&emsp;Open Source Speech Synthesis</summary>
256+
257+
The output is wav format audio at a 24k sample rate.
258+
259+
260+
**command line experience**
200261

201-
**Automatic Speech Recognition**
202262
```shell
203-
paddlespeech asr --lang zh --input input_16k.wav
263+
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
204264
```
205-
- web demo for Automatic Speech Recognition is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [ASR Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR)
206265

207-
**Speech Translation** (English to Chinese)
208-
(not supported on Mac and Windows for now)
266+
**Python API experience**
267+
268+
```python
269+
>>> from paddlespeech.cli.tts.infer import TTSExecutor
270+
>>> tts = TTSExecutor()
271+
>>> tts(text="今天天气十分不错。", output="output.wav")
272+
```
273+
- You can also try it in [Huggingface Spaces](https://huggingface.co/spaces): [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
274+
275+
</details>
276+
277+
### Audio Classification
278+
279+
<details><summary>&emsp;An open-domain sound classification tool</summary>
280+
281+
A sound classification model based on the 527 categories of the AudioSet dataset.
282+
283+
**command line experience**
284+
209285
```shell
210-
paddlespeech st --input input_16k.wav
286+
paddlespeech cls --input zh.wav
211287
```
212288

213-
**Text-to-Speech**
289+
**Python API experience**
290+
291+
```python
292+
>>> from paddlespeech.cli.cls.infer import CLSExecutor
293+
>>> cls = CLSExecutor()
294+
>>> result = cls(audio_file="zh.wav")
295+
>>> print(result)
296+
Speech 0.9027186632156372
297+
```
298+
299+
</details>
300+
301+
### Voiceprint Extraction
302+
303+
<details><summary>&emsp;Industrial-grade voiceprint extraction tool</summary>
304+
305+
**command line experience**
306+
214307
```shell
215-
paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
308+
paddlespeech vector --task spk --input zh.wav
216309
```
217-
- web demo for Text to Speech is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
218310

219-
**Text Postprocessing**
220-
- Punctuation Restoration
221-
```bash
222-
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
223-
```
311+
**Python API experience**
224312

225-
**Batch Process**
313+
```python
314+
>>> from paddlespeech.cli.vector import VectorExecutor
315+
>>> vec = VectorExecutor()
316+
>>> result = vec(audio_file="zh.wav")
317+
>>> print(result) # 187-dimensional vector
318+
[ -0.19083306 9.474295 -14.122263 -2.0916545 0.04848729
319+
4.9295826 1.4780062 0.3733844 10.695862 3.2697146
320+
-4.48199 -0.6617882 -9.170393 -11.1568775 -1.2358263 ...]
226321
```
227-
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
322+
323+
</details>
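
A voiceprint vector is typically used by comparing it with another one. The README does not say which score PaddleSpeech uses, so the sketch below assumes cosine similarity, a common choice for speaker embeddings; the function name is hypothetical and it works on plain sequences of floats.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors.

    The result lies in [-1, 1]; values closer to 1 mean the two
    embeddings point in a more similar direction.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Two embeddings extracted from recordings of the same speaker would be expected to score higher than embeddings from different speakers; any decision threshold has to be chosen for the model and data at hand.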
324+
325+
### Punctuation Restoration
326+
327+
<details><summary>&emsp;Quick recovery of text punctuation, works with ASR models</summary>
328+
329+
**command line experience**
330+
331+
```shell
332+
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
228333
```
229334

230-
**Shell Pipeline**
231-
- ASR + Punctuation Restoration
335+
**Python API experience**
336+
337+
```python
338+
>>> from paddlespeech.cli.text.infer import TextExecutor
339+
>>> text_punc = TextExecutor()
340+
>>> print(text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭"))
341+
今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。
232342
```
233-
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
343+
344+
</details>
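
Because punctuation restoration is meant to work with ASR output, the two steps can be chained. The following is a minimal sketch: `transcribe_with_punctuation` is a hypothetical helper, and it only assumes that the two executors are callable with the keyword arguments shown in the README examples (`audio_file=` and `text=`).

```python
def transcribe_with_punctuation(audio_file, asr, text_punc):
    """Run speech recognition, then restore punctuation in the transcript.

    `asr` is called as asr(audio_file=...) and `text_punc` as
    text_punc(text=...), mirroring the README's Python API examples.
    """
    raw_text = asr(audio_file=audio_file)
    return text_punc(text=raw_text)
```

With PaddleSpeech installed, this could be called as `transcribe_with_punctuation("zh.wav", ASRExecutor(), TextExecutor())`, using the imports from the Automatic Speech Recognition and Punctuation Restoration sections. Passing the executors in as arguments also makes the helper easy to try with stand-in functions.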
345+
346+
### Speech Translation
347+
348+
<details><summary>&emsp;End-to-end English to Chinese Speech Translation Tool</summary>
349+
350+
This uses pre-compiled kaldi-related tools and is only supported on Ubuntu.
351+
352+
**command line experience**
353+
354+
```shell
355+
paddlespeech st --input en.wav
234356
```
235357

236-
For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
358+
**Python API experience**
237359

238-
If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
360+
```python
361+
>>> from paddlespeech.cli.st.infer import STExecutor
362+
>>> st = STExecutor()
363+
>>> print(st(audio_file="en.wav"))
364+
['我 在 这栋 建筑 的 古老 门上 敲门 。']
365+
```
366+
367+
</details>
239368

240369

241370
<a name="quickstartserver"></a>
242371
## Quick Start Server
243372

244373
Developers can try our speech server with the [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).
245374

375+
**You can try it quickly in AI Studio (recommended): [SpeechServer](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660877827034)**
376+
246377
**Start server**
247378

248379
```shell

README_cn.md

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -225,7 +225,7 @@ pip install .
225225

226226
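After installation, developers can get started quickly from the command line or Python. In command-line mode, change `--input` to test your own audio or text; 16k wav format audio is supported.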
227227

228-
You can also try it quickly in `aistudio` 👉🏻[PaddleSpeech API Demo ](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)
228+
You can also try it quickly in `aistudio` 👉🏻[One-click prediction: get started quickly with Speech development tasks](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660878142250)
229229

230230
Test audio sample download
231231
```shell
@@ -373,7 +373,9 @@ python API 一键预测
373373

374374
<a name="快速使用服务"></a>
375375
## Quick Start Server
376-
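After installation, developers can start three services from the command line with a single command: speech recognition, speech synthesis, and audio classification.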
376+
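After installation, developers can start a variety of services from the command line with a single command, including speech recognition, speech synthesis, and audio classification.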
377+
378+
You can try it quickly in AI Studio: [SpeechServer one-click deployment](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660878208266)
377379

378380
**Start server**
379381
```shell
