Implementation of the paper *WhisperNER: Unified Open Named Entity and Speech Recognition*. WhisperNER is a unified model for automatic speech recognition (ASR) and named entity recognition (NER) with zero-shot capabilities. It is designed as a strong base model for the downstream task of ASR with NER, and can be fine-tuned on specific datasets for improved performance.
- 📄 Paper: WhisperNER: Unified Open Named Entity and Speech Recognition.
- 🤗 Demo: Check out the demo here.
- 🤗 WhisperNER model collection.
- 📊 Datasets:
  - Voxpopuli-NER-EN: a dataset for zero-shot NER evaluation built on the VoxPopuli dataset. The VoxPopuli data is released under the CC0 license, with the European Parliament's legal disclaimer (see the European Parliament's legal notice for the raw data).
Start by creating a virtual environment and activating it:

```bash
conda create -n whisper-ner python=3.10 -y
conda activate whisper-ner
pip install torch==2.2.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
```
Then clone the repository and install the package:

```bash
git clone https://github.com/aiola-lab/whisper-ner.git
cd whisper-ner
pip install -e .
```
The dataset should be a JSON file with the keys `text`, `audio`, and `ner`. Each `ner` entry is a list of the form `[start_char, end_char, entity_tag, entity_text, tag_description]`, where the character offsets index into `text`:
```json
[
  {
    "text": "The cost of HIV, TB and HCV medicines for treatment and for prevention varies from one country to another.",
    "audio": "test_part_0/20170703-0900-PLENARY-18-en_20170703-19:41:40_6.wav",
    "ner": [
      [12, 15, "Disease", "HIV", "A virus that attacks the immune system and can lead to AIDS if not treated."],
      [17, 19, "Disease", "TB", "Tuberculosis, a bacterial infection that primarily affects the lungs but can also affect other parts of the body."],
      [24, 27, "Disease", "HCV", "Hepatitis C virus, a viral infection that causes liver inflammation, sometimes leading to serious liver damage."],
      [28, 37, "Healthcare Product", "medicines", "Substances used to treat or prevent disease and improve health."],
      ...
    ]
  },
  ...
]
```
An example of a dataset can be found in Voxpopuli-NER-EN.
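Since the character offsets in each `ner` entry must index into `text` exactly, a small validation pass can catch misaligned annotations before training. This helper is illustrative and not part of the repository:

```python
import json

def validate_ner_dataset(path):
    """Check that each NER span's character offsets match its quoted surface text."""
    with open(path) as f:
        samples = json.load(f)
    for i, sample in enumerate(samples):
        text = sample["text"]
        for start, end, tag, surface, description in sample["ner"]:
            span = text[start:end]
            assert span == surface, (
                f"Sample {i}: span [{start}:{end}] reads {span!r}, expected {surface!r}"
            )
    return len(samples)  # number of validated samples
```

In the example above, `text[12:15]` is `"HIV"` and `text[28:37]` is `"medicines"`, so end offsets are exclusive.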
To train the model, run the following command (or modify it according to your needs):
```bash
python whisper_ner/trainer.py \
  --whisper-model-name aiola/whisper-ner-v1 \
  --lr 1e-06 \
  --batch-size 4 \
  --gradient-accumulation-steps 1 \
  --eval-steps 500 \
  --save-steps 500 \
  --max-steps 10000 \
  --fp16 False \
  --use-lora False \
  --lora-merge-and-unload False \
  --max-eval-samples=1000 \
  --entity-dropout-prob=0.1 \
  --n-neg-samples 2 \
  --output-path <output-dir> \
  --parts-to-freeze encoder \
  --predict-with-generate False \
  --audio-root-dir <root-of-audio-file> \
  --test-data-path <test-json-path> \
  --train-data-path <train-json-path> \
  --validation-data-path <val-json-path> \
  --wandb-logging true \
  --exp-name <wandb-exp-name> \
  --wandb-entity <wandb-entity>
```
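The `--n-neg-samples` and `--entity-dropout-prob` flags shape how entity tags are presented during training: tags absent from an example can be sampled as negatives (the model should transcribe the audio without tagging them), and gold tags can occasionally be dropped. A minimal sketch of that idea; the function name and prompt layout are illustrative, not the repository's actual implementation:

```python
import random

def build_tag_prompt(gold_tags, all_tags, n_neg_samples=2,
                     entity_dropout_prob=0.1, rng=random):
    """Illustrative sketch: assemble the entity-tag prompt for one training example.

    Each gold tag is dropped with probability `entity_dropout_prob`, and up to
    `n_neg_samples` tags not present in the example are added as negatives.
    """
    kept = [t for t in gold_tags if rng.random() > entity_dropout_prob]
    pool = [t for t in all_tags if t not in gold_tags]  # candidate negatives
    negatives = rng.sample(pool, k=min(n_neg_samples, len(pool)))
    prompt_tags = kept + negatives
    rng.shuffle(prompt_tags)
    return ", ".join(prompt_tags)
```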
We provide inference code in `whisper_ner/inference.py`. Usage example:
```bash
python whisper_ner/inference.py \
  --model-path "aiola/whisper-ner-v1" \
  --audio-file-path <audio-file-path> \
  --prompt <prompt> \
  --entity-bias 0.
```

`--prompt` takes comma-separated entity tags, e.g. `"person, company"`. `--entity-bias` adds a bias to the logit of the entity-start token (`<`) for better control over the precision-recall trade-off: a negative value favors precision over recall, and a positive value favors recall over precision.
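Mechanically, the entity bias amounts to shifting one token's logit before each decoding step. A pure-Python sketch of the idea (the function and token id are illustrative, not the repository's actual code):

```python
def bias_entity_open_logits(logits, entity_open_token_id, bias):
    """Add a constant to the entity-opening token's logit.

    A negative bias makes opening an entity span less likely (favors precision);
    a positive bias makes it more likely (favors recall). Other logits are unchanged.
    """
    out = list(logits)  # copy so the caller's logits are untouched
    out[entity_open_token_id] += bias
    return out
```

Applied at every decoding step, this nudges the model toward or away from emitting the `<` that opens a tagged span, without otherwise changing the transcription.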
If you find our work or this code to be useful in your own research, please consider citing the following paper:
```bibtex
@article{ayache2024whisperner,
  title={WhisperNER: Unified Open Named Entity and Speech Recognition},
  author={Ayache, Gil and Pirchi, Menachem and Navon, Aviv and Shamsian, Aviv and Hetz, Gill and Keshet, Joseph},
  journal={arXiv preprint arXiv:2409.08107},
  year={2024}
}
```