Description
Tested versions
Reproducible: pyannote 4.04
System information
Linux Ubuntu 22.04, GPU: 4090, CUDA installer: cuda_12.8.1_570.124.06_linux
Issue description
Thanks for your great work on this project. I'm having trouble reproducing the reported performance on the AliMeeting dataset. When comparing my reproduced output with the official results shown at https://huggingface.co/pyannote/speaker-diarization-3.1, I observed a significant DER gap: for example, on audio R8002_M8002_MS802, my DER reaches 12% (and 8% when limited to the first 180 seconds, due to file size constraints). Could you help clarify whether there are any specific settings or preprocessing steps I might be missing?
The inference code:
# instantiate the pipeline
import os

import torch

from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    cache_dir="/data/data/pretrained_model",
).to(torch.device("cuda"))

audio_path = "/data/code/pyannote-audio/out/R8002_M8002_MS802_1ch_cut.wav"

# run the pipeline on the audio file
print(f"Running pipeline on {audio_path}...")
diarization = pipeline(audio_path)
print("end of pipeline")

# write the result to disk in RTTM format
# (the RTTM URI field should be the file stem, without the .wav extension,
#  so that scoring tools can match it against the reference)
print("writing output to disk...")
output_path = "/data/code/pyannote-audio/out/inference_reproduce_cut180s.rttm"
uri = os.path.splitext(os.path.basename(audio_path))[0]
with open(output_path, "w") as f:
    # iterate over (segment, track, label) triples of the diarization result
    for turn, _, speaker in diarization.speaker_diarization.itertracks(yield_label=True):
        f.write(
            f"SPEAKER {uri} 1 {turn.start:.3f} {turn.end - turn.start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>\n"
        )
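For comparing the two RTTM files side by side, a small pure-Python helper (a hypothetical utility, not part of pyannote) can parse the SPEAKER lines back into (start, end, speaker) turns:

```python
def parse_rttm(lines):
    """Parse RTTM SPEAKER lines into (start, end, speaker) tuples."""
    turns = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])      # field 4: turn onset (seconds)
        duration = float(fields[4])   # field 5: turn duration (seconds)
        turns.append((start, start + duration, fields[7]))  # field 8: speaker label
    return turns

# example lines in the same shape as the script's output (made-up values)
example = [
    "SPEAKER R8002_M8002_MS802_1ch_cut 1 0.031 2.250 <NA> <NA> SPEAKER_00 <NA> <NA>",
    "SPEAKER R8002_M8002_MS802_1ch_cut 1 2.500 1.000 <NA> <NA> SPEAKER_01 <NA> <NA>",
]
print(parse_rttm(example))
# → [(0.031, 2.281, 'SPEAKER_00'), (2.5, 3.5, 'SPEAKER_01')]
```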
The audio file is 180 s long:
R8002_M8002_MS802_1ch_cut.wav
The uv environment:
uv.lock.txt
pyproject.txt
The output I get:
inference_reproduce_cut180s_rttm.txt
The result reported in the benchmark at https://huggingface.co/pyannote/speaker-diarization-3.1:
benchmark_cut180s_rttm.txt
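The model card's numbers are computed with pyannote.metrics' DiarizationErrorRate. As a quick library-free sanity check before scoring properly, a simplified sketch (my own, with made-up interval values) can compare only speech coverage between the two files: it ignores speaker confusion, optimal mapping, and collars, so it only bounds the missed-speech and false-alarm components of DER:

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out

def total(intervals):
    """Total duration covered by a disjoint interval list."""
    return sum(e - s for s, e in intervals)

def overlap(a, b):
    """Total duration where two disjoint interval lists intersect."""
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1 in a for s2, e2 in b
    )

# made-up reference and hypothesis speech regions (seconds)
ref = merge([(0.0, 5.0), (6.0, 10.0)])
hyp = merge([(0.0, 4.0), (6.5, 10.5)])

inter = overlap(ref, hyp)
missed = total(ref) - inter        # reference speech not covered by hypothesis
false_alarm = total(hyp) - inter   # hypothesis speech outside the reference
der_lower_bound = (missed + false_alarm) / total(ref)
print(round(der_lower_bound, 3))
# → 0.222
```

If even this coverage-only bound is large, the gap is in voice activity detection rather than speaker assignment; otherwise the confusion term (which this sketch does not measure) dominates.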
Minimal reproduction example (MRE)
no link