Merge branch 'main' of https://github.com/m-bain/whisperX into main

2025-07-01 18:17:27 -04:00 · 2023-01-26 10:46:36 +00:00
parent 16d24b1c96 d20a2a4ea2
commit 7f2159a953
1 changed file with 1 addition and 1 deletion
--- a/README.md
+++ b/README.md
@@ -52,7 +52,7 @@ This repository refines the timestamps of openAI's Whisper model via forced alig
 - VAD filtering: Voice Activity Detection (VAD) from [Pyannote.audio](https://huggingface.co/pyannote/voice-activity-detection) is used as a preprocessing step to remove reliance on whisper timestamps and only transcribe audio segments containing speech. add `--vad_filter` flag, increases timestamp accuracy and robustness (requires more GPU mem due to 30s inputs in wav2vec2)
 - Character level timestamps (see `*.char.ass` file output)
- Diarization (still in beta, add `--diarization`)
+- Diarization (still in beta, add `--diarize`)
 To enable VAD filtering and Diarization, include your Hugging Face access token that you can generate from [Here](https://huggingface.co/settings/tokens) after the `--hf_token` argument and accept the user agreement for the following models: [Segmentation](https://huggingface.co/pyannote/segmentation) , [Voice Activity Detection (VAD)](https://huggingface.co/pyannote/voice-activity-detection) , and [Speaker Diarization](https://huggingface.co/pyannote/speaker-diarization)