Merge pull request #246 from m-bain/v3

V3
2025-07-01 18:17:27 -04:00 · 2023-05-13 12:18:09 +01:00
parent 46b416296f 9ffb7e7a23
commit d8a2b4ffc9
7 changed files with 18 additions and 119 deletions
--- a/README.md
+++ b/README.md
@@ -32,12 +32,12 @@
 <!-- <h2 align="left", id="what-is-it">What is it 🔎</h2> -->


-This repository provides fast automatic speaker recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
+This repository provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.

 - ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
 - 🪶 [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires <8GB gpu memory for large-v2 with beam_size=5
 - 🎯 Accurate word-level timestamps using wav2vec2 alignment
- 👯‍♂️ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (labels each segment/word with speaker ID) 
+- 👯‍♂️ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (speaker ID labels) 
 - 🗣️ VAD preprocessing, reduces hallucination & batching with no WER degradation


@@ -74,9 +74,9 @@ GPU execution requires the NVIDIA libraries cuBLAS 11.x and cuDNN 8.x to be inst

 ### 2. Install PyTorch2.0, e.g. for Linux and Windows CUDA11.7:

-`pip3 install torch torchvision torchaudio`
+`conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia`

-See other methods [here.](https://pytorch.org/get-started/locally/)
+See other methods [here.](https://pytorch.org/get-started/previous-versions/#v200)

 ### 3. Install this repo
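Step 2 of this change pins PyTorch 2.0.0 with CUDA 11.7 through conda, so it can help to confirm the pin took effect before installing the repo. The sketch below is illustrative only. It assumes the installed build reports a `torch.__version__` beginning with `2.0.0` (possibly with a local tag such as `+cu117`) and a `torch.version.cuda` of `"11.7"`. The helper names `base_version` and `matches_pin` are hypothetical and are not part of this repository.

```python
"""Hypothetical sanity check that the installed PyTorch matches the pinned versions.

Assumption: the pinned conda install reports torch.__version__ starting with
"2.0.0" and torch.version.cuda == "11.7". Helper names are illustrative only.
"""
from typing import Optional

PINNED_TORCH = "2.0.0"
PINNED_CUDA = "11.7"


def base_version(version: str) -> str:
    """Strip a local build tag such as '+cu117' from a version string."""
    return version.split("+", 1)[0]


def matches_pin(torch_version: str, cuda_version: Optional[str]) -> bool:
    """Return True if the reported torch and CUDA versions match the pins.

    cuda_version is None for CPU-only builds, which never match the CUDA pin.
    """
    return base_version(torch_version) == PINNED_TORCH and cuda_version == PINNED_CUDA


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = matches_pin(torch.__version__, torch.version.cuda)
        verdict = "matches" if ok else "does not match"
        print(f"torch {torch.__version__}, CUDA {torch.version.cuda}: {verdict} the pin")
        # A matching build still needs a visible GPU for CUDA execution.
        print("GPU available:", torch.cuda.is_available())
```

A CPU-only build reports `torch.version.cuda` as `None`, so the check deliberately fails in that case rather than passing silently.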