1c528d1a3c
Merge pull request #284 from prameshbajra/main
2023-05-27 11:19:13 +01:00
c65e7ba9b4
Merge pull request #280 from Thebys/patch-1
2023-05-27 11:18:27 +01:00
5a47f458ac
Added download path parameter.
2023-05-27 11:38:54 +02:00
f1032bb40a
VAD unequal stack size, remove debug change
2023-05-26 20:39:19 +01:00
bc8a03881a
Merge pull request #281 from m-bain/v3
fix Unequal Stack Size VAD error
2023-05-26 20:37:57 +01:00
42b4909bc0
fix Unequal Stack Size VAD error
2023-05-26 20:36:03 +01:00
bb15d6b68e
Add Czech alignment model
This PR adds the following Czech alignment model: https://huggingface.co/comodoro/wav2vec2-xls-r-300m-cs-250 .
I have successfully tested this with several Czech audio recordings with lengths of up to 3 hours, and the results are satisfactory.
However, I have received the following warnings and I am not sure how relevant they are:
```
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file C:\Users\Thebys\.cache\torch\whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0. Bad things might happen unless you revert torch to 1.x.
```
2023-05-26 21:17:01 +02:00
1d9d630fb9
added Korean wav2vec2 model
2023-05-26 20:33:16 +09:00
7c5468116f
Merge branch 'm-bain:main' into transcribe_keywords
2023-05-20 16:03:40 +02:00
a1c705b3a7
fix tokenizer is None
2023-05-20 15:52:45 +02:00
715435db42
add tokenizer is None case
2023-05-20 15:42:21 +02:00
1fc965bc1a
add task, language keyword to transcribe
2023-05-20 15:30:25 +02:00
74b98ebfaa
ensure device_index not None
2023-05-20 13:11:30 +02:00
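The commit does not show the guard itself. Below is a minimal sketch of a None check. It assumes the value is eventually forwarded to faster-whisper's `WhisperModel`, whose `device_index` parameter accepts an int or a list of ints and defaults to 0. The helper name `resolve_device_index` is hypothetical.

```python
from typing import List, Optional, Union

DeviceIndex = Union[int, List[int]]


def resolve_device_index(device_index: Optional[DeviceIndex]) -> DeviceIndex:
    """Hypothetical helper: replace a missing (None) device_index with device 0."""
    # Callers may pass None when the user gave no explicit index;
    # fall back to the first device instead of forwarding None downstream.
    return 0 if device_index is None else device_index


print(resolve_device_index(None))    # 0
print(resolve_device_index(1))       # 1
print(resolve_device_index([0, 1]))  # [0, 1]
```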
53396adb21
add device_index
2023-05-20 13:02:46 +02:00
d8a2b4ffc9
Merge pull request #246 from m-bain/v3
V3
2023-05-13 12:18:09 +01:00
fd8f1003cf
add translate, fix word_timestamp error
2023-05-13 12:14:06 +01:00
7642390d0a
Merge branch 'main' into danish_alignment
2023-05-09 23:10:13 +01:00
eabf35dff0
Custom result types
2023-05-08 20:45:34 +02:00
b50aafb17b
Fix tuple unpacking
2023-05-08 20:03:42 +02:00
4603f010a5
update readme, setup, add option to return char_timestamps
2023-05-07 20:28:33 +01:00
24008aa1ed
fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based)
2023-05-07 15:32:58 +01:00
07361ba1d7
add device to dia pipeline @sorgfresser
2023-05-05 11:53:51 +01:00
4e2ac4e4e9
torch2.0, remove compile for now, round times to 3 decimals
2023-05-04 20:38:13 +01:00
d2116b98ca
Merge pull request #210 from sorgfresser/v3
Update pyannote and torch version
2023-05-04 20:32:06 +01:00
d8f0ef4a19
Set diarization device manually
2023-05-04 16:25:34 +02:00
2d59eb9726
Add torch compile to log mel spectrogram
2023-05-03 23:17:44 +02:00
cb53661070
Enable Hebrew support
2023-05-03 11:26:12 -05:00
64ca208cc8
Fixed bug where the word_start variable was not initialized.
2023-05-02 13:13:02 +05:30
e24ca9e0a2
Merge pull request #205 from prashanthellina/v3-fix-diarization
2023-04-30 21:08:45 +01:00
601c91140f
references #202, attempt to fix speaker diarization failing in v3
2023-04-30 17:33:24 +00:00
b9c8c5072b
Pad language detection if audio is too short
2023-04-30 18:34:18 +02:00
cb176a186e
added num_workers to fix pickling error
2023-04-29 19:51:05 +02:00
0efad26066
pass compute_type
2023-04-24 21:26:44 +01:00
2a29f0ec6a
add compute types
2023-04-24 21:24:22 +01:00
558d980535
v3 init
2023-04-24 21:08:43 +01:00
da458863d7
allow custom model_dir for torchaudio models
2023-04-14 21:40:36 +01:00
cf252a8592
allow custom path for vad model
2023-04-14 15:02:58 +01:00
6a72b61564
clamp end_timestamp to prevent infinite loop
2023-04-11 20:15:37 +01:00
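The commit message does not include the loop it fixes. The sketch below only illustrates the general technique under stated assumptions: a hypothetical sliding-window loop (`seek_positions`) where each next window starts at the end timestamp of the last decoded segment, and `predict_end` is a stand-in for the model. Clamping the end timestamp guarantees the loop always advances.

```python
from typing import Callable, List


def seek_positions(
    duration: float,
    predict_end: Callable[[float], float],
    window: float = 30.0,
    min_step: float = 0.02,
) -> List[float]:
    """Hypothetical sliding-window loop over `duration` seconds of audio.

    `predict_end(start)` stands in for a model that returns the absolute
    end timestamp (seconds) of the last segment decoded from the window
    that begins at `start`.
    """
    starts = []
    start = 0.0
    while start < duration:
        starts.append(start)
        end = predict_end(start)
        # Clamp: the next window starts at least `min_step` later and at
        # most one full window later, so a bad timestamp (e.g. end <= start)
        # cannot stall the loop forever.
        end = min(max(end, start + min_step), start + window)
        start = end
    return starts


# A degenerate predictor that always reports 0.0 still terminates
print(seek_positions(1.0, lambda s: 0.0, min_step=0.25))  # [0.0, 0.25, 0.5, 0.75]
# An overshooting predictor is capped at one window per step
print(seek_positions(90.0, lambda s: 1e9))  # [0.0, 30.0, 60.0]
```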
bb15c9428f
optimize the inference loop
2023-04-09 15:58:55 +08:00
4146e56d5b
Added vad_filter type
2023-04-05 17:11:29 +05:00
70a4a0a25c
Fix typo
2023-04-05 10:50:49 +09:00
a582a59493
mkdir for torch cache in case it doesn't exist
2023-04-01 13:05:40 -07:00
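A minimal sketch of creating a cache directory only when it is missing. The exact cache path used by the commit is not shown. In real code it might come from `torch.hub.get_dir()`, but that is an assumption, so the example uses a temporary directory to stay self-contained. `ensure_cache_dir` is a hypothetical name.

```python
import os
import tempfile


def ensure_cache_dir(path: str) -> str:
    """Hypothetical helper: create `path` (and parents) if it does not exist."""
    # exist_ok=True makes this a no-op when the directory already exists,
    # avoiding a race between an os.path.exists() check and the mkdir call.
    os.makedirs(path, exist_ok=True)
    return path


base = tempfile.mkdtemp()
cache = os.path.join(base, "torch", "hub")
ensure_cache_dir(cache)
ensure_cache_dir(cache)  # calling it a second time is safe
print(os.path.isdir(cache))  # True
```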
189aeac83e
v2 lets goo
2023-04-01 00:10:45 +01:00
11a78d7ced
handle tmp wav file better
2023-04-01 00:06:40 +01:00
b9ca701d69
.wav conversion, handle audio with no detected speech
2023-03-31 23:02:38 +01:00
d0fa028045
fix tfile naming
2023-03-30 19:24:42 +01:00
ae4a9de307
add vad model external dl
2023-03-30 18:57:55 +01:00
18b63d46e2
skeleton v2
2023-03-30 05:31:57 +01:00
33dd3b9bcd
Update decoding.py
Changes from https://github.com/openai/whisper/pull/914/
2023-03-24 11:56:41 +01:00
cea42ca470
Fix hugging face error
Model should be loaded with an id to avoid this error:
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'pyannote\segmentation'.
2023-03-04 19:12:13 +01:00
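The id in the error above contains a backslash. One plausible way that happens (an assumption, not confirmed by the commit) is building the id with `os.path.join`, which uses `ntpath` on Windows. The sketch shows that effect, plus `looks_like_repo_id`, a hypothetical and simplified check that approximates the rules quoted in the error message. It is not the `huggingface_hub` validator itself.

```python
import ntpath
import re

# On Windows, os.path is ntpath, so joining parts inserts a backslash
windows_style = ntpath.join("pyannote", "segmentation")
print(windows_style)  # pyannote\segmentation

# Each part: alphanumerics, '-', '_' or '.', not starting or ending with '-' or '.'
_PART = re.compile(r"^(?![-.])[A-Za-z0-9_.-]+(?<![-.])$")


def looks_like_repo_id(repo_id: str) -> bool:
    """Hypothetical approximation of the repo-id rules in the error message."""
    if len(repo_id) > 96 or "--" in repo_id or ".." in repo_id:
        return False
    parts = repo_id.split("/")  # optional "<namespace>/<name>" form
    if len(parts) > 2:
        return False
    return all(_PART.match(p) for p in parts)


print(looks_like_repo_id(windows_style))            # False
print(looks_like_repo_id("pyannote/segmentation"))  # True
```

Writing the id as a literal string with a forward slash, rather than building it as a filesystem path, sidesteps the platform-dependent separator.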