whisperX

mirror of https://github.com/m-bain/whisperX.git synced 2025-07-01 18:17:27 -04:00

Author	SHA1	Message	Date
Simon	1fc965bc1a	add task, language keyword to transcribe	2023-05-20 15:30:25 +02:00
Max Bain	d8a2b4ffc9	Merge pull request #246 from m-bain/v3 V3	2023-05-13 12:18:09 +01:00
Max Bain	fd8f1003cf	add translate, fix word_timestamp error	2023-05-13 12:14:06 +01:00
Max Bain	7642390d0a	Merge branch 'main' into danish_alignment	2023-05-09 23:10:13 +01:00
Simon	eabf35dff0	Custom result types	2023-05-08 20:45:34 +02:00
Simon	b50aafb17b	Fix tuple unpacking	2023-05-08 20:03:42 +02:00
Max Bain	4603f010a5	update readme, setup, add option to return char_timestamps	2023-05-07 20:28:33 +01:00
Max Bain	24008aa1ed	fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based)	2023-05-07 15:32:58 +01:00
Max Bain	07361ba1d7	add device to dia pipeline @sorgfresser	2023-05-05 11:53:51 +01:00
Max Bain	4e2ac4e4e9	torch2.0, remove compile for now, round to times to 3 decimal	2023-05-04 20:38:13 +01:00
Max Bain	d2116b98ca	Merge pull request #210 from sorgfresser/v3 Update pyannote and torch version	2023-05-04 20:32:06 +01:00
Simon	d8f0ef4a19	Set diarization device manually	2023-05-04 16:25:34 +02:00
Simon	2d59eb9726	Add torch compile to log mel spectrogram	2023-05-03 23:17:44 +02:00
aramlang	cb53661070	Enable Hebrew support	2023-05-03 11:26:12 -05:00
Arnav Mehta	64ca208cc8	Fixed the word_start variable not initialized bug.	2023-05-02 13:13:02 +05:30
Max Bain	e24ca9e0a2	Merge pull request #205 from prashanthellina/v3-fix-diarization	2023-04-30 21:08:45 +01:00
Prashanth Ellina	601c91140f	references #202 , attempt to fix speaker diarization failing in v3	2023-04-30 17:33:24 +00:00
Simon	b9c8c5072b	Pad language detection if audio is too short	2023-04-30 18:34:18 +02:00
Thomas Mol	cb176a186e	added num_workers to fix pickling error	2023-04-29 19:51:05 +02:00
Max Bain	0efad26066	pass compute_type	2023-04-24 21:26:44 +01:00
Max Bain	2a29f0ec6a	add compute types	2023-04-24 21:24:22 +01:00
Max Bain	558d980535	v3 init	2023-04-24 21:08:43 +01:00
Max Bain	da458863d7	allow custom model_dir for torchaudio models	2023-04-14 21:40:36 +01:00
Max Bain	cf252a8592	allow custom path for vad model	2023-04-14 15:02:58 +01:00
m-bain	6a72b61564	clamp end_timestamp to prevent infinite loop	2023-04-11 20:15:37 +01:00
invisprints	bb15c9428f	opti the inference loop	2023-04-09 15:58:55 +08:00
dev-nomi	4146e56d5b	Added vad_filter type	2023-04-05 17:11:29 +05:00
Kevin Dias	70a4a0a25c	Fix typo	2023-04-05 10:50:49 +09:00
m-bain	a582a59493	mkdir for torch cache in case it doesnt exist	2023-04-01 13:05:40 -07:00
Max Bain	189aeac83e	v2 lets goo	2023-04-01 00:10:45 +01:00
Max Bain	11a78d7ced	handle tmp wav file better	2023-04-01 00:06:40 +01:00
Max Bain	b9ca701d69	.wav conversion, handle audio with no detected speech	2023-03-31 23:02:38 +01:00
Max Bain	d0fa028045	fix tfile naming	2023-03-30 19:24:42 +01:00
Max Bain	ae4a9de307	add vad model external dl	2023-03-30 18:57:55 +01:00
Max Bain	18b63d46e2	skeleton v2	2023-03-30 05:31:57 +01:00
Fernando O. Gallego	33dd3b9bcd	Update decoding.py Changes from https://github.com/openai/whisper/pull/914/	2023-03-24 11:56:41 +01:00
Muhammad Shakir	cea42ca470	Fix hugging face error. Model should be loaded with an id to avoid this error: huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'pyannote\segmentation'.	2023-03-04 19:12:13 +01:00
Marcus Brandt	c8404d9805	added a danish alignment model	2023-03-04 13:20:40 +01:00
JCGoran	cfcede41f6	Added Python 3.7 compatibility - removed use of walrus operator in favor of `np.cumsum`	2023-03-02 15:46:07 +01:00
m-bain	847a3cd85b	Merge pull request #96 from smly/fix-batch-processing FIX: Assertion error in batch processing	2023-02-22 12:11:01 +00:00
smly	57f5957e0e	Pass device to pyannote.audio.Inference	2023-02-22 03:48:20 +09:00
smly	27fe502344	Fix assertion error in batch processing	2023-02-22 02:45:13 +09:00
Antoine Dufour	a1d2229416	Improvement to transcription starting point with VAD	2023-02-18 11:12:23 -05:00
arnavmehta7	2e307814dd	added if clause for checking	2023-02-10 14:48:51 +05:30
m-bain	d687cf3358	Merge pull request #58 from MahmoudAshraf97/main added turkish wav2vec2 model	2023-02-01 22:11:51 +00:00
Max Bain	0a3fd11562	update readme	2023-02-01 22:09:11 +00:00
Tengda Han	039af89a86	support batch processing	2023-02-01 19:41:20 +00:00
Mahmoud Ashraf	9f26112d5c	added turkish wav2vec2 model	2023-02-01 21:38:50 +02:00
m-bain	fd2a093754	Merge pull request #55 from jonatasgrosman/main FIX: Error when loading Hugging Face's models with embedded LM	2023-02-01 10:27:45 +00:00
Jonatas Grosman	d294e29ad9	fix: error when loading huggingface model with embedded language model	2023-01-31 23:24:26 -03:00


96 Commits