whisperX

mirror of https://github.com/m-bain/whisperX.git synced 2025-07-01 18:17:27 -04:00

Author	SHA1	Message	Date
Radu-Sebastian Amarie	1631c3040f	feat: enhance diarization with optional output of speaker embeddings - Updated DiarizationPipeline to include a return_embeddings parameter for optional speaker embeddings. - Modified assign_word_speakers to accept and process speaker embeddings. - Updated CLI to support --speaker_embeddings flag for JSON output. - Ensured backward compatibility for existing functionality.	2025-06-24 15:01:09 +02:00
bog	b343241253	feat: add diarize_model arg to CLI (#1101)	2025-05-31 13:32:31 +02:00
Barabazs	7d36b832f9	refactor: update CLI entry point	2025-05-03 09:25:59 +02:00
Barabazs	ac0c8bd79a	feat: add version and Python version arguments to CLI	2025-05-01 11:08:54 +02:00
Barabazs	e7712f496e	refactor: update import statements to use explicit module paths across multiple files	2025-03-25 16:24:21 +01:00
philmcmahon	7b3c9ce629	Add models_cache_only param	2025-01-27 12:16:37 +00:00
3manifold	79eb8fa53d	Accept alternative VAD methods. Extend to use Silero VAD.	2025-01-06 13:41:46 +01:00
Barabazs	9a8967f27e	refactor: add type hints	2025-01-05 11:48:24 +01:00
Abhishek Sharma	51da22771f	feat: add verbose output (#759) --------- Co-authored-by: Abhishek Sharma <abhishek@zipteams.com> Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>	2025-01-01 13:07:52 +01:00
canoalberto	942c336b8f	Fixes --model_dir path	2023-12-27 14:03:54 -05:00
Mahmoud Ashraf	f865dfe710	fix typo	2023-12-04 17:38:50 +03:00
amosal	afd5ef1d58	FIX warnings for word options	2023-10-31 18:55:35 +01:00
Max Bain	c6fe379d9e	Merge pull request #517 from jkukul/support-language-names-as-parameters Support language names in `--language` parameter.	2023-10-25 11:16:30 -07:00
Jakub Kukul	14a7cab8eb	Pass patience and beam_size to faster-whisper.	2023-10-14 13:51:29 +02:00
Jakub Kukul	1001a055db	Support language names in --language.	2023-10-10 13:55:47 +02:00
Max Bain	ffd6167b26	Merge pull request #473 from sorgfresser/fix-faster-whisper-threads	2023-09-19 16:53:34 -07:00
Simon Sorg	0ae0d49d1d	add faster whisper threading	2023-09-14 11:47:51 +02:00
陳鈞	5223de2a41	fix: UnboundLocalError: local variable 'align_language' referenced before assignment	2023-08-30 01:11:09 +08:00
陳鈞	f505702dc7	chore(writer): Join words without spaces for ja, zh fix #248, fix #310	2023-08-30 01:11:09 +08:00
Max Bain	9647f60fca	Merge branch 'main' into add-merge-chunk-size-as-argument	2023-08-29 10:05:05 -06:00
陳鈞	eb771cf56d	feat: Add merge chunks chunk_size as arguments. Suggest from https://github.com/m-bain/whisperX/issues/200#issuecomment-1666507780	2023-08-29 23:09:02 +08:00
awerks	cb3ed4ab9d	Update transcribe.py	2023-08-16 16:22:29 +02:00
Mark Berger	48e7caad77	Update transcribe.py -> small change in `batch_size` description Changed the description of the `batch_size` parameter.	2023-07-24 11:45:38 +02:00
Max Bain	d39c1b2319	add "aud" to output_format	2023-06-07 11:48:49 +01:00
Max Bain	b026407fd9	Merge branch 'v3' of https://github.com/m-bain/whisperX into v3 Conflicts: whisperx/asr.py	2023-06-05 15:30:02 +01:00
Max Bain	a323cff654	--suppress_numerals option, ensures non-numerical words, for wav2vec2 alignment	2023-06-05 15:27:42 +01:00
Simon	74b98ebfaa	ensure device_index not None	2023-05-20 13:11:30 +02:00
Simon	53396adb21	add device_index	2023-05-20 13:02:46 +02:00
Max Bain	fd8f1003cf	add translate, fix word_timestamp error	2023-05-13 12:14:06 +01:00
Max Bain	4603f010a5	update readme, setup, add option to return char_timestamps	2023-05-07 20:28:33 +01:00
Max Bain	24008aa1ed	fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based)	2023-05-07 15:32:58 +01:00
Max Bain	07361ba1d7	add device to dia pipeline @sorgfresser	2023-05-05 11:53:51 +01:00
Max Bain	4e2ac4e4e9	torch2.0, remove compile for now, round to times to 3 decimal	2023-05-04 20:38:13 +01:00
Simon	d8f0ef4a19	Set diarization device manually	2023-05-04 16:25:34 +02:00
Prashanth Ellina	601c91140f	references #202, attempt to fix speaker diarization failing in v3	2023-04-30 17:33:24 +00:00
Max Bain	0efad26066	pass compute_type	2023-04-24 21:26:44 +01:00
Max Bain	2a29f0ec6a	add compute types	2023-04-24 21:24:22 +01:00
Max Bain	558d980535	v3 init	2023-04-24 21:08:43 +01:00
invisprints	bb15c9428f	opti the inference loop	2023-04-09 15:58:55 +08:00
dev-nomi	4146e56d5b	Added vad_filter type	2023-04-05 17:11:29 +05:00
Max Bain	11a78d7ced	handle tmp wav file better	2023-04-01 00:06:40 +01:00
Max Bain	b9ca701d69	.wav conversion, handle audio with no detected speech	2023-03-31 23:02:38 +01:00
Max Bain	d0fa028045	fix tfile naming	2023-03-30 19:24:42 +01:00
Max Bain	ae4a9de307	add vad model external dl	2023-03-30 18:57:55 +01:00
Max Bain	18b63d46e2	skeleton v2	2023-03-30 05:31:57 +01:00
Muhammad Shakir	cea42ca470	Fix hugging face error Model should be loaded with an id to avoid this error: huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'pyannote\segmentation'.	2023-03-04 19:12:13 +01:00
m-bain	847a3cd85b	Merge pull request #96 from smly/fix-batch-processing FIX: Assertion error in batch processing	2023-02-22 12:11:01 +00:00
smly	57f5957e0e	Pass device to pyannote.audio.Inference	2023-02-22 03:48:20 +09:00
smly	27fe502344	Fix assertion error in batch processing	2023-02-22 02:45:13 +09:00
Antoine Dufour	a1d2229416	Improvement to transcription starting point with VAD	2023-02-18 11:12:23 -05:00

89 Commits