89 Commits

Author SHA1 Message Date
1631c3040f feat: enhance diarization with optional output of speaker embeddings
- Updated DiarizationPipeline to include a return_embeddings parameter for optional speaker embeddings.
- Modified assign_word_speakers to accept and process speaker embeddings.
- Updated CLI to support --speaker_embeddings flag for JSON output.
- Ensured backward compatibility for existing functionality.
2025-06-24 15:01:09 +02:00
bog
b343241253 feat: add diarize_model arg to CLI (#1101) 2025-05-31 13:32:31 +02:00
7d36b832f9 refactor: update CLI entry point 2025-05-03 09:25:59 +02:00
ac0c8bd79a feat: add version and Python version arguments to CLI 2025-05-01 11:08:54 +02:00
e7712f496e refactor: update import statements to use explicit module paths across multiple files 2025-03-25 16:24:21 +01:00
7b3c9ce629 Add models_cache_only param 2025-01-27 12:16:37 +00:00
79eb8fa53d Accept alternative VAD methods. Extend to use Silero VAD. 2025-01-06 13:41:46 +01:00
9a8967f27e refactor: add type hints 2025-01-05 11:48:24 +01:00
51da22771f feat: add verbose output (#759)
Co-authored-by: Abhishek Sharma <abhishek@zipteams.com>
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
2025-01-01 13:07:52 +01:00
942c336b8f Fixes --model_dir path 2023-12-27 14:03:54 -05:00
f865dfe710 fix typo 2023-12-04 17:38:50 +03:00
afd5ef1d58 FIX warnings for word options 2023-10-31 18:55:35 +01:00
c6fe379d9e Merge pull request #517 from jkukul/support-language-names-as-parameters
Support language names in `--language` parameter.
2023-10-25 11:16:30 -07:00
14a7cab8eb Pass patience and beam_size to faster-whisper. 2023-10-14 13:51:29 +02:00
1001a055db Support language names in --language. 2023-10-10 13:55:47 +02:00
ffd6167b26 Merge pull request #473 from sorgfresser/fix-faster-whisper-threads 2023-09-19 16:53:34 -07:00
0ae0d49d1d add faster whisper threading 2023-09-14 11:47:51 +02:00
5223de2a41 fix: UnboundLocalError: local variable 'align_language' referenced before assignment 2023-08-30 01:11:09 +08:00
f505702dc7 chore(writer): Join words without spaces for ja, zh
fix #248, fix #310
2023-08-30 01:11:09 +08:00
9647f60fca Merge branch 'main' into add-merge-chunk-size-as-argument 2023-08-29 10:05:05 -06:00
eb771cf56d feat: Add merge chunks chunk_size as arguments.
Suggest from https://github.com/m-bain/whisperX/issues/200#issuecomment-1666507780
2023-08-29 23:09:02 +08:00
cb3ed4ab9d Update transcribe.py 2023-08-16 16:22:29 +02:00
48e7caad77 Update transcribe.py -> small change in batch_size description
Changed the description of the `batch_size` parameter.
2023-07-24 11:45:38 +02:00
d39c1b2319 add "aud" to output_format 2023-06-07 11:48:49 +01:00
b026407fd9 Merge branch 'v3' of https://github.com/m-bain/whisperX into v3
Conflicts:
	whisperx/asr.py
2023-06-05 15:30:02 +01:00
a323cff654 --suppress_numerals option, ensures non-numerical words, for wav2vec2 alignment 2023-06-05 15:27:42 +01:00
74b98ebfaa ensure device_index not None 2023-05-20 13:11:30 +02:00
53396adb21 add device_index 2023-05-20 13:02:46 +02:00
fd8f1003cf add translate, fix word_timestamp error 2023-05-13 12:14:06 +01:00
4603f010a5 update readme, setup, add option to return char_timestamps 2023-05-07 20:28:33 +01:00
24008aa1ed fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based) 2023-05-07 15:32:58 +01:00
07361ba1d7 add device to dia pipeline @sorgfresser 2023-05-05 11:53:51 +01:00
4e2ac4e4e9 torch2.0, remove compile for now, round to times to 3 decimal 2023-05-04 20:38:13 +01:00
d8f0ef4a19 Set diarization device manually 2023-05-04 16:25:34 +02:00
601c91140f references #202, attempt to fix speaker diarization failing in v3 2023-04-30 17:33:24 +00:00
0efad26066 pass compute_type 2023-04-24 21:26:44 +01:00
2a29f0ec6a add compute types 2023-04-24 21:24:22 +01:00
558d980535 v3 init 2023-04-24 21:08:43 +01:00
bb15c9428f opti the inference loop 2023-04-09 15:58:55 +08:00
4146e56d5b Added vad_filter type 2023-04-05 17:11:29 +05:00
11a78d7ced handle tmp wav file better 2023-04-01 00:06:40 +01:00
b9ca701d69 .wav conversion, handle audio with no detected speech 2023-03-31 23:02:38 +01:00
d0fa028045 fix tfile naming 2023-03-30 19:24:42 +01:00
ae4a9de307 add vad model external dl 2023-03-30 18:57:55 +01:00
18b63d46e2 skeleton v2 2023-03-30 05:31:57 +01:00
cea42ca470 Fix hugging face error
Model should be loaded with an id to avoid this error:
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'pyannote\segmentation'.
2023-03-04 19:12:13 +01:00
847a3cd85b Merge pull request #96 from smly/fix-batch-processing
FIX: Assertion error in batch processing
2023-02-22 12:11:01 +00:00
57f5957e0e Pass device to pyannote.audio.Inference 2023-02-22 03:48:20 +09:00
27fe502344 Fix assertion error in batch processing 2023-02-22 02:45:13 +09:00
a1d2229416 Improvement to transcription starting point with VAD 2023-02-18 11:12:23 -05:00