1631c3040f
feat: enhance diarization with optional output of speaker embeddings
...
- Updated DiarizationPipeline to include a return_embeddings parameter for optional speaker embeddings.
- Modified assign_word_speakers to accept and process speaker embeddings.
- Updated CLI to support --speaker_embeddings flag for JSON output.
- Ensured backward compatibility for existing functionality.
2025-06-24 15:01:09 +02:00
b343241253
feat: add diarize_model arg to CLI (#1101)
2025-05-31 13:32:31 +02:00
7d36b832f9
refactor: update CLI entry point
2025-05-03 09:25:59 +02:00
ac0c8bd79a
feat: add version and Python version arguments to CLI
2025-05-01 11:08:54 +02:00
e7712f496e
refactor: update import statements to use explicit module paths across multiple files
2025-03-25 16:24:21 +01:00
7b3c9ce629
Add models_cache_only param
2025-01-27 12:16:37 +00:00
79eb8fa53d
Accept alternative VAD methods. Extend to use Silero VAD.
2025-01-06 13:41:46 +01:00
9a8967f27e
refactor: add type hints
2025-01-05 11:48:24 +01:00
51da22771f
feat: add verbose output (#759)
...
---------
Co-authored-by: Abhishek Sharma <abhishek@zipteams.com>
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
2025-01-01 13:07:52 +01:00
942c336b8f
Fixes --model_dir path
2023-12-27 14:03:54 -05:00
f865dfe710
fix typo
2023-12-04 17:38:50 +03:00
afd5ef1d58
FIX warnings for word options
2023-10-31 18:55:35 +01:00
c6fe379d9e
Merge pull request #517 from jkukul/support-language-names-as-parameters
...
Support language names in `--language` parameter.
2023-10-25 11:16:30 -07:00
14a7cab8eb
Pass patience and beam_size to faster-whisper.
2023-10-14 13:51:29 +02:00
1001a055db
Support language names in --language.
2023-10-10 13:55:47 +02:00
ffd6167b26
Merge pull request #473 from sorgfresser/fix-faster-whisper-threads
2023-09-19 16:53:34 -07:00
0ae0d49d1d
add faster whisper threading
2023-09-14 11:47:51 +02:00
5223de2a41
fix: UnboundLocalError: local variable 'align_language' referenced before assignment
2023-08-30 01:11:09 +08:00
f505702dc7
chore(writer): Join words without spaces for ja, zh
...
fix #248, fix #310
2023-08-30 01:11:09 +08:00
9647f60fca
Merge branch 'main' into add-merge-chunk-size-as-argument
2023-08-29 10:05:05 -06:00
eb771cf56d
feat: Add merge chunks chunk_size as arguments.
...
Suggest from https://github.com/m-bain/whisperX/issues/200#issuecomment-1666507780
2023-08-29 23:09:02 +08:00
cb3ed4ab9d
Update transcribe.py
2023-08-16 16:22:29 +02:00
48e7caad77
Update transcribe.py -> small change in batch_size description
...
Changed the description of the `batch_size` parameter.
2023-07-24 11:45:38 +02:00
d39c1b2319
add "aud" to output_format
2023-06-07 11:48:49 +01:00
b026407fd9
Merge branch 'v3' of https://github.com/m-bain/whisperX into v3
...
Conflicts:
whisperx/asr.py
2023-06-05 15:30:02 +01:00
a323cff654
--suppress_numerals option, ensures non-numerical words, for wav2vec2 alignment
2023-06-05 15:27:42 +01:00
74b98ebfaa
ensure device_index not None
2023-05-20 13:11:30 +02:00
53396adb21
add device_index
2023-05-20 13:02:46 +02:00
fd8f1003cf
add translate, fix word_timestamp error
2023-05-13 12:14:06 +01:00
4603f010a5
update readme, setup, add option to return char_timestamps
2023-05-07 20:28:33 +01:00
24008aa1ed
fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based)
2023-05-07 15:32:58 +01:00
07361ba1d7
add device to dia pipeline @sorgfresser
2023-05-05 11:53:51 +01:00
4e2ac4e4e9
torch2.0, remove compile for now, round to times to 3 decimal
2023-05-04 20:38:13 +01:00
d8f0ef4a19
Set diarization device manually
2023-05-04 16:25:34 +02:00
601c91140f
references #202, attempt to fix speaker diarization failing in v3
2023-04-30 17:33:24 +00:00
0efad26066
pass compute_type
2023-04-24 21:26:44 +01:00
2a29f0ec6a
add compute types
2023-04-24 21:24:22 +01:00
558d980535
v3 init
2023-04-24 21:08:43 +01:00
bb15c9428f
opti the inference loop
2023-04-09 15:58:55 +08:00
4146e56d5b
Added vad_filter type
2023-04-05 17:11:29 +05:00
11a78d7ced
handle tmp wav file better
2023-04-01 00:06:40 +01:00
b9ca701d69
.wav conversion, handle audio with no detected speech
2023-03-31 23:02:38 +01:00
d0fa028045
fix tfile naming
2023-03-30 19:24:42 +01:00
ae4a9de307
add vad model external dl
2023-03-30 18:57:55 +01:00
18b63d46e2
skeleton v2
2023-03-30 05:31:57 +01:00
cea42ca470
Fix hugging face error
...
Model should be loaded with an id to avoid this error:
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'pyannote\segmentation'.
2023-03-04 19:12:13 +01:00
847a3cd85b
Merge pull request #96 from smly/fix-batch-processing
...
FIX: Assertion error in batch processing
2023-02-22 12:11:01 +00:00
57f5957e0e
Pass device to pyannote.audio.Inference
2023-02-22 03:48:20 +09:00
27fe502344
Fix assertion error in batch processing
2023-02-22 02:45:13 +09:00
a1d2229416
Improvement to transcription starting point with VAD
2023-02-18 11:12:23 -05:00