c89b4f898f
fix: incorrect type annotation in get_writer return value
...
The audio_path parameter that the __call__ method of the ResultWriter class takes is a str, not a TextIO
2025-05-13 02:45:33 +02:00
0aed874589
Remove duplicated item
...
"lv": "latvian"
2025-04-12 11:08:15 +02:00
36d2622e27
feat: add Latvian align model
2025-01-25 09:45:17 +01:00
f286e7f3de
refactor: improve type hints and clean up imports
2025-01-13 10:45:50 +01:00
26d9b46888
feat: include speaker information in WriteTXT when diarizing
2025-01-05 18:21:34 +01:00
4acbdd75be
add "yue" to supported languages that was added along with Large-V3
2023-12-04 17:27:54 +03:00
d4a600b568
REMOVE duplicated code
2023-10-31 18:55:50 +01:00
c6d9e6cb67
chore(writer): improve text display(ja etc) in json file
2023-09-10 22:02:47 +08:00
f505702dc7
chore(writer): Join words without spaces for ja, zh
...
fix #248 , fix #310
2023-08-30 01:11:09 +08:00
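For context on the commit above: Japanese and Chinese text is written without spaces between words, so joining word-level output with a space produces unnatural text. A minimal sketch of the idea, assuming a list of word strings and a language code; the names LANGUAGES_WITHOUT_SPACES and join_words are illustrative and not taken from the commit itself:

```python
# Illustrative set of language codes whose words are joined without spaces
LANGUAGES_WITHOUT_SPACES = {"ja", "zh"}


def join_words(words, language):
    """Join word strings with a space, except for languages written without spaces."""
    separator = "" if language in LANGUAGES_WITHOUT_SPACES else " "
    return separator.join(words)
```

The separator is chosen once per call, so the same function serves both space-delimited and non-space-delimited languages.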
a8bfac6bef
Merge pull request #427 from awerks/main
...
Update alignment.py
2023-08-29 10:03:46 -06:00
cc81ab7db7
fix missing prefix
...
Fixed the missing speaker prefix when --highlight_words is enabled
2023-08-25 12:08:16 +08:00
d2d840f06c
Update utils.py
2023-08-17 14:45:23 +02:00
0767597bff
fix writer fail on segments 0
2023-08-17 14:18:16 +02:00
b13778fefd
make aud optional
2023-06-07 11:47:49 +01:00
076ff96eb2
Add Audacity export
...
This exports the transcript to a text file that can be imported directly into Audacity as a label file. This is useful for quickly checking the transcript-audio alignment.
2023-06-07 05:49:49 +02:00
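For context on the Audacity export commit above: an Audacity label file is plain text with one label per line, holding the start time and end time in seconds and the label text, separated by tabs. A minimal sketch of such an export, assuming segments are dicts with start, end, and text keys; the function name and the segment shape are illustrative, not the code from the commit:

```python
def segments_to_audacity_labels(segments):
    """Render segments as Audacity label lines: start<TAB>end<TAB>text."""
    lines = []
    for seg in segments:
        # Collapse tabs and newlines so each label stays on a single line
        text = " ".join(seg["text"].split())
        lines.append(f"{seg['start']:.6f}\t{seg['end']:.6f}\t{text}\n")
    return "".join(lines)
```

Writing the returned string to a .txt file gives something that can be loaded as a label track, which makes it easy to compare segment boundaries against the waveform.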
24008aa1ed
fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based)
2023-05-07 15:32:58 +01:00
558d980535
v3 init
2023-04-24 21:08:43 +01:00
70a4a0a25c
Fix typo
2023-04-05 10:50:49 +09:00
11a78d7ced
handle tmp wav file better
2023-04-01 00:06:40 +01:00
b9ca701d69
.wav conversion, handle audio with no detected speech
2023-03-31 23:02:38 +01:00
18b63d46e2
skeleton v2
2023-03-30 05:31:57 +01:00
0a3fd11562
update readme
2023-02-01 22:09:11 +00:00
5b8c8a7bd3
pandas fix
2023-01-27 15:05:08 +00:00
286a2f2c14
clean up logic, use pandas where possible
2023-01-25 18:42:52 +00:00
eec6d1f8d8
missing word timestamps
2023-01-24 16:37:19 +00:00
d1600e5b0f
Merge branch 'main' of https://github.com/m-bain/whisperX into main
...
Conflicts:
whisperx/transcribe.py
whisperx/utils.py
2023-01-24 15:38:05 +00:00
d395c21b83
new logic, diarization, vad filtering
2023-01-24 15:02:08 +00:00
ba102feb7f
vad filter
2023-01-20 12:54:20 +00:00
4569cb982a
fix file_ass display bug
...
The sentence start time in .ass files had a bug: if the first word did not have a timestamp, the sentence start_time was set to 0. This needs to be the local 0, not the actual file 0 (i.e. it should be segment['start']).
2023-01-12 12:57:12 +00:00
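The fix described in the commit above can be sketched as a fallback rule. This assumes a segment dict with a 'start' key and a hypothetical 'words' list whose entries may lack a 'start' timestamp; the function name and the 'words' layout are illustrative, not the project's actual data structures:

```python
def sentence_start(segment):
    """Return the sentence start time, falling back to the segment start."""
    words = segment.get("words", [])
    if words and words[0].get("start") is not None:
        return words[0]["start"]
    # First word has no timestamp: use the segment-local start, not file time 0
    return segment["start"]
```

Falling back to segment['start'] keeps the subtitle anchored to its own segment instead of jumping to the beginning of the file.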
45e9509227
multilingual init
2022-12-18 12:21:24 +00:00
645d55903a
add .ass output
2022-12-17 17:24:48 +00:00
9f6fa61160
init commit
2022-12-14 18:59:12 +00:00