add .ass output

This commit is contained in:
Max Bain
2022-12-17 17:24:48 +00:00
parent 938341c05a
commit 645d55903a
8 changed files with 462 additions and 42 deletions

View File

@ -29,11 +29,12 @@ Run whisper on example segment (using default params)
`whisperx examples/sample01.wav --model medium.en --output examples/whisperx --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --align_extend 2` `whisperx examples/sample01.wav --model medium.en --output examples/whisperx --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --align_extend 2`
Outputs both word-level, and phrase level. If the speech is non-english, select an alternative ASR phoneme model from this list https://pytorch.org/audio/stable/pipelines.html#id14
Expected outputs:
Example:
### Qualitative Results:
Using normal whisper out of the box, many transcriptions are out of sync: Using normal whisper out of the box, many transcriptions are out of sync:
https://user-images.githubusercontent.com/36994049/207743923-b4f0d537-29ae-4be2-b404-bb941db73652.mov https://user-images.githubusercontent.com/36994049/207743923-b4f0d537-29ae-4be2-b404-bb941db73652.mov
@ -51,10 +52,10 @@ https://user-images.githubusercontent.com/36994049/207744104-ff4faca1-1bb8-41c9-
<h2 align="left">Limitations ⚠️</h2> <h2 align="left">Limitations ⚠️</h2>
- Hacked this up quite quickly, there might be some errors, please raise an issue if you encounter any. - Currently only tested for ENGLISH language. Check
- Currently only working and tested for ENGLISH language.
- Whisper normalises spoken numbers e.g. "fifty seven" to arabic numerals "57". Need to perform this normalization after alignment, so the phonemes can be aligned. Currently just ignores numbers. - Whisper normalises spoken numbers e.g. "fifty seven" to arabic numerals "57". Need to perform this normalization after alignment, so the phonemes can be aligned. Currently just ignores numbers.
- Assumes the initial whisper timestamps are accurate to some degree (within margin of 2 seconds, adjust if needed -- bigger margins more prone to alignment errors) - Assumes the initial whisper timestamps are accurate to some degree (within margin of 2 seconds, adjust if needed -- bigger margins more prone to alignment errors)
- Hacked this up quite quickly, there might be some errors, please raise an issue if you encounter any.
<h2 align="left">Coming Soon 🗓</h2> <h2 align="left">Coming Soon 🗓</h2>

BIN
examples/sample01.mp4 Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -0,0 +1,290 @@
[Script Info]
ScriptType: v4.00+
PlayResX: 384
PlayResY: 288
ScaledBorderAndShadow: yes
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,24,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,0
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:1.18,0:00:1.67,Default,,0,0,0,,{\1c&HFF00&\u1}Bella,{\r} Gloria, love.
Dialogue: 0,0:00:1.67,0:00:2.65,Default,,0,0,0,,Bella, Gloria, love.
Dialogue: 0,0:00:2.65,0:00:3.05,Default,,0,0,0,,Bella, {\1c&HFF00&\u1}Gloria,{\r} love.
Dialogue: 0,0:00:3.05,0:00:3.07,Default,,0,0,0,,Bella, Gloria, love.
Dialogue: 0,0:00:3.07,0:00:3.27,Default,,0,0,0,,Bella, Gloria, {\1c&HFF00&\u1}love.{\r}
Dialogue: 0,0:00:3.75,0:00:3.85,Default,,0,0,0,,{\1c&HFF00&\u1}Oh.{\r}
Dialogue: 0,0:00:4.50,0:00:4.72,Default,,0,0,0,,{\1c&HFF00&\u1}How{\r} are you?
Dialogue: 0,0:00:4.72,0:00:5.78,Default,,0,0,0,,How are you?
Dialogue: 0,0:00:5.78,0:00:5.90,Default,,0,0,0,,How {\1c&HFF00&\u1}are{\r} you?
Dialogue: 0,0:00:5.90,0:00:5.94,Default,,0,0,0,,How are you?
Dialogue: 0,0:00:5.94,0:00:6.22,Default,,0,0,0,,How are {\1c&HFF00&\u1}you?{\r}
Dialogue: 0,0:00:6.72,0:00:6.80,Default,,0,0,0,,{\1c&HFF00&\u1}Oh,{\r} I'm OK.
Dialogue: 0,0:00:6.80,0:00:6.88,Default,,0,0,0,,Oh, I'm OK.
Dialogue: 0,0:00:6.88,0:00:7.04,Default,,0,0,0,,Oh, {\1c&HFF00&\u1}I'm{\r} OK.
Dialogue: 0,0:00:7.04,0:00:7.09,Default,,0,0,0,,Oh, I'm OK.
Dialogue: 0,0:00:7.09,0:00:7.13,Default,,0,0,0,,Oh, I'm {\1c&HFF00&\u1}OK.{\r}
Dialogue: 0,0:00:8.41,0:00:8.45,Default,,0,0,0,,{\1c&HFF00&\u1}I{\r} will be.
Dialogue: 0,0:00:8.45,0:00:8.49,Default,,0,0,0,,I will be.
Dialogue: 0,0:00:8.49,0:00:8.73,Default,,0,0,0,,I {\1c&HFF00&\u1}will{\r} be.
Dialogue: 0,0:00:8.73,0:00:8.77,Default,,0,0,0,,I will be.
Dialogue: 0,0:00:8.77,0:00:8.91,Default,,0,0,0,,I will {\1c&HFF00&\u1}be.{\r}
Dialogue: 0,0:00:9.22,0:00:9.30,Default,,0,0,0,,{\1c&HFF00&\u1}I{\r} said she could stay with us tomorrow
Dialogue: 0,0:00:9.30,0:00:9.34,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:9.34,0:00:9.48,Default,,0,0,0,,I {\1c&HFF00&\u1}said{\r} she could stay with us tomorrow
Dialogue: 0,0:00:9.48,0:00:9.52,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:9.52,0:00:9.60,Default,,0,0,0,,I said {\1c&HFF00&\u1}she{\r} could stay with us tomorrow
Dialogue: 0,0:00:9.60,0:00:9.62,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:9.62,0:00:9.76,Default,,0,0,0,,I said she {\1c&HFF00&\u1}could{\r} stay with us tomorrow
Dialogue: 0,0:00:9.76,0:00:9.78,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:9.78,0:00:9.94,Default,,0,0,0,,I said she could {\1c&HFF00&\u1}stay{\r} with us tomorrow
Dialogue: 0,0:00:9.94,0:00:9.96,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:9.96,0:00:10.08,Default,,0,0,0,,I said she could stay {\1c&HFF00&\u1}with{\r} us tomorrow
Dialogue: 0,0:00:10.08,0:00:10.10,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:10.10,0:00:10.14,Default,,0,0,0,,I said she could stay with {\1c&HFF00&\u1}us{\r} tomorrow
Dialogue: 0,0:00:10.14,0:00:10.16,Default,,0,0,0,,I said she could stay with us tomorrow
Dialogue: 0,0:00:10.16,0:00:10.44,Default,,0,0,0,,I said she could stay with us {\1c&HFF00&\u1}tomorrow{\r}
Dialogue: 0,0:00:10.46,0:00:10.54,Default,,0,0,0,,{\1c&HFF00&\u1}just{\r} until she feels better.
Dialogue: 0,0:00:10.54,0:00:10.56,Default,,0,0,0,,just until she feels better.
Dialogue: 0,0:00:10.56,0:00:10.68,Default,,0,0,0,,just {\1c&HFF00&\u1}until{\r} she feels better.
Dialogue: 0,0:00:10.68,0:00:10.70,Default,,0,0,0,,just until she feels better.
Dialogue: 0,0:00:10.70,0:00:10.80,Default,,0,0,0,,just until {\1c&HFF00&\u1}she{\r} feels better.
Dialogue: 0,0:00:10.80,0:00:10.82,Default,,0,0,0,,just until she feels better.
Dialogue: 0,0:00:10.82,0:00:11.05,Default,,0,0,0,,just until she {\1c&HFF00&\u1}feels{\r} better.
Dialogue: 0,0:00:11.05,0:00:11.09,Default,,0,0,0,,just until she feels better.
Dialogue: 0,0:00:11.09,0:00:11.35,Default,,0,0,0,,just until she feels {\1c&HFF00&\u1}better.{\r}
Dialogue: 0,0:00:11.73,0:00:11.95,Default,,0,0,0,,{\1c&HFF00&\u1}Yeah.{\r}
Dialogue: 0,0:00:12.09,0:00:12.17,Default,,0,0,0,,{\1c&HFF00&\u1}Of{\r} course she can.
Dialogue: 0,0:00:12.17,0:00:12.20,Default,,0,0,0,,Of course she can.
Dialogue: 0,0:00:12.20,0:00:12.32,Default,,0,0,0,,Of {\1c&HFF00&\u1}course{\r} she can.
Dialogue: 0,0:00:12.32,0:00:12.38,Default,,0,0,0,,Of course she can.
Dialogue: 0,0:00:12.38,0:00:12.64,Default,,0,0,0,,Of course {\1c&HFF00&\u1}she{\r} can.
Dialogue: 0,0:00:12.64,0:00:12.72,Default,,0,0,0,,Of course she can.
Dialogue: 0,0:00:12.72,0:00:13.24,Default,,0,0,0,,Of course she {\1c&HFF00&\u1}can.{\r}
Dialogue: 0,0:00:13.36,0:00:13.70,Default,,0,0,0,,{\1c&HFF00&\u1}No,{\r} things won't be for long.
Dialogue: 0,0:00:13.70,0:00:13.82,Default,,0,0,0,,No, things won't be for long.
Dialogue: 0,0:00:13.82,0:00:14.12,Default,,0,0,0,,No, {\1c&HFF00&\u1}things{\r} won't be for long.
Dialogue: 0,0:00:14.12,0:00:14.19,Default,,0,0,0,,No, things won't be for long.
Dialogue: 0,0:00:14.19,0:00:14.39,Default,,0,0,0,,No, things {\1c&HFF00&\u1}won't{\r} be for long.
Dialogue: 0,0:00:14.39,0:00:14.43,Default,,0,0,0,,No, things won't be for long.
Dialogue: 0,0:00:14.43,0:00:14.53,Default,,0,0,0,,No, things won't {\1c&HFF00&\u1}be{\r} for long.
Dialogue: 0,0:00:14.53,0:00:14.59,Default,,0,0,0,,No, things won't be for long.
Dialogue: 0,0:00:14.59,0:00:14.73,Default,,0,0,0,,No, things won't be {\1c&HFF00&\u1}for{\r} long.
Dialogue: 0,0:00:14.73,0:00:14.81,Default,,0,0,0,,No, things won't be for long.
Dialogue: 0,0:00:14.81,0:00:15.01,Default,,0,0,0,,No, things won't be for {\1c&HFF00&\u1}long.{\r}
Dialogue: 0,0:00:15.17,0:00:15.41,Default,,0,0,0,,{\1c&HFF00&\u1}Well,{\r} you can stay as long as you want, my love.
Dialogue: 0,0:00:15.41,0:00:15.43,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:15.43,0:00:15.51,Default,,0,0,0,,Well, {\1c&HFF00&\u1}you{\r} can stay as long as you want, my love.
Dialogue: 0,0:00:15.51,0:00:15.55,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:15.55,0:00:15.69,Default,,0,0,0,,Well, you {\1c&HFF00&\u1}can{\r} stay as long as you want, my love.
Dialogue: 0,0:00:15.69,0:00:15.75,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:15.75,0:00:15.89,Default,,0,0,0,,Well, you can {\1c&HFF00&\u1}stay{\r} as long as you want, my love.
Dialogue: 0,0:00:15.89,0:00:15.95,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:15.95,0:00:16.01,Default,,0,0,0,,Well, you can stay {\1c&HFF00&\u1}as{\r} long as you want, my love.
Dialogue: 0,0:00:16.01,0:00:16.05,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:16.05,0:00:16.20,Default,,0,0,0,,Well, you can stay as {\1c&HFF00&\u1}long{\r} as you want, my love.
Dialogue: 0,0:00:16.20,0:00:16.24,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:16.24,0:00:16.28,Default,,0,0,0,,Well, you can stay as long {\1c&HFF00&\u1}as{\r} you want, my love.
Dialogue: 0,0:00:16.28,0:00:16.30,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:16.30,0:00:16.42,Default,,0,0,0,,Well, you can stay as long as {\1c&HFF00&\u1}you{\r} want, my love.
Dialogue: 0,0:00:16.42,0:00:16.46,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:16.46,0:00:16.66,Default,,0,0,0,,Well, you can stay as long as you {\1c&HFF00&\u1}want,{\r} my love.
Dialogue: 0,0:00:16.66,0:00:16.72,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:16.72,0:00:16.92,Default,,0,0,0,,Well, you can stay as long as you want, {\1c&HFF00&\u1}my{\r} love.
Dialogue: 0,0:00:16.92,0:00:16.96,Default,,0,0,0,,Well, you can stay as long as you want, my love.
Dialogue: 0,0:00:16.96,0:00:17.34,Default,,0,0,0,,Well, you can stay as long as you want, my {\1c&HFF00&\u1}love.{\r}
Dialogue: 0,0:00:17.62,0:00:17.86,Default,,0,0,0,,{\1c&HFF00&\u1}I've{\r} really missed you.
Dialogue: 0,0:00:17.86,0:00:17.88,Default,,0,0,0,,I've really missed you.
Dialogue: 0,0:00:17.88,0:00:18.14,Default,,0,0,0,,I've {\1c&HFF00&\u1}really{\r} missed you.
Dialogue: 0,0:00:18.14,0:00:18.19,Default,,0,0,0,,I've really missed you.
Dialogue: 0,0:00:18.19,0:00:18.59,Default,,0,0,0,,I've really {\1c&HFF00&\u1}missed{\r} you.
Dialogue: 0,0:00:18.59,0:00:18.63,Default,,0,0,0,,I've really missed you.
Dialogue: 0,0:00:18.63,0:00:18.81,Default,,0,0,0,,I've really missed {\1c&HFF00&\u1}you.{\r}
Dialogue: 0,0:00:19.49,0:00:19.79,Default,,0,0,0,,{\1c&HFF00&\u1}Pops.{\r}
Dialogue: 0,0:00:20.40,0:00:20.64,Default,,0,0,0,,{\1c&HFF00&\u1}Great{\r} to see you, love.
Dialogue: 0,0:00:20.64,0:00:20.66,Default,,0,0,0,,Great to see you, love.
Dialogue: 0,0:00:20.66,0:00:20.78,Default,,0,0,0,,Great {\1c&HFF00&\u1}to{\r} see you, love.
Dialogue: 0,0:00:20.78,0:00:20.82,Default,,0,0,0,,Great to see you, love.
Dialogue: 0,0:00:20.82,0:00:21.14,Default,,0,0,0,,Great to {\1c&HFF00&\u1}see{\r} you, love.
Dialogue: 0,0:00:21.14,0:00:21.16,Default,,0,0,0,,Great to see you, love.
Dialogue: 0,0:00:21.16,0:00:21.28,Default,,0,0,0,,Great to see {\1c&HFF00&\u1}you,{\r} love.
Dialogue: 0,0:00:21.28,0:00:21.32,Default,,0,0,0,,Great to see you, love.
Dialogue: 0,0:00:21.32,0:00:21.68,Default,,0,0,0,,Great to see you, {\1c&HFF00&\u1}love.{\r}
Dialogue: 0,0:00:21.90,0:00:23.21,Default,,0,0,0,,{\1c&HFF00&\u1}Oh.{\r}
Dialogue: 0,0:00:23.23,0:00:23.29,Default,,0,0,0,,{\1c&HFF00&\u1}All{\r} right, shall we get you off to bed then?
Dialogue: 0,0:00:23.29,0:00:23.31,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.31,0:00:23.41,Default,,0,0,0,,All {\1c&HFF00&\u1}right,{\r} shall we get you off to bed then?
Dialogue: 0,0:00:23.41,0:00:23.43,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.43,0:00:23.55,Default,,0,0,0,,All right, {\1c&HFF00&\u1}shall{\r} we get you off to bed then?
Dialogue: 0,0:00:23.55,0:00:23.57,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.57,0:00:23.65,Default,,0,0,0,,All right, shall {\1c&HFF00&\u1}we{\r} get you off to bed then?
Dialogue: 0,0:00:23.65,0:00:23.67,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.67,0:00:23.74,Default,,0,0,0,,All right, shall we {\1c&HFF00&\u1}get{\r} you off to bed then?
Dialogue: 0,0:00:23.74,0:00:23.76,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.76,0:00:23.82,Default,,0,0,0,,All right, shall we get {\1c&HFF00&\u1}you{\r} off to bed then?
Dialogue: 0,0:00:23.82,0:00:23.84,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.84,0:00:23.94,Default,,0,0,0,,All right, shall we get you {\1c&HFF00&\u1}off{\r} to bed then?
Dialogue: 0,0:00:23.94,0:00:23.96,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:23.96,0:00:24.04,Default,,0,0,0,,All right, shall we get you off {\1c&HFF00&\u1}to{\r} bed then?
Dialogue: 0,0:00:24.04,0:00:24.06,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:24.06,0:00:24.22,Default,,0,0,0,,All right, shall we get you off to {\1c&HFF00&\u1}bed{\r} then?
Dialogue: 0,0:00:24.22,0:00:24.24,Default,,0,0,0,,All right, shall we get you off to bed then?
Dialogue: 0,0:00:24.24,0:00:24.38,Default,,0,0,0,,All right, shall we get you off to bed {\1c&HFF00&\u1}then?{\r}
Dialogue: 0,0:00:24.58,0:00:24.72,Default,,0,0,0,,{\1c&HFF00&\u1}You{\r} should have given me some warm.
Dialogue: 0,0:00:24.72,0:00:24.78,Default,,0,0,0,,You should have given me some warm.
Dialogue: 0,0:00:24.78,0:00:24.98,Default,,0,0,0,,You {\1c&HFF00&\u1}should{\r} have given me some warm.
Dialogue: 0,0:00:24.98,0:00:25.02,Default,,0,0,0,,You should have given me some warm.
Dialogue: 0,0:00:25.02,0:00:25.12,Default,,0,0,0,,You should {\1c&HFF00&\u1}have{\r} given me some warm.
Dialogue: 0,0:00:25.12,0:00:25.16,Default,,0,0,0,,You should have given me some warm.
Dialogue: 0,0:00:25.16,0:00:25.29,Default,,0,0,0,,You should have {\1c&HFF00&\u1}given{\r} me some warm.
Dialogue: 0,0:00:25.29,0:00:25.35,Default,,0,0,0,,You should have given me some warm.
Dialogue: 0,0:00:25.35,0:00:25.45,Default,,0,0,0,,You should have given {\1c&HFF00&\u1}me{\r} some warm.
Dialogue: 0,0:00:25.45,0:00:25.49,Default,,0,0,0,,You should have given me some warm.
Dialogue: 0,0:00:25.49,0:00:25.67,Default,,0,0,0,,You should have given me {\1c&HFF00&\u1}some{\r} warm.
Dialogue: 0,0:00:25.67,0:00:25.81,Default,,0,0,0,,You should have given me some warm.
Dialogue: 0,0:00:25.81,0:00:26.05,Default,,0,0,0,,You should have given me some {\1c&HFF00&\u1}warm.{\r}
Dialogue: 0,0:00:26.31,0:00:26.37,Default,,0,0,0,,{\1c&HFF00&\u1}I{\r} know.
Dialogue: 0,0:00:26.37,0:00:26.39,Default,,0,0,0,,I know.
Dialogue: 0,0:00:26.39,0:00:26.49,Default,,0,0,0,,I {\1c&HFF00&\u1}know.{\r}
Dialogue: 0,0:00:26.61,0:00:26.69,Default,,0,0,0,,{\1c&HFF00&\u1}I'll{\r} have to put the electric blanket on.
Dialogue: 0,0:00:26.69,0:00:26.71,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:26.71,0:00:26.81,Default,,0,0,0,,I'll {\1c&HFF00&\u1}have{\r} to put the electric blanket on.
Dialogue: 0,0:00:26.81,0:00:26.83,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:26.83,0:00:27.02,Default,,0,0,0,,I'll have {\1c&HFF00&\u1}to{\r} put the electric blanket on.
Dialogue: 0,0:00:27.02,0:00:27.06,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:27.06,0:00:27.42,Default,,0,0,0,,I'll have to {\1c&HFF00&\u1}put{\r} the electric blanket on.
Dialogue: 0,0:00:27.42,0:00:27.48,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:27.48,0:00:27.56,Default,,0,0,0,,I'll have to put {\1c&HFF00&\u1}the{\r} electric blanket on.
Dialogue: 0,0:00:27.56,0:00:27.70,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:27.70,0:00:28.08,Default,,0,0,0,,I'll have to put the {\1c&HFF00&\u1}electric{\r} blanket on.
Dialogue: 0,0:00:28.08,0:00:28.62,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:28.62,0:00:28.84,Default,,0,0,0,,I'll have to put the electric {\1c&HFF00&\u1}blanket{\r} on.
Dialogue: 0,0:00:28.84,0:00:28.90,Default,,0,0,0,,I'll have to put the electric blanket on.
Dialogue: 0,0:00:28.90,0:00:28.94,Default,,0,0,0,,I'll have to put the electric blanket {\1c&HFF00&\u1}on.{\r}
Dialogue: 0,0:00:29.49,0:00:29.55,Default,,0,0,0,,{\1c&HFF00&\u1}I'm{\r} sorry.
Dialogue: 0,0:00:29.55,0:00:29.57,Default,,0,0,0,,I'm sorry.
Dialogue: 0,0:00:29.57,0:00:29.82,Default,,0,0,0,,I'm {\1c&HFF00&\u1}sorry.{\r}
Dialogue: 0,0:00:29.98,0:00:30.08,Default,,0,0,0,,{\1c&HFF00&\u1}All{\r} right, Bella.
Dialogue: 0,0:00:30.08,0:00:30.10,Default,,0,0,0,,All right, Bella.
Dialogue: 0,0:00:30.10,0:00:30.29,Default,,0,0,0,,All {\1c&HFF00&\u1}right,{\r} Bella.
Dialogue: 0,0:00:30.29,0:00:30.43,Default,,0,0,0,,All right, Bella.
Dialogue: 0,0:00:30.43,0:00:30.63,Default,,0,0,0,,All right, {\1c&HFF00&\u1}Bella.{\r}
Dialogue: 0,0:00:31.37,0:00:31.58,Default,,0,0,0,,{\1c&HFF00&\u1}Freezing{\r} up there.
Dialogue: 0,0:00:31.58,0:00:31.62,Default,,0,0,0,,Freezing up there.
Dialogue: 0,0:00:31.62,0:00:31.66,Default,,0,0,0,,Freezing {\1c&HFF00&\u1}up{\r} there.
Dialogue: 0,0:00:31.66,0:00:31.68,Default,,0,0,0,,Freezing up there.
Dialogue: 0,0:00:31.68,0:00:31.90,Default,,0,0,0,,Freezing up {\1c&HFF00&\u1}there.{\r}
Dialogue: 0,0:00:31.90,0:00:31.94,Default,,0,0,0,,{\1c&HFF00&\u1}In{\r} a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:31.94,0:00:31.96,Default,,0,0,0,,In a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:31.96,0:00:31.98,Default,,0,0,0,,In {\1c&HFF00&\u1}a{\r} bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:31.98,0:00:32.00,Default,,0,0,0,,In a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:32.00,0:00:32.14,Default,,0,0,0,,In a {\1c&HFF00&\u1}bedroom,{\r} Peter unpacks her suitcase.
Dialogue: 0,0:00:32.14,0:00:32.20,Default,,0,0,0,,In a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:32.20,0:00:32.50,Default,,0,0,0,,In a bedroom, {\1c&HFF00&\u1}Peter{\r} unpacks her suitcase.
Dialogue: 0,0:00:32.50,0:00:32.58,Default,,0,0,0,,In a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:32.58,0:00:32.98,Default,,0,0,0,,In a bedroom, Peter {\1c&HFF00&\u1}unpacks{\r} her suitcase.
Dialogue: 0,0:00:32.98,0:00:33.00,Default,,0,0,0,,In a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:33.00,0:00:33.10,Default,,0,0,0,,In a bedroom, Peter unpacks {\1c&HFF00&\u1}her{\r} suitcase.
Dialogue: 0,0:00:33.10,0:00:33.16,Default,,0,0,0,,In a bedroom, Peter unpacks her suitcase.
Dialogue: 0,0:00:33.16,0:00:33.65,Default,,0,0,0,,In a bedroom, Peter unpacks her {\1c&HFF00&\u1}suitcase.{\r}
Dialogue: 0,0:00:34.27,0:00:34.35,Default,,0,0,0,,{\1c&HFF00&\u1}The{\r} middle-aged woman opens her green case.
Dialogue: 0,0:00:34.35,0:00:34.39,Default,,0,0,0,,The middle-aged woman opens her green case.
Dialogue: 0,0:00:34.39,0:00:34.91,Default,,0,0,0,,The {\1c&HFF00&\u1}middle-aged{\r} woman opens her green case.
Dialogue: 0,0:00:34.91,0:00:34.99,Default,,0,0,0,,The middle-aged woman opens her green case.
Dialogue: 0,0:00:34.99,0:00:35.27,Default,,0,0,0,,The middle-aged {\1c&HFF00&\u1}woman{\r} opens her green case.
Dialogue: 0,0:00:35.27,0:00:35.39,Default,,0,0,0,,The middle-aged woman opens her green case.
Dialogue: 0,0:00:35.39,0:00:35.67,Default,,0,0,0,,The middle-aged woman {\1c&HFF00&\u1}opens{\r} her green case.
Dialogue: 0,0:00:35.67,0:00:35.71,Default,,0,0,0,,The middle-aged woman opens her green case.
Dialogue: 0,0:00:35.71,0:00:35.81,Default,,0,0,0,,The middle-aged woman opens {\1c&HFF00&\u1}her{\r} green case.
Dialogue: 0,0:00:35.81,0:00:35.85,Default,,0,0,0,,The middle-aged woman opens her green case.
Dialogue: 0,0:00:35.85,0:00:36.19,Default,,0,0,0,,The middle-aged woman opens her {\1c&HFF00&\u1}green{\r} case.
Dialogue: 0,0:00:36.19,0:00:36.23,Default,,0,0,0,,The middle-aged woman opens her green case.
Dialogue: 0,0:00:36.23,0:00:36.53,Default,,0,0,0,,The middle-aged woman opens her green {\1c&HFF00&\u1}case.{\r}
Dialogue: 0,0:00:38.13,0:00:38.25,Default,,0,0,0,,{\1c&HFF00&\u1}Do{\r} you want your PJs?
Dialogue: 0,0:00:38.25,0:00:38.28,Default,,0,0,0,,Do you want your PJs?
Dialogue: 0,0:00:38.28,0:00:38.36,Default,,0,0,0,,Do {\1c&HFF00&\u1}you{\r} want your PJs?
Dialogue: 0,0:00:38.36,0:00:38.38,Default,,0,0,0,,Do you want your PJs?
Dialogue: 0,0:00:38.38,0:00:38.54,Default,,0,0,0,,Do you {\1c&HFF00&\u1}want{\r} your PJs?
Dialogue: 0,0:00:38.54,0:00:38.56,Default,,0,0,0,,Do you want your PJs?
Dialogue: 0,0:00:38.56,0:00:38.74,Default,,0,0,0,,Do you want {\1c&HFF00&\u1}your{\r} PJs?
Dialogue: 0,0:00:38.74,0:00:38.88,Default,,0,0,0,,Do you want your PJs?
Dialogue: 0,0:00:38.88,0:00:39.30,Default,,0,0,0,,Do you want your {\1c&HFF00&\u1}PJs?{\r}
Dialogue: 0,0:00:39.88,0:00:40.18,Default,,0,0,0,,{\1c&HFF00&\u1}Yeah.{\r}
Dialogue: 0,0:00:42.39,0:00:42.69,Default,,0,0,0,,{\1c&HFF00&\u1}Lifting{\r} a bundle of pajamas,
Dialogue: 0,0:00:42.69,0:00:42.73,Default,,0,0,0,,Lifting a bundle of pajamas,
Dialogue: 0,0:00:42.73,0:00:42.75,Default,,0,0,0,,Lifting {\1c&HFF00&\u1}a{\r} bundle of pajamas,
Dialogue: 0,0:00:42.75,0:00:42.81,Default,,0,0,0,,Lifting a bundle of pajamas,
Dialogue: 0,0:00:42.81,0:00:43.11,Default,,0,0,0,,Lifting a {\1c&HFF00&\u1}bundle{\r} of pajamas,
Dialogue: 0,0:00:43.11,0:00:43.13,Default,,0,0,0,,Lifting a bundle of pajamas,
Dialogue: 0,0:00:43.13,0:00:43.19,Default,,0,0,0,,Lifting a bundle {\1c&HFF00&\u1}of{\r} pajamas,
Dialogue: 0,0:00:43.19,0:00:43.25,Default,,0,0,0,,Lifting a bundle of pajamas,
Dialogue: 0,0:00:43.25,0:00:43.77,Default,,0,0,0,,Lifting a bundle of {\1c&HFF00&\u1}pajamas,{\r}
Dialogue: 0,0:00:44.07,0:00:44.31,Default,,0,0,0,,{\1c&HFF00&\u1}Peter{\r} finds a sheet of paper labeled
Dialogue: 0,0:00:44.31,0:00:44.37,Default,,0,0,0,,Peter finds a sheet of paper labeled
Dialogue: 0,0:00:44.37,0:00:44.63,Default,,0,0,0,,Peter {\1c&HFF00&\u1}finds{\r} a sheet of paper labeled
Dialogue: 0,0:00:44.63,0:00:44.67,Default,,0,0,0,,Peter finds a sheet of paper labeled
Dialogue: 0,0:00:44.67,0:00:44.69,Default,,0,0,0,,Peter finds {\1c&HFF00&\u1}a{\r} sheet of paper labeled
Dialogue: 0,0:00:44.69,0:00:44.75,Default,,0,0,0,,Peter finds a sheet of paper labeled
Dialogue: 0,0:00:44.75,0:00:44.95,Default,,0,0,0,,Peter finds a {\1c&HFF00&\u1}sheet{\r} of paper labeled
Dialogue: 0,0:00:44.95,0:00:44.99,Default,,0,0,0,,Peter finds a sheet of paper labeled
Dialogue: 0,0:00:44.99,0:00:45.05,Default,,0,0,0,,Peter finds a sheet {\1c&HFF00&\u1}of{\r} paper labeled
Dialogue: 0,0:00:45.05,0:00:45.11,Default,,0,0,0,,Peter finds a sheet of paper labeled
Dialogue: 0,0:00:45.11,0:00:45.46,Default,,0,0,0,,Peter finds a sheet of {\1c&HFF00&\u1}paper{\r} labeled
Dialogue: 0,0:00:45.46,0:00:45.54,Default,,0,0,0,,Peter finds a sheet of paper labeled
Dialogue: 0,0:00:45.54,0:00:45.88,Default,,0,0,0,,Peter finds a sheet of paper {\1c&HFF00&\u1}labeled{\r}
Dialogue: 0,0:00:46.34,0:00:47.04,Default,,0,0,0,,{\1c&HFF00&\u1}Lancaster{\r} North Hospital discharge sheet.
Dialogue: 0,0:00:47.04,0:00:47.12,Default,,0,0,0,,Lancaster North Hospital discharge sheet.
Dialogue: 0,0:00:47.12,0:00:47.38,Default,,0,0,0,,Lancaster {\1c&HFF00&\u1}North{\r} Hospital discharge sheet.
Dialogue: 0,0:00:47.38,0:00:47.44,Default,,0,0,0,,Lancaster North Hospital discharge sheet.
Dialogue: 0,0:00:47.44,0:00:47.94,Default,,0,0,0,,Lancaster North {\1c&HFF00&\u1}Hospital{\r} discharge sheet.
Dialogue: 0,0:00:47.94,0:00:48.27,Default,,0,0,0,,Lancaster North Hospital discharge sheet.
Dialogue: 0,0:00:48.27,0:00:48.93,Default,,0,0,0,,Lancaster North Hospital {\1c&HFF00&\u1}discharge{\r} sheet.
Dialogue: 0,0:00:48.93,0:00:49.03,Default,,0,0,0,,Lancaster North Hospital discharge sheet.
Dialogue: 0,0:00:49.03,0:00:49.25,Default,,0,0,0,,Lancaster North Hospital discharge {\1c&HFF00&\u1}sheet.{\r}
Dialogue: 0,0:00:50.29,0:00:50.37,Default,,0,0,0,,{\1c&HFF00&\u1}He{\r} closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:50.37,0:00:50.41,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:50.41,0:00:50.77,Default,,0,0,0,,He {\1c&HFF00&\u1}closes{\r} the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:50.77,0:00:50.81,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:50.81,0:00:50.91,Default,,0,0,0,,He closes {\1c&HFF00&\u1}the{\r} suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:50.91,0:00:50.95,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:50.95,0:00:51.39,Default,,0,0,0,,He closes the {\1c&HFF00&\u1}suitcase{\r} and brings Gloria the pajamas.
Dialogue: 0,0:00:51.39,0:00:51.43,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:51.43,0:00:51.51,Default,,0,0,0,,He closes the suitcase {\1c&HFF00&\u1}and{\r} brings Gloria the pajamas.
Dialogue: 0,0:00:51.51,0:00:51.53,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:51.53,0:00:51.79,Default,,0,0,0,,He closes the suitcase and {\1c&HFF00&\u1}brings{\r} Gloria the pajamas.
Dialogue: 0,0:00:51.79,0:00:51.83,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:51.83,0:00:52.23,Default,,0,0,0,,He closes the suitcase and brings {\1c&HFF00&\u1}Gloria{\r} the pajamas.
Dialogue: 0,0:00:52.23,0:00:52.25,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:52.25,0:00:52.32,Default,,0,0,0,,He closes the suitcase and brings Gloria {\1c&HFF00&\u1}the{\r} pajamas.
Dialogue: 0,0:00:52.32,0:00:52.36,Default,,0,0,0,,He closes the suitcase and brings Gloria the pajamas.
Dialogue: 0,0:00:52.36,0:00:52.86,Default,,0,0,0,,He closes the suitcase and brings Gloria the {\1c&HFF00&\u1}pajamas.{\r}
Dialogue: 0,0:00:54.19,0:00:54.49,Default,,0,0,0,,{\1c&HFF00&\u1}There{\r} you go.
Dialogue: 0,0:00:54.49,0:00:54.55,Default,,0,0,0,,There you go.
Dialogue: 0,0:00:54.55,0:00:54.77,Default,,0,0,0,,There {\1c&HFF00&\u1}you{\r} go.
Dialogue: 0,0:00:54.77,0:00:54.79,Default,,0,0,0,,There you go.
Dialogue: 0,0:00:54.79,0:00:54.83,Default,,0,0,0,,There you {\1c&HFF00&\u1}go.{\r}
Dialogue: 0,0:00:55.65,0:00:55.77,Default,,0,0,0,,{\1c&HFF00&\u1}Thank{\r} you.
Dialogue: 0,0:00:55.77,0:00:55.80,Default,,0,0,0,,Thank you.
Dialogue: 0,0:00:55.80,0:00:55.90,Default,,0,0,0,,Thank {\1c&HFF00&\u1}you.{\r}
Dialogue: 0,0:00:55.90,0:00:55.94,Default,,0,0,0,,{\1c&HFF00&\u1}He{\r} picks up the locket.
Dialogue: 0,0:00:55.94,0:00:55.96,Default,,0,0,0,,He picks up the locket.
Dialogue: 0,0:00:55.96,0:00:56.10,Default,,0,0,0,,He {\1c&HFF00&\u1}picks{\r} up the locket.
Dialogue: 0,0:00:56.10,0:00:56.12,Default,,0,0,0,,He picks up the locket.
Dialogue: 0,0:00:56.12,0:00:56.20,Default,,0,0,0,,He picks {\1c&HFF00&\u1}up{\r} the locket.
Dialogue: 0,0:00:56.20,0:00:56.22,Default,,0,0,0,,He picks up the locket.
Dialogue: 0,0:00:56.22,0:00:56.32,Default,,0,0,0,,He picks up {\1c&HFF00&\u1}the{\r} locket.
Dialogue: 0,0:00:56.32,0:00:56.36,Default,,0,0,0,,He picks up the locket.
Dialogue: 0,0:00:56.36,0:00:56.74,Default,,0,0,0,,He picks up the {\1c&HFF00&\u1}locket.{\r}
Dialogue: 0,0:00:57.12,0:00:57.22,Default,,0,0,0,,{\1c&HFF00&\u1}You{\r} kept it.
Dialogue: 0,0:00:57.22,0:00:57.27,Default,,0,0,0,,You kept it.
Dialogue: 0,0:00:57.27,0:00:57.47,Default,,0,0,0,,You {\1c&HFF00&\u1}kept{\r} it.
Dialogue: 0,0:00:57.47,0:00:57.55,Default,,0,0,0,,You kept it.
Dialogue: 0,0:00:57.55,0:00:57.63,Default,,0,0,0,,You kept {\1c&HFF00&\u1}it.{\r}
Dialogue: 0,0:00:58.87,0:00:58.99,Default,,0,0,0,,{\1c&HFF00&\u1}Oh,{\r} of course.
Dialogue: 0,0:00:58.99,0:00:59.28,Default,,0,0,0,,Oh, of course.
Dialogue: 0,0:00:59.28,0:00:59.58,Default,,0,0,0,,Oh, {\1c&HFF00&\u1}of{\r} course.
Dialogue: 0,0:00:59.58,0:00:59.68,Default,,0,0,0,,Oh, of course.
Dialogue: 0,0:00:59.68,0:00:59.96,Default,,0,0,0,,Oh, of {\1c&HFF00&\u1}course.{\r}

View File

@ -95,46 +95,46 @@ In a bedroom, Peter unpacks her suitcase.
The middle-aged woman opens her green case. The middle-aged woman opens her green case.
25 25
00:00:38,095 --> 00:00:39,297 00:00:38,135 --> 00:00:39,296
Do you want your PJs? Do you want your PJs?
26 26
00:00:39,862 --> 00:00:40,185 00:00:39,879 --> 00:00:40,181
Yeah. Yeah.
27 27
00:00:42,394 --> 00:00:42,474 00:00:42,388 --> 00:00:43,773
Yeah. Lifting a bundle of pajamas,
28 28
00:00:42,474 --> 00:00:45,418 00:00:44,073 --> 00:00:45,876
Lifting a bundle of pajamas, Peter finds a sheet of paper Peter finds a sheet of paper labeled
29 29
00:00:45,538 --> 00:00:49,251 00:00:46,338 --> 00:00:49,249
labeled Lancaster North Hospital discharge sheet. Lancaster North Hospital discharge sheet.
30 30
00:00:50,293 --> 00:00:52,858 00:00:50,291 --> 00:00:52,856
He closes the suitcase and brings Gloria the pajamas. He closes the suitcase and brings Gloria the pajamas.
31 31
00:00:54,187 --> 00:00:54,832 00:00:54,186 --> 00:00:54,831
There you go. There you go.
32 32
00:00:55,655 --> 00:00:55,896 00:00:55,654 --> 00:00:55,895
Thank you. Thank you.
33 33
00:00:55,916 --> 00:00:56,742 00:00:55,895 --> 00:00:56,742
He picks up the locket. He picks up the locket.
34 34
00:00:57,124 --> 00:00:57,627 00:00:57,124 --> 00:00:57,627
He kept it. You kept it.
35 35
00:00:58,874 --> 00:00:59,899 00:00:58,874 --> 00:00:59,960
Oh, cool. Oh, of course.

View File

@ -12,7 +12,7 @@ from .audio import SAMPLE_RATE, N_FRAMES, HOP_LENGTH, pad_or_trim, log_mel_spect
from .alignment import get_trellis, backtrack, merge_repeats, merge_words from .alignment import get_trellis, backtrack, merge_repeats, merge_words
from .decoding import DecodingOptions, DecodingResult from .decoding import DecodingOptions, DecodingResult
from .tokenizer import LANGUAGES, TO_LANGUAGE_CODE, get_tokenizer from .tokenizer import LANGUAGES, TO_LANGUAGE_CODE, get_tokenizer
from .utils import exact_div, format_timestamp, optional_int, optional_float, str2bool, write_txt, write_vtt, write_srt from .utils import exact_div, format_timestamp, optional_int, optional_float, str2bool, write_txt, write_vtt, write_srt, write_ass
if TYPE_CHECKING: if TYPE_CHECKING:
from .model import Whisper from .model import Whisper
@ -269,7 +269,6 @@ def align(
MAX_DURATION = audio.shape[1] / SAMPLE_RATE MAX_DURATION = audio.shape[1] / SAMPLE_RATE
prev_t2 = 0 prev_t2 = 0
word_level = []
for idx, segment in enumerate(transcript): for idx, segment in enumerate(transcript):
t1 = max(segment['start'] - extend_duration, 0) t1 = max(segment['start'] - extend_duration, 0)
t2 = min(segment['end'] + extend_duration, MAX_DURATION) t2 = min(segment['end'] + extend_duration, MAX_DURATION)
@ -287,9 +286,10 @@ def align(
transcription = segment['text'].strip() transcription = segment['text'].strip()
t_words = transcription.split(' ') t_words = transcription.split(' ')
t_words_clean = [re.sub(r"[^a-zA-Z' ]", "", x) for x in t_words] t_words_clean = [''.join([w for w in word if w.upper() in model_dictionary.keys()]) for word in t_words]
t_words_nonempty = [x for x in t_words_clean if x != ""] t_words_nonempty = [x for x in t_words_clean if x != ""]
t_words_nonempty_idx = [x for x in range(len(t_words_clean)) if t_words_clean[x] != ""] t_words_nonempty_idx = [x for x in range(len(t_words_clean)) if t_words_clean[x] != ""]
segment['word-level'] = []
if len(t_words_nonempty) > 0: if len(t_words_nonempty) > 0:
transcription_cleaned = "|".join(t_words_nonempty).upper() transcription_cleaned = "|".join(t_words_nonempty).upper()
@ -315,27 +315,25 @@ def align(
segment['end'] = t2_actual segment['end'] = t2_actual
prev_t2 = segment['end'] prev_t2 = segment['end']
# merge missing words to previous, or merge with next word ahead if idx == 0 # merge missing words to previous, or merge with next word ahead if idx == 0
for x in range(len(t_local)): for x in range(len(t_local)):
curr_word = t_words[x] curr_word = t_words[x]
curr_timestamp = t_local[x] curr_timestamp = t_local[x]
if curr_timestamp is not None: if curr_timestamp is not None:
word_level.append({"text": curr_word, "start": curr_timestamp[0], "end": curr_timestamp[1]}) segment['word-level'].append({"text": curr_word, "start": curr_timestamp[0], "end": curr_timestamp[1]})
else: else:
if x == 0: segment['word-level'].append({"text": curr_word, "start": None, "end": None})
t_words[x+1] = " ".join([curr_word, t_words[x+1]])
else:
word_level[-1]['text'] += ' ' + curr_word
else: else:
# then we resort back to original whisper timestamps # then we resort back to original whisper timestamps
# segment['start] and segment['end'] are unchanged # segment['start] and segment['end'] are unchanged
prev_t2 = 0 prev_t2 = 0
word_level.append({"text": segment['text'], "start": segment['start'], "end":segment['end']}) segment['word-level'].append({"text": segment['text'], "start": segment['start'], "end":segment['end']})
print(f"[{format_timestamp(segment['start'])} --> {format_timestamp(segment['end'])}] {segment['text']}") print(f"[{format_timestamp(segment['start'])} --> {format_timestamp(segment['end'])}] {segment['text']}")
return {"segments": transcript}, {"segments": word_level} return {"segments": transcript}
def cli(): def cli():
from . import available_models from . import available_models
@ -347,11 +345,9 @@ def cli():
parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu", help="device to use for PyTorch inference") parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu", help="device to use for PyTorch inference")
# alignment params # alignment params
parser.add_argument("--align_model", default="WAV2VEC2_ASR_LARGE_LV60K_960H", help="Name of phoneme-level ASR model to do alignment") parser.add_argument("--align_model", default="WAV2VEC2_ASR_LARGE_LV60K_960H", help="Name of phoneme-level ASR model to do alignment")
parser.add_argument("--align_extend", default=1, type=float, help="Seconds before and after to extend the whisper segments for alignment") parser.add_argument("--align_extend", default=2, type=float, help="Seconds before and after to extend the whisper segments for alignment")
parser.add_argument("--align_from_prev", default=True, type=bool, help="Whether to clip the alignment start time of current segment to the end time of the last aligned word of the previous segment") parser.add_argument("--align_from_prev", default=True, type=bool, help="Whether to clip the alignment start time of current segment to the end time of the last aligned word of the previous segment")
# parser.add_argument("--align_interpolate_missing", default=True, type=bool, help="Whether to interpolate the timestamp of words not tokenized by the align model, e.g. integers")
parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs") parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs")
parser.add_argument("--output_type", default="srt", choices=['all', 'srt', 'vtt', 'txt'], help="directory to save the outputs") parser.add_argument("--output_type", default="srt", choices=['all', 'srt', 'vtt', 'txt'], help="directory to save the outputs")
@ -417,7 +413,7 @@ def cli():
for audio_path in args.pop("audio"): for audio_path in args.pop("audio"):
result = transcribe(model, audio_path, temperature=temperature, **args) result = transcribe(model, audio_path, temperature=temperature, **args)
result_aligned, result_aligned_word = align(result["segments"], align_model, align_dictionary, audio_path, device, result_aligned = align(result["segments"], align_model, align_dictionary, audio_path, device,
extend_duration=align_extend, start_from_previous=align_from_prev) extend_duration=align_extend, start_from_previous=align_from_prev)
audio_basename = os.path.basename(audio_path) audio_basename = os.path.basename(audio_path)
@ -425,22 +421,20 @@ def cli():
if output_type in ["txt", "all"]: if output_type in ["txt", "all"]:
with open(os.path.join(output_dir, audio_basename + ".txt"), "w", encoding="utf-8") as txt: with open(os.path.join(output_dir, audio_basename + ".txt"), "w", encoding="utf-8") as txt:
write_txt(result_aligned["segments"], file=txt) write_txt(result_aligned["segments"], file=txt)
with open(os.path.join(output_dir, audio_basename + ".word.txt"), "w", encoding="utf-8") as txt:
write_txt(result_aligned_word["segments"], file=txt)
# save VTT # save VTT
if output_type in ["vtt", "all"]: if output_type in ["vtt", "all"]:
with open(os.path.join(output_dir, audio_basename + ".vtt"), "w", encoding="utf-8") as vtt: with open(os.path.join(output_dir, audio_basename + ".vtt"), "w", encoding="utf-8") as vtt:
write_vtt(result_aligned["segments"], file=vtt) write_vtt(result_aligned["segments"], file=vtt)
with open(os.path.join(output_dir, audio_basename + ".word.vtt"), "w", encoding="utf-8") as vtt:
write_vtt(result_aligned_word["segments"], file=vtt)
# save SRT # save SRT
if output_type in ["srt", "all"]: if output_type in ["srt", "all"]:
with open(os.path.join(output_dir, audio_basename + ".srt"), "w", encoding="utf-8") as srt: with open(os.path.join(output_dir, audio_basename + ".srt"), "w", encoding="utf-8") as srt:
write_srt(result_aligned["segments"], file=srt) write_srt(result_aligned["segments"], file=srt)
with open(os.path.join(output_dir, audio_basename + ".word.srt"), "w", encoding="utf-8") as srt:
write_srt(result_aligned_word["segments"], file=srt) # save ASS
with open(os.path.join(output_dir, audio_basename + ".ass"), "w", encoding="utf-8") as srt:
write_ass(result_aligned["segments"], file=srt)
if __name__ == '__main__': if __name__ == '__main__':

View File

@ -1,5 +1,5 @@
import zlib import zlib
from typing import Iterator, TextIO from typing import Iterator, TextIO, Tuple, List
def exact_div(x, y): def exact_div(x, y):
@ -86,3 +86,138 @@ def write_srt(transcript: Iterator[dict], file: TextIO):
file=file, file=file,
flush=True, flush=True,
) )
def write_ass(transcript: Iterator[dict], file: TextIO,
color: str = None, underline=True,
prefmt: str = None, suffmt: str = None,
font: str = None, font_size: int = 24,
strip=True, **kwargs):
"""
Credit: https://github.com/jianfch/stable-ts/blob/ff79549bd01f764427879f07ecd626c46a9a430a/stable_whisper/text_output.py
Generate Advanced SubStation Alpha (ASS) file from results to
display both phrase-level & word-level timestamp simultaneously by:
-using segment-level timestamps display phrases as usual
-using word-level timestamps change formats (e.g. color/underline) of the word in the displayed segment
Note: ass file is used in the same way as srt, vtt, etc.
Parameters
----------
res: dict
results from modified model
ass_path: str
output path (e.g. caption.ass)
color: str
color code for a word at its corresponding timestamp
<bbggrr> reverse order hexadecimal RGB value (e.g. FF0000 is full intensity blue. Default: 00FF00)
underline: bool
whether to underline a word at its corresponding timestamp
prefmt: str
used to specify format for word-level timestamps (must be use with 'suffmt' and overrides 'color'&'underline')
appears as such in the .ass file:
Hi, {<prefmt>}how{<suffmt>} are you?
reference [Appendix A: Style override codes] in http://www.tcax.org/docs/ass-specs.htm
suffmt: str
used to specify format for word-level timestamps (must be use with 'prefmt' and overrides 'color'&'underline')
appears as such in the .ass file:
Hi, {<prefmt>}how{<suffmt>} are you?
reference [Appendix A: Style override codes] in http://www.tcax.org/docs/ass-specs.htm
font: str
word font (default: Arial)
font_size: int
word font size (default: 48)
kwargs:
used for format styles:
'Name', 'Fontname', 'Fontsize', 'PrimaryColour', 'SecondaryColour', 'OutlineColour', 'BackColour', 'Bold',
'Italic', 'Underline', 'StrikeOut', 'ScaleX', 'ScaleY', 'Spacing', 'Angle', 'BorderStyle', 'Outline',
'Shadow', 'Alignment', 'MarginL', 'MarginR', 'MarginV', 'Encoding'
"""
fmt_style_dict = {'Name': 'Default', 'Fontname': 'Arial', 'Fontsize': '48', 'PrimaryColour': '&Hffffff',
'SecondaryColour': '&Hffffff', 'OutlineColour': '&H0', 'BackColour': '&H0', 'Bold': '0',
'Italic': '0', 'Underline': '0', 'StrikeOut': '0', 'ScaleX': '100', 'ScaleY': '100',
'Spacing': '0', 'Angle': '0', 'BorderStyle': '1', 'Outline': '1', 'Shadow': '0',
'Alignment': '2', 'MarginL': '10', 'MarginR': '10', 'MarginV': '10', 'Encoding': '0'}
for k, v in filter(lambda x: 'colour' in x[0].lower() and not str(x[1]).startswith('&H'), kwargs.items()):
kwargs[k] = f'&H{kwargs[k]}'
fmt_style_dict.update((k, v) for k, v in kwargs.items() if k in fmt_style_dict)
if font:
fmt_style_dict.update(Fontname=font)
if font_size:
fmt_style_dict.update(Fontsize=font_size)
fmts = f'Format: {", ".join(map(str, fmt_style_dict.keys()))}'
styles = f'Style: {",".join(map(str, fmt_style_dict.values()))}'
ass_str = f'[Script Info]\nScriptType: v4.00+\nPlayResX: 384\nPlayResY: 288\nScaledBorderAndShadow: yes\n\n' \
f'[V4+ Styles]\n{fmts}\n{styles}\n\n' \
f'[Events]\nFormat: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n\n'
if prefmt or suffmt:
if suffmt:
assert prefmt, 'prefmt must be used along with suffmt'
else:
suffmt = r'\r'
else:
if not color:
color = 'HFF00'
underline_code = r'\u1' if underline else ''
prefmt = r'{\1c&' + f'{color.upper()}&{underline_code}' + '}'
suffmt = r'{\r}'
def secs_to_hhmmss(secs: Tuple[float, int]):
mm, ss = divmod(secs, 60)
hh, mm = divmod(mm, 60)
return f'{hh:0>1.0f}:{mm:0>2.0f}:{ss:0>2.2f}'
def dialogue(words: List[str], idx, start, end) -> str:
text = ''.join(f' {prefmt}{word}{suffmt}'
# if not word.startswith(' ') or word == ' ' else
# f' {prefmt}{word.strip()}{suffmt}')
if curr_idx == idx else
f' {word}'
for curr_idx, word in enumerate(words))
return f"Dialogue: 0,{secs_to_hhmmss(start)},{secs_to_hhmmss(end)}," \
f"Default,,0,0,0,,{text.strip() if strip else text}"
ass_arr = []
for segment in transcript:
curr_words = [wrd['text'] for wrd in segment['word-level']]
prev = segment['word-level'][0]['start']
for wdx, word in enumerate(segment['word-level']):
if word['start'] is not None:
# fill gap between previous word
if word['start'] > prev:
filler_ts = {
"words": curr_words,
"start": prev,
"end": word['start'],
"idx": -1
}
ass_arr.append(filler_ts)
# highlight current word
f_word_ts = {
"words": curr_words,
"start": word['start'],
"end": word['end'],
"idx": wdx
}
ass_arr.append(f_word_ts)
prev = word['end']
ass_str += '\n'.join(map(lambda x: dialogue(**x), ass_arr))
file.write(ass_str)