mirror of
https://github.com/m-bain/whisperX.git
synced 2025-07-01 18:17:27 -04:00
update python example
@@ -130,12 +130,13 @@ See more examples in other languages [here](EXAMPLES.md).
 
 ```python
 import whisperx
+import whisper
 
 device = "cuda"
 audio_file = "audio.mp3"
 
 # transcribe with original whisper
-model = whisperx.load_model("large", device)
+model = whisper.load_model("large", device)
 result = model.transcribe(audio_file)
 
 print(result["segments"]) # before alignment
@@ -157,9 +158,6 @@ In addition to forced alignment, the following two modifications have been made
 
 1. `--condition_on_prev_text` is set to `False` by default (reduces hallucination)
 
-2. Clamping segment `end_time` to be at least 0.02s (one time precision) later than `start_time` (prevents segments with negative duration)
-
-
 <h2 align="left" id="limitations">Limitations ⚠️</h2>
 
 - Whisper normalises spoken numbers e.g. "fifty seven" to arabic numerals "57". Need to perform this normalization after alignment, so the phonemes can be aligned. Currently just ignores numbers.