mirror of https://github.com/m-bain/whisperX.git (synced 2025-07-01 18:17:27 -04:00)
update python example
@@ -130,12 +130,13 @@ See more examples in other languages [here](EXAMPLES.md).

```python
import whisperx
+import whisper

device = "cuda"
audio_file = "audio.mp3"

# transcribe with original whisper
-model = whisperx.load_model("large", device)
+model = whisper.load_model("large", device)
result = model.transcribe(audio_file)

print(result["segments"]) # before alignment
@@ -157,9 +158,6 @@ In addition to forced alignment, the following two modifications have been made

1. `--condition_on_prev_text` is set to `False` by default (reduces hallucination)

2. Clamping segment `end_time` to be at least 0.02s (one time precision) later than `start_time` (prevents segments with negative duration)


<h2 align="left" id="limitations">Limitations ⚠️</h2>

- Whisper normalises spoken numbers e.g. "fifty seven" to arabic numerals "57". Need to perform this normalization after alignment, so the phonemes can be aligned. Currently just ignores numbers.
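
The clamping rule in modification 2 of the README text can be illustrated with a minimal sketch. This is not the project's implementation: the helper name `clamp_segment_ends` is hypothetical, and the `"start"`/`"end"` dictionary keys are an assumption about how segment times are stored. Only the 0.02 s minimum comes from the README.

```python
# Minimal sketch of clamping segment end times (illustrative only).
# Assumptions: each segment is a dict with "start" and "end" in seconds;
# clamp_segment_ends is a hypothetical helper, not a whisperX function.

MIN_DURATION = 0.02  # seconds; the "one time precision" step from the README


def clamp_segment_ends(segments, min_duration=MIN_DURATION):
    """Return copies of segments whose end is at least min_duration after start."""
    clamped = []
    for seg in segments:
        fixed = dict(seg)  # copy so the input list is left untouched
        fixed["end"] = max(seg["end"], seg["start"] + min_duration)
        clamped.append(fixed)
    return clamped


segments = [
    {"start": 1.00, "end": 2.50, "text": "already valid"},
    {"start": 3.00, "end": 2.98, "text": "negative duration"},
]
fixed = clamp_segment_ends(segments)
```

With this sketch, a segment that is already long enough keeps its end time, while a segment whose end precedes its start is pushed to `start + 0.02`.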