diff --git a/README.md b/README.md
index 7e410f2..2bffa43 100644
--- a/README.md
+++ b/README.md
@@ -130,12 +130,13 @@ See more examples in other languages [here](EXAMPLES.md).
 ```python
 import whisperx
+import whisper
 
 device = "cuda"
 audio_file = "audio.mp3"
 
 # transcribe with original whisper
-model = whisperx.load_model("large", device)
+model = whisper.load_model("large", device)
 result = model.transcribe(audio_file)
 
 print(result["segments"]) # before alignment
 
@@ -157,9 +158,6 @@ In addition to forced alignment, the following two modifications have been made
 
 1. `--condition_on_prev_text` is set to `False` by default (reduces hallucination)
 
-2. Clamping segment `end_time` to be at least 0.02s (one time precision) later than `start_time` (prevents segments with negative duration)
-
-

Limitations ⚠️

- Whisper normalises spoken numbers e.g. "fifty seven" to arabic numerals "57". Need to perform this normalization after alignment, so the phonemes can be aligned. Currently just ignores numbers.
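
The post-alignment normalization described above could look roughly like the following. This is a minimal sketch, not part of the project: the helper `merge_number_words` is hypothetical, the word dictionaries with `word`, `start`, and `end` keys are an assumed shape for word-level alignment output, and only spoken numbers from 0 to 99 are handled.

```python
# Sketch only: assumes aligned words arrive as dicts with "word", "start", "end".
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def merge_number_words(words):
    """Collapse spoken numbers (0-99) into digit tokens, keeping the time span."""
    merged = []
    i = 0
    while i < len(words):
        token = words[i]["word"].lower()
        if token in TENS:
            value = TENS[token]
            end = words[i]["end"]
            step = 1
            # A tens word may be followed by a units word 1-9 ("fifty seven").
            if i + 1 < len(words):
                next_token = words[i + 1]["word"].lower()
                if 1 <= UNITS.get(next_token, 0) <= 9:
                    value += UNITS[next_token]
                    end = words[i + 1]["end"]
                    step = 2
            merged.append({"word": str(value), "start": words[i]["start"], "end": end})
            i += step
        elif token in UNITS:
            merged.append(
                {"word": str(UNITS[token]), "start": words[i]["start"], "end": words[i]["end"]}
            )
            i += 1
        else:
            # Non-number words pass through unchanged (copied, not mutated).
            merged.append(dict(words[i]))
            i += 1
    return merged


aligned = [
    {"word": "I", "start": 0.0, "end": 0.1},
    {"word": "am", "start": 0.1, "end": 0.3},
    {"word": "fifty", "start": 0.3, "end": 0.7},
    {"word": "seven", "start": 0.7, "end": 1.1},
]
normalized = merge_number_words(aligned)
```

Because the spelled-out words are aligned first and merged afterwards, the digit token inherits the start of "fifty" and the end of "seven", so its timing still comes from phonemes that were actually aligned.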