chore: update version to 3.3.3 in pyproject.toml and uv.lock

feat: add version and Python version arguments to CLI
fix: downgrade ctranslate2 dependency version
19 changed files with 3048 additions and 165 deletions
--- a/.github/workflows/build-and-release.yml
+++ b/.github/workflows/build-and-release.yml
@@ -11,25 +11,21 @@ jobs:
      - name: Checkout
        uses: actions/checkout@v4

-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
        with:
+          version: "0.5.14"
          python-version: "3.9"

-      - name: Install dependencies
-        run: |
-          python -m pip install build
-
-      - name: Build wheels
-        run: python -m build --wheel
+      - name: Build package
+        run: uv build

      - name: Release to Github
        uses: softprops/action-gh-release@v2
        with:
-          files: dist/*
+          files: dist/*.whl

      - name: Publish package to PyPi
-        uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
-        with:
-          user: __token__
-          password: ${{ secrets.PYPI_API_TOKEN }}
+        run: uv publish
+        env:
+          UV_PUBLISH_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
--- a/.github/workflows/python-compatibility.yml
+++ b/.github/workflows/python-compatibility.yml
@@ -5,7 +5,7 @@ on:
    branches: [main]
  pull_request:
    branches: [main]
-  workflow_dispatch:  # Allows manual triggering from GitHub UI
+  workflow_dispatch: # Allows manual triggering from GitHub UI

 jobs:
  test:
@@ -17,16 +17,15 @@ jobs:
    steps:
      - uses: actions/checkout@v4

-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
        with:
+          version: "0.5.14"
          python-version: ${{ matrix.python-version }}

-      - name: Install package
-        run: |
-          python -m pip install --upgrade pip
-          pip install .
+      - name: Install the project
+        run: uv sync --all-extras

      - name: Test import
        run: |
-          python -c "import whisperx; print('Successfully imported whisperx')"
+          uv run python -c "import whisperx; print('Successfully imported whisperx')"
--- a/.github/workflows/tmp.yml
+++ b/.github/workflows/tmp.yml
@@ -1,35 +0,0 @@
-name: Python Compatibility Test (PyPi)
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-    branches: [main]
-  workflow_dispatch:  # Allows manual triggering from GitHub UI
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12"]
-
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ matrix.python-version }}
-
-      - name: Install package
-        run: |
-          pip install whisperx
-
-      - name: Print packages
-        run: |
-          pip list
-
-      - name: Test import
-        run: |
-          python -c "import whisperx; print('Successfully imported whisperx')"
--- a/README.md
+++ b/README.md
@@ -62,54 +62,41 @@ This repository provides fast automatic speech recognition (70x realtime with la
 - Paper drop🎓👨‍🏫! Please see our [ArxiV preprint](https://arxiv.org/abs/2303.00747) for benchmarking and details of WhisperX. We also introduce more efficient batch inference resulting in large-v2 with *60-70x REAL TIME speed.

 <h2 align="left" id="setup">Setup ⚙️</h2>
-Tested for PyTorch 2.0, Python 3.10 (use other versions at your own risk!)

-GPU execution requires the NVIDIA libraries cuBLAS 11.x and cuDNN 8.x to be installed on the system. Please refer to the [CTranslate2 documentation](https://opennmt.net/CTranslate2/installation.html).
+### 1. Simple Installation (Recommended)

-
-### 1. Create Python3.10 environment
-
-`conda create --name whisperx python=3.10`
-
-`conda activate whisperx`
-
-
-### 2. Install PyTorch, e.g. for Linux and Windows CUDA11.8:
-
-`conda install pytorch==2.0.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia`
-
-See other methods [here.](https://pytorch.org/get-started/previous-versions/#v200)
-
-### 3. Install WhisperX
-
-You have several installation options:
-
-#### Option A: Stable Release (recommended)
-Install the latest stable version from PyPI:
+The easiest way to install WhisperX is through PyPi:

 ```bash
 pip install whisperx
 ```

-#### Option B: Development Version
-Install the latest development version directly from GitHub (may be unstable):
+Or if using [uvx](https://docs.astral.sh/uv/guides/tools/#running-tools):

 ```bash
-pip install git+https://github.com/m-bain/whisperx.git
+uvx whisperx
 ```

-If already installed, update to the most recent commit:
+### 2. Advanced Installation Options
+
+These installation methods are for developers or users with specific needs. If you're not sure, stick with the simple installation above.
+
+#### Option A: Install from GitHub
+
+To install directly from the GitHub repository:

 ```bash
-pip install git+https://github.com/m-bain/whisperx.git --upgrade
+uvx git+https://github.com/m-bain/whisperX.git
 ```

-#### Option C: Development Mode
-If you wish to modify the package, clone and install in editable mode:
+#### Option B: Developer Installation
+
+If you want to modify the code or contribute to the project:
+
 ```bash
 git clone https://github.com/m-bain/whisperX.git
 cd whisperX
-pip install -e .
+uv sync --all-extras --dev
 ```

 > **Note**: The development version may contain experimental features and bugs. Use the stable PyPI release for production environments.
@@ -117,12 +104,12 @@ pip install -e .
 You may also need to install ffmpeg, rust etc. Follow openAI instructions here https://github.com/openai/whisper#setup.

 ### Speaker Diarization
+
 To **enable Speaker Diarization**, include your Hugging Face access token (read) that you can generate from [Here](https://huggingface.co/settings/tokens) after the `--hf_token` argument and accept the user agreement for the following models: [Segmentation](https://huggingface.co/pyannote/segmentation-3.0) and [Speaker-Diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) (if you choose to use Speaker-Diarization 2.x, follow requirements [here](https://huggingface.co/pyannote/speaker-diarization) instead.)

 > **Note**<br>
 > As of Oct 11, 2023, there is a known issue regarding slow performance with pyannote/Speaker-Diarization-3.0 in whisperX. It is due to dependency conflicts between faster-whisper and pyannote-audio 3.0.0. Please see [this issue](https://github.com/m-bain/whisperX/issues/499) for more details and potential workarounds.

-
 <h2 align="left" id="example">Usage 💬 (command line)</h2>

 ### English
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,36 @@
+[project]
+urls = { repository = "https://github.com/m-bain/whisperx" }
+authors = [{ name = "Max Bain" }]
+name = "whisperx"
+version = "3.3.3"
+description = "Time-Accurate Automatic Speech Recognition using Whisper."
+readme = "README.md"
+requires-python = ">=3.9, <3.13"
+license = { text = "BSD-2-Clause" }
+
+dependencies = [
+    "ctranslate2<4.5.0",
+    "faster-whisper>=1.1.1",
+    "nltk>=3.9.1",
+    "numpy>=2.0.2",
+    "onnxruntime>=1.19",
+    "pandas>=2.2.3",
+    "pyannote-audio>=3.3.2",
+    "torch>=2.5.1",
+    "torchaudio>=2.5.1",
+    "transformers>=4.48.0",
+]
+
+
+[project.scripts]
+whisperx = "whisperx.transcribe:cli"
+
+[build-system]
+requires = ["setuptools"]
+
+[tool.setuptools]
+include-package-data = true
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["whisperx*"]
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,8 +0,0 @@
-torch>=2
-torchaudio>=2
-faster-whisper==1.1.0
-ctranslate2<4.5.0
-transformers
-pandas
-setuptools>=65
-nltk
--- a/setup.py
+++ b/setup.py
@@ -1,33 +0,0 @@
-import os
-
-import pkg_resources
-from setuptools import find_packages, setup
-
-with open("README.md", "r", encoding="utf-8") as f:
-    long_description = f.read()
-
-setup(
-    name="whisperx",
-    py_modules=["whisperx"],
-    version="3.3.1",
-    description="Time-Accurate Automatic Speech Recognition using Whisper.",
-    long_description=long_description,
-    long_description_content_type="text/markdown",
-    python_requires=">=3.9, <3.13",
-    author="Max Bain",
-    url="https://github.com/m-bain/whisperx",
-    license="BSD-2-Clause",
-    packages=find_packages(exclude=["tests*"]),
-    install_requires=[
-        str(r)
-        for r in pkg_resources.parse_requirements(
-            open(os.path.join(os.path.dirname(__file__), "requirements.txt"))
-        )
-    ]
-    + [f"pyannote.audio==3.3.2"],
-    entry_points={
-        "console_scripts": ["whisperx=whisperx.transcribe:cli"],
-    },
-    include_package_data=True,
-    extras_require={"dev": ["pytest"]},
-)
--- a/uv.lock
+++ b/uv.lock
--- a/whisperx/SubtitlesProcessor.py
+++ b/whisperx/SubtitlesProcessor.py
@@ -1,6 +1,5 @@
 import math
-from .conjunctions import get_conjunctions, get_comma
-from typing import TextIO
+from whisperx.conjunctions import get_conjunctions, get_comma

 def normal_round(n):
    if n - math.floor(n) < 0.5:
--- a/whisperx/__init__.py
+++ b/whisperx/__init__.py
@@ -1,4 +1,7 @@
-from .alignment import load_align_model, align
-from .audio import load_audio
-from .diarize import assign_word_speakers, DiarizationPipeline
-from .asr import load_model
+from whisperx.alignment import load_align_model as load_align_model, align as align
+from whisperx.asr import load_model as load_model
+from whisperx.audio import load_audio as load_audio
+from whisperx.diarize import (
+    assign_word_speakers as assign_word_speakers,
+    DiarizationPipeline as DiarizationPipeline,
+)
--- a/whisperx/__main__.py
+++ b/whisperx/__main__.py
@@ -1,4 +1,4 @@
-from .transcribe import cli
+from whisperx.transcribe import cli


 cli()
--- a/whisperx/alignment.py
+++ b/whisperx/alignment.py
@@ -13,9 +13,9 @@ import torch
 import torchaudio
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

-from .audio import SAMPLE_RATE, load_audio
-from .utils import interpolate_nans
-from .types import (
+from whisperx.audio import SAMPLE_RATE, load_audio
+from whisperx.utils import interpolate_nans
+from whisperx.types import (
    AlignedTranscriptionResult,
    SingleSegment,
    SingleAlignedSegment,
--- a/whisperx/asr.py
+++ b/whisperx/asr.py
@@ -11,9 +11,10 @@ from faster_whisper.transcribe import TranscriptionOptions, get_ctranslate2_stor
 from transformers import Pipeline
 from transformers.pipelines.pt_utils import PipelineIterator

-from .audio import N_SAMPLES, SAMPLE_RATE, load_audio, log_mel_spectrogram
-from .types import SingleSegment, TranscriptionResult
-from .vads import Vad, Silero, Pyannote
+from whisperx.audio import N_SAMPLES, SAMPLE_RATE, load_audio, log_mel_spectrogram
+from whisperx.types import SingleSegment, TranscriptionResult
+from whisperx.vads import Vad, Silero, Pyannote
+

 def find_numeral_symbol_tokens(tokenizer):
    numeral_symbol_tokens = []
@@ -50,6 +51,7 @@ class WhisperModel(faster_whisper.WhisperModel):
            previous_tokens,
            without_timestamps=options.without_timestamps,
            prefix=options.prefix,
+            hotwords=options.hotwords
        )

        encoder_output = self.encode(features)
--- a/whisperx/audio.py
+++ b/whisperx/audio.py
@@ -7,7 +7,7 @@ import numpy as np
 import torch
 import torch.nn.functional as F

-from .utils import exact_div
+from whisperx.utils import exact_div

 # hard-coded audio hyperparameters
 SAMPLE_RATE = 16000
--- a/whisperx/diarize.py
+++ b/whisperx/diarize.py
@@ -4,8 +4,8 @@ from pyannote.audio import Pipeline
 from typing import Optional, Union
 import torch

-from .audio import load_audio, SAMPLE_RATE
-from .types import TranscriptionResult, AlignedTranscriptionResult
+from whisperx.audio import load_audio, SAMPLE_RATE
+from whisperx.types import TranscriptionResult, AlignedTranscriptionResult


 class DiarizationPipeline:
--- a/whisperx/transcribe.py
+++ b/whisperx/transcribe.py
@@ -1,17 +1,20 @@
 import argparse
 import gc
 import os
+import sys
 import warnings
+import importlib.metadata
+import platform

 import numpy as np
 import torch

-from .alignment import align, load_align_model
-from .asr import load_model
-from .audio import load_audio
-from .diarize import DiarizationPipeline, assign_word_speakers
-from .types import AlignedTranscriptionResult, TranscriptionResult
-from .utils import (
+from whisperx.alignment import align, load_align_model
+from whisperx.asr import load_model
+from whisperx.audio import load_audio
+from whisperx.diarize import DiarizationPipeline, assign_word_speakers
+from whisperx.types import AlignedTranscriptionResult, TranscriptionResult
+from whisperx.utils import (
    LANGUAGES,
    TO_LANGUAGE_CODE,
    get_writer,
@@ -85,6 +88,8 @@ def cli():
    parser.add_argument("--hf_token", type=str, default=None, help="Hugging Face Access Token to access PyAnnote gated models")

    parser.add_argument("--print_progress", type=str2bool, default = False, help = "if True, progress will be printed in transcribe() and align() methods.")
+    parser.add_argument("--version", "-V", action="version", version=f"%(prog)s {importlib.metadata.version('whisperx')}",help="Show whisperx version information and exit")
+    parser.add_argument("--python-version", "-P", action="version", version=f"Python {platform.python_version()} ({platform.python_implementation()})",help="Show python version information and exit")
    # fmt: on

    args = parser.parse_args().__dict__
@@ -138,7 +143,9 @@ def cli():
                f"{model_name} is an English-only model but received '{args['language']}'; using English instead."
            )
        args["language"] = "en"
-    align_language = args["language"] if args["language"] is not None else "en" # default to loading english if not specified
+    align_language = (
+        args["language"] if args["language"] is not None else "en"
+    )  # default to loading english if not specified

    temperature = args.pop("temperature")
    if (increment := args.pop("temperature_increment_on_fallback")) is not None:
@@ -174,12 +181,29 @@ def cli():
    if args["max_line_count"] and not args["max_line_width"]:
        warnings.warn("--max_line_count has no effect without --max_line_width")
    writer_args = {arg: args.pop(arg) for arg in word_options}
-    
+
    # Part 1: VAD & ASR Loop
    results = []
    tmp_results = []
    # model = load_model(model_name, device=device, download_root=model_dir)
-    model = load_model(model_name, device=device, device_index=device_index, download_root=model_dir, compute_type=compute_type, language=args['language'], asr_options=asr_options, vad_method=vad_method, vad_options={"chunk_size":chunk_size, "vad_onset": vad_onset, "vad_offset": vad_offset}, task=task, local_files_only=model_cache_only, threads=faster_whisper_threads)
+    model = load_model(
+        model_name,
+        device=device,
+        device_index=device_index,
+        download_root=model_dir,
+        compute_type=compute_type,
+        language=args["language"],
+        asr_options=asr_options,
+        vad_method=vad_method,
+        vad_options={
+            "chunk_size": chunk_size,
+            "vad_onset": vad_onset,
+            "vad_offset": vad_offset,
+        },
+        task=task,
+        local_files_only=model_cache_only,
+        threads=faster_whisper_threads,
+    )

    for audio_path in args.pop("audio"):
        audio = load_audio(audio_path)
@@ -203,7 +227,9 @@ def cli():
    if not no_align:
        tmp_results = results
        results = []
-        align_model, align_metadata = load_align_model(align_language, device, model_name=align_model)
+        align_model, align_metadata = load_align_model(
+            align_language, device, model_name=align_model
+        )
        for result, audio_path in tmp_results:
            # >> Align
            if len(tmp_results) > 1:
@@ -215,8 +241,12 @@ def cli():
            if align_model is not None and len(result["segments"]) > 0:
                if result.get("language", "en") != align_metadata["language"]:
                    # load new language
-                    print(f"New language found ({result['language']})! Previous was ({align_metadata['language']}), loading new alignment model for new language...")
-                    align_model, align_metadata = load_align_model(result["language"], device)
+                    print(
+                        f"New language found ({result['language']})! Previous was ({align_metadata['language']}), loading new alignment model for new language..."
+                    )
+                    align_model, align_metadata = load_align_model(
+                        result["language"], device
+                    )
                print(">>Performing alignment...")
                result: AlignedTranscriptionResult = align(
                    result["segments"],
@@ -239,13 +269,17 @@ def cli():
    # >> Diarize
    if diarize:
        if hf_token is None:
-            print("Warning, no --hf_token used, needs to be saved in environment variable, otherwise will throw error loading diarization model...")
+            print(
+                "Warning, no --hf_token used, needs to be saved in environment variable, otherwise will throw error loading diarization model..."
+            )
        tmp_results = results
        print(">>Performing diarization...")
        results = []
        diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=device)
        for result, input_audio_path in tmp_results:
-            diarize_segments = diarize_model(input_audio_path, min_speakers=min_speakers, max_speakers=max_speakers)
+            diarize_segments = diarize_model(
+                input_audio_path, min_speakers=min_speakers, max_speakers=max_speakers
+            )
            result = assign_word_speakers(diarize_segments, result)
            results.append((result, input_audio_path))
    # >> Write
@@ -253,5 +287,6 @@ def cli():
        result["language"] = align_language
        writer(result, audio_path, writer_args)

+
 if __name__ == "__main__":
    cli()
--- a/whisperx/utils.py
+++ b/whisperx/utils.py
@@ -106,7 +106,6 @@ LANGUAGES = {
    "jw": "javanese",
    "su": "sundanese",
    "yue": "cantonese",
-    "lv": "latvian",
 }

 # language code lookup by name, with a few language aliases
--- a/whisperx/vads/__init__.py
+++ b/whisperx/vads/__init__.py
@@ -1,3 +1,3 @@
-from whisperx.vads.pyannote import Pyannote
-from whisperx.vads.silero import Silero
-from whisperx.vads.vad import Vad
+from whisperx.vads.pyannote import Pyannote as Pyannote
+from whisperx.vads.silero import Silero as Silero
+from whisperx.vads.vad import Vad as Vad
--- a/whisperx/vads/pyannote.py
+++ b/whisperx/vads/pyannote.py
@@ -1,6 +1,4 @@
-import hashlib
 import os
-import urllib
 from typing import Callable, Text, Union
 from typing import Optional

@@ -12,11 +10,11 @@ from pyannote.audio.pipelines import VoiceActivityDetection
 from pyannote.audio.pipelines.utils import PipelineModel
 from pyannote.core import Annotation, SlidingWindowFeature
 from pyannote.core import Segment
-from tqdm import tqdm

 from whisperx.diarize import Segment as SegmentX
 from whisperx.vads.vad import Vad

+
 def load_vad_model(device, vad_onset=0.500, vad_offset=0.363, use_auth_token=None, model_fp=None):
    model_dir = torch.hub._get_torch_home()
Author	SHA1	Message	Date
Barabazs	f5b40b5366	chore: update version to 3.3.3 in pyproject.toml and uv.lock	2025-05-01 11:08:54 +02:00
Barabazs	ac0c8bd79a	feat: add version and Python version arguments to CLI	2025-05-01 11:08:54 +02:00
Barabazs	cd59f21d1a	fix: downgrade ctranslate2 dependency version	2025-05-01 11:08:54 +02:00
Yan Cheng Cheok	0aed874589	Remove duplicated item "lv": "latvian"	2025-04-12 11:08:15 +02:00
Barabazs	f10dbf6ab1	fix: update setuptools configuration to include package discovery for whisperx	2025-03-25 18:49:44 +01:00
Barabazs	a7564c2ad6	docs: update installation instructions	2025-03-25 17:02:41 +01:00
Barabazs	e7712f496e	refactor: update import statements to use explicit module paths across multiple files	2025-03-25 16:24:21 +01:00
jademlc	8e53866704	feat: pass hotwords argument to get_prompt (#1073) Co-authored-by: Jade Moillic <jade.moillic@radiofrance.com>	2025-03-24 10:47:47 +01:00
Max Bain	3205436d58	Merge pull request #1002 from Barabazs/feat/uv	2025-03-23 12:59:46 +00:00
Barabazs	d2f0e53f71	chore: remove tmp workflow	2025-02-12 08:23:23 +01:00
Barabazs	7489ebf876	feat: update build and release workflow to use uv for package installation and publishing	2025-02-12 08:23:23 +01:00
Barabazs	90256cc481	feat: use uv recommended setup	2025-02-12 08:23:23 +01:00
Barabazs	b41ebd4871	chore: add numpy to deps	2025-02-12 08:23:23 +01:00
Barabazs	63bc1903c1	feat: update Python compatibility workflow to use uv	2025-02-12 08:23:23 +01:00
Barabazs	272714e07d	feat: use uv for building package	2025-02-12 08:23:23 +01:00
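
Note on ac0c8bd79a (version flags): the two new arguments rely on argparse's built-in `action="version"`, which prints the `version` string and exits with status 0. The sketch below isolates that behaviour outside of `cli()`. The helper name `build_parser` and the `"unknown"` fallback for an uninstalled distribution are assumptions made for illustration; the diff itself calls `importlib.metadata.version('whisperx')` directly and has no fallback.

```python
import argparse
import importlib.metadata
import platform


def build_parser(dist_name: str = "whisperx") -> argparse.ArgumentParser:
    """Build a parser that exposes only the two version flags (hypothetical helper)."""
    try:
        pkg_version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Assumption for this sketch only: fall back to a placeholder when the
        # distribution is not installed. The change in the diff does not do this.
        pkg_version = "unknown"

    parser = argparse.ArgumentParser(prog="whisperx")
    parser.add_argument(
        "--version", "-V",
        action="version",
        version=f"%(prog)s {pkg_version}",
        help="Show whisperx version information and exit",
    )
    parser.add_argument(
        "--python-version", "-P",
        action="version",
        version=f"Python {platform.python_version()} ({platform.python_implementation()})",
        help="Show python version information and exit",
    )
    return parser
```

Calling `build_parser().parse_args(["--version"])` prints a line starting with `whisperx ` and raises `SystemExit(0)`; `-P` does the same with a line starting with `Python `. Because the version lookup goes through `importlib.metadata`, it reports the installed distribution's metadata, so it reflects the `version` field in `pyproject.toml` only after the package has been (re)installed.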