From 6b64cb079a1ad6aa2370669803a3aafabc71e6e0 Mon Sep 17 00:00:00 2001
From: m-bain <36994049+m-bain@users.noreply.github.com>
Date: Sun, 18 Dec 2022 18:43:33 +0000
Subject: [PATCH] add arch figure, citation

---
 README.md | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/README.md b/README.md
index eb68222..195dac7 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,9 @@ This repository refines the timestamps of openAI's Whisper model via forced alig
 **Forced Alignment** refers to the process by which orthographic transcriptions are aligned to audio recordings to automatically generate phone level segmentation.
+whisperx-arch
+
+
 Setup ⚙️
 Install this package using
@@ -98,7 +101,46 @@ https://user-images.githubusercontent.com/36994049/208298819-6f462b2c-8cae-4c54-
 Contact maxbain[at]robots[dot]ox[dot]ac[dot]uk if using this for commerical purposes.
+
 Acknowledgements 🙏
 Of course, this is mostly just a modification to [openAI's whisper](https://github.com/openai/whisper). As well as accreditation to this [PyTorch tutorial on forced alignment](https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html)
+
+
+Citation
+If you use this in your research, please cite the repo,
+
+```bibtex
+@misc{bain2022whisperx,
+  author = {Bain, Max},
+  title = {WhisperX},
+  year = {2022},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/m-bain/whisperX}},
+}
+```
+
+as well as the Whisper paper,
+
+```bibtex
+@article{radford2022robust,
+  title={Robust speech recognition via large-scale weak supervision},
+  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
+  journal={arXiv preprint arXiv:2212.04356},
+  year={2022}
+}
+```
+and any alignment model used, e.g. wav2vec 2.0.
+
+```bibtex
+@article{baevski2020wav2vec,
+  title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
+  author={Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
+  journal={Advances in Neural Information Processing Systems},
+  volume={33},
+  pages={12449--12460},
+  year={2020}
+}
+```