add arch figure, citation
README.md
@@ -17,6 +17,9 @@ This repository refines the timestamps of openAI's Whisper model via forced alignment
**Forced Alignment** refers to the process by which orthographic transcriptions are aligned to audio recordings to automatically generate phone level segmentation.
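For readers unfamiliar with the idea, here is a minimal, self-contained sketch of CTC-based forced alignment in the spirit of the torchaudio tutorial credited below. It is illustrative only, not WhisperX's implementation: the audio file and transcript are placeholders, and the dynamic program is simplified (it ignores CTC's rule that repeated characters need a blank between them).

```python
import torch
import torchaudio

# A wav2vec 2.0 CTC model provides per-frame character log-probabilities
# ("emissions"); alignment finds when each character of a *known* transcript
# was most likely emitted.
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()  # ('-', '|', 'E', 'T', ...): '-' is CTC blank, '|' separates words

waveform, sr = torchaudio.load("speech.wav")  # placeholder, assumed mono
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)
emission = torch.log_softmax(emissions, dim=-1)[0]  # (frames, characters)

transcript = "HELLO|WORLD"  # placeholder transcript, written in the model's label set
tokens = [labels.index(c) for c in transcript]

# trellis[t, j]: best log-prob of having emitted the first j transcript
# characters after the first t frames.
T, J = emission.size(0), len(tokens)
trellis = torch.full((T + 1, J + 1), float("-inf"))
trellis[0, 0] = 0.0
for t in range(T):
    stay = trellis[t, 1:] + emission[t, 0]        # emit blank, stay on the same character
    move = trellis[t, :-1] + emission[t, tokens]  # emit the next character
    trellis[t + 1, 1:] = torch.maximum(stay, move)
    trellis[t + 1, 0] = trellis[t, 0] + emission[t, 0]

# Backtrack: walk from the end and record the frame at which each character fired.
char_frames, j = [], J
for t in range(T, 0, -1):
    if j == 0:
        break
    moved = trellis[t - 1, j - 1] + emission[t - 1, tokens[j - 1]]
    stayed = trellis[t - 1, j] + emission[t - 1, 0]
    if moved >= stayed:
        char_frames.append((transcript[j - 1], t - 1))
        j -= 1
char_frames.reverse()

frame_sec = 320 / bundle.sample_rate  # wav2vec 2.0 hops ~320 samples (~20 ms) per frame
for char, frame in char_frames:
    print(f"{char}\t{frame * frame_sec:.2f}s")
```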
<img width="1216" align="center" alt="whisperx-arch" src="https://user-images.githubusercontent.com/36994049/208313881-903ab3ea-4932-45fd-b3dc-70876cddaaa2.png">
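To make the two-stage design in the figure concrete, here is a rough sketch of driving the pipeline from Python. The function names (`load_model`, `load_align_model`, `align`) follow the project's documented Python API, but treat the exact signatures and return keys as indicative rather than authoritative, and the audio path as a placeholder.

```python
import whisperx

device = "cuda"           # or "cpu"
audio_file = "audio.mp3"  # placeholder

# Stage 1: ordinary Whisper transcription (coarse, utterance-level timestamps).
model = whisperx.load_model("base", device)
result = model.transcribe(audio_file)

# Stage 2: forced alignment of the transcript against a wav2vec 2.0 model for
# the detected language, refining the timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result_aligned = whisperx.align(
    result["segments"], align_model, metadata, audio_file, device
)

for segment in result_aligned["segments"]:
    print(segment["start"], segment["end"], segment["text"])
```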
<h2 align="left">Setup ⚙️</h2>

Install this package using
@@ -98,7 +101,46 @@ https://user-images.githubusercontent.com/36994049/208298819-6f462b2c-8cae-4c54-
Contact maxbain[at]robots[dot]ox[dot]ac[dot]uk if using this for commercial purposes.
<h2 align="left">Acknowledgements 🙏</h2>
Of course, this is mostly just a modification to [openAI's whisper](https://github.com/openai/whisper).
Credit also goes to this [PyTorch tutorial on forced alignment](https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html).
<h2 align="left">Citation</h2>

If you use this in your research, just cite the repo,

```bibtex
@misc{bain2022whisperx,
  author = {Bain, Max},
  title = {WhisperX},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/m-bain/whisperX}},
}
```

as well as the whisper paper,

```bibtex
@article{radford2022robust,
  title={Robust speech recognition via large-scale weak supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}
```

and any alignment model used, e.g. wav2vec2.0.

```bibtex
@article{baevski2020wav2vec,
  title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
  author={Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={12449--12460},
  year={2020}
}
```