add arch figure, citation

m-bain
2022-12-18 18:43:33 +00:00
committed by GitHub
parent 3a91aa1384
commit 6b64cb079a


@ -17,6 +17,9 @@ This repository refines the timestamps of openAI's Whisper model via forced alig
**Forced Alignment** refers to the process by which orthographic transcriptions are aligned to audio recordings to automatically generate phone-level segmentation.
<img width="1216" align="center" alt="whisperx-arch" src="https://user-images.githubusercontent.com/36994049/208313881-903ab3ea-4932-45fd-b3dc-70876cddaaa2.png">
<h2 align="left">Setup ⚙️</h2>
Install this package using
@ -98,7 +101,46 @@ https://user-images.githubusercontent.com/36994049/208298819-6f462b2c-8cae-4c54-
Contact maxbain[at]robots[dot]ox[dot]ac[dot]uk if using this for commercial purposes.
<h2 align="left">Acknowledgements 🙏</h2>
Of course, this is mostly just a modification to [OpenAI's Whisper](https://github.com/openai/whisper).
Credit also goes to this [PyTorch tutorial on forced alignment](https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html).
<h2 align="left">Citation</h2>
If you use this in your research, just cite the repo,
```bibtex
@misc{bain2022whisperx,
author = {Bain, Max},
title = {WhisperX},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/m-bain/whisperX}},
}
```
as well as the Whisper paper,
```bibtex
@article{radford2022robust,
title={Robust speech recognition via large-scale weak supervision},
author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
journal={arXiv preprint arXiv:2212.04356},
year={2022}
}
```
and any alignment model used, e.g. wav2vec 2.0.
```bibtex
@article{baevski2020wav2vec,
title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
author={Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
journal={Advances in Neural Information Processing Systems},
volume={33},
pages={12449--12460},
year={2020}
}
```