add arch figure, citation
README.md
@@ -17,6 +17,9 @@ This repository refines the timestamps of openAI's Whisper model via forced alignment
**Forced Alignment** refers to the process by which orthographic transcriptions are aligned to audio recordings to automatically generate phone level segmentation.
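For readers unfamiliar with the idea, here is a minimal, self-contained sketch of CTC-based forced alignment in the spirit of the torchaudio tutorial credited below. It is illustrative only, not WhisperX's implementation: the audio file and transcript are placeholders, and the dynamic program is simplified (it ignores CTC's rule that repeated characters need a blank between them).

```python
import torch
import torchaudio

# A wav2vec 2.0 CTC model provides per-frame character log-probabilities
# ("emissions"); alignment finds when each character of a *known* transcript
# was most likely emitted.
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()  # ('-', '|', 'E', 'T', ...): '-' is CTC blank, '|' separates words

waveform, sr = torchaudio.load("speech.wav")  # placeholder, assumed mono
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)
emission = torch.log_softmax(emissions, dim=-1)[0]  # (frames, characters)

transcript = "HELLO|WORLD"  # placeholder transcript, written in the model's label set
tokens = [labels.index(c) for c in transcript]

# trellis[t, j]: best log-prob of having emitted the first j transcript
# characters after the first t frames.
T, J = emission.size(0), len(tokens)
trellis = torch.full((T + 1, J + 1), float("-inf"))
trellis[0, 0] = 0.0
for t in range(T):
    stay = trellis[t, 1:] + emission[t, 0]        # emit blank, stay on the same character
    move = trellis[t, :-1] + emission[t, tokens]  # emit the next character
    trellis[t + 1, 1:] = torch.maximum(stay, move)
    trellis[t + 1, 0] = trellis[t, 0] + emission[t, 0]

# Backtrack: walk from the end and record the frame at which each character fired.
char_frames, j = [], J
for t in range(T, 0, -1):
    if j == 0:
        break
    moved = trellis[t - 1, j - 1] + emission[t - 1, tokens[j - 1]]
    stayed = trellis[t - 1, j] + emission[t - 1, 0]
    if moved >= stayed:
        char_frames.append((transcript[j - 1], t - 1))
        j -= 1
char_frames.reverse()

frame_sec = 320 / bundle.sample_rate  # wav2vec 2.0 hops ~320 samples (~20 ms) per frame
for char, frame in char_frames:
    print(f"{char}\t{frame * frame_sec:.2f}s")
```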
<img width="1216" align="center" alt="whisperx-arch" src="https://user-images.githubusercontent.com/36994049/208313881-903ab3ea-4932-45fd-b3dc-70876cddaaa2.png">
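To make the two-stage design in the figure concrete, here is a rough sketch of driving the pipeline from Python. The function names (`load_model`, `load_align_model`, `align`) follow the project's documented Python API, but treat the exact signatures and return keys as indicative rather than authoritative, and the audio path as a placeholder.

```python
import whisperx

device = "cuda"           # or "cpu"
audio_file = "audio.mp3"  # placeholder

# Stage 1: ordinary Whisper transcription (coarse, utterance-level timestamps).
model = whisperx.load_model("base", device)
result = model.transcribe(audio_file)

# Stage 2: forced alignment of the transcript against a wav2vec 2.0 model for
# the detected language, refining the timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result_aligned = whisperx.align(
    result["segments"], align_model, metadata, audio_file, device
)

for segment in result_aligned["segments"]:
    print(segment["start"], segment["end"], segment["text"])
```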
<h2 align="left">Setup ⚙️</h2>

Install this package using
@@ -98,7 +101,46 @@ https://user-images.githubusercontent.com/36994049/208298819-6f462b2c-8cae-4c54-
Contact maxbain[at]robots[dot]ox[dot]ac[dot]uk if using this for commercial purposes.
<h2 align="left">Acknowledgements 🙏</h2>
Of course, this is mostly just a modification to [openAI's whisper](https://github.com/openai/whisper).
Credit also goes to this [PyTorch tutorial on forced alignment](https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html).
<h2 align="left">Citation</h2>

If you use this in your research, just cite the repo,

```bibtex
@misc{bain2022whisperx,
  author = {Bain, Max},
  title = {WhisperX},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/m-bain/whisperX}},
}
```

as well as the whisper paper,

```bibtex
@article{radford2022robust,
  title={Robust speech recognition via large-scale weak supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}
```

and any alignment model used, e.g. wav2vec2.0.

```bibtex
@article{baevski2020wav2vec,
  title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
  author={Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={12449--12460},
  year={2020}
}
```