diff --git a/README.md b/README.md
index 1c7d775..487b9a1 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,8 @@
Usage •
Multilingual •
Contribute •
- More examples
+ More examples •
+ Paper
@@ -49,6 +50,7 @@ This repository refines the timestamps of openAI's Whisper model via forced alig
New🚨
+- Paper drop🎓👨‍🏫! Please see our [arXiv preprint](https://arxiv.org/abs/2303.00747) for benchmarking and details of WhisperX. We also introduce more efficient batch inference, resulting in large-v2 running at *60-70x real-time speed*. The repo will be updated soon with this efficient batch inference.
- Batch processing: Add `--vad_filter --parallel_bs [int]` to transcribe long audio files in batches (only supported with VAD filtering). Replace `[int]` with a batch size that fits your GPU memory, e.g. `--parallel_bs 16`; see the example after this list.
- VAD filtering: Voice Activity Detection (VAD) from [Pyannote.audio](https://huggingface.co/pyannote/voice-activity-detection) is used as a preprocessing step to remove reliance on Whisper timestamps and to transcribe only the audio segments that contain speech. Add the `--vad_filter` flag to increase timestamp accuracy and robustness (requires more GPU memory due to the 30s inputs to wav2vec2).
- Character-level timestamps (see the `*.char.ass` output file)
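+
+For example, a minimal command combining VAD filtering with batch processing (a sketch; the audio file name and `--model` choice here are illustrative):
+
+```bash
+whisperx audio.mp3 --model large-v2 --vad_filter --parallel_bs 16
+```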
@@ -201,38 +203,14 @@ And borrows important alignment code from [PyTorch tutorial on forced alignment]
Citation
-If you use this in your research, for now just cite the repo,
+If you use this in your research, please cite the paper:
```bibtex
-@misc{bain2022whisperx,
- author = {Bain, Max and Han, Tengda},
- title = {WhisperX},
- year = {2022},
- publisher = {GitHub},
- journal = {GitHub repository},
- howpublished = {\url{https://github.com/m-bain/whisperX}},
+@article{bain2022whisperx,
+ title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio},
+ author={Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew},
+ journal={arXiv preprint arXiv:2303.00747},
+ year={2023}
}
```
-as well as the whisper paper,
-
-```bibtex
-@article{radford2022robust,
- title={Robust speech recognition via large-scale weak supervision},
- author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
- journal={arXiv preprint arXiv:2212.04356},
- year={2022}
-}
-```
-and any alignment model used, e.g. wav2vec2.0.
-
-```bibtex
-@article{baevski2020wav2vec,
- title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
- author={Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
- journal={Advances in Neural Information Processing Systems},
- volume={33},
- pages={12449--12460},
- year={2020}
-}
-```