Transformers
This class introduces attention mechanisms in detail and presents the Transformer architecture, with an example in NLP. It builds on the NLP class.
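Since all the readings below build on the same core operation, here is a minimal sketch of scaled dot-product attention, the building block of the Transformer (Vaswani et al., 2017). It assumes PyTorch; the function name, tensor shapes, and optional mask argument are illustrative choices, not any library's API:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity of each query to every key
    if mask is not None:
        # Hide disallowed positions (e.g. future tokens in a decoder).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)            # one distribution over keys per query
    return weights @ v                             # weighted average of the value vectors

# Example: 2 sequences, 8 heads, 10 tokens, 64-dim head size (all illustrative).
q = k = v = torch.randn(2, 8, 10, 64)
out = scaled_dot_product_attention(q, k, v)        # shape (2, 8, 10, 64)
```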
Additional Resources
- The Illustrated Transformer
- Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017). pdf
- Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901. pdf
- Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." Journal of Machine Learning Research 21.140 (2020): 1-67. pdf
- T5 on Hugging Face
- C4 dataset
- Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022). pdf
- Taylor, Ross, et al. "Galactica: A large language model for science." arXiv preprint arXiv:2211.09085 (2022). pdf
- ChatGPT
Vision Transformers
- Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). pdf
- ViT - Papers with Code
- Wightman, Ross, Hugo Touvron, and Hervé Jégou. "ResNet strikes back: An improved training procedure in timm." arXiv preprint arXiv:2110.00476 (2021). pdf
- Liu, Zhuang, et al. "A ConvNet for the 2020s." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022). pdf
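The title of Dosovitskiy et al. is meant literally: the image is cut into fixed-size patches, each patch is linearly projected into an embedding, and the resulting sequence is fed to a standard Transformer encoder. A minimal sketch of that patch-embedding step in PyTorch; the class name is hypothetical, and the default sizes are illustrative (they match the ViT-Base/16 configuration):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2  # 14 * 14 = 196 patches
        # A conv with kernel = stride = patch size is equivalent to cutting
        # non-overlapping patches and applying one shared linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (batch, 3, 224, 224)
        x = self.proj(x)                     # (batch, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)  # (batch, 196, 768): 196 "words" of dim 768

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))  # shape (1, 196, 768)
```

In the full ViT model, a learned class token and position embeddings are added to this sequence before the encoder, as in BERT-style NLP models.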