Transformers

This class introduces attention mechanisms in detail and presents the Transformer architecture, illustrated with an NLP example. It builds on the NLP class.

Attention notebook source

Attention notebook on Colab

Transformer notebook source

Transformer notebook on Colab
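
For a concrete feel of what the notebooks cover, here is the Transformer's core operation, scaled dot-product attention (Vaswani et al., 2017, listed below), as a minimal standalone sketch. It assumes PyTorch; the function name, mask handling, and toy shapes are illustrative and not taken from the notebooks.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Scaled dot-product attention (Vaswani et al., 2017):
    softmax(Q K^T / sqrt(d_k)) V.

    q, k: (batch, seq_len, d_k); v: (batch, seq_len, d_v).
    mask: optional boolean tensor, True where attention is disallowed.
    """
    d_k = q.size(-1)
    # Query-key similarities, scaled by sqrt(d_k) to keep softmax well-behaved
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    # Softmax over the key dimension turns scores into attention weights
    weights = torch.softmax(scores, dim=-1)
    # Each output position is a weighted average of the values
    return weights @ v

# Self-attention on a toy batch: 2 sequences of 5 tokens, dimension 8
x = torch.randn(2, 5, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 8])
```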

Additional Resources

  • The Illustrated Transformer
  • Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017). pdf
  • Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901. pdf
  • Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." Journal of Machine Learning Research 21.140 (2020): 1-67. pdf
  • T5 on Hugging Face
  • C4 dataset
  • Hoffmann, Jordan, et al. "Training Compute-Optimal Large Language Models." arXiv preprint arXiv:2203.15556 (2022). pdf
  • Taylor, Ross, et al. "Galactica: A large language model for science." arXiv preprint arXiv:2211.09085 (2022). pdf
  • ChatGPT

Vision Transformers

  • Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). pdf (the patch-embedding idea is sketched after this list)
  • ViT - Papers with code
  • Wightman, Ross, Hugo Touvron, and Hervé Jégou. "ResNet strikes back: An improved training procedure in timm." arXiv preprint arXiv:2110.00476 (2021). pdf
  • Liu, Zhuang, et al. "A ConvNet for the 2020s." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pdf
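
The key idea of the ViT paper above is to slice an image into fixed-size patches, linearly project each patch into a token embedding, and feed the resulting sequence to a standard Transformer encoder. Below is a minimal sketch of the patch-embedding step, assuming PyTorch and illustrative ViT-Base hyperparameters (16×16 patches, 768-dimensional embeddings); it is not taken from any of the papers' codebases. A strided convolution is used because it is exactly equivalent to cutting non-overlapping patches and applying a shared linear projection.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn an image into a sequence of patch tokens, as in ViT
    (Dosovitskiy et al., 2020). Hyperparameters here are illustrative."""

    def __init__(self, in_chans=3, patch_size=16, embed_dim=768):
        super().__init__()
        # Stride = kernel size: each patch is projected independently
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # (B, 3, 224, 224) -> (B, 768, 14, 14) -> (B, 196, 768)
        return self.proj(x).flatten(2).transpose(1, 2)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768]): 196 patch tokens
```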