Introduction to GPT-2

Lex Fridman introduces GPT-2, a transformer with 1.5 billion parameters trained on 40 billion tokens of text from webpages linked from Reddit posts. Ilya Sutskever explains that the transformer combines several ideas, attention being one of them.