GPT Series - Triton 1 (make GPU go brrr)
Motivations

Basic GPT-2

Recently, I rewrote GPT-2 as an exercise to help me prepare for interviews at big AI companies. After reading the paper and reusing the Shakespeare dataset that Karpathy provides in his nanoGPT project, I started writing the code for the whole model:

- LayerNorm
- Attention layer
- Training loop
- Feed forward network (FFN)
- Positional embedding

Model improvements

I then focused on improving the model by implementing a few features such as: ...