I've been learning about LLMs over the past few months, but I haven't gone into too much depth. Your videos seem very detailed and technical. Which one(s) would you recommend starting off with?
There are excellent courses from DeepLearning.ai on Coursera. To go even deeper, I recommend reading the technical papers directly, which gives you more depth of understanding:
1. Roofline model
2. Transformer architecture > bottleneck of attention > flash attention
3. LLM inference can be divided into a prefilling stage (compute-bound) and a decoding stage (memory-bound)
4. LLM serving: paged attention, radix attention
If you want to optimize inference performance, this review paper is awesome: LLM Inference Unveiled: Survey and Roofline Model Insights
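To make points 1 and 3 concrete, here is a rough back-of-the-envelope sketch of why prefilling tends to be compute-bound and decoding memory-bound. It models only a single d x d weight matmul in fp16 and uses illustrative A100-class hardware numbers (~312 TFLOP/s peak, ~2 TB/s bandwidth); the exact figures and the simplification are assumptions, not measurements.

```python
# Roofline sketch: arithmetic intensity (FLOPs per byte moved) of one
# d x d fp16 matmul, processing `num_tokens` tokens at once.
def arithmetic_intensity(num_tokens: int, d: int = 4096) -> float:
    flops = 2 * num_tokens * d * d                     # multiply-accumulates
    weight_bytes = 2 * d * d                           # fp16 weights read once
    act_bytes = 2 * num_tokens * d * 2                 # fp16 input + output activations
    return flops / (weight_bytes + act_bytes)

# Ridge point of the roofline: peak FLOP/s divided by memory bandwidth.
# Illustrative A100-class numbers (assumption, not a spec quote).
RIDGE = 312e12 / 2.0e12  # ~156 FLOPs/byte

for num_tokens, stage in [(2048, "prefill (2048 tokens)"), (1, "decode (1 token)")]:
    ai = arithmetic_intensity(num_tokens)
    bound = "compute-bound" if ai > RIDGE else "memory-bound"
    print(f"{stage}: ~{ai:.1f} FLOPs/byte -> {bound}")
```

Prefilling reuses each weight for thousands of tokens, so its intensity lands far above the ridge point; decoding reads the whole weight matrix for a single token, so it sits near ~1 FLOP/byte and the GPU is waiting on memory. That is exactly the gap that techniques like flash attention and paged attention work around.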
Nicely done and very helpful! Thank you!! FYI, the stress is on the first syllable of "INference", not the second ("inFERence").
Copy that! Thank you😊
Thank you! It was very informative and well explained. Is it possible to access the PDF slides you presented?
Sure. The majority of the slides are taken from the AWS tutorial: drive.google.com/file/d/1uVhHtRBwXy7o8ejaS6Ab6pSybkzticE3/view