A little guide to building Large Language Models in 2024

  • Published: 2 Jun 2024
  • A little guide through everything you need to know to train a well-performing large language model in 2024.
    This is an introductory talk with links to references for further reading.
    This is the first video of a 2 part series:
    - Video 1 (this video): covering all the concepts needed to train a well-performing LLM in 2024
    - Video 2 (next video): hands-on application of all these concepts, with code examples
    This video is adapted from a talk I gave in 2024 at an AI/ML winter school for graduate students. When I shared the slides online, people kept asking for a recording of the unrecorded class, so I decided to spend a morning recording it and share it more widely alongside the slides.
    Link to the slides: docs.google.com/presentation/...
    Chapters:
    00:00:00 Intro
    00:00:59 Workflow for LLMs
    Part 1: Training: data
    00:01:17 Data preparation - intro and good recent resources on data preparation
    00:05:28 A web scale pretraining corpus - goals and challenges
    00:11:29 Web scale data sources - Focus on recent datasets
    00:18:01 Language and quality filtering
    00:24:34 Diving into data deduplication
    00:27:40 Final data preparation for training
    00:31:31 How to evaluate data quality at scale
    00:36:29 The datatrove and lighteval libraries
    Part 2: Training: modeling
    00:38:18 Introduction to modeling techniques for LLM training
    00:39:09 When the model is too big: parallelism
    00:40:00 Data parallelism
    00:41:18 Tensor parallelism
    00:44:38 Pipeline parallelism
    00:47:00 Sequence parallelism and references on 4D parallelism
    00:47:52 Synchronisation: GPU-CPU and GPU-GPU challenges
    00:52:14 Flash attention v1 and v2
    00:56:23 Stable training recipes
    00:59:12 New architectures: Mixture-of-experts
    01:03:13 New architectures: Mamba
    01:04:49 The nanotron library
    Part 3: Fine-tuning: RLHF and alignment
    01:06:15 RLHF in 2024
    01:08:23 PPO, DPO and REINFORCE
    Part 4: Fast inference techniques
    01:11:23 Quantization, speculative decoding and compilation: overview and resources
    End
    01:14:36 Sharing your model, datasets and demo - final words

Comments • 33

  • @angelogiacco857 · 2 months ago · +9

    This is why I love youtube. Getting to hear the thoughts of the CSO of one of the hottest startups around! Thomas, I'll be at the HuggingFace x Mixtral hackathon in Paris next month, hope to see you there!

  • @FusionQuill · 2 months ago · +5

    Thanks for posting this. Lots of customers have been asking us how they can understand the process of creating LLMs

  • @ScottzPlaylists · 17 days ago · +1

    👍Best video I've seen on Model Building and Training. ❗
    I noticed at least 2 slides have "RLFH" instead of "RLHF".

  • @venkateshmurugadas7481 · 2 months ago

    Thank you so much for this extensive overview of the complete pipeline on LLM training and inference.

  • @user-fh9cq9oz4m · 2 months ago · +1

    Thank you very much for your effort. Awaiting Video 2.

  • @dheerajnunni8611 · 2 months ago · +1

    Thank you for this! A very good introduction to the whole LLM training ecosystem for beginners.

  • @anabildea9274 · 2 months ago

    Very insightful. Thank you for sharing.

  • @stalinthomas9850 · 2 months ago

    Brilliant lecture! Just so much information and insights! Thanks a lot for this!

  • @shotanatenadze3705 · 2 months ago

    Brilliant lecture! Please continue recording and sharing your knowledge; it's an invaluable resource for everyone in this field.

  • @jennyliu07 · 2 months ago

    Thank you for sharing this amazing video!

  • @ndamulelosbg8887 · 1 month ago

    This was wonderful; spending this much time talking about data preparation is key!

  • @user-zr2ps3km8m · 1 month ago

    This is really helpful! Thank you very much.

  • @danberm1755 · 1 month ago

    Thanks so much!!! Much appreciated.

  • @computerauditor · 2 months ago

    Really insightful 🔥🔥🔥

  • @1littlecoder · 2 months ago · +4

    Gold 🥇🥇🥇

  • @willsmithorg · 1 month ago

    Very interesting, thank you.

  • @theglionking · 2 months ago

    Thank you, Thom.

  • @minhnguyenbinh609 · 2 months ago

    Thank you for this video

  • @MLTOKYO · 2 months ago

    Amazing!

  • @husseinekeita8909 · 2 months ago

    Thanks a lot for this. Nanotron is really useful

  • @7alexopoulos · 1 month ago

    Merci beaucoup Thomas!!

  • @phaZZi6461 · 1 month ago

    amazing lecture

  • @Pingu_astrocat21 · 1 month ago

    Thank you so much :)

  • @stevechiou5760 · 1 month ago

    Great video! When is the second one coming out?

  • @clray123 · 1 month ago

    What has become of the retentive network architecture, which was touted as an alternative to transformers? Why have no published LLMs been trained using it?

  • @linli6838 · 2 months ago

    Very nice video

  • @lynncherny · 2 months ago

    slides link? :)

  • @ojasvisingh786 · 2 months ago

    🎉❤

  • @420_gunna · 1 month ago

    33:23, what's his example of the noisier dataset? It sounds like he's saying "Zopalé" or something 😄

    • @clray123 · 1 month ago

      "The Pile" - it's on the slide...

    • @420_gunna · 1 month ago

      @clray123 Hah! Duh -- thank you 😅