The ML Tech Lead!
  • Videos: 51
  • Views: 78,790
How To Bring Machine Learning Projects to Success
To build a successful machine learning product, you need to understand how to manage a machine learning project. This takes a lot of soft skills: from product discovery, through project planning, and finally execution, there is a lot more to building a machine learning solution than just knowing some algorithms!
Views: 318

Videos

What are Float32, Float16 and BFloat16 Data Types?
Views: 701 · 2 months ago
Float32, Float16 or BFloat16! Why does that matter for Deep Learning? Those are just different levels of precision. Float32 is a way to represent a floating point number with 32 bits (each a 1 or 0), and Float16 / BFloat16 are ways to represent the same number with just 16 bits. This is quite important for Deep Learning because, in the backpropagation algorithm, the model parameters are updated by a g...
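A minimal sketch (my own illustration in PyTorch, not code from the video) of how the same value and range behave across these three data types:

```python
import torch

x = torch.tensor(0.1)  # default dtype is Float32

print(x.item())                     # ~0.10000000149011612 (32-bit rounding)
print(x.to(torch.float16).item())   # ~0.0999755859375     (10 mantissa bits)
print(x.to(torch.bfloat16).item())  # ~0.10009765625       (7 mantissa bits)

# BFloat16 keeps Float32's 8 exponent bits, so it covers the same range;
# Float16 only has 5 exponent bits, so large values overflow to inf.
big = torch.tensor(70000.0)
print(big.to(torch.float16))   # inf (Float16 max is ~65504)
print(big.to(torch.bfloat16))  # finite, just rounded to a nearby value
```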
The Position Encoding In Transformers
Views: 486 · 2 months ago
Transformers and self-attention are powerful architectures that enable large language models, but we need a mechanism for them to understand the order of the different tokens we input into the models. The position encoding is that mechanism! There are many ways to encode the positions, but let me show you the way it was developed in the "Attention Is All You Need" paper. Let's get into it!
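For reference, a small sketch (my own, assuming the formulation from the paper) of that sinusoidal encoding:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal encoding from "Attention Is All You Need":
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2), index of each (sin, cos) pair
    angles = positions / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

print(positional_encoding(seq_len=50, d_model=128).shape)  # (50, 128)
```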
Introduction to Machine Learning System Design!
Views: 1.8K · 2 months ago
Machine Learning System Design is one of my favorite aspects of Machine Learning. We start with a business idea and product ideation, and deduce a whole set of technical requirements to build that product. I want to give you my framework, my playbook, for designing ML solutions from a business problem. I am going to use a specific example to illustrate this playbook: how to build the friend suggestion ...
Understanding How LoRA Adapters Work!
Views: 985 · 2 months ago
LoRA Adapters are, to me, one of the smartest strategies used in Machine Learning in recent years! LoRA came as a very natural strategy for fine-tuning models. In my opinion, if you want to work with large language models, knowing how to fine-tune models is one of the most important skills to have these days as a machine learning engineer. So, let me show you the mathematical foundation for tho...
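The core idea, sketched in PyTorch (my own minimal version, not the video's code): freeze the pretrained weight W and learn a low-rank update BA on top of it.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained layer plus a trainable low-rank update B @ A."""
    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        self.linear.weight.requires_grad_(False)  # freeze pretrained W
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))  # BA = 0 at start
        self.scale = alpha / rank

    def forward(self, x):
        # Equivalent to using the weight W + scale * (B @ A)
        return self.linear(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), rank=8)
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```

Only A and B are trained, which is why fine-tuning with LoRA needs so few trainable parameters.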
The Backpropagation Algorithm Explained!
Views: 683 · 2 months ago
The backpropagation algorithm is the heart of deep learning! That is the core reason why we can have those advanced models like LLMs. In a previous video, we saw we can use the computational graph that is built as part of deep learning models to compute any derivatives of the network outputs with respect to the network inputs. I'll put the link in the description. Now we are going to see how we...
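As a toy illustration (mine, not the video's), here is the chain rule applied by hand on a two-parameter network, checked against PyTorch's own backward pass:

```python
import torch

# Tiny network: y = w2 * relu(w1 * x), loss = (y - t)^2
x, t = torch.tensor(2.0), torch.tensor(1.0)
w1 = torch.tensor(0.5, requires_grad=True)
w2 = torch.tensor(-1.5, requires_grad=True)

h = torch.relu(w1 * x)
y = w2 * h
loss = (y - t) ** 2
loss.backward()  # backpropagation through the graph

# The same gradients via the chain rule, by hand:
dL_dy = 2 * (y - t)                # dL/dy
dh_dw1 = x * (w1 * x > 0).float()  # relu'(w1*x) * x
print(w2.grad.item(), (dL_dy * h).item())            # dL/dw2, both -5.0
print(w1.grad.item(), (dL_dy * w2 * dh_dw1).item())  # dL/dw1, both 15.0
```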
Understanding The Computational Graph in Neural Networks
Views: 1.1K · 2 months ago
Do you know what this computational graph used by deep learning frameworks like TensorFlow or PyTorch is? No? Let me tell you then! The whole logic behind how neural networks function is the back-propagation algorithm. This algorithm allows us to update the weights of the network so that it can learn. The key aspect of this algorithm is to make sure we can compute the derivatives or the gradients ...
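To make this concrete, a minimal PyTorch example (my own) showing the graph being recorded and then traversed backwards:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2        # first node recorded in the graph
z = torch.sin(y)  # second node

# Each intermediate result remembers the operation that produced it:
print(y.grad_fn)  # <PowBackward0 ...>
print(z.grad_fn)  # <SinBackward0 ...>

z.backward()      # walk the graph backwards, applying the chain rule
print(x.grad)                            # dz/dx = cos(x^2) * 2x
print(torch.cos(torch.tensor(9.0)) * 6)  # same value, computed by hand
```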
How to Approach Model Optimization for AutoML
Views: 649 · 3 months ago
Since I started my career in Machine Learning, I have worked hard to automate every aspect of my work. If I couldn't produce a fully production-ready machine learning model at the click of a button, I was doing something wrong! I find it funny how you can recognize a senior machine learning engineer by how little they work to achieve the same results as a junior one working 10 times as hard! AutoML ha...
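One simple way to automate part of this (a generic scikit-learn sketch, not necessarily the approach from the video) is to wrap model optimization in a randomized hyperparameter search:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Sample 10 hyperparameter combinations, cross-validate each, keep the best.
search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "learning_rate": [0.01, 0.05, 0.1],
        "max_depth": [2, 3, 4],
    },
    n_iter=10,
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```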
Understanding CatBoost!
Views: 726 · 3 months ago
CatBoost was developed by Yandex in 2017 and introduced in the paper "CatBoost: unbiased boosting with categorical features". They realized that the boosting process induces a special case of data leakage. To prevent it, they developed two new techniques: the expanding mean target encoding and ordered boosting.
- The Gradient Boosted Algorithm Explained: ruclips.net/video/XWQ0Fd_xiBE/видео.html
- Understanding XGBoos...
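A small pandas sketch of the expanding mean target encoding idea (my own illustration; column names are made up). Each row is encoded with the mean target of the previous rows of the same category only, so the row's own target never leaks into its feature:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["a", "a", "b", "a", "b", "b"],
    "target":   [ 1,   0,   1,   1,   0,   1 ],
})

g = df.groupby("category")["target"]
# cumsum() includes the current row, so subtract the current target;
# cumcount() is 0-based, i.e. it already counts only the previous rows.
prior = df["target"].mean()  # fallback for a category's first occurrence
df["cat_enc"] = ((g.cumsum() - df["target"]) / g.cumcount()).fillna(prior)
print(df)
```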
Implementing the Self-Attention Mechanism from Scratch in PyTorch!
Views: 865 · 3 months ago
Let’s implement the self-attention layer! Here is the video where you can find the logic behind it: ruclips.net/video/W28LfOld44Y/видео.html
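For convenience, here is roughly what such a layer looks like (a minimal single-head sketch of my own; see the video for the full walkthrough):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)  # queries
        self.Wk = nn.Linear(d_model, d_model, bias=False)  # keys
        self.Wv = nn.Linear(d_model, d_model, bias=False)  # values

    def forward(self, x):  # x: (batch, seq_len, d_model)
        q, k, v = self.Wq(x), self.Wk(x), self.Wv(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=-1)  # how much each token attends to the others
        return weights @ v

attn = SelfAttention(d_model=64)
print(attn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```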
What is the Vision Transformer?
Views: 628 · 3 months ago
I find the Vision Transformer to be quite an interesting model! The self-attention mechanism and the transformer architecture were designed to help fix some of the flaws we saw in previous models that had applications in natural language processing. With the Vision Transformer, a few scientists at Google realized they could take images instead of text as input data and use that architecture as ...
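The key trick is turning an image into a sequence of patch "tokens". A common way to sketch this (my own illustration, not the video's code) is a convolution whose kernel and stride equal the patch size:

```python
import torch
import torch.nn as nn

patch_size, d_model = 16, 768
# kernel = stride = patch_size extracts non-overlapping patches and
# projects each one to d_model in a single operation.
to_patches = nn.Conv2d(3, d_model, kernel_size=patch_size, stride=patch_size)

image = torch.randn(1, 3, 224, 224)          # (batch, channels, H, W)
patches = to_patches(image)                  # (1, 768, 14, 14)
tokens = patches.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 patch tokens
print(tokens.shape)
```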
Understanding XGBoost From A to Z!
Views: 1.2K · 3 months ago
I often say that at some point in my career, I became more of an XGBoost modeler than a Machine Learning modeler. That's because if you were working on large tabular datasets, there was no point in trying another algorithm; XGBoost would provide close-to-optimal results without much effort. Yeah, OK, LightGBM and CatBoost are obviously as good and sometimes better, but I will always keep a special place i...
The Gradient Boosted Algorithm Explained!
Views: 1.3K · 3 months ago
In the gradient-boosted trees algorithm, we iterate the following (see the sketch below):
- We train a tree on the errors made at the previous iteration.
- We add the tree to the ensemble, and we predict with the new model.
- We compute the errors made for this iteration.
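A minimal sketch of that loop with scikit-learn decision trees (my own illustration, using squared error so the "errors" are plain residuals):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

lr, trees = 0.1, []
prediction = np.full_like(y, y.mean())  # start from a constant model

for _ in range(100):
    errors = y - prediction                                   # errors from the previous iteration
    tree = DecisionTreeRegressor(max_depth=3).fit(X, errors)  # train a tree on them
    trees.append(tree)                                        # add it to the ensemble
    prediction += lr * tree.predict(X)                        # predict with the new model

print(np.mean((y - prediction) ** 2))  # training error shrinks with each iteration
```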
How Can We Generate BETTER Sequences with LLMs?
Views: 404 · 4 months ago
We know that LLMs are trained to predict the next word. When we decode the output sequence, we use the tokens of the prompt and the previously predicted tokens to predict the next word. With greedy decoding or multinomial sampling decoding, we use those predictions to output the next token in an autoregressive manner. But is this the sequence we are looking for, considering the prompt? Do we ac...
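To make the difference concrete, a toy sketch (my own) of greedy vs multinomial sampling over made-up next-token probabilities:

```python
import torch
import torch.nn.functional as F

# Toy next-token logits over a 5-token vocabulary (made-up numbers).
logits = torch.tensor([2.0, 1.5, 0.3, -1.0, -2.0])
probs = F.softmax(logits, dim=-1)

greedy_token = torch.argmax(probs).item()           # always the same choice
sampled_token = torch.multinomial(probs, 1).item()  # drawn from the distribution
print(greedy_token, sampled_token)

# Greedy is locally optimal per step, but the most likely *sequence* may not
# start with the most likely single token; that is the gap search strategies
# such as beam search try to close.
```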
What is this Temperature for a Large Language Model?
Views: 679 · 4 months ago
From Words to Tokens: The Byte-Pair Encoding Algorithm
Views: 530 · 4 months ago
The Multi-head Attention Mechanism Explained!
Views: 976 · 4 months ago
What ML Engineer Are You? How To Present Yourself On Your Resume
Views: 314 · 4 months ago
Understanding How Vector Databases Work!
Views: 18K · 4 months ago
Understanding the Self-Attention Mechanism in 8 min
Views: 1.4K · 4 months ago
What is Perplexity for LLMs?
Views: 466 · 4 months ago
Getting a Job in AI: The Different ML Jobs
Views: 287 · 4 months ago
Revolutionizing Education with AI: Personalized Learning, Model Challenges, and Finance Insights
Views: 295 · 10 months ago
Exploring Data Science Careers and Potential of Large Language Models
Views: 174 · 10 months ago
Unlocking AI's Secrets: Career Journeys, Challenges, and the Future
Views: 218 · 10 months ago
Working in AI as a Software Engineer!
Views: 276 · 11 months ago
Let's Talk about AI with Etienne Bernard!
Views: 237 · 11 months ago

Comments

  • @jacquelinecroftoon2071
    @jacquelinecroftoon2071 1 day ago

    Hernandez Kenneth Walker Jessica Walker Lisa

  • @jacquelinecroftoon2071
    @jacquelinecroftoon2071 2 days ago

    Perez Dorothy Perez Paul Martin Melissa

  • @bhaskartripathi
    @bhaskartripathi 4 days ago

    Great video. But I would have loved it if you had also spent a minute on why Float32 vs BFloat16 is applied in backpropagation. But the video is still brilliant as always!

  • @chrisogonas
    @chrisogonas 4 days ago

    Incredibly useful! Thanks Damien.

  • @Zoronoa01
    @Zoronoa01 10 days ago

    Thank you for this great explanation!

  • @AnshumanAwasthi-kd7qx
    @AnshumanAwasthi-kd7qx 25 days ago

    I clicked to learn, but you're speaking Embeddings. Speak English 😂

  • @MahSan-nv4jv
    @MahSan-nv4jv 28 days ago

    Solid explanation and well explored on the problem. Please share diverse problems when possible. Thank you so much

  • @rachadlakis1
    @rachadlakis1 28 days ago

    Thanks

  • @subodhsharma5097
    @subodhsharma5097 1 month ago

    Very Insightful!

  • @nikhilshaganti5585
    @nikhilshaganti5585 1 month ago

    Thank you for the video. In the code at 06:20, shouldn't we deduct 1 from cumcount to make sure we are not counting the current row?

  • @Pedritox0953
    @Pedritox0953 1 month ago

    Great video!

  • @WillMoody-crmstorm
    @WillMoody-crmstorm 1 month ago

    Holy moly. Thank you. I thought these concepts were beyond me until watching this video. You have a serious gift for explanation

  • @beincheekym8
    @beincheekym8 1 month ago

    thank you for the clear and concise video!

  • @madhu819-j6o
    @madhu819-j6o 2 months ago

    How to convert a decimal number to bfloat16 format in Verilog?

  • @bougfou972
    @bougfou972 2 months ago

    Wow, very clear explanation. Thank you very much for this format (much clearer than a Medium article).

  • @math_in_cantonese
    @math_in_cantonese 2 months ago

    I have a question: for pos=0 and "horizontal_index"=2, shouldn't it be PE(pos,2) = sin(pos/10000^(2/d_model))? I believe you used the same symbol "i" for two different ways of indexing, right? 7:56

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      Yeah you are right, I realized I made that mistake. I need to reshoot it.

    • @AlainDrolet-e4z
      @AlainDrolet-e4z 3 days ago

      Thank you Damien, and math_in_cantonese. I'm in the middle of writing a short article discussing position encoding. Damien, feel proud that you are the first reference I quote in the article!

      I was going crazy trying to nail down the exact meaning of "i". In Damien's video it is clear he means "i" as the dimension index, and the values shown with sin/cos match. But I could not reconcile that understanding with the equation formulation below:

      PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
      PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

      If we read PE(pos, 0) as referring to the first column (column zero) and, say, PE(pos, 5) as referring to the sixth column (column 5), then 5 = 2i+1 => i = (5-1)/2 = 2. So "i" is more like the index of a (sin, cos) pair of dimensions, and its range is d_model/2.

      The original sin (😄, pun intended) is in "Attention Is All You Need", where they simply state: "where pos is the position and i is the dimension". This seems wrong: 2i and 2i+1 are the dimensions.

      In any case, a big thank you Damien, I have watched many of your videos. They are quite useful in ramping me up on LLMs and the rest. Merci beaucoup, Alain

  • @TemporaryForstudy
    @TemporaryForstudy 2 months ago

    Nice, but I have one doubt: how does adding sine and cosine values ensure that we are encoding the positions? How did the author come to this conclusion, and why not other values?

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      The sine and cosine functions provide smooth and continuous representations, which help in learning the relative positions effectively. For example, the encoding for positions k and k+1 will be similar, reflecting their proximity in the sequence.

      The frequency-based sinusoidal functions allow the encoding to generalize to sequences of arbitrary length without needing to re-learn positional information for different sequence lengths. The model can understand relative positions beyond the length of sequences seen during training.

      The combination of sine and cosine functions ensures that each position has a unique encoding. The orthogonality property of these functions helps in distinguishing between different positions effectively, even for long sequences.

      The different frequencies used in the positional encodings allow the model to capture both short-term and long-term dependencies within the sequence. Higher frequency components help in understanding local relationships, while lower frequency components help in capturing global structures.

      Also, sinusoidal functions are differentiable, which is crucial for backpropagation during training. This ensures that the model can learn to use the positional encodings effectively through gradient-based optimization methods.

  • @do-yeounlee7202
    @do-yeounlee7202 2 months ago

    Thanks for the clear explanation. I've watched a few of your videos and follow you on LinkedIn, and I can say that you're killing it brother. Also love the simplicity in your infographics that you have in your videos. Do you get them from elsewhere or do you make it yourself?

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      I make them myself. Takes me most of my time!

    • @do-yeounlee7202
      @do-yeounlee7202 2 months ago

      @@TheMLTechLead Respect! What do you use to make them?

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      @@do-yeounlee7202 I use canva.com

  • @karnaghose4784
    @karnaghose4784 2 months ago

    Great explanation 👍🏻

  • @bassimeledath2224
    @bassimeledath2224 2 months ago

    Excellent. Good ML system design videos are hard to find on RUclips so really appreciate this!

  • @adityagupta4465
    @adityagupta4465 2 months ago

    Really well explained. You've earned a subscriber 🎉

  • @passportkaya
    @passportkaya 2 months ago

    Not really. I'm a US citizen and have been all over Europe. I'd say it's the same.

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      How long have you lived in Europe and what countries exactly?

  • @sebastianguerrero5626
    @sebastianguerrero5626 2 months ago

    nice content, keep it up!

  • @EmpreendedoresdoBEM
    @EmpreendedoresdoBEM 2 months ago

    very clear explanation. thanks

  • @naatcollections7976
    @naatcollections7976 2 months ago

    I like your channel

  • @godzilllla2452
    @godzilllla2452 2 months ago

    I've got it now. I wonder why we can't calculate the x gradient by starting the backward pass closer to x instead of going through all the activations.

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      I am not sure I understand the question.

  • @mateuszsmendowski2677
    @mateuszsmendowski2677 2 months ago

    One of the best explanations on RUclips. Substantively and visually at the highest level :) Are you able to share those slides e.g. via Git?

    • @TheMLTechLead
      @TheMLTechLead 2 months ago

      I cannot share the slide but you can see the diagrams in my newsletter: newsletter.theaiedge.io/p/understanding-the-self-attention

  • @zeeshankhanyousafzai5229
    @zeeshankhanyousafzai5229 2 months ago

  • @milleniumsalman1984
    @milleniumsalman1984 2 months ago

    too good

  • @milleniumsalman1984
    @milleniumsalman1984 2 months ago

    great video

  • @milleniumsalman1984
    @milleniumsalman1984 2 months ago

    good video

  • @Snerdy0867
    @Snerdy0867 2 months ago

    Phenomenal visuals and explanations. Best video on this concept I've ever seen.

  • @IkhukumarHazarika
    @IkhukumarHazarika 3 months ago

    Is it rnn 😅

  • @IkhukumarHazarika
    @IkhukumarHazarika 3 months ago

    Love the way you teach every point; please keep teaching this way.

  • @IkhukumarHazarika
    @IkhukumarHazarika 3 months ago

    More good content indeed, good one ❤

  • @AbuzarbhuttaG
    @AbuzarbhuttaG 3 months ago

    💯💯💯

  • @faysoufox
    @faysoufox 3 months ago

    Thank you for your videos

  • @math_in_cantonese
    @math_in_cantonese 3 months ago

    I will use your videos as an interview refresher... It is so easy to forget the details when everyday work floods in for years.

  • @math_in_cantonese
    @math_in_cantonese 3 months ago

    Thanks, I forgot some details about Gradient Boosted Algorithm and I was too lazy to look it up.

  • @vivek2319
    @vivek2319 3 months ago

    Please make more videos

  • @jairjuliocc
    @jairjuliocc 3 months ago

    Thank you. Can you explain the entire self-attention flow (from positional encoding to final next-word prediction)? I think it will be an entire series 😅

    • @TheMLTechLead
      @TheMLTechLead 3 months ago

      It is coming! It will take time

  • @CrypticPulsar
    @CrypticPulsar 3 months ago

    Thank you, Damien!!

  • @va940
    @va940 3 months ago

    Very good advice ❤

  • @va940
    @va940 3 months ago

    Awesome

  • @elmoreglidingclub3030
    @elmoreglidingclub3030 3 months ago

    Excellent!! Very good explanation. I need to work on my ear for French. But pausing and backing up the video helped. Great stuff!!

    • @TheMLTechLead
      @TheMLTechLead 3 months ago

      My accent + my speaking skills are my weaknesses. Working on it and I think I am improving!

    • @elmoreglidingclub3030
      @elmoreglidingclub3030 3 months ago

      @@TheMLTechLead Thanks for your reply but absolutely no apology necessary!! I think it is an excellent video and helpful information. Much appreciation for posting. I am a professor in a business school and always looking for insights into how to teach the technical side of technology in the context of business. Your explanation has been very helpful.

  • @Gowtham25
    @Gowtham25 3 months ago

    It's really good and useful... Looking forward to training an LLM from scratch next, and I'm interested in the KAN-Former...

  • @astudent8885
    @astudent8885 3 months ago

    ML is a black box but boosting seems to be more interpretable (potentially) if we can make the trees more sparse and orthogonal

    • @TheMLTechLead
      @TheMLTechLead 3 months ago

      Tree-based methods can naturally be used to measure Shapley values without approximation: shap.readthedocs.io/en/latest/tabular_examples.html

  • @astudent8885
    @astudent8885 3 months ago

    Do you mean that the new tree is predicting the error? In that case, wouldn't you subtract the new prediction from the previous predictions?

    • @TheMLTechLead
      @TheMLTechLead 3 months ago

      So we have an ensemble of trees F that predicts y, such that F(x) = ŷ. The error is e = y - F(x). We want to add a tree T that predicts this error: T(x) = ê = e + ε = y - F(x) + ε, where ε is the new tree's own error. Therefore F(x) + T(x) = y + ε.

  • @siddharthsingh7281
    @siddharthsingh7281 3 months ago

    share the resources in description

    • @MCroppered
      @MCroppered 3 months ago

      Why

    • @MCroppered
      @MCroppered 3 months ago

      “Give me the exam solutions pls”

  • @py2992
    @py2992 3 months ago

    Thank you for this video!