DataMListic
  • 169 videos
  • 975,762 views
Why L1 Regularization Produces Sparse Weights
In this video, we talk about why L1 regularization produces sparse weights.
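As a rough illustration of the effect (a minimal Python sketch, not taken from the video; the toy data and the `fit` helper are made up for this example), proximal gradient descent with an L1 penalty soft-thresholds the weights, driving the uninformative ones to exactly zero, while an L2 penalty only shrinks them:

```python
# Minimal sketch (toy data, not from the video): L1 vs L2 on a small least-squares problem.
# L1 is handled with a proximal (soft-thresholding) step, since |w| is not differentiable at 0.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only two informative features
y = X @ true_w + 0.1 * rng.normal(size=100)

def fit(penalty, lam=1.0, lr=0.01, steps=2000):
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)                  # gradient of the squared-error term
        if penalty == "l2":
            w -= lr * (grad + lam * w)                     # L2: smooth shrinkage toward zero
        else:
            w -= lr * grad
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1: soft-threshold step
    return w

print("L2 weights:", np.round(fit("l2"), 3))   # small but non-zero everywhere
print("L1 weights:", np.round(fit("l1"), 3))   # exact zeros on the uninformative features
```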
*References*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
L1 vs L2 Regularization: ruclips.net/video/aBgMRXSqd04/видео.html
*Related Videos*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Overfitting vs Underfitting: ruclips.net/video/B9rhzg6_LLw/видео.html
Why Models Overfit and Underfit - The Bias Variance Trade-off: ruclips.net/video/5mbX6ITznHk/видео.html
Least Squares vs Maximum Likelihood: ruclips.net/video/WCP98USBZ0w/видео.html
Why We Don't Use the Mean Squared Error (MSE) Loss in Classification: ruclips.net/video/bNwI3IUOKyg/видео.html
Hyperparameters Tuning: Grid Search vs Random Search: ruclips.net/video/G-fXV-o9QV8/видео.html
XGBoost Explained in U...
Views: 1,204

Videos

Overfitting vs Underfitting - Explained
1.1K views • 21 days ago
In this video, we explore two of the fundamental concepts in machine learning (ML): overfitting and underfitting. *Related Videos* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Bias-Variance Trade-Off: ruclips.net/video/5mbX6ITznHk/видео.html Why We Don't Use the Mean Squared Error (MSE) Loss in Classification: ruclips.net/video/bNwI3IUOKyg/видео.html The Bessel's Correction: ruclips.net/video/E3_408q1mjo/видео.htm...
Confidence Intervals Explained
706 views • 1 month ago
In this video, we dive into confidence intervals, a fundamental concept in statistics that provides a range of values likely to contain a population parameter, such as the mean. *References* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ T-test Explained: ruclips.net/video/7KtLCXeXmiU/видео.html Z-test Explained: ruclips.net/video/u0EdFFp_U0c/видео.html *Related Videos* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Basic Probability Dis...
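As a quick illustration (a minimal sketch with made-up sample values, not code from the video), a 95% confidence interval for a sample mean can be computed with the t-distribution when the population standard deviation is unknown:

```python
# Minimal sketch (made-up data): 95% confidence interval for a sample mean.
import numpy as np
from scipy import stats

sample = np.array([4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2])   # hypothetical measurements
mean = sample.mean()
sem = stats.sem(sample)                                    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)            # two-sided 95% critical value
print(f"95% CI: [{mean - t_crit * sem:.3f}, {mean + t_crit * sem:.3f}]")
```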
Z-Test Explained
1.3K views • 1 month ago
In this video, we talk about the z-test, a statistical method used to determine whether there is a significant difference between sample data and a population mean. We also discuss how it differs from the t-test, focusing on when to use each test based on sample size and whether the population standard deviation is known. *References* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ T-test Explained: ruclips.net/video...
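For a concrete picture (a minimal sketch with hypothetical numbers, not from the video), the one-sample z-test statistic assumes the population standard deviation is known, which is the main practical difference from the t-test:

```python
# Minimal sketch (hypothetical numbers): one-sample z-test against a population mean.
import numpy as np
from scipy import stats

sample = np.array([102.0, 98.5, 101.2, 100.8, 103.1, 99.7, 104.0, 101.5])
mu0, sigma = 100.0, 2.5                        # hypothesised mean and known population std
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_value = 2 * (1 - stats.norm.cdf(abs(z)))     # two-sided p-value
print(f"z = {z:.3f}, p = {p_value:.3f}")
```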
L1 vs L2 Regularization
2.2K views • 1 month ago
In this video, we talk about the L1 and L2 regularization, two techniques that help prevent overfitting, and explore the differences between them. *References* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Why Weight Regularization Reduces Overfitting: ruclips.net/video/Im91WYsbZ_U/видео.html *Related Videos* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Why Models Overfit and Underfit - The Bias Variance Trade-off: ruclips.net/video/5...
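For reference, the two penalised objectives being contrasted can be written as follows (notation mine, not the video's; ℓ denotes the per-example loss):

```latex
\mathcal{L}_{\mathrm{L1}}(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \ell\!\left(y_i, f(\mathbf{x}_i;\mathbf{w})\right) + \lambda \sum_{j} |w_j|,
\qquad
\mathcal{L}_{\mathrm{L2}}(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \ell\!\left(y_i, f(\mathbf{x}_i;\mathbf{w})\right) + \lambda \sum_{j} w_j^2 .
```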
Poisson Distribution - Explained
1.4K views • 1 month ago
In this video, we talk about the Poisson distribution and how it is derived as an edge case of the Binomial distribution when the probability of success tends to 0 and the number of trials tends to infinity. *References* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Basic Probability Distributions Explained: Bernoulli, Binomial, Categorical, Multinomial: ruclips.net/video/PPfjlRLNqEs/видео.html *Related Videos* ▬▬▬...
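The limit mentioned in the description can be written compactly as follows (standard derivation, with λ = np held fixed as p → 0 and n → ∞):

```latex
P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}
         = \frac{n!}{k!\,(n-k)!} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k}
\;\xrightarrow[n \to \infty]{}\; \frac{\lambda^{k} e^{-\lambda}}{k!}.
```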
Basic Probability Distributions Explained: Bernoulli, Binomial, Categorical, Multinomial
757 views • 2 months ago
In this video, we talk about some of the fundamental concepts in probability theory: the marginal probability, the joint probability and the conditional probability, together with the mathematical links between them. *References* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Combinations explained: ruclips.net/video/p8vIcmr_Pqo/видео.html *Related Videos* ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Marginal, Joint and Conditional Pro...
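The mathematical links the description refers to are the standard identities relating the three quantities (notation mine):

```latex
P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A),
\qquad
P(A) = \sum_{b} P(A, B = b),
\qquad
P(A \mid B) = \frac{P(A, B)}{P(B)}.
```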
T-Test Explained
3K views • 3 months ago
AI Weekly Brief - Week 2: Llama 3.2, OpenAI Voice Mode, Mira Murati leaves OpenAI
285 views • 3 months ago
AI Weekly Brief - Week 2: LlamaCoder, Eureka, YouTube GenAI, Pixtral 12B
334 views • 4 months ago
AI Weekly Brief - Week 1: OpenAI o1-preview, DataGemma, AlphaProteo
247 views • 4 months ago
Covariance Matrix - Explained
7K views • 4 months ago
The Bitter Lesson in AI...
1K views • 4 months ago
Marginal, Joint and Conditional Probabilities Explained
8K views • 5 months ago
Least Squares vs Maximum Likelihood
20K views • 6 months ago
AI Reading List (by Ilya Sutskever) - Part 5
1.4K views • 7 months ago
AI Reading List (by Ilya Sutskever) - Part 4
1.2K views • 7 months ago
AI Reading List (by Ilya Sutskever) - Part 3
1.7K views • 7 months ago
AI Reading List (by Ilya Sutskever) - Part 2
2.4K views • 7 months ago
AI Reading List (by Ilya Sutskever) - Part 1
17K views • 7 months ago
Vector Database Search - Hierarchical Navigable Small Worlds (HNSW) Explained
9K views • 8 months ago
Singular Value Decomposition (SVD) Explained
4.3K views • 8 months ago
ROUGE Score Explained
2.2K views • 8 months ago
BLEU Score Explained
3K views • 8 months ago
Cross-Validation Explained
872 views • 9 months ago
Sliding Window Attention (Longformer) Explained
3K views • 9 months ago
BART Explained: Denoising Sequence-to-Sequence Pre-training
2.2K views • 9 months ago
RLHF: Training Language Models to Follow Instructions with Human Feedback - Paper Explained
1K views • 10 months ago
Chain-of-Verification (COVE) Reduces Hallucination in Large Language Models - Paper Explained
743 views • 10 months ago
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - Paper Explained
2.6K views • 10 months ago

Comments

  • @mohammedrumaan2704
    @mohammedrumaan2704 23 hours ago

    Amazing video ✨

  • @urjadamodhar4023
    @urjadamodhar4023 4 days ago

    this is awesome

  • @simonhinterseer9974
    @simonhinterseer9974 4 days ago

    Why is grid search computationally more expensive? It depends on how many grid points there are; if your random search has more random hyperparameter picks, then the random search takes longer, no?

    • @datamlistic
      @datamlistic 1 day ago

      Both can be computationally expensive! In practice, however, what I've seen is that people run a full grid search, but run random search only for a certain amount of time.
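To make the cost comparison above concrete, here is a minimal sketch (illustrative only; the search space and budget are made up): grid search evaluates every combination of the per-parameter values, so its trial count is the product of the grid sizes, while random search draws a fixed number of configurations.

```python
# Minimal sketch (made-up search space): grid search trials grow multiplicatively,
# random search uses a fixed budget.
import itertools
import random

search_space = {
    "lr": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.3, 0.5],
    "layers": [1, 2, 3, 4],
}

grid = list(itertools.product(*search_space.values()))
print("grid search trials:", len(grid))        # 4 * 4 * 4 * 4 = 256 configurations

random.seed(0)
budget = 30                                    # random search: fixed number of trials
random_trials = [{k: random.choice(v) for k, v in search_space.items()} for _ in range(budget)]
print("random search trials:", len(random_trials))
```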

  • @LunaSuJu
    @LunaSuJu 5 days ago

    Thank you! That visualisation explaining that the sample mean is closer made A LOT OF SENSE! Though I do not understand the degrees of freedom part and it has always bothered me! Could you make a video on degrees of freedom as well? (I can see how we get x3 from the other info, but I do not get how it helps anything or how it is related to dividing by 2?? Thanks a lot in advance!)

  • @quirkyquester
    @quirkyquester 9 days ago

    great video, thx!

  • @AlexM.West_Music
    @AlexM.West_Music 10 days ago

    thank you for the explanation!

  • @siwarzaouia5343
    @siwarzaouia5343 11 days ago

    Great job, finally a useful video that makes this theorem simple and clear. Keep it up!

  • @misoren9645
    @misoren9645 11 days ago

    But how do you get Q? How do you get K? How are the values of those vectors defined?

  • @awumbe
    @awumbe 11 days ago

    So, in simple words: an oscillation in the time domain gets converted into a sum of rotational frequencies in the complex domain?

  • @CaptainHaHaa
    @CaptainHaHaa 11 days ago

    Fascinating stuff cheers mate

    • @datamlistic
      @datamlistic 10 days ago

      Thanks! Glad you liked it! :)

  • @datamlistic
    @datamlistic 12 days ago

    Video errors: - at 03:36 I should have written "low variance" on the screen for underfitting instead of "high variance". Thanks @nuktu for noticing this!

  • @datamlistic
    @datamlistic 12 days ago

    Full video at: ruclips.net/video/B9rhzg6_LLw/видео.html

  • @dafish3882
    @dafish3882 12 days ago

    Great explanation! Thank you so much!

    • @datamlistic
      @datamlistic 12 days ago

      Thanks! Glad it was helpful! Make sure to check the other videos in the series if you wanna learn more about object detection. :)

  • @abolfazlchoubdaran2430
    @abolfazlchoubdaran2430 13 days ago

    dada tank u

  • @Mathframe
    @Mathframe 13 days ago

    Where did you get the original explanation for this?

    • @datamlistic
      @datamlistic 12 days ago

      There are several places that explain this in a similar manner. For instance, I read this blog a while ago: explained.ai/regularization/

  • @DM-py7pj
    @DM-py7pj 14 days ago

    Why are the circles getting bigger as you minimise the loss?

    • @kippie608
      @kippie608 14 days ago

      I think if you consider the concentric circles as a top-down view of a hill, i.e. a topographical view, gradient descent would involve going down the hill from the top (the smallest circles) down the slope (to larger and larger circles). Not sure if this is accurate, but this is my sense of what he was describing in the video.

    • @datamlistic
      @datamlistic 12 days ago

      @kippie608 That's exactly what I had in mind. Now that I think about it, it might be a bit confusing as people tend to depict gradient descent the other way around.

  • @iacobsorina6924
    @iacobsorina6924 14 days ago

    Great explanation!

  • @datamlistic
    @datamlistic 14 days ago

    Learn more about the differences between L1 and L2 regularization here: ruclips.net/video/aBgMRXSqd04/видео.html

  • @aquienleimporta9096
    @aquienleimporta9096 15 days ago

    Question: if at 2:15 the feature map passes through a 3x3 CNN, then wouldn't h and w at 3:05 be the feature map's height and width minus 2 (3-1 in both dimensions, since the CNN is 3x3)? Or maybe a padding of 1 should be applied to the feature map first?

  • @georgeofhamilton
    @georgeofhamilton 16 days ago

    This is probably the best explanation of this concept on RUclips. Every other video on standard deviation glosses over why we use Bessel’s correction.

    • @datamlistic
      @datamlistic 12 days ago

      Thank you! Happy to hear that! :)

  • @georgeofhamilton
    @georgeofhamilton 16 days ago

    Can you elaborate more on how the degrees of freedom are connected to the bias? Why is it that we need to consider degrees of freedom in the sample metrics but not in the population metrics?

  • @iJustWannaBeAbleToUseMyWifi
    @iJustWannaBeAbleToUseMyWifi 16 days ago

    Recently I discovered this channel and I have to say your explanations are so simple and easy to understand. I want to thank you.

  • @tsruturaj
    @tsruturaj 16 days ago

    Omg, thank you for the explanation! I was eagerly searching for this one ❤

    • @datamlistic
      @datamlistic 12 days ago

      Thanks! Happy to hear this! :)

  • @Mr_Mohit_Shukla
    @Mr_Mohit_Shukla 17 days ago

    Hey bro, use a few more difficult words to explain a simple thing, why don't you.

  • @ecougm
    @ecougm 18 days ago

    Excellent

  • @mohammedrumaan2704
    @mohammedrumaan2704 18 days ago

    Lovely video. Great explanation!!!🎉🎉

    • @datamlistic
      @datamlistic 12 days ago

      Thanks! Glad you liked it! :)

  • @jared9167
    @jared9167 18 days ago

    Why do I get a different mean from your sample of [173,173,172,161] in your example at 2:00? I get a mean of 169.75 not 170.38.
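For reference, the mean of the four values quoted in the comment works out as follows (this only checks the arithmetic for these numbers, not what was shown in the video):

```latex
\bar{x} = \frac{173 + 173 + 172 + 161}{4} = \frac{679}{4} = 169.75
```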

  • @yavuz4837
    @yavuz4837 18 days ago

    nice video dude clearly explaining the concept, thx

  • @AmmarMujtabaTariq-zv8zp
    @AmmarMujtabaTariq-zv8zp 20 days ago

    -1 is not going upward, to the previous hidden state, but to the right, toward the pointwise multiplication of z_t and h̃_t.

  • @nuktu
    @nuktu 20 days ago

    At 03:36 you say "low variance" for underfitting models, but the screen says "high variance" on the right hand side box. I assume that should say "low variance" too?

    • @datamlistic
      @datamlistic 12 days ago

      Yes, I should have written "low variance" on the screen. Underfitting generally has high bias and low variance. Sorry for the confusion and thanks for noticing this! I will add a pin to this video where I mention this error so others don't get confused.

  • @datamlistic
    @datamlistic 20 days ago

    Full video link: ruclips.net/video/8ZYLWipgPT0/видео.html

  • @yusufjones9260
    @yusufjones9260 22 days ago

    One question: for the total gradient, we need to calculate it for o1 and o2 too and add them together, right?

  • @datamlistic
    @datamlistic 22 days ago

    A more theoretical overview of these issues is presented in my video about the bias-variance trade-off: ruclips.net/video/5mbX6ITznHk/видео.html

  • @KonradTamas
    @KonradTamas 23 days ago

    Must Love them Hungarians!

  • @subi_rocks
    @subi_rocks 24 days ago

    Thank you so much for explaining this topic in such an easy way.

  • @khalilebdelli6199
    @khalilebdelli6199 24 days ago

    Preparing for a job interview and I'm handling object detection; this is definitely a must for everyone interested in learning more about this topic.

  • @tantzer6113
    @tantzer6113 25 days ago

    So, BART was made “outdated” by which new technology? What is the best tool for NLP tasks?

  • @tisnix
    @tisnix 25 days ago

    Thanks for the clear explanation!

  • @holopyolo2452
    @holopyolo2452 29 days ago

    How do we get the shared key, bro?

  • @ptsdon
    @ptsdon 1 month ago

    This was a really good explanation of the Fourier Transform.

  • @ignasa007
    @ignasa007 1 month ago

    Correction -- the Gaussian distribution is NOT on the *perpendicular distance* of the input-label tuple from the mean line, but instead the *vertical distance* (along the label dimension).

  • @vil9386
    @vil9386 1 month ago

    This is very nice, crisp content. Thank you.

    • @datamlistic
      @datamlistic 1 month ago

      Thanks! Glad you liked it! :)

  • @hallz7297
    @hallz7297 1 month ago

    The best ML channel

  • @hallz7297
    @hallz7297 1 month ago

    Amazing content my friend, keep going!

    • @datamlistic
      @datamlistic 1 month ago

      Thanks! Glad you liked it! :)

  • @ДмитрийСвирепов-ю3д

    Man, it's literally the first time I have ever commented on a video on YT. I don't have much time to prepare for the exam, but your delivery and explanations are so clear. Be blessed and keep on going!!

    • @datamlistic
      @datamlistic 1 month ago

      Many thanks! Make sure to check out the other videos as well! :)

  • @e555t66
    @e555t66 1 month ago

    Thanks for this! Very useful for revision when preparing for a test or interview.

  • @davide0965
    @davide0965 1 month ago

    You didn't explain anything. What is the intuition behind q, k, v? You just rewrote the formula.

    • @datamlistic
      @datamlistic 1 month ago

      I kinda agree with this. It's an old video of mine and it's not that polished. Sorry you didn't find what you were looking for in it.