Lecture 10: Evaluation of Language Models, Basic Smoothing

  • Published: Dec 10, 2024

Comments • 18

  • @lakshyakeshwani1676
    @lakshyakeshwani1676 1 day ago +1

    Where is lecture 11?

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +1

    Traditionally, language model performance is measured by perplexity, cross-entropy, and bits-per-character (BPC). As language models are increasingly being used as pre-trained models for other NLP tasks, they are often also evaluated based on how well they perform on downstream tasks.
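
    A minimal sketch of how these three metrics relate, assuming we already have a model's probability for each token of a held-out text (the token probabilities and character count below are made-up numbers for illustration):

        import math

        # Hypothetical per-token probabilities assigned by a language model
        # to a held-out sequence (illustrative values only).
        token_probs = [0.2, 0.05, 0.1, 0.3]
        n = len(token_probs)

        cross_entropy = -sum(math.log2(p) for p in token_probs) / n  # bits per token
        perplexity = 2 ** cross_entropy                              # 2 to the cross-entropy
        num_chars = 20  # assumed character length of the same text
        bpc = -sum(math.log2(p) for p in token_probs) / num_chars    # bits per character

        print(cross_entropy, perplexity, bpc)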

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +1

    One solution to probability density estimation is referred to as Maximum Likelihood Estimation, or MLE for short. It involves first defining a parameter, theta, that specifies both the choice of the probability density function and the parameters of that distribution.
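
    For an n-gram language model, the MLE solution reduces to relative-frequency counts. A small sketch for bigrams (the toy corpus and whitespace tokenization are assumptions for illustration):

        from collections import Counter

        corpus = "i love reading i read data i love nlp".split()

        unigram_counts = Counter(corpus)
        bigram_counts = Counter(zip(corpus, corpus[1:]))

        def p_mle(w2, w1):
            # Maximum likelihood estimate: P(w2 | w1) = count(w1, w2) / count(w1)
            return bigram_counts[(w1, w2)] / unigram_counts[w1]

        print(p_mle("love", "i"))  # 2/3, since "i" occurs 3 times and "i love" twice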

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    A 2-gram (or bigram) is a two-word sequence of words, like “I love”, “love reading”, or “Analytics Vidhya”. And a 3-gram (or trigram) is a three-word sequence of words like “I love reading”, “about data science” or “on Analytics Vidhya”.
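
    A quick sketch of extracting such n-grams from a tokenized sentence (the sentence and whitespace tokenization are chosen just for illustration):

        def ngrams(tokens, n):
            # Slide a window of size n over the token list.
            return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

        tokens = "i love reading about data science".split()
        print(ngrams(tokens, 2))  # bigrams: ('i', 'love'), ('love', 'reading'), ...
        print(ngrams(tokens, 3))  # trigrams: ('i', 'love', 'reading'), ...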

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    Perplexity is a measurement of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    The term smoothing refers to the adjustment of the maximum likelihood estimator of a language model so that it will be more accurate. ... When estimating a language model based on a limited amount of text, such as a single document, smoothing of the maximum likelihood model is extremely important.
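
    One basic scheme is add-one (Laplace) smoothing, which adds 1 to every count so unseen n-grams no longer receive zero probability. A sketch for bigrams (the toy corpus and vocabulary are assumptions, not the lecture's numbers):

        from collections import Counter

        tokens = "i love reading i love nlp".split()
        V = len(set(tokens))  # vocabulary size

        unigram_counts = Counter(tokens)
        bigram_counts = Counter(zip(tokens, tokens[1:]))

        def p_laplace(w2, w1):
            # Add-one smoothing: (count(w1, w2) + 1) / (count(w1) + V)
            return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

        print(p_laplace("love", "i"))       # seen bigram
        print(p_laplace("reading", "nlp"))  # unseen bigram, still nonzero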

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    The Shannon Visualization Method
    1. Choose a random bigram (<s>, w) according to its probability.
    2. Now choose a random bigram (w, x) according to its probability.
    3. And so on until we choose </s>.
    4. Then string the words together (see the sketch after this list).
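
    A rough sketch of this sampling procedure over a bigram model (the training sentences, the <s>/</s> markers, and the random seed are illustrative assumptions):

        import random
        from collections import Counter, defaultdict

        random.seed(0)
        sentences = [["<s>", "i", "love", "reading", "</s>"],
                     ["<s>", "i", "love", "nlp", "</s>"]]

        # Collect bigram counts, then treat each row as a conditional distribution.
        following = defaultdict(Counter)
        for sent in sentences:
            for w1, w2 in zip(sent, sent[1:]):
                following[w1][w2] += 1

        def generate():
            word, output = "<s>", []
            while True:
                counts = following[word]
                # Sample the next word in proportion to its bigram probability.
                word = random.choices(list(counts), weights=list(counts.values()))[0]
                if word == "</s>":
                    return " ".join(output)
                output.append(word)

        print(generate())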

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    Perplexity is the inverse probability of the test set, normalized by the number of words. In the case of unigrams, once you have constructed the unigram model, you have the relevant probability for each word, and the perplexity follows directly (a worked sketch follows below).
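
    As a sketch, the unigram case is the N-th root of the inverse product of the word probabilities, PP(W) = P(w1 ... wN)^(-1/N); the unigram probabilities below are made-up illustrative values:

        # Hypothetical unigram probabilities for the 4 words of a tiny test set.
        probs = [0.1, 0.2, 0.05, 0.1]
        N = len(probs)

        inverse_product = 1.0
        for p in probs:
            inverse_product *= 1.0 / p

        perplexity = inverse_product ** (1.0 / N)  # PP(W) = P(w1..wN)^(-1/N)
        print(perplexity)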

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    What is extrinsic and intrinsic evaluation?
    In an intrinsic evaluation, the quality of an NLP system's outputs is evaluated against a pre-determined ground truth (reference text), whereas an extrinsic evaluation assesses a system's outputs based on their impact on the performance of other NLP systems.

  • @louerleseigneur4532
    @louerleseigneur4532 4 years ago +2

    Thanks sir

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    unigram prior smoothing

  • @divyanshukumar2605
    @divyanshukumar2605 3 years ago +5

    Never goes in depth on any concept; he just says a bunch of technical words without explaining them explicitly, and even the explanations are copied word for word from Dan Jurafsky's lectures.

  • @sumonchakrabarty6805
    @sumonchakrabarty6805 2 years ago +3

    Worst teacher I have ever seen in my life. He doesn't even know English properly. His vocabulary is worse. These kinds of professors should be fired from IITs immediately. They are polluting the teaching process...

  • @divyanshukumar2605
    @divyanshukumar2605 3 years ago +1

    A third-grade teacher; he should be teaching 5th graders.