VSM, LSA, & SVD | Introduction to Text Analytics with R Part 7

  • Published: 14 Oct 2024

Comments • 34

  • @krbkll
    @krbkll 6 years ago +4

    Dave, nobody can ever explain it better than you have done here. The 12 sessions you have on Text Analytics should be made mandatory for any Text Analytics curriculum. The same goes for your Titanic sessions. Thanks for your efforts in making this as flawless as one can imagine.

  • @TomerBenDavid
    @TomerBenDavid 6 years ago

    Extraordinary! This is one of the best teachers I have ever met, if not the best.

  • @suyashpandey830
    @suyashpandey830 7 years ago +1

    Wow... concepts so beautifully explained all throughout the series. Waiting eagerly for the next video.

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @Suyash Pandey - Thank you for the kind words, glad you have found the videos useful!
      Dave

  • @murraystaff568
    @murraystaff568 7 years ago +1

    So freaking cool! Can't wait for the next video. Please do a course in Australia!

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @Murray Staff - Glad you are finding the videos of use! If you would like to keep abreast of Data Science Dojo’s plans for Australia you can sign up for alerts at the following page: datasciencedojo.com/bootcamp/schedule/#contact-us-form
      Dave

  • @jonimatix
    @jonimatix 7 years ago +2

    Great video with interesting concepts explained well!

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @jonimatix - You are too kind. Glad you like the video!
      Dave

  • @pedrofernandosalgadoalvare772
    @pedrofernandosalgadoalvare772 3 years ago

    Dave! You're a genius! Thanks a lot.

  • @soheilmohajerjasbi9900
    @soheilmohajerjasbi9900 6 years ago

    Never mind my earlier question! I see that it has to do with the notation used in the lecture compared with available resources on the net. I still hold on to my "Excellent lecture" remark!

  • @nicholascanova4250
    @nicholascanova4250 7 years ago +1

    These videos are great, going to subscribe to your channel. Keep it up!

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @Nicholas Canova - Glad you like the videos!
      Dave

  • @aakashchugh9
    @aakashchugh9 6 years ago +3

    Hi Dave, I was going through the article "Dimensionality reduction for bag-of-words models: PCA vs LSA" by Benjamin Fayyazuddin Ljungberg, and the results show that PCA performs better than LSA for dimensionality reduction. Is that a general result, or does it come down to trial and error?
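
    For reference, the practical difference between the two reductions, shown on a hypothetical document-term matrix (PCA is essentially SVD of the column-centered matrix, whereas LSA applies SVD to the raw, typically TF-IDF-weighted, matrix):

    set.seed(7)
    X <- matrix(runif(500), nrow = 25)          # hypothetical 25 x 20 document-term matrix

    # PCA: centers each column before decomposing; keep the top 5 components.
    pca <- prcomp(X, rank. = 5)
    pca.features <- pca$x                       # 25 x 5 PCA scores

    # LSA: plain truncated SVD of the uncentered matrix.
    lsa <- svd(X, nu = 5, nv = 5)
    lsa.features <- lsa$u %*% diag(lsa$d[1:5])  # 25 x 5 document coordinates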

  • @dilshadkhanum6953
    @dilshadkhanum6953 7 years ago +1

    Thank you so much, Dave.

  • @kristyburns2363
    @kristyburns2363 5 years ago +3

    It feels like we are skipping things because they are repeated, but now I am confused and not sure what to do with my model.

  • @soheilmohajerjasbi9900
    @soheilmohajerjasbi9900 6 years ago

    Excellent lecture! However, is it possible that there is a typo on slide 8? Should term correlation be shown as X_trans * X, and document correlation be shown as X * X_trans? Please refer to earlier slides 6 and 7. Thanks!

  • @TheShekhar91
    @TheShekhar91 7 years ago +1

    @Dave, according to this video, LSA reduces the dimensionality problem. But if we refer to the Wikipedia page ( en.wikipedia.org/wiki/Singular_value_decomposition ) and follow the example of a matrix M given at the bottom, we can see that the matrices used in SVD have the following dimensions
    (I am just using the dimensions here; please refer to the link for the actual matrices):
    M - 4 x 5
    U - 4 x 4
    Sigma - 4 x 5
    V* - 5 x 5
    Multiplying U and Sigma gives us a 4 x 5 matrix, and multiplying that result with V* again gives a 4 x 5 matrix, the same size as the original matrix M.
    How, then, is the dimensionality problem being handled by SVD?
    Please help me understand this.

    • @Datasciencedojo
      @Datasciencedojo  7 years ago +1

      @Shekhar Tanwar - The number of dimensions is reduced using what is known as "truncated SVD". In particular, the code leverages the irlba package, which allows for calculating only the N most important singular vectors. Specifically, the following code reduces the dimensional space down to 300 (a fuller sketch follows below):
      # Perform SVD. Specifically, reduce dimensionality down to 300 columns
      # for our latent semantic analysis (LSA).
      train.irlba
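
      A minimal runnable sketch of that call, assuming the TF-IDF document-term matrix built earlier in the series is named train.tokens.tfidf.matrix (documents as rows, terms as columns):

      library(irlba)

      # Transpose so terms are rows and documents are columns, then compute
      # only the 300 largest singular vectors rather than the full SVD.
      train.irlba <- irlba(t(train.tokens.tfidf.matrix), nv = 300, maxit = 600)

      # train.irlba$v is the document-concept matrix: one row per document
      # and 300 columns, which replace the thousands of raw term columns
      # as the feature space for modeling.
      dim(train.irlba$v)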

    • @gregorijabramov6577
      @gregorijabramov6577 6 years ago

      I have a similar problem with this solution.
      I am not sure why we are not using something like this:
      train.irlba$v %*% diag(train.irlba$d) %*% t(train.irlba$u)
      Approximation of document-term matrix = document-concept matrix * singular values * term-concept matrix
      Why do we use only the document-concept matrix? Is that still LSA, or something different?
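
      For context, a small self-contained sketch of what that reconstruction computes, using a hypothetical random matrix in place of the course data:

      library(irlba)
      set.seed(42)

      # Hypothetical 20-document x 100-term matrix.
      X <- matrix(runif(2000), nrow = 20)

      # Keep only the 5 largest singular vectors.
      s <- irlba(X, nv = 5)

      # Rank-5 approximation of X: same dimensions as X, but built from
      # only 5 concepts. This is what the formula above reconstructs.
      X.approx <- s$u %*% diag(s$d) %*% t(s$v)

      # LSA typically keeps just the per-document concept coordinates
      # (here s$u, since rows of X are documents) as a compact feature
      # space for modeling, rather than the full-size reconstruction.
      dim(s$u)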

  • @dezenaamvergeetiknie
    @dezenaamvergeetiknie 5 years ago

    Linear algebra is not my strong suit so I have a question. Suppose I have performed the SVD computation but I want to add another feature to the original feature space. Does this mean I have to redo the entire SVD computation or is there some efficient way to update the SVD?

  • @sumitdargan3288
    @sumitdargan3288 7 years ago +1

    Hey Dave,
    I got an error in part 6 and have posted it in the comments section there. Please get back as soon as possible.
    Thanks!

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @Sumit Dargan - Check the response on that page.

  • @the_crypto_gorgu
    @the_crypto_gorgu 5 years ago

    What about NGD (normalized Google distance)? Is it similar?

  • @r_pydatascience
    @r_pydatascience 4 years ago

    What great teaching skills. Thank you for these helpful videos.
    I was trying to reproduce your code with different data and got an error after running the following line. Please help.
    rpart.cv.1

    • @r_pydatascience
      @r_pydatascience 2 years ago

      @NoobTube Thanks. Check my channel. I have an ongoing tutorial on biomedical literature text classification.

  • @kokweikhong5974
    @kokweikhong5974 7 years ago +1

    At 29:33, is V the matrix that contains the eigenvectors of the document correlations XXt, while U contains those of XtX?

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @Kok Wei Khong - Assuming a term-document matrix (i.e., a matrix where the terms are the rows and the documents are the columns), you have the following:
      U contains the eigenvectors of XXt (i.e., the resulting matrix is term-focused)
      V contains the eigenvectors of XtX (i.e., the resulting matrix is document-focused)
      HTH,
      Dave
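
      A quick numerical check of this relationship on a hypothetical small matrix:

      set.seed(1)
      X <- matrix(rnorm(24), nrow = 4)  # hypothetical 4-term x 6-document matrix
      s <- svd(X)

      # Columns of U are eigenvectors of X %*% t(X) (the term correlations);
      # the eigenvalues are the squared singular values. Eigenvectors are
      # only defined up to sign, hence the abs().
      e <- eigen(X %*% t(X))
      all.equal(abs(e$vectors), abs(s$u))  # TRUE
      all.equal(e$values, s$d^2)           # TRUE

      # Likewise, the columns of V are eigenvectors of t(X) %*% X.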

  • @cauliflower78
    @cauliflower78 7 years ago +1

    I am getting an error on this line:
    rpart.cv.3

    • @Datasciencedojo
      @Datasciencedojo  7 years ago

      @db-engineering - As mentioned in the comments in the GitHub code, this is the result of the formula expansion exceeding R's default memory allocation. To get past this, you need to run R from the command line with an option to increase the allocation (see the sketch below). As I mention in the code comments, see the following Stack Overflow post if you would like to run the code yourself:
      stackoverflow.com/questions/28728774/how-to-set-max-ppsize-in-r
      HTH,
      Dave
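
      For reference, the workaround described in that post is to launch R from a shell with a larger pointer protection stack and then run the script inside that session (the exact maximum depends on your R version):

      R --max-ppsize=500000

      # Then, inside the R session (hypothetical file name):
      source("your_analysis_script.R")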

    • @yespleasers
      @yespleasers 3 years ago

      @@Datasciencedojo I'm having the same problem but can't find how to increase the memory allocation on macOS. Do you know how to do this, Dave? Love the course, btw. You're a gifted teacher.

    • @Datasciencedojo
      @Datasciencedojo  3 years ago

      Hello @@yespleasers, you can forward your question to our data science team via the chatbot or the email address on our site: datasciencedojo.com/

  • @pariamolayemvand3343
    @pariamolayemvand3343 6 years ago

    Hi. I can't open the GitHub page anymore; it gives a 404 error.

  • @jimbobbillybob
    @jimbobbillybob 5 years ago

    whoo knoo? 15:03