Dave, nobody can ever explain it better than you have done here. The 12 sessions you have on text analytics should be made mandatory for any text analytics curriculum. The same goes for your Titanic sessions. Thanks for your efforts in making this as flawless as one can imagine.
Extraordinary! This is one of the best, if not the best, teachers I have ever met.
Wow, concepts so beautifully explained all throughout the series. Waiting eagerly for the next video.
@Suyash Pandey - Thank you for the kind words, glad you have found the videos useful!
Dave
So freaking cool! Can't wait for the next video. Please do a course in Australia!
@Murray Staff - Glad you are finding the videos of use! If you would like to keep abreast of Data Science Dojo’s plans for Australia you can sign up for alerts at the following page: datasciencedojo.com/bootcamp/schedule/#contact-us-form
Dave
Great video with interesting concepts explained well!
@jonimatix - You are too kind. Glad you like the video!
Dave
Dave! You're a genius! Thanks a lot.
Never mind my earlier question! I see that it has to do with the notation used in the lecture compared with available resources on the net. I still hold on to my "Excellent lecture" remark!
These videos are great, going to subscribe to your channel. Keep it up!
@Nicholas Canova - Glad you like the videos!
Dave
Hi Dave, I was going through the article "Dimensionality reduction for bag-of-words models: PCA vs LSA" by Benjamin Fayyazuddin Ljungberg, and the result there is that PCA performs better than LSA for dimensionality reduction. Is that generalizable, or a matter of trial and error?
Thank you soo much Dave
Feels like we are skipping things because they are repeated, but now I am confused and not sure what to do with my model.
Excellent lecture! However, is it possible that there is a typo on slide 8? Should term correlation be shown as X_trans * X, and document correlation be shown as X * X_trans? Please refer to earlier slides 6 and 7. Thanks!
@Dave, according to this video, LSA reduces the dimensionality problem. But if we refer to the Wikipedia page ( en.wikipedia.org/wiki/Singular_value_decomposition ) and follow the example of a matrix M given at the bottom, we can see that the matrices used in SVD have the following dimensions (I am just using the dimensions here; please refer to the link for the actual matrices):
M - 4 x 5
U - 4 x 4
Sigma - 4 x 5
V* - 5 x 5
Multiplying U and Sigma gives us a 4 x 5 matrix, and multiplying that result with V* again gives a 4 x 5 matrix, which is the same size as the original matrix M. How, then, is the dimensionality problem handled by SVD? Please help me understand this.
@Shekhar Tanwar - The number of dimensions is reduced using what is known as "Truncated SVD". In particular, the code leverages the irlba package that allows for calculating only the N most important singular vectors. Specifically, the following code reduces the dimensional space down to 300:
# Perform SVD. Specifically, reduce dimensionality down to 300 columns
# for our latent semantic analysis (LSA).
train.irlba
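To illustrate what the truncation does, here is a NumPy sketch (the course code itself uses R's irlba package; the matrix sizes below are hypothetical). Keeping only the top k singular vectors turns each document from a long term vector into a k-dimensional concept vector:

```python
import numpy as np

# Illustrative sketch of truncated SVD for LSA (NumPy, not the course's R code).
# Hypothetical sizes: 500 documents x 2000 terms.
rng = np.random.default_rng(0)
X = rng.random((500, 2000))

k = 300  # keep only the 300 most important singular vectors
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document is now represented by k = 300 features instead of 2000 terms.
docs_reduced = U_k * s_k  # equivalent to U_k @ np.diag(s_k)
print(docs_reduced.shape)  # (500, 300)
```

The full product U Sigma V* does indeed reproduce a matrix the same size as the original; the dimensionality reduction comes from working with the truncated factors instead of the full reconstruction.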
I have a similar problem with this solution. I am not sure why we are not using something like this:
train.irlba$v %*% diag(train.irlba$d) %*% t(train.irlba$u)
Approximation of Document-Term-Matrix = Document-Concept-Matrix * SV * Term-Concept-Matrix
Why do we use only the Document-Concept-Matrix? Is that still LSA, or something different?
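For what it's worth, here is a NumPy sketch of the distinction the question is drawing (toy sizes, not the course's R objects): the triple product is a rank-k approximation of the original matrix and keeps its original shape, while the document-concept product alone is the smaller feature matrix that LSA feeds to the model.

```python
import numpy as np

# Toy sizes: 50 documents x 200 terms (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X = rng.random((50, 200))

k = 10
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-k APPROXIMATION of X: same 50 x 200 shape, nothing is reduced.
X_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Document-concept features: 50 x 10, the reduced representation used for LSA.
features = U[:, :k] * s[:k]

print(X_approx.shape, features.shape)  # (50, 200) (50, 10)
```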
Linear algebra is not my strong suit so I have a question. Suppose I have performed the SVD computation but I want to add another feature to the original feature space. Does this mean I have to redo the entire SVD computation or is there some efficient way to update the SVD?
Hey Dave,
Got an error in part 6. I have posted the error in the comments section of part 6. Please get back as soon as possible. Thanks!
@Sumit Dargan - Check the response on that page.
What about NGD (normalized Google distance)? Is it similar?
What great teaching skills! Thank you for these helpful videos.
I was trying to reproduce your code with different data, and I get the following error after running the code below. Please help.
rpart.cv.1
@NoobTube Thanks. Check my channel. I have an ongoing tutorial on biomedical literature text classification.
At 29:33, is it the V that contains the eigenvectors of the document correlations, XXt, while the U is XtX?
@Kok Wei Khong - Assuming a term-document matrix (i.e., a matrix where the terms are the rows and the documents are the columns), then you have the following:
U - the eigenvectors of XXt (i.e., the resulting matrix is term-focused)
V - the eigenvectors of XtX (i.e., the resulting matrix is document-focused)
HTH,
Dave
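The relationship above is easy to check numerically; this NumPy sketch (toy sizes) verifies that the columns of U are eigenvectors of XXt and the columns of V are eigenvectors of XtX, each with eigenvalue equal to the squared singular value:

```python
import numpy as np

# Toy term-document matrix: 6 terms (rows) x 4 documents (columns).
rng = np.random.default_rng(2)
X = rng.random((6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# (X X^T) u_i = s_i^2 u_i  (term-term)  and  (X^T X) v_i = s_i^2 v_i  (doc-doc)
for i in range(len(s)):
    assert np.allclose(X @ X.T @ U[:, i], s[i] ** 2 * U[:, i])
    assert np.allclose(X.T @ X @ V[:, i], s[i] ** 2 * V[:, i])
print("eigenvector relations hold")
```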
I am getting an error on this line:
rpart.cv.3
@db-engineering - As mentioned in the comments in the GitHub code, this is the result of the formula expansion exceeding R's default memory allocation. To get past this, you need to run R from the command line with an option to increase the memory allocation. As I mention in the code comments, you can see the following Stack Overflow post if you would like to run the code yourself:
stackoverflow.com/questions/28728774/how-to-set-max-ppsize-in-r
HTH,
Dave
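For anyone reading along: the Stack Overflow post above is about R's `--max-ppsize` startup option. Launching R with a larger pointer-protection stack looks something like this (the value shown is illustrative; see the linked post for details):

```shell
# Start R with a larger pointer-protection stack size (illustrative value).
R --max-ppsize=500000
```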
@@Datasciencedojo I'm having the same problem but can't find how to increase the memory allocation on macOS. Do you know how to do this, Dave? Love the course, btw. You're a gifted teacher.
Hello @@yespleasers, you can forward your query/question to our data science team via the chatbot or the email address on our site: datasciencedojo.com/
Hi. I can't open the GitHub repo anymore; it gives a 404 error.
whoo knoo? 15:03
LOL