Thank you so much for these videos, professor! Looking forward to the upcoming ones. All the best.
These courses are so interesting!!
I have previously been obsessed with your course on Cognitive Linguistics, and I was wondering whether you are going to post a continuation of the series within the series on Linguistic Relativity (because you kind of left us on a bit of a cliffhanger there😅)
Again thank you so much for these videos!!
I know, I know. I promised a continuation of that. I'll do my best to make that happen. All good wishes!
Thank you sir, we appreciate all your efforts. Looking forward to that continuation!!
Dear Professor! What type of linguistics deals with collocations? Cognitive, applied, etc...?
Thank you so much, professor! I am a beginner interested in corpus linguistics, and I want to learn more about it! But I don't know how I can find corpora online (like the BNCA in your video). Could you offer some suggestions?
Hi, Professor Hilpert! I just finished watching this video; it is highly rewarding and highly recommended! This time I have successfully downloaded the spreadsheet! By the way, is the textbook you use Lindquist's (2009) "Corpus Linguistics and the Description of English"?
Many thanks, Jack! And yes, exactly, it's Lindquist (2009). There's a newer edition, but I'm using the old one with my students.
@MartinHilpert A new question just occurred to me today: could you please explain a little about the "assumptions"? I find it hard to transfer notions from parametric tests, like "normal distribution" and "independent observations", to a collocational analysis. Besides, is it really necessary to ensure that all the expected frequencies are larger than 1 and that 80% of them are larger than 5?
@jackmeng8326 I'll try to do that, thanks for the pointers!
Hi Martin, great and very detailed explanation, especially for the Delta P! Would you do/think of a series of Collostructional Analysis video tutorials? Anyway, stay safe!
Hi Gede, good to hear from you! Yes, there will be videos on collostructional analysis quite soon. You stay safe, too!
Awesome! Thanks Martin and looking forward to the CollAna videos!
Thank you so, so, so, so much for putting up these videos ☺ - I've been struggling with the statistical aspects of collocations for the past week or so but since I have (next to) zero experience with statistics I kept running into walls. I will also shamelessly appropriate the Excel-formula for the log likelihood from the other video - I tried to figure it out on my own but kept going wrong somewhere and got results that couldn't possibly be correct 😅
It took me a while to see why, and I thought I'd leave a comment here in case someone else wonders about this. Why is the formula "B * C / D"? The idea is:
1) The probability of finding "pretty" is B/D and the probability of finding "well" is C/D.
2) The probability of finding both "pretty" and "well" together is (B * C) / (D^2).
3) But then... there are D-1 word pairs in the data. So in theory the formula should have been ((B * C) / (D^2)) * (D-1).
4) For very large D, (D-1)/D approaches 1, so the formula becomes (B * C) / D.
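For anyone who wants to check the four steps numerically, here is a minimal sketch. The counts for B, C and D are made up for illustration and do not come from the video's spreadsheet; the function names are hypothetical too.

```python
def expected_exact(b, c, d):
    """Expected co-occurrences using all D-1 adjacent word pairs (step 3)."""
    return (b / d) * (c / d) * (d - 1)


def expected_shortcut(b, c, d):
    """The B * C / D shortcut from step 4."""
    return b * c / d


# Hypothetical counts: B = frequency of "pretty", C = frequency of "well",
# D = corpus size in words. These are NOT the figures from the video.
B, C, D = 5_000, 20_000, 10_000_000

exact = expected_exact(B, C, D)
shortcut = expected_shortcut(B, C, D)
relative_gap = (shortcut - exact) / shortcut

print(f"exact:        {exact:.7f}")
print(f"shortcut:     {shortcut:.7f}")
print(f"relative gap: {relative_gap:.2e}")
```

The relative gap between the two values is 1/D, so for a corpus of millions of words the shortcut is effectively exact.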
I had debated with myself whether or not to go into the issue of D-1 word pairs. I ended up taking the shortcut, and your comment clears it up. Thanks a lot!
Rank  Collocate  Collocate frequency  Log-likelihood
1     well       42                   162.0473
2     said       15                   45.0989
3     why        6                    19.8583
4     goes       4                    15.068
5     says       4                    13.182
6     look       4                    8.8964
7     now        4                    6.2623
8     saying     3                    11.7249
Hello Professor,
My question is: the words ranked 4 to 7 have the same collocate frequency, so why do they have different log-likelihood values?
Regards
Their collocate frequencies are the same, but their overall corpus frequencies differ: goes < says < look < now. The higher the overall corpus frequency, the lower LL, given the same collocate frequency.
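A small sketch of this effect, assuming log-likelihood is computed as the G² statistic over a 2x2 contingency table (one common formulation; the spreadsheet from the video may set it up differently). All frequencies below are invented for illustration and are not the counts behind the table above.

```python
import math


def log_likelihood(cooc, f_node, f_coll, n):
    """G2 for a 2x2 table built from co-occurrence and corpus frequencies.

    cooc   = how often node and collocate occur together
    f_node = overall corpus frequency of the node word
    f_coll = overall corpus frequency of the collocate
    n      = corpus size
    """
    observed = [
        cooc,                        # node with collocate
        f_node - cooc,               # node without collocate
        f_coll - cooc,               # collocate without node
        n - f_node - f_coll + cooc,  # neither
    ]
    rows = [f_node, n - f_node]
    cols = [f_coll, n - f_coll]
    expected = [r * c / n for r in rows for c in cols]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical setup: same collocate frequency (4) every time,
# but the collocate's overall corpus frequency keeps growing.
N, F_NODE, COOC = 1_000_000, 1_000, 4
corpus_freqs = [200, 500, 2_000, 3_000]
scores = [log_likelihood(COOC, F_NODE, f, N) for f in corpus_freqs]

for f, ll in zip(corpus_freqs, scores):
    print(f"corpus frequency {f:>5}: LL = {ll:.4f}")
```

With these made-up numbers the score drops as the corpus frequency rises, because a frequent word is expected to show up next to the node more often by chance, so 4 co-occurrences are less surprising.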
@MartinHilpert thanks Professor :-) you are amazing ☺️
thank you, extremely interesting!
I was wondering if there is a sort of “threshold” for all these measures. I am more familiar with the log-likelihood, where a score of 10.83 or higher indicates significance at the 99.9% level (p < 0.001). What about the MI? Is there a banality threshold too? Thank you 😊
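As a quick check on where 10.83 comes from: assuming the log-likelihood score is compared against a chi-squared distribution with one degree of freedom (the usual choice for a 2x2 table), the p-value can be computed with the standard library alone. The function name is made up for this sketch, and it says nothing about thresholds for MI.

```python
import math


def p_value_df1(score):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(score / 2))


# 3.84 and 10.83 are the familiar cut-offs for p < 0.05 and p < 0.001.
for score in (3.84, 10.83):
    print(f"score {score:>5}: p = {p_value_df1(score):.5f}")
```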
Martin, it looks like the website for Lancaster has changed since this video, but as it is the same web address (corpora.lancs.ac.uk/) I assume it is the same, only the toolbox has been relabeled #LancsBox.
Many thanks, George! The functions should be the same.
@MartinHilpert I appreciate your help.