Sample Corpora 02:25
Corpora for Comparison 05:13
General Corpora 09:50
Specialized Corpora 10:58
Annotated Corpora 11:35
Unannotated Corpora 17:11
Learner Corpora 17:50
Hello! I want to email u but i couldn't find ur email. Would u please Write it for me and thanks.
@ghmarioumaima5391 Hi Oumaima! Sorry about that. There you go: yassine.iabdounane@gmail.com
I am new to the field of Corpus Linguistics. I am learning so many things from your videos. Thank you for sharing such informative videos.
you are the best teacher on youtube , please keep sharing
God bless you sir I'm so grateful for learning this gorgeous lesson
Thank you so much 🥰❤️
Thank you so much, clearly explained. I'm doing my master's degree in Spain and corpus linguistics is a new concept to me.
you are helping me a lot in my Master's degree in NLP. Thank you man ! Keep up the good work.
thanks for the nice words man! best of luck with your Master's degree!
Thank you for giving us insightful and organized lessons about the corpus linguistics!
Also, it was very cute of you showing the "Helsinki" in the la Casa de Papel!!!!
Plz sir keep sharing your knowledge with us ❤
From Malaysia, thank you for the explanation.
I am so proud of you!
Thank you so much my dear!
For the first time, I have understood the things related to CL. Thank you
I'm very happy to hear that! All the best
Hey, is COCA a monitor corpus or a diachronic corpus please?
What are the types of register? And please explain register and genres.
Aoa, sir how can I contact you for my PhD research in linguistics using corpus linguistics. thanks
It's awesome to learn different types of Corpora.
Thank you for watching!
bro, can you make a video on how to search binomial word pairs in a certain corpus, like COCA.
To look for binomials in COCA, simply use this expression:
* _n* and * _n*
That's about it bro :)
PS: please delete the spaces between * and _ when you use the expression. I added them because a character between two * is printed in bold here in the comments like this *_n* and *_n*
Great videos Yassine! Thank you
Thank you Reina! I'm glad you find them useful!
@YassineIabdounane what's an example of a specialized corpus?
@Enjoy.your_life34. A specialized corpus includes texts of a particular type. An example would be the Michigan Corpus of Academic Spoken English (MICASE).
thank you so much!amazing course!
My pleasure! I'm glad you liked it!
Very useful videos. I loved them.
Thanks man! Happy to know that :)
Can you please elaborate on statistical significance and significance tests with examples?
And also type-token ratio
Please...
On statistical significance and significance testing:
Say that you have two corpora: one contains texts produced by men, and the other contains texts produced by women. You would like to see whether men use the word ‘wonderful’ more than women do. You compare the frequencies and find that men have used the word 128 times while women have used it only 110 times. So it seems that men do indeed use ‘wonderful’ more than women do. Nevertheless, there are a number of things to consider, corpus size for example: raw counts are only comparable if the two corpora are about the same size. Here’s the question: is the observed difference large enough to claim that, in general, men use that word more than women do? Or is it just a matter of chance that has nothing to do with men's and women’s speech? To determine whether the difference is statistically significant and not due to chance, we need to use significance tests. One example would be the chi-square test. The chi-square test compares the actual observed frequencies (128 and 110 in our case) with the expected frequencies (the ones that we would expect if no factor other than chance had been involved). The closer the observed and expected frequencies are to each other, the greater the probability that the observed frequencies are due to chance alone, in which case the difference would not be significant.
If you want to read more about it, I recommend this: www.lancaster.ac.uk/fss/courses/ling/corpus/Corpus3/3SIG.HTM
Here’s more on expected frequencies and the chi-square test: ruclips.net/video/ZUGKFoHUHQI/видео.html&t
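To make the comparison above concrete, here is a minimal sketch of the chi-square calculation on a 2x2 table (the word vs. all other words, in each corpus). The corpus sizes of 1,000,000 words each are an assumption made up for illustration, since the example only gives the counts 128 and 110. The function name is also just illustrative. The cutoff 3.84 is the usual critical value for p < 0.05 with one degree of freedom.

```python
def chi_square_2x2(freq_a, size_a, freq_b, size_b):
    """Chi-square statistic for one word's frequency in two corpora.

    freq_a, freq_b: how often the word occurs in corpus A and corpus B.
    size_a, size_b: total number of tokens in each corpus.
    """
    # Observed table: rows are corpora, columns are (the word, all other words).
    observed = [
        [freq_a, size_a - freq_a],
        [freq_b, size_b - freq_b],
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - (freq_a + freq_b)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected frequency if only chance were involved.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Counts from the example; the corpus sizes are hypothetical.
chi2 = chi_square_2x2(128, 1_000_000, 110, 1_000_000)
CRITICAL_VALUE = 3.84  # p < 0.05, one degree of freedom
print(round(chi2, 2), "significant" if chi2 > CRITICAL_VALUE else "not significant")
```

Under these assumed sizes the statistic comes out at roughly 1.36, well below 3.84, so the 128 vs. 110 difference would not count as significant. With different corpus sizes the result could change, which is why corpus size matters.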
On type/token ratio:
Type/token ratio is a measure of lexical richness. In essence, it gives you an idea of how many distinct words (types) are used in a text relative to the total number of words (tokens). It is calculated by dividing the total number of types by the total number of tokens. The closer the score is to 1, the richer the text (the more distinct words are used); the further it is from 1, the more repetition you have in the text.
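As a quick illustration of the calculation, here is a small sketch. The tokenizer (lowercasing and splitting on anything that is not a letter or apostrophe) is a simplifying assumption; real corpus tools tokenize more carefully.

```python
import re


def type_token_ratio(text):
    """Number of distinct words (types) divided by total words (tokens)."""
    # Simplified tokenization: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# 6 tokens, 5 types ("the" is repeated), so the ratio is 5/6.
print(type_token_ratio("The cat sat on the mat"))
```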
Thank you so much.
Thannnkk you so much! Thank you Yassine!
My pleasure!
I have a question, sir: how can I use corpus linguistics for this topic, the singularization of "they"?
Hope you answer my question..thank you.
It depends on what you want to study exactly. If you are interested in its historical development I would suggest using a historical corpus of English and see how the use of 'they' changes over the years.
Thank you so much for responding to my concern, sir. I have a research study titled THE SINGULARIZATION OF "THEY" IN AN UNDERGRADUATE THESIS. In the matrix in our methodology, we wrote that we will use a corpus instrument instead. So in your opinion, which corpus exactly should we use for our research? I'm not that familiar with corpora yet, and there are a lot of questions in my mind about them.
Thank you for responding again.
excellent was very helpful - thanks!
my pleasure! Happy you find it helpful :)
Thank you very much!
Please explain Reference corpus.
Hi Sabrina! A reference corpus is a corpus that you choose as a standard of comparison for the corpus you're working with. It is usually more general and representative of the source language as a whole, and it is large enough to represent all relevant varieties of a language and its features. Here's how it is useful. Say you are working with a corpus of biology, and you want to display a list of keywords that are particularly characteristic of the type of discourse or language contained within that biology corpus. In this case, you'd need to compare this 'specialized corpus' with a more general 'reference corpus' so as to see the list of words that are particular to 'biology'.
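A rough sketch of that comparison: score each word by how much more frequent it is (relative to corpus size) in the specialized corpus than in the reference corpus. This simple ratio with add-one smoothing is an assumption chosen for illustration, not the keyness statistic any particular corpus tool uses, and the two tiny token lists are made up.

```python
from collections import Counter


def keywords(study_tokens, reference_tokens, top_n=5):
    """Rank words that are unusually frequent in the study corpus."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)

    scores = {}
    for word, count in study.items():
        study_rate = count / n_study
        # Add-one smoothing so words absent from the reference corpus
        # do not cause a division by zero.
        ref_rate = (ref[word] + 1) / (n_ref + 1)
        scores[word] = study_rate / ref_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data standing in for a biology corpus and a general reference corpus.
biology = "the cell divides and the cell grows".split()
general = "the dog runs and the cat sleeps and the bird sings".split()
print(keywords(biology, general, top_n=1))
```

Common words like 'the' and 'and' score low because they are frequent in both corpora, while 'cell' rises to the top because it is frequent only in the specialized one.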
Excellent. Thank you so much.
Are a reference corpus and a monitor corpus the same or different?
Because when I searched for examples, the Bank of English
is used as an example of both.
Not all monitor corpora can be used as a reference. A monitor corpus is one which grows in size over time. Still, the data that makes up the corpus may not be general enough for the corpus to be used as a reference. For instance, a monitor corpus of newspaper data is certainly not a general corpus, or one to be viewed as 'a standard' for comparison.
Keep going
good
A masterpiece ❤❤❤❤❤
Thanks!
Tq for the information
Informative
Thank you!
God bless you!
Thank you very much! God bless you too :)
Can I ask? What is the purpose of a corpus?
A corpus is intended to be a representative sample of authentic language use. There are various types of corpora, as you can see, so specific research purposes vary depending on the type of corpus chosen. But the general aim, I would say, is to study how a language is used authentically in a given context (either generally, or across different regions, time periods, domains, etc.)
I love the winnie the pooh "repertoire" meme hehe
makes you feel so fancy doesn't it? lol
😭
you look like Snowden
haha it's the glasses I think