A big thank you to Lingopie for sponsoring this video - discover the joy of language learning at Lingopie! 7-day free trial + 70% off the Lifetime Membership: learn.lingopie.com/onewordatatime
I really like Lingopie and would recommend it (love all the subtitling features), but be aware that the amount of content they have varies by language. It's a no-brainer for languages like French, German, English, and Spanish. But I got a one-year subscription (for the purposes of improving my Portuguese). I would say the one year was worth it -- but there's not a lot of content for Portuguese (especially in some of the content areas I liked) -- so after a year I didn't renew (I was happy with paying for 1 year, but had really exhausted all the content I was interested in). So highly recommend -- just make sure you are aware of this going in.
I used a technique similar to this years ago when my employer was uploading his decades of newspaper columns about economics to a blog. I was asked to code a script that would scan through all the columns and find the best key words to use as topic tags for quickly finding related posts. Obviously choosing the most frequent n-grams doesn't work because those have less significant meaning (e.g., 'the', 'and', 'this'). Conversely, while the least frequent n-grams (e.g., 'National Center for Education Statistics') may carry more exact meaning, they aren't useful for creating links between related posts because they would rarely be seen again. I had to implement an algorithm to find the sweet spot of n-grams (e.g. 'Federal Reserve', 'inflation', 'climate change') that were shared by several posts but not an overwhelming number.
Great video as always! I love the focus on data and analysis applied to second-language learning.
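For anyone curious, the "sweet spot" idea described above can be sketched roughly like this. This is only a minimal illustration, not the original script: the function names, the thresholds (`min_df`, `max_df_ratio`), and the toy posts are all made up for the example. It counts how many posts each n-gram appears in and keeps those shared by several posts but not by too many.

```python
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Return the set of 1..n_max word n-grams found in a text (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams

def candidate_tags(posts, min_df=2, max_df_ratio=0.5, n_max=2):
    """Keep n-grams that occur in at least min_df posts but in no more
    than max_df_ratio of all posts (both thresholds are assumptions)."""
    df = Counter()  # document frequency: number of posts containing each n-gram
    for post in posts:
        df.update(ngrams(post, n_max))
    limit = max_df_ratio * len(posts)
    return {gram: count for gram, count in df.items() if min_df <= count <= limit}

# Toy posts, invented for illustration only.
posts = [
    "The Federal Reserve raised rates and the markets fell",
    "Inflation worries the Federal Reserve and the public",
    "The city council debated the new budget",
    "The school board met and the parents listened",
]
tags = candidate_tags(posts)
print(sorted(tags))
```

Here 'the' is dropped because it is in every post, 'inflation' is dropped because it only shows up once, and 'federal reserve' survives because it sits in between.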
2:08 the list killed me. Super interesting as always, if I ever manage to get into programming I'd be curious to try this for myself
I had a recent experience where I couldn't use the word 'snow' in a conversation. So I said something to the effect of "frozen water that falls from the sky." Wordy, but accurate enough to get the point across.
thank you so much for the step by step!
hahahha the Dunning Kruger thing is hilarious
Yes, agreed that there is some value in specialization, especially in topics you are interested in. I didn't approach this as rigorously as you did, I've just watched a TON of travel videos and dining/food videos (even learned ALL the words on many restaurant menus) and yes -- each area has its own set of specialized words. I find that my comprehension is really strong when I watch travel and food content. I also read comic books -- and comic books have a certain set of frequent vocabulary (like lots of action verbs). Yes, it's an interesting phenomenon that you can use to some extent to your advantage.
Also loved your comment about the Easy Languages channels on YouTube -- big fan of their content. I use Easy Portuguese a ton myself.
I kind of realized this vaguely, because yeah, in the beginning the top 1,000-2,000 words are essential, but beyond that it depends on specific topics. Which is why being more extensive is important after the beginner stage.
I admire how much data analysis you think your audience can do XD
Congrats on the video! Really helpful as always. By the way, I watch your videos from Galicia, so I speak Galician (one of the languages that doesn't appear at the end). It would be awesome if you included another row with the translation in my language in the following videos. In this case, the word for sweet would be "doce" (same word as Portuguese, we share a lot of similarities hehe). Have a great day!
You can find out a lot using Wordsmith too. Great video btw!
What is Wordsmith?
@@quantus5875 it's software that linguists use to analyze bodies of text, looking e.g. at which words or chunks are particular to a text compared to a corpus, or which words are most common, etc.
Your videos are always so swell
I found this very interesting, while at the same time knowing there ain't no way I am going to do that. I'll just nut it out or look up the words.
I just found out the one material you need to understand to be able to become fluent in any language, and I would like to see your data insight about it:
The one book you need to be able to understand to become fluent is a monolingual dictionary. Of course, a dictionary has tens of thousands of words, but that is not what I'm talking about. If you analyze a monolingual dictionary, the high-frequency words are going to be the words most used in definitions, and the low-frequency words are going to be the words being defined. For more precision, you can run this data analysis only on the words used in the definitions.
One could argue that if you're able to understand a new word from its definition alone, and are able to define or describe a word you want to say but don't know, you've reached fluency. So the only vocab list you really need to translate in a language is the words used to give definitions in the dictionary. After that, you're able to learn all the other words without even using your native language.
Woah, that sounds like something someone should have already coded
Then again, you did provide the math, so I guess I should just code it when I get home.
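A rough sketch of the dictionary idea, in case it helps anyone get started. It assumes you already have the dictionary as a mapping from headword to definition text; the function name `defining_vocabulary` and the toy entries are invented for the example, and a real dictionary would need proper tokenization and lemmatization.

```python
import re
from collections import Counter

def defining_vocabulary(dictionary, top_n=10):
    """Count how often each word is used inside definitions and
    return the top_n most frequent ones as (word, count) pairs."""
    counts = Counter()
    for definition in dictionary.values():
        # Only the definition text is counted, not the headwords themselves.
        counts.update(re.findall(r"[a-z]+", definition.lower()))
    return counts.most_common(top_n)

# Toy entries, made up for illustration.
toy_dictionary = {
    "snow": "frozen water that falls from the sky",
    "rain": "water that falls from the sky in drops",
    "ice": "frozen water",
    "cloud": "a white or grey mass in the sky",
}
vocab = defining_vocabulary(toy_dictionary, top_n=3)
print(vocab)
```

On a real dictionary the top of this list would be the candidate "words you need to understand definitions", which is the list the comment above is asking for.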
LingQ does something similar, but in a different way.
It doesn't use the same logic/formula, but the results are similar.
This is fantastic! I'm going to go to all my hoarded text files and calculate a bunch of TFIDFs. It looks from your example list that mixing document genre is OK, perhaps even helpful. Do you know whether it's important to match documents for length? Thanks for this great presentation!
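For the text-file experiment, here is a minimal TF-IDF sketch. It uses one common textbook variant (term frequency as a share of the document's tokens, inverse document frequency as log(N / df)); the video's exact formula may differ, and the function names and toy corpus are assumptions made for the example. Dividing by document length in the TF part is one simple way to keep long and short documents comparable, though it is not the only one.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())

def tf_idf(corpus, index):
    """Score every word of corpus[index] against the whole corpus."""
    docs = [tokenize(doc) for doc in corpus]
    df = Counter()  # number of documents each word appears in
    for doc in docs:
        df.update(set(doc))
    target = docs[index]
    counts = Counter(target)
    n_docs = len(docs)
    return {
        word: (count / len(target)) * math.log(n_docs / df[word])
        for word, count in counts.items()
    }

# Toy corpus, invented for illustration.
corpus = [
    "the dragon guards the gold",
    "the cook tastes the soup",
    "the pilot lands the plane",
]
scores = tf_idf(corpus, 0)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}\t{score:.3f}")
```

With this variant, a word that appears in every document (like 'the' here) gets a score of exactly zero, no matter how often it occurs in the target text.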
Hey - I really love the idea behind this video (and am a big fan of your channel in general). I will say though that I just tried to implement TFIDF myself via Python script to produce a word list for a Chinese video I am using as input. It looks like there is a high correlation between the word list I would get if I sorted by decreasing TFIDF vs. the word list I would get if I just sorted by raw frequency of appearance in that Chinese video (without using TFIDF).
I know you're not necessarily recommending in your video that people calculate / use TFIDF, but I'm wondering if you found the same results in your analysis? I have only analyzed the script for one Chinese YT video so far (and the corpus of text I'm comparing to is 140 other files for random movies / tv shows / book chapters). But it suggests that you may not really need to use TFIDF calculation to get at some of the objectives you're outlining here. But rather, focus on the more general takeaway of not shying away from learning specialized words when you come across them - but you may be able to surface those specialized words without TFIDF.
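One way to put a number on that comparison is to check how much the top of the two lists overlaps. The sketch below is only an illustration: the function names are made up, and the scores are invented toy values, not results from any real video. Some correlation is expected by construction, since a TF-IDF score is the raw frequency multiplied by a per-word weight, so the two rankings only diverge for words that are common across the whole corpus.

```python
def top_k(scores, k):
    """Return the k highest-scoring words as a set."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def overlap(scores_a, scores_b, k):
    """Fraction of the top-k words shared by two rankings (1.0 = same set)."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k

# Invented toy values, for illustration only.
raw_counts = {"的": 50, "我": 30, "是": 25, "火锅": 12, "辣": 9}
tfidf_scores = {"的": 0.0, "我": 0.01, "是": 0.0, "火锅": 0.8, "辣": 0.5}

print(overlap(raw_counts, tfidf_scores, k=3))
```

If the overlap stays high even for small k, raw frequency is probably good enough for that text; if function words dominate the raw list, the TF-IDF list is the more useful one.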
I was planning on doing the same over the next week or so (when I get round to it). Will let you know if I get the same result (also for Chinese).
Hello, there. I was wondering if you uploaded the code somewhere. I'm a teacher and I'm trying to create an analysis of the phrases we should learn in preparation for watching a TV show in class. It would be immensely useful.
Should I stick to comprehensible input or watch more advanced videos in German?
Personally, whenever I'm learning a new thing that I intend to keep learning my whole life, I like to approach it in two different ways that I call: 1 - From Start to End. 2 - From End to Start.
Basically, sometimes I study in a more traditional way, studying all the basics step by step, and then sometimes I just go on and try to do whatever it is that I'm trying to learn, and learn a little bit from my mistakes. Of course, in life there are things where you can't use this strategy, because there are things that don't allow mistakes, but language learning is not one of those things.
Short answer: do a little bit of both, just make sure to stay motivated and enjoy the process.
From a "motivation" standpoint comprehensible output IMO is the better option. I don't think you have to choose exclusively, though; maybe spend 70% of your time in CI learning activities and 30% in others. It depends on what learning activities you like. Staying motivated IMO trumps efficiency any day. Better to stay motivated and be a little less efficient than to be super-efficient and then drop the language a year later due to lack of motivation.
Now to try and find a way to gather said information...
I spent way too much time doing data analysis on a Gänsehaut book instead of actually reading the book 😂
3:15 the random video in the background saying "Wieder Obdachlose am Hafencity verprügelt" killed me as a German
Lingopie is always 70% off
Lingopie is a fraud
Same with LingQ lol. Meanwhile I've had a Readlang subscription for 10 years and it's been $5 a month the whole time.
But how, though? Should I know how to program stuff like that to start learning a language?
Yeah, you probably should. Although, as a programmer myself, I'm sure that this kind of program would be very easy to do at a basic level (probably harder for German, since it has those word parts that travel to the end of the sentence, and they can't always be easily identified programmatically as part of an earlier word, and so on). But I can imagine that, despite there being multiple AIs that can help code stuff, it still requires tremendous effort for a person who has never programmed before to make such a program. So yeah, that's basically a guide on how to make this kind of program, not of much use to an average person.
The main point that I was trying to get across was that for a given topic, generalized vocabulary that you learn based on a frequency list or in a classroom is probably not enough on its own. So it’s worth going the extra mile to specialize a bit. The point here is not that you need the TFIDF scores for a specific topic 🤓
If you are interested, you don't need very much Python to process this kind of data. I've learned to code mostly from doing projects like this, give it a try!
It would be interesting to find out what topics tend to be the most universally beneficial without falling into the trap of most common words. So mid frequency words that still carry very clear meaning that aren't generic.
Like, it would be good to know which topics at each level would most optimally increase our learning and understanding along with vocabulary. You could even split early B1 and advanced B1 topics etc.
Lol. No, you don't need to be able to program to start learning a language. Jesus fucking christ dude...
I thought it was an interesting idea but at the end I didn't come away with any practical useful information, evidence, testing and so on.
I think the general usefulness is just understanding that every topic area has its own set of useful words. I wouldn't use this video from a technical standpoint. I would say delve into watching/reading topics that interest you -- it will be fun because you can get pretty good in specific topics quickly -- I would just recommend keeping language learning fun -- if some topic like "astronomy" doesn't interest you -- then don't waste any time on its specific vocab.
I've been learning German on and off for 10 years. I think I'm B1 or maybe B2. But B1 for sure, I think. And I want to know how important it is to know the gender. Because I read and listen but rarely speak, and when I do I get nervous that I will use the wrong grammar. People say I'm understandable, but I'm just guessing. Will my guesses become more accurate naturally the more I'm exposed, or do I need to drill the genders? I know all the rules, but in practice I just guess; I don't think about the rules. I also always learn words with the gender, but once again, in practice I just guess. I feel way more confident understanding written or spoken German than producing it, because I don't have to be confident that I know the gender.
You need input! You've already understood the rules, so now it has to go into the subconscious by listening and reading a lot. ☺️
I’m German. And it’s great if someone uses the correct gender. But you will be understood nevertheless. And I know there are some rules but not for everything. So guessing is fine. Your guessing will become better over time if you have enough exposure to German. So it feels like guessing or a gut feeling but actually you’ll remember what you have heard/read somewhere.
And if you don't speak much it's totally normal to make mistakes, especially concerning gender. But that's really not that important.
It depends on what level of speaking proficiency you are aiming for. If you just want to be understood then it's not that important, but if you are aiming for ~B2 or higher proficiency you do need to get gender correct (at least most of the time). Also don't be afraid of making gender mistakes. The more you get exposed to the language the better you will get at it.
I think of gender as part of learning the word (for languages that have gender) -- IMO you don't really know the word that well if you don't know its gender.
@@chrissy4957 have you learned German?
@@olivia5030 I'm a native speaker, but it's based on the scientific theory of comprehensible input, which I have used to learn the other three languages I speak
Really enjoyed the advice, but... 2 million words!? Languages don't even consist of 1 million words. Estimates for educated, native speakers are a few hundred thousand words. 🤨
I'm pretty sure he means that in the sense of total words (like the total number of words in a book) not unique words.
@@fireflyswords5739 I think you're right. Noticed it after binge watching his other videos. Good stuff.
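To make the distinction in this thread concrete: "total words" means running words (tokens), while "unique words" means distinct word forms (types). A tiny sketch, with a made-up example sentence:

```python
import re

# Example sentence invented for illustration.
text = "the cat sat on the mat and the dog sat too"
tokens = re.findall(r"\w+", text.lower())

total_words = len(tokens)        # every running word counts
unique_words = len(set(tokens))  # each distinct word counts once

print(total_words, unique_words)
```

So a figure like 2 million words can describe how much text was read or heard, even though the number of distinct words in it is far smaller.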