The corpus manager is one of the best innovations brought by ver. 4, very well done. Having my corpora always already loaded truly speeds up the workflow, and the fact that the reference corpus can be immediately swapped with the target is a nice little touch.
With each update, Prof. Anthony is ushering in the 4th generation of corpus tools. Thank you, sir!
Wow. Thanks Michael. That's such a lovely comment. It is very much appreciated.
It is indeed extremely useful and a massive time-saver!
Thanks, Dr Anthony, for the tutorial. It's super helpful. AntConc is revolutionary indeed.
Thank you for the really kind comment. It's much appreciated!
Great explanation!
Thank you! I'm hoping to create more videos as soon as I can.
Hello, Professor! Thank you so much for the tutorial. It helped me a lot, I must say. But I have a question. You uploaded an Excel word list file that includes the frequency. I couldn't see whether "the" was included there. However, when you clicked, all the words appeared. So my question is: will only the words written in our list appear, or will all the words and their frequencies be available to us? (I think I asked in a complicated way, sorry for that :( )
Hello, professor! Thank you so much for your video, it's been helping me a lot.
I have a question: how am I supposed to add stoplists to hide prepositions and unnecessary articles from my word list? The farthest I got was going to Global Settings >> Hide words from the file >> Add file list; however, the words that were supposed to be hidden still appear for me.
Hi. Yes, you need to use the Tool Filters option in the global settings. My guess is that you forgot to activate the option by checking the checkbox.
@@AntLabJPN Hi there! Thank you for your quick reply. Both "clusters/n-grams..." and "hide words in the file" are selected, but when I apply and press "start" to generate a word list, the words remain. I checked twice and the stoplist is still selected in the box.
@@isabelagesser8627 Can you try a simple experiment? Generate a word list using the in-built demo corpus. Then, try using the filter tool to eliminate the word "the". Once you get that working, I think everything else will start to work, too.
@@AntLabJPN It worked, thank you so much!
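For anyone who wants to check what a stoplist should do to a word list outside AntConc, here is a minimal Python sketch of the filtering step discussed in this thread. The function name `apply_stoplist` and the sample data are hypothetical, and this is an illustration of the idea, not AntConc's own implementation.

```python
def apply_stoplist(word_freqs, stoplist):
    """Return the (word, frequency) pairs whose word is not in the stoplist.

    word_freqs: list of (word, frequency) tuples, e.g. from a saved word list.
    stoplist:   iterable of words to hide (matched case-insensitively here,
                which is an assumption of this sketch).
    """
    hidden = {w.lower() for w in stoplist}
    return [(w, f) for w, f in word_freqs if w.lower() not in hidden]


# Hypothetical sample data
word_list = [("the", 120), ("Alice", 40), ("of", 35), ("rabbit", 12)]
stop_words = ["the", "of", "a"]

print(apply_stoplist(word_list, stop_words))
# [('Alice', 40), ('rabbit', 12)]
```

If "the" still shows up after a filter like this, the stoplist file itself (spelling, encoding, one word per line) is worth checking before the settings.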
Hello Professor Anthony, thank you for providing a tutorial for your excellent AntConc software. I am not an academic; however, I am studying the Apocrypha, and my Bible software does not allow me to get statistics from that text. I wanted to know if it is possible in AntConc to additionally sort the Word List according to the files of the corpus. In my studies I am especially interested in words that appear only once, so I have clicked Start to get the Word List and ordered it by frequency, but I also want the results to be grouped by file, so I don't have to keep manually searching for the hapaxes in KWIC to find out which book they belong to. I hope I explained that correctly. Thank you and have a great day.
Hi. I suggest you use a two-step procedure. First, generate the hapaxes using the word list tool. Second, copy the list of hapaxes and load it as the words to search for in the KWIC tool via the advanced search. If you do that, you can search for all the words at once and you'll immediately see which file they appear in. You can also sort the KWIC results by file ID. I hope that helps!
@@AntLabJPN Brilliant! Thank you so much
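The two-step idea above (find the hapaxes across the whole corpus, then see which file each one belongs to) can also be sketched in a few lines of Python. This is only an illustration: the function name `hapaxes_by_file`, the file names, and the letters-only token pattern are assumptions of this sketch, not AntConc's token definition or internals.

```python
import re
from collections import Counter

# Runs of Unicode letters; an assumed token definition for this sketch only.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+")


def hapaxes_by_file(texts):
    """Group corpus-wide hapaxes by the file they occur in.

    texts: dict mapping a file name to its text.
    Returns a dict mapping each file name to a sorted list of words that
    occur exactly once in the whole corpus and are found in that file.
    """
    corpus_counts = Counter()
    tokens_per_file = {}
    for name, text in texts.items():
        tokens = [t.lower() for t in TOKEN_PATTERN.findall(text)]
        tokens_per_file[name] = tokens
        corpus_counts.update(tokens)  # step 1: corpus-wide frequencies

    # Step 2: look up each hapax in the file that contains it.
    return {
        name: sorted(t for t in tokens if corpus_counts[t] == 1)
        for name, tokens in tokens_per_file.items()
    }


# Hypothetical two-file corpus
corpus = {
    "tobit.txt": "the angel spoke to the man",
    "judith.txt": "the widow spoke",
}
print(hapaxes_by_file(corpus))
# {'tobit.txt': ['angel', 'man', 'to'], 'judith.txt': ['widow']}
```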
Great! In the Statistics section under Global Settings, there is a Normalized Frequency Multiplier. What does that do?
That's a great question. When comparing relative frequencies in a corpus, we often have very small values, so people in the field like to scale up the value to something a little more meaningful, e.g., "10 words per thousand" or "10 words per million". So, the "per thousand" or "per million" is the multiplier.
@@AntLabJPN thank you for the prompt reply. I'm submitting an article and I've used AntConc for lexical bundle analysis. Great tool. Thank you Anthony
@@AntLabJPN thank you Lawrence, I always thought the first name was Anthony
@@Pythonology Many people get my name wrong. My first name is actually Laurence (with a "U").
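To make the normalized frequency multiplier from earlier in this thread concrete, here is a small Python sketch of the arithmetic: the raw count is divided by the corpus size and scaled by the multiplier. The function name, the default of one million, and the sample numbers are assumptions for illustration, not AntConc's defaults.

```python
def normalized_frequency(raw_count, corpus_size, multiplier=1_000_000):
    """Scale a raw frequency to 'occurrences per <multiplier> tokens'."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * multiplier / corpus_size


# Hypothetical example: a word occurs 25 times in a 50,000-token corpus.
print(normalized_frequency(25, 50_000, multiplier=1_000))      # 0.5 per thousand
print(normalized_frequency(25, 50_000, multiplier=1_000_000))  # 500.0 per million
```

The same word in the same corpus gives 0.5 per thousand or 500 per million; only the scale changes, which is why the multiplier should be reported alongside the figures.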
Hello, Professor.
The AntConc version I have is 4.3.1.
How can I save TXT documents?
When I close the page and open it again, the documents I uploaded are deleted.
I cannot see the documents I uploaded later on the (Corpus Manager) page.
Thank you for your help.
Hi. I think you are using the "Open Files as Quick Corpus" option. If you use that, the corpus is saved as temp.db. I recommend you use the "Raw Files" option in the corpus manager and build your corpus there. You can then give it a proper name and it will stay in the repository for use later.
Hi again Laurence, thank you for all your advice. I'm trying to delete a temp.db file that I uploaded. However, the message 'temp.db could not be deleted because it is being used by another process' keeps coming up. The document is closed on my computer. How can I find what the process is that is keeping it from being deleted?
Hi. If you update to AntConc 4.3.0, you should find that this bug is fixed. Otherwise, you should find that if you restart AntConc, the deleted file will no longer appear.
Prof. Anthony, hi. Can I use this tool to count unique individual words in books? In my language, I think it has never been done. So I wanted to count how many individual words are used in our books from school level to university level. I wonder whether I can use this tool to do that. Thank you in advance for your answer.
Yes. AntConc can be used for exactly this task. Load in pdfs, docx, or text files of your books, and then use the word list tool to count all the individual words.
@@AntLabJPN 🙏 Can I add PDFs directly? I thought I should convert them to text files first.
@@bozok6360 Yes, you can load in pdfs directly. There is no need to convert them.
@@AntLabJPN Thank you for your explanation. I also wanted to know: how can I work with agglutinative languages? Can I add stems and make AntConc look for them? Or does it have other functions that are better than my idea?
@@bozok6360 Yes. I suggest you look at the different wildcard options. They are listed in the global settings.
Professor Anthony, hello. I have created a corpus and done the lemmas. I learned a lot of things because of you. Dear professor, now I want to understand the usage of parts of speech in my corpus. How can I do it?
I wanted to ask whether you can use a lemma list in this version, because I cannot find the function to upload a lemma list.
Yes, you can! I document how to do it in the help guide, but just check the "headword" option when you create a corpus via the Corpus Manager. You'll see it at the bottom of the screen here: ruclips.net/video/yDSa1rp8Bqs/видео.html
@@AntLabJPN Thank you! I have another question... What would you say is the most efficient way to analyze which adjectives are most used to describe animals, and which animals are mentioned the most? Sorry for the questions, but I am taking a university course in Germany and I do not know so much about corpus analysis... yet.
Thank you!!!
You're welcome!
Why do my KWIC Alice results look different to yours? My first result is:
“ which certainly was not here before , ” said Alice, , ) and round the neck of the bottle
To the left of the blue 'Alice', 'said' is in red, the quotation mark is in green, and the comma is in red, so no text is highlighted to the left except 'said'.
As far as I can see, all my settings are the same as yours.
Hi. It sounds like you've clicked the sort option to sort to the left. The default setting in AntConc is to sort to the right.
Hi Professor! Thank you for your tutorial!
But I have a question concerning the installation of the reference corpus from existing files. I couldn't figure out how to install a reference corpus in AntConc version 4.2. When I tried to do it after successfully installing the target corpus, I always got the "overwrite the existing corpus" message. Can you help me with that? I also tried older versions like 4.0, but when I clicked "create" nothing happened. I don't know if the problem is with my reference corpus, because I have 500 txt files in total. Thank you very much!
Hi. Target corpora and reference corpora are just corpora in AntConc. You can choose any corpus to be a target corpus and any corpus to be a reference corpus. My guess is that you are trying to create a new corpus with the same name as an existing corpus. Just choose a unique name for your corpus in the corpus manager and you should be fine.
@@AntLabJPN Thank you for your response! I tried to use different names for my target corpus and reference corpus. Now there is no "overwriting" message, but after I installed my reference corpus successfully, the target corpus just disappeared when I returned to the main window. Can you help me with that?
My steps: corpus manager → target corpus → raw files → add files → change corpus name → create. Then the target corpus is successfully installed.
Then, reference corpus → raw files (do I need to clear the previous files and then add new files, or just leave them together?) → change the corpus name → create. (The new problem is that there is no overwriting notification anymore, but after it's done, the target corpus that was installed is gone, because when I go back to the target corpus page it's empty.)
Thank you very much!!
@@Solitale- First, create your two corpora. Next, in the right pane of the corpus manager, select whether you want to set the target or reference corpus by clicking on the relevant tab. Then, simply select the corpus in the left pane to set it as your choice. Think of the right pane as determining *how* a corpus will be used. Corpora never get deleted unless you choose to delete them. They always appear on the left. The right pane chooses the role of a corpus for a particular study.
@@AntLabJPN Now I've figured it out! I need to click Corpus Database rather than staying at Raw Files to choose my target and reference corpus. Thank you very much! All the best wishes!
Actually, if you pick which role you want to use before creating your corpus, that role will be set automatically when you build it. Of course, you can change the role at any time.
Hello. I am not able to load the temp file after opening corpus manager
Hi. My guess is that you are on a Mac and you didn't install the software properly. Did you drag the app to your Applications folder?
When uploading my raw files, there is a pop up saying that there is an error. Thus, there are no files being uploaded. Can I seek help on this? Thank you.
Hi. What's the error that is shown?
@@AntLabJPN The pop-up shows "The following user files could not be read. See the error report below". How can I solve this? Thank you.
@@hjalmarp.hernandezph.d.9133 Hi, my guess is that the error report is about UTF-8. See the FAQ 5 comment on the AntConc website: www.laurenceanthony.net/software/antconc/
@@AntLabJPN Oh. I will do that and get back to you. Thank you so much.
@@AntLabJPN I have tried resaving one of the files as UTF-8. I can now load raw files. Thank you again.
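Resaving files as UTF-8 one by one gets tedious with a large corpus, so here is a minimal Python sketch of the same fix. It assumes you already know the original encoding of your files; the `cp1252` default, the function names, and the file paths are all assumptions for illustration. If the source encoding is guessed wrongly, the text may decode into the wrong characters, so check a sample file afterwards.

```python
from pathlib import Path


def to_utf8_bytes(raw, source_encoding):
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def resave_as_utf8(src, dst, source_encoding="cp1252"):
    """Read the file at src in source_encoding and write a UTF-8 copy to dst.

    The default source encoding is only a guess that is common for files
    created on Western-European Windows systems; pass the real one if known.
    """
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(to_utf8_bytes(raw, source_encoding))


# Hypothetical usage: convert every .txt file in one folder into another.
# for path in Path("raw_corpus").glob("*.txt"):
#     resave_as_utf8(path, Path("utf8_corpus") / path.name)
```

Writing the converted copies to a separate folder keeps the originals intact in case the encoding guess turns out to be wrong.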
The word count on my document is different to the token count when I upload it to AntConc. Why is that?
Hi. If you are referring to the word count given by software like Microsoft Word, it's because the way that software counts words is slightly different from the default setting in AntConc (e.g., in AntConc, "don't" would be considered 2 words).
@@AntLabJPN Okay thank you
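The difference explained above is easy to reproduce. The Python sketch below compares a whitespace-based count with a count of letter runs, under which "don't" splits into "don" and "t". Both counting rules are simplified assumptions for illustration; they are not the exact rules used by Microsoft Word or by AntConc's token definition.

```python
import re


def whitespace_word_count(text):
    """Count words as chunks separated by whitespace (word-processor style)."""
    return len(text.split())


def letter_run_token_count(text):
    """Count tokens as unbroken runs of letters, so an apostrophe splits a word."""
    return len(re.findall(r"[^\W\d_]+", text))


sentence = "I don't know"
print(whitespace_word_count(sentence))   # 3
print(letter_run_token_count(sentence))  # 4  (I, don, t, know)
```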
Hi Sir! I want to know about loading token definitions...
What exactly do you mean?
Can I upload bilingual glossaries and then search for a word and see the results in all the glossaries uploaded as raw files?
Hi. If each glossary is saved as a separate file, yes, you can do this. You can use the KWIC tool to find the relevant entries and then click on the result to jump to the file view to see the whole glossary.
Hi, thanks for this new tutorial series! Very useful! 😀 Just one question: where can we find the Excel file (AmE06 etc.csv) to load a simple wordlist?
The csv file was for demonstration purposes. If you download one of the corpora from the online repository and create a word list and save that, you can see the format that you need to use. You can find many word lists across the Internet. As long as you have a "word" and "frequency" column, then you can load it. I hope that helps!
@@AntLabJPN It does help, thank you!! 👍
@@AntLabJPN If I only have the words without the frequency column, does that mean the word list function will not work?
@@SimonCheung-fk5iz Are you trying to load a word list without any frequency information? If so, AntConc won't be able to accept the list and it won't work.
@@AntLabJPN I see. Thanks for your clarification👍
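If you only have running text rather than a ready-made list, a word list with both a "word" and a "frequency" column can be generated with a short script. The Python sketch below is an illustration only: the exact layout AntConc expects is best confirmed by saving a word list from AntConc itself, as suggested above, and the function name, column order, and letters-only token pattern are assumptions.

```python
import csv
import io
import re
from collections import Counter


def word_list_csv(text):
    """Build CSV text with a 'word' and a 'frequency' column from raw text.

    Rows are sorted by descending frequency, then alphabetically.
    """
    counts = Counter(t.lower() for t in re.findall(r"[^\W\d_]+", text))
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "frequency"])
    writer.writerows(rows)
    return buffer.getvalue()


print(word_list_csv("the cat and the hat"))
# word,frequency
# the,2
# and,1
# cat,1
# hat,1
```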
I am not able to load a text file after opening the Corpus Manager.
Hi. Usually this is because you did not install the software correctly. Check the help page for how to do this. If you still have problems, reply back here.
I have found a YouTube transcript tool. Combined with AntConc, it could be useful for retrieving knowledge from YouTube, no?
Yes, absolutely!
I just wanted to share my difficulties in using very large corpora (but I don't mean to complain, because this is a great tool). I recently purchased the COCA (1 bn words) but it seems to be too big for smooth functioning, even though I use a relatively powerful PC (Intel Core i9). For example, it took an hour to load into the Corpus Manager, and then failed at 90% complete and I needed to do it again. After that, it could perform KWIC searches relatively quickly, but things like Cluster or Collocation searches can take up to 30 minutes per search.
Prof. Anthony, do you have an estimate of the upper limits of corpus size that retains smooth functioning?
Hi Michael. The speed of processing a corpus will be largely unaffected by the power of the CPU in AntConc 4. It really comes down to the database design. So, loading a corpus will be slow because the words have to be imported into the database and indexed. You should find AntConc 4.2 is twice as fast as 4.0 and 4.1, but it's still not going to be instant. Having said that, loading a 1 bn word corpus in 1 hour seems very fast to me. I'm surprised at the slow performance of cluster and collocation searches. If you search for a word that is reasonably rare, it should complete in a few seconds. Is this not the case? To be honest, I don't develop using 1 bn word corpora, but I should be able to optimize the software for good performance even with these bigger corpora. Let me look into it.
@@AntLabJPN Thank you. I should have been more accurate; the long Clusters time was actually on 4.0, and was probably for 2gram searches like /Thomist*/ + R1, and I believe all Collocation searches took dozens of minutes. When I downloaded 4.2 none of my corpora transferred over, for some reason (maybe because COCA was on there?). When I tried re-uploading COCA the other day it failed, and I haven't tried again, but I'll look forward to the faster speed. And knowing that it's normal to take over an hour to load something that big is reassuring.
@@michaelhenshaw-vetmedengli2064 Hi. Are you working on a PC or a Mac? On a PC, the new app is installed in the same location as the old version, so the corpora there should still be viewable in the repository. In fact, you can even see where it is installed using the shortcut properties in the Windows Start menu. Also, 4.2 includes some speed improvements. I suggest you start with just "Thomist" and check the speed. It should be functioning smoothly. Adding a wildcard means that the database has to do a regex search through the index, which is much slower than an exact match. Let me know how you go.
@@AntLabJPN Ah, now I see what happened. I had installed 4.0.2 in Documents>Corpus Linguistics>AntConc with my other CL stuff instead of in the default Start Menu>Programs. That had worked fine for the earlier portable versions. So, I tried copying that folder to Start Menu>Programs and then re-installed 4.2, but it still didn't carry over my corpora. But this is a minor inconvenience, really, and the problem should be solved with future updates. Thank you for your help, and overall dedication to this project.
@@michaelhenshaw-vetmedengli2064 If you installed 4.0.2 as a portable version and the corpora are still there, it is very easy to copy over your corpora into the new version. Just use the Corpus Manager -> Add Database File(s) or Add Database Dir option, select the folder where the corpora are stored, and copy them over.
It would be fantastic to be able to create mind maps from university course scripts.
I'm not quite sure how that would work. Are you suggesting that the program would link the *ideas* from the scripts? If so, that would be quite advanced. Perhaps AI could help.
How is this open source? YOU ARE A LEGEND😍🥰🤗
Thank you for the kind message.