AntConc 4 (ver. 4.2) - Corpus Manager Basics

Laurence Anthony

21K views


Published: 28 Jan 2025

Comments • 85

@michaelhenshaw-vetmedengli2064 2 years ago ⁺¹²
The corpus manager is one of the best innovations brought by ver. 4, very well done. Having my corpora always already loaded truly speeds up the workflow, and the fact that the reference corpus can be immediately swapped with the target is a nice little touch.
With each update, Prof. Anthony is ushering in the 4th generation of corpus tools. Thank you, sir!
@AntLabJPN 2 years ago ⁺¹
Wow. Thanks Michael. That's such a lovely comment. It is very much appreciated.
@e-liveproject 2 years ago
It is indeed extremely useful and a massive time-saver!
@caiholbrook7458 1 year ago ⁺⁴
Thanks, Dr Anthony, for the tutorial. It's super helpful. AntConc is revolutionary indeed.
@AntLabJPN 1 year ago ⁺¹
Thank you for the really kind comment. It's much appreciated!
@e-liveproject 2 years ago ⁺²
Great explanation!
@AntLabJPN 2 years ago ⁺¹
Thank you! I'm hoping to create more videos as soon as I can.
@Pythonology 1 year ago
Great! In the Statistics section under Global Settings, there is a Normalized Frequency Multiplier. What does that do?
@AntLabJPN 1 year ago ⁺¹
That's a great question. When comparing relative frequencies in a corpus, we often have very small values, so people in the field like to scale up the value to something a little more meaningful, e.g. "10 words per thousand" or "10 words per million". So, the "per thousand" or "per million" is the multiplier.
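The scaling described in this reply can be illustrated with a minimal sketch. The function name and the example counts are hypothetical, and this is not AntConc's own code; it only shows the arithmetic of multiplying a relative frequency by "per thousand" or "per million".

```python
def normalized_frequency(raw_count: int, corpus_tokens: int, multiplier: int = 1_000_000) -> float:
    """Scale the relative frequency raw_count / corpus_tokens by a multiplier."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    # Multiply first so that simple cases stay exact in floating point.
    return raw_count * multiplier / corpus_tokens


# Hypothetical numbers: a word seen 250 times in a 500,000-token corpus.
per_thousand = normalized_frequency(250, 500_000, multiplier=1_000)
per_million = normalized_frequency(250, 500_000, multiplier=1_000_000)
print(per_thousand, per_million)  # prints: 0.5 500.0
```

The raw relative frequency here is 0.0005, which is hard to read; "0.5 per thousand" or "500 per million" says the same thing on a friendlier scale.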
@Pythonology 1 year ago
@@AntLabJPN Thank you for the prompt reply. I'm submitting an article and I've used AntConc for lexical bundle analysis. Great tool. Thank you, Anthony.
@Pythonology 1 year ago
@@AntLabJPN Thank you, Lawrence. I always thought the first name was Anthony.
@AntLabJPN 1 year ago ⁺¹
@@Pythonology Many people get my name wrong. My first name is actually Laurence (with a "U").
@isabelagesser8627 9 months ago ⁺¹
Hello, professor! Thank you so much for your video, it's been helping me a lot.
I have a question: how am I supposed to add stoplists to hide prepositions and unnecessary articles from my word list? The farthest I got was to go to Global Settings >> Hide words from the file >> Add file list. However, the words that were supposed to be hidden still appear for me.
@AntLabJPN 9 months ago
Hi. Yes, you need to use the Tool Filters option in the global settings. My guess is that you forgot to activate the option by checking the checkbox.
@isabelagesser8627 9 months ago
@@AntLabJPN Hi there! Thank you for your quick reply. Both "clusters/n-grams..." and "hide words in the file" are selected, but when I apply and press "start" to generate a word list, the words remain. I checked twice and the stoplist is still selected in the box.
@AntLabJPN 9 months ago ⁺¹
@@isabelagesser8627 Can you try a simple experiment? Generate a word list using the in-built demo corpus. Then, try using the filter tool to eliminate the word "the". Once you get that working, I think everything else will start to work, too.
@isabelagesser8627 9 months ago ⁺¹
@@AntLabJPN It worked, thank you so much!
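For readers who want to check what a stoplist should do to a word list, here is a minimal sketch done outside AntConc. The function name and sample sentence are hypothetical, and case-insensitive matching is an assumption for illustration, not a statement about how AntConc's Tool Filters behave.

```python
from collections import Counter


def apply_stoplist(word_freqs, stoplist):
    """Return the word/frequency pairs whose word is not in the stoplist."""
    stop = {w.lower() for w in stoplist}  # assumption: match case-insensitively
    return {w: f for w, f in word_freqs.items() if w.lower() not in stop}


# Same kind of experiment as suggested above: hide "the" and check it is gone.
counts = Counter("the cat sat on the mat".split())
filtered = apply_stoplist(counts, ["the", "on"])
print(filtered)  # prints: {'cat': 1, 'sat': 1, 'mat': 1}
```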
@xenitheo 9 months ago ⁺¹
Hello Professor Anthony, thank you for providing a tutorial for your excellent AntConc software. I am not an academic; however, I am studying the apocrypha and my bible software does not allow me to get statistics from that text. I wanted to know if it is possible in AntConc to somehow additionally sort the Word List according to the files of the corpus. In my studies I am especially interested in words that appear only once, so I have clicked Start to get the Word List and ordered it by frequency, but I also want the results to be grouped by file, so I don't have to keep manually searching for the hapaxes in KWIC to find out which book they belong to. I hope I explained that correctly. Thank you and have a great day.
@AntLabJPN 9 months ago
Hi. I suggest you use a two-step procedure. First, generate the hapaxes using the word list tool. Second, copy the list of hapaxes and load it as words to search for in the KWIC tool via the advanced search. If you do that, you can search for all the words at once and you'll immediately see which file they appear in. You can also sort the KWIC results by file ID. I hope that helps!
@xenitheo 9 months ago ⁺¹
@@AntLabJPN Brilliant! Thank you so much
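The two-step procedure above (find the hapaxes, then see which file each one comes from) can also be sketched as a short standalone script. The file names and texts are hypothetical, and the tokenizer (lowercased runs of letters) is an assumption that may not match AntConc's token definition.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[^\W\d_]+")  # assumption: a token is a run of letters


def hapaxes_by_file(texts: dict[str, str]) -> dict[str, list[str]]:
    """Group words that occur exactly once in the whole corpus by their file."""
    totals = Counter()
    where = defaultdict(set)
    for name, text in texts.items():
        for tok in TOKEN.findall(text.lower()):
            totals[tok] += 1
            where[tok].add(name)
    grouped = defaultdict(list)
    for tok, n in totals.items():
        if n == 1:
            (name,) = where[tok]  # a hapax occurs in exactly one file
            grouped[name].append(tok)
    return {name: sorted(toks) for name, toks in sorted(grouped.items())}


# Hypothetical two-file corpus.
corpus = {
    "tobit.txt": "the angel came",
    "judith.txt": "the widow came home",
}
print(hapaxes_by_file(corpus))
# prints: {'judith.txt': ['home', 'widow'], 'tobit.txt': ['angel']}
```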
@sryy43 1 month ago
Hello, Professor! Thank you so much for the tutorial. It helped me a lot, I must say. But I have a question. You loaded an Excel word list file that includes the frequencies. I couldn't see whether "the" was included there. However, when you clicked, all the words appeared. So my question is: will only the words written in our list appear, or will all the words and their frequencies be available to us? (I think I asked in a complicated way, sorry for that :( )
@muhammeda3489 2 months ago
Hello, professor.
The AntConc version I have is 4.3.1.
How can I save TXT documents?
When I close the page and open it again, the documents I uploaded are deleted.
I cannot see the documents I uploaded later on the Corpus Manager page.
Thank you for your help.
@AntLabJPN 2 months ago
Hi. I think you are using the "Open Files as Quick Corpus" option. If you use that, the corpus is saved as temp.db. I recommend you use the "Raw Files" option in the corpus manager and build your corpus there. You can then give it a proper name and it will stay in the repository for use later.
@anonymia166 2 years ago ⁺¹
I wanted to ask if you can use a lemma list in this version, because I cannot find the function to upload a lemma list.
@AntLabJPN 2 years ago
Yes, you can! I document how to do it in the help guide, but just check the "headword" option when you create a corpus via the Corpus Manager. You'll see it at the bottom of the screen here: ruclips.net/video/yDSa1rp8Bqs/видео.html
@anonymia166 2 years ago
@@AntLabJPN Thank you! I have another question... What would you say is the most efficient way to analyze which adjectives are most used to describe animals, and which animals are mentioned the most? Sorry for the questions, but I am taking a university course in Germany and I do not know much about corpus analysis... yet.
@henbane2247 7 months ago
Hi again Laurence, thank you for all your advice. I'm trying to delete a temp.db file that I uploaded. However, the message 'temp.db could not be deleted because it is being used by another process' keeps coming up. The document is closed on my computer. How can I find out which process is keeping it from being deleted?
@AntLabJPN 7 months ago
Hi. If you update to AntConc 4.3.0, you should find that this bug is fixed. Otherwise, you should find that if you restart AntConc, the deleted file will no longer appear.
@bozok6360 5 months ago
Prof. Anthony, hi. Can I use this tool to count unique individual words in books? I think it has never been done in my language, so I wanted to count how many individual words are used in our books from school level to university level. Can I use this tool to do that? Thank you in advance for your answer.
@AntLabJPN 5 months ago
Yes. AntConc can be used for exactly this task. Load in pdfs, docx, or text files of your books, and then use the word list tool to count all the individual words.
@bozok6360 5 months ago
@@AntLabJPN 🙏 Can I add pdfs directly? I thought I should convert them to text files first.
@AntLabJPN 5 months ago
@@bozok6360 Yes, you can load in pdfs directly. There is no need to convert them.
@bozok6360 5 months ago
@@AntLabJPN Thank you for your explanation. I also wanted to know how I can work with agglutinative languages. Can I add stems and make AntConc look for them? Or does it have other functions that are better than my idea?
@AntLabJPN 5 months ago ⁺¹
@@bozok6360 Yes. I suggest you look at the different wildcard options. They are listed in the global settings.
@holdthewinds 8 months ago ⁺¹
Thank you!!!
@AntLabJPN 8 months ago
You're welcome!
@Solitale- 7 months ago
Hi Professor! Thank you for your tutorial!
But I have a question concerning the installation of a reference corpus from existing files. I couldn't figure out how to install a reference corpus in AntConc version 4.2. When I tried to do it after successfully installing the target corpus, I always got the "overwrite the existing corpus" note. Can you help me with that? I also tried older versions like 4.0, but when I clicked "create" nothing happened. I don't know if the problem is with my reference corpus, because I have 500 txt files in total. Thank you very much!
@AntLabJPN 7 months ago
Hi. Target corpora and reference corpora are just corpora in AntConc. You can choose any corpus to be a target corpus and any corpus to be a reference corpus. My guess is that you are trying to create a new corpus with the same name as an existing corpus. Just choose a unique name for your corpus in the corpus manager and you should be fine.
@Solitale- 7 months ago
@@AntLabJPN Thank you for your response! I tried using different names for my target and reference corpora. Now there is no "overwriting" note, but after I installed my reference corpus successfully, the target corpus just disappeared when I returned to the main window. Can you help me with that?
My operation: corpus manager → target corpus → raw files → add files → change corpus name → create. Then the target corpus is successfully installed.
Then: reference corpus → raw files (do I need to clear the previous files and then add new files, or just leave them together?) → change the corpus name → create. The new problem is that there is no overwriting notification anymore, but after it's done, the target corpus that was installed is gone, because when I go back to the target corpus page it's empty.
Thank you very much!!
@AntLabJPN 7 months ago
@@Solitale- First, create your two corpora. Next, in the right pane of the corpus manager, select whether you want to set the target or reference corpus by clicking on the relevant tab. Then, simply select the corpus in the left pane to set it as your choice. Think of the right pane as determining *how* a corpus will be used. Corpora never get deleted unless you choose to delete them. They always appear on the left. The right pane chooses the role of a corpus for a particular study.
@Solitale- 7 months ago
@@AntLabJPN Now I've figured it out! I need to click Corpus Database rather than staying at Raw Files to choose my target and reference corpus. Thank you very much! All the best wishes!
@AntLabJPN 7 months ago
Actually, if you pick which role you want to use before creating your corpus, it will automatically initially set that role when you build it. Of course, you can change the role at any time.
@henbane2247 7 months ago
Why do my KWIC Alice results look different to yours? My first result is:
“ which certainly was not here before , ” said Alice, , ) and round the neck of the bottle
To the left of the blue 'Alice', 'said' is in red, " is in green and , is in red, so no text is highlighted to the left except 'said'.
As far as I can see, all my settings are the same as yours.
@AntLabJPN 7 months ago
Hi. It sounds like you've clicked the sort option to sort to the left. The default setting in AntConc is to sort to the right.
@SumairZahid 11 months ago
Hi Sir! I want to know about loading token definitions...
@AntLabJPN 9 months ago
What exactly do you mean?
@bozok6360 2 months ago
Professor Anthony, hello. I have created a corpus and done the lemmas. I learned a lot of things because of you. Dear professor, now I want to understand the usage of parts of speech in my corpus. How can I do it?
@Sophiesrishti 1 year ago
Hello. I am not able to load the temp file after opening the corpus manager.
@AntLabJPN 1 year ago
Hi. My guess is that you are on a Mac and you didn't install the software properly. Did you drag the app to your Applications folder?
@henbane2247 7 months ago
The word count on my document is different to the token count when I upload it to AntConc. Why is that?
@AntLabJPN 7 months ago
Hi. If you are referring to the word count given by software like Microsoft Word, it's because the way that software counts words is slightly different to the default setting in AntConc (e.g. in AntConc, "don't" would be considered as 2 words).
@henbane2247 7 months ago
@@AntLabJPN Okay thank you
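The "don't" example in this thread can be reproduced with a minimal sketch that compares two counting rules. Both functions are hypothetical stand-ins: splitting on whitespace approximates a word processor's count, and "runs of letters" is an assumed token definition, not necessarily AntConc's exact default.

```python
import re


def whitespace_word_count(text: str) -> int:
    """Count whitespace-separated chunks, roughly like a word processor."""
    return len(text.split())


def letter_token_count(text: str) -> int:
    """Count runs of letters, so an apostrophe splits a word into two tokens."""
    return len(re.findall(r"[^\W\d_]+", text))


sample = "I don't know why it's different."
print(whitespace_word_count(sample), letter_token_count(sample))  # prints: 6 8
```

Under the second rule, "don't" becomes "don" + "t" and "it's" becomes "it" + "s", which accounts for the two extra tokens.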
@mohamedelkellawy9549 11 months ago
Can I upload bilingual glossaries, and then search for a word and see the results in all glossaries uploaded as raw files?
@AntLabJPN 11 months ago
Hi. If each glossary is saved as a separate file, yes, you can do this. You can use the KWIC tool to find the relevant entries and then click on the result to jump to the file view to see the whole glossary.
@Sophiesrishti 1 year ago
I am not able to load a text file after opening the corpus manager.
@AntLabJPN 1 year ago
Hi. Usually this is because you did not install the software correctly. Check the help page for how to do this. If you still have problems, reply back here.
@hjalmarp.hernandezph.d.9133 1 year ago
When uploading my raw files, there is a pop-up saying that there is an error, so no files are being uploaded. Can I get help with this? Thank you.
@AntLabJPN 1 year ago
Hi. What's the error that is shown?
@hjalmarp.hernandezph.d.9133 1 year ago
@@AntLabJPN The pop-up shows "The following user files could not be read. See the error report below". How can I solve this? Thank you.
@AntLabJPN 1 year ago
@@hjalmarp.hernandezph.d.9133 Hi, my guess is that the error report is about UTF-8. See the FAQ 5 comment on the AntConc website: www.laurenceanthony.net/software/antconc/
@hjalmarp.hernandezph.d.9133 1 year ago
@@AntLabJPN Oh. I will do that and get back to you. Thank you so much.
@hjalmarp.hernandezph.d.9133 1 year ago
@@AntLabJPN I have tried resaving one of the files as UTF-8. I can now load raw files. Thank you again.
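Resaving files as UTF-8, as done in this thread, can be scripted when there are many files. The sketch below is a minimal example under one loud assumption: the original encoding is taken to be Windows-1252 (`cp1252`), which the thread does not state, so check your own files before relying on it. The function names are hypothetical.

```python
from pathlib import Path


def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def resave_as_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src in the assumed source encoding and write a UTF-8 copy to dst."""
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")
```

A typical use would be to loop over a folder, call `is_valid_utf8` on each `.txt` file, and call `resave_as_utf8` only on the ones that fail, writing the copies to a separate folder so the originals are untouched.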
@ChimiChuri-k2o 3 months ago
I have found a YouTube transcript tool. Combined with AntConc, it could be useful for retrieving knowledge from YouTube, no?
@AntLabJPN 3 months ago
Yes, absolutely!
@ChimiChuri-k2o 3 months ago
It would be fantastic to be able to create mind maps from university course scripts.
@AntLabJPN 3 months ago
I'm not quite sure how that would work. Are you suggesting that the program would link the *ideas* from the scripts? If so, that would be quite advanced. Perhaps AI could help.
@michaelhenshaw-vetmedengli2064 2 years ago
I just wanted to share my difficulties in using very large corpora (but I don't mean to complain, because this is a great tool). I recently purchased the COCA (1 bn words) but it seems to be too big for smooth functioning, even though I use a relatively powerful PC (Intel Core i9). For example, it took an hour to load into the Corpus Manager, and then failed at 90% complete and I needed to do it again. After that, it could perform KWIC searches relatively quickly, but things like Cluster or Collocation searches can take up to 30 minutes per search.
Prof. Anthony, do you have an estimate of the upper limits of corpus size that retains smooth functioning?
@AntLabJPN 2 years ago ⁺²
Hi Michael. The speed of processing a corpus will be largely unaffected by the power of the CPU in AntConc 4. It really comes down to the database design. So, loading a corpus will be slow because the words have to be imported into the database and indexed. You should find AntConc 4.2 is twice as fast as 4.0 and 4.1, but it's still not going to be instant. Having said that, loading a 1 bn word corpus in 1 hour seems very fast to me. I'm surprised at the slow performance of cluster and collocation searches. If you search for a word that is reasonably rare, it should complete in a few seconds. Is this not the case? To be honest, I don't develop using 1 bn word corpora, but I should be able to optimize the software for good performance even with these bigger corpora. Let me look into it.
@michaelhenshaw-vetmedengli2064 2 years ago
@@AntLabJPN Thank you. I should have been more accurate; the long Clusters time was actually on 4.0, and was probably for 2-gram searches like /Thomist*/ + R1, and I believe all Collocation searches took dozens of minutes. When I downloaded 4.2, none of my corpora transferred over, for some reason (maybe because COCA was on there?). When I tried re-uploading COCA the other day it failed, and I haven't tried again, but I'll look forward to the faster speed. And knowing that it's normal to take over an hour to load something that big is reassuring.
@AntLabJPN 2 years ago
@@michaelhenshaw-vetmedengli2064 Hi. Are you working on a PC or a Mac? On a PC, the new app is installed in the same location as the old version, so the corpora there should still be viewable in the repository. In fact, you can even see where it is installed using the shortcut properties in the Windows Start menu. Also, 4.2 includes some speed improvements. I suggest you start with just "Thomist" and check the speed. It should be functioning smoothly. Adding a wildcard means that the database has to do a regex search through the index, which is much slower than an exact match. Let me know how you go.
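The difference between an exact match and a wildcard search mentioned in this reply can be illustrated with a toy example. This is only a sketch under stated assumptions: a plain Python dictionary stands in for the word index, the words and counts are made up, and nothing here reflects AntConc's actual database schema. The point is that an exact lookup touches one entry, while a wildcard has to test every entry against a pattern.

```python
import re

# Hypothetical word index: word -> frequency.
index = {"thomist": 12, "thomistic": 7, "thomism": 30, "atomist": 2}


def exact_lookup(index: dict[str, int], word: str) -> int:
    """One hash lookup, independent of how many words the index holds."""
    return index.get(word, 0)


def wildcard_lookup(index: dict[str, int], pattern: str) -> dict[str, int]:
    """Treat '*' as 'zero or more characters' and scan every indexed word."""
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    return {w: f for w, f in index.items() if regex.fullmatch(w)}


print(exact_lookup(index, "thomist"))      # prints: 12
print(wildcard_lookup(index, "thomist*"))  # prints: {'thomist': 12, 'thomistic': 7}
```

With four words the difference is invisible; with an index built from a billion-word corpus, the full scan in `wildcard_lookup` is where the extra time would go in this toy model.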
@michaelhenshaw-vetmedengli2064 2 years ago
@@AntLabJPN Ah, now I see what happened. I had installed 4.0.2 in Documents>Corpus Linguistics>AntConc with my other CL stuff instead of in the default Start Menu>Programs. That had worked fine for the earlier portable versions. So, I tried copying that folder to Start Menu>Programs and then re-installed 4.2, but it still didn't carry over my corpora. But this is a minor inconvenience, really, and the problem should be solved with future updates. Thank you for your help, and overall dedication to this project.
@AntLabJPN 2 years ago
@@michaelhenshaw-vetmedengli2064 If you installed 4.0.2 as a portable version and the corpora are still there, it is very easy to copy over your corpora into the new version. Just use the Corpus Manager -> Add Database File(s) or Add Database Dir option, select the folder where the corpora are stored and copy them over.
@ChimiChuri-k2o 3 months ago
How is this open source? YOU ARE A LEGEND😍🥰🤗
@AntLabJPN 3 months ago
Thank you for the kind message.
@sabrinafusari8133 2 years ago
Hi, thanks for this new tutorial series! Very useful! 😀 Just one question: where can we find the Excel file (AmE06 etc.csv) to load a simple word list?
@AntLabJPN 2 years ago
The csv file was for demonstration purposes. If you download one of the corpora from the online repository, create a word list, and save it, you can see the format that you need to use. You can find many word lists across the Internet. As long as you have a "word" and "frequency" column, then you can load it. I hope that helps!
@sabrinafusari8133 2 years ago
@@AntLabJPN It does help, thank you!! 👍
@SimonCheung-fk5iz 10 months ago
@@AntLabJPN If I only have the words without the frequency column, does that mean the word list function will not work?
@AntLabJPN 10 months ago
@@SimonCheung-fk5iz Are you trying to load a word list without any frequency information? If so, AntConc won't be able to accept the list and it won't work.
@SimonCheung-fk5iz 10 months ago
@@AntLabJPN I see. Thanks for your clarification👍
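If you only have running text and need a list with both a word and a frequency column, a short script can produce one. The sketch below is hedged: the header names, column order, and CSV dialect are assumptions for illustration. As the reply above suggests, the safest check is to save a word list from AntConc itself and compare the format before loading your own file.

```python
import csv
import io
import re
from collections import Counter


def word_frequency_rows(text: str) -> list[tuple[str, int]]:
    """Count lowercased runs of letters, most frequent first (ties alphabetical)."""
    counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def write_word_list(rows, handle) -> None:
    """Write rows as CSV with an assumed 'word,frequency' header."""
    writer = csv.writer(handle)
    writer.writerow(["word", "frequency"])  # assumed header names
    writer.writerows(rows)


rows = word_frequency_rows("the cat and the hat")
buffer = io.StringIO()  # swap for open("wordlist.csv", "w", newline="") to save a file
write_word_list(rows, buffer)
print(rows)  # prints: [('the', 2), ('and', 1), ('cat', 1), ('hat', 1)]
```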
