Creating a text classification model in spacy 3x (Topic Modeling in Python for DH 04.02)
- Published: 10 Jan 2025
- If you enjoy this video, please subscribe.
✅Be my Patron: / wjbmattingly
✅PayPal: www.paypal.com...
Medium article:
/ building-a-text-classi...
Spacy Populate Config File Site:
spacy.io/usage...
If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
You can follow me at:
/ wjb_mattingly
For the repo, please see: github.com/wjbmattingly/youtube_text_classification
Thanks bro ❤ I needed a model to detect spam in my social media 👍 you're the OG you explained it very well!
No problem! So happy to hear that this was helpful!
What a great tutorial, love you man!!!
Great tutorial, I struggled all day to find a working example of the process.
So happy I could help!
Hey, great stuff! Do you have anything on multi-class text classification?
Thank you a lot! Really valuable video
No problem!
It would be nice to know if this training is possible using a transformer model in spaCy and if it would improve results.
Thanks for the question! I am going to be covering this in my new ML and spaCy series. The short answer is yes and yes it should, but it will depend on the problem and in some cases it may not even be necessary to use a transformer to have comparable results.
@@python-programming Also, from this point of view: it looks like transformers are everywhere now, but the easiest and most straightforward way to use spaCy is with the normal language models. Teaching the basic use of a transformer model in spaCy (just installing and loading it) seems to have been a bit forgotten.
Thanks so much! I will be sure to include that as well!
Great content here. Thank you.
I have a spaCy related question.
Does adding more components to the pipeline improve accuracy or not?
I mean, in this example, does adding/removing "tok2vec" affect accuracy?
I struggle finding info on how components depend on each other during training.
Hi, I just joined your channel with membership. I followed this video with great interest. I would like to know if you have some example code to replace the use of the Spacy Train program directly with python code in order to train text classification with accuracy in Spanish. Congratulations on all your knowledge and Thanks for sharing it.
Great content! I suspect your subscriber count is going to explode very soon.
Thanks that means a lot to me!
what was the reason that you used en_core_web_sm and not spacy.blank("en") ?
Any chance you could make a tutorial like this for text classification with 6 possible classes? Or i could message you for guidance? Followed your article/code but I’m just hung up on how to correctly initialize the model
I will add that to my to-dos. In the meantime, if you want to add more labels to your model, simply include more training data that represents those labels, i.e. doc.cats["label 1"], then one for label 2, label 3, etc. Does that help? spaCy will be able to recognize those new labels.
@@python-programming Yes thank you, I followed your code you left in the comment of your article but for some reason it appears that the model is not being properly initialized as I'm getting 0s for the training iterations
It's hard to find multilabel documentation for spaCy 3.0.
This is the problem I am having too. I can't figure out how to convert my training data to a spaCy format. It's a tuple, with the first index as a string of text and the second index as a nested dictionary of category labels: {'cats': {'label1': 0, 'label2': 1, 'label3': 1}}. But how do I take that dataset, which is a list of tuples, and run it through DocBin? Modifying the make_docs function for more than 2 categories (more than one if/else pair) doesn't seem to work.
@@sarasharick5209 did you figure it out? I am also getting the same issue. There really isn't any good documentation/videos on how to do Multiclass TextCat in spacy v3
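For the data shape described above (a list of (text, {'cats': {...}}) tuples), one possible way to build the docs is sketched below. The names split_text_and_cats and make_docs and the sample data are illustrative assumptions, not code from the video; nlp is assumed to be any loaded spaCy pipeline, for example spacy.blank("en").

```python
from typing import Dict, List, Tuple

# Hypothetical sample in the shape described in the comment above.
DATA = [
    ("first text", {"cats": {"label1": 0, "label2": 1, "label3": 1}}),
    ("second text", {"cats": {"label1": 1, "label2": 0, "label3": 0}}),
]


def split_text_and_cats(data) -> List[Tuple[str, Dict[str, float]]]:
    """Unwrap the nested {"cats": {...}} dict and store the scores as floats."""
    return [
        (text, {label: float(score) for label, score in annotations["cats"].items()})
        for text, annotations in data
    ]


def make_docs(nlp, data):
    """Run the texts through nlp and copy every label onto doc.cats."""
    docs = []
    # as_tuples=True makes nlp.pipe yield (doc, context) pairs;
    # here the context is the cats dict for that text.
    for doc, cats in nlp.pipe(split_text_and_cats(data), as_tuples=True):
        doc.cats = cats  # no if/else chain: all labels are written at once
        docs.append(doc)
    return docs
```

The returned docs can then go into a DocBin as in the video. Because several labels can be 1 at the same time in this data, the config would presumably need the textcat_multilabel component rather than textcat.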
I have a question: if my task is to classify into more than 2 classes (mutually exclusive), how do I write the "if"? Do I need to write the values of all my classes in each of the cases?
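For mutually exclusive classes, the if/else chain can be replaced by a dict comprehension that writes every label on every doc, with 1.0 for the gold label and 0.0 for all the others. A minimal sketch, where LABELS and exclusive_cats are hypothetical names chosen for illustration:

```python
from typing import Dict, Sequence

LABELS = ["sports", "politics", "science"]  # hypothetical label set


def exclusive_cats(gold: str, labels: Sequence[str]) -> Dict[str, float]:
    """Return a cats dict where only the gold label is 1.0."""
    if gold not in labels:
        raise ValueError(f"unknown label: {gold!r}")
    return {label: float(label == gold) for label in labels}
```

Inside make_docs this would be used as doc.cats = exclusive_cats(gold_label, LABELS), so adding a seventh class only means adding one entry to LABELS.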
You should pass nlp as a function argument in make_docs instead of pulling it from a global variable outside of the function.
Hey,
thanks a lot for these awesome videos.
I am getting this error and am unable to solve it. Can you please help me?
Video time: 00:08:35
3 train_docs = make_docs(train_data[:num_texts])
---> 4 doc_bin = DocBin(docs=train_docs)
5 doc_bin.to_disk("./data/train.spacy")
6
TypeError: __init__() got an unexpected keyword argument 'docs'
I am not sure DocBin takes docs as an argument. Your help will be appreciated. Thanks
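As far as I know, DocBin accepts the docs keyword in spaCy 3 but not in spaCy 2.x, so this TypeError may mean an older install is being used (an assumption; printing spacy.__version__ would confirm it). A sketch that sidesteps the keyword by calling DocBin.add for each doc; add_all and save_docs are hypothetical helper names:

```python
def add_all(doc_bin, docs):
    """Add each Doc with DocBin.add instead of passing docs= to the constructor."""
    for doc in docs:
        doc_bin.add(doc)
    return doc_bin


def save_docs(docs, path):
    """Serialize the docs to a .spacy file."""
    # Imported here so add_all can be used and tested without spaCy installed.
    from spacy.tokens import DocBin

    add_all(DocBin(), docs).to_disk(path)
```

With this, the failing cell would become save_docs(train_docs, "./data/train.spacy").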
Thank you...
I have more than 2 categories. What should I do?
Hey, I think you mentioned it, but are you doing new videos on spaCy custom NER? Thanks!
I am! Wanted to finish this series first. I will have it done next week. Spacy 3x should be the week after that.
@@python-programming no stress, looking forward to it mate!
Hi, thank you very much for this very useful video. I have followed your steps with another dataset and I get this error when I try to start the training with the command: python -m spacy train ....
Error I get: ValueError: [E143] Labels for component 'textcat_multilabel' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's `initialize` method.
I have also added label using "textcat.add_label("POSITIVE")" and
"textcat.add_label("NEGATIVE")"...but without success. Could you please suggest how I can fix this? TIA
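One possible cause (an assumption, since the data is not shown): spacy train builds the pipeline from config.cfg and reads the labels from the training corpus, so add_label calls made in a separate script have no effect, and docs saved without doc.cats leave the component with nothing to initialize from. A sketch for checking which labels a saved .spacy file actually contains; count_labels and load_docs are hypothetical helpers:

```python
from collections import Counter


def count_labels(docs) -> Counter:
    """Count how many docs carry each label key in doc.cats."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.cats.keys())
    return counts


def load_docs(path, lang="en"):
    """Read the docs back from a .spacy file."""
    # Imported here so count_labels can be used without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)
    return list(DocBin().from_disk(path).get_docs(nlp.vocab))
```

Running count_labels(load_docs("./data/train.spacy")) and getting an empty Counter would suggest the categories were never stored on the docs. It may also be worth checking that the component named in the config (textcat or textcat_multilabel) matches the kind of labels in the data.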
thank you !
Hey guys, the only issue I'm having with this is that ">python -m spacy train config.cfg --output ./output" in the terminal just gives me the error ValueError: [E913] Corpus path can't be None. Maybe you forgot to define it in your .cfg file or override it on the CLI?
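E913 usually means that paths.train or paths.dev is still null in config.cfg. The paths can be filled in inside the file, passed on the command line as --paths.train and --paths.dev, or supplied from Python through spacy.cli.train.train (available in recent spaCy 3 releases; 3.2 onward as far as I know). A minimal sketch, where path_overrides and run_training are hypothetical helper names and the file paths are placeholders:

```python
from typing import Dict


def path_overrides(train_path: str, dev_path: str) -> Dict[str, str]:
    """Build the config overrides that fill in the two corpus paths."""
    return {"paths.train": train_path, "paths.dev": dev_path}


def run_training(config_path, output_path, train_path, dev_path):
    """Start training with the corpus paths supplied from Python."""
    # Imported here so path_overrides can be used without spaCy installed.
    from spacy.cli.train import train

    train(config_path, output_path, overrides=path_overrides(train_path, dev_path))
```

For example, run_training("config.cfg", "./output", "./data/train.spacy", "./data/valid.spacy"), where the two .spacy paths are placeholders for wherever the DocBin files were saved.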
Thank you Prof
No problem! Happy to help
Hi, Sir, can you provide me the code for evaluating its performance?
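A rough sketch of one way to evaluate, assuming the trained pipeline was saved by spacy train in the model-best folder under the output directory and that each test item is a (text, gold_label) pair. predicted_label, accuracy and evaluate are hypothetical names, not code from the video:

```python
def predicted_label(cats):
    """Pick the label with the highest score from a doc.cats dict."""
    return max(cats, key=cats.get)


def accuracy(pairs):
    """Share of (predicted, gold) pairs that match; 0.0 for empty input."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(pred == gold for pred, gold in pairs) / len(pairs)


def evaluate(model_path, test_data):
    """Load a trained pipeline and score it on (text, gold_label) pairs."""
    # Imported here so the two helpers above work without spaCy installed.
    import spacy

    nlp = spacy.load(model_path)
    texts = [text for text, _ in test_data]
    golds = [gold for _, gold in test_data]
    preds = [predicted_label(doc.cats) for doc in nlp.pipe(texts)]
    return accuracy(zip(preds, golds))
```

This only measures plain accuracy for single-label predictions. spaCy also ships a spacy evaluate command that reports precision, recall and F-score on a .spacy file.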
Hello. I haven't seen good spaCy v3 tutorials, but they are badly needed. You see, I'm Russian and the official spaCy documentation is pretty hard for me to understand. I tried using the begin_training() method, which is now called textcat.initialize() and has an example argument. Can you please tell me what an example is and how I can use it if it is required to have predicted data in it? With this method I am trying to train my model, so I don't have predictions yet. Thanks for the video
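In spaCy 3, an Example pairs a predicted doc (just the tokenized text, before any model output) with a reference doc that holds the gold annotations, so no predictions are needed in advance. A minimal sketch of training textcat directly in code, assuming mutually exclusive labels and data given as (text, {"cats": {...}}) tuples; check_exclusive and train_in_code are hypothetical names, and the batch size and iteration count are arbitrary:

```python
import random


def check_exclusive(data):
    """textcat expects exactly one positive label per text; fail early otherwise."""
    for text, annotations in data:
        if sum(annotations["cats"].values()) != 1:
            raise ValueError(f"cats must sum to 1 for: {text!r}")


def train_in_code(data, lang="en", n_iter=10, seed=0):
    """Train a blank pipeline with a textcat component and return it."""
    # Imported here so check_exclusive can be used without spaCy installed.
    import spacy
    from spacy.training import Example
    from spacy.util import minibatch

    check_exclusive(data)
    nlp = spacy.blank(lang)
    nlp.add_pipe("textcat")
    # Each Example holds the raw doc plus the gold cats from the dict.
    examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in data]
    # initialize() reads the labels from the examples and returns an optimizer.
    optimizer = nlp.initialize(lambda: examples)
    rng = random.Random(seed)
    for _ in range(n_iter):
        rng.shuffle(examples)
        losses = {}
        for batch in minibatch(examples, size=8):
            nlp.update(batch, sgd=optimizer, losses=losses)
    return nlp
```

Passing lang="es" would start from a blank Spanish pipeline instead. The config-driven spacy train workflow shown in the video adds evaluation, checkpoints and reproducible settings on top of a loop like this.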
Hi, love your content! How do you import ml_datasets with conda? I tried conda install ml_datasets without any results
I tried from a CSV file (with pandas) but can't get any results
Can you also show how to do this purely in code, without a cfg file?
Great question. As far as I know, you can't. Spacy 3 training is completely different and based entirely around the cfg. If you find a source for how to do it in a script, please let me know and I will do a video on it.
"""
ValueError: [E143] Labels for component 'textcat_multilabel' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's `initialize` method
"""
I am getting this error when training the MODEL. Please resolve it if you can :-;