Creating a text classification model in spacy 3x (Topic Modeling in Python for DH 04.02)

  • Published: Jan 10, 2025
  • If you enjoy this video, please subscribe.
    ✅Be my Patron: / wjbmattingly
    ✅PayPal: www.paypal.com...
    Medium article:
    / building-a-text-classi...
    Spacy Populate Config File Site:
    spacy.io/usage...
    If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
    If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
    You can follow me at:
    / wjb_mattingly

Comments • 45

  • @python-programming
    @python-programming  3 years ago +4

    For the repo, please see: github.com/wjbmattingly/youtube_text_classification

  • @the-real-random-person
    @the-real-random-person 1 year ago +1

    Thanks bro ❤ I needed a model to detect spam in my social media 👍 you're the OG you explained it very well!

  • @AkshaySharmaakkiikka0
    @AkshaySharmaakkiikka0 4 months ago

    What a great tutorial, love you man!!!

  • @seanosuilleabhainemerald
    @seanosuilleabhainemerald 1 year ago +1

    Great tutorial, I struggled all day to find a working example of the process.

  • @justinhuang8034
    @justinhuang8034 3 years ago +2

    hey great stuff do you have stuff on multi class text classification?

  • @andreyklepikov7084
    @andreyklepikov7084 1 year ago +1

    Thank you a lot! Really valuable video

  • @joseberlines-l4f
    @joseberlines-l4f 1 year ago +1

    It would be nice to know if this training is possible using a transformer model in spaCy and if it would improve results.

    • @python-programming
      @python-programming  1 year ago +1

      Thanks for the question! I am going to be covering this in my new ML and spaCy series. The short answer is yes and yes it should, but it will depend on the problem and in some cases it may not even be necessary to use a transformer to have comparable results.

    • @joseberlines-l4f
      @joseberlines-l4f 1 year ago +1

      @@python-programming Also, from this point of view: it looks like transformers are everywhere now, but the easiest and most straightforward way to use spaCy is with the normal language models. Teaching the basic use of a transformer model in spaCy (just installing and loading it) seems a bit forgotten.

    • @python-programming
      @python-programming  1 year ago

      Thanks so much! I will be sure to include that as well!

  • @nikosantisoc1047
    @nikosantisoc1047 3 years ago +3

    Great content here. Thank you.
    I have a spaCy-related question.
    Does adding more components to the pipeline improve accuracy or not?
    I mean, in this example, does adding or removing "tok2vec" affect accuracy?
    I struggle to find info on how components depend on each other during training.

  • @lfmtube
    @lfmtube 3 years ago +1

    Hi, I just joined your channel with membership. I followed this video with great interest. I would like to know if you have some example code to replace the use of the Spacy Train program directly with python code in order to train text classification with accuracy in Spanish. Congratulations on all your knowledge and Thanks for sharing it.

  • @RobotechII
    @RobotechII 3 years ago +2

    Great content! I suspect your subscriber count is going to explode very soon.

  • @adrianvideanu480
    @adrianvideanu480 2 years ago

    what was the reason that you used en_core_web_sm and not spacy.blank("en") ?

  • @kylemoran7867
    @kylemoran7867 3 years ago +3

    Any chance you could make a tutorial like this for text classification with 6 possible classes? Or i could message you for guidance? Followed your article/code but I’m just hung up on how to correctly initialize the model

    • @python-programming
      @python-programming  3 years ago +1

      I will add that to my to-dos. In the meantime, if you want to add more labels to your model, simply include more training data that represents those labels, i.e. doc.cats["label 1"], then one for label 2, label 3, etc. Does that help? spaCy will be able to recognize those new labels.

    • @kylemoran7867
      @kylemoran7867 3 years ago

      @@python-programming Yes thank you, I followed your code you left in the comment of your article but for some reason it appears that the model is not being properly initialized as I'm getting 0s for the training iterations

    • @Nnonymus
      @Nnonymus 3 years ago +1

      It's hard to find multilabel documentation for spaCy 3.0.

    • @sarasharick5209
      @sarasharick5209 2 years ago

      this is the problem I am having too. I can’t figure how to convert my training data to a spaCy format. It’s a tuple, with the first index as a string of text, and the second index as a nested dictionary of category labels. {‘cats’: {‘label1’: 0, ‘label2’: 1, ‘label3’: 1}}. But how to take that dataset that is a list of tuples and run it through DocBin? Modifying the Make docs function for more than 2 categories (more than one if/else pair) doesn’t seem to work.

    • @shaheerahsan2486
      @shaheerahsan2486 1 year ago

      @@sarasharick5209 did you figure it out? I am also getting the same issue. There really isn't any good documentation/videos on how to do Multiclass TextCat in spacy v3
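
For the data shape described in this thread (a list of `(text, {"cats": {...}})` tuples), one possible conversion to a `DocBin` is sketched below. The sample texts are hypothetical, and `spacy.blank("en")` is an assumption standing in for the video's pipeline, since only tokenization is needed at this step. This is a sketch under those assumptions, not the code from the video.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical sample data in the (text, {"cats": {...}}) shape from the thread.
data = [
    ("The plot was gripping.", {"cats": {"label1": 0, "label2": 1, "label3": 1}}),
    ("Shipping took two weeks.", {"cats": {"label1": 1, "label2": 0, "label3": 0}}),
]


def make_docs(nlp, data):
    """Tokenize each text and copy its category dictionary onto doc.cats."""
    texts = [text for text, _ in data]
    docs = []
    for doc, (_, annotations) in zip(nlp.pipe(texts), data):
        # Every doc carries a score for every label, so no if/else chain is needed.
        doc.cats = {label: float(score) for label, score in annotations["cats"].items()}
        docs.append(doc)
    return docs


nlp = spacy.blank("en")  # assumption: a blank English pipeline, tokenizer only
docs = make_docs(nlp, data)
doc_bin = DocBin(docs=docs)
print(len(doc_bin), docs[0].cats)
```

Calling `doc_bin.to_disk("./data/train.spacy")` would then write the file that the training command reads. Because a single text here can have more than one label set to 1, this data shape fits the `textcat_multilabel` component rather than `textcat`.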

  • @oscaralberto6835
    @oscaralberto6835 1 year ago

    I have a question: if I have a task with more than 2 mutually exclusive classes, how do I write the "if"? Do I need to write the values of all my classes in each of the cases?
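
One way to avoid a long if/else chain for mutually exclusive classes is to build the whole category dictionary from the gold label in one step. The sketch below is plain Python; the label names and the helper `one_hot_cats` are hypothetical, not taken from the video.

```python
# Hypothetical label set; replace with the classes in your own data.
LABELS = ["sports", "politics", "science"]


def one_hot_cats(gold_label, labels=LABELS):
    """Return a cats dictionary with 1.0 for the gold label and 0.0 for the rest."""
    if gold_label not in labels:
        raise ValueError(f"Unknown label: {gold_label!r}")
    return {label: 1.0 if label == gold_label else 0.0 for label in labels}


print(one_hot_cats("politics"))
```

The result can be assigned directly with `doc.cats = one_hot_cats(label)`. For mutually exclusive classes, spaCy 3 provides the `textcat` component, while `textcat_multilabel` is meant for texts that can carry several labels at once.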

  • @Brickkzz
    @Brickkzz 2 years ago

    You should pass nlp as a function argument in make_docs instead of pulling it from a global variable outside the function.

  • @dhirajsharma74
    @dhirajsharma74 3 years ago +1

    Hey,
    thanks a lot for these awesome videos.
    I am getting this error and unable to solve it can you please help me.
    Video time: 00:08:35
    3 train_docs = make_docs(train_data[:num_texts])
    ---> 4 doc_bin = DocBin(docs=train_docs)
    5 doc_bin.to_disk("./data/train.spacy")
    6
    TypeError: __init__() got an unexpected keyword argument 'docs'
    I am not sure DocBin takes docs as an argument. Your help would be appreciated. Thanks
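
A possible cause of this `TypeError` is an installed spaCy version older than 3.0, where `DocBin` may not accept a `docs` keyword; that is an assumption, since the comment does not state the version. The sketch below builds the `DocBin` empty and adds docs one at a time, which does not depend on that keyword. The two sample docs are hypothetical stand-ins for the output of `make_docs`.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
# Hypothetical stand-ins for the docs returned by make_docs in the video.
train_docs = [nlp("A great film."), nlp("A dull film.")]

# Create the container first, then add each doc explicitly.
doc_bin = DocBin()
for doc in train_docs:
    doc_bin.add(doc)

# Printing the version helps confirm which spaCy release is actually installed.
print(spacy.__version__, len(doc_bin))
```

After this, `doc_bin.to_disk("./data/train.spacy")` works the same way as in the traceback above.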

  • @honaidaattaher4549
    @honaidaattaher4549 2 years ago +1

    Thank you...

  • @okopyl
    @okopyl 1 year ago

    i have more than 2 categories. What should i do?

  • @alexcrowley243
    @alexcrowley243 3 years ago +1

    Hey I think you said but are you doing new videos for Spacy custom NER? Thanks!

    • @python-programming
      @python-programming  3 years ago +1

      I am! Wanted to finish this series first. I will have it done next week. Spacy 3x should be week after that.

    • @alexcrowley243
      @alexcrowley243 3 years ago +1

      @@python-programming no stress, looking forward to it mate!

  • @SharifulIslamMD
    @SharifulIslamMD 2 years ago

    Hi, thank you very much for this very useful video. I have followed your steps with another dataset, and I get this error when I try to start the training with the command: python -m spacy train ....
    Error I get: ValueError: [E143] Labels for component 'textcat_multilabel' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's `initialize` method.
    I have also added label using "textcat.add_label("POSITIVE")" and
    "textcat.add_label("NEGATIVE")"...but without success. Could you please suggest how I can fix this? TIA

  • @tomstalley3179
    @tomstalley3179 5 months ago

    thank you !

  • @danilomontalvo-arnao6315
    @danilomontalvo-arnao6315 2 years ago

    Hey guys, the only problem I'm having with this is that ">python -m spacy train config.cfg --output ./output" in the terminal just gives me this error: ValueError: [E913] Corpus path can't be None. Maybe you forgot to define it in your .cfg file or override it on the CLI?
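
As the E913 message suggests, the corpus paths can be supplied on the command line with `--paths.train` and `--paths.dev` when the config leaves them unset. The sketch below only assembles such a command in Python; the helper `build_train_command` and the path `./data/valid.spacy` are hypothetical, so substitute the files you actually created.

```python
import sys


def build_train_command(config_path, output_dir, train_path, dev_path):
    """Assemble a `spacy train` command that overrides both corpus paths."""
    return [
        sys.executable, "-m", "spacy", "train", config_path,
        "--output", output_dir,
        "--paths.train", train_path,  # overrides [paths] train in the config
        "--paths.dev", dev_path,      # overrides [paths] dev in the config
    ]


cmd = build_train_command(
    "config.cfg", "./output", "./data/train.spacy", "./data/valid.spacy"
)
print(" ".join(cmd))
# To actually start training, run the printed command in a terminal,
# or uncomment the next line:
# import subprocess; subprocess.run(cmd, check=True)
```

The alternative is to edit the `[paths]` section of `config.cfg` so that `train` and `dev` point at the `.spacy` files directly.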

  • @vinsmokearifka
    @vinsmokearifka 3 years ago +1

    Thank you Prof

  • @prathameshmore5262
    @prathameshmore5262 2 years ago +1

    Hi, Sir, can you provide me the code for evaluating its performance?

  • @cornellius7694
    @cornellius7694 3 years ago

    Hello. I haven't seen good spaCy v3 tutorials, but they are badly needed. You see, I'm Russian, and the official spaCy documentation is pretty hard for me to understand. I tried using the begin_training() method, which is now called textcat.initialize() and takes an example() argument. Can you please tell me what an example is and how I can use it if it is required to contain predicted data? I am using this method to train my model, so I don't have predictions yet. Thanks for the video.
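
On the question of what an example is: in spaCy 3, an `Example` pairs a "predicted" doc, which can start out as a plain tokenized text with no predictions, with a "reference" doc that holds the gold annotations. The sketch below builds examples with `Example.from_dict` and runs a few update steps in a script. The two training texts, the label names, and the number of epochs are hypothetical, and this is an illustrative loop, not the config-based workflow shown in the video.

```python
import spacy
from spacy.training import Example

# Hypothetical toy data; real training needs far more examples.
train_data = [
    ("I loved this movie.", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("I hated this movie.", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

nlp = spacy.blank("en")
nlp.add_pipe("textcat")

# nlp.make_doc only tokenizes, so the "predicted" side starts without predictions;
# the annotation dictionary becomes the gold "reference" side.
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in train_data
]

# initialize() infers the labels from the examples it is given.
optimizer = nlp.initialize(get_examples=lambda: examples)

for epoch in range(5):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("I loved it.")
print(losses, doc.cats)
```

The scores in `doc.cats` from such a tiny dataset are not meaningful; the point is only the shape of the `Example` objects and the order of the calls.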

  • @prathameshmore5262
    @prathameshmore5262 2 years ago

    Prathamesh More
    Hi, Sir, can you provide me the code for evaluating its performance?

  • @shmouel4747
    @shmouel4747 2 years ago

    Hi, love your content! How do you import ml_datasets with conda? I tried conda install ml_datasets without any results

    • @shmouel4747
      @shmouel4747 2 years ago

      I tried from a CSV file (with pandas) but can't get any results

  • @cornellius7694
    @cornellius7694 3 years ago +1

    Can you also show, how to do this literally in code without cfg file?

    • @python-programming
      @python-programming  3 years ago +2

      Great question. As far as I know, you can't. spaCy 3 training is completely different and based entirely around the cfg. If you find a source for how to do it in a script, please let me know and I will do a video on it.

  • @nivedvenugopalan5422
    @nivedvenugopalan5422 3 years ago

    """
    ValueError: [E143] Labels for component 'textcat_multilabel' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's `initialize` method
    """
    I am getting this error when training the MODEL. Please resolve it if you can :-;
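
One plausible cause of E143 during `spacy train` is that the docs in the training file have empty `doc.cats`, so no labels can be inferred at initialization; that is an assumption, since the comment does not show the data. Note also that calling `add_label` on a pipeline in a separate script may not help, because the train command builds its own pipeline from the config. The sketch below uses a hypothetical helper, `labels_in_docbin`, to list which labels a `DocBin` actually contains.

```python
import spacy
from spacy.tokens import DocBin


def labels_in_docbin(doc_bin, vocab):
    """Collect every category label found on the docs stored in a DocBin."""
    labels = set()
    for doc in doc_bin.get_docs(vocab):
        labels.update(doc.cats)
    return labels


nlp = spacy.blank("en")

annotated = nlp("Great film.")
annotated.cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}
unannotated = nlp("No categories were set on this doc.")

with_labels = labels_in_docbin(DocBin(docs=[annotated]), nlp.vocab)
without_labels = labels_in_docbin(DocBin(docs=[unannotated]), nlp.vocab)
print(with_labels, without_labels)
```

To check a real training file, load it with `DocBin().from_disk("./data/train.spacy")` and pass it to the same helper. An empty set would point to the categories not having been written to the docs before saving.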