Speech Recognition in Python | Fine-tune a Wav2Vec2 model for a custom ASR model

  • Published: Oct 9, 2024
  • In this YouTube tutorial, we'll explore the Wav2Vec2 model, a powerful tool for speech recognition and representation learning. If you work in speech recognition or follow state-of-the-art models, you've likely heard of Wav2Vec2. This video focuses on practical steps, guiding you through fine-tuning Wav2Vec2 on your own speech data without delving deep into technicalities.
    Wav2Vec2 is fine-tuned for speech recognition with Connectionist Temporal Classification (CTC) loss, and we'll show you how to use it effectively for your tasks. You can start from pre-trained models and adapt them to your needs instead of training from scratch.
    We'll walk you through the code and the required packages, such as PyTorch and Transformers. You'll also learn how to apply audio augmentations to add variety to the training data and make the model more robust.
    Throughout the tutorial, you'll discover how to monitor your model's progress with TensorBoard, implement early stopping, and save the best checkpoints. We'll also cover converting your PyTorch model to ONNX for easier deployment on various platforms.
    To validate the model's performance, we'll run inference on a test dataset and measure the character error rate (CER) and word error rate (WER) to show how accurate the model is.
    This tutorial aims to empower you to use Wav2Vec2 effectively for speech recognition tasks, whether you're a beginner or an experienced practitioner.
    GitHub link: github.com/pyt...
    Trained model: drive.google.c...
    #transformers #nlp #wav2vec #tensorflow #pytorch
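The description mentions that the model is trained with CTC loss. To illustrate what that implies at inference time, here is a minimal greedy (best-path) CTC decoder in plain Python: take the most likely token per frame, merge consecutive repeats, then drop the blank token. The toy vocabulary, the blank at index 0, and the `|` word delimiter are assumptions for illustration (Hugging Face Wav2Vec2 CTC setups commonly use the pad token as the blank and `|` between words, but check your own vocab); this is a sketch, not the tutorial's code.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: List[str],
    blank_id: int = 0,  # assumption: blank/pad token sits at index 0 of the vocab
    word_delimiter: str = "|",  # assumption: "|" marks word boundaries
) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the most likely vocabulary index for every audio frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]

    tokens = []
    previous = None
    for idx in best_path:
        # 2. Skip a frame that repeats the previous one (merge repeats) and
        # 3. skip blank frames. A blank between two equal tokens resets
        # `previous`, so genuine double letters ("ll") survive.
        if idx != previous and idx != blank_id:
            tokens.append(vocab[idx])
        previous = idx

    # The word-delimiter token stands for a space between words.
    return "".join(tokens).replace(word_delimiter, " ").strip()


if __name__ == "__main__":
    # Toy vocabulary and one-hot "probabilities" standing in for model output.
    vocab = ["<pad>", "|", "e", "h", "l", "o"]
    path = [3, 3, 2, 0, 4, 4, 0, 4, 5, 1, 3, 2]  # h h e _ l l _ l o | h e
    frames = [[1.0 if j == i else 0.0 for j in range(len(vocab))] for i in path]
    print(greedy_ctc_decode(frames, vocab))  # hello he
```

Greedy decoding is the simplest option; beam search with a language model can lower error rates but is not shown here.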
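The description also mentions audio augmentations. As a rough, library-free illustration, the sketch below applies two common waveform augmentations: a random gain and additive Gaussian noise at a target signal-to-noise ratio (SNR). It assumes mono float samples in [-1, 1]; the specific augmentations and default values are illustrative choices and not necessarily what the tutorial uses.

```python
import math
import random
from typing import List, Optional, Tuple


def augment_waveform(
    samples: List[float],
    snr_db: float = 20.0,
    gain_db_range: Tuple[float, float] = (-6.0, 6.0),
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Random gain plus Gaussian noise at `snr_db`, clipped to [-1, 1]."""
    if not samples:
        return []
    rng = rng or random.Random()

    # Random gain in decibels, converted to a linear amplitude factor.
    gain = 10 ** (rng.uniform(*gain_db_range) / 20)
    scaled = [s * gain for s in samples]

    # Choose the noise power so that signal power / noise power matches snr_db.
    signal_power = sum(s * s for s in scaled) / len(scaled)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    noisy = [s + rng.gauss(0.0, noise_std) for s in scaled]

    # Keep samples in the valid float range.
    return [max(-1.0, min(1.0, s)) for s in noisy]
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when debugging a training run.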
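The description mentions early stopping and saving the best checkpoints. The underlying logic is framework-agnostic; here is a minimal sketch, assuming a metric where lower is better (such as validation CER). The class name and parameters are hypothetical, and the tutorial's own training loop may implement this differently. In PyTorch you would typically call `torch.save(model.state_dict(), path)` whenever `step()` returns `True`.

```python
from typing import Optional


class EarlyStopping:
    """Track a lower-is-better metric and signal when it stops improving."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience  # epochs to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def step(self, value: float) -> bool:
        """Record one epoch's metric; return True if it is a new best."""
        if self.best is None or value < self.best - self.min_delta:
            self.best = value
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Made-up validation CER values, for illustration only.
    val_cers = [0.40, 0.31, 0.28, 0.29, 0.30, 0.28, 0.27]
    stopper = EarlyStopping(patience=3)
    for epoch, cer in enumerate(val_cers):
        if stopper.step(cer):
            print(f"epoch {epoch}: new best CER {cer:.2f}, save checkpoint")
        if stopper.should_stop:
            print(f"epoch {epoch}: no improvement for {stopper.patience} epochs, stopping")
            break
```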
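The description mentions checking character and word error rates. Both are the Levenshtein edit distance (substitutions, insertions, deletions) between reference and hypothesis, divided by the reference length: over characters for CER and over whitespace-separated words for WER. A minimal plain-Python sketch follows; libraries such as `jiwer` offer ready-made versions, and the tutorial may use its own helpers.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance where every edit costs 1 (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, and an empty hypothesis scores exactly 100%.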

Comments • 28

  • @infinitewebrevolution 6 months ago +2

    Thank you so much, sir, for your hard work and the pretrained model; it has helped me a lot.
    I will always be thankful.

    • @PyLessons 6 months ago

      Glad to hear that! You are welcome

  • @hugok6212 7 months ago +2

    Excellent video and explanation. I have a question: if I train a model this way, can I use it for speech recognition in real time? Thank you.

    • @PyLessons 7 months ago

      Hey, yes and no. It depends on the hardware you run the model on (CPU, GPU, or other) and on your "real-time" requirements. You need to test it and you'll see :)

  • @shafiqrhmankeliwall8019 6 months ago +1

    Hi, great job, keep it up! I have one question: I want to build/train a model for low-resource languages such as Pashto, and I will make a dataset from scratch. Any idea how to start, or any useful links? Thanks

    • @PyLessons 6 months ago

      Thanks! I don't recommend making a dataset from scratch on your own; I believe you should be able to find something open source. I don't have a dataset, but check my dataset structure and you'll see what format is required.

  • @BrightShoko-m7c 7 months ago +1

    Good job👏 ... but I'm getting errors during the onnx installation. What Python version did you use?

    • @PyLessons 7 months ago

      I used Python 3.10. What error do you receive? Often it's related to the protobuf version.

  • @N3ONGNCS 3 months ago

    I want to create an ASR model for an African vernacular/local language. Could I use this for that? I'll create my own dataset if need be. Or what would you suggest? I'm attempting this for the first time and am a little lost and overwhelmed.

  • @maimunahmaskur7525 4 months ago

    It's great code!
    Could you please help? If I want to use this code for a dataset labeled with phonemes and use PER (Phoneme Error Rate) for testing and validation, what should I do? That is, which parts of the code do I need to adjust?
    Thank you!

    • @PyLessons 3 months ago

      I am not familiar with PER, so I can't tell you

  • @AmitYadav-rp3ot 10 months ago

    Hi there, great video!
    I wanted to know your opinion on training a model like this just for recognising numbers and a couple of words from an audio file.
    Will such custom training help reduce the size of the model?
    I want to create a very small model so that I can run it on a sub-GHz CPU.
    Please share what you think.
    Many thanks

    • @PyLessons 10 months ago

      Hi, thanks!
      No, training a model on simpler data doesn't reduce the model size. Check my other videos to create your own custom model for simpler data, such as numbers and words. But if your vocabulary is small, maybe you should consider treating it as a classification task. Also, to reduce the size of the model, look into quantization and pruning techniques.

  • @djrocks5678 10 months ago

    Hi there! Thanks a lot for this. I wanted to ask you: I am working on a desktop voice assistant project as part of my university work, and I want to train my own speech recognition model. How would I go about this? I looked at datasets, and something like Mozilla's 79 GB dataset is too much for my needs, so I was wondering how to make a smaller-scale speech recognition model for my project.

    • @PyLessons 10 months ago

      Hi, it's usually impossible to get great results without huge datasets and GPU compute. But you can try creating a custom ASR model with another of my tutorials, which you can check here: ruclips.net/video/8g_4LG2lpR8/видео.html. Also, there are a lot of pretrained ASR models that you usually only need to integrate (just an idea).

    • @BASDOURI 5 months ago

      Your contact, please?

  • @PyCode.academe 8 months ago

    God bless you!

    • @PyLessons 8 months ago

      You are welcome :)

  • @victormessias107 6 months ago +1

    When I'm training, it freezes at the end of the first epoch. Any idea?

    • @PyLessons 6 months ago

      It shouldn't behave like that; try to debug it. For example, iterate through the training and validation data providers (e.g., "for data in data_provider") and check whether each one reaches the end. If you still face these issues, open an issue on GitHub with more details.

  • @mohamedabdiaziz5993 4 months ago +1

    My final university project is similar to this system. I need help; I have prepared my own dataset.

    • @PyLessons 3 months ago

      I already helped by creating this step-by-step tutorial, mate :)

    • @mohamedabdiaziz5993 3 months ago

      @PyLessons I am facing an error. I have already spent the last 4 months preparing my own dataset.

    • @mohamedabdiaziz5993 3 months ago

      @PyLessons I tried, but the WER is 100% even after the model has trained for 7,200 steps.

  • @Ogamp 11 months ago

    Thank you for this. Could you please guide me through an ASR model for recognizing regional accents? How can I contact you? Thanks

    • @PyLessons 10 months ago

      Your task is a classification task; try to google it, but usually it's done with an encoder-only model.

  • @aqibmumtaz1262 9 months ago +1

    Great
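On @hugok6212's real-time question, the author's advice is to test on your own hardware. One common way to do that is to measure the real-time factor (RTF): average processing time divided by audio duration, where an RTF below 1 means transcription runs faster than the audio plays. The `transcribe` callable is a hypothetical placeholder for whatever inference function you use (PyTorch, ONNX Runtime, etc.). Since Wav2Vec2 processes a whole clip at once, a live setup would typically feed it short chunks, so measuring RTF at your intended chunk length is a reasonable starting point.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    transcribe: Callable[[Sequence[float]], str],  # hypothetical inference function
    samples: Sequence[float],
    sample_rate: int = 16_000,
    repeats: int = 3,
) -> float:
    """Average processing time divided by audio duration (below 1.0 = faster than real time)."""
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("empty audio")

    transcribe(samples)  # warm-up: the first call often includes one-off setup costs

    start = time.perf_counter()
    for _ in range(repeats):
        transcribe(samples)
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed / audio_seconds


if __name__ == "__main__":
    # Stand-in "model" that takes about 50 ms per call, on 1 s of silence.
    def fake_transcribe(audio: Sequence[float]) -> str:
        time.sleep(0.05)
        return ""

    print(f"RTF: {real_time_factor(fake_transcribe, [0.0] * 16_000):.3f}")
```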
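On @maimunahmaskur7525's question: phoneme error rate (PER) is computed the same way as WER, as an edit distance divided by the reference length, but over phoneme tokens instead of words. Below is a minimal sketch, assuming transcripts are space-separated phoneme strings (for example, ARPAbet-style `HH AH L OW`); that format is an assumption, and where to plug it into the tutorial's validation code is not shown. The model's vocabulary would also need to contain the phoneme symbols.

```python
from typing import List


def phoneme_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over space-separated phonemes / reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()

    # Two-row dynamic programming: prev[j] = distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deleted phoneme
                curr[j - 1] + 1,  # inserted phoneme
                prev[j - 1] + (r != h),  # substituted (or matching) phoneme
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    print(phoneme_error_rate("HH AH L OW", "HH EH L OW"))  # 0.25
```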
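In reply to @AmitYadav-rp3ot, the author points to quantization for reducing model size. The core idea of post-training linear quantization is to store each float weight as an 8-bit integer plus a shared scale factor, which cuts float32 storage roughly by a factor of four. The plain-Python sketch below shows symmetric per-tensor int8 quantization for intuition only; in practice you would use a framework tool such as PyTorch's `torch.quantization.quantize_dynamic` or ONNX Runtime's quantization utilities rather than hand-rolled code.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 1.27, 0.333]
    q, s = quantize_int8(w)
    print(q, s)
    print(dequantize(q, s))
```

Each restored weight differs from the original by at most half the scale, which is why accuracy usually drops only slightly; pruning, the other technique mentioned, removes weights instead of shrinking them.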
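For @victormessias107's freeze at the end of the first epoch, the author suggests iterating through the training and validation data providers with `for data in data_provider` to see whether each reaches the end. A small helper can make that check explicit by counting batches and printing progress, so if the loop stalls, the last printed count tells you roughly where. `check_provider` is a hypothetical helper; `data_provider` stands for any iterable (the tutorial's provider object or a PyTorch `DataLoader`).

```python
import time
from typing import Iterable


def check_provider(data_provider: Iterable, name: str = "provider", log_every: int = 100) -> int:
    """Iterate a data provider end to end, logging progress; return the batch count."""
    start = time.perf_counter()
    count = 0
    for count, _batch in enumerate(data_provider, start=1):
        if count % log_every == 0:
            # flush=True so the last line is visible even if the next batch hangs
            print(f"[{name}] reached batch {count} after {time.perf_counter() - start:.1f}s", flush=True)
    print(f"[{name}] finished: {count} batches in {time.perf_counter() - start:.1f}s", flush=True)
    return count


if __name__ == "__main__":
    # range(250) stands in for a real provider; run the same check on both
    # your training and validation providers.
    check_provider(range(250), name="demo")
```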