Google's Universal Speech Model for 100+ languages beats OpenAI's Whisper Model

  • Published: Sep 10, 2024
  • Eager to train your own #Whisper or #GPT-4o model but running out of data? We are proud to offer this unique large-scale conversational speech dataset in different languages and topics for #ASR, #TTS, #NLP, and other conversational AI R&D. It has speaker labels and high quality transcriptions. The duration of the dataset depends on the customer's needs and can extend up to 1 million hours. See the description and samples in the following post:
    / olewave-large-scaled-c...
    Send an email to info@olewave.com for more details.
    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
    Abstract:
    We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
    #google #usm #asr #openai #whisper #multimodal #multilingual #speech #seamlessm4t #voicebox
    ==
    Olewave offers avant-garde bespoke solutions for proprietary data labeling, normalization, and transformation.
    Tired of inaccurate transcriptions and frustrating APIs? Olewave offers a superior solution with:
    • AI-powered Accuracy: Transcribe any audio, regardless of language, dialect, accent, or topic, with exceptional accuracy. We surpass the competition in understanding even the most challenging recordings.
    • Detailed Insights: Gain valuable insights with word/character-level confidence scores, precise timestamps, and advanced speech analytics.
    • Privacy Guaranteed: Keep your data secure. Integrate our powerful data labeling tool directly into your platform, eliminating risks associated with external APIs.
    • Competitive Pricing: Enjoy high-quality service at accessible prices, outperforming both tech giants and human-intensive transcription solutions.
    Ready to experience the difference? Don't settle for mediocrity. Contact info@olewave.com and give us a try!
    Customized Large-Scale Datasets
    Olewave delivers customized, labeled, and validated large-scale real-world NLP, CV, speech, and multimodal datasets covering scenarios such as dictation and conversation in multiple accents, dialects, and languages, and diverse topics such as education, finance, legal, entertainment, healthcare, retail, and customer service.
    Our datasets include:
    • topic-specific text datasets for training your own LLM/ChatGPT/LLaMA model;
    • visual/video/image datasets with tags/prompts for training your own CV/SAM model;
    • speech/audio datasets of different languages and dialects for training your own ASR/Whisper/SeamlessM4T/TTS model;
    • multimodal datasets.
    We continuously collect up-to-date data in languages including Brazilian Portuguese, Latin American Spanish, Arabic, Southeast Asian languages, Chinese, Japanese, and Korean.
    Faster and more affordable data delivery than traditional data vendors;
    More effective and efficient than traditional data vendors.

Comments • 1

  • @content_ai_ 4 days ago

    Do you know of any models other than Whisper and SeamlessM4T that can do ASR?