Google's Universal Speech Model for 100+ languages beats OpenAI's Whisper Model

Olewave

445 views

Published: Sep 10, 2024
Eager to train your own #Whisper or #GPT-4o model but running out of data? We are proud to offer a unique large-scale conversational speech dataset covering many languages and topics for #ASR, #TTS, #NLP, and other conversational AI R&D. It includes speaker labels and high-quality transcriptions. The dataset's duration is tailored to each customer's needs and can extend up to 1 million hours. See the description and samples in the following post:
/ olewave-large-scaled-c...
Send an email to info@olewave.com for more details.
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Abstract:
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
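The abstract names random-projection quantization as the pre-training mechanism. Our reading is that this refers to BEST-RQ. In BEST-RQ, a frozen random matrix projects each standardized speech frame, and the index of the nearest vector in a frozen random codebook becomes the discrete target that the encoder learns to predict at masked positions. Below is a minimal pure-Python sketch of that quantizer only. It does no masking and no encoder training. The class and function names, the dimensions, and the Xavier-uniform and Gaussian initializations are illustrative assumptions, not USM's actual configuration.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-8) -> Vector:
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def standardize(frames: List[Vector], eps: float = 1e-8) -> List[Vector]:
    """Give each feature dimension zero mean and unit variance across frames."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]


class RandomProjectionQuantizer:
    """Hypothetical BEST-RQ-style quantizer: projection and codebook are random and frozen."""

    def __init__(self, input_dim: int, code_dim: int, codebook_size: int, seed: int = 0):
        rng = random.Random(seed)
        # Xavier-uniform projection matrix (input_dim x code_dim); never updated.
        limit = math.sqrt(6.0 / (input_dim + code_dim))
        self.projection = [
            [rng.uniform(-limit, limit) for _ in range(code_dim)] for _ in range(input_dim)
        ]
        # Codebook of random Gaussian vectors, normalized to unit length; never updated.
        self.codebook = [
            l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
            for _ in range(codebook_size)
        ]

    def project(self, frame: Vector) -> Vector:
        """Compute frame @ projection, then normalize the result to unit length."""
        code_dim = len(self.projection[0])
        out = [0.0] * code_dim
        for x_i, row in zip(frame, self.projection):
            for j in range(code_dim):
                out[j] += x_i * row[j]
        return l2_normalize(out)

    def quantize(self, frame: Vector) -> int:
        """Return the index of the codebook vector nearest to the projected frame."""
        y = self.project(frame)
        # Between unit vectors, the smallest Euclidean distance is the largest dot product.
        scores = [sum(a * b for a, b in zip(y, code)) for code in self.codebook]
        return max(range(len(scores)), key=scores.__getitem__)

    def targets(self, frames: List[Vector]) -> List[int]:
        """Discrete pre-training labels for one utterance (frames standardized first)."""
        return [self.quantize(f) for f in standardize(frames)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Ten fake 80-dimensional, log-mel-like feature frames.
    utterance = [[rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(10)]
    quantizer = RandomProjectionQuantizer(input_dim=80, code_dim=16, codebook_size=1024)
    print(quantizer.targets(utterance))
```

Neither the projection nor the codebook is ever trained. That is the design point the abstract alludes to: the targets come for free from fixed random parameters, so pre-training on 12M hours of unlabeled audio needs no learned tokenizer.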
#google #usm #asr #openai #whisper #multimodal #multilingual #speech #seamlessm4t #voicebox
==
Olewave offers avant-garde bespoke solutions for proprietary data labeling, normalization, and transformation.
Tired of inaccurate transcriptions and frustrating APIs? Olewave offers a superior solution with:
• AI-powered Accuracy: Transcribe any audio, regardless of language, dialect, accent, or topic, with exceptional accuracy. We surpass the competition in understanding even the most challenging recordings.
• Detailed Insights: Gain valuable insights with word/character-level confidence scores, precise timestamps, and advanced speech analytics.
• Privacy Guaranteed: Keep your data secure. Integrate our powerful data labeling tool directly into your platform, eliminating risks associated with external APIs.
• Competitive Pricing: Enjoy high-quality service at accessible prices, outperforming both tech giants and human-intensive transcription solutions.
Ready to experience the difference? Don't settle for mediocrity. Contact info@olewave.com and give us a try!
Customized Large-Scale Datasets
Olewave delivers customized, labeled, and validated large-scale real-world NLP/CV/speech/multimodal datasets covering scenarios such as dictation and conversation in multiple accents, dialects, and languages, and diverse topics such as education, finance, legal, entertainment, healthcare, retail, and customer service.
Our datasets include:
• topic-specific text datasets for training your own LLM/ChatGPT/LLaMA model;
• visual/video/image datasets with tags/prompts for training your own CV/SAM model;
• speech/audio datasets in different languages and dialects for training your own ASR/Whisper/SeamlessM4T/TTS model;
• multimodal datasets.
We continually collect up-to-date data in languages including Brazilian Portuguese, Latin American Spanish, Arabic, Southeast Asian languages, Chinese, Japanese, and Korean.
Faster and more affordable data delivery than traditional data vendors, with more effective and efficient service.

Comments • 1

@content_ai_ 4 days ago
Do you know other models that can do ASR other than Whisper and SeamlessM4T?
