Speech Recognition with Next-Generation Kaldi (K2, Lhotse, Icefall)
Published: 9 Feb 2025
Authors: Sanjeev Khudanpur, Daniel Povey, Piotr Żelasko
Category: Tutorials
Abstract: This tutorial introduces k2, the cutting-edge successor to the Kaldi speech processing toolkit, which consists of several Python-centric modules for building speech recognition systems, along with its enabling counterparts, Lhotse and Icefall. Participants will learn how to perform swift data manipulation with Lhotse; how to build and leverage auto-differentiable weighted finite-state transducers with k2; and how these two can be combined to create PyTorch-based, state-of-the-art hybrid ASR system recipes from Snowfall, the precursor to Icefall.
Dr. Daniel Povey is an expert in ASR, best known as the lead author of the Kaldi toolkit and for popularizing discriminative training (now known as "sequence training") in the form of MMI and MPE. He has held research positions at IBM, Microsoft, and Johns Hopkins University, and is now Chief Speech Scientist at Xiaomi Corporation in Beijing, China.
Dr. Piotr Żelasko is an expert in ASR and spoken language understanding, with extensive experience in developing practical, scalable ASR solutions for industrial use. He has worked with successful speech processing start-ups: Techmo (Poland) and IntelligentWire (USA, acquired by Avaya). At present, he is a research scientist at Johns Hopkins University.
Prof. Sanjeev Khudanpur has 25+ years of experience working on almost all aspects of human language technology, including ASR, machine translation, and information retrieval. He has led a number of research projects funded by NSF, DARPA, IARPA, and industry sponsors, and has published extensively. He has trained more than 40 PhD and Master's students to use Kaldi for their dissertation work.
For more details and a PDF version of the paper, visit:
tutorial03
Why does the HMM model P(A|W) rather than P(W|A), as is done with DNNs in speech recognition, where A is the speech signal and W is a sequence of words or phones?
May I please get a link to the PDF resources?