Videos: 676
Views: 76,776

An Introduction to Automatic Differentiation with Weighted Finite-State Automata

Title: An Introduction to Automatic Differentiation with Weighted Finite-State Automata
Authors: Awni Hannun
Category: Tutorials
Abstract: Weighted finite-state automata (WFSAs) have been a critical building block in modern automatic speech recognition. However, their use in conjunction with "end-to-end" deep learning systems is limited by the lack of efficient frameworks with support for automatic differentiation. This limitation is being overcome with the advent of new frameworks like GTN and k2. This tutorial will cover the basics of WFSAs and review their application in speech recognition. We will then explain the core concepts of automatic differentiation and show how to use it with WFS...

Videos

3:00:29

Neural target speech extraction

1.3K views · 2 years ago

Title: Neural target speech extraction
Authors: Kateřina Žmolíková, Marc Delcroix
Category: Tutorials
Abstract: Dealing with overlapping speech remains one of the great challenges of speech processing. Target speech extraction consists of directly estimating speech of a desired speaker in a speech mixture, given clues about that speaker, such as a short enrollment utterance or video of the spea...

2:23:05

Concept to Code: Semi-Supervised End-To-End Approaches For Speech Recognition

566 views · 2 years ago

Title: Concept to Code: Semi-Supervised End-To-End Approaches For Speech Recognition
Authors: Omprakash Sonie, Kannan Venkateshan
Category: Tutorials
Abstract: Training Automatic Speech Recognition (ASR) models usually requires transcribing large quantities of audio, which is both expensive and time-consuming. To overcome this limitation, many semi-supervised training approaches have been p...

2:50:58

Intonation Transcription and Modelling in Research and Speech Technology Applications

514 views · 2 years ago

Title: Intonation Transcription and Modelling in Research and Speech Technology Applications
Authors: Cong Zhang, Amalia Arvaniti, Kathleen Jepson, Katherine Marcoux
Category: Tutorials
Abstract: This tutorial covers the theory and practical applications of intonation research. The following three topics will be introduced to speech technology engineers and researchers new to the field of inton...

1:18:38

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

2.3K views · 2 years ago

Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit
Authors: Aku Rouhe
Category: Tutorials
Abstract: SpeechBrain is a novel open-source speech toolkit natively designed to support various speech and audio processing applications. It currently supports a large variety of tasks, such as speech recognition, speaker recognition, speech enhancement, speech ...

45:23

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

3.1K views · 2 years ago

Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit
Authors: Titouan Parcollet
Category: Tutorials
Abstract: SpeechBrain is a novel open-source speech toolkit natively designed to support various speech and audio processing applications. It currently supports a large variety of tasks, such as speech recognition, speaker recognition, speech enhancement,...

33:47

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

966 views · 2 years ago

Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit
Authors: Mirco Ravanelli
Category: Tutorials
Abstract: SpeechBrain is a novel open-source speech toolkit natively designed to support various speech and audio processing applications. It currently supports a large variety of tasks, such as speech recognition, speaker recognition, speech enhancement, s...

3:04:22

Speech Recognition with Next-Generation Kaldi (K2, Lhotse, Icefall)

4.9K views · 2 years ago

Title: Speech Recognition with Next-Generation Kaldi (K2, Lhotse, Icefall)
Authors: Sanjeev Khudanpur, Daniel Povey, Piotr Żelasko
Category: Tutorials
Abstract: This tutorial introduces k2, the cutting-edge successor to the Kaldi speech processing toolkit, which consists of several Python-centric modules to enable building speech recognition systems, along with its enabling counterparts, Lhotse and Ice...

1:02:31

Language Modeling and Artificial Intelligence

207 views · 2 years ago

Title: Language Modeling and Artificial Intelligence
Authors: Tomáš Mikolov (CIIRC CTU Prague)
Category: Keynotes
Abstract: Statistical language modeling has been labeled as an AI-complete problem by many famous researchers of the past. However, despite all the progress made in the last decade, it remains unclear how much progress towards truly intelligent language models we made. In t...

1:00:15

Learning speech models from multi-modal data

463 views · 2 years ago

Title: Learning speech models from multi-modal data
Authors: Karen Livescu (TTI-Chicago)
Category: Survey talks
Abstract: Speech is usually recorded as an acoustic signal, but it often appears in context with other signals. In addition to the acoustic signal, we may have available a corresponding visual scene, the video of the speaker, physiological signals such as the speaker's moveme...

1:04:17

Towards automatic speech recognition for people with atypical speech

550 views · 2 years ago

Title: Towards automatic speech recognition for people with atypical speech
Authors: Heidi Christensen (University of Sheffield)
Category: Survey talks
Abstract: In the last decade we have seen how speech technologies for typical speech have matured and thus enabled the advancement of a multitude of services and technologies including voice-enabled conversational interfaces, dictation ...

1:15:02

Opening ceremony

82 views · 2 years ago

Title: Opening ceremony
Authors: Honza Černocký, Lukáš Burget, John Hansen, Lori Lamel, Phil Green, Hynek Heřmanský, Jan Grolich
Category: Opening
Abstract: For more details and PDF version of the paper visit: opening01

58:22

Child Language Acquisition studied with Wearables

92 views · 2 years ago

Title: Child Language Acquisition studied with Wearables
Authors: Alejandrina Cristia (Ecole Normale Supérieure)
Category: Survey talks
Abstract: In recent years, the ease with which we can collect audio (and to a lesser extent visual information) with wearables has improved dramatically. These allow unprecedented access to the speech that children produce, and that which they hear. Al...

1:03:19

Uncovering the acoustic cues of COVID-19 infection

66 views · 2 years ago

Title: Uncovering the acoustic cues of COVID-19 infection
Authors: Sriram Ganapathy (Indian Institute of Science, Bangalore)
Category: Survey talks
Abstract: The investigation of acoustic biomarkers of respiratory diseases has societal and public health impact following the onset of the COVID-19 pandemic. The efforts in the pre-pandemic period focused on developing smartphone-friendly diag...

51:15

Ethical and Technological Challenges of Conversational AI

92 views · 2 years ago

Title: Ethical and Technological Challenges of Conversational AI
Authors: Pascale Fung (Hong Kong University of Science & Technology)
Category: Keynotes
Abstract: Conversational AI (ConvAI) systems have applications ranging from personal assistance, health assistance to customer services. They have been in place since the first call centre agent went live in the late 1990s. More recent...

1:02:36

Adaptive listening to everyday soundscapes

128 views · 2 years ago

58:26

ISCA Medalist: Forty years of speech and language processing: from Bayes decision rule to deep l...

157 views · 2 years ago

3:45

WittyKiddy: Multilingual Spoken Language Learning for Kids - (3 minutes introduction)

41 views · 2 years ago

3:19

Web Interface for estimating articulatory movements in speech production from acoustics and text...

42 views · 2 years ago

12:58

NeMo (Inverse) Text Normalization: From Development To Production - (longer introduction)

423 views · 2 years ago

2:37

Automatic Radiology Report Editing through Voice - (3 minutes introduction)

110 views · 2 years ago

3:17

NeMo (Inverse) Text Normalization: From Development To Production - (3 minutes introduction)

146 views · 2 years ago

3:19

Save your Voice: Voice Banking and TTS for Anyone - (3 minutes introduction)

105 views · 2 years ago

2:28

Interactive and real-time acoustic measurement tools for speech data acquisition and presentatio...

42 views · 2 years ago

1:23

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech - (Oral presentation)

38 views · 2 years ago

16:09

F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement - (Or...

140 views · 2 years ago

1:28

Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility O...

28 views · 2 years ago

1:29

Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia...

74 views · 2 years ago

1:14

Conformer Parrotron: a Faster and Stronger End-to-end Speech Conversion and Recognition Model for...

412 views · 2 years ago

1:20

A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting ...

27 views · 2 years ago

@chatting6231 2 months ago
This is a great step, I'm happy to hear about this. But what's next after 2 years? I see a few groups doing it the same way. How about yours?
@MarxOrx 4 months ago
❤
@ericmcnally5128 7 months ago
Oh my god I have sooo many ideas regarding this topic and how the right app might simultaneously empower user-communities while also generating large datasets.
@lullu3467 11 months ago
Is there any implementation?
@unknownhero6187 11 months ago
Why does an HMM model P(A|W) rather than P(W|A), as is done with DNNs in the speech recognition domain, where A is a speech signal and W are words or phones?
@jatinjoshi7549 1 year ago
Sir, where can I download the dataset?
@05vishalwaghmare95 1 year ago
Thank you, this video helped me a lot.
@lgzster 1 year ago
I suspect that preaspiration is more prevalent in Welsh English speakers who also speak Welsh. I wonder if there are other varieties of English that also have preaspiration.
@ShilpaSj-i1e 1 year ago
That's cool!
@tilakrajchoubey5534 1 year ago
Why is the maximum output of the TTS only up to 11 seconds?
@rushdamansuri8545 2 years ago
Hey, can you help me out with the code for this?
@ericniemeyer7505 2 years ago
Thank you!
@vladimirsherman6883 2 years ago
Can the speech enhancement model run in real time to process an audio stream?
@BHAI_gaming56 2 years ago
I need the dataset for this project.
@MrMdibi 2 years ago
May I please get a link to the PDF resources?
@atonelug2830 2 years ago
Hello, is there software to separate two overlapping voices? I would be more than interested.
@ruohoruotsi 2 years ago
Excellent!
@Sidharth_abhi7 2 years ago
Where can I get the source code of this project, ma'am?
@abhinavbhardwaj727 2 years ago
Very informative 🙂🙂
@STONJAUS_FILMS 2 years ago
This is so cool. I wonder if it would improve the sound of some very old videos I have. Is there a way for me to test it? I work on films, and sometimes the audio from the clip-on mics is not so good, so I was wondering if I could give it a try to improve it with this AI. Thank you very much.
@the-only-baka 4 months ago
If you want, I have code you can run to train your own model.
@ansschapendonk4560 2 years ago
Pathetic, pathetic, because Mirjam and Co. base their work on the groundbreaking research results of Ans Schapendonk on the universal sound helix. Source: Ingo Plag, whose name "helixes" into PLAGIAAT (plagiarism).
@ajitkumar15 2 years ago
Can we get the code, please? Thank you for this video.
@sense2001 2 years ago
Hi! Is there a way to use this? This paper seems very useful for artists who need to listen to harmonies to practice songs.
@gokulsivakumar2489 2 years ago
Sir, can I get the IEMOCAP dataset?
@noahdrisort2005 2 years ago
1) Do you have a public implementation of these objective evaluations? 2) Can we draw any quick conclusion about vocoder quality when using a global benchmark? 3) Is vocoder quality robust when we plug the vocoder into an acoustic model (AM)? In my experiment, the vocoder sounds good when I test it with ground-truth mel-spectrogram input, but when I connect it to the AM to make a TTS pipeline, the voice seems bad, and I must fine-tune it with the AM. Therefore, to benchmark TTS, I think we should not evaluate the AM and the vocoder separately?
@tusharrohilla7154 2 years ago
How is the equal error rate calculated? Can someone please and
@elonmuskfan9210 2 years ago
Can you share the code with me if you have done it in MATLAB?
@elonmuskfan9210 2 years ago
Is it available in MATLAB?
@이종법-o2m 2 years ago
Is there code for this method?
@ghaleb8530 3 years ago
Is this toolkit free, or is it paid?
@ghaleb8530 3 years ago
I want to know how to use this toolkit. Please help me.
@ghaleb8530 3 years ago
I need your help with how to use this toolkit, please.
@ghaleb8530 3 years ago
Please, how can I download this toolkit?
@_rkk 3 years ago
Any plan to release the source code?
@mono_to_STEREO 3 years ago
Thank you for an excellent presentation. 👍👍
@victorbct1027 3 years ago
What a stunning presentation! A crystal-clear explanation of a complex subject. Thanks, Mr Teytaut.

INTERSPEECH2021

Videos

Comments