- Videos: 676
- Views: 76,776
INTERSPEECH2021
Added 14 Dec 2021
Videos from INTERSPEECH 2021, held in Brno, Czech Republic. Over 608 talks are published, each with its abstract, authors, and a link to the ISCA archive where the paper PDF can be downloaded.
An Introduction to Automatic Differentiation with Weighted Finite-State Automata
Title: An Introduction to Automatic Differentiation with Weighted Finite-State Automata
Authors: Awni Hannun
Category: Tutorials
Abstract: Weighted finite-state automata (WFSAs) have been a critical building block in modern automatic speech recognition. However, their use in conjunction with "end-to-end" deep learning systems is limited by the lack of efficient frameworks with support for automatic differentiation. This limitation is being overcome with the advent of new frameworks like GTN and k2. This tutorial will cover the basics of WFSAs and review their application in speech recognition. We will then explain the core concepts of automatic differentiation and show how to use it with WFS...
Views: 524
Videos
Neural target speech extraction
1.3K views, 2 years ago
Title: Neural target speech extraction Authors: Kateřina Žmolíková, Marc Delcroix Category: Tutorials Abstract: Dealing with overlapping speech remains one of the great challenges of speech processing. Target speech extraction consists of directly estimating speech of a desired speaker in a speech mixture, given clues about that speaker, such as a short enrollment utterance or video of the spea...
Concept to Code: Semi-Supervised End-To-End Approaches For Speech Recognition
566 views, 2 years ago
Title: Concept to Code: Semi-Supervised End-To-End Approaches For Speech Recognition Authors: Omprakash Sonie, Kannan Venkateshan Category: Tutorials Abstract: Training Automatic Speech Recognition (ASR) models usually requires transcribing large quantities of audio, which is both expensive and time-consuming. To overcome this limitation, many semi-supervised training approaches have been p...
Intonation Transcription and Modelling in Research and Speech Technology Applications
514 views, 2 years ago
Title: Intonation Transcription and Modelling in Research and Speech Technology Applications Authors: Cong Zhang, Amalia Arvaniti, Kathleen Jepson, Katherine Marcoux Category: Tutorials Abstract: This tutorial covers the theory and practical applications of intonation research. The following three topics will be introduced to speech technology engineers and researchers new to the field of inton...
SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit
2.3K views, 2 years ago
Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit Authors: Aku Rouhe Category: Tutorials Abstract: SpeechBrain is a novel open-source speech toolkit natively designed to support various speech and audio processing applications. It currently supports a large variety of tasks, such as speech recognition, speaker recognition, speech enhancement, speech ...
SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit
3.1K views, 2 years ago
Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit Authors: Titouan Parcollet Category: Tutorials Abstract: SpeechBrain is a novel open-source speech toolkit natively designed to support various speech and audio processing applications. It currently supports a large variety of tasks, such as speech recognition, speaker recognition, speech enhancement,...
SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit
966 views, 2 years ago
Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit Authors: Mirco Ravanelli Category: Tutorials Abstract: SpeechBrain is a novel open-source speech toolkit natively designed to support various speech and audio processing applications. It currently supports a large variety of tasks, such as speech recognition, speaker recognition, speech enhancement, s...
Speech Recognition with Next-Generation Kaldi (K2, Lhotse, Icefall)
4.9K views, 2 years ago
Title: Speech Recognition with Next-Generation Kaldi (K2, Lhotse, Icefall) Authors: Sanjeev Khudanpur, Daniel Povey, Piotr Żelasko Category: Tutorials Abstract: This tutorial introduces k2, the cutting-edge successor to the Kaldi speech processing toolkit, which consists of several Python-centric modules to enable building speech recognition systems, along with its enabling counterparts, Lhotse and Ice...
Language Modeling and Artificial Intelligence
207 views, 2 years ago
Title: Language Modeling and Artificial Intelligence Authors: Tomáš Mikolov (CIIRC CTU Prague) Category: Keynotes Abstract: Statistical language modeling has been labeled as an AI-complete problem by many famous researchers of the past. However, despite all the progress made in the last decade, it remains unclear how much progress towards truly intelligent language models we have made. In t...
Learning speech models from multi-modal data
463 views, 2 years ago
Title: Learning speech models from multi-modal data Authors: Karen Livescu (TTI-Chicago) Category: Survey talks Abstract: Speech is usually recorded as an acoustic signal, but it often appears in context with other signals. In addition to the acoustic signal, we may have available a corresponding visual scene, the video of the speaker, physiological signals such as the speaker's moveme...
Towards automatic speech recognition for people with atypical speech
550 views, 2 years ago
Title: Towards automatic speech recognition for people with atypical speech Authors: Heidi Christensen (University of Sheffield) Category: Survey talks Abstract: In the last decade we have seen how speech technologies for typical speech have matured and thus enabled the advancement of a multitude of services and technologies including voice-enabled conversational interfaces, dictation ...
Opening ceremony
82 views, 2 years ago
Title: Opening ceremony Authors: Honza Černocký, Lukáš Burget, John Hansen, Lori Lamel, Phil Green, Hynek Heřmanský, Jan Grolich Category: Opening Abstract: For more details and PDF version of the paper visit: opening01
Child Language Acquisition studied with Wearables
92 views, 2 years ago
Title: Child Language Acquisition studied with Wearables Authors: Alejandrina Cristia (Ecole Normale Supérieure) Category: Survey talks Abstract: In recent years, the ease with which we can collect audio (and to a lesser extent visual information) with wearables has improved dramatically. These allow unprecedented access to the speech that children produce, and that which they hear. Al...
Uncovering the acoustic cues of COVID-19 infection
66 views, 2 years ago
Title: Uncovering the acoustic cues of COVID-19 infection Authors: Sriram Ganapathy (Indian Institute of Science, Bangalore) Category: Survey talks Abstract: The investigation of acoustic biomarkers of respiratory diseases has societal and public health impact following the onset of the COVID-19 pandemic. The efforts in the pre-pandemic period focused on developing smartphone-friendly diag...
Ethical and Technological Challenges of Conversational AI
92 views, 2 years ago
Title: Ethical and Technological Challenges of Conversational AI Authors: Pascale Fung (Hong Kong University of Science & Technology) Category: Keynotes Abstract: Conversational AI (ConvAI) systems have applications ranging from personal assistance and health assistance to customer services. They have been in place since the first call centre agent went live in the late 1990s. More recent...
Adaptive listening to everyday soundscapes
128 views, 2 years ago
ISCA Medalist: Forty years of speech and language processing: from Bayes decision rule to deep l...
157 views, 2 years ago
WittyKiddy: Multilingual Spoken Language Learning for Kids - (3 minutes introduction)
41 views, 2 years ago
Web Interface for estimating articulatory movements in speech production from acoustics and text...
42 views, 2 years ago
NeMo (Inverse) Text Normalization: From Development To Production - (longer introduction)
423 views, 2 years ago
Automatic Radiology Report Editing through Voice - (3 minutes introduction)
110 views, 2 years ago
NeMo (Inverse) Text Normalization: From Development To Production - (3 minutes introduction)
146 views, 2 years ago
Save your Voice: Voice Banking and TTS for Anyone - (3 minutes introduction)
105 views, 2 years ago
Interactive and real-time acoustic measurement tools for speech data acquisition and presentatio...
42 views, 2 years ago
Analysis and Tuning of a Voice Assistant System for Dysfluent Speech - (Oral presentation)
38 views, 2 years ago
F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement - (Or...
140 views, 2 years ago
Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility O...
28 views, 2 years ago
Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia...
74 views, 2 years ago
Conformer Parrotron: a Faster and Stronger End-to-end Speech Conversion and Recognition Model for...
412 views, 2 years ago
A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting ...
27 views, 2 years ago
This is a great step, I'm happy to hear about this. But what's next after 2 years...I see a few groups do the same way...How about yours?
❤
Oh my god I have sooo many ideas regarding this topic and how the right app might simultaneously empower user-communities while also generating large datasets.
Is there any implementation?
Why does the HMM model P(A|W) rather than P(W|A), as is done with DNNs in the speech recognition domain, where A is the speech signal and W are words or phones?
Sir from where can I download the dataset
Thank you, This video helped me lot.
I suspect that preaspiration is more prevalent in Welsh English speakers who also speak Welsh. I wonder if there are other varieties of English that also have preaspiration.
That's cool!
Why the maximum output of the tts is only upto 11 seconds??
Heyy, can you help me out with the code of this.
Thank you!
Can the speech enhancement model run in realtime to process an audio stream?
i need dataset of this project.
May I please get a link to the pdf resources?
Hello, is there software to separate two overlapping voices? I would be more than interested.
Excellent!
Where can I get the source code of this project ma'am?
very informative🙂🙂
This is so cool. Wonder if it would improve the sound from some very old videos I have is there a way for me to test it ? I work on films and sometimes the audio from the clip on mics is not so good so I was wondering if I could give it a try to improve it with this AI ... thank you very much.
If you want i have a code you can run to train your own model
Pathetic, pathetic, because Mirjam and Co. base their work on the groundbreaking research results of Ans Schapendonk on the universal sound helix. Source: Ingo Plag, whose name spirals into PLAGIAAT (Dutch for "plagiarism").
can we get the code please.Thank you for this video
Hi! Is there a way to use this? This paper seems very useful for artists who need to listen to harmonies to practice songs.
Sir can i get the IEMOCAP dataset ?
1) Do you have a public implementation for these objective evaluations? 2) Should we have any quick conclusion for the quality of vocoder when using a global benchmark? 3) Is the quality of vocoders robust when we plug it into Acoustic Model? By my experiment, when I test vocoder with ground truth melspec input, it is good, but when connect it to AM to make TTS pipeline, the voice seems bad, I must finetune it with AM. Therefore, to benchmark for TTS, I think we should not evaluate partly AM and vocoder model?
How the equal error rate is calculated? Can someone please and
Can you share the code with me if you have done it in matlab?
It is available in matlab??
is there code for this method?
is this toolkit free? or is it money paid?
i want to know how to use this toolkit. plz help me
i need your help on how i can use this toolkit plz
plz how can i download this toolkit
any plan to release source code ?
Thank you for an excellent presentation. 👍👍
What a stunning presentation! A crystal-clear explanation of a complex subject. Thx Mr Teytaut