This is really cool. Being able to create localised versions of your content would open you up to so many more audiences. Imagine a time when YouTube allows you to upload a video with multiple audio tracks, each linked to a different localised voice track.
I can legit see this happening within the next 3-5 years, EASILY. The tech is already almost here, so it's pretty exciting to see, especially since it wouldn't need to be real time.
MrBeast already has an audio track in my language (Indonesian) on each of his videos... which, I bet, costs a whole lot of money. BUT with this, everyone would be able to do that without worrying much about money... haha, what a time to be alive.
(pardon my grammar)
Thanks for the video. Can you run the code on Windows? How can I access the demo code?
So I know German, and while the first translation into German was kinda meh (it lacked expression), the Elden Ring audio was actually not bad. I could really see this being used as a quick way to translate dubs of video games or movies. Thanks for the video, keep it up!
Honestly, I could see this too! Dubbing a movie or game in another language could become quite a bit easier as this tech gets better.
The French was completely broken, but overall impressive.
Yes, bad translation and bad voice. I don't recommend using it to spam French people... or rather, yes, it should be used, since the "uncanny valley" would trigger suspicion :D
Hi! Is there any way to make it "real" realtime? I mean, without having to upload a .wav file or record each time: feed it an audio input continuously and produce a continuous audio output with the translation, even if there is a 3-10s delay.
Thanks for the demo. How hard would it be to make an app that translates "on the fly"? I support a local church and the service is in Spanish, but we have many guests who require live translation to English. It would be cool if we could have an app that "listens" to the sermon and then automatically speaks it in English or any other language. Do you have any ideas?
Speaking another language actually makes you sound different. When I speak Spanish, I have a lower tone compared to when I speak English. Also, non-native speakers will still sound different despite being fluent in another language. And that doesn't even take into account dialects, speech impediments, tonal differences, pitch, etc.
Hey Jarods, would it be possible to record how you set up that whole thing (Visual Studio) up to the moment we can open Gradio - 2:26? Personally, that would help me do similar stuff independently, and a live example with M4T would be great for learning on a real project.
That's really cool. Have you tried the seamless-streaming one? I wonder how it does in real time.
German speaker: approved :) It's just not how you'd normally say it in everyday language.
Thanks for posting... I set this up on a Mac M2 and it's horribly slow, so it's good to know it can actually be faster than the input length if you have a legit GPU.
Thanks for this video. Is this model available locally? Can you give a tutorial on how to install it on Windows (I have a 4090 GPU)? Thanks. I see a near future (within months) where we could have real-time speech-to-speech translation for live streaming, where multiple audiences can enjoy the "Zeitgeist" experience. And this will make life easier for channels that want to broadcast in other foreign markets without going the boring text-to-speech route. Looks awesome. Stability AI's SDXL Turbo, which I use a lot, opened the floodgates for turbo-fying AI. We're going to see it in LLMs and TTS.
It is all local, yup! Installation is more complicated than normal because you have to deal with Windows Subsystem for Linux (WSL), and that opens up a huge can of worms.
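For anyone curious, once it's installed (inside WSL in my case), local speech-to-speech inference looks roughly like this. This is a minimal sketch from memory of the seamless_communication README, so the model names, import path, and predict arguments are assumptions and may differ between releases:

```python
import torch
import torchaudio
# Import path and class name are assumptions from memory of the repo README;
# check the version you installed, as these have moved between releases.
from seamless_communication.inference import Translator

# Load the multitask model and vocoder onto the GPU (fp16 keeps VRAM usage down).
translator = Translator(
    "seamlessM4T_v2_large",
    "vocoder_v2",
    torch.device("cuda:0"),
    dtype=torch.float16,
)

# Speech-to-speech translation (S2ST): an English .wav in, Spanish speech out.
text_output, speech_output = translator.predict(
    input="my_clip.wav",   # hypothetical input file
    task_str="S2ST",
    tgt_lang="spa",
)

# Write the translated audio to disk.
torchaudio.save(
    "my_clip_spa.wav",
    speech_output.audio_wavs[0][0].to(torch.float32).cpu(),
    sample_rate=speech_output.sample_rate,
)
```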
In a few years, we could have portable real-time translators. They'll be widely used once the hardware limitations are gone.
This is really interesting. I haven't been keeping up with this at all apart from lurking on your videos. I've only known OpenAI and DeepL to be pretty reliable for translations, and I guess Facebook's SeamlessM4T is another one on that list, with the benefit of being run locally with no API costs. Are there any other known models for translation as well?
Not sure; Whisper and now Seamless for local stuff, but Seamless is still kinda in a research phase.
Ought to get a Linux distro running with the 4090. Great videos/info here. Keep them coming~
Holding off until WSL2 no longer works for me 😂
The French sounded really human-like, except for 'M40'.
The French sounded pretty good.
holy shitttt, great video too, ty o7
If only RVC would add this as some sort of add-on, it would be so handy, so we don't have to switch between the two!
Thank you for the video. I am currently living in Osaka, Japan, and I am very interested in instant translation with AI models. However, what I understand by "instant translation" is not: "I say a sentence - the model translates it after a few seconds and I can hear it - I say another sentence - the model translates it after a few seconds and I can hear it..." What I understand by instant translation is: "You are talking in Japanese and, while you are talking in Japanese (with a delay of a few seconds), I hear your speech in Spanish. No matter how long the speech is. Maybe the Japanese speech is 10 minutes long, and I can begin to hear it in Spanish after 5 seconds, and it will end 5 seconds after the Japanese finishes." Basically, it is like having an interpreter by your side.
Do you think it is possible to implement such a thing with this model?
Answering in "general": with text, yes, it is possible to do this, since you can continually transcribe and translate the audio as it's being spoken. With text-to-speech, it's a bit harder and slower: you need to wait for the speaker to finish their sentence before you translate it and have a TTS engine speak it, so you'd always be at least a sentence behind. A short sentence like この人、怪しいと思います might translate to "I think this person is suspicious," but if you had this technology translate word for word, it'd probably come out as "this person, suspicious I think."
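To make that concrete, here's a rough sketch of the sentence-buffered loop I mean. The transcribe, translate, and speak helpers are hypothetical placeholders standing in for an ASR model, a translation model, and a TTS engine, not a real API:

```python
def live_interpreter(audio_chunks, transcribe, translate, speak, tgt_lang="spa"):
    """Sentence-buffered interpretation: translate only once a sentence is
    complete, so the output always lags the speaker by at least one sentence.

    transcribe / translate / speak are hypothetical placeholders wrapping an
    ASR model, a translation model, and a TTS engine respectively.
    """
    sentence_ends = ("。", ".", "?", "!", "？", "！")
    buffer = ""
    for chunk in audio_chunks:          # short audio buffers from a microphone
        buffer += transcribe(chunk)     # incremental speech-to-text
        while True:
            cuts = [buffer.find(p) for p in sentence_ends if p in buffer]
            if not cuts:
                break                   # no complete sentence yet, keep listening
            cut = min(cuts) + 1         # earliest sentence boundary
            sentence, buffer = buffer[:cut], buffer[cut:]
            speak(translate(sentence, tgt_lang))  # speak the translated sentence
```

Cutting only at sentence boundaries is what keeps the word order right, and it's also why the delay can never drop below one sentence.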
@@Jarods_Journey There is a company called "Interprefy". I think that's the idea of what I'm trying to achieve.
I hope there will soon be a tool like this working in real time, so we can talk naturally with people from other countries and finally break the language barrier that has kept us apart for years.
4:51 Could you use this for English to English with a trained model? And would it be better than So-Vits-SVC?
Ahhh, I don't think it's there yet, unfortunately. But I haven't tried it, tbh.
Jarod, how can I use this tool without the 10-second limitation? I want to use it for longer files; is that possible?
I speak Spanish and it sounds very good.
Please create a tutorial on how to install it locally.
Can these models work on a GTX 1650 4GB, or are they VRAM hungry?
It should fit; the models are less than 3GB.
Ok, I believe you. The first one wasn't super good, but the one translated from Japanese sounded fine. Unfortunately, the model has a bit of trouble with articles and prepositions.
Try Swedish next time!!!!
As the saying goes, this is the worst the tech will ever be :). Can't wait till they add more voices to the expressive model.
@@Jarods_Journey Yes, it's a great way to look at it.
Only 9 seconds works.
It is cool, yeah, but the translation to Spanish is not that good lol.
Hello
"Am I the only one who sees this video in 360p quality?"
I have 1080p 🎉
When I started watching, it was at 360p too. Sometimes in the first minutes right after a video is uploaded, not all qualities are available. Check if you can change the quality in the settings.
From the new upload, too early for 1080! :)
@@Jarods_Journey Notification Squad!
Great video! Just for reference, the model is quite interesting; as a native Spanish speaker, I'd say the translation was decent, a bit robotic, but the pronunciation of Godzilla was quite weird. You have a great channel.