The ONLY FREE AI Voice Text-to-Speech YOU NEED!!! (Bark AI Full Tutorial)
- Published: 24 Jul 2023
- Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying.
©️ Bark is now licensed under the MIT License, meaning it's now available for commercial use!
⚡ 2x speed-up on GPU. 10x speed-up on CPU. We also added an option for a smaller version of Bark, which offers additional speed-up with the trade-off of slightly lower quality.
Suno Bark AI Official Repo - github.com/suno-ai/bark
Bark AI Google Colab - colab.research.google.com/dri...
Bark AI Speaker Prompts suno-ai.notion.site/8b8e8749e...
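For anyone who wants to try the smaller, faster variant mentioned above, here is a minimal sketch. It assumes the `SUNO_USE_SMALL_MODELS` and `SUNO_OFFLOAD_CPU` environment variables described in the Bark README, and that they must be set before `bark` is imported. `configure_bark_for_low_memory` is a hypothetical helper name, not part of Bark.

```python
import os


def configure_bark_for_low_memory(use_small_models=True, offload_to_cpu=True):
    """Set the environment flags that Bark is assumed to read when it is imported."""
    os.environ["SUNO_USE_SMALL_MODELS"] = "True" if use_small_models else "False"
    os.environ["SUNO_OFFLOAD_CPU"] = "True" if offload_to_cpu else "False"
    # Return the values that were set so the caller can log or check them.
    return {
        name: os.environ[name]
        for name in ("SUNO_USE_SMALL_MODELS", "SUNO_OFFLOAD_CPU")
    }


settings = configure_bark_for_low_memory()

# Import bark only after the flags are set, for example:
# from bark import generate_audio, preload_models
```

The trade-off is the one stated in the description: the smaller models run faster and need less memory, at slightly lower quality.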
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1littlecoder
I was just watching your previous video on the same topic when I stumbled upon this new video about the open-source text-to-speech tool and its usage in Colab. That's amazing! You're doing a great job with your content. Keep up the good work!
Glad it was helpful!
Bark is probably the best text-to-audio AI around when it works, but it's also super unstable, which means it only works without falling apart like 20% of the time. I hope it improves with time; Bark 2 or Bark 3 might be a winner.
Another good video, thank you!
The Spanish version sounds too robotic, but I guess it will get better and better over time.
I find this kind of tutorial very helpful, thanks.
Glad you liked it
Thank you for providing this video.
Great video! Can you show us how to use it for longform audio generation? Is there a colab for it, or a way to simply append the code in the provided colab? Help appreciated!
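One common way to approach long-form generation is to split the text into short chunks, generate each chunk separately, and join the audio. The sketch below is an assumption-heavy illustration, not the official method: `split_into_chunks` and `generate_long_audio` are hypothetical helpers, the 30-word limit is a guess based on the roughly 13-14 second limit reported in the comments, and the `v2/en_speaker_6` preset is just an example voice. Only `generate_audio`, `preload_models`, and `SAMPLE_RATE` come from Bark itself.

```python
import re


def split_into_chunks(text, max_words=30):
    """Split text at sentence ends, then pack sentences into chunks of at most max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than max_words is cut on word boundaries.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_words:
                chunks.append(" ".join(current))
                current = []
            current.extend(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


def generate_long_audio(text, history_prompt="v2/en_speaker_6", pause_seconds=0.25):
    """Generate each chunk with Bark and join the results (needs bark and numpy installed)."""
    import numpy as np
    from bark import SAMPLE_RATE, generate_audio, preload_models

    preload_models()
    silence = np.zeros(int(pause_seconds * SAMPLE_RATE), dtype=np.float32)
    pieces = []
    for chunk in split_into_chunks(text):
        # Reusing one history_prompt is meant to keep the voice similar across chunks.
        pieces.append(generate_audio(chunk, history_prompt=history_prompt))
        pieces.append(silence)
    if not pieces:
        return np.zeros(0, dtype=np.float32), SAMPLE_RATE
    return np.concatenate(pieces), SAMPLE_RATE
```

Word count is only a rough proxy for audio duration, so the limit may need tuning per voice and language.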
Thank you
Great stuff, thanks
You're welcome
Do you know if you can "update" the models to get better or train your own voice for bark?
There is some confusion about licensing at the start of this video. The source code of Bark is now licensed under MIT and can be used in commercial software. This has nothing at all to do with how you can or can't use output generated by Bark. At least in the US, AI generated content is automatically public domain content and that means anyone can do anything they want with it, regardless of how the source code of the generating software is licensed.
where do you get that idea? if you wrote the text, then it is copyrighted by you. You can't copyright the voice recording of it (I'm not sure there has been any ruling on this yet besides with MidJourney), but what you wrote is yours.
@@jamad-y7m I'm not sure what you're disagreeing with; any script you wrote would be copyrightable, but an ai generated audio of that script wouldn't be copyrightable. Neither of those has anything to do with Bark having an MIT license.
@1littlecoder, are there any tweaks to use BARK for real-time text-to-speech scenarios?
Please review a TTS system that is good for real-time speech generation from text. And I have an idea: if we could implement a pipeline for speech generation like in ChatGPT, that would benefit assistant-like systems.
Thank you. Well done and thorough. I appreciate that you used a notebook.
This is the beginning of something that will be great. 😉
man this was the perfect tts but the 14 second limit and long wait to process the tts make it horrible ;( hope there's a way to fix it
Thanks.
You're welcome
Could you please make a live-streaming tutorial, so that I can use what's in this video for a live-streaming voice?
I've got GTX 1650 Max Q and Ryzen 9, can I use or no point?
how to clone voice using suno bark ?
This text-to-voice is limited, right? Because it only lets me record a few seconds. But I want to turn my PDF into an audiobook. Please answer me, friend. Great channel.
looking to do the same thing!
Bark's idea of laughter is kinda maniacal :|
(the 1st example, that is)
I edited out the video clip where I said it's villainous. Maniacal is probably a better word :D
@@1littlecoder Next time, don't hold back. Let's call it as it is. If any intelligence, artificial or otherwise, gets offended then that's on them.
Is it possible to use this program together with a model I made using RVC?
Works for me, but no AMD GPU support, and CPU takes quite a while. Very stable for now, but it has a 13-14 sec limit :(
I can't use it for a long text? only 13 sec
How do I do this on an AMD graphics card? And can this clone other languages or just English?
I was not able to create a lengthy audio (more than 40 words). It shows an error of "WARNING:bark.generation:warning, text too long, lopping of last 66.7%". Any solution for this?
this is not good
Bro I am getting generate_audio() is not defined. Help me sort it out.
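A `generate_audio() is not defined` error usually means the import cell was never run, or the package is not installed in the current runtime. The sketch below shows both checks; `bark_is_installed` and `synthesize_to_wav` are hypothetical helper names, while the imports and calls inside follow the usage example in the Bark README.

```python
import importlib.util


def bark_is_installed():
    """Return True if a package named `bark` can be found in this environment."""
    return importlib.util.find_spec("bark") is not None


def synthesize_to_wav(text, out_path="bark_generation.wav"):
    """Generate speech for a short text and save it as a WAV file."""
    if not bark_is_installed():
        raise RuntimeError("bark is not installed; install it from the official repo first")

    # These imports are what define generate_audio; skipping them causes the NameError.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first use
    audio_array = generate_audio(text)
    write_wav(out_path, SAMPLE_RATE, audio_array)
    return out_path
```

In a notebook, re-running the cell that contains the imports after a runtime restart is often enough.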
Can you explain step by step targeting a layman?
So the voices are finally NOT randomized in Suno? Normally you'd get all sorts of different voices. Is it always the selected voice now?
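On voice selection: Bark accepts a `history_prompt` argument that biases generation toward a preset speaker, which is how the speaker prompts linked in the description are used. The sketch below assumes the `v2/<language>_speaker_<0-9>` naming from that speaker library and the language codes listed in the Bark README; `speaker_preset` and `speak` are hypothetical helpers. A preset steers the voice but may not guarantee an identical voice on every run.

```python
# Language codes as listed in the Bark README (an assumption; check the repo for the current list).
SUPPORTED_LANGUAGES = (
    "en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh",
)


def speaker_preset(language="en", index=6):
    """Build a preset name such as 'v2/en_speaker_6'."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no preset voices assumed for language {language!r}")
    if not 0 <= index <= 9:
        raise ValueError("speaker index is assumed to be between 0 and 9")
    return f"v2/{language}_speaker_{index}"


def speak(text, language="en", index=6):
    """Generate audio with a fixed preset voice (needs bark installed)."""
    from bark import generate_audio

    return generate_audio(text, history_prompt=speaker_preset(language, index))
```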
Tortoise TTS is way better. I've tried both
Do you know if it's possible to import our own model ?
Bro, the problem is we can't give it longer text; it gives an error.
Great tutorial, thanks
I can never get it to behave like you would want for production. Voices always changing, hallucinations, and inference not real-time (Which is what I would need it for). I just tried it again on a 4090 with same result.
Tortoise TTS is way better. I've tried both. But it's not very fast and takes 15 gb of GPU memory on standard preset
@@MautozTech yeah but can you get it to keep the same voice and produce long outputs?
@@thedoctor5478 I will test long outputs later, but voice stays the same. I can send you an example, my mail is in the channel description
Why not [MAN] as described in the docs?
How can we change the rate in Colab?
Make a video on best speech to text AI as well..
Can I use this for YouTube videos for free, YouTube videos that have ads?
Not good enough yet. I need a TTS that runs in real time on my phone without an internet connection.
I guess we are just chillin' for the next one to three years. You and me. Chillin.
Bro, I am Rohan. I'm studying CSE, final year. Please give me a project title or idea.
Can Bark run convincingly on an android phone? Has anyone tried to test this yet?
Yes, with google colab
Can I use this in phone
Did you try NeMo TTS models? Specifically Tacotron2Model and HifiGanModel seems to work much better and faster than Bark.
how to add kurdish language for text to speech
Hi, thanks for sharing, can you also do it offline, i mean by just using a VS code?
Yes, of course, If you have a GPU
Why does Bark generate audio so slowly? It takes about 3 minutes to generate on my machine. Is this a normal duration?
It depends upon the configuration of your machine and length of the sound
Why am I getting an error?
It's limited; it only creates 13 seconds of audio.
As much as I like Bark, I still prefer MS/Azure TTS for a nice balance of quality and speed. It's near realtime on a local machine, and even faster than Eleven Labs.
Bark is okay, but it pretty much needs to run on a dedi machine tweaked for speed, and ideally with streamed/cache response to make it viable for conversational-NLP.
You can run Azure neural/AI voices locally? I'm not seeing anything about that.
@@blisphul8084 I've run MS voices locally before, but obviously the connected versions are newer/better (and some require subs and some don't). It really depends on your needs. Like any model, if it's small enough, you can run it.
Windows itself has embedded TTS, but it's unclear how extensible it is offline.
@@interspacer4277 I hope Microsoft can improve their Azure AI voices. I don't know if it's just me, but I do think their voices are falling behind Eleven Labs and other companies' products.
Bad at voice cloning. Doesn't have built-in support for voice cloning. There are some extensions for that, but they're not good.
Is there an api to do curl and http requests?
elevenlabs.io/?via=1lc for now!
Bro, how do I download the generated audio from Google Colab to my local machine?
If you right click there you'll see save as option
@@1littlecoder Yes i made it. Thanks man.
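Besides the right-click option, the audio can be written to a WAV file and downloaded from there. The sketch below uses only the Python standard library to write the file; `write_pcm16_wav` is a hypothetical helper, the 440 Hz tone stands in for Bark output, and 24000 Hz is assumed to match Bark's `SAMPLE_RATE`. The Colab download call in the final comment works only inside a Colab runtime.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against out-of-range values
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return path


# Stand-in for Bark output: one second of a quiet 440 Hz tone.
demo_rate = 24000
tone = [0.2 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
demo_path = write_pcm16_wav(tone, demo_rate, os.path.join(tempfile.gettempdir(), "bark_demo.wav"))

# In Colab only, the saved file can then be sent to the browser:
# from google.colab import files
# files.download(demo_path)
```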
That is - no quantized version?
Nope they're the full one.
quality is not usable... personally i would never put this voice on the video...
Bark is cool, but sadly kinda slow! 😅
can we use it for CV
Do you mean Computer Vision ?
@@1littlecoder curriculum vitae 😁
12:32 Translation: “Your colleague thinks that your German is extremely bad. But I suppose your English isn’t terrible!”
It's not better than tortoise TTS
Also, there's tortoise-tts-fast, but bark is very, VERY promising.
@@gabluz currently their voice cloning and TTS are far inferior to Tortoise, but let's see if they improve it
@@topcca bark is also incredibly slow. That's disappointing.
@@gabluz yes, but for a reason.. there is always a balance between quality and quantity.. if you want good quality results, it will take more time
kept saying Bart too lol
Honestly I don't know why maybe because I'm using Bard a lot or don't know. Stupid mistake
@@1littlecoder you're good brother i meant I kept saying bart when reading bark haha
@@fractalarbitrage Oh I did that too. Edited it out in many places.
Sometimes the voices are bad
Very bad tutorial