What are the use cases you want to see me building with Groq?
Are you working for groq now?
I need an AI girlfriend
Personal agent that 'sees' what you do on your computer / phone and helps with it. (By sending a screenshot to it)
Doing literally anything requiring intelligence beyond a basic best-case simple script.
I have worked with a non-profit that helps with fair housing problems. I think a good use case would be receiving calls for a business and helping the callers understand whether they have a real fair housing problem.
Would Groq be able to receive phone calls?
My first thought is how can we use this for scam baiting? We just need an elderly person's voice option to make the call and then prompt the AI to waste the scammer's time talking about gift card activation codes.
Until the AI conjures up real credit card information from within its data and then some unfortunate person's life savings are gone 😢
@@venim1103 yeah nah mate.
i like this idea!
I suppose it is the other way around^^ natural-sounding "people" will now scam old persons
It still costs tokens tho
🎯 Key Takeaways for quick navigation:
00:32 *🧠 Introduction to Groq's LPU (Language Processing Unit)*
- Introduction to Groq's LPU architecture designed specifically for AI inference.
- Explanation of the need for LPU in large language model inference.
- Comparison between LPU and other processing units like CPU and GPU.
05:37 *🔍 Comparison between CPU and GPU*
- Description of CPU as the central processing unit and its limitations in parallel computing.
- Explanation of GPU architecture, parallel computing power, and its expansion beyond gaming.
- Illustration of the difference between CPU and GPU through a painting demonstration.
06:05 *🔄 Limitations of GPU in Large Language Model Inference*
- Discussion on the limitations of GPU in handling large language model inference.
- Explanation of the complexities in achieving sequential execution on GPU.
- Overview of the latency issues and the need for complex control mechanisms.
09:47 *🚀 Groq's LPU Architecture and Performance Benefits*
- Introduction to Groq's LPU architecture designed for sequential tasks and low latency.
- Explanation of the simplified architecture and shared memory advantages.
- Discussion on the predictability and performance gains achieved with Groq's LPU.
11:37 *🗣️ Applications of Fast Inference Speeds*
- Exploration of potential applications such as real-time voice AI for natural conversations.
- Discussion on the reduction of latency enabling smoother interactions.
- Demonstration of real-time voice AI and its impact on user experience.
13:17 *🖼️ Utilization in Image and Video Processing*
- Highlighting the effectiveness of Groq for real-time image and video processing.
- Demonstration of image processing capabilities for various applications.
- Discussion on unlocking consumer-facing use cases with fast inference speeds.
14:40 *🤖 Building Real-time Voice AI with Groq*
- Discussion on building outbound sales agents using real-time voice AI.
- Introduction to platforms like Vee for integrating voice AI into applications.
- Demonstration of setting up a real-time voice AI assistant using Groq's model.
00:00 *📞 Setting Up Real-time Voice AI Cold Call Agent*
- Setting up a real-time voice AI cold call agent using Groq technology.
- Integration of voice AI capabilities into existing agent systems.
- Configuring API calls and server URLs for seamless communication between systems.
19:18 *🛠️ Integrating Real-time Voice AI with Existing Agent Systems*
- Demonstrates how to integrate real-time voice AI with existing agent systems.
- Setting up agent tools for making phone calls and receiving transcriptions.
- Configuring metadata and webhooks for seamless communication between platforms.
20:41 *📞 Configuring Call Functionality and AI Assistant*
- Configuring call functionality within agent systems for real-time voice AI interaction.
- Setting up dynamic message generation and personalized interactions.
- Defining schemas, URLs, and metadata for effective communication between systems.
Made with HARPA AI
Thanks, Jason for the great work!
Thanks a lot mate!
@@AIJasonZ use the ai to order pizza
This is one true gem of a video that focuses more on the use case. Thank you for breaking down the concepts really well and showing us a demo of its capabilities
Yes because we all want more cold calls from sales bots.
came here to also say this. Yech... Leave the calling to the humans, everything automated should have been an email.
Sure but what about more cold calls from better sales bots?
@@hiandrewfisher
Sales bot or human, whatever company still thinks in our time that cold calling is the way to go is beyond saving, and it should go bankrupt for its own stupidity. The bots will just speed up that process.
@@nikolaizaicev9297 I make 100k a year off of cold calls
@@nikolaizaicev9297 amen to that
Creating a UI questionnaire for non coder types to build applications to solve problems. Mostly business applications that might otherwise require a developer or consultant.
great video, Jason! thank you for the insights on how to build these flows
It's easy to see this will replace all call centers very soon. I assume they originally developed this chip for the new Tesla Autopilot software, which is mainly AI/video based.
They even added vocal fry to the woman’s voice for realism. * slow clap *
2:55 "In every frame 2 million pixels have to be generated"
This guy broke down graphics in a way that made sense, for the first time in 20 years.
Good for you ✌
Isn't true though, it just needs to update the pixels that are changing. And you don't render every pixel alone, but object by object.
@@danielchoritz1903 In graphics you are rendering every pixel. You're talking about video codecs, whole different ballgame.
These are amazing use cases!! Lowering the barriers of entry to do high quality business associated with big companies!!
Thanks Jason
The phone number thing is interesting... makes me fantasize about being able to have this as a replacement for the "leave a message after the beep" answering machines for your mobile if you don't pick up a call. A lot of people find leaving a message without having a conversation really awkward, so if you could instead connect to an AI assistant like this that actually talks to you, you could leave better messages, and the AI can summarize the conversation and send you a text message of the contents, or just leave its own summarized voice message.
nobody listens to answerphone messages, not since about 2007 I'd say haha
That’s a super amazing idea. Build it! You will become rich lol
You just described an AI secretary and yes this would be an amazing tool. Build it !!
With all this current technology it is possible to create a really cool AI girlfriend. And highly customizable.
@@abandonedmuse Launched it today and I'm still not rich lol
This is really interesting. Thanks for sharing, Jason.
I can't trust anything anymore! The demo at the end is very impressive.
This is so powerful but also scary. What will the world look like in 12 months, when all communication is driven by AI?
you would be busy scratching your balls, while AI does everything else.
You're incredible. Thanks for this Demo, Jason Sensei.
How wonderful, this is bound to improve trust among people and all of our lives. This is the best thing that science has wrought since industrialized warfare. Thank you, technology.
Many thanks for never bothering to define what LPU is actually an acronym for.
hey great video - can you do a full walkthrough of Relevance AI and how you set that agent up? It's not possible to follow from your video as it looks like you had some pre-defined steps in there. Or drop a link to the code you used to build this? thanks
17:17 That is so fast and seamless. Super cool.
The Sales Agencies after watching this video: „Ah f*** this sh*t, let‘s learn some new skills“
😂😂😂😂😂😂😂😂😂
I'll have to try this. I managed to get very fast, close to realtime speech with the ChatGPT API using a few queues and a local text-to-speech. The slowest part was the actual speech-to-text processing, I believe. I was using Whisper before they added all the new upgrades to the GPT API (this was when GPT-3.5 had just come out, basically).
It just processed two sentences to speech and put out the audio while it processed the next sentences. The issue was that Twilio made it very difficult to work with this since I needed to make it a stream, and that required some realtime communication protocol that worked over phone, so I just stopped and had my own little chat assistant. I'm a weeb. It was an anime girl AI assistant.
We did this too, some of the audio engines even give an output that tells you the realtime factor -> if it's less than one, it means you can generate the sentences faster than they can be spoken! Basically we used a queue and pipelining to reduce the mean time to first output.
I don't think you need these LPU things unless you're trying to use an online service that just bulk process a bunch of sentences.
super @@ultimape
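For anyone curious, here's roughly what that sentence-queue / pipelining idea looks like in Python. It's a minimal sketch: `synthesize()` and `play()` are hypothetical placeholders for whatever TTS engine and audio output you actually use. The point is that playback of one sentence overlaps with synthesis of the next, so time-to-first-audio drops to roughly one sentence's worth of TTS.

```python
# Minimal sketch of sentence-level TTS pipelining with queues.
# synthesize() and play() are placeholders for your own TTS engine / audio backend.
import queue
import threading

def synthesize(sentence: str) -> bytes:
    """Placeholder: call your TTS engine here and return raw audio."""
    raise NotImplementedError

def play(audio: bytes) -> None:
    """Placeholder: send audio to your output device / phone stream."""
    raise NotImplementedError

def tts_worker(text_q: queue.Queue, audio_q: queue.Queue) -> None:
    # Synthesize sentences as soon as they arrive, ahead of playback.
    while True:
        sentence = text_q.get()
        if sentence is None:          # sentinel: no more text
            audio_q.put(None)
            break
        audio_q.put(synthesize(sentence))

def playback_worker(audio_q: queue.Queue) -> None:
    # Play each clip while the next one is still being generated.
    while True:
        audio = audio_q.get()
        if audio is None:
            break
        play(audio)

def speak(sentences: list[str]) -> None:
    text_q: queue.Queue = queue.Queue()
    audio_q: queue.Queue = queue.Queue(maxsize=2)  # small buffer keeps latency low
    workers = [
        threading.Thread(target=tts_worker, args=(text_q, audio_q)),
        threading.Thread(target=playback_worker, args=(audio_q,)),
    ]
    for w in workers:
        w.start()
    for s in sentences:               # in practice, stream these from the LLM as they arrive
        text_q.put(s)
    text_q.put(None)
    for w in workers:
        w.join()
```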
the thumbnail of this video is really cool, the text looks like it sticks out.
If we can get ai bots to do this, surely we can get them filling up the comments sections of yt videos too with sensible yet gratuitous, meaningless, insincere comments. Get after it guys!
They have been filling it up, along with all social media, especially Twitter, oh and dating sites for a long time. Twitter is probably the worst. But sorry fellas, you're paying for a dating membership because of the AI girls they gave you. Happy for all the good relationships found from dating sites.
I wonder how many "Nigerian Prince" this thing could run in parallel? 🤔🤭
😂😂😂😂😂😂
1:31 "I haven't do exercise at all for the past 3...or 6 months..." 😂
Another awesome video with great presentation and overview. I give your videos as an example to many people to show them how to educate viewers about a particular thing: tell them what, why, and how, and then implement things in the easiest way possible.
Keep feeding us quality content buddy :-))
Thank you for covering this, we are building AI applications using Groq. Fast, cheap, and reliable.
Please check your video titled "okay, but I want GPT to perform 10x for my specific use case" - Here is how.
A lot of people, including me, can't work with the code. There are mistakes in it.
Just choose a T4 GPU and you'll see it for yourself.
Thanks for your attention
One effective use for voice agents, for now, is incoming calls from leads coming in from ads or website/social media. They have a real interest in the product/service beforehand. This is a viable use. As for cold calls - I do not think they are ready yet.
Loved this Jason!!! Thank you
Hi Jason, great content. I just have one remark concerning the demo: the video is being cut. It would be really nice if it was left intact, just to have an idea of the latency. Otherwise, nice video.
Hesu, Jason
The best channel
You grew so much
Since the first video
I love this moment
Where I am like,
Opening the feed,
Oh okay, Jason released a new video,
"Well, it's probably _Good As Always_".
...
Proceed to watch
...
ABSOLUTE PERFECTION
HANDS DOWN
MAJESTIC
INFORMATION BOILED DOWN LIKE A
BOOSTED MONKEY ANIMAL YOU ARE
NEVER HAVE I SEEN
THINGS PUT IN THAT MANNER TOGETHER
MUCH HARMONY
STRONG BALANCE
RESONANCE LEVEL?
DEEeeeeee
eeeeeeeeeee
eeeeeP.
From the Bottom of my heart,
With Love & Respect
Ivan
Can't wait to try this on some use cases I have in mind :D Great video as usual ;)
So not quite there yet or reliable enough but getting closer. Thanks for these insights!
This is awesome. I've seen a bunch of Voice AIs and all of them have terrible latency issues as well as obvious AI voices. Using Groq to get the latency way down and custom voices with PlayHT solves both issues. Thanks for sharing!
Thanks for this awesome content, first time on your page but this is great and simple to follow and understand!
As far as I know from the All-In Podcast, "Groq" wasn't particularly made to be the LPU, or language processing unit. It was built as a very parallel processor and had little use case until it turned out to be a perfect fit for LLMs.
The brown-skinned dude from the podcast who owns a stake in the "Groq" company also explained that they didn't have a compiler like Nvidia's CUDA, so they built one over the last year, as the company had been working on the idea for a while. It is more like the use case fits the product.
LLMs definitely haven't existed long enough for the chip to have been made specifically for them.
So even if LPU might be an adequate description right now, it rather looks like the chip picked up that profession while growing up/maturing.
Perfect timing interval for success:
- Later, and we would see another chip taking the spotlight, even if only a little later.
- Earlier, and the company might have gone bankrupt, if no use case were to be found.
The company wasn't built for LLMs, but mostly for providing processors specifically for machine learning use cases. The LLM wave was just something they were uniquely in a strong position to pursue, so they made a small, natural pivot.
It will be something when AI can interrupt a conversation correctly.
wow man this is incredible... holy molly!
Thanks Jason for the good work.
Great video, but just to clarify: GPU is Graphics Processing Unit not General Purpose Unit
Even if you are misleading with the idle cut times on the demo, it's impressive.
I'd love to know how much it cost for the demo you created in the video. There were a lot of parts there.
Twilio is free to set up, signup even gives you test $. A buck a month for a number. Vapi is probably free to get going, very cheap to keep up. WhatsApp looked local, GPT->API powered maybe.
God my manifestation skills went through the roof this time. Only 7 minutes from process start until this video magically materialized.
As a non-dev, I am _so_ looking forward to tools like these.
Looked like there was some cuts between when you finished speaking and when the bot starts speaking. Can we see the actual unedited version? I've had issues with groq getting to the first token.
Why are there cuts every time before the agent answers in the final demo? Was she perhaps taking more time to respond than the video shows?
The highly-anticipated tool use (aka function calling) feature for Groq API was released last week!
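For reference, a minimal sketch of what calling it might look like, assuming Groq's OpenAI-compatible chat completions interface - the model id, tool schema, and example function here are assumptions for illustration, so check the current docs:

```python
# Hedged sketch of tool use / function calling via the Groq Python SDK
# (OpenAI-compatible chat completions). Model id and tool definition are assumptions.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "book_call",
        "description": "Schedule a follow-up call with a lead",
        "parameters": {
            "type": "object",
            "properties": {
                "phone": {"type": "string"},
                "time": {"type": "string", "description": "ISO 8601 time"},
            },
            "required": ["phone", "time"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama3-70b-8192",  # assumed model id; pick whatever Groq currently hosts
    messages=[{"role": "user", "content": "Book a call with +1 555 0100 tomorrow at 10am"}],
    tools=tools,
    tool_choice="auto",
)

# The model returns tool calls instead of free text when it decides to use a function.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```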
It's not a new concept, LPUs have existed since GPUs... the thing with Groq is they have dedicated chips for the LPU. You train with GPUs, and execute LLMs with LPUs.
I'm excited to share my thoughts about Sora, Pika Labs, Runway ML and other amazing tools like Synthesia, Speechify, Suno plug-in, Copilot, Grok-1, Claude Opus, Gemini Ultra, ChatGPT, and more. Stay updated!
WOW! Amazing tutorial. Top 3 I've watched ever! Keep up the great work! 🎉
what other two?
Tell us what the other two are asap! Why are you treating us like that.
I would say Trelis Research has good content youtube.com/@TrelisResearch?si=oM1o4NaE30h2nI4y and learning wise all of Lex Fridman youtube.com/@lexfridman?si=yHJb1O-mzDYqS6c1
Seems like my replies to the questions were deleted by YouTube 😑
Really great synopsis
This makes what Vedal achieved with Neuro-sama even more impressive. He did all of that with pure code without any LLM or LPU
what do u mean ?
@@amrdeabes6338 search up Neuro-sama, it's an AI VTuber that finished 2023 as the most popular female streamer. Even though she's an AI, the way she talks and responds is insane, and her creator Vedal put a lot of work into the code. And I'm almost certain he didn't use a large language model.
@@amrdeabes6338 search Neuro-sama, she’s an AI vTuber
It will not be intelligent, just logical, there is a difference. Read up on AI 101.
I need one acting as my office assistant answering my phone calls.
I can build it for you for some money. Would you like that?
Great share. Seriously grateful for creators like you!
I think it'd be very cool to use this for on-demand mini language lessons. Imagine before you go into any situation where you will be able to use your target language you can set up a quick call with the AI and have it role-play a conversation with you. And you could iteratively improve your language skills per situation. And have transcripts to further work on with your flesh and blood language teacher.
Good stuff! Keep it up
As exciting as Groq is in voice AI, the LPU may also help video-generating AI models like Sora. Since Sora uses the same structure as an LLM, I believe it will help create videos faster and longer.
12:30 The ai voice sounds amazingly natural. All the lonely people will soon have a friend to talk to.
It doesn't tho, I don't agree, they all sound fake, every one of them... even tho we use them in various forms
@@PazLeBon they sound fake now. They won't in a year or less.
@@RhumpleOriginal oh for sure but 2 years of that numbnuts sound? oh dear :)
@@PazLeBon I don't mind waiting. Been wanting to make 3 games for 20 years. I can wait a bit longer for AI to get to that point. Will be fun choosing voices for my characters. Hell, Google has Genie now. Can't be much longer.
@@RhumpleOriginal there will be 56 billion games out, so gl
Awesome tutorial! The output seems to be conversation-aware. How can I train the voicebot so it will handle questions, and scripted answers the way I want it to? Would this be done in Groq? Your fitness caller did a great job and asked relevant questions to qualify you and give her an idea of where to go with the conversation...and the focus was on helping you and sales. Keep up the great work! I'm going to watch your video on how you built AI Agents for Research.
FYI AI cold calling is illegal in the US per the FTC. You WILL get fined into oblivion if you use any automation or AI to make AI-generated calls.
I can't wait for this technology to get better. I need AI agents for sales 😊
It's good enough now, why wait.
This is great but I just saw you cut the latency between your voice and the AI voice
Why did you edit your final demo to make responses appear faster than they actually are?
Great video, would be awesome if you could make one video of building a wrapper like this from scratch 😀
Excellent video! Keep up the good work.
Thank you for detailed, informative content 10/10
If we are utilizing AI for anything related to cold callers, it should be working on how to eradicate them.
can you also use AI to not have double audio when playing videos? would help. but right, what the world now needs is more cold robocalls to sell sh1t :D nah, that sucks.
that intro was gold
*This is a godsend for the Indian 🇮🇳 economy, now we will be able to 200x our call centres and the calls will sound a lot more professional.*
Okay, let's break this down step-by-step:
Given information:
- Usage: 240 hours
- Transcription provider: Deepgram ($0.01/min or $0.60/hr)
- Voice provider: ElevenLabs ($0.04/min or $2.40/hr)
- Model provider: GPT-3.5 ($0.02/min or $1.20/hr)
Step 1: Calculate the total minutes of usage.
Total minutes = 240 hours x 60 minutes/hour = 14,400 minutes
Step 2: Calculate the cost for transcription.
Transcription cost = $0.01/min x 14,400 minutes = $144
Step 3: Calculate the cost for voice.
Voice cost = $0.04/min x 14,400 minutes = $576
Step 4: Calculate the cost for the model.
Model cost = $0.02/min x 14,400 minutes = $288
Step 5: Calculate the total cost.
Total cost = Transcription cost + Voice cost + Model cost
Total cost = $144 + $576 + $288 = $1,008
Therefore, the total cost for 240 hours of usage with Deepgram for transcription, ElevenLabs for voice, and GPT-3.5 for the model will be $1,008.
so... $1,008 a month for 8 hours per day, is it cheaper than hiring someone?
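For what it's worth, the same arithmetic as a tiny script - the per-minute rates are the comment's assumptions, not verified pricing:

```python
# Reproduces the cost breakdown above; rates are the comment's assumed per-minute prices.
HOURS = 240
MINUTES = HOURS * 60  # 14,400 minutes

rates_per_min = {
    "transcription (Deepgram)": 0.01,
    "voice (ElevenLabs)": 0.04,
    "model (GPT-3.5)": 0.02,
}

costs = {name: rate * MINUTES for name, rate in rates_per_min.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
print(f"total: ${sum(costs.values()):,.2f}")  # $1,008.00
```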
thx man
fu.....king yes by the way..
of course its cheaper wtf
@@N7Tonik well... that was the previous way of doing things... you don't know how much cheaper it is today (October 2024), with streaming APIs we don't have to use other models, with almost 100 ms latency too...
cheaper, faster, grounded to a knowledge base, able to do function calls...
Do another application that shows real-time speed doing something that nobody expects. Maybe like a super fast poker bot / trading analysis / etc., something that is massive but done in 1 sec
amazing stuff 💯
grok grok groq, groq groq groq. how did you build the WhatsApp integration on Relevance AI? I don't see the option
groque :)
I loved your video!
Eleven labs conversational voice is so good for this. You should do it.
The jump cuts at the end make me question how responsive this is. After every question, time was trimmed to not show the lagged AI response.
Well done. It could be helpful for customer support actions
Amazing!
Awesome Video
Great explanation and example. Thank you very much.
Interesting AI, gonna give it a whirl on Monday with my turbo API keys
This workflow is insane for CRM.
That sounded JUST like you were talking to a real person! 😮
did it f
lol no it didn't
No
@@fredfred2363😋
25:42 watch the clock on the phone - several seconds of delay was edited out.
it's an interesting setup, but this is still far from the experience that's currently possible with an LLM.
I don't think LLMs are useful for this use-case, and probably won't be in the near future.
if you want something that's going to be natural to talk to, it needs to be trained and optimized specifically for real time conversations - Google had some tech some years back and briefly released some videos demonstrating real time conversations with an AI that was actually built for it, it was very convincing, would even interject "uhm" and "hmm" like a person. It was never released, so either they realized this was too likely to get abused, or they faked the videos.
I think it could be done - but not with an LLM. It's not as simple as just making them faster - people interrupt each other in real conversations, they make sounds of acknowledgement just to let you know they're listening, lots of behaviors that would throw off an algorithm... LLMs were just not designed to work this way.
Love this Jason, keep'em coming !!
I loved the Crysis reference hahaha
At 25:17 you can see the video is trimmed, which means the AI is kinda slow, so you cut some frames from the video. You betrayed me Jason
Please don't skip the part where you wait for a response on the call...
good demo
Wait until someone hooks the “Nigerian Prince” AI model. That's going to revolutionize scamming.
For functions, you can just write your own “AI function” similar to Marvin AI like we did in the Rust Auto GPT udemy course. So even though it’s not supported yet, we should be able to take a “hacky” approach
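Roughly, that "hacky" approach could look like the sketch below: describe the available function in the prompt, ask the model to reply with JSON only, then parse and dispatch it yourself. The function, prompt wording, and `call_llm()` placeholder are all made up for illustration, not taken from Marvin AI or the course.

```python
# Rough sketch of prompt-based "function calling" when an API has no native tool use.
# call_llm() is a placeholder; any chat-completions-style client would work behind it.
import json

FUNCTIONS = {
    "get_weather": lambda city: f"22C and sunny in {city}",  # stand-in implementation
}

PROMPT_TEMPLATE = """You can call exactly one function.
Available functions:
  get_weather(city: str) -> str

Respond with JSON only, in the form:
  {{"function": "<name>", "arguments": {{...}}}}

User request: {request}
"""

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM endpoint you use (Groq, OpenAI, local, ...)."""
    raise NotImplementedError

def run(request: str) -> str:
    raw = call_llm(PROMPT_TEMPLATE.format(request=request))
    spec = json.loads(raw)                 # may need retries if the JSON comes back malformed
    fn = FUNCTIONS[spec["function"]]       # dispatch to your own implementation
    return fn(**spec["arguments"])
```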
do you know any AI service that can fully interact with a browser, meaning doing Amazon research for me
Still does not quite sound human. Needs more variable pacing, volume, and emotion.
Have you spoken to some call centre people, eleven labs api..
Hey Jason, thank you for sharing! Any resources on connecting Relevance AI to WhatsApp Business?
The problem with Groq is that none of the LLMs it supports can handle real-life situations. They are inconsistent and generate mixed results, especially if you ask for the output in JSON. Even if you break down the problem into multiple steps, getting consistent results on each step is difficult.
If anyone has a suggestion, let me know.
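One common mitigation, sketched below with the model call left as a placeholder: validate the JSON against the keys you expect and retry, feeding the error back into the prompt. It doesn't make the model consistent, but it catches bad outputs before they propagate through your steps.

```python
# Sketch of validate-and-retry for JSON output; call_llm() is a placeholder for your model call.
import json

def call_llm(prompt: str) -> str:
    """Placeholder for the Groq/LLM call you already have."""
    raise NotImplementedError

def get_json(prompt: str, required_keys: set[str], max_tries: int = 3) -> dict:
    last_error = ""
    for _ in range(max_tries):
        hint = f"\n\nPrevious attempt was invalid: {last_error}" if last_error else ""
        raw = call_llm(prompt + hint)
        try:
            data = json.loads(raw)
            missing = required_keys - data.keys()
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except (json.JSONDecodeError, ValueError) as e:
            last_error = str(e)            # feed the failure reason back into the next attempt
    raise RuntimeError(f"no valid JSON after {max_tries} tries: {last_error}")
```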
Nobody picks up unknown cold calls anymore lol. The real use-case is in fast-food orders, where ppl prefer AI over Migranteese. E.g. Carls Jr.
The demo at 13:29 is a bit silly; there's nearly 30 seconds between when the image is uploaded and the outputs being shown, during which the server could be preemptively generating them. Not saying it's faked but it's the type of thing a company might do to make their method seem more impressive.