How to Build an AI Voice Agent using OpenAI Realtime API
- Published: Oct 7, 2024
- In this video, I will show you how to build and deploy an AI Voice Agent using OpenAI's new Realtime API (takes 10 min!). This agent will take bookings and send data to Make.com, where you can then run any of your other automations. I give you the full code from my GitHub repo. I also show you step-by-step how to set up Replit and deploy on it so the agent is always live, how to plug in Twilio so you have a phone number that calls your AI agent, and how to connect Make.com. This is a beginner-friendly tutorial.
Want me to build this for you?
👉 Contact me on bart@supportlaunchpad.com
🗂️ Github repo: github.com/Bar...
This is a good demo - looking forward to part 2.
🙏🙏
Thank you for sharing, Bart! Amazing! I'm already waiting for part2! Keep going.
Thank you my man 🙏 Will make part 2 soon!
Thank you so much! I'm learning to code and had a hard time integrating Twilio. Your video made my day!
thank you legend 🙏
So good, thank you for sharing. Subbed and looking forward to rag and function call future videos!
awesome! Thank you :)
Amazing content, man! Please do another one with RAG and function calling
you got it 🤝
yes we would love to see more of these kind of videos
you got it! will work on more :)
Great project. Thank you for that! It would be great to see a part 2 with a bidirectional connection to any calendar. I'm also waiting for a better version of the Realtime API, since this version can't really be used for Czech, at least (it makes a lot of mistakes).
Díky! Interesting about Czech not being so good yet, but yeah I bet it will improve soon. Keep at it legend :)
As instructed, I liked this content, turned me into a new Sub. Standing by for a part 2-100 🙂Thank you, sir, for the education and value bombs you are dropping.💪🗿🔥🦅👊
my good man, thank you for the support 👊
Thank you bro, great content! subscribed ✅
thank you legend 🤝
It's quite interesting to see OpenAI releasing in 2024 a technology that Google demonstrated at Google I/O 2018 under the name Duplex, where an AI was already able to place a phone call and sounded so real. Google never released that API to the masses and is again late to the show in 2024.
wow I didn't even know this. Lucky there are other companies bringing out cool stuff and releasing to the public 💪
Hi Bart! Excellent tutorial. I have a question about Replit. How does the pricing work? I'm currently working with the Assistants API. Thank you!
thank you :) So you pay $25 a month (month-to-month plan) and you get $10 in credits each month. If you're just starting out with Replit, I don't think you'll go over this limit. I've been using Replit for like a year now and have deployed lots of things, done lots of testing, and have not yet gone over. I think if you get lots and lots of users then you'll use those credits up quickly. Hope this helps legend!
@BartSlodyczka Any alternatives to Replit?
Hey man thanks for the video, I would like to know if we can use a platform other than twilio to do this, because Twilio does not support many countries?
great question, I think so but I haven't looked into it yet. What other platforms do you know that support more countries?
Great video! Is it possible to also use the OpenAI voice assistant for the initial greetings message? I don't like the switch between the twilio tts voice and the openai realtime voice.
Found it:
- remove the `` tag from the twilio stream connection
- change openAi ws open listener:
```
openAiWs.on("open", () => {
  console.log("Connected to the OpenAI Realtime API");
  // Configure the session first (sendSessionUpdate is defined elsewhere in the code).
  setTimeout(sendSessionUpdate, 250);
  // Then add a user "Hello!" and request a response so the agent speaks first.
  setTimeout(() => {
    openAiWs.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [
          {
            type: "input_text",
            text: "Hello!",
          },
        ],
      },
    }));
    openAiWs.send(JSON.stringify({ type: "response.create" }));
  }, 500);
});
```
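The two messages sent in the listener above could also be factored into a small helper, roughly like this. This is only a sketch: `buildGreetingEvents` and `sendGreeting` are made-up names that are not in the repo, and the event shapes are simply copied from the snippet above.

```javascript
// Hypothetical helper (not from the repo): builds the two Realtime API events
// that make the assistant speak first. Shapes mirror the snippet above.
function buildGreetingEvents(text = "Hello!") {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    },
    // Ask the model to respond to the item that was just added.
    { type: "response.create" },
  ];
}

// Sends each event over any WebSocket-like object that exposes send(string).
function sendGreeting(ws, text) {
  for (const event of buildGreetingEvents(text)) {
    ws.send(JSON.stringify(event));
  }
}
```

Inside the `open` listener, the second `setTimeout` body would then shrink to `sendGreeting(openAiWs, "Hello!")`.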
Very good video! When I was testing with Twilio's dev phone, I found an issue. We are unable to interrupt the conversation directly, like we can when using OpenAI Realtime. How should this problem be resolved?
Yes I think Twilio is working on this ATM. I'll see if I can find more info :)
Nice work! I have been wondering how to have a phone number be used for stuff like this! Thanks
this is awesome to hear :) thank you!
Thank you! This is my first JS code and it is working. I tried to rework it in Russian :) It works pretty well, but the first message is read with a heavy accent :) How can I change the system message? I guess it takes it from my account - the default message.
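One possible way to change the system message, sketched under assumptions: the Realtime API accepts a `session.update` event, and its `session.instructions` field carries the system prompt. The helper name `buildSessionUpdate`, the prompt text and the `"alloy"` voice are placeholders, and the repo may wire this up differently (the snippet earlier on this page only shows that a `sendSessionUpdate` function exists).

```javascript
// Sketch: build a session.update event that sets the system message.
// The field names are Realtime API session fields; the values are placeholders.
function buildSessionUpdate(instructions, voice = "alloy") {
  return {
    type: "session.update",
    session: {
      instructions, // the system message the model should follow
      voice,
      input_audio_format: "g711_ulaw", // assumes Twilio phone audio
      output_audio_format: "g711_ulaw",
      turn_detection: { type: "server_vad" },
    },
  };
}

const update = buildSessionUpdate(
  "You are a polite booking assistant. Always reply in Russian."
);
// Once the socket is open: openAiWs.send(JSON.stringify(update));
```

If the prompt lives in a constant in the code, editing that constant is probably the simplest route; the event above is what ends up being sent either way.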
onya mate, node, webhooks, whisper transcripts, logging, right URLs, deploying, live! boom! we're already in your debt... sweet as! I think most RAG implementations are in Python; you may not need it if there are fewer than 250 pages of text, you just need a large context window for an outfit like Bert's automotive! I did "RAG: Beyond Basics" from Prompt Engineer, I strongly recommend it!
thank you legend! excellent recommendations, hooroo 💪
Super cool brother, I am making such cool things with ChatGPT's text API, I can't wait to try this! I don't even know how to code and I can do this!
thank you my man, this comment makes me so happy 💪
thank you! I want to see more!
wicked - will whip something up soon 💪
Unbelievable great code
💪💪
Very cool! Quick question: were you able to get barge-in to work in your version? With the Twilio version I tried, I was unable to, and the Twilio author stated it was a known issue and they are looking into it.
thanks my man! I haven't tried to do barge-in yet, but if Twilio said it's a known issue then maybe it's not possible just yet? But I imagine they'd fix it quickly considering they are the main partner for voice integration into the Realtime API. I'll probably make another video with more features in the coming days and I'll suss out the barge-in stuff too 💪
@BartSlodyczka I'm hoping so. Looking forward to your functions and rag videos! 👍
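For reference, a rough sketch of how barge-in is commonly approached. It is not a confirmed fix for the issue discussed above. When the Realtime API reports `input_audio_buffer.speech_started` (a server VAD event), the server can ask Twilio to drop its buffered audio with a Media Streams `clear` message and ask OpenAI to stop generating with `response.cancel`. The function name `bargeInMessages` and the `streamSid` argument are hypothetical, not taken from the repo.

```javascript
// Sketch: decide what to send when the caller starts talking over the agent.
// The messages are returned rather than sent, so the logic can be tested
// without a live call.
function bargeInMessages(openAiEvent, streamSid) {
  if (openAiEvent.type !== "input_audio_buffer.speech_started") {
    return { toTwilio: [], toOpenAi: [] };
  }
  return {
    // Twilio Media Streams: discard audio that is queued but not yet played.
    toTwilio: [{ event: "clear", streamSid }],
    // Realtime API: stop the response that is currently being generated.
    toOpenAi: [{ type: "response.cancel" }],
  };
}
```

In the handler for OpenAI messages, each entry of `toTwilio` and `toOpenAi` would be sent with `JSON.stringify` on the matching socket.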
I would like to use local microphone or iphone app to talk to the local PC server so that way we can skip calling/using Twilio? Any recommendation how?
Thanks for sharing. I was wondering: instead of it being a phone call, how can the Realtime API be accessed by pushing a button in an app, like one made with React Native? That way it can also serve other use cases, and the audio is better.
very interesting idea 🤔
thanks man highly appreciated
Thank you my man 💪
This is very impressive
thank you 🙏
This is great! Any ideas how to use phone numbers for other countries? I have an online store located in Ukraine. It would be great to receive incoming calls when no one is in the office.
Thanks! Great question, I'll suss it out and see if I can have some solutions for my next vid :)
Hi Bart. What about outbound calls? Can we also handle outbound calls using the Realtime API?
I haven't tried yet but I feel like yes, I'll look into it and make a follow up vid if i figure it out :)
Great content.
thank you 🙏
If I may ask, so the OpenAI API costs $3 per 10-minute call?
Imagine a business handling 10-minute inbound calls, on average, from 1,000 people 🤦‍♂️
Today, tomorrow it will be cheaper, and the day after cheaper than a person.
@thomasjamesbailey1209 Thank you for your answer, my grandma knows that. I was just clarifying the pricing "in the moment", not tomorrow, the day after, or hundreds of years later.
I think $3 per 10-minute call is still cheap, considering all the costs and operations that go into hiring someone. Costs: salary + medical/ salary taxes + subscription costs (ie the business uses SAAS products and each person needs a seat) + sick days + etc. Operations: hiring + training + need a team manager + etc. From a cost and operations POV - I think business owners would be happy to pay considering how easy it is and how little overhead they have. Hope this kind of context helps :)
@BartSlodyczka $3K will pay for 1,000 customer service agents in my country, Indonesia, for 10 hours. And Indonesia is even #16 in GDP, with income per capita around $5K.
You can literally pay $3 for 100 human customer service agents to talk for 10 minutes each in some poor countries.
When I say expensive, I mean globally. Not in the US with the #1 GDP.
@elpablitorodriguezharrera Very good points. At $3 for 100 humans × 10 min each, that is 1,000 minutes, or 16.67 hours, which works out to about 18c per hour. Now I see your point. I guess it then comes down to the business and where the employees are located. Either way, appreciate the time taken to explain your point, I learned something new 🤝
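The arithmetic in the reply above can be checked with a few lines. This is just a sketch: `costPerAgentHour` is a made-up helper, and the inputs are the figures quoted in this thread ($3, 100 people, 10 minutes each).

```javascript
// Effective hourly rate when `totalCost` dollars buy `agents` people
// talking for `minutesEach` minutes each.
function costPerAgentHour(totalCost, agents, minutesEach) {
  const totalHours = (agents * minutesEach) / 60;
  return totalCost / totalHours;
}

// 100 humans x 10 min = 1,000 min = 16.67 h for $3
const humanRate = costPerAgentHour(3, 100, 10);
// A single 10-minute AI call for $3, expressed per hour of talk time
const aiRate = costPerAgentHour(3, 1, 10);

console.log(humanRate.toFixed(2)); // 0.18
console.log(aiRate.toFixed(2)); // 18.00
```

On these numbers the AI call comes out around 100 times more expensive per hour of talk time than the human rate quoted above, which is the commenter's point.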
How many paid-for services were needed to support this app?
You need to pay for the OpenAI API; otherwise, you can start with Replit, Twilio, and Make.com for free
@BartSlodyczka Tx for that! TBH you do seem to need to be an existing experienced web dev to handle this stuff. (I'm an embedded realtime dev .. a very different world)
@coldlyanalytical1351 If I can do it, I believe you can do it too 💪 I started learning how to code and dev around 1.5 years ago (started from absolutely zero) and I'd attribute my core success to (1) just believing I could do it and (2) lots of practice and ChatGPT prompting. You already have the foundational skill set and are probably a lot closer to becoming experienced in this area than you think. You got this, you are legend 🤝
Thank you Bart! Great explanation, made it easy to understand. I’m waiting for your next videos to see how you implement KBs and functions ❤
@Omri.Tal. thank you legend 💪 will get the next vid out soon :)
thanks :)
🙏🙏
Does the AI agent have the ability to remember returning callers?
this is a 10/10 suggestion holy shmoly. Will look into this for the next vid. WOW
Can you make the OpenAI bot talk first? Twilio's greeting voice sounds different from the OpenAI voice.
great idea! will look into this :)
More pls
will do 🤝
can you add function call to the bot?
will do for the next vid 🚀
how much does it cost per minute on average?
On average it costs around $0.06 per minute for audio input and $0.24 per minute for audio output, so $0.30 per minute if you're using both audio input and output
@BartSlodyczka I know the OpenAI website says that. But I was asking how much it cost you in your demos. $0.30/minute doesn't seem realistic, as you will not have 2 people speaking at the same time. I mean, at any given time the AI will be either listening or speaking, not doing both. What do you say?
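The point about listening versus speaking can be put into numbers with a small estimator. This is a sketch under stated assumptions: the per-minute prices are the ones quoted in the reply above, `estimateCallCost` and `aiTalkShare` are made-up names, and the model ignores silence, text tokens and anything else that may be billed, so real invoices can differ.

```javascript
// Per-minute audio prices quoted in the reply above (USD).
const INPUT_PER_MIN = 0.06;
const OUTPUT_PER_MIN = 0.24;

// Estimate the audio cost of a call. `aiTalkShare` (0..1) is the assumed
// fraction of the call during which the agent is speaking; the rest is
// treated as caller audio.
function estimateCallCost(minutes, aiTalkShare) {
  const outputMinutes = minutes * aiTalkShare;
  const inputMinutes = minutes - outputMinutes;
  return inputMinutes * INPUT_PER_MIN + outputMinutes * OUTPUT_PER_MIN;
}

console.log(estimateCallCost(10, 0.5).toFixed(2)); // 1.50
console.log(estimateCallCost(10, 1).toFixed(2)); // 2.40
```

Under this simplified model, a 10-minute call split evenly between caller and agent costs about $1.50 rather than $3.00; the $0.30 per minute figure corresponds to a full minute of input plus a full minute of output.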
Idk why, but no matter how much I try to prompt or tweak parameters, the API voice sounds so monotone and is bad at taking speech directions compared to the ChatGPT voices. Why is that?
Yeah I agree, right now it's not the best sounding, but I'm sure in time it will get better. when it does, we will be ready 💪
@BartSlodyczka fr. In reality this will likely birth an entirely new protocol / web framework. Feels like the early days of the internet era, where things are being bootstrapped and established for the first time ever
@jalengonel such an exciting time man, such an exciting time
gj
thank you 💪
That's similar to a tutorial Twilio posted a few days ago, but it's going to cost some money to run in production
Yeah Twilio had a great tutorial and this is very similar :)