Part 1: How to Build an AI Voice Agent using OpenAI Realtime API
- Published: Feb 9, 2025
- WATCH PART 2: • Part 2: How to Build a...
WATCH PART 3: • Part 3: How to Build a...
In this video, I will show you how to build and deploy an AI Voice Agent using OpenAI's new Realtime API (takes 10 min!). This agent will take bookings and send data to Make.com, where you can then run any of your other automations. I give you the full code from my GitHub repo. I also show you step by step how to set up Replit and deploy on it so the agent is always live, how to plug in Twilio so you have a phone number that calls your AI agent, and how to connect Make.com. This is a beginner-friendly tutorial.
🚀 Sign up to Replit using my link: replit.com/ref...
📺 Watch the ENTIRE series: • OpenAI Realtime API Vo...
📺 AI SMS Assistant: • How to Build an Advanc...
📋 Take This Quick Survey: forms.gle/otAr...
🛠️ Need this built? Contact: bart@supportlaunchpad.com
🗂️ Github repo: github.com/Bar...
👉 LinkedIn: / bartlomiejslodyczka
Learn AI & Coding:
Try Scrimba's AI Engineer course (20% off Pro plan with my link):
v2.scrimba.com...
Other related videos: • Exploring OpenAI's New...
#openai #realtimeapi #maketutorial #replit
Note: Affiliate links support this channel through commissions.
📺 Watch Part 2: ruclips.net/video/ffDm4HVGuTM/видео.htmlsi=W1nfLYgj3zsQ0RWW
📺 Watch Part 3: ruclips.net/video/oQtBwhRLrT4/видео.htmlsi=o56i5609Zp8Ko3eG
🗂 Github repo: github.com/Barty-Bart/openai-realtime-api-voice-assistant
5x NEW VOICES just released: ruclips.net/video/PTCpw1Y9HOQ/видео.htmlsi=roHjjllMKNHNzLGu
📺 AI SMS Assistant: ruclips.net/video/HYPw8TfL2Pg/видео.htmlsi=CVAzhuQzsXH5T2Wa
📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7
Yes Please! Looking forward to the next episode of your AI Voice Agent Build! Thanks for your effort in making this Vid, Bart!
thank you :) appreciate that! next vid will be out by end of this week :)
This is a good demo - looking forward to part 2.
🙏🙏
Thank you for sharing, Bart! Amazing! I'm already waiting for part2! Keep going.
Thank you my man 🙏 Will make part 2 soon!
Great video. I was doing a similar thing here with VAPI. VAPI is more complicated, but it sounds way more realistic. This one sounds very robotic. It was eye-opening for me that you created an assistant in a completely different way.
So many possibilities out there I'm also often surprised :)
Really good, looking forward to PT2, hopefully we get it soon :)
thanks! coming out end of week :)
Superb demo Bart, you're one of the best in the game right now
Thanks Vi-Lo 🤼‍♂️
This works great and the set up was a breeze, Thank You!
noice!!!
Thank you so much! I'm learning to code and was having a hard time integrating Twilio. Your video made my day!
thank you legend 🙏
Can't wait for part 2 !
will be ready by end of week :)
yes we would love to see more of these kind of videos
you got it! will work on more :)
So good, thank you for sharing. Subbed and looking forward to rag and function call future videos!
awesome! Thank you :)
This is so extremely useful. Thank you!
thanks!!
Legend. Can't wait to see more about it!
hell yeah!!
Amazing content, man! Please do another one with RAG and function calling
you got it 🤝
amazing stuff. waiting for v2
thanks! will be out by end of week :)
As instructed, I liked this content, turned me into a new Sub. Standing by for a part 2-100 🙂Thank you, sir, for the education and value bombs you are dropping.💪🗿🔥🦅👊
my good man, thank you for the support 👊
Great project. Thank you for that! It would be great to see a part 2 with a bidirectional connection to any calendar. I'm also waiting for a better version of the Realtime API, since this version can't be used for the Czech language at least (it makes a lot of mistakes).
Díky! Interesting about Czech not being so good yet, but yeah I bet it will improve soon. Keep at it legend :)
Thank you. Great video. Super helpful.
thanks legend 🤝
Thank you very much, good stuff and very helpful!
thanks man!!
It's quite interesting to see OpenAI releasing in 2024 a technology that Google demonstrated at Google I/O 2018. It was called Duplex: an AI that was able to make a phone call and sounded so real. Google never released that API to the masses and is again late to the show in 2024.
wow I didn't even know this. Lucky there are other companies bringing out cool stuff and releasing to the public 💪
Nicely explained. Thank you
Thanks legend!
Excellent video. Looking forward to enhancements
thank you very much :) I have Part 2 and Part 3 out on my channel that you can watch 💪
Awesome man! Thank you!
thanks legend!
Great video!
thanks legend!
Thank you bro, great content! subscribed ✅
thank you legend 🤝
Thanks, it's super helpful.
my pleasure :) you should watch the part 2!
Thanks! this is a good demo of capabilities. When is part 2 coming out?
thanks! Part 2 coming out later today :)
Would be awesome to see RAG and function calls! Please do this!
I've got a series on this that you should check out: ruclips.net/p/PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ
onya mate, node, webhooks, whisper transcripts, logging, right URLs, deploying, live! boom! we're already in your debt... sweet as! I think most RAG implementations are in python; you may not need RAG if you have less than 250 pages of text, just a large context window, for an outfit like Bert's automotive! I did RAG: Beyond Basics from Prompt Engineer, I strongly recommend it!
thank you legend! excellent recommendations, hooroo 💪
thank you! I want to see more!
wicked - will whip something up soon 💪
Nice work, thanks. What do you suggest to change for a different input / output language communication?
I would edit the main prompt. If you watch the later videos in this series (go to the playlist in video description) you'll see a video where you can add a custom first message into the agent upon starting a new call. You can also set language here if you like
What about the privacy matters on an integration like this? Do you have any details at all? Like for instance, this service is a paid service, that means my data will be held or not?
Always safe to assume it will be held!
Great video! Is it possible to also use the OpenAI voice assistant for the initial greetings message? I don't like the switch between the twilio tts voice and the openai realtime voice.
Found it:
- remove the `<Say>` tag from the twilio stream connection
- change openAi ws open listener:
```
openAiWs.on("open", () => {
  console.log("Connected to the OpenAI Realtime API");
  // Configure the session first
  setTimeout(sendSessionUpdate, 250);
  // Then add a user "Hello!" and request a response, so the OpenAI voice speaks the greeting
  setTimeout(() => {
    openAiWs.send(
      JSON.stringify({
        type: "conversation.item.create",
        item: {
          type: "message",
          role: "user",
          content: [
            {
              type: "input_text",
              text: "Hello!",
            },
          ],
        },
      })
    );
    openAiWs.send(JSON.stringify({ type: "response.create" }));
  }, 500);
});
```
Yeah it is, figured out a golden nugget for this, video 2 coming end of this week!
ah you got it anyway!!! nice
Unbelievable great code
💪💪
Thanks for such a wonderful tutorial ❤️.
I'm wondering if it is possible to check availability on Google Calendar before booking. Is this going to replace Vapi, or can we use the Realtime API within Vapi or other similar platforms? ❤
thank you legend :) In the Part 2 and Part 3 videos I explain how to connect the AI caller to make.com. And from within make.com you can connect to google calendar modules. If you watch those vids I also give you the make.com blueprint to get you started 💪 I think OpenAI maybe won't replace platforms like Vapi, but it will be a great alternative option. Good to learn it :) Keep it up man!
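For anyone wondering what "connect the AI caller to make.com" can look like in code, here is a minimal sketch, assuming a Make.com scenario that starts with a custom webhook. The URL, the `buildBookingPayload` and `sendToMake` names, and the payload fields are all placeholders for illustration, not the ones from the repo or the blueprint. It relies on the global `fetch` in Node 18+.

```javascript
// Hypothetical webhook URL: replace with the one Make.com generates for your scenario.
const MAKE_WEBHOOK_URL =
  process.env.MAKE_WEBHOOK_URL || "https://hook.example.com/your-webhook-id";

// Build the JSON payload. The field names are illustrative assumptions.
function buildBookingPayload({ customerName, requestedTime, notes }) {
  return {
    customerName: customerName.trim(),
    requestedTime,
    notes: notes || "",
  };
}

// POST the booking to the webhook and fail loudly on a non-2xx status.
async function sendToMake(booking, url = MAKE_WEBHOOK_URL) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBookingPayload(booking)),
  });
  if (!response.ok) {
    throw new Error(`Webhook returned HTTP ${response.status}`);
  }
  return response.text();
}
```

From there, the Make.com scenario can pass the fields on to a Google Calendar module to check availability or create the event.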
Great video. Thanks very much! I'm super curious how it would be possible to add RAG support and how well it would work with getting high quality output and low enough latency.
thanks! check out part 2 with RAG here: ruclips.net/video/ffDm4HVGuTM/видео.htmlsi=zyOJMMPYuiY2rdSZ
This is very impressive
thank you 🙏
thanks man highly appreciated
Thank you my man 💪
Hey man thanks for the video, I would like to know if we can use a platform other than twilio to do this, because Twilio does not support many countries?
great question, I think so but I haven't looked into it yet. What other platforms do you know that support more countries?
Super cool brother, i am making such cool things with ChatGPT's text api, I can't wait to try this! I don't even know how to code and i can do this!
thank you my man, this comment makes me so happy 💪
Great video, the arrow pointer for demonstration is very cool, how do you do that?
thanks man! It's a mac app called "DemoPro - Screen Annotation"
thanks for sharing. i was wondering, instead of it being a phone call, how can the realtime api be accessed by pushing a button in an app, like one that's made with react native? That way it can also serve other use cases and the audio is better.
very interesting idea 🤔
@@BartSlodyczka yeah, i found swift and kotlin implementations of realtime api but im still searching for react native implementation. do you know how to do that? the component that needs to be changed is the websocket to make it compatible with mobile
Great content.
thank you 🙏
Very cool! Quick question. Were you able to get barge-in to work in your version? In the Twilio version I tried I was unable to, and the Twilio author stated it was a known issue and they are looking into it.
thanks my man! I haven't tried to do barge in yet, but if twilio said its a known issue then maybe it's not possible just yet? but I imagine they'd fix it quickly considering they are the main partner for voice integration into the realtime api. I'll probably make another video with more features in the coming days and I'll suss out the barge stuff too 💪
@@BartSlodyczka I'm hoping so. Looking forward to your functions and rag videos! 👍
Nice work! I have been wondering how to have a phone number be used for stuff like this! Thanks
this is awesome to hear :) thank you!
Hi Bart,
How are you doing?
This project is amazing and I'm trying hard to connect it with my Yeastar S20 SIP device.
But so far no luck :(
Do you have any suggestion?
Hey my man, sorry I don't have any suggestions, I don't know what a Yeastar S20 sip device is and I don't use one 🙏
Many thanks!
I bought a local (Israeli) phone number from Twilio and tried to connect it to Vapi, but it won't come through... have you encountered any issues like this?
I haven't experienced anything like this, but also I don't use Vapi, sorry man!
Would love to see function calls - trying to call a fine tuned model
Done! Part 2 with function calls coming out today :)
Hello everyone, it has been a long time. I have tried an AI voice conversation, and it is very good for helping young children learn.
so great to hear :)
I've got one doubt. I've followed all the steps to install the IVR application but it doesn't run. I don't have a premium subscription for Replit or OpenAI, is that causing the problem? Can you please help me out?
I don't think you need a premium subscription to Replit, but you might need one for OpenAI. Go to your OpenAI account and check in your settings which models you have available, look for realtime api. The next problem will be if you are Tier 0 or Tier 1, you might have usage restrictions which will stop this from working too. Hope this helps :)
I am trying to test the IVR. It shows that error on ChatGPT and ChatGPT is not responding; it states that I need the premium version.
Hi Bart! Excellent tutorial. I have a question about Replit. How does the pricing work? I'm currently working with the Assistants API. Thank you!
thank you :) So you pay $25 a month (month to month plan) and you get $10 in credits each month. If you're just starting out with Replit, I don't think you'll go over this limit. I've been using Replit for like a year now and have deployed lots of things, lots of testing, and have not yet gone over. I think if you get lots and lots of users then you'll use those credits up quickly. Hope this helps legend!
@@BartSlodyczka Any alternatives to Replit?
I would like to use local microphone or iphone app to talk to the local PC server so that way we can skip calling/using Twilio? Any recommendation how?
haven't played around with local mic yet, but i have seen other tutorials where they might be doing this. let me know how you go?
Is there a way to add a custom voice ?
I don't think so at this stage
@ is there any other way by using any library? I have an saas app idea in mind
Can I use AWS Lambda instead of Replit? 😊
Yes 100% you can! I haven't used AWS before but I'm sure you can relatively easily convert to AWS
@@BartSlodyczka thanks bro I’ll try it there!
Hey Bart. What about outbound calls? Can we also handle outbound calls using the Realtime API?
I haven't tried yet but I feel like yes, I'll look into it and make a follow up vid if i figure it out :)
Awesome!
Thanks!
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Great video!
thank you! 💪
How do you manage to end the phone call? And close the websocket connection.
when you hang up the call the websocket will close :)
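If you want the cleanup to be explicit rather than implicit, here is a minimal sketch of one way to do it. Twilio Media Streams sends JSON messages with an `event` field (`start`, `media`, `stop`, and so on), so a handler can close the OpenAI socket when `stop` arrives. The `handleTwilioMessage` function and the `state` object are hypothetical names for illustration, and `openAiWs` is assumed to be any object with a `close()` method, such as a `ws` WebSocket. This is not the repo's code.

```javascript
// Handle one raw message from the Twilio Media Stream websocket.
// `openAiWs` is assumed to expose close(); `state` is a plain per-call object.
function handleTwilioMessage(raw, openAiWs, state = {}) {
  const data = JSON.parse(raw);
  switch (data.event) {
    case "start":
      // Twilio reports the stream id at the start of the call.
      state.streamSid = data.start.streamSid;
      break;
    case "stop":
      // The caller hung up: close the OpenAI side too.
      openAiWs.close();
      state.ended = true;
      break;
    default:
      // "media", "mark" and other events are ignored in this sketch.
      break;
  }
  return state;
}
```

In a server you would call this from the Twilio connection's `message` listener, and also close `openAiWs` in the `close` listener in case the stream drops without sending a `stop` event.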
Great video!!
Can this work in languages other than English?
thanks! yes, just make the prompt in your chosen language and speak in your chosen language -- your responses will be in that language too :)
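A minimal sketch of what "make the prompt in your chosen language" could look like, assuming the agent is configured through a Realtime API `session.update` event with an `instructions` field. The `buildSessionUpdate` helper and the base prompt text are made up for illustration; the repo's actual prompt and session settings may differ.

```javascript
// Build a session.update event whose instructions pin the reply language.
// `basePrompt` stands in for whatever system prompt you already use.
function buildSessionUpdate(language, basePrompt) {
  return {
    type: "session.update",
    session: {
      instructions: `${basePrompt}\nAlways speak and respond in ${language}.`,
    },
  };
}

// Example: the event you would send once the OpenAI websocket is open.
const exampleEvent = buildSessionUpdate(
  "Czech",
  "You are a friendly receptionist who takes bookings."
);
console.log(JSON.stringify(exampleEvent, null, 2));
```

You would send it with something like `openAiWs.send(JSON.stringify(exampleEvent))`. Writing the whole base prompt in the target language, as suggested above, is likely to help more than a single appended sentence.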
Would you please consider making a video like this but in python as well?
I'll keep this in mind, thanks for the recommendation 🙏
What would be the avg api cost, assuming the call volume could be in the 100,000s of minutes per day or month?
I haven't done any cost tests yet, but openai says roughly 30c per minute
@@BartSlodyczka smart pricing , very expensive , not affordable by small biz..
thank you! this is my first JS code and it is working. Tried to rework it in russian) works pretty well but the first message is read with a heavy accent) how can i change the system message? i guess it takes it from my account - the default message
nice work man! i haven't looked into the default message yet, but the system prompt does the trick for me atm. In the next vid I will upgrade my system prompt. keep up the good work man!
@@BartSlodyczka will it work with other services webhook?
can you add function call to the bot?
will do for the next vid 🚀
Does the AI agent have the ability to remember returning callers?
this is a 10/10 suggestion holy shmoly. Will look into this for the next vid. WOW
This is great! Any ideas how to use phone numbers for other countries? I have an online store located in Ukraine. It would be great to receive incoming calls when no one is in the office.
Thanks! Great question, I'll suss it out and see if I can have some solutions for my next vid :)
can you make the OpenAI bot talk first? twilio's greeting voice sounds different from the openai voice
great idea! will look into this :)
thanks :)
🙏🙏
how much does it cost per minute on average?
On average it costs around $0.06 per minute for audio input and $0.24 per minute for audio output, so $0.30 per minute if you're using both audio input and output
@@BartSlodyczka i know the openai website says that. But i was asking how much it cost you in your demos. 0.30/minute doesn't seem realistic as you will not have 2 people speaking at the same time. I mean at any given time the ai will be either listening or speaking, not doing both. What do you say?
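To put numbers on this exchange, here is a small sketch using only the two per-minute figures quoted above ($0.06 for audio input, $0.24 for audio output). The `estimateCallCost` function is a made-up helper, the rates are approximate, and treating each minute of a call as either input or output is the commenter's assumption, not a measured result from the demos.

```javascript
// Approximate rates quoted in the thread, in cents per minute of audio.
const INPUT_CENTS_PER_MIN = 6;
const OUTPUT_CENTS_PER_MIN = 24;

// Estimate the cost in dollars from minutes of caller audio (input)
// and minutes of AI speech (output).
function estimateCallCost(inputMinutes, outputMinutes) {
  const cents =
    inputMinutes * INPUT_CENTS_PER_MIN + outputMinutes * OUTPUT_CENTS_PER_MIN;
  return cents / 100;
}

// A 10-minute call split evenly between caller and AI.
console.log(estimateCallCost(5, 5).toFixed(2));
// The same call if both directions were billed for the full 10 minutes.
console.log(estimateCallCost(10, 10).toFixed(2));
```

Under these assumptions the evenly split call comes to about $1.50, or $0.15 per minute, while $0.30 per minute only applies if both directions are billed for the whole call. Real bills can differ, so check the usage dashboard after a few test calls.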
I copied this step by step but my assistant just does the welcome message and then hangs up. No error messages. Anyone else?
Were you able to sort this out? If you copied step by step then the code should be all good. I would check (1) do you have funds in your twilio account and does your twilio number allow calls (2) did you pass the correct replit URL into the Twilio webhook configuration? IE if you deploy your replit code, it is a different URL to when you test the replit code in development mode. LMK how you go!
Great tutorial! I am trying to get it to work better with interruptions (I want to be able to cut into the reply or correct something that was wrong), but it does not seem to respond to that? What am I missing? Found this in a forum : "To whoever is reading this in the future, I found a solution. Here is how I implemented it into my code: if response["type"] == "input_audio_buffer.speech_started": print('Speech Start:', response['type']) # Clear Twilio buffer …"
hey legend, someone posted a comment in my new Part 2 video and it had this code:
"Here is a simple update i made, to make the ai stop talking when the user is talking you need to add this:
if (response.type === "input_audio_buffer.speech_started") {
  console.log("Speech Start:", response.type);
  // Clear any ongoing speech on Twilio side
  connection.send(
    JSON.stringify({
      streamSid: streamSid,
      event: "clear",
    })
  );
  console.log("Cancelling AI speech from the server");
  // Send interrupt message to OpenAI to cancel ongoing response
  const interruptMessage = {
    type: "response.cancel",
  };
  openAiWs.send(JSON.stringify(interruptMessage));
}"
You can prob throw the full code into chatGPT, then give it the above snippet, and ask it to insert it. Hope this helps :)
Idk why it does this but why, no matter how much I try to prompt/tweak parameters, does the API voice sound so monotone and bad at taking speech directions compared to the ChatGPT voices?
Yeah I agree, right now it's not the best sounding, but I'm sure in time it will get better. when it does, we will be ready 💪
@@BartSlodyczka fr. In reality this will likely birth an entirely new protocol/ web framework. Feels like an early days of the internet era where things are being bootstrap established for the first time ever
@@jalengonel such an exciting time man, such an exciting time
Can you please do in python as well
interesting! I might do this in the coming weeks :)
If I may ask, so for the openai API, it costs $3 / 10-minute of call?
Imagine a business handling 10-minute inbound calls on average with 1,000 people 🤦‍♂️
Today, tomorrow it will be cheaper, and the day after cheaper than a person.
@@thomasjamesbailey1209 Thank you for your answer my grandma knows. I was just clarifying the pricing "in the moment", not tomorrow, the day after, or hundred of years later.
I think $3 per 10-minute call is still cheap, considering all the costs and operations that go into hiring someone. Costs: salary + medical/ salary taxes + subscription costs (ie the business uses SAAS products and each person needs a seat) + sick days + etc. Operations: hiring + training + need a team manager + etc. From a cost and operations POV - I think business owners would be happy to pay considering how easy it is and how little overhead they have. Hope this kind of context helps :)
@@BartSlodyczka $3K will pay 1000 customer service in my country Indonesia for 10 hours. And Indonesia is even #16 in gdp with income per capita around $5K.
You can literally pay $3 for 100 human customer service agents to talk for 10 minutes in some poor countries.
When I say expensive, it means globally. Not in the US with #1 GDP.
@@elpablitorodriguezharrera Very good points. At $3 per 100 human * 10 min this is 16.67 hours. Or 18c per hour. Now I see your point. I guess it then comes down to the business and where the employees are located. Either way, appreciate the time taken to explain your point, I learned something new 🤝
More pls
will do 🤝
🤗
Hey Bart! Great video, but unfortunately it is not working for me. I am using the Replit free plan. My OpenAI account allows Realtime API requests and I already bought a number on Twilio. Every time I try to call it, it is busy. Can you help me out?
Same here, I have a free openai account and got the api key from there, do tell me if you found a way to tackle this. Thanks
Hmm, I think I would look at your usage limits for the realtime api model in your account. I recently worked with a client who had a Tier 0 or Tier 1 account and their usage was so low that the caller wouldn't work. Only after they went to Tier 2 or Tier 3 did it work. So give that a go, upgrade your account to allow more usage and that should be it. Hope this helps 🙏
gj
thank you 💪
Got it working great, and modified the code to create a HAZMAT advisor for my fire department. Has anyone hooked this up to MS teams or a zoom number?
So awesome! I haven't hooked up to MS teams or Zoom so will be interested to see if others have :)
You have discord? if i want to learn a little more?
yes but email is better - bart@supportlaunchpad.com
That's a similar tutorial to the one twilio posted a few days ago, but it's going to cost some money to run in production
Yeah Twilio had a great tutorial and this is very similar :)
Hello Bart. This is very good. The bot answers but only says the greeting. I'm getting this error:
Starting transcript processing for session session_1737809170453...
Starting ChatGPT API call...
Disconnected from the OpenAI Realtime API
ChatGPT API response status: 404
Full ChatGPT API response: {
"error": {
"message": "The model `gpt-4o-2024-08-06` does not exist or you do not have access to it.",
"type": "invalid_request_error",
"param": null,
"code": "model_not_found"
}
}
Raw result from ChatGPT: {
"error": {
"message": "The model `gpt-4o-2024-08-06` does not exist or you do not have access to it.",
"type": "invalid_request_error",
"param": null,
"code": "model_not_found"
}
}
Unexpected response structure from ChatGPT API
Looks like the error is saying the model you are using doesn't exist or you don't have access to it. ("message": "The model `gpt-4o-2024-08-06` does not exist or you do not have access to it."). Make sure on your openai account you have access to this model. Or, if this is outdated, find the model that supports realtime api. And then update the code to use that model :)
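One way to check which models your key can actually use is to call OpenAI's list-models endpoint (`GET https://api.openai.com/v1/models`) and look for the id from the error. Below is a minimal sketch, assuming Node 18+ for the global `fetch`. `findModels` and `listAccessibleModels` are made-up helper names. In the log above, the 404 appears after "Starting ChatGPT API call...", so it may be the post-call transcript step rather than the realtime connection that needs a different model name.

```javascript
// Pure helper: pull matching model ids out of a /v1/models response body.
function findModels(modelsResponse, needle) {
  return modelsResponse.data
    .map((model) => model.id)
    .filter((id) => id.includes(needle))
    .sort();
}

// Ask the API which models this key can see, filtered by a substring.
async function listAccessibleModels(apiKey, needle) {
  const response = await fetch("https://api.openai.com/v1/models", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`Listing models failed with HTTP ${response.status}`);
  }
  return findModels(await response.json(), needle);
}
```

For example, `listAccessibleModels(process.env.OPENAI_API_KEY, "gpt-4o").then(console.log)` prints the matching ids. If the id from the error is missing, switch the code to one that is listed.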