📺 Watch Part 2: ruclips.net/video/ffDm4HVGuTM/видео.htmlsi=W1nfLYgj3zsQ0RWW
📺 Watch Part 3: ruclips.net/video/oQtBwhRLrT4/видео.htmlsi=o56i5609Zp8Ko3eG
🗂 Github repo: github.com/Barty-Bart/openai-realtime-api-voice-assistant
5x NEW VOICES just released: ruclips.net/video/PTCpw1Y9HOQ/видео.htmlsi=roHjjllMKNHNzLGu
📺 AI SMS Assistant: ruclips.net/video/HYPw8TfL2Pg/видео.htmlsi=CVAzhuQzsXH5T2Wa
📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7
Nicely explained. Thank you
This is a good demo - looking forward to part 2.
🙏🙏
Yes Please! Looking forward to the next episode of your AI Voice Agent Build! Thanks for your effort in making this Vid, Bart!
thank you :) appreciate that! next vid will be out by end of this week :)
Thank you for sharing, Bart! Amazing! I'm already waiting for part2! Keep going.
Thank you my man 🙏 Will make part 2 soon!
yes we would love to see more of these kind of videos
you got it! will work on more :)
Superb demo Bart, you're one of the best in the game right now
Thanks Vi-Lo 🤼‍♂️
Can't wait for part 2 !
will be ready by end of week :)
Really good, looking forward to PT2, hopefully we get it soon :)
thanks! coming out end of week :)
This works great and the set up was a breeze, Thank You!
noice!!!
amazing stuff. waiting for v2
thanks! will be out by end of week :)
Legend. Can't wait to see more about it!
hell yeah!!
This is so extremely useful. Thank you!
thanks!!
Great video!
thanks legend!
So good, thank you for sharing. Subbed and looking forward to rag and function call future videos!
awesome! Thank you :)
Awesome man! Thank you!
thanks legend!
Thank you. Great video. Super helpful.
thanks legend 🤝
Thank you so much! I'm learning to code and had a hard time integrating Twilio. Your video made my day!
thank you legend 🙏
Excellent video. Looking forward to enhancements
thank you very much :) I have Part 2 and Part 3 out on my channel that you can watch 💪
Thanks, its super helpful.
my pleasure :) you should watch the part 2!
Amazing content, man! Please do another one with RAG and function calling
you got it 🤝
Thank you very much, good stuff and very helpful!
thanks man!!
thank you! I want to see more!
wicked - will whip something up soon 💪
Thank you bro, great content! subscribed ✅
thank you legend 🤝
Great project. Thank you for that! It would be great to see a part 2 with a bidirectional connection to any calendar. I'm also waiting for a better version of the Realtime API, since this version can't be used for Czech at least (it makes a lot of mistakes).
Díky! Interesting about czech not being so good yet, but yeah I bet it will improve soon. Keep at it legend :)
Great content.
thank you 🙏
Unbelievable great code
💪💪
This is very impressive
thank you 🙏
thanks man highly appreciated
Thank you my man 💪
onya mate, node, webhooks, whisper transcripts, logging, right URLs, deploying, live! boom! we're already in your debt... sweet as! I think most RAG implementations are in Python. You may not need RAG at all for less than 250 pages of text; a large context window should be enough for an outfit like Bert's automotive! I did "RAG: Beyond Basics" from Prompt Engineer and I strongly recommend it!
thank you legend! excellent recommendations, hooroo 💪
As instructed, I liked this content, turned me into a new Sub. Standing by for a part 2-100 🙂Thank you, sir, for the education and value bombs you are dropping.💪🗿🔥🦅👊
my good man, thank you for the support 👊
It's quite interesting to see OpenAI releasing in 2024 a technology Google demonstrated at Google I/O 2018. It was called Duplex: an AI that could already make a phone call and sound very real. Google never released that API to the masses and is again late to the show in 2024.
wow I didn't even know this. Lucky there are other companies bringing out cool stuff and releasing to the public 💪
Thanks! this is a good demo of capabilities. When is part 2 coming out?
thanks! Part 2 coming out later today :)
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Great video!
thank you! 💪
Would be awesome to see RAG and function calls! Please do this!
I've got a series on this that you should check out: ruclips.net/p/PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ
Super cool brother, I am making such cool things with ChatGPT's text API, I can't wait to try this! I don't even know how to code and I can do this!
thank you my man, this comment makes me so happy 💪
Hey man thanks for the video, I would like to know if we can use a platform other than twilio to do this, because Twilio does not support many countries?
great question, I think so but I haven't looked into it yet. What other platforms do you know that support more countries?
Very cool! Quick question: were you able to get barge-in to work in your version? I couldn't get it working in the Twilio version I tried, and the Twilio author said it was a known issue that they are looking into.
thanks my man! I haven't tried to do barge-in yet, but if Twilio said it's a known issue then maybe it's not possible just yet? But I imagine they'd fix it quickly considering they are the main partner for voice integration into the realtime api. I'll probably make another video with more features in the coming days and I'll suss out the barge-in stuff too 💪
@@BartSlodyczka I'm hoping so. Looking forward to your functions and rag videos! 👍
Great video, the arrow pointer for demonstration is very cool, how do you do that?
thanks man! It's a mac app called "DemoPro - Screen Annotation"
Great video. Thanks very much! I'm super curious how it would be possible to add RAG support and how well it would work with getting high quality output and low enough latency.
thanks! check out part 2 with RAG here: ruclips.net/video/ffDm4HVGuTM/видео.htmlsi=zyOJMMPYuiY2rdSZ
Great video! Is it possible to also use the OpenAI voice assistant for the initial greetings message? I don't like the switch between the twilio tts voice and the openai realtime voice.
Found it:
- remove the `` tag from the twilio stream connection
- change openAi ws open listener:
```
openAiWs.on("open", () => {
  console.log("Connected to the OpenAI Realtime API");
  setTimeout(sendSessionUpdate, 250);
  setTimeout(() => {
    openAiWs.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [
          {
            type: "input_text",
            text: "Hello!",
          },
        ],
      },
    }));
    openAiWs.send(JSON.stringify({ type: "response.create" }));
  }, 500);
});
```
Yeah it is, figured out a golden nugget for this, video 2 coming end of this week!
ah you got it anyway!!! nice
Hi Bart! Excellent tutorial. I have a question about Replit. How does the pricing work? I'm currently working with the Assistants API. Thank you!
thank you :) So you pay $25 a month (month to month plan) and you get $10 in credits each month. If you're just starting out with Replit, I don't think you'll go over this limit. I've been using Replit for like a year now and have deployed lots of things, lots of testing, and have not yet gone over. I think if you get lots and lots of users then you'll use those credits up quickly. Hope this helps legend!
@@BartSlodyczka Any alternatives to Replit?
Many thanks!
I bought a local (Israeli) phone number from Twilio and tried to connect it to Vapi, but calls won't come through... have you encountered any issues like this?
I haven't experienced anything like this, but also I don't use Vapi, sorry man!
Thanks for such a wonderful tutorial ❤️.
I'm wondering if it is possible to check availability on Google Calendar before booking. Is this going to replace Vapi, or can we use the Realtime API within Vapi or other similar platforms? ❤
thank you legend :) In the Part 2 and Part 3 videos I explain how to connect the AI caller to make.com. And from within make.com you can connect to Google Calendar modules. If you watch those vids I also give you the make.com blueprint to get you started 💪 I think OpenAI maybe won't replace platforms like Vapi, but it will be a great alternative option. Good to learn it :) Keep it up man!
Would love to see function calls - trying to call a fine tuned model
Done! Part 2 with function calls coming out today :)
Hi Bart,
How are you doing?
This project is amazing and I'm trying hard to connect it to my Yeastar S20 SIP device.
But so far no luck :(
Do you have any suggestion?
Hey my man, sorry I don't have any suggestions, I don't know what a Yeastar S20 sip device is and I don't use one 🙏
Great video!!
Can this work with languages other than English?
thanks! yes, just make the prompt in your chosen language and speak in your chosen language -- your responses will be in that language too :)
thanks for sharing. I was wondering: instead of a phone call, how can the Realtime API be accessed by pushing a button in an app, like one made with React Native? That way it can also serve other use cases and the audio is better.
very interesting idea 🤔
@@BartSlodyczka yeah, I found Swift and Kotlin implementations of the Realtime API but I'm still searching for a React Native implementation. Do you know how to do that? The component that needs to be changed is the websocket, to make it compatible with mobile
Nice work! I have been wondering how to have a phone number be used for stuff like this! Thanks
this is awesome to hear :) thank you!
I would like to use local microphone or iphone app to talk to the local PC server so that way we can skip calling/using Twilio? Any recommendation how?
haven't played around with local mic yet, but i have seen other tutorials where they might be doing this. let me know how you go?
Would you please consider making a video like this but in python as well?
I'll keep this in mind, thanks for the recommendation 🙏
Can I use AWS Lambda instead of Replit?😊
Yes 100% you can! I haven't used AWS before but I'm sure you can relatively easily convert to AWS
@@BartSlodyczka thanks bro I’ll try it there!
How do you manage to end the phone call? And close the websocket connection.
when you hang up the call the websocket will close :)
thanks :)
🙏🙏
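A rough sketch of the hang-up behaviour described above, assuming the server holds two sockets shaped like `ws` WebSockets: one for the Twilio media stream and one for OpenAI. When the caller hangs up, the Twilio socket closes, and the handler below closes the OpenAI one too. The helper name `closeOpenAiOnHangup` and the fake sockets are made up for illustration and are not from the video's code.

```javascript
const { EventEmitter } = require("node:events");

// Numeric readyState values from the standard WebSocket interface.
const WS_OPEN = 1;
const WS_CLOSED = 3;

// When the Twilio media-stream socket closes (caller hung up),
// close the OpenAI socket as well so it does not stay open.
function closeOpenAiOnHangup(twilioConnection, openAiWs) {
  twilioConnection.on("close", () => {
    if (openAiWs.readyState === WS_OPEN) {
      openAiWs.close();
    }
    console.log("Caller hung up");
  });
}

// Hypothetical stand-in for a real socket, just enough to run the sketch.
function makeFakeSocket() {
  const socket = new EventEmitter();
  socket.readyState = WS_OPEN;
  socket.close = () => {
    socket.readyState = WS_CLOSED;
  };
  return socket;
}

const twilioConnection = makeFakeSocket();
const openAiWs = makeFakeSocket();
closeOpenAiOnHangup(twilioConnection, openAiWs);

// Simulate the caller hanging up.
twilioConnection.emit("close");
console.log("OpenAI socket state:", openAiWs.readyState);
```

In a real server the two fake sockets would be the actual Twilio connection and the OpenAI WebSocket; only the `close` handler would carry over.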
I've got one question. I've followed all the steps to install the IVR application but it doesn't run. I don't have a premium subscription for Replit or OpenAI; is that causing the problem? Can you please help me out?
I don't think you need a premium subscription to Replit, but you might need one for OpenAI. Go to your OpenAI account and check in your settings which models you have available, look for realtime api. The next problem will be if you are Tier 0 or Tier 1, you might have usage restrictions which will stop this from working too. Hope this helps :)
Hi Bart. What about outbound calls? Can we also handle outbound calls using the Realtime API?
I haven't tried yet but I feel like yes, I'll look into it and make a follow up vid if i figure it out :)
Is there a way to add a custom voice ?
I don't think so at this stage
@ is there any other way, by using a library? I have a SaaS app idea in mind
thank you! this is my first JS code and it is working. Tried to rework it in Russian) works pretty well, but the first message is read with a heavy accent) How can I change the system message? I guess it takes the default message from my account
nice work man! I haven't looked into the default message yet, but the system prompt does the trick for me atm. In the next vid I will upgrade my system prompt. keep up the good work man!
@@BartSlodyczka will it work with other services webhook?
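On the system prompt question above: a minimal sketch of a `session.update` message carrying instructions in another language, assuming the event shape used by the Realtime API at the time of the video and G.711 u-law audio for Twilio. The helper name `buildSessionUpdate`, the `alloy` default and the sample prompt are illustrative assumptions, not taken from the repo.

```javascript
// Build the session.update event that carries the system prompt.
// Field names follow the Realtime API session object as assumed here;
// check them against the current API reference before relying on them.
function buildSessionUpdate(instructions, voice = "alloy") {
  return {
    type: "session.update",
    session: {
      instructions,                    // the system prompt
      voice,                           // assumed default voice name
      input_audio_format: "g711_ulaw", // audio format Twilio media streams use
      output_audio_format: "g711_ulaw",
    },
  };
}

// Illustrative prompt: "You are a polite voice assistant. Always answer in Russian."
const russianPrompt = "Ты вежливый голосовой ассистент. Всегда отвечай по-русски.";

const payload = JSON.stringify(buildSessionUpdate(russianPrompt));
console.log(payload);

// In the server this string would be sent once the socket is open,
// for example: openAiWs.send(payload);
```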
What would the average API cost be, assuming call volume could reach 100,000 minutes per day or month?
I haven't done any cost tests yet, but openai says roughly 30c per minute
@@BartSlodyczka smart pricing , very expensive , not affordable by small biz..
Hello everyone, it has been a long time. I have experienced an AI voice conversation, and it is very good for children's learning, even for the young ones.
so great to hear :)
Can you make the OpenAI bot talk first? Twilio's greeting voice sounds different from the OpenAI voice
great idea! will look into this :)
Great tutorial! I am trying to get it to work better with interruptions (I want to be able to cut into the reply or correct something that was wrong), but it does not seem to respond to that? What am I missing? Found this in a forum : "To whoever is reading this in the future, I found a solution. Here is how I implemented it into my code: if response["type"] == "input_audio_buffer.speech_started": print('Speech Start:', response['type']) # Clear Twilio buffer …"
hey legend, someone posted a comment in my new Part 2 video and it had this code:
"Here is a simple update i made, to make the ai stop talking when the user is talking you need to add this:
if (response.type === "input_audio_buffer.speech_started") {
  console.log("Speech Start:", response.type);
  // Clear any ongoing speech on Twilio side
  connection.send(
    JSON.stringify({
      streamSid: streamSid,
      event: "clear",
    })
  );
  console.log("Cancelling AI speech from the server");
  // Send interrupt message to OpenAI to cancel ongoing response
  const interruptMessage = {
    type: "response.cancel",
  };
  openAiWs.send(JSON.stringify(interruptMessage));
}"
You can prob throw the full code into chatGPT, then give it the above snippet, and ask it to insert it. Hope this helps :)
Does the AI agent have the ability to remember returning callers?
this is a 10/10 suggestion holy shmoly. Will look into this for the next vid. WOW
can you add function call to the bot?
will do for the next vid 🚀
This is great! Any ideas how to use phone numbers for other countries? I have an online store located in Ukraine. It would be great to receive incoming calls when no one is in the office.
Thanks! Great question, I'll suss it out and see if I can have some solutions for my next vid :)
Can you please do in python as well
interesting! I might do this in the coming weeks :)
If I may ask, so for the openai API, it costs $3 / 10-minute of call?
Imagine a business handling 10-minute inbound calls on average with 1,000 people 🤦‍♂️
Today, tomorrow it will be cheaper, and the day after cheaper than a person.
@@thomasjamesbailey1209 Thank you for your answer, my grandma knows that. I was just clarifying the pricing "in the moment", not tomorrow, the day after, or hundreds of years later.
I think $3 per 10-minute call is still cheap, considering all the costs and operations that go into hiring someone. Costs: salary + medical/ salary taxes + subscription costs (ie the business uses SAAS products and each person needs a seat) + sick days + etc. Operations: hiring + training + need a team manager + etc. From a cost and operations POV - I think business owners would be happy to pay considering how easy it is and how little overhead they have. Hope this kind of context helps :)
@@BartSlodyczka $3K will pay 1,000 customer service agents in my country, Indonesia, for 10 hours. And Indonesia is even #16 in GDP, with income per capita around $5K.
You can literally pay $3 for 100 human customer service agents to talk for 10 minutes in some poor countries.
When I say expensive, it means globally. Not in the US with #1 GDP.
@@elpablitorodriguezharrera Very good points. $3 for 100 humans × 10 min each buys 16.67 hours of work, or 18c per hour. Now I see your point. I guess it then comes down to the business and where the employees are located. Either way, appreciate the time taken to explain your point, I learned something new 🤝
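The arithmetic in this exchange, spelled out as a quick sketch. The only inputs are the figures quoted in the thread ($3 for 100 agents at 10 minutes each, and $3 per 10-minute AI call); the variable names are just for illustration.

```javascript
// Human side: $3 buys 100 agents for 10 minutes each.
const agents = 100;
const minutesPerAgent = 10;
const humanTotalDollars = 3;

const humanHours = (agents * minutesPerAgent) / 60;       // 1000 min = 16.67 h
const humanDollarsPerHour = humanTotalDollars / humanHours;

// AI side: $3 for one 10-minute call, scaled to an hour.
const aiDollarsPer10Min = 3;
const aiDollarsPerHour = (aiDollarsPer10Min / 10) * 60;

console.log("Human hours bought:", humanHours.toFixed(2));      // 16.67
console.log("Human $/hour:", humanDollarsPerHour.toFixed(2));   // 0.18
console.log("AI $/hour:", aiDollarsPerHour.toFixed(2));         // 18.00
```

On these numbers the AI call costs about 100 times more per hour than the quoted human rate, which is the gap being pointed out.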
Idk why it does this but why, no matter how much I try to prompt/tweak parameters, does the API voice sound so monotone and bad at taking speech directions compared to the ChatGPT voices?
Yeah I agree, right now it's not the best sounding, but I'm sure in time it will get better. when it does, we will be ready 💪
@@BartSlodyczka fr. In reality this will likely birth an entirely new protocol/web framework. Feels like the early days of the internet era, where things are being bootstrapped and established for the first time ever
@@jalengonel such an exciting time man, such an exciting time
More pls
will do 🤝
how much does it cost per minute on average?
On average it costs around $0.06 per minute for audio input and $0.24 per minute for audio output, so $0.30 per minute if you're using both audio input and output
@@BartSlodyczka I know the OpenAI website says that. But I was asking how much it cost you in your demos. $0.30/minute doesn't seem realistic, as you won't have two parties speaking at the same time. I mean, at any given time the AI will be either listening or speaking, not doing both. What do you say?
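A back-of-the-envelope sketch of the point raised here, assuming the per-minute rates quoted above ($0.06 audio input, $0.24 audio output) and that each minute of the call is billed as either caller speech or AI speech. It ignores anything else that may be billed (text tokens, conversation context, silence), so treat it as a rough bound rather than a real invoice. `estimateCallCost` is a made-up helper.

```javascript
// Per-minute rates as quoted in the thread (US dollars).
const INPUT_PER_MIN = 0.06;  // audio the caller sends in
const OUTPUT_PER_MIN = 0.24; // audio the AI speaks back

// Estimate the cost of a call from how long each side actually talks.
function estimateCallCost(callerMinutes, aiMinutes) {
  return callerMinutes * INPUT_PER_MIN + aiMinutes * OUTPUT_PER_MIN;
}

// 10-minute call where each side talks for half the time.
const splitCall = estimateCallCost(5, 5);

// The "$0.30 per minute" figure: both rates charged for all 10 minutes.
const worstCase = estimateCallCost(10, 10);

console.log("Split 50/50:", splitCall.toFixed(2)); // 1.50
console.log("Both rates for the full call:", worstCase.toFixed(2)); // 3.00
```

With an even split the estimate is half of the $3 figure; a call where the AI does most of the talking moves it closer to the output rate of $0.24 per minute.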
🤗
I copied this step by step but my assistant just does the welcome message and then hangs up. No error messages. Anyone else?
Were you able to sort this out? If you copied step by step then the code should be all good. I would check (1) do you have funds in your twilio account and does your twilio number allow calls (2) did you pass the correct replit URL into the Twilio webhook configuration? IE if you deploy your replit code, it is a different URL to when you test the replit code in development mode. LMK how you go!
Hey Bart! Great video, but unfortunately it is not working for me. I am using the Replit free plan. My OpenAI account allows Realtime API requests and I already bought a number on Twilio. Every time I try to call it, it is busy. Can you help me out?
Same here, I have a free openai account and got the api key from there, do tell me if you found a way to tackle this. Thanks
Hmm, I think I would look at your usage limits for the realtime api model in your account. I recently worked with a client who had a Tier 0 or Tier 1 account and their usage was so low that the caller wouldn't work. Only after they went to Tier 2 or Tier 3 did it work. So give that a go, upgrade your account to allow more usage and that should be it. Hope this helps 🙏
gj
thank you 💪
That's similar to a tutorial Twilio posted a few days ago, but it's going to cost some money to run in production
Yeah Twilio had a great tutorial and this is very similar :)
You have discord? if i want to learn a little more?
yes but email is better - bart@supportlaunchpad.com