How to Build an AI Voice Agent using OpenAI Realtime API

  • Published: Oct 7, 2024
  • In this video, I will show you how to build and deploy an AI Voice Agent using OpenAI's new Realtime API (takes 10 min!). This agent will take bookings and send the data to Make.com, where you can then run any of your other automations. I give you the full code from my GitHub repo. I also show you step by step how to set up Replit and deploy on it so the agent is always live, how to plug in Twilio so you have a phone number that calls your AI agent, and how to connect Make.com. This is a beginner-friendly tutorial.
    Want me to build this for you?
    👉 Contact me on bart@supportlaunchpad.com
    🗂️ Github repo: github.com/Bar...
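
The "send data to Make.com" step usually comes down to an HTTP POST to a webhook. Below is a minimal sketch, not taken from the video's repo: the webhook URL, the function names, and the booking field names are all hypothetical placeholders, and it assumes Node 18+ for the built-in `fetch()`.

```javascript
// Hypothetical placeholder: replace with the webhook URL Make.com generates for your scenario.
const MAKE_WEBHOOK_URL = "https://hook.example.com/your-webhook-id";

// Build the fetch() arguments for one booking. Field names are illustrative only.
function buildBookingRequest(booking) {
  return {
    url: MAKE_WEBHOOK_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        customerName: booking.customerName,
        service: booking.service,
        requestedTime: booking.requestedTime,
      }),
    },
  };
}

// Send the booking to the webhook and fail loudly on a non-2xx response.
async function sendBookingToMake(booking) {
  const { url, options } = buildBookingRequest(booking);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Webhook returned HTTP ${response.status}`);
  }
  return response.status;
}
```

Keeping the payload builder separate from the network call makes it easy to check what will be sent before pointing it at a live scenario.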

Comments • 93

  • @WillCousin
    @WillCousin 2 days ago +3

    This is a good demo - looking forward to part 2.

  • @fernandomendes1177
    @fernandomendes1177 2 days ago +1

    Thank you for sharing, Bart! Amazing! I'm already waiting for part2! Keep going.

    • @BartSlodyczka
      @BartSlodyczka 2 days ago +2

      Thank you my man 🙏 Will make part 2 soon!

  • @minasmarioskontid
    @minasmarioskontid 1 day ago

    Thank you so much! I'm learning to code and was having a hard time integrating Twilio. Your video made my day!

  • @lakergreat1
    @lakergreat1 15 hours ago

    So good, thank you for sharing. Subbed and looking forward to future videos on RAG and function calling!

  • @stevebim000
    @stevebim000 1 day ago +1

    Amazing content, man! Please do another one with RAG and function calling

  • @SP-js4gf
    @SP-js4gf 2 days ago +1

    Yes, we would love to see more of these kinds of videos.

  • @bartunma
    @bartunma 1 day ago

    Great project. Thank you for that! It would be great to see a part 2 with a bidirectional connection to any calendar. I'm also waiting for a better version of the Realtime API, since this version can't be used for Czech at least (it makes a lot of mistakes).

    • @BartSlodyczka
      @BartSlodyczka 15 hours ago

      Díky! Interesting about Czech not being so good yet, but yeah, I bet it will improve soon. Keep at it legend :)

  • @tuaitituaiti1565
    @tuaitituaiti1565 2 days ago +1

    As instructed, I liked this content, turned me into a new Sub. Standing by for a part 2-100 🙂Thank you, sir, for the education and value bombs you are dropping.💪🗿🔥🦅👊

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      my good man, thank you for the support 👊

  • @Noirteclabs
    @Noirteclabs 1 day ago

    Thank you bro, great content! subscribed ✅

  • @jeanchindeko5477
    @jeanchindeko5477 1 day ago +1

    It's quite interesting to see OpenAI releasing in 2024 a technology that Google demonstrated at Google I/O 2018. It was called Duplex, and even then the AI could place a phone call and sounded very real. Google never released that API to the masses and is again late to the show in 2024.

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      wow I didn't even know this. Lucky there are other companies bringing out cool stuff and releasing to the public 💪

  • @felipesuaya5646
    @felipesuaya5646 19 hours ago

    Hi Bart! Excellent tutorial. I have a question about Replit. How does the pricing work? I'm currently working with the Assistants API. Thank you!

    • @BartSlodyczka
      @BartSlodyczka 15 hours ago

      thank you :) So you pay $25 a month (month-to-month plan) and you get $10 in credits each month. If you're just starting out with Replit, I don't think you'll go over this limit. I've been using Replit for about a year now, have deployed lots of things and done lots of testing, and have not yet gone over. I think if you get lots and lots of users then you'll use those credits up quickly. Hope this helps legend!

    • @santosh0011
      @santosh0011 3 hours ago

      @BartSlodyczka Any alternatives to Replit?

  • @DUBOURGIA
    @DUBOURGIA 2 days ago +1

    Hey man thanks for the video, I would like to know if we can use a platform other than twilio to do this, because Twilio does not support many countries?

    • @BartSlodyczka
      @BartSlodyczka 1 day ago +1

      great question, I think so but I haven't looked into it yet. What other platforms do you know that support more countries?

  • @XxX-mb2tg
    @XxX-mb2tg 6 hours ago

    Great video! Is it possible to also use the OpenAI voice assistant for the initial greeting message? I don't like the switch between the Twilio TTS voice and the OpenAI Realtime voice.

    • @XxX-mb2tg
      @XxX-mb2tg 2 hours ago

      Found it:
      - remove the `<Say>` tag from the Twilio stream connection
      - change the OpenAI WebSocket open listener:
      ```
      openAiWs.on("open", () => {
        console.log("Connected to the OpenAI Realtime API");
        // Configure the session shortly after the socket opens.
        setTimeout(sendSessionUpdate, 250);
        // Then add a user "Hello!" and request a response,
        // so the OpenAI voice speaks first.
        setTimeout(() => {
          openAiWs.send(JSON.stringify({
            type: "conversation.item.create",
            item: {
              type: "message",
              role: "user",
              content: [
                {
                  type: "input_text",
                  text: "Hello!",
                },
              ],
            },
          }));
          openAiWs.send(JSON.stringify({ type: "response.create" }));
        }, 500);
      });
      ```

  • @QianliangHuang
    @QianliangHuang 1 day ago

    Very good video! When I was testing with Twilio's dev phone, I found an issue. We are unable to interrupt the conversation directly, like we can when using OpenAI Realtime. How should this problem be resolved?

    • @BartSlodyczka
      @BartSlodyczka 15 hours ago +1

      Yes I think Twilio is working on this ATM. I'll see if I can find more info :)
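
One approach that may help with interruptions (barge-in) is to react when the Realtime API reports that the caller has started speaking: tell Twilio to drop any audio it has buffered and ask OpenAI to stop the response in progress. The sketch below is an assumption, not something confirmed in this thread or tested against the video's repo. `handleOpenAiEvent` is a hypothetical helper; the `input_audio_buffer.speech_started` and `response.cancel` event names come from the Realtime API, and `clear` is a Twilio Media Streams message.

```javascript
// Decide which messages to send when an event arrives from the OpenAI socket.
// Returns plain objects; the caller is expected to JSON.stringify and send them.
function handleOpenAiEvent(event, streamSid) {
  // Only react when server-side voice activity detection hears the caller.
  if (event.type !== "input_audio_buffer.speech_started") {
    return { toTwilio: [], toOpenAi: [] };
  }
  return {
    // Ask Twilio to discard audio it has queued but not yet played.
    toTwilio: [{ event: "clear", streamSid }],
    // Ask OpenAI to stop generating the current response.
    toOpenAi: [{ type: "response.cancel" }],
  };
}
```

In a server like the one in the video, this would be called from the OpenAI socket's message handler, with each returned object sent on the matching WebSocket. If no response is in progress, the cancel request may come back as an error event, which can be ignored.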

  • @AlfredNutile
    @AlfredNutile 2 days ago

    Nice work! I have been wondering how to have a phone number be used for stuff like this! Thanks

  • @РусланНагимов-д7д
    @РусланНагимов-д7д 5 hours ago

    Thank you! This is my first JS code and it is working. I tried to rework it in Russian :) It works pretty well, but the first message is read with a heavy accent :) How can I change the system message? I guess it takes the default message from my account.
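
On changing the system message: in the Realtime API the instructions are normally sent by the server code in a `session.update` event rather than read from account settings. A minimal sketch under that assumption follows. `buildSessionUpdate` and the example prompt are hypothetical, and the session fields reflect the beta API around the time of the video, so they may have changed since.

```javascript
// Example prompt; any text can go here, including instructions to answer in Russian.
const SYSTEM_MESSAGE = "You are a friendly receptionist. Always reply in Russian.";

// Build a session.update event carrying the system instructions.
function buildSessionUpdate(instructions, voice = "alloy") {
  return {
    type: "session.update",
    session: {
      instructions,
      voice,
      modalities: ["text", "audio"],
      // Twilio phone audio is 8 kHz mu-law, so both directions use g711_ulaw.
      input_audio_format: "g711_ulaw",
      output_audio_format: "g711_ulaw",
      turn_detection: { type: "server_vad" },
    },
  };
}

const sessionUpdate = buildSessionUpdate(SYSTEM_MESSAGE);
```

The resulting object would be sent with `openAiWs.send(JSON.stringify(sessionUpdate))` once the OpenAI socket is open.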

  • @enthogenesis
    @enthogenesis 2 days ago

    Onya mate: Node, webhooks, Whisper transcripts, logging, right URLs, deploying, live! Boom! We're already in your debt... sweet as! I think most RAG implementations are in Python, but you may not need RAG for less than 250 pages of text; a large context window is enough for an outfit like Bert's automotive! I did "RAG: Beyond Basics" from Prompt Engineer and strongly recommend it!

    • @BartSlodyczka
      @BartSlodyczka 1 day ago +1

      thank you legend! excellent recommendations, hooroo 💪

  • @AISlopForHumans
    @AISlopForHumans 2 days ago

    Super cool brother, I am making such cool things with ChatGPT's text API, I can't wait to try this! I don't even know how to code and I can do this!

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      thank you my man, this comment makes me so happy 💪

  • @hickam16
    @hickam16 2 days ago

    thank you! I want to see more!

    • @BartSlodyczka
      @BartSlodyczka 2 days ago +1

      wicked - will whip something up soon 💪

  • @MontyChicola
    @MontyChicola 2 days ago

    Unbelievably great code

  • @mikew2883
    @mikew2883 2 days ago

    Very cool! Quick question: were you able to get barge-in to work in your version? I couldn't in the Twilio version I tried, and the Twilio author stated it was a known issue that they are looking into.

    • @BartSlodyczka
      @BartSlodyczka 2 days ago

      thanks my man! I haven't tried to do barge-in yet, but if Twilio said it's a known issue then maybe it's not possible just yet? But I imagine they'd fix it quickly, considering they are the main partner for voice integration into the Realtime API. I'll probably make another video with more features in the coming days and I'll suss out the barge-in stuff too 💪

    • @mikew2883
      @mikew2883 2 days ago +1

      @BartSlodyczka I'm hoping so. Looking forward to your functions and RAG videos! 👍

  • @radoslav07
    @radoslav07 10 hours ago

    I would like to use a local microphone or an iPhone app to talk to a local PC server, so that we can skip calling through Twilio. Any recommendations on how?

  • @xSneakybeast
    @xSneakybeast 2 days ago

    Thanks for sharing. I was wondering, instead of it being a phone call, how can the Realtime API be accessed by pushing a button in an app, like one made with React Native? That way it can also serve other use cases and the audio is better.

  • @barisbesorak
    @barisbesorak 3 days ago

    thanks man highly appreciated

  • @fredericherrera
    @fredericherrera 3 days ago

    This is very impressive

  • @vengeshop
    @vengeshop 1 day ago

    This is great! Any ideas how to use phone numbers for other countries? I have an online store located in Ukraine. It would be great to receive incoming calls when no one is in the office.

    • @BartSlodyczka
      @BartSlodyczka 15 hours ago

      Thanks! Great question, I'll suss it out and see if I can have some solutions for my next vid :)

  • @muhammadazfar6361
    @muhammadazfar6361 1 day ago

    Hi Bart. What about outbound calls? Can we also handle outbound calls using the Realtime API?

    • @BartSlodyczka
      @BartSlodyczka 15 hours ago +1

      I haven't tried yet but I feel like yes, I'll look into it and make a follow up vid if i figure it out :)

  • @wongr643
    @wongr643 2 days ago

    Great content.

  • @elpablitorodriguezharrera
    @elpablitorodriguezharrera 2 days ago +2

    If I may ask, so the OpenAI API costs $3 per 10-minute call?
    Imagine a business handling 1,000 inbound calls averaging 10 minutes each 🤦‍♂️

    • @thomasjamesbailey1209
      @thomasjamesbailey1209 2 days ago

      Today, tomorrow it will be cheaper, and the day after cheaper than a person.

    • @elpablitorodriguezharrera
      @elpablitorodriguezharrera 2 days ago

      @thomasjamesbailey1209 Thank you for your answer, my grandma knows. I was just clarifying the pricing "in the moment", not tomorrow, the day after, or hundreds of years later.

    • @BartSlodyczka
      @BartSlodyczka 1 day ago +1

      I think $3 per 10-minute call is still cheap, considering all the costs and operations that go into hiring someone. Costs: salary + medical/ salary taxes + subscription costs (ie the business uses SAAS products and each person needs a seat) + sick days + etc. Operations: hiring + training + need a team manager + etc. From a cost and operations POV - I think business owners would be happy to pay considering how easy it is and how little overhead they have. Hope this kind of context helps :)

    • @elpablitorodriguezharrera
      @elpablitorodriguezharrera 1 day ago

      @BartSlodyczka $3K will pay 1,000 customer service agents in my country, Indonesia, for 10 hours. And Indonesia is even #16 in GDP, with income per capita around $5K.
      In some poor countries you can literally pay $3 for 100 human customer service agents to talk for 10 minutes each.
      When I say expensive, I mean globally, not in the US with the #1 GDP.

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      @elpablitorodriguezharrera Very good points. At $3 for 100 humans × 10 min, that is 16.67 hours of talk time, or 18c per hour. Now I see your point. I guess it then comes down to the business and where the employees are located. Either way, appreciate the time taken to explain your point, I learned something new 🤝
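
The arithmetic in this exchange can be checked with a small helper. This is only a sketch that takes the figures from the comments at face value ($3 per 10-minute AI call, and $3 for 100 human agents talking 10 minutes each). `costPerHour` is a hypothetical name, not part of any library.

```javascript
// Convert "total cost for N people talking M minutes each" into a cost per talk-hour.
function costPerHour(totalCostUsd, people, minutesEach) {
  const totalHours = (people * minutesEach) / 60;
  return totalCostUsd / totalHours;
}

// AI agent, per the thread: $3 for one 10-minute call, roughly $18 per hour.
const aiPerHour = costPerHour(3, 1, 10);

// Human agents, per the thread: $3 for 100 people x 10 minutes, roughly $0.18 per hour.
const humanPerHour = costPerHour(3, 100, 10);

console.log(aiPerHour.toFixed(2), humanPerHour.toFixed(2));
```

On these figures the AI call costs about 100 times more per hour of talk time, which matches the 18c-per-hour figure reached in the reply above.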

  • @coldlyanalytical1351
    @coldlyanalytical1351 3 days ago +3

    How many paid-for services were needed to support this app?

    • @BartSlodyczka
      @BartSlodyczka 3 days ago +2

      Need to pay for OpenAI API, otherwise Replit, Twilio, and Make.com you can start for free

    • @coldlyanalytical1351
      @coldlyanalytical1351 3 days ago +1

      @BartSlodyczka Thanks for that! TBH you do seem to need to be an existing experienced web dev to handle this stuff. (I'm an embedded realtime dev... a very different world.)

    • @BartSlodyczka
      @BartSlodyczka 2 days ago +7

      @coldlyanalytical1351 If I can do it, I believe you can do it too 💪 I started learning how to code and dev around 1.5 years ago (started from absolutely zero) and I'd attribute my core success to (1) just believing I could do it and (2) lots of practice and ChatGPT prompting. You already have the foundational skill set and are probably a lot closer to becoming experienced in this area than you think. You got this, you are legend 🤝

    • @Omri.Tal.
      @Omri.Tal. 2 days ago

      Thank you Bart! Great explanation, made it easy to understand. I’m waiting for your next videos to see how you implement KBs and functions ❤

    • @BartSlodyczka
      @BartSlodyczka 2 days ago

      @Omri.Tal. thank you legend 💪 will get the next vid out soon :)

  • @micbab-vg2mu
    @micbab-vg2mu 3 days ago

    thanks :)

  • @angeloh-u1q
    @angeloh-u1q 1 day ago

    Does the AI agent have the ability to remember returning callers?

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      this is a 10/10 suggestion holy shmoly. Will look into this for the next vid. WOW

  • @alanzou7677
    @alanzou7677 2 days ago

    Can you make the OpenAI bot talk first? Twilio's greeting voice sounds different from the OpenAI voice.

  • @localloop
    @localloop 1 day ago

    More pls

  • @alanzou7677
    @alanzou7677 2 days ago

    can you add function call to the bot?

  • @jujhaarai
    @jujhaarai 1 day ago

    how much does it cost per minute on average?

    • @BartSlodyczka
      @BartSlodyczka 15 hours ago

      On average it costs around $0.06 per minute for audio input and $0.24 per minute for audio output, so $0.30 per minute if you're using both audio input and output

    • @jujhaarai
      @jujhaarai 14 hours ago

      @BartSlodyczka I know the OpenAI website says that, but I was asking how much it cost you in your demos. $0.30/minute doesn't seem realistic, as you won't have two people speaking at the same time: at any given moment the AI is either listening or speaking, not both. What do you say?
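
The point about talk time can be made concrete with a rough estimator. This is a sketch only: it uses the per-minute rates quoted in the reply above and assumes that each side is billed only for the minutes it actually speaks. Real billing is token-based and may also count silence or accumulated conversation context, so treat the result as a rough lower bound. `estimateCallCost` is a hypothetical helper.

```javascript
// Rates quoted in the thread (USD per minute of audio).
const INPUT_USD_PER_MIN = 0.06;  // caller audio sent to the model
const OUTPUT_USD_PER_MIN = 0.24; // model audio sent back to the caller

// Estimate the cost of a call given the share of time each side is speaking (0..1).
function estimateCallCost(callMinutes, callerTalkShare, agentTalkShare) {
  const perMinute =
    callerTalkShare * INPUT_USD_PER_MIN + agentTalkShare * OUTPUT_USD_PER_MIN;
  return callMinutes * perMinute;
}

// 10-minute call split evenly between caller and agent: about $1.50.
const evenSplit = estimateCallCost(10, 0.5, 0.5);

// Upper bound where both rates apply for the whole call: about $3.00.
const upperBound = estimateCallCost(10, 1, 1);

console.log(evenSplit.toFixed(2), upperBound.toFixed(2));
```

Under these assumptions the $0.30 per minute figure is the ceiling, and a call where each side speaks half the time comes to roughly half of it.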

  • @jalengonel
    @jalengonel 2 days ago

    Idk why it does this but why, no matter how much I try to prompt/tweak parameters, does the API voice sound so monotone and bad at taking speech directions compared to the ChatGPT voices?

    • @BartSlodyczka
      @BartSlodyczka 1 day ago +1

      Yeah I agree, right now it's not the best sounding, but I'm sure in time it will get better. when it does, we will be ready 💪

    • @jalengonel
      @jalengonel 1 day ago +1

      @BartSlodyczka fr. In reality this will likely birth an entirely new protocol/web framework. Feels like the early days of the internet, where things are being bootstrapped and established for the first time ever.

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      @jalengonel such an exciting time man, such an exciting time

  • @Numi2003
    @Numi2003 2 days ago

    gj

  • @JohnDoe-rk7ex
    @JohnDoe-rk7ex 1 day ago

    That's similar to a tutorial Twilio posted a few days ago, but it's going to cost some money to run in production.

    • @BartSlodyczka
      @BartSlodyczka 1 day ago

      Yeah Twilio had a great tutorial and this is very similar :)