Using OpenAI Realtime API to build a Twilio Voice AI assistant with Node.js

  • Published: Dec 26, 2024

Comments • 194

  • @TwilioDevs
    @TwilioDevs  2 months ago +5

    What should we build next?
    Next up on the channel is likely going to be the Python version of this tutorial followed by some updates regarding interruptions and having the AI talk first.

    • @ethereal-rzn
      @ethereal-rzn 2 months ago +1

      AI talks first, pleaseeee. Couldn't find any tutorial on that on the web.

    • @mhazwan
      @mhazwan 2 months ago

      Want to see how the AI talks first

  • @georgedukic9955
    @georgedukic9955 2 months ago +5

    This makes things so much easier. I was trying to do this manually, converting voice to text, sending prompt to openai, and then converting the response back to voice..

    • @nags9723yt
      @nags9723yt 2 months ago

      Yeah. This is a great feature. Imagine the lag from passing data between the different APIs. 😊

  • @nlarchive
    @nlarchive 2 months ago +10

    That Twilio robotic voice needs an update. Thanks for the content!!!

    • @TwilioDevs
      @TwilioDevs  2 months ago +2

      There are other voice options available that sound better, but definitely agree that one is from a different era 😅

  • @markustrasberg3957
    @markustrasberg3957 2 months ago +4

    There's a small bug in the blog post guide. The WebSocket connection URL is mistyped (it should contain a single model= parameter; at the moment it has two).

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Thanks, I'll let Paul know!
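
For readers hitting the duplicated `model=` issue described in this thread, here is a minimal sketch of building the connection URL so that the parameter appears exactly once. The helper name `buildRealtimeUrl` is hypothetical, and the base endpoint `wss://api.openai.com/v1/realtime` is an assumption that is not quoted anywhere on this page; the model name is the one mentioned further down in the comments.

```javascript
// Hypothetical helper: builds the Realtime API WebSocket URL.
// The base endpoint below is an assumption; check it against the blog post or repo.
function buildRealtimeUrl(model) {
  const url = new URL('wss://api.openai.com/v1/realtime');
  // set() replaces any existing value, so `model` can never appear twice
  url.searchParams.set('model', model);
  return url.toString();
}

const REALTIME_URL = buildRealtimeUrl('gpt-4o-realtime-preview-2024-10-01');
console.log(REALTIME_URL);
```

Using `searchParams.set` rather than string concatenation is what guards against the doubled parameter.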

  • @Sa-if
    @Sa-if 2 months ago +20

    This will start a new age of AI...

    • @TwilioDevs
      @TwilioDevs  2 months ago +3

      It's really impressive how interactive it is!

    • @EDashMan
      @EDashMan 2 months ago

      @@TwilioDevs Yoo that’s crazy. I’m going to test the repo myself first, seeing is believing haha!

    • @TwilioDevs
      @TwilioDevs  2 months ago

      @@EDashMan Let me know how it goes! I know OpenAI is rolling this out in stages so if it doesn't work at first, check to make sure you have access to the OpenAI Realtime API. I was blown away the first time I got this working though. Feel free to mix up the SYSTEM_MESSAGE prompt and the temperature a bit too. It's pretty amazing. I feel like I should have it coach me through making a meal :D

    • @EDashMan
      @EDashMan 2 months ago

      @@TwilioDevs Yeah I'm getting: Error in the OpenAI WebSocket: Error: Unexpected server response: 403
      I don't even have gpt-4o-realtime-preview-2024-10-01 in my playground. I guess I can't use it yet :(

    • @TwilioDevs
      @TwilioDevs  2 months ago

      @@EDashMan Bummer! Yeah hopefully it'll roll out pretty quickly.

  • @riley_blackwell
    @riley_blackwell 2 months ago +8

    Now you just have to provide customer data from Segment to the model. Then when a customer calls the model can give a personalized answer.
    For example, a customer calls a car repair shop. Then the model using RAG accesses a customer’s data to check on the status of a car repair. Lastly, the model responds with the status of the car repair.
    All the customer has to do is call the car repair shop and ask a simple question with voice. A great customer experience if you ask me 😊

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      Yes, this is a great scenario! That's exactly the type of exciting thing that can be enabled by combining all of the pieces. Thanks for watching and for the comment!

    • @CodyDietzofficial
      @CodyDietzofficial 2 months ago +2

      I am literally building this right now...

    • @riley_blackwell
      @riley_blackwell 2 months ago

      @@CodyDietzofficial Awesome! Can’t wait to see it :)

    • @LettersAndNumbers300
      @LettersAndNumbers300 1 month ago

      Yes, car repair shops are where the big bucks are to be made

  • @jothamdudley4116
    @jothamdudley4116 2 months ago +5

    got this working using my azure endpoint with some help from chatgpt!
    I did notice this example doesn't handle interruptions, will you be updating the repo with more features in the future?

    • @TwilioDevs
      @TwilioDevs  2 months ago +4

      that's awesome! thanks for giving it a try.
      We decided to leave interruptions out for this blog post/video because the code was already pretty long. We talked about doing follow-ups for things like interruptions and function calling. I'll check with the team and see what the plan is.

    • @gurumack
      @gurumack 2 months ago +1

      @@TwilioDevs to be honest, I'd really appreciate this - this is a huge part of what makes this tech so amazing. Any high level support on how to accomplish this, if it's even possible? thanks!

    • @limebulls
      @limebulls 2 months ago

      Great! Would you mind sharing your code?

    • @jonasmassieAI
      @jonasmassieAI 2 months ago

      @@TwilioDevs looking for this also...

    • @ethanfossett5835
      @ethanfossett5835 2 months ago

      @@TwilioDevs Also looking for this. Even just samples of the code would be great; I don't need a full video.

  • @thechannel8x
    @thechannel8x 21 days ago +1

    Deployment? Great work, great explanation - what's the best place to deploy this? TW Services? Or that wouldn't work?

    • @TwilioDevs
      @TwilioDevs  21 days ago

      I usually leave out deployment since it can be a fairly personal choice and outside of the scope of the tutorial. That said, this code should work anywhere you can deploy a full Node.js app. Some popular options include Render (render.com), Railway (railway.app), DigitalOcean or building your own setup within a VPS.
      Lots of options out there! Thanks for watching and let us know if you need any further help.

  • @ArmaanSood-y9d
    @ArmaanSood-y9d 2 months ago +3

    Hey, this is amazing, revolutionary even! How do I connect my model to a vector_store / a knowledge base that it can refer to? Or is that not supported yet? I am trying to figure out whether I should implement that in the function calling tools {} parameter or not. Thanks!!!!

    • @natevance3661
      @natevance3661 2 months ago

      I'm wondering if this is possible / how to do this as well

    • @ArmaanSood-y9d
      @ArmaanSood-y9d 2 months ago

      @@natevance3661 I have found out about some crazy shit , trying to piece it all together but you gotta use make

    • @titimiti1984
      @titimiti1984 2 months ago

      Did you figure out how to do that? Let me know if you do

    • @clairedubiel1
      @clairedubiel1 1 month ago

      Please let me know as well!

  • @PraiseYeezus
    @PraiseYeezus 2 months ago +4

    Would like to see a tutorial about using OpenAI to get on-screen transcriptions of phone calls

    • @TwilioDevs
      @TwilioDevs  2 months ago +4

      That's a cool idea. I'll see what we can do!

    • @limebulls
      @limebulls 2 months ago +1

      @@TwilioDevs yes please!

  • @dawid_dahl
    @dawid_dahl 2 months ago +2

    Can you show how we can integrate Function Calling as well?

    • @TwilioDevs
      @TwilioDevs  2 months ago +3

      That's a good idea for a follow-up video, thanks!

  • @craigsdennis
    @craigsdennis 2 months ago +4

    Love the video Brent! 💪🚀

  • @VibeTech311
    @VibeTech311 2 months ago +1

    This was a great video. I am looking for a way to output the conversation, both what was received and how it responded. Is that possible through the Realtime API? Currently I can capture the response in text, but I have not figured out how to capture what is said to it in text via the Realtime API.
    Thanks again.

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      I'll see if I can put something together for that. First up is the Python version of this tutorial which got delayed a little bit.

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      So for clarity, you want the text of what the caller says to the AI?

    • @VibeTech311
      @VibeTech311 2 months ago +1

      @@TwilioDevs yes, and thank you so much. I can get the text for the realtime api response, but the text for the caller is where I am struggling. I don’t know if realtime has a way, and I recently saw something in Twilio that could possibly help. But thank you again, I truly appreciate your response and consideration.

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      No promises but I'll see what I can do. If not a video perhaps we can at least get you a code snippet.

    • @VibeTech311
      @VibeTech311 2 months ago

      @@TwilioDevs you are amazing thank you 🙏
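
One possible way to capture the caller's side as text, sketched under assumptions: the Realtime API can be asked to transcribe input audio by adding `input_audio_transcription` to the session settings, and it then emits a `conversation.item.input_audio_transcription.completed` event carrying a `transcript` field. The helper name `extractCallerTranscript` is hypothetical, and this is not confirmed as the approach the channel later published.

```javascript
// Session settings that ask the Realtime API to also transcribe the caller's audio.
// In practice this would be merged into the session.update object the tutorial already sends.
const sessionUpdate = {
  type: 'session.update',
  session: {
    input_audio_transcription: { model: 'whisper-1' },
  },
};

// Hypothetical helper: returns the caller's transcript for a parsed server event,
// or null for every other event type.
function extractCallerTranscript(event) {
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    return event.transcript;
  }
  return null;
}
```

In the OpenAI WebSocket message handler, the parsed event could be passed to `extractCallerTranscript` and any non-null result logged next to the assistant's text.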

  • @zhangxiang18
    @zhangxiang18 2 months ago

    Thanks for the fantastic video! Do I need to upgrade my Twilio account to the full version to use this? I have set up everything based on the tutorial, but there is no response from the AI even after I speak the first sentence. Alas..

  • @xlretard
    @xlretard 2 months ago +1

    I needed this 18 months ago lol

  • @HarborProjectB
    @HarborProjectB 2 months ago +1

    This is great. But I have been struggling with the ability to interrupt the AI when on a call with Twilio.

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Working on something for this! Stay tuned.

  • @SaminYasar_
    @SaminYasar_ 2 months ago +6

    Already built this on my channel will be crazy

  • @fantasticshorts167
    @fantasticshorts167 2 months ago +1

    Hey! I have used function calling in this Realtime API for calendar booking, but I am struggling with how to send the function's response back to the API for TTS. Can you please help me with that?

  • @DanBorgia
    @DanBorgia 2 months ago +1

    Perfect timing!

  • @randotkatsenko5157
    @randotkatsenko5157 2 months ago +2

    One thing I don't understand: how do I make OpenAI speak first when it answers the call?

    • @TwilioDevs
      @TwilioDevs  2 months ago +4

      Right after the code sends the sessionUpdate object you can send something like this (feel free to modify the prompt):
      const event = {
        type: 'conversation.item.create',
        item: {
          type: 'message',
          role: 'user',
          content: [
            {
              type: 'input_text',
              text: 'Please greet the caller and say "hi there, how can i help you?"'
            }
          ]
        }
      };
      openAiWs.send(JSON.stringify(event));
      openAiWs.send(JSON.stringify({type: 'response.create'}));

    • @randotkatsenko5157
      @randotkatsenko5157 2 months ago

      @@TwilioDevs Thank you very MUCH! I got the code, but still no access to realtime API. Hopefully soon! Thanks again. Twilio is good.

    • @Bangs_Theory
      @Bangs_Theory 2 months ago

      @@randotkatsenko5157 try livekit

  • @sarzzfish8420
    @sarzzfish8420 2 months ago +1

    Can I use it in Danish, Turkish, or German?

    • @WaiZe0
      @WaiZe0 2 months ago

      I struggled to get the audio input detected in a specific language, even with Whisper’s language parameter. Tell me if you were able to choose any other language.

  • @RobertSpartacus
    @RobertSpartacus 2 months ago

    Twilio Folks,
    Is there any tutorial on using the Realtime API for outbound calls, i.e. triggering a call and taking it forward?

  • @krloschavarriasauceda151
    @krloschavarriasauceda151 2 months ago +1

    What VS Code theme are you using?

  • @cyruszad
    @cyruszad 2 months ago +1

    This is going to really help you guys. I worked on this immediately when this was dropped but this setup has a weakness. Interruptions don’t work when you interrupt the agent in the middle of a larger audio playback (ask it to read an example paragraph) and then try to interrupt it in the middle - it won’t work. I tried messing with it but nothing worked.

    • @TwilioDevs
      @TwilioDevs  2 months ago

      We're working on it! I should have something to share this week.

    • @josephbesgen4729
      @josephbesgen4729 2 months ago

      @@TwilioDevs Fantastic video! Just curious if you've uploaded anything regarding how to deal with interruptions

  • @bahubaliavenger472
    @bahubaliavenger472 6 days ago

    Hello sir, I did something similar in Python Flask, but I am getting a huge delay (5 seconds) downloading the audio file from Twilio. Is there any alternative? Please reply.

  • @ziv4gamer
    @ziv4gamer 1 month ago +1

    Is there a way to trigger the first response without needing to say something first?

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      Check the GitHub repo. It has an "assistant speaks first" option in it that got added after this video was made.

  • @cscrowley1
    @cscrowley1 2 months ago +1

    Also, do you guys have any thoughts you would care to share on outbound calling?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      What specifically are you looking for thoughts on?

  • @mustaphaboutzoua8046
    @mustaphaboutzoua8046 2 months ago +1

    "Thank you, Brent! Do I need a Twilio subscription for communication between two valid numbers? (The trial only provides one valid number.) When I try to make a call using the Twilio dev phone with the same number, I don't receive anything." it seems i need two numbers?

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      You can add a verified number to test your app with your own phone during trial: help.twilio.com/articles/223180048-Adding-a-Verified-Phone-Number-or-Caller-ID-with-Twilio

  • @ryanroman6589
    @ryanroman6589 2 months ago +1

    running `twilio dev-phone` launches the dev phone but also updates the webhooks. anyone get this to work?

    • @TwilioDevs
      @TwilioDevs  2 months ago +2

      You need to use a different phone number than the one you are testing.

  • @mspicela
    @mspicela 1 month ago

    Thanks again for the tutorial. What is needed to make it possible to interrupt the AI? I think Twilio may be buffering the audio it receives from OpenAI and finishes playing it even when interrupted.
    I tried several changes to fix things. I wonder if the audio from OpenAI is sent to Twilio, which buffers it, and that is why it keeps playing what it has already received after an interruption. Is there a way to tell Twilio to stop playing what has already been sent when an interruption is detected?
    The web-only implementations with WebRTC handle interruptions immediately, just like the official ChatGPT app. I know phone networks have a delay, but this is more than that; it seems to keep talking for many seconds.
    Thank you in advance.

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      Hey hey! Check out this timestamp from our recent livestream where I helped Alex and Bianca add this (i'm the robot 😂). The timestamp starts at their first interaction with it where they see how the lack of interruptions impacts things and then we walk through how to add a version of interrupt to it: ruclips.net/video/_itrbiszfiE/видео.htmlfeature=shared&t=2843

    • @mspicela
      @mspicela 1 month ago

      @@TwilioDevs Perfect and thank you! I watched the livestream recording and rebased my stuff on the newer version. It is working well now.
      What are you using to be a robot in the livestream?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @@mspicela Total custom build inside of OBS (obsproject.com). It's a pile of PNG files, a waveform generator for the mouth, and some subtle motion effects.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @@mspicela Also super glad you got it working! Let us know if there's anything else we can help with!

  • @mohibahmed5098
    @mohibahmed5098 7 days ago +1

    It doesn't handle interruptions while the AI is speaking. Am I missing something?

    • @TwilioDevs
      @TwilioDevs  7 days ago

      Check the repo that is linked in the description. We figured out how to add that after the video shipped. Thanks for watching!

  • @ankitrawat7211
    @ankitrawat7211 2 months ago

    How can I load my own trained models in this?

  • @carloslfu
    @carloslfu 2 months ago

    This is great! Thanks for sharing!

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Glad you enjoyed it! Thanks for watching 🎉

  • @natevance3661
    @natevance3661 2 months ago

    Is there a way to connect this to a GPT assistant?

  • @gurumack
    @gurumack 2 months ago +4

    has anyone here figured out how to modify this code for interrupts?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Working on this at the moment. Hopefully I'll have an update early this coming week.

    • @gurumack
      @gurumack 2 months ago

      I was able to figure it out! thanks

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      @@gurumack Happy to hear it!

    • @exploretheworld1736
      @exploretheworld1736 2 months ago

      @gurumack can you please share how to handle interruptions?

  • @MohsinAli-x8r5r
    @MohsinAli-x8r5r 2 months ago

    How can we get access to the Realtime API on an OpenAI account? (I already have a paid account.) I integrated the code and added my OpenAI key, but the problem is that during a call it starts talking and does not listen to me (no two-way communication). Can someone help me out?

  • @muhammadatif9263
    @muhammadatif9263 2 months ago

    What is the reason for using fastify over express?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      The websocket module for fastify is nice to work with and fastify is more performant than Express for this use case.

  • @KirkBell
    @KirkBell 2 months ago

    Will this work with changing the default voices' accents to Australian, English/UK, and others?

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      I believe I read that OpenAI will detect the regional accent and speak the responses in that accent. I think you can add that to the instructions (SYSTEM_MESSAGE) in the app to help reinforce the goal.

  • @Philosophicflix
    @Philosophicflix 2 months ago

    any replacement instead of ngrok? having issues with my terminal

    • @TwilioDevs
      @TwilioDevs  2 months ago

      There's a full list of alternatives here: github.com/anderspitman/awesome-tunneling

  • @RobertSpartacus
    @RobertSpartacus 2 months ago

    Any guide on how to add function calling ? Also can't we buy an Indian number rn ?

  • @limebulls
    @limebulls 2 months ago

    Can you make a tutorial for this on azure as well?

  • @wordpressobsessed9067
    @wordpressobsessed9067 2 months ago +1

    So can we host this on Twilio serverless? If so, which file would we point the incoming call to? Also, it can be modified to greet the caller first, correct? I'm thinking for a business AI assistant to take calls, give information etc. I have created these AI apps with Vapi, but it gets pretty expensive. Twilio would be so much cheaper.

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      I think with the need for a persistent WebSocket connection you're probably going to be best served doing this outside of our serverless Twilio Functions. I can double check with the team though!
      As for greeting, you can definitely change the TwiML tags to customize the greeting from Twilio, or I believe you could pre-prompt OpenAI with a text prompt using the Realtime API if you want the greeting to come from the assistant.

    • @wordpressobsessed9067
      @wordpressobsessed9067 2 months ago

      @@TwilioDevs Thanks, I'll mess around with it some. Is that voice coming from AWS? I've never heard that voice, but it's really good and would be terrific for most professional business applications. The latency is next to nothing, which has been the biggest hurdle it seems with these voice AI assistants. Good to see Twilio is now in the game!

    • @TwilioDevs
      @TwilioDevs  2 months ago

      @@wordpressobsessed9067 It's one of OpenAI's voices. I agree it's very natural sounding!

    • @0xb1sh0p8
      @0xb1sh0p8 2 months ago +1

      @@TwilioDevs Correct you'll need a persistent ws listening for a unique stream for each number/assistant you're hosting.

    • @aiplaygrounds
      @aiplaygrounds 2 months ago

      You can probably run it through your crm before answering to get all the phone info if any.

  • @NexGenUltra
    @NexGenUltra 2 months ago +2

    for the interruption issue :
    you need to clear the twilio buffer and then send response.cancel

    • @johns332
      @johns332 2 months ago +1

      Can you share how you implemented this? I tried sending the following commands when the response type is input_audio_buffer.speech_started:
      await openai_ws.send(json.dumps({"type": "response.cancel"}))
      await openai_ws.send(json.dumps({"type": "output_audio_buffer.clear"}))
      No dice though :( Your help here would be greatly appreciated!

    • @NexGenUltra
      @NexGenUltra 2 months ago

      @@johns332 Use this (a case inside your switch on the OpenAI event type):
      case 'input_audio_buffer.speech_started': {
        console.log('Speech Start:', response.type);
        // Tell Twilio to drop any audio it has buffered for playback
        twilioWs.send(
          JSON.stringify({
            streamSid: streamSid,
            event: 'clear',
          })
        );
        console.log('Cancelling AI speech from the server');
        // Ask OpenAI to stop generating the current response
        const interruptMessage = {
          type: 'response.cancel'
        };
        openaiWs.send(JSON.stringify(interruptMessage));
        break;
      }

  • @mspicela
    @mspicela 1 month ago

    Thank you for the tutorial. I built an AI phone agent/bot with this combined with function calling from OpenAI and it worked very well. Unfortunately, now I can no longer edit my phone numbers configuration -- "Voice configuration is unavailable for this phone number" -- but this isn't true because it lists my URL still and worked for days. To make things worse, the support spins and spins so I can't submit a trouble ticket.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Hi! Thanks for watching and I'm happy you built this. Sorry you're having trouble though (both with the app, and support).
      If you go here: help.twilio.com/ and ask a question, see if anything there helps resolve this.
      If not, there's a section at the bottom asking "Is this helpful?" and you can hit the thumbs down which will prompt you to either log in to submit a ticket or click the link next to it to submit a ticket without logging in.
      Once you have a ticket number, I can try to help escalate (no promises but worth a try!).

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Hello Michael,
      Thank you for getting in touch with our Social Support Team. We sincerely apologize for the inconvenience caused.
      Could you please dm us the email address on file?

    • @mspicela
      @mspicela 1 month ago

      @@TwilioDevs thank you for the reply. It's working now! I didn't do anything to change it but it resolved itself.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Awesome news! That's much easier to triage 😀 Glad it's working again!

  • @ompawaskar507
    @ompawaskar507 2 months ago

    Does this work with Gemini 1.5 Flash??

    • @TwilioDevs
      @TwilioDevs  2 months ago

      This tutorial is specifically for the OpenAI Realtime API.

  • @EswaraNadh
    @EswaraNadh 2 months ago

    How to make OpenAI speak the function_call results? Like if the appointment is created successfully, then how to let the user know that the appointment is created successfully.
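
A minimal sketch of one way to have the assistant speak a function result, assuming the Realtime API accepts a `conversation.item.create` event whose item has type `function_call_output` and the `call_id` of the original function call, followed by a `response.create` event. The helper name `buildFunctionResultEvents` and the sample call ID and result are hypothetical.

```javascript
// Hypothetical helper: builds the two events to send once a tool call has finished.
function buildFunctionResultEvents(callId, result) {
  return [
    {
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: callId,                // ID taken from the model's function call
        output: JSON.stringify(result), // the output is passed as a JSON string
      },
    },
    // Ask the model to generate a new (spoken) response that uses the result
    { type: 'response.create' },
  ];
}

const events = buildFunctionResultEvents('call_123', { status: 'appointment created' });
console.log(events.map((e) => JSON.stringify(e)).join('\n'));
```

Each event would then be sent over the OpenAI WebSocket with `JSON.stringify`, in the same way as the `openAiWs.send(...)` calls shown elsewhere in these comments.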

  • @60pluscrazy
    @60pluscrazy 2 months ago +1

    Thanks 🎉

  • @RobertSpartacus
    @RobertSpartacus 2 months ago

    Is there a way to buy Indian numbers on Twilio? If not, what is the workaround rn?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Hi Bharath,
      Thank you for getting in touch with our Social Support Team. Unfortunately, Twilio does not offer the ability to purchase Indian phone numbers directly. However, there are some workarounds and considerations you can explore.
      Kindly dm us for more information.

  • @EDashMan
    @EDashMan 2 months ago +1

    Is the speed really this fast?

    • @TwilioDevs
      @TwilioDevs  2 months ago +4

      Yes! The phone calls shown are not sped up or edited 😃

    • @0xb1sh0p8
      @0xb1sh0p8 2 months ago +2

      I can vouch for the speed. I'm just wrapping up development on a project that uses this flow along with some other options for generating assistants.

    • @EDashMan
      @EDashMan 2 months ago

      @@0xb1sh0p8 how do you know if you have access to the api, other than a server 403 error I’m not getting an exact messaging regarding the api.. do you have it available in the playground ?

    • @0xb1sh0p8
      @0xb1sh0p8 2 months ago

      @@EDashMan I don't have anything public right now. When you sign up with Twilio, you'll create an account. When you go to that account's dashboard and scroll down, it will show you your SID and Auth Token to access the API.

    • @0xb1sh0p8
      @0xb1sh0p8 2 months ago

      @@EDashMan hmm, did my last comment get deleted? You'll have access to the API when you sign up and create an account. At the bottom of the account page you'll see your SID and Auth Token to use.

  • @esek-2
    @esek-2 1 month ago

    Is this still working? I got it to work some weeks ago, but strangely, it is not working anymore - When I call my Twilio Phone Number, in the nodejs output I get the event "input_audio_buffer.speech_started", and after I finished speaking, nothing happens, and the bot does not answer me.

    • @AbhishekMishra-db2tj
      @AbhishekMishra-db2tj 1 month ago

      Hey, I am also facing the same problem, did you find anything to solve this?

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      Should still be working, yes. We were just building it again on our livestream today and it was working.

    • @esek-2
      @esek-2 1 month ago

      @@AbhishekMishra-db2tj Hey, somehow it does not properly detect when I finished speaking with my phone. When trying from a different phone, it worked. Not sure why that is the case.

  • @IdkJustCookingDude
    @IdkJustCookingDude 2 months ago

    I'm so frustrated, I'm literally at the last step. I got the Twilio and OpenAI APIs to work together, and when I call the phone number it says please wait speak your AI agent brought to you by openai and twilio and then says okay you can speak and then hangs up. Can anyone help? I have been using ChatGPT and Claude and they're both making me run around in circles.

    • @TwilioDevs
      @TwilioDevs  2 months ago

      The symptoms sound like an OpenAI Realtime API key issue. Seems like the call is hanging up at the point the OpenAI Realtime API should be getting connected. Are you getting any errors in the terminal?
      Please refer to the blog post or GitHub repo in the video description to make sure your code is 100% correct. You can also check on your API key's access at platform.openai.com

  • @akelebelay1025
    @akelebelay1025 2 months ago +1

    can you do it using python?

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      Yes! Should we make a Python video tutorial?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      For now, here's a blog post: www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-python

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Sorry for the delay!
      ruclips.net/video/OVguB1h-eTs/видео.html

  • @radoslav07
    @radoslav07 2 months ago

    What if I want to use this example without a Twilio call, directly from my mic and a web page?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      You'll need to stream audio from your local microphone to the OpenAI websocket.
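
A rough sketch of the event side of that idea, assuming audio chunks are forwarded to the Realtime API as `input_audio_buffer.append` events with base64-encoded audio. The helper name `buildAudioAppendEvent` is hypothetical, and capturing the microphone in the browser is left out. The chunk encoding must match the session's `input_audio_format`, which is `g711_ulaw` in the tutorial's session settings.

```javascript
// Hypothetical helper: wraps one raw audio chunk (Buffer or byte array)
// in the event shape used to append audio to the model's input buffer.
function buildAudioAppendEvent(chunk) {
  return {
    type: 'input_audio_buffer.append',
    audio: Buffer.from(chunk).toString('base64'),
  };
}

// Example with three placeholder bytes standing in for real audio data
const appendEvent = buildAudioAppendEvent([1, 2, 3]);
console.log(JSON.stringify(appendEvent));
```

Each event would be sent over the OpenAI WebSocket as a JSON string, one per captured chunk.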

  • @aiplaygrounds
    @aiplaygrounds 2 months ago +1

    My next project ❤

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      Let us know how it goes!

  • @riley_blackwell
    @riley_blackwell 2 months ago +1

    This is great!

    • @TwilioDevs
      @TwilioDevs  2 months ago +1

      Thanks for watching!

  • @SathishM-n8i
    @SathishM-n8i 2 months ago

    This is for incoming calls, right? What about outgoing calls?

  • @TwilioDevs
    @TwilioDevs  1 month ago

    Would you prefer to see this tutorial in Python? Check it out here: ruclips.net/video/OVguB1h-eTs/видео.html

  • @WaiZe0
    @WaiZe0 2 months ago

    How can I set the input language to something other than English?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      You can change the system prompt to indicate the language you want to use. It will also usually match whatever language you speak to it.

    • @WaiZe0
      @WaiZe0 2 months ago

      @@TwilioDevs I’ve created a twilio program before but using the gather method i was able to choose the language, but with openai realtime api i tried their language parameter for whisper-1 and it doesn’t work.
      And sadly the current state of auto detection is 75% flawed in my tests.

    • @TwilioDevs
      @TwilioDevs  2 months ago

      @@WaiZe0 At 03:02 we set up a system prompt. You can tell it what language you'd like for it to use in that prompt (and also tell it how to greet the caller, etc.). From my testing it has obeyed that quite well. I told it to converse only in Spanish and I wasn't able to get it to break out of that even by insisting I only knew English.

    • @WaiZe0
      @WaiZe0 2 months ago

      @@TwilioDevs I noticed it works well in English and Spanish, but im working with Arabic and it gets it only 1/10 times even with the clearest system prompt. Is there a way to set language like Twilio’s gather method?

  • @sfsadfsadfasdf
    @sfsadfsadfasdf 2 months ago +1

    This is the future.. the problem is that OpenAI's voices in Spanish don't sound very good.. they sound like they have an American accent. Is there a way to use an ElevenLabs voice instead of GPT's voice without losing the realtime benefit of twilio-openai?

    • @mandrews817
      @mandrews817 2 months ago +3

      If you use advanced mode, switch your system language to Spanish, open a new conversation, and tell the assistant: "can you speak to me using a Castillian Spanish accent?"

    • @boytenesee3494
      @boytenesee3494 2 months ago

      The Realtime API allows either a speech or a text response. You can send the response to 11labs and then push it back into Twilio after.

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Have you tried the options provided by the other commenters yet? Would love to help you find success.

    • @sfsadfsadfasdf
      @sfsadfsadfasdf 2 months ago

      @@mandrews817 But is advanced mode available in the API, or are you talking about the voice assistant that OpenAI is currently launching? If it's the former, could you please tell me where I can read more about it? I have never heard of an advanced mode in the API's speech to text.

    • @sfsadfsadfasdf
      @sfsadfsadfasdf 2 months ago +1

      @@boytenesee3494 Will try this, maybe it will delay the responses a little bit but i think it wouldnt be very noticeable, i will give it a try, thank you for the idea.

  • @momoya8373
    @momoya8373 2 months ago

    To avoid any confusion, it’s important to clearly state that even using the development phone incurs charges for both making and receiving calls(x2 charges), as some users might assume it’s free otherwise. Why not be clear?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      The Twilio Dev Phone documentation page states that it is using one of your own Twilio numbers to make the call. There's no intended deception here. I used the Dev Phone in the video as an option to not use my personal phone for the demo since it's easier to see the interaction and logs. It's just an option.

  • @BrainCandyQuiz
    @BrainCandyQuiz 2 months ago +1

    Confused. Instructions say "Step 2: Get your Account Sid and Auth Token from the Twilio Console to get started.", but nowhere does it say what do with them. Also call connects ago, but it can't seem to hear me, then disconnected after 5 seconds. Related? Connected to the OpenAI Realtime API
    Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
    Disconnected from the OpenAI Realtime API

    • @TwilioDevs
      @TwilioDevs  2 months ago

      If the call is working at all, the Twilio side of this is fine which means you're okay on the Twilio credentials front. This looks like it's not getting audio over to the OpenAI API. There are some more logging types you can enable with the code in the blog post. Can you try turning those on and see what you get in the terminal?
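      A sketch of what "more logging types" could look like, assuming the server receives OpenAI events as JSON messages on a WebSocket. The names `LOG_EVENT_TYPES`, `describeEvent`, and `openAiWs` are illustrative and may not match the blog post's code exactly; the event type strings are Realtime API server events.

      ```javascript
      // Event types worth printing while debugging a silent call.
      const LOG_EVENT_TYPES = new Set([
        'error',
        'session.created',
        'session.updated',
        'input_audio_buffer.speech_started',
        'input_audio_buffer.speech_stopped',
        'input_audio_buffer.committed',
        'response.done',
        'rate_limits.updated',
      ]);

      // Returns a log line for interesting events, or null for everything else.
      function describeEvent(rawMessage) {
        let event;
        try {
          event = JSON.parse(rawMessage.toString());
        } catch (err) {
          return `Unparseable message: ${err.message}`;
        }
        if (!event || !LOG_EVENT_TYPES.has(event.type)) return null;
        if (event.type === 'error') {
          return `OpenAI error: ${JSON.stringify(event.error)}`;
        }
        return `Received event: ${event.type}`;
      }

      // Usage (openAiWs is an assumed WebSocket to the Realtime API):
      // openAiWs.on('message', (data) => {
      //   const line = describeEvent(data);
      //   if (line) console.log(line);
      // });
      ```

      If `input_audio_buffer.speech_started` never shows up while the caller is talking, that would point to audio not reaching OpenAI.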

  • @nixoncode
    @nixoncode 2 months ago +1

    Somewhat helpful, but why would you want this?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Probably lots of use cases. This example is very basic, but imagine an assistant that replaces the typical phone tree at a company with something that speaks naturally to callers, can answer some of their questions, and ultimately can redirect the call to an actual human if it detects it needs to.

  • @johns332
    @johns332 2 months ago +1

    Anyone else getting 403 errors?

    • @TwilioDevs
      @TwilioDevs  2 months ago

      From the video description:
      "OpenAI is rolling out Realtime API access incrementally. Please watch their site for updates."
      The 403 errors are likely because your account doesn't have access yet.

    • @johns332
      @johns332 2 months ago

      Darn, thanks for the video and response though!​@TwilioDevs

    • @TwilioDevs
      @TwilioDevs  2 months ago

      @@johns332 Thank you for watching 😃 Let us know when you get access. Happy building!

  • @musumo1908
    @musumo1908 2 months ago

    Using tools and the Azure realtime endpoint

  • @NexGenUltra
    @NexGenUltra 2 months ago

    There is an issue with the quality of the answers, especially when dealing with local dialects. While it can somewhat handle English (not well), it struggles significantly with dialects like Darija and other regional languages. The difference in transcription accuracy between the current implementation and the OpenAI Playground is very noticeable.

  • @mohamudalifarah7722
    @mohamudalifarah7722 2 months ago +1

    Node.js 18+

    • @TwilioDevs
      @TwilioDevs  2 months ago

      Correct, version 18 or higher. Not sure why I said 18+ like it was an age or something 🤣
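      For anyone who wants the server to fail fast on an older runtime, a small sketch of a startup check. `isSupportedNode` is a hypothetical helper, not part of the tutorial code; `process.versions.node` is the standard Node.js version string.

      ```javascript
      // Returns true when the major version is at least minMajor (18 here).
      function isSupportedNode(version = process.versions.node, minMajor = 18) {
        const major = Number.parseInt(version.split('.')[0], 10);
        return Number.isInteger(major) && major >= minMajor;
      }

      if (!isSupportedNode()) {
        console.error(`Node.js 18 or higher is required, found ${process.versions.node}`);
      }
      ```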

  • @cscrowley1
    @cscrowley1 2 months ago

    The OpenAI dashboard's limits page says I do have access: "Realtime
    gpt-4o-realtime-preview 20,000 TPM 5,000 RPM
    gpt-4o-realtime-preview-2024-10-01 20,000 TPM 5,000 RPM"
    But I can only hear the clunky Twilio TTS at the beginning of the call and do not get connected. Also, a DTMF button press seems to end the session: "Server is listening on port 5050, Client connected
    Received non-media event: connected
    Incoming stream has started MZcbf17dca62564c8a46602ce815cd43bd
    Connected to the OpenAI Realtime API
    Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
    Received non-media event: dtmf
    Disconnected from the OpenAI Realtime API
    Received non-media event: stop
    Client disconnected."
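
    The log above does not show why the session ended after the keypress. As a debugging aid, here is a sketch of a handler that names each Twilio Media Streams event explicitly, so a `dtmf` message is logged instead of falling into a generic branch. The function name `handleTwilioEvent` and the returned `action` values are hypothetical; the event names and the `start.streamSid`, `media.payload`, and `dtmf.digit` fields follow the Media Streams WebSocket message format.

    ```javascript
    // Decides what to do with one message from the Twilio media stream.
    // `state` is a plain object that remembers the current stream SID.
    function handleTwilioEvent(rawMessage, state) {
      const data = JSON.parse(rawMessage.toString());
      switch (data.event) {
        case 'connected':
          return { action: 'log', detail: 'stream connected' };
        case 'start':
          state.streamSid = data.start.streamSid;
          return { action: 'log', detail: `stream started ${state.streamSid}` };
        case 'media':
          // Base64 G.711 u-law audio, to be forwarded to OpenAI.
          return { action: 'forward', payload: data.media.payload };
        case 'dtmf':
          // A keypress is informational here; it should not close anything.
          return { action: 'log', detail: `keypress ${data.dtmf.digit}` };
        case 'stop':
          return { action: 'close' };
        default:
          return { action: 'log', detail: `unhandled event ${data.event}` };
      }
    }
    ```

    A caller would send `forward` payloads to OpenAI as `input_audio_buffer.append` messages and close the OpenAI socket only on `close`.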