Using OpenAI Realtime API to build a Twilio Voice AI assistant with Node.js

  • Published: Nov 19, 2024

Comments • 194

  • @TwilioDevs
    @TwilioDevs  1 month ago +4

    What should we build next?
    Next up on the channel is likely going to be the Python version of this tutorial followed by some updates regarding interruptions and having the AI talk first.

    • @ethereal-rzn
      @ethereal-rzn 29 days ago

      AI talk first pleaseeee. Couldn't find any tutorial on that on the web

    • @mhazwan
      @mhazwan 29 days ago

      Want to see how the AI talks first

  • @nlarchive
    @nlarchive 1 month ago +8

    That Twilio robotic voice needs an update. Thanks for the content!!!

    • @TwilioDevs
      @TwilioDevs  1 month ago +2

      There are other voice options available that sound better, but I definitely agree that one is from a different era 😅

  • @georgedukic9955
    @georgedukic9955 1 month ago +2

    This makes things so much easier. I was trying to do this manually, converting voice to text, sending prompt to openai, and then converting the response back to voice..

    • @nags9723yt
      @nags9723yt 1 month ago

      Yeah. This is a great feature. Imagine the lag from passing data between the different APIs. 😊

  • @Sa-if
    @Sa-if 1 month ago +19

    This will start a new age of AI...

    • @TwilioDevs
      @TwilioDevs  1 month ago +3

      It's really impressive how interactive it is!

    • @EDashMan
      @EDashMan 1 month ago

      @TwilioDevs Yoo that’s crazy. I’m going to test the repo myself first, seeing is believing haha!

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @EDashMan Let me know how it goes! I know OpenAI is rolling this out in stages, so if it doesn't work at first, check to make sure you have access to the OpenAI Realtime API. I was blown away the first time I got this working though. Feel free to mix up the SYSTEM_MESSAGE prompt and the temperature a bit too. It's pretty amazing. I feel like I should have it coach me through making a meal :D

    • @EDashMan
      @EDashMan 1 month ago

      @TwilioDevs Yeah I'm getting: Error in the OpenAI WebSocket: Error: Unexpected server response: 403
      I don't even have gpt-4o-realtime-preview-2024-10-01 in my playground. I guess I can't use it yet :(

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @EDashMan Bummer! Yeah hopefully it'll roll out pretty quickly.

  • @riley_blackwell
    @riley_blackwell 1 month ago +8

    Now you just have to provide customer data from Segment to the model. Then when a customer calls the model can give a personalized answer.
    For example, a customer calls a car repair shop. Then the model using RAG accesses a customer’s data to check on the status of a car repair. Lastly, the model responds with the status of the car repair.
    All the customer has to do is call the car repair shop and ask a simple question with voice. A great customer experience if you ask me 😊

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      Yes, this is a great scenario! That's exactly the type of exciting things that can be enabled by combining all of the pieces. Thanks for watching and for the comment!

    • @CodyDietzofficial
      @CodyDietzofficial 1 month ago +2

      I am literally building this right now...

    • @riley_blackwell
      @riley_blackwell 1 month ago

      @CodyDietzofficial Awesome! Can’t wait to see it :)

  • @DanBorgia
    @DanBorgia 1 month ago +1

    Perfect timing!

  • @xlretard
    @xlretard 1 month ago +1

    I needed this 18 months ago lol

  • @markustrasberg3957
    @markustrasberg3957 1 month ago +4

    There's a small bug in the blog post guide. The WebSocket connection URL is mistyped: it should contain a single model= parameter, but at the moment it has two.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Thanks, I'll let Paul know!
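
For anyone hitting the duplicated parameter described above, here is a minimal sketch of building the connection URL so that `model` appears exactly once. The endpoint and the model name are assumptions based on the public Realtime API preview and the model mentioned elsewhere in these comments; `buildRealtimeUrl` is a hypothetical helper, not code from the blog post.

```javascript
// Assumed base endpoint for the Realtime API preview; verify against OpenAI's docs.
const REALTIME_BASE_URL = 'wss://api.openai.com/v1/realtime';

// Hypothetical helper: returns the WebSocket URL with a single `model` parameter.
function buildRealtimeUrl(model) {
  const url = new URL(REALTIME_BASE_URL);
  // searchParams.set() replaces any existing value, so `model` can never be duplicated.
  url.searchParams.set('model', model);
  return url.toString();
}

console.log(buildRealtimeUrl('gpt-4o-realtime-preview-2024-10-01'));
```

Building the query string with `URL` rather than string concatenation is one way to avoid this kind of typo.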

  • @craigsdennis
    @craigsdennis 1 month ago +4

    Love the video Brent! 💪🚀

  • @SaminYasar_
    @SaminYasar_ 1 month ago +6

    Already built this on my channel will be crazy

  • @PraiseYeezus
    @PraiseYeezus 1 month ago +4

    Would like to see a tutorial about using OpenAI to get on-screen transcriptions of phone calls

    • @TwilioDevs
      @TwilioDevs  1 month ago +4

      That's a cool idea. I'll see what we can do!

    • @limebulls
      @limebulls 1 month ago +1

      @TwilioDevs yes please!

  • @carloslfu
    @carloslfu 1 month ago

    This is great! Thanks for sharing!

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Glad you enjoyed it! Thanks for watching 🎉

  • @fantasticshorts167
    @fantasticshorts167 1 month ago +1

    Hey! I have used function calling in this Realtime API for calendar booking, but I am struggling with how to send the function's response back to the API for TTS. Can you please help me with that?

  • @ArmaanSood-y9d
    @ArmaanSood-y9d 1 month ago +3

    Hey, this is amazing, revolutionary even! How do I connect my model to a vector_store / a knowledge base that it can refer to? Or is that not supported yet? I am trying to figure out whether I should implement that in the function calling tools {} parameter or not. Thanks!!!!

    • @natevance3661
      @natevance3661 1 month ago

      I'm wondering if this is possible / how to do this as well

    • @ArmaanSood-y9d
      @ArmaanSood-y9d 1 month ago

      @natevance3661 I have found out about some crazy shit, trying to piece it all together but you gotta use make

    • @titimiti1984
      @titimiti1984 1 month ago

      Did you figure out how to do that? Let me know if you do

    • @clairedubiel1
      @clairedubiel1 13 days ago

      Please let me know as well!

  • @zhangxiang18
    @zhangxiang18 1 month ago

    Thanks for the fantastic video! Do I need to upgrade my Twilio account to a full version to use this? I have set up everything based on the tutorial, but there is no response from the AI even after I speak the first sentence. Alas..

  • @riley_blackwell
    @riley_blackwell 1 month ago +1

    This is great!

  • @RobertSpartacus
    @RobertSpartacus 1 month ago

    Twilio Folks,
    Is there any tutorial on using the Realtime API for outbound calls, i.e. triggering a call and taking it forward?

  • @HarborProjectB
    @HarborProjectB 1 month ago +1

    This is great. But I have been struggling with the ability to interrupt the AI when on a call with Twilio.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Working on something for this! Stay tuned.

  • @mspicela
    @mspicela 3 days ago

    Thanks again for the tutorial. What is needed to make it possible to interrupt the AI? I think Twilio may be buffering received audio from OpenAI that it finishes playing even when interrupted.
    I tried several changes to try to fix things. I wonder if the audio from OpenAI is sent to Twilio, which buffers it, and that is why it keeps playing what it has already received after being interrupted. Is there a way to tell Twilio to stop playing what has already been sent when an interruption is detected?
    The web-only implementations with WebRTC handle interruptions immediately, just like the official ChatGPT app. I know phone networks have a delay, but this is more than that: it seems to keep talking for many seconds.
    Thank you in advance.

    • @TwilioDevs
      @TwilioDevs  3 days ago +1

      Hey hey! Check out this timestamp from our recent livestream where I helped Alex and Bianca add this (i'm the robot 😂). The timestamp starts at their first interaction with it where they see how the lack of interruptions impacts things and then we walk through how to add a version of interrupt to it: ruclips.net/video/_itrbiszfiE/видео.htmlfeature=shared&t=2843

    • @mspicela
      @mspicela 11 hours ago

      @TwilioDevs Perfect and thank you! I watched the livestream recording and rebased my stuff on the newer version. It is working well now.
      What are you using to be a robot in the livestream?

    • @TwilioDevs
      @TwilioDevs  5 hours ago

      @mspicela Total custom build inside of OBS (obsproject.com). It's a pile of PNG files, a waveform generator for the mouth, and some subtle motion effects.

    • @TwilioDevs
      @TwilioDevs  5 hours ago

      @mspicela Also super glad you got it working! Let us know if there's anything else we can help with!

  • @jothamdudley4116
    @jothamdudley4116 1 month ago +5

    got this working using my azure endpoint with some help from chatgpt!
    I did notice this example doesn't handle interruptions, will you be updating the repo with more features in the future?

    • @TwilioDevs
      @TwilioDevs  1 month ago +4

      that's awesome! thanks for giving it a try.
      We decided to leave interruptions out for this blog post/video because the code was already pretty long. We talked about doing follow-ups for things like interruptions and function calling. I'll check with the team and see what the plan is.

    • @gurumack
      @gurumack 1 month ago +1

      @TwilioDevs to be honest, I'd really appreciate this - this is a huge part of what makes this tech so amazing. Any high level support on how to accomplish this, if it's even possible? thanks!

    • @limebulls
      @limebulls 1 month ago

      Great! Would you mind to share your code?

    • @jonasmassieAI
      @jonasmassieAI 1 month ago

      @TwilioDevs looking for this also...

    • @ethanfossett5835
      @ethanfossett5835 1 month ago

      @TwilioDevs Also looking for this - even just samples of the code would be great, don't need a full video.

  • @cscrowley1
    @cscrowley1 1 month ago +1

    Also, do you guys have any thoughts you would care to share on outbound calling?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      What specifically are you looking for thoughts on?

  • @cyruszad
    @cyruszad 1 month ago +1

    This is going to really help you guys. I worked on this immediately when this was dropped but this setup has a weakness. Interruptions don’t work when you interrupt the agent in the middle of a larger audio playback (ask it to read an example paragraph) and then try to interrupt it in the middle - it won’t work. I tried messing with it but nothing worked.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      We're working on it! I should have something to share this week.

    • @josephbesgen4729
      @josephbesgen4729 1 month ago

      @TwilioDevs Fantastic video! Just curious if you've uploaded anything regarding how to deal with interruptions

  • @60pluscrazy
    @60pluscrazy 1 month ago +1

    Thanks 🎉

  • @VibeTech311
    @VibeTech311 25 days ago +1

    This was a great video. I am looking for a way to output the conversation both what was received and how it responded. Is that possible through the realtimeapi? Currently I can capture the response in text but I have not figured out how to capture what is said to it in text, via realtimeapi.
    Thanks again.

    • @TwilioDevs
      @TwilioDevs  24 days ago +1

      I'll see if I can put something together for that. First up is the Python version of this tutorial which got delayed a little bit.

    • @TwilioDevs
      @TwilioDevs  24 days ago +1

      So for clarity, you want the text of what the caller says to the AI?

    • @VibeTech311
      @VibeTech311 24 days ago +1

      @TwilioDevs yes, and thank you so much. I can get the text for the Realtime API response, but the text for the caller is where I am struggling. I don’t know if Realtime has a way, and I recently saw something in Twilio that could possibly help. But thank you again, I truly appreciate your response and consideration.

    • @TwilioDevs
      @TwilioDevs  24 days ago +1

      No promises but I'll see what I can do. If not a video perhaps we can at least get you a code snippet.

    • @VibeTech311
      @VibeTech311 24 days ago

      @TwilioDevs you are amazing thank you 🙏
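
One possible approach to the caller-side transcript asked about in this thread, sketched under assumptions rather than taken from the tutorial: the Realtime API can transcribe input audio when `input_audio_transcription` is enabled in the session, and it then emits `conversation.item.input_audio_transcription.completed` events carrying a `transcript` field. The helper names below are hypothetical, and the field names should be checked against the current API reference.

```javascript
// Hypothetical helper: add input transcription to an existing session.update payload.
// Assumes the whisper-1 transcription model mentioned elsewhere in these comments.
function withInputTranscription(sessionUpdate) {
  return {
    ...sessionUpdate,
    session: {
      ...sessionUpdate.session,
      input_audio_transcription: { model: 'whisper-1' },
    },
  };
}

// Hypothetical helper: return the caller's transcribed text for a raw
// WebSocket message, or null for any other event type.
function extractCallerTranscript(rawMessage) {
  const event = JSON.parse(rawMessage);
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    return (event.transcript || '').trim();
  }
  return null;
}

// Example with a made-up event payload.
const sample = JSON.stringify({
  type: 'conversation.item.input_audio_transcription.completed',
  transcript: 'What time do you open?\n',
});
console.log('Caller said:', extractCallerTranscript(sample));
```

In the app, `extractCallerTranscript` would be called from the OpenAI WebSocket `message` handler, next to the code that already logs the assistant's text.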

  • @dawid_dahl
    @dawid_dahl 1 month ago +2

    Can you show how we can integrate Function Calling as well?

    • @TwilioDevs
      @TwilioDevs  1 month ago +3

      That's a good idea for a follow-up video, thanks!

  • @mustaphaboutzoua8046
    @mustaphaboutzoua8046 1 month ago +1

    "Thank you, Brent! Do I need a Twilio subscription for communication between two valid numbers? (The trial only provides one valid number.) When I try to make a call using the Twilio dev phone with the same number, I don't receive anything." it seems i need two numbers?

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      You can add a verified number to test your app with your own phone during trial: help.twilio.com/articles/223180048-Adding-a-Verified-Phone-Number-or-Caller-ID-with-Twilio

  • @nexgenpcshop
    @nexgenpcshop 1 month ago +2

    For the interruption issue:
    you need to clear the Twilio buffer and then send response.cancel

    • @johns332
      @johns332 1 month ago +1

      Can you share how you implemented this? I tried sending the following commands when the response type is input_audio_buffer.speech_started:
      await openai_ws.send(json.dumps({"type": "response.cancel"}))
      await openai_ws.send(json.dumps({"type": "output_audio_buffer.clear"}))
      No dice though :( Your help here would be greatly appreciated!

    • @nexgenpcshop
      @nexgenpcshop 1 month ago

      @johns332 Use this:
      case 'input_audio_buffer.speech_started': {
        console.log('Speech Start:', response.type);
        // Tell Twilio to drop any audio it has buffered but not yet played
        twilioWs.send(
          JSON.stringify({
            streamSid: streamSid,
            event: 'clear',
          })
        );
        console.log('Cancelling AI speech from the server');
        // Ask OpenAI to stop generating the current response
        const interruptMessage = {
          type: 'response.cancel'
        };
        openaiWs.send(JSON.stringify(interruptMessage));
        break;
      }

  • @aiplaygrounds
    @aiplaygrounds 1 month ago +1

    My next project ❤

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      Let us know how it goes!

  • @randotkatsenko5157
    @randotkatsenko5157 1 month ago +2

    One thing I don't understand: how do I make OpenAI speak first when it answers the call?

    • @TwilioDevs
      @TwilioDevs  1 month ago +4

      Right after the code sends the sessionUpdate object you can send something like this (feel free to modify the prompt):
      const event = {
        type: 'conversation.item.create',
        item: {
          type: 'message',
          role: 'user',
          content: [
            {
              type: 'input_text',
              text: 'Please greet the caller and say "hi there, how can i help you?"'
            }
          ]
        }
      };
      openAiWs.send(JSON.stringify(event));
      openAiWs.send(JSON.stringify({type: 'response.create'}));

    • @randotkatsenko5157
      @randotkatsenko5157 1 month ago

      @TwilioDevs Thank you very MUCH! I got the code, but still no access to the Realtime API. Hopefully soon! Thanks again. Twilio is good.

    • @Bangs_Theory
      @Bangs_Theory 1 month ago

      @randotkatsenko5157 try livekit

  • @MohsinAli-x8r5r
    @MohsinAli-x8r5r 1 month ago

    How can we get access to Realtime API on Openai account (I have paid account already). I integrated code and added openai key but problem is that during call, it's started communicating and not listening to me (No two-way communication). Can someone help me out?

  • @QianliangHuang
    @QianliangHuang 1 month ago

    Very good video! When I was testing with Twilio's dev phone, I found an issue. We are unable to interrupt the conversation directly, like we can when using OpenAI Realtime. How should this problem be resolved?

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      We're working on providing a solution for the interruptions. There's another comment in the comments here that has the right idea though.

    • @QianliangHuang
      @QianliangHuang 1 month ago

      @TwilioDevs I have one more question. Can the program automatically end the call after the conversation is over?

    • @johns332
      @johns332 1 month ago +1

      @QianliangHuang Yes you can. You'll have to define a function that is called when the user starts saying bye or shows an intention to end the call. Then have it close the OpenAI WebSocket once called.

  • @natevance3661
    @natevance3661 1 month ago

    Is there a way to connect this to a GPT assistant?

  • @RobertSpartacus
    @RobertSpartacus 1 month ago

    Any guide on how to add function calling ? Also can't we buy an Indian number rn ?

  • @ankitrawat7211
    @ankitrawat7211 28 days ago

    How can I load my own trained models in this?

  • @Philosophicflix
    @Philosophicflix 1 month ago

    any replacement instead of ngrok? having issues with my terminal

    • @TwilioDevs
      @TwilioDevs  1 month ago

      There's a full list of alternatives here: github.com/anderspitman/awesome-tunneling

  • @EswaraNadh
    @EswaraNadh 1 month ago

    How to make OpenAI speak the function_call results? Like if the appointment is created successfully, then how to let the user know that the appointment is created successfully.
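
A sketch of one way to have the model speak a function result, assuming the Realtime API's `function_call_output` conversation item and the `call_id` delivered with the model's function call. `buildFunctionResultEvents` is a hypothetical helper, and the `call_id` and appointment data below are made up for illustration.

```javascript
// Hypothetical helper: build the two events that hand a tool result back to the
// model and then ask it to respond (in audio, if audio is among the session modalities).
function buildFunctionResultEvents(callId, result) {
  return [
    {
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: callId, // must match the call_id of the function call the model emitted
        output: JSON.stringify(result), // the output is passed as a string
      },
    },
    // Trigger a new response so the model can tell the caller what happened.
    { type: 'response.create' },
  ];
}

// Example with a made-up call_id and appointment result.
const resultEvents = buildFunctionResultEvents('call_123', { status: 'created', time: '10:00' });
for (const event of resultEvents) {
  // In the app this would be: openAiWs.send(JSON.stringify(event));
  console.log(JSON.stringify(event));
}
```

Without the trailing `response.create`, the result is added to the conversation but the model may stay silent until the caller speaks again.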

  • @mspicela
    @mspicela 5 days ago

    Thank you for the tutorial. I built an AI phone agent/bot with this combined with function calling from OpenAI and it worked very well. Unfortunately, now I can no longer edit my phone numbers configuration -- "Voice configuration is unavailable for this phone number" -- but this isn't true because it lists my URL still and worked for days. To make things worse, the support spins and spins so I can't submit a trouble ticket.

    • @TwilioDevs
      @TwilioDevs  5 days ago

      Hi! Thanks for watching and I'm happy you built this. Sorry you're having trouble though (both with the app, and support).
      If you go here: help.twilio.com/ and ask a question, see if anything there helps resolve this.
      If not, there's a section at the bottom asking "Is this helpful?" and you can hit the thumbs down which will prompt you to either log in to submit a ticket or click the link next to it to submit a ticket without logging in.
      Once you have a ticket number, I can try to help escalate (no promises but worth a try!).

    • @TwilioDevs
      @TwilioDevs  5 days ago

      Hello Michael,
      Thank you for getting in touch with our Social Support Team. We sincerely apologize for the inconvenience caused.
      Could you please dm us the email address on file?

    • @mspicela
      @mspicela 4 days ago

      @TwilioDevs thank you for the reply. It's working now! I didn't do anything to change it but it resolved itself.

    • @TwilioDevs
      @TwilioDevs  4 days ago

      Awesome news! That's much easier to triage 😀 Glad it's working again!

  • @AISlopForHumans
    @AISlopForHumans 1 month ago

    I'm so frustrated, I'm literally at the last step. I got the Twilio and OpenAI APIs to work together, and when I call the phone number it says please wait, speak to your AI agent brought to you by OpenAI and Twilio, then says okay you can speak, and then hangs up. Can anyone help? I have been using ChatGPT and Claude and they're both making me run around in circles.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      The symptoms sound like an OpenAI Realtime API key issue. Seems like the call is hanging up at the point the OpenAI Realtime API should be getting connected. Are you getting any errors in the terminal?
      Please refer to the blog post or GitHub repo in the video description to make sure your code is 100% correct. You can also check on your API key's access at platform.openai.com

  • @KirkBell
    @KirkBell 1 month ago

    Will this work with changing the default voices accents to accents like Australian, English/UK and others?

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      I believe I read that OpenAI will detect the regional accent and speak the responses in that accent. I think you can add that to the instructions (SYSTEM_MESSAGE) in the app to help reinforce the goal.

  • @ryanroman6589
    @ryanroman6589 1 month ago +1

    running `twilio dev-phone` launches the dev phone but also updates the webhooks. anyone get this to work?

    • @TwilioDevs
      @TwilioDevs  1 month ago +2

      You need to use a different phone number than the one you are testing.

  • @gurumack
    @gurumack 1 month ago +4

    has anyone here figured out how to modify this code for interrupts?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Working on this at the moment. Hopefully we'll have an update early this coming week.

    • @gurumack
      @gurumack 1 month ago

      I was able to figure it out! thanks

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      @gurumack Happy to hear it!

    • @exploretheworld1736
      @exploretheworld1736 1 month ago

      @gurumack can you please share it? How do you handle interruptions?

  • @sarzzfish8420
    @sarzzfish8420 1 month ago +1

    can i use it in danish, turkish or german?

    • @WaiZe0
      @WaiZe0 1 month ago

      I struggled to make audio input detect for a specific language, even with whisper’s language parameter. Tell me if u were able to choose any other language.

  • @momoya8373
    @momoya8373 26 days ago

    To avoid any confusion, it’s important to clearly state that even using the development phone incurs charges for both making and receiving calls(x2 charges), as some users might assume it’s free otherwise. Why not be clear?

    • @TwilioDevs
      @TwilioDevs  25 days ago

      The Twilio Dev Phone documentation page states that it is using one of your own Twilio numbers to make the call. There's no intended deception here. I used the Dev Phone in the video as an option to not use my personal phone for the demo since it's easier to see the interaction and logs. It's just an option.

  • @wordpressobsessed9067
    @wordpressobsessed9067 1 month ago +1

    So can we host this on Twilio serverless? If so, which file would we point the incoming call to? Also, it can be modified to greet the caller first, correct? I'm thinking for a business AI assistant to take calls, give information etc. I have created these AI apps with Vapi, but it gets pretty expensive. Twilio would be so much cheaper.

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      I think with the need for a persistent web socket connection you're probably going to be best served doing this outside of our serverless Twilio Functions. I can double check with the team though!
      As for greeting, you can definitely change the tags to customize the greeting from Twilio or I believe you could pre-prompt OpenAI with a text prompt using the Realtime API if you want the greeting to come from the assistant.

    • @wordpressobsessed9067
      @wordpressobsessed9067 1 month ago

      @TwilioDevs Thanks, I'll mess around with it some. Is that voice coming from AWS? I've never heard that voice, but it's really good and would be terrific for most professional business applications. The latency is next to nothing, which has been the biggest hurdle it seems with these voice AI assistants. Good to see Twilio is now in the game!

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @wordpressobsessed9067 It's one of OpenAI's voices. I agree it's very natural sounding!

    • @0xb1sh0p8
      @0xb1sh0p8 1 month ago +1

      @TwilioDevs Correct, you'll need a persistent ws listening for a unique stream for each number/assistant you're hosting.

    • @aiplaygrounds
      @aiplaygrounds 1 month ago

      You can probably run it through your crm before answering to get all the phone info if any.

  • @esek-2
    @esek-2 14 days ago

    Is this still working? I got it to work some weeks ago, but strangely, it is not working anymore - When I call my Twilio Phone Number, in the nodejs output I get the event "input_audio_buffer.speech_started", and after I finished speaking, nothing happens, and the bot does not answer me.

    • @AbhishekMishra-db2tj
      @AbhishekMishra-db2tj 7 days ago

      Hey, I am also facing the same problem, did you find anything to solve this?

    • @TwilioDevs
      @TwilioDevs  7 days ago +1

      Should still be working, yes. We were just building it again on our livestream today and it was working.

    • @esek-2
      @esek-2 6 days ago

      @AbhishekMishra-db2tj Hey, somehow it does not properly detect when I have finished speaking on my phone. When trying from a different phone, it worked. Not sure why that is the case.

  • @limebulls
    @limebulls 1 month ago

    Can you make a tutorial for this on azure as well?

  • @krloschavarriasauceda151
    @krloschavarriasauceda151 1 month ago +1

    What VS Code theme do you use?

  • @RobertSpartacus
    @RobertSpartacus 1 month ago

    Is there a way to buy Indian numbers on Twilio? If not, what is the workaround rn?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Hi Bharath,
      Thank you for getting in touch with our Social Support Team. Unfortunately, Twilio does not offer the ability to purchase Indian phone numbers directly. However, there are some workarounds and considerations you can explore.
      Kindly dm us for more information.

  • @muhammadatif9263
    @muhammadatif9263 1 month ago

    What is the reason for using fastify over express?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      The websocket module for fastify is nice to work with and fastify is more performant than Express for this use case.

  • @BrainCandyQuiz
    @BrainCandyQuiz 1 month ago +1

    Confused. Instructions say "Step 2: Get your Account Sid and Auth Token from the Twilio Console to get started.", but nowhere does it say what do with them. Also call connects ago, but it can't seem to hear me, then disconnected after 5 seconds. Related? Connected to the OpenAI Realtime API
    Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
    Disconnected from the OpenAI Realtime API

    • @TwilioDevs
      @TwilioDevs  1 month ago

      If the call is working at all, the Twilio side of this is fine which means you're okay on the Twilio credentials front. This looks like it's not getting audio over to the OpenAI API. There are some more logging types you can enable with the code in the blog post. Can you try turning those on and see what you get in the terminal?

  • @radoslav07
    @radoslav07 1 month ago

    What if I want to use this example without a Twilio call, directly from my mic and a web page?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      You'll need to stream audio from your local microphone to the OpenAI websocket.

  • @SathishM-n8i
    @SathishM-n8i 1 month ago

    This is for incoming calls, right? What about outgoing calls?

  • @ompawaskar507
    @ompawaskar507 1 month ago

    Does this work with gemini 1.5 flash??

    • @TwilioDevs
      @TwilioDevs  1 month ago

      This tutorial is specifically for the OpenAI Realtime API.

  • @WaiZe0
    @WaiZe0 1 month ago

    How can I set the input language to something other than English?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      You can change the system prompt to indicate the language you want to use. It will also usually match whatever language you speak to it.

    • @WaiZe0
      @WaiZe0 1 month ago

      @TwilioDevs I’ve created a Twilio program before where I was able to choose the language using the gather method, but with the OpenAI Realtime API I tried their language parameter for whisper-1 and it doesn’t work.
      And sadly the current state of auto detection is 75% flawed in my tests.

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @WaiZe0 At 03:02 we set up a system prompt. You can tell it what language you'd like it to use in that prompt (and also tell it how to greet the caller, etc.). From my testing it has obeyed that quite well. I told it to converse only in Spanish and I wasn't able to get it to break out of that even by insisting I only knew English.

    • @WaiZe0
      @WaiZe0 29 days ago

      @TwilioDevs I noticed it works well in English and Spanish, but I'm working with Arabic and it gets it right only 1 in 10 times, even with the clearest system prompt. Is there a way to set the language like Twilio’s gather method?
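
The system-prompt approach discussed in this thread could be sketched as follows. The session fields mirror the `session.update` payload that other commenters pasted from their logs; `buildSessionUpdate` and the instruction wording are hypothetical, and, as noted above, prompt-based language steering is reported to be unreliable for some languages.

```javascript
// Hypothetical helper: build a session.update payload whose instructions
// pin the conversation to a single language.
function buildSessionUpdate(language) {
  return {
    type: 'session.update',
    session: {
      turn_detection: { type: 'server_vad' },
      input_audio_format: 'g711_ulaw', // audio format used for Twilio media streams in the tutorial logs
      output_audio_format: 'g711_ulaw',
      voice: 'alloy',
      // Assumed prompt wording; adjust it to your use case.
      instructions:
        `You are a helpful AI assistant. Converse only in ${language}, ` +
        `even if the caller speaks another language.`,
      modalities: ['text', 'audio'],
      temperature: 0.8,
    },
  };
}

// In the app this would be: openAiWs.send(JSON.stringify(buildSessionUpdate('Spanish')));
console.log(JSON.stringify(buildSessionUpdate('Spanish'), null, 2));
```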

  • @MrDonald911
    @MrDonald911 1 month ago +1

    do we have a python version of this ?

    • @TwilioDevs
      @TwilioDevs  1 month ago +2

      We're working on a python video. We have a blog post for now: www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-python

    • @MrDonald911
      @MrDonald911 1 month ago

      @TwilioDevs Awesome thank you so much !

    • @TwilioDevs
      @TwilioDevs  22 days ago

      Sorry this took so long!
      ruclips.net/video/OVguB1h-eTs/видео.html

  • @sfsadfsadfasdf
    @sfsadfsadfasdf 1 month ago +1

    This is the future. The problem is that OpenAI's voices in Spanish don't sound very good; they sound like they have an American accent. Is there a way to use a different voice, e.g. ElevenLabs instead of GPT's voice, without losing the realtime benefit of Twilio + OpenAI?

    • @mandrews817
      @mandrews817 1 month ago +3

      If you use advanced mode, switch your system language to Spanish, open a new conversation, and tell the assistant: "can you speak to me using a Castilian Spanish accent?"

    • @boytenesee3494
      @boytenesee3494 1 month ago

      The Realtime API allows either speech or text responses - you can send the response to 11labs and then push it back into Twilio afterwards

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Have you tried the options provided by the other commenters yet? Would love to help you find success.

    • @sfsadfsadfasdf
      @sfsadfsadfasdf 1 month ago

      @mandrews817 But is advanced mode available in the API? Or are you talking about the voice assistant that OpenAI is currently launching? If it's the former, could you please tell me where I can read more about it? I have never heard of an advanced mode in the speech-to-text API.

    • @sfsadfsadfasdf
      @sfsadfsadfasdf 1 month ago +1

      @boytenesee3494 Will try this. It may delay the responses a little bit, but I think it wouldn't be very noticeable. I will give it a try, thank you for the idea.

  • @nixoncode
    @nixoncode 1 month ago +1

    Somewhat helpful, but why would you want this?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Probably lots of use cases. This example is very basic but imagine an assistant that replaces the typical phone tree at a company with something that speaks naturally to them, can answer some questions they may have, and ultimately can redirect the call to an actual human if it detects it needs to.

  • @EDashMan
    @EDashMan 1 month ago +1

    Is the speed really this fast?

    • @TwilioDevs
      @TwilioDevs  1 month ago +4

      Yes! The phone calls shown are not sped up or edited 😃

    • @0xb1sh0p8
      @0xb1sh0p8 1 month ago +2

      I can vouch for the speed. I'm just wrapping up development on a project that uses this flow along with some other options for generating assistants.

    • @EDashMan
      @EDashMan 1 month ago

      @0xb1sh0p8 how do you know if you have access to the API? Other than a server 403 error, I’m not getting any exact message regarding the API. Do you have it available in the playground?

    • @0xb1sh0p8
      @0xb1sh0p8 1 month ago

      @EDashMan I don't have anything public right now. When you sign up with Twilio, you'll create an account. When you go to that account's dashboard and scroll down, it will show you your SID and Auth Token to access the API

    • @0xb1sh0p8
      @0xb1sh0p8 1 month ago

      @EDashMan hmm, did my last comment get deleted? You'll have access to the API when you sign up and create an account. At the bottom of the account page you'll see your SID and AuthToken to use.

  • @akelebelay1025
    @akelebelay1025 1 month ago +1

    can you do it using python?

    • @TwilioDevs
      @TwilioDevs  1 month ago +1

      Yes! Should we make a Python video tutorial?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      For now, here's a blog post: www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-python

    • @TwilioDevs
      @TwilioDevs  22 days ago

      Sorry for the delay!
      ruclips.net/video/OVguB1h-eTs/видео.html

  • @musumo1908
    @musumo1908 24 days ago

    Using tools and azure realtime endpoint

  • @johns332
    @johns332 1 month ago +1

    Anyone else getting 403 errors?

    • @TwilioDevs
      @TwilioDevs  1 month ago

      From the video description:
      "OpenAI is rolling out Realtime API access incrementally. Please watch their site for updates."
      This is likely due to this.

    • @johns332
      @johns332 1 month ago

      Darn, thanks for the video and response though! @TwilioDevs

    • @TwilioDevs
      @TwilioDevs  1 month ago

      @johns332 Thank you for watching 😃 Let us know when you get access. Happy building!

  • @nexgenpcshop
    @nexgenpcshop 1 month ago

    There is an issue with the quality of the answers, especially when dealing with local dialects. While it can somewhat handle English (not well), it struggles significantly with dialects like Darija or other regional languages. The difference in transcription accuracy between the current implementation and the OpenAI Playground is very noticeable.

  • @TwilioDevs
    @TwilioDevs  20 days ago

    Would you prefer to see this tutorial in Python? Check it out here: ruclips.net/video/OVguB1h-eTs/видео.html

  • @mohamudalifarah7722
    @mohamudalifarah7722 1 month ago +1

    Node.js 18+

    • @TwilioDevs
      @TwilioDevs  1 month ago

      Correct, version 18 or higher. Not sure why I said 18+ like it was an age or something 🤣

  • @cscrowley1
    @cscrowley1 1 month ago

    OAI dashboard billing limits says I do have access "Realtime
    gpt-4o-realtime-preview 20,000 TPM 5,000 RPM
    gpt-4o-realtime-preview-2024-10-01 20,000 TPM 5,000 RPM"
    But I can only hear the clunky Twilio TTS at the beginning of the call and do not get connected. Also DTMF button press seems to end the session: "Server is listening on port 5050, Client connected
    Received non-media event: connected
    Incoming stream has started MZcbf17dca62564c8a46602ce815cd43bd
    Connected to the OpenAI Realtime API
    Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
    Received non-media event: dtmf
    Disconnected from the OpenAI Realtime API
    Received non-media event: stop
    Client disconnected."