How to Build Your Own AI Phone Assistant for Just 1¢/Minute (No Cloud, 1 Second Latency)

Поделиться
HTML-код
  • Опубликовано: 24 дек 2024

Комментарии • 92

  • @BartSlodyczka
    @BartSlodyczka  Месяц назад

    📺 WATCH PART 2 - AI Cold Caller With Google Calendar: ruclips.net/video/J3d92Ak-P7o/видео.html
    👉 GET THE CODE FOR FREE: bartslodyczka.gumroad.com/l/zsjdn
    🛠 Hire me to build out an EPIC AI Voice Assistant for you: bart@supportlaunchpad.com
    🧠 If you are interested in joining my incubator please fill out this form: forms.gle/KJxiqhB3aWxbgGoh8
    📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7

    • @JhnyBravos
      @JhnyBravos 14 дней назад

      Don’t download the free code; it doesn’t work. Don’t anger yourself.

  • @mmdls602
    @mmdls602 29 дней назад +2

    Works flawlessly. The peeps mentioning latency -- its most likely your connection. I have consistently achieved sub 1 second, almost realtime performance with this. Nicely done dude. Function calling would be neat; especially crud ops with a db

  • @malikjaid5163
    @malikjaid5163 4 дня назад

    Amazing video
    I have one question, why are we using replit, can we deploy it on own servers like ec2 , and what things we need to change if done so.. thankyou

  • @arnabing
    @arnabing 24 дня назад

    This is amazing work! How does this compare in intelligence of the OpenaAI realtime api?

    • @BartSlodyczka
      @BartSlodyczka  24 дня назад

      Realtime API is MUCH better and if you can afford it, I would use that. The main reason is because the backend of the realtime api is a built in thread so you’re having a conversation with an “agent” - whereas in this set up we’re sending calls to the completions endpoint along with the entire conversation history. So it’s still very good, but inherently it is not an “agent” (so to speak). For basic calls/ tasks this current set up works great :)

    • @arnabing
      @arnabing 23 дня назад

      @ appreciate that! Also there’s the conversion delay. I wish the realtime was cheaper and had other voices.

  • @gurindersingh1713
    @gurindersingh1713 Месяц назад

    Yes really wanna see function calling like book appointments and transfer calls. Btw isn't it easier to do with livekit?

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Good suggestions, will pencil them in 💪 have never used live kit before will check it out :)

    • @gurindersingh1713
      @gurindersingh1713 Месяц назад

      @@BartSlodyczka bro you can handle alot with livekit more easily. make sure you check it out. you will thank later, thats how good it is

  • @wordpressobsessed9067
    @wordpressobsessed9067 Месяц назад

    Thanks for this video! I've been meaning to set this up with the real time Twilio API, but just haven't gotten to it yet. Been using Vapi but its so expensive. i would like to see how to transfer a call to a real person, or actually book an appointment in a Google calendar. Definitely Eleven Labs integration too!

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Great suggestions, google calendar keeps coming up so I will also look into this :)

  • @radoslav07
    @radoslav07 Месяц назад

    Can you interrupt current voice response? Or can you try to finish your thought if you didn’t manage to say it in full and the agent started voice response? Like saying “continue” which will interrupt the response keeping the previous input prompt and allowing you to properly finish input prompt.
    I implemented this Command words using Microsoft Azure speech services with continuous voice recognition.
    +1 for adding function calling

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      You can do interruptions and toward the end of my video in the final demo I interrupt and continue speaking about the same topic, and the response was in line with what I was saying. The mechanism that sends API calls to the GPT actually holds all conversation items (user message and agent response) and sends the entire history with each api call, so each response is always contextually correct. I don't know how efficient this process is, but it works for now. And haven't thought about commands just yet, but good idea! And noted on function calling 🙏

  • @brentpope1497
    @brentpope1497 29 дней назад

    Yes 11 Labs, definitely!
    Also, would love to see how you would implement a script rather than a faq

    • @BartSlodyczka
      @BartSlodyczka  29 дней назад

      Script is a solid idea, will do more thinking about this :)

  • @bradleyfraser4026
    @bradleyfraser4026 Месяц назад

    I would like to see more the infrastructure side. How to have a small call centre structure

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Very interesting suggestion! I will do more thinking about this 💪

  • @aliabassi1
    @aliabassi1 Месяц назад +1

    Solid build man amazing job!

  • @danielpistola
    @danielpistola Месяц назад

    why not use openai's realtime API? just because of the voices, right? please pardon my ignorance

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад +2

      I’ve got other videos showing how to do that too 💪 but the realtime api is currently like 30 cents per minute to run, and since it’s still in beta it has some stability issues. But realtime api is very fast and I’m sure all the kinks will be ironed out soon :) great question to ask legend

    • @danielpistola
      @danielpistola 29 дней назад

      @@BartSlodyczka

  • @emmanuelkolawole6720
    @emmanuelkolawole6720 Месяц назад

    When I interrupt, the agent stops talking. Is there some kind of bug? I think it has to do with speaker. When I put my phone call on speaker the agent does not reply with audio after the third or fourth interaction. But when I take the phone off speaker it works fine

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Hmm, that is strange. When I demo'd the interaction on youtube I had it on speaker and I had multiple conversation turns (so I spoke many times and the ai replied many times). Not really sure what it could be 🙏

  • @matt.lehodey
    @matt.lehodey Месяц назад

    Need to figure out how to make that reasoner model that formulates the text think on graph now hmm

  • @neozys
    @neozys 26 дней назад

    great it works! Can you expand on implementing function calling and eleven labs or cartesia as an alternative for TTS

    • @BartSlodyczka
      @BartSlodyczka  24 дня назад

      Awesome! And done will pencil it in 💪

  • @cryptnyuz6842
    @cryptnyuz6842 Месяц назад

    can this ai agent can also speaks in different languages or just restricted to english only ?

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад +1

      Haven't tested but should be able to speak in different languages!

    • @mmdls602
      @mmdls602 29 дней назад

      @@BartSlodyczka Tried it; doesn't come out as good as chatgpt, but it definitely works. I just added a line "you can understand and reply in Punjabi" in the prompt haha. The bottleneck in this pipeline is Deepgram's transcription.

  • @KasanThe
    @KasanThe Месяц назад

    hmm what about using gsm modem for calling - AT commands and you are in home. or use voip gateway. Second thought i was thinking about building same purpose app but my main goals are be independent - selfhosted and do it as 'realistic ' as possible with low latency. Using external api it is to easy, building whole from scratch is a good challange to get to know with whole llm - ai -stuff.

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      I have heard of people using a local LLM to run the backend and it is possible, fast, and cheap if you did it this way. I haven't looked into this yet but there may be other videos about this online already. As for calling with GSM modem or VOIP, great ideas!

  • @reider340
    @reider340 Месяц назад

    Hello Bart,
    If you were to use deepgram's TTS Streaming service instead of plain REST api calls, wouldn't the response time be faster?

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Hey legend, yes you're 100% correct, would be even faster than standard REST api calls. I think using elevenlabs streaming would be faster yet again. So really, there is so much opportunity in this code to have a really fast, really cheap AI Caller 💪

  • @danielpistola
    @danielpistola Месяц назад

    can we do this connecting it to a custom GPT?

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Yes you can, but this will be slightly more unstable as the assistants api is in beta (and there are like 5 or 6 api calls per request)

    • @danielpistola
      @danielpistola 29 дней назад

      @@BartSlodyczka That makes sense. Thanks a lot for taking the time to respond!

  • @wawaldekidsfun4850
    @wawaldekidsfun4850 Месяц назад +2

    Cool tech demo, but let's think twice about automating every customer interaction just because we can. Sure, AI phone systems are cheaper than human staff, but real human connection in customer service is priceless. Personal relationships, genuine empathy, and human judgment are what build lasting customer loyalty. Maybe instead of replacing humans, we should use AI to help them do their jobs better? Sometimes the 'old way' with real people is still the best way, even if it costs more than 1¢ per minute. 🤔 Great tutorial though - the technical implementation is impressive!

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад +1

      Thank you and excellent point, for pretty much my entire journey with ai I have this assumption/ belief that initially businesses will adopt ai to save costs and have faster experiences, but then when everyone uses ai, the question will become “what is actually a good support experience?” And for that I think businesses will revert back to human support. It might not be 100% human, but maybe 50/50 with ai and human. Either way, I still use a 100% human customer support team for my ecommerce brand, but I do give my agents ai tools and augment other parts of our support experience with ai (eg ai chatbot, ai search on our help desk). I agree the tech is cool but we should use it wisely 💪 love the comment, I always want to see this kind of discussion 🤝

    • @ColdCallSteve
      @ColdCallSteve Месяц назад

      I couldn’t find your video where you layout how to use Ai on how to help real humans do their jobs. Any help?

    • @danielpistola
      @danielpistola Месяц назад

      What about the MANY times customer service doesn't give a damn about their job and treat customers as if they were asking for a favor. What about the long waiting times? What about the lack of good manners?

    • @reserseAI
      @reserseAI Месяц назад

      Its priceless when employing “customer service” not lazy employees

    • @VijiJohn-w3p
      @VijiJohn-w3p 27 дней назад

      It's the pareto 80/20 rule. 80% of CS requests are easily manageable and answerable through the various channels (bots, agents, knowledge base etc). It's then augmenting this with the human experience for the 20% of more involved requests of support and service.

  • @solarexclusivePL
    @solarexclusivePL Месяц назад

    Hello Bart! Do you think its possible to create something like this for polish market? But without using Twilio cause their rates are crazy

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Siema! I'm not sure what Twilio alternatives work in Poland but you should be able to forward calls from the provider to the Replit code :) And I'm pretty sure you can also change the language to polish - so then you'd have a mega AI Caller 💪

  • @emmanuelkolawole6720
    @emmanuelkolawole6720 Месяц назад

    Outbound agent please? In a way that we can schedule multiple calls one after another, to different customers

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Great suggestion, will pencil it in!

  • @zubairkhankharooti3621
    @zubairkhankharooti3621 Месяц назад

    hi bart... First of all thankew... Secondly... are you going to extend this video... like adding functions/tools..... that's the main purpose of building these callers.....

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад +1

      Hey legend! Yeah I will make a part 2 video with function calling 💪

    • @zubairkhankharooti3621
      @zubairkhankharooti3621 Месяц назад

      @@BartSlodyczka thanks legends Chief...

  • @robertfigueroa425
    @robertfigueroa425 29 дней назад

    thank you so much.amazing video.i look forward to your other videos. im looking to create super reliable appointment booking ai assistants.i would definitely apppreciate a video on that subject.thank you.

  • @asithakoralage628
    @asithakoralage628 Месяц назад

    You’re a legend mate,, great work. I’m learning a lot from your videos.. thanks mate.

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Thank you very much 🤝 keep going man 🚀🚀

  • @mastermason
    @mastermason Месяц назад +2

    Awesome! Thank you for sharing this. I have big plans for you.

  • @TheSopk
    @TheSopk Месяц назад

    Thanks, what about Deepgram Voice Agent API Real Time?

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      Haven't thought about this before! Nice suggestion 💪

  • @cb4623
    @cb4623 Месяц назад

    Function calling booking appoinments

  • @mikew2883
    @mikew2883 Месяц назад

    Very cool stuff! Function call would be nice to see. 👍

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад +1

      Thank you and done will pencil this in 💪

  • @vladimirrumyantsev7445
    @vladimirrumyantsev7445 Месяц назад

    Very nice explanation, love, watching your videos 👍

  • @erickmarin228
    @erickmarin228 Месяц назад

    Awesome! Thanks for sharing. I will definitely give it a try

  • @victorvanvas
    @victorvanvas Месяц назад

    FIRE CONTENT AS USUAL

  • @digitalsoultech
    @digitalsoultech Месяц назад

    Sorry but how is this 1c per minute? I'd really love to know how you came to that conclusion

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад

      I calculated the number or transcription minutes (STT) along with the characters spoken (TTS) via deepgram, then I compared this to the total cost spent via deepgram. This came to ~0.89 Cents (so under 1 Cent). From there I looked at OpenAI API Usage for the same period, which was negligible. So then I decided to just say it was 1 cent total. Hope this makes sense 💪

  • @Scienceiscool355
    @Scienceiscool355 4 дня назад

    Eleven labs plz

  • @micbab-vg2mu
    @micbab-vg2mu Месяц назад

    thanks:)

  • @sanjuburkule
    @sanjuburkule Месяц назад

    this is 2s latency. didn't work.

    • @BartSlodyczka
      @BartSlodyczka  Месяц назад +1

      Can be even faster with streaming api for deep gram TTS and even faster with streaming TTS elevenlabs

    • @zubairkhankharooti3621
      @zubairkhankharooti3621 Месяц назад +1

      The problem is in sanju not in the app..

    • @sanjuburkule
      @sanjuburkule 28 дней назад

      @zubairkhankharooti3621 You try it. Let me know if you are able to get 1s latency. Text to speech and speech to text WITH interruption support from India did not work. But I do want it to work. I will retry and post my findings. If it works, then awesomeness 👌

    • @sanjuburkule
      @sanjuburkule 28 дней назад

      @mmdls602 mentioned he tried it and it worked for him. Let me find the fault in my deployment.

  • @Dispo-co4po
    @Dispo-co4po Месяц назад

    🔥🔥🔥🔥🔥🔥🔥

  • @magicaldocs
    @magicaldocs 17 дней назад

    But this definitely has HORRIBLE turn taking, emotion detection and latency ..
    Or Am i wrong ? Thats what the secret sauce of Retell, Vapi is :)

    • @BartSlodyczka
      @BartSlodyczka  17 дней назад

      Yeah the value prop here is the 1 cent per minute cost, and I agree that other purpose built tools like Retell and Vapi are better at the backend operations of AI calling systems 💪