featuring: Blender Bottle, overheating camera, & VapiAI (vapi.ai/ ) - play at 1.75x - 2x speed for convenience.
Will your channel be dedicated to AI and other topics pertaining to technology or something else? I like the way you are covering this
@@justcallmebrian793 it will be tech-focused with other random vids - basically a reflection of my current interests (which may shift & change, but since I'm an engineer, likely tech)
This is great, thank you for your thorough explanation!
sure! I couldn't really find any candid long-form demo videos (30min - 1hr+), so just wanted to make one.
ok: this is brilliant on many levels. It's simply an example of human-AI interaction using a general LLM w/ a more humane conversational AI. This is a great demo. Love the natural setup and the attempt to make it flow as a real conversation with a person would - only in this case, a human Wiki.
thanks haha
Appreciate the technical intro on this. Thinking of using Vapi in my stack for a consumer facing product i'm working on.
cool cool
The Jarvis-like presentation at 35:00 was pretty cool !!!
Thoughts on the impact of this tech compared to the internet: I think people forget how long it took to build out and integrate internet technology. It all started with very slow phone modems and email addresses almost no one had… no online banking, no internet games, no text messaging, no YouTube, no Netflix. It took about a decade for the internet to start becoming useful. In that context, it will take a while for these new technologies to penetrate and start impacting jobs. Conversational AI as a whole seems less than a year old… heck, less than 6 months old.
Right - well, I was just making filler conversation to keep the convo moving. But yes, it will take time: entrepreneurs will have to create the ventures, investors will have to fund them, markets will have to adopt them, programmers will have to program them, etc, etc. Good point.
@@bephrem nice video though… even the clumsy parts illustrate the current nascent reality of this technology
Overall, I thought it was a great video. Even when the AI cuts you off, it sounds like a natural conversation. I don't like a long pause when someone's finished talking, because it sounds like they or you got disconnected. I have a small business that sends outbound calls.
Do you charge to set this up for others?
There's a good bit of no-code solutions you can go with to avoid consultant fees. Feel free to dm me on Twitter.
I'm also working on a site w/ guides to common voice workflows + recommended platforms so you can build on your own.
This tech could create a brilliant customer service phone line
indeed
Awesome video man
thx haha - very random vid
The AI never let you finish your thoughts if you paused for a second, which forced you to keep talking with no break, just so she didn't cut in on you. It needs some work to be able to tell when we are done with a thought versus just pausing to gather our words.
Right - I found myself speaking without break at times to keep things smooth, but I did find that it handled pausing (maybe off-camera) as well. Once you're aware of its limitations you do find yourself trying to "help it out", which in turn leads to unnatural conversation patterns, etc
I'd probably tell it to not respond until I say OVER or some other word, but then it's a little awkward
@@runepanix1 That's a good idea.
Chat GPT has the same problem when you use the voice option. That's why I hold down the button so it has to wait until I'm done.
@@runepanix1 well you’d want it to just know not to interject, having a stop/completion word ruins the point of a freeform conversation interface
Dude chill. It’s new. This is the worst it’s gonna be. Relax….🥴🤦🏾♂️
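The end-of-turn problem this thread describes can be sketched in a few lines. This is a hypothetical, naive silence-based detector (all names and thresholds here are illustrative, not taken from any of the tools mentioned): it ends the user's turn only after a stretch of consecutive low-energy frames, which makes the trade-off concrete - a short threshold cuts people off mid-thought, a long one feels like the line dropped.

```python
# Naive end-of-turn detector: end the turn after `min_silence_ms` of
# consecutive low-energy audio frames. Thresholds are illustrative only.

def is_end_of_turn(frame_energies, energy_threshold=0.02,
                   frame_ms=20, min_silence_ms=700):
    """Return True if the trailing silence exceeds min_silence_ms."""
    trailing_silent = 0
    for energy in reversed(frame_energies):
        if energy < energy_threshold:
            trailing_silent += frame_ms
        else:
            break
    return trailing_silent >= min_silence_ms

# A brief mid-sentence pause (20 frames * 20 ms = 400 ms) should NOT end the turn...
print(is_end_of_turn([0.5] * 10 + [0.01] * 20))  # False

# ...but 800 ms of trailing silence should.
print(is_end_of_turn([0.5] * 10 + [0.01] * 40))  # True
```

Real systems layer semantic cues (did the sentence sound finished?) on top of pure silence duration, which is exactly the part the commenters feel is missing.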
I enjoy using “old” text-based models like Claude. As tedious as that interface is… paradoxically, it does allow space to better compose thoughts… like the difference in content between a letter and a text message.
this is great training data
Damn my algo knows me. Hold my beer. Got something to show you all soon. But yeah. Impressive!
excited to see it!
Can you try this with Sinderin and VoiceOS?
yes, I set up a playground where you can try different services: vocalized.dev/playground - will add Sindarin soon
@@bephrem That's an awesome project! “VoiceOS io” and “milis ai” could be on the roadmap too. I'd love to compare all of them side by side.
You can do this locally with open-source tools, and it's way easier to get more performant (low-latency) responses and manage interruptions and other advanced stuff that you simply can't do with some remote API.
I mentioned this at 38:26 - could you go more into how you deploy your solution & the latency there? I've seen many low-latency local demos, but once you deploy this and have international usage, machine positioning becomes important. On the latter idea, I do wonder if these 50-100ms-sensitive edge cases can be solved w/ a remote solution; it really pushes optimization all the way down to the physical link layer (which may be the hard limit here - you cannot change the highway the bits are shipped along).
@@bephrem Most of it is that managing your own local LLMs and speech stack lets you pipeline data to minimize round-trip issues. Most of the existing AI cloud APIs are set up to handle bulk workloads and struggle with streaming.
With low-level access, we can start TTS'ing the local LLM's output almost as soon as text tokens are dumped out, because we don't have to worry about bundling everything up, shipping it to a server, and sitting around waiting for the wav file to download. Right now we just have a bunch of basic sockets on one workstation, but we're planning to use ZeroMQ or something similar to get networked stuff going.
A lot of the stuff I am seeing is smoke and mirrors to hide latency issues. Like the AI agents saying "got it", etc. - these seem like pre-scripted responses to trick your mind into thinking it has started talking.
The local one we got running doesn't need to do that and it's still fast enough to manage conversations.
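A minimal sketch of the pipelining idea described above, with stand-in functions for the local LLM and TTS engine (`fake_llm_stream` and `synthesize` are made up for illustration, not any real API): buffered tokens are flushed to TTS at each clause boundary, so the first chunk of audio can start playing while the rest of the response is still generating.

```python
# Stream LLM tokens into TTS clause-by-clause instead of waiting for the
# full response and downloading one big wav. Both functions below are
# stand-ins for a real local LLM and TTS engine.

def fake_llm_stream():
    # Pretend token stream from a local LLM.
    for tok in ["Sure", ",", " the", " weather", " is", " nice", " today", "."]:
        yield tok

def synthesize(text):
    # Stand-in for a local TTS call; returns "audio" for the clause.
    return f"<audio:{text.strip()}>"

def speak_streaming(token_stream, boundary_chars=",.;?!"):
    """Flush buffered tokens to TTS at each clause boundary."""
    buffer, audio_chunks = "", []
    for tok in token_stream:
        buffer += tok
        if tok and tok[-1] in boundary_chars:
            audio_chunks.append(synthesize(buffer))
            buffer = ""
    if buffer:  # flush any trailing partial clause
        audio_chunks.append(synthesize(buffer))
    return audio_chunks

print(speak_streaming(fake_llm_stream()))
# The first chunk ("Sure,") can play while later tokens are still generating.
```

In a deployed version each `synthesize` call would itself be streamed over a socket rather than collected in a list, but the buffering logic is the same.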
@@ultimape I have a pretty big audience in the real estate investing space looking to bring conversational AI to them, any chance we could chat? ☺️
@@bephrem A hybrid solution might be viable. Having a smaller, faster agent facilitate talking with a more heavyweight one in the cloud is something we are looking at.
Like, we can't really get Grok to run locally, but having a Mixtral agent say something like "fun idea, let me think about that for a moment" while behind the scenes sending off a more complex query to a cloud AI when it needs to seems practical.
Ideally they use a shared RAG or something so it doesn't end up having amnesia.
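The hybrid routing idea above can be roughed out like this, with toy stand-ins for the local model, the cloud model, and the router (none of these are real APIs - a real router would be a classifier, and the two paths would share retrieved context as mentioned): easy queries get answered locally, hard ones return a filler line immediately while the cloud call runs.

```python
# Toy hybrid dispatch: a fast local model handles easy queries; hard ones
# get an instant filler line while a heavier cloud model is consulted.
# All three functions are illustrative stand-ins, not real model APIs.

def local_model(query):
    return f"local-answer({query})"

def cloud_model(query):
    return f"cloud-answer({query})"

def needs_heavy_reasoning(query):
    # Toy heuristic standing in for a real router/classifier.
    return len(query.split()) > 8 or "explain" in query.lower()

def respond(query):
    """Return (filler_line_or_None, answer)."""
    if needs_heavy_reasoning(query):
        filler = "Fun idea, let me think about that for a moment."
        return filler, cloud_model(query)  # filler plays while cloud call runs
    return None, local_model(query)

print(respond("hi there"))
print(respond("explain how transformers handle long context windows"))
```

The design question the thread raises is real: the router adds a failure mode (misclassified queries) and the two models need shared memory, so the complexity only pays off if cloud round-trip latency is otherwise unacceptable.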
@@ultimape Right - I've considered the hybrid approach. It opens up a lot of complexity in picking what does what, & as you mention sync'ing state - makes me question if that complexity is necessary, but it seems the latency necessitates it? Not sure, but good ideas.
I wonder if AI can move beyond sounding like PR-speak. It speaks a whole bunch of words that don’t mean much.
it soon will
Should've done it with Pi Ai
good idea 💯
@@bephrem Its conversational ability is the best I've come across so far
PI ai doesn't have an API.
Interestingly, the presence of a nubile young voice… ultimately just illustrates the inherently hollow nature of the interaction by unrealistically raising anticipated expectations… like an artificial sex doll… the sexier it is only increases the sense of unfulfilled potential
interesting, indeed
All I hear is non-answers.
Right - the LLM sort of parrots back what was said. My goal was more to just have a long sample video for developers to see latency, user interjection handling, agent interruption prevention, backchanneling, etc. I knew I wouldn't have a very intellectually novel or stimulating conversation.
I see. How did it do. Seemed ok or adequate to me but I am curious what your data shows.
@@thekittenfreakify Adequate enough for most commercial use-cases. Not really there yet for fluent conversation (both mechanically w/ latency + LLM knowing to say the "human" thing).
My kingdom for a speech interface that properly knows when I'm talking and when I'm done talking. ChatGPT's voice interface in the app has some of the same issues, but it will give much more substantial replies - without trying to be cute.
thanks for sharing!
Not as good as chatting with the latest version of GPT.
The LLM in this video was ChatGPT 3.5
I haven't found any voice chat as sophisticated as talking to GPT-4 or 3.5 for free, which I believe uses their Whisper AI model to generate the realistic voices, but they only have five voice options currently.
i dont mean to be a negative nancy but.. with this many fails, this video should not have been posted.. or you could have just cut them.. i dig your style but it's infuriating that it stops and starts so many times. like i feel like rage quitting the video. no disrespect to you. thank you for the content. i subscribed and look forward to the next video with a different camera!
I think the purpose of the video is exactly that, to show both the innovation and limitations of conversational AI
Right - so the main camera was a Sony A7IV, which is known for its overheating issues. I didn't know there was a specific heat shut-off setting you had to flip to high tolerance (it defaults to a low tolerance), since it was my 2nd time using the camera & I had never used it to film for longer than a minute. This was filmed on a Friday & edited all Saturday; I thought it better to release something finished than to never find the time to try again. The demo also times out within ~15 minutes (I think), so there's a cap there as well.
For the purposes of the video, my aim was achieved within ~15 minutes. The LLM won't recall that far back (I imagine), so the most relevant thing was showing the request-handling performance without holding back. This video doesn't have to be watched all the way through.
And right - the style is to have the video totally uncut & candid. It was just either I don't reshoot & release this or I release it as we see it today!
And thanks!
@@vladonutueu Right - I think @frinkfronk9198 's point is more about the technical difficulties than the video contents, but that's addressed above.
Talk to me when the personality comes through. Worst date ever.
The lack of personality comes from the LLM (ChatGPT 3.5 in this case), not Vapi's orchestration of services.
@@bephrem It would be really interesting to do a paid version of this using GPT-4 - I'm hearing the simplistic "PR-speakishness" of ChatGPT 3.5 in most of her flat, emotionless answers; the difference between 3.5 and 4 is glaring in this regard. The latency is super impressive, though.
Worst podcast guest ever
for sure
@@bephrem still super interesting though 👌🏻
6:59 both you and AI fucked up.
what happened? heh
@@bephrem the lady misunderstood your question, then you missed her point, and the conversation changed direction man xD
@@__J____ff ah I see
I think that was just hallucination; it made sense to keep the convo going, although it could've been redirected.