How to Connect OpenAI’s Realtime API to a Knowledge Base

  • Published: 10 Jan 2025

Comments • 41

  • @Miararoy37
    @Miararoy37 2 months ago +3

    Hey Mark, I'm the Engineering Manager building Pinecone Assistant. I have to say that this video is beyond amazing and I learned a lot. We are working on exciting features for Pinecone Assistant that I hope will even empower cutting edge applications like the one you built here. Stay awesome! 🤙

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      Pleasure to e-meet you!
      Appreciate your amazing feedback and pumped to see what else you guys have in store 🦾

  • @yassinerabaoui4219
    @yassinerabaoui4219 25 days ago +1

    An engineer here, great work thank you!
    I had the same idea and tried to connect the Realtime API to a custom OpenAI assistant to get the search functionality, but I ran into a lot of issues, and I think the way you've done it is better. It would be great if you could also help with connecting the Realtime API to a custom GPT so it can also use Code Interpreter, maybe as a custom function call! Too ambitious? Haha, anyway, great work!

    • @Mark_Kashef
      @Mark_Kashef  25 days ago

      Hahaha no, it's ambitious but a really solid goal - I think technically you'd want to connect it to the Assistants API, since that's the bedrock of custom GPTs.
      The latency between the function call and the subsequent Assistants API call will make things a bit more complex. I've been assessing Google's Realtime API, so I'll wait to see what they have before going on this commando mission haha.
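
A minimal sketch of the bridge Mark describes above, assuming the Realtime API's function-calling flow and the OpenAI Python SDK's Assistants endpoints; the tool name ask_custom_assistant and the assistant ID are hypothetical:

```python
# Sketch only: register a Realtime tool that delegates hard questions to an
# Assistants API assistant (which can run Code Interpreter). Uses the standard
# "session.update" event shape and the openai>=1.x Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Tool definition sent over the Realtime websocket in a session.update event.
SESSION_UPDATE = {
    "type": "session.update",
    "session": {
        "tools": [{
            "type": "function",
            "name": "ask_custom_assistant",  # hypothetical tool name
            "description": "Delegate complex or code-heavy questions to an Assistants API assistant.",
            "parameters": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        }],
    },
}

def ask_custom_assistant(question: str, assistant_id: str = "asst_XXXX") -> str:
    """One blocking Assistants API round-trip -- this is where the extra latency comes from."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(thread_id=thread.id, role="user", content=question)
    run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant_id)
    if run.status != "completed":
        return "Sorry, I couldn't work that out."
    messages = client.beta.threads.messages.list(thread_id=thread.id)  # newest first by default
    return messages.data[0].content[0].text.value
```

The result would then go back over the Realtime socket as a conversation.item.create event of type function_call_output, followed by a response.create, so the voice agent can speak the answer.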

  • @mikew2883
    @mikew2883 2 months ago

    Thanks again! Works great!

    • @Mark_Kashef
      @Mark_Kashef  2 months ago +1

      Music to my ears! Glad it works well 🦾
      And thank you for your generosity Mike

  • @benbanurji1727
    @benbanurji1727 2 months ago

    Hey Mark - an immediate subscribe to your channel, awesome tutorial! I implemented your solution within a couple of hours for my company as an MVP for an AI call centre agent. My bosses were blown away when I presented it the next day, especially given the multilingual capability of the Realtime API (I work for a non-profit with clients in nearly every country). Now to spend the time writing a million PowerPoint slides to get an obvious winner past the corporate overlords...

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      Hey Ben!
      These are the comments I make these videos for; appreciate you sharing/subbing!
      And I relate to the last part, the hardest part is carving out the path to get senior management to test and adopt 🦾

  • @ManochonARG
    @ManochonARG 1 month ago +1

    Nice video! I’m struggling with using C# in my implementation.

    • @Mark_Kashef
      @Mark_Kashef  1 month ago

      Ahh, I wish I could help; I'm solely a JavaScript and Python guy. I do think it's primarily geared toward Node.js.

  • @ahmedalhosani1377
    @ahmedalhosani1377 2 months ago

    Thanks!

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      you're too kind! Thanks Ahmed 🦾

  • @richardadonnell
    @richardadonnell 2 months ago

    🎯 Key points for quick navigation:
    00:00 *🤖 Introduction to Realtime API with Knowledge Base Integration*
    - Overview of how to build a “brain” for the Realtime API to enhance precision and speed.
    - Describes the ability to quickly upload documents to the knowledge base and make changes in seconds.
    - Introduces Mark, the video creator, and his AI automation agency, Prompt Advisors.
    01:12 *🧠 Overview of the Brain Integration Concept*
    - Explanation of the basic flow from Twilio to the Realtime API and connecting it to the knowledge base.
    - Highlights the use of Twilio to link phone calls with the API.
    - Emphasizes the benefit of tapping into the knowledge base during conversations to improve responses.
    03:03 *🛠️ Setting Up with Replit and Twilio*
    - Introduces Replit as a coding environment, explaining deployment and pricing ($25/month).
    - Describes linking Twilio phone numbers with the API.
    - Warns about the potential cost of using the Realtime API ($4-$5 for 15 minutes).
    05:11 *📡 Adding Features to Improve User Experience*
    - Discusses how to modify the Realtime API to allow interruptions and handle conversations better.
    - Explains how users can upload PDFs or text to the knowledge base for real-time access.
    - Highlights how Pinecone vector databases enhance accuracy for question answering.
    08:11 *📂 Configuring Pinecone and API Keys*
    - Walks through creating a Pinecone account and uploading documents to the assistant.
    - Describes how uploaded files are converted into machine-readable formats by Pinecone.
    - Provides tips on managing API keys and deploying the setup with Replit.
    10:44 *🔑 Deployment Process and Troubleshooting*
    - Demonstrates how to deploy the code and link it to the Twilio phone number.
    - Explains the importance of maintaining active deployments to keep services running.
    - Walks through naming deployments and entering necessary credentials.
    12:33 *🎙️ Testing the AI Assistant with Specific Queries*
    - Simulates a conversation with the AI assistant, showcasing how it answers detailed questions from the knowledge base.
    - Example queries include statistics on social media engagement, retainer pricing, and PPC campaign results.
    - Demonstrates the assistant’s ability to provide precise, cited responses from uploaded documents.
    15:16 *💼 Final Setup and Recommendations*
    - Guides users on linking Twilio with Replit for continuous voice assistant functionality.
    - Provides advice on using tools like OpenPhone for testing voice interactions.
    - Encourages viewers to use the provided code, offer feedback, and engage with the content for future updates.
    Made with HARPA AI
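
As an editorial sketch of the Twilio-to-Realtime flow summarized above: a minimal version of the Twilio side, assuming Flask and the official twilio helper library; the route name and websocket path are placeholders, not necessarily what the video's code uses:

```python
# Sketch only: the webhook that answers a Twilio call and opens a bidirectional
# media stream to the server that bridges audio to the Realtime API.
# Assumes Flask and the official "twilio" package.
from flask import Flask, Response, request
from twilio.twiml.voice_response import Connect, VoiceResponse

app = Flask(__name__)

@app.route("/incoming-call", methods=["POST"])  # placeholder route name
def incoming_call():
    response = VoiceResponse()
    response.say("Connecting you to the assistant.")
    connect = Connect()
    # Twilio streams the caller's audio to this websocket; the bridge forwards
    # it to the Realtime API and plays the model's audio replies back.
    connect.stream(url=f"wss://{request.host}/media-stream")  # placeholder path
    response.append(connect)
    return Response(str(response), mimetype="text/xml")

if __name__ == "__main__":
    app.run(port=5050)
```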

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      thanks for the reminder! Will post my own

  • @MinaEllis-XXAI
    @MinaEllis-XXAI 2 months ago

    You really presented your video in a wonderful and beautiful way. I love this stuff and I love you too. You are so creative.

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      Thanks so much! Always appreciate hearing from you 🦾 if you thought this one was creative, just wait for what I have in store for today haha 🤣

  • @saedsaify9944
    @saedsaify9944 12 days ago +1

    Thank you, very helpful. Two things that might help: an estimate of the cost based on traffic (say, concurrent calls and total minutes for a given knowledge base size), and whether this works OK with other languages or needs modifications beyond the knowledge base documents?

    • @Mark_Kashef
      @Mark_Kashef  12 days ago

      Hi Saed! Thanks for the feedback
      Cost has changed a lot since the filming of this video - I’d check out their website for the latest pricing.
      For languages, you should be able to pivot between 40-50 languages and dialects by just writing in the prompt that the agent should expect to transition to XYZ languages.
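
A minimal sketch of what that prompt-level language pivot might look like, assuming the Realtime API's session.update event shape; the instruction wording and language list are illustrative:

```python
# Sketch only: tell the Realtime agent up front which languages to expect.
# Sent as JSON over the open Realtime websocket.
import json

session_update = {
    "type": "session.update",
    "session": {
        "instructions": (
            "You are a helpful voice agent. Callers may speak English, French, "
            "Spanish, or Arabic; detect the caller's language and reply in that "
            "same language."
        ),
    },
}

# e.g. inside the bridge:  await openai_ws.send(json.dumps(session_update))
```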

    • @saedsaify9944
      @saedsaify9944 12 days ago

      @@Mark_Kashef Thanks, it sounds expensive, but an estimate will do; it's hard to tell from their site since the cost is per token. By the way, I didn't get how we can use/buy the code mentioned - the Gumroad link is about a Sora prompting guide, is that it?

    • @Mark_Kashef
      @Mark_Kashef  12 days ago

      @@saedsaify9944 For some reason there's a huge typo in this description -- I just fixed the link, so it should go to the right assets.
      Pricing used to be $0.20-$0.30 per minute; it should be around $0.10-$0.15 now.
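
(As a rough worked example at those figures: a 10-minute call at $0.10-$0.15 per minute would be about $1.00-$1.50 of Realtime API usage, versus roughly $2.00-$3.00 at the older $0.20-$0.30 rate, before any Twilio or Pinecone charges.)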

    • @saedsaify9944
      @saedsaify9944 5 days ago

      @@Mark_Kashef Thanks, is it possible to implement this solution on AWS serverless instead of Replit? What would be needed?

  • @magicismagic123
    @magicismagic123 2 months ago

    Great share, thanks! What use cases can this solution cover?

    • @Mark_Kashef
      @Mark_Kashef  2 months ago +1

      Any AI caller that needs a knowledge base where the files are primarily PDF or text.

  • @mikew2883
    @mikew2883 2 months ago

    Awesome! 👏

  • @rachel.christine
    @rachel.christine 2 months ago

    Would love to see a tutorial on how to provide real time context from Pinecone in subsequent Real Time API calls (for example, how can I get more context from Pinecone for each new user question before generating a response and how can I save new data from this conversation into Pinecone).

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      Hi Rachel! Thanks for this novel idea; at the moment it is providing realtime context, and if you ask about something it's already retrieved, it'll respond faster.
      That said, for saving the conversation, we'd need to store it elsewhere, such as a light database or Airtable.
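
A minimal sketch of the "save it elsewhere" idea using a light local database; this uses SQLite from the Python standard library, and the table layout is illustrative (an Airtable base or hosted DB would play the same role):

```python
# Sketch only: persist conversation turns outside the Realtime session so they
# can be reviewed later or fed back into the knowledge base.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("call_transcripts.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS turns (
           call_sid TEXT,   -- Twilio call identifier
           role     TEXT,   -- "caller" or "assistant"
           content  TEXT,
           created  TEXT
       )"""
)

def save_turn(call_sid: str, role: str, content: str) -> None:
    """Append one transcript turn; called from the websocket bridge as text arrives."""
    conn.execute(
        "INSERT INTO turns VALUES (?, ?, ?, ?)",
        (call_sid, role, content, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

# Example: save_turn("CA123...", "caller", "What are your retainer prices?")
```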

  • @naveennoelj
    @naveennoelj 2 months ago

    This is a very good video, I was searching for this, keep up the good work. Since OpenAI's Realtime API is pretty expensive, I would love to see how we could build the same system with S2T, T2S, LiveKit, and any OSS vector DB to reduce costs.

    • @Mark_Kashef
      @Mark_Kashef  2 months ago +1

      Thanks so much Naveen; I’m going to ‘try’ to build my own version of the realtime api with LiveKit this weekend!
      Wish me luck 🫡

    • @naveennoelj
      @naveennoelj 2 months ago

      @Mark_Kashef Good luck buddy🤞can't wait for it.

    • @Mark_Kashef
      @Mark_Kashef  2 months ago +1

      @@naveennoelj so far so good aha :)

  • @cowfactory4903
    @cowfactory4903 2 months ago

    Hey, thanks a lot, Mark, for this great tutorial! I've followed all the steps in your Gumroad guide. The only problem is that when Twilio picks up the call, it doesn't greet the caller first; it just stays silent, waiting for the user to speak. How can I make the model greet the caller when it receives an incoming call?

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      My pleasure!
      You pushed me to figure it out over my morning cereal.
      I made the silence duration a bit longer, so you can tweak that as you wish -- it should greet you first and not be able to get interrupted until it's done with its opening message:
      bit.ly/3Uzpmai
      I'm travelling to Qatar in a few weeks so the prompt is going to be a travel guide haha -- used it myself for a bit.
      Let me know if it helps!
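
For anyone who can't open the link, a minimal sketch of the general approach, assuming the Realtime API's server-VAD settings and response.create event; the exact values and greeting text are illustrative, not necessarily what the updated code uses:

```python
# Sketch only: make the agent speak first, and lengthen the VAD silence window
# so the opening line is harder to talk over. Event shapes follow the Realtime
# API docs; the numbers are illustrative.
import json

session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "silence_duration_ms": 700,  # longer than the default, per the tweak above
        },
    },
}

initial_greeting = {
    "type": "response.create",
    "response": {
        "instructions": "Greet the caller warmly and ask how you can help today.",
    },
}

# Sent right after the Realtime websocket session is established, e.g.:
# await openai_ws.send(json.dumps(session_update))
# await openai_ws.send(json.dumps(initial_greeting))
```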

    • @cowfactory4903
      @cowfactory4903 2 months ago

      @@Mark_Kashef Great solution! thanks a lot!!! Have a nice trip!

    • @Mark_Kashef
      @Mark_Kashef  2 months ago

      @@cowfactory4903 You got it! :) and thanks!

  • @3echoDev
    @3echoDev 2 months ago

    What's the reason to use Pinecone for the assistant? Could I instead use an OpenAI Assistant for the same purpose?

    • @Mark_Kashef
      @Mark_Kashef  2 months ago +2

      The Pinecone Assistant API uses what's called grounded generation, meaning its accuracy rate, especially with larger documents, is much higher than OpenAI's out-of-the-box vector store.
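
For context, a rough sketch of what the knowledge-base lookup itself can look like, based on my reading of the Pinecone Assistant Python quickstart around the time of this video; treat the class and method names (pc.assistant.Assistant, Message, chat) and the assistant name as assumptions to verify against current docs:

```python
# Sketch only: query a Pinecone Assistant that already has documents uploaded.
# Names follow the Pinecone Assistant quickstart as I understand it and may
# differ between SDK versions -- verify against the current documentation.
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="marketing-brain")  # hypothetical name

def query_knowledge_base(question: str) -> str:
    """Return an answer grounded in the uploaded documents (with citations available)."""
    response = assistant.chat(messages=[Message(role="user", content=question)])
    # Response shape per the quickstart: an assistant message plus citation metadata.
    return response.message.content

# Example: query_knowledge_base("What were the PPC campaign results last quarter?")
```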

  • @calebb4632
    @calebb4632 3 days ago

    Why are you repeatedly cutting the video during the AI's responses to deceive viewers into thinking it's replying at a faster pace? This is so disgusting, bye

    • @Mark_Kashef
      @Mark_Kashef  3 days ago

      For production value 🤔
      Everyone will be able to test and audit the response time themselves, not sure how that’s deceiving - spread positivity, not negativity.

    • @calebb4632
      @calebb4632 3 days ago

      @ It's deceiving because it's active omission. And not everyone has the time to try your approach. There are many different approaches; imagine someone trying to find the one with the lowest latency and ending up unsatisfied every time after finding out all YouTubers simply cut their response times to make it look like it's responding faster than it actually is. It's a waste of time. I also can't present a video to my customers about a product and let them find out for themselves that my first introduction is different from what they used. Lol, they could sue me. Nobody would buy products from me with that approach. I noticed you cut them just by looking closely at your video, and for some reason all YouTubers do this; it's ridiculous