This was on my to-do list for a bigger project. I'm highly confident I should be able to plug this into my application. You rock, and I think you have saved me a lot of time.
Very interesting! Eager for the next video about local LLM integration. Subscribed!
I want to implement this using Verbi: a system that:
● Engages in real-time voice conversations with customers, mimicking the style and knowledge of a product seller
● Understands and responds to complex product queries with low latency
Any tips?
@@CryptoMaN_Rahul Is it regarding Grid?
The work you are doing is fantastic. Thank you for sharing!
This is what I'm looking for. Nice work.
This video really helped me with my thesis.
Glad it was helpful!
Adding vision and a Voice Activity Detector (VAD) would take it to the next level.
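For reference, a minimal sketch of what VAD gating could look like with the open-source webrtcvad package (the frame size and sample rate are requirements of that library; everything else here is illustrative, not part of Verbi):

```python
# A minimal VAD sketch using the open-source webrtcvad package: only frames
# classified as speech would be forwarded to the STT step. webrtcvad requires
# 16-bit mono PCM at 8/16/32/48 kHz, in 10/20/30 ms frames.
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness 0 (least strict) to 3 (most strict)
SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # samples per frame * 2 bytes

def speech_frames(pcm: bytes):
    """Yield only the frames webrtcvad classifies as speech."""
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[i:i + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            yield frame
```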
@PromptEngg. How will you handle it when a user speaks a language other than English? Do I need to translate the STT output text before passing it to the LLM, or is there a way to handle this in the prompt itself?
I am stuck on this point.
I'd appreciate your advice on this. Thanks 😊
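One hedged sketch of an answer, assuming a Whisper-family STT model: Whisper reports the language it detected, so rather than adding a translation step you can tell the LLM which language to reply in via the prompt. The model name, file name, and prompt wording below are illustrative:

```python
# A minimal sketch of one way to handle non-English speech, assuming the
# open-source openai-whisper package; the model and prompt are illustrative.
import whisper

model = whisper.load_model("large-v3")  # "large-v3" requires a recent release

# Whisper transcribes in the spoken language and reports what it detected,
# so a separate translation step is not strictly required.
result = model.transcribe("user_audio.wav")
text, lang = result["text"], result["language"]

# Let the prompt handle the rest: most multilingual LLMs can reply in-language.
system_prompt = (
    f"You are a helpful voice assistant. The user is speaking '{lang}'. "
    f"Always answer in the same language as the user."
)
print(system_prompt, text)
```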
Brilliant! Thanks a lot for the content; you are a true master.
Your work is fantastic. Can you make a video showing us how to install it locally, step by step, please? Thanks.
There is a problem: when restarting the code, it sometimes gives an error about the API. Is there any way to fix that?
Can you make it so that Verbi has a wake word, so we can keep it running all the time and call it whenever we want, without it saying random things while I am talking to a friend next to me? It would only reply or act if I say "Verbi, what tasks do I have" instead of "What tasks do I have".
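A minimal sketch of how transcript-level wake-word gating could work (an illustration, not how Verbi is implemented; production systems typically use a dedicated wake-word engine such as Picovoice Porcupine that listens to the audio before STT runs):

```python
# A minimal sketch of transcript-level wake-word gating: ignore every
# utterance unless it starts with the wake word.
WAKE_WORD = "verbi"

def gate_on_wake_word(transcript: str) -> str | None:
    """Return the command that follows the wake word, or None to stay silent."""
    words = transcript.strip().lower().split()
    if words and words[0].strip(",.!?") == WAKE_WORD:
        return transcript.strip()[len(words[0]):].lstrip(" ,")
    return None

assert gate_on_wake_word("Verbi, what tasks do I have") == "what tasks do I have"
assert gate_on_wake_word("What tasks do I have") is None
```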
Well, my suggestions: make it start most or all Windows applications via voice commands; have it click and move the mouse; have it create new business emails, replies, etc.; make it able to run Q&A over PDF, Word, Excel, and other files; add RAG, memory, tree-of-thoughts, and so on, plus multiple agents; have it write code and execute scripts; and voilà.
I made the same thing with Ollama and a local TTS.
Funnily enough, I am also working on a similar project, but a more complicated AI with control over VS Code, and I believe my stack matches most of yours, apart from Grok. I have just watched 12 seconds of your video and am eager to see the TTS part, because I am struggling with quality in that department. The rest is almost done; some tweaking here and there remains.
Amazing videos, as always. What software do you use to record your screen that way?
I tried many times but failed due to an audio error. Can you help me check?
Now, finally, something really USEFUL from AI. Congratulations 🎉❤
How do you implement sessions?
Can you help me check the issues on GitHub? I need to demo this to my customer.
I do have an idea for a project, but I am not a developer, so I am relying on an LLM to generate the code. My idea is to have a bot inside MS Teams meetings that can do a presentation and interact with other attendees if they ask questions. For example, if the bot is presenting and an attendee asks a question, it would pause the presentation to answer. If the answer is satisfactory, the bot would resume the presentation. Only relevant topics should be answered; otherwise, the question would be listed as an action item. Is this feasible?
You are a developer
@@Devishetty001 What a great, relevant answer, LMAO.
I have this idea working locally using vLLM, XTTS v2, Whisper large-v3, and the Brave Search API.
What is the difference between this and Verbi?
It's Verbi with additional functionality.
@@engineerprompt What 'additional functionality' are you referring to?
Can I add it to Verbi when testing? If so, how?
Brilliant
Phenomenal
That voice is SO creepy!
It's actually a copy of the original JARVIS voice, lol.
Didn't JARVIS go evil in the Marvel movies? Maybe that is what you are picking up on?
@@Create-The-Imaginable Nah, he won't go evil; the evil AI was Ultron, not JARVIS.
@@im-notai I never saw that movie! I need to check it out! Was it good from an AI philosophical perspective?
@@Create-The-Imaginable you should 👍
These TTS API fees are pricey. We need on-device TTS, ideally fully open source. In my current demo, TTS is something like 90% of the voice-to-voice stack's per-minute fees. We need the price to drop by an order of magnitude.
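For what it's worth, a hedged sketch of one fully open-source, on-device option, using the Coqui TTS package's XTTS v2 model (the model string and file paths are assumptions; after the one-time model download, synthesis runs locally with no per-minute fee):

```python
# A minimal on-device TTS sketch using the open-source Coqui TTS package
# (pip install TTS). Model name and file paths are illustrative; the first
# run downloads the model, after which synthesis is entirely local.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# XTTS v2 clones a voice from a short reference clip of the target speaker.
tts.tts_to_file(
    text="Hello! This synthesis ran entirely on my own machine.",
    speaker_wav="reference_voice.wav",  # assumed local sample of the voice
    language="en",
    file_path="reply.wav",
)
```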
Since I am a human being, I am forced to click checkboxes and buttons and fill out forms, both on web pages and in fat (desktop) apps, all day, every day. This frustrates me because it is the type of brainless work that a machine should be doing. Also, because I am a human, I don't use APIs to do this; I use UIs. And I know for sure that there are no APIs for much of what I do with UIs. If someone else would make an AI that operates UIs (actually SEES them, not using AppleTalk or Puppeteer or similar APIs, which don't actually SEE anything and often don't detect controls that I can see, e.g. an image map with an on-click event), I would gladly pay for it. But since I can find no AI projects that make even a reasonable attempt at this (they all use APIs), I am having to build one for myself, just so I can use it.
Just clicking a checkbox is easy to automate. But clicking a captcha image, that's difficult.
@@HariPrezadu I wish the tedious work I must do consisted only of clicking checkboxes. But I am also forced to type on the keyboard quite a bit, tab and click on fields, operate pull-down menus, and endure lots of other forms of UI torture. Oh, what I would give if I could only get an AI to do this for me. Fortunately, I am well versed in the construction of AI-driven software, so maybe my suffering will soon be over. Or maybe I will die trying to make such an AI; but that would also end my UI suffering, so there is hope.
I have done a project using Selenium and BERT: the user can ask it to go to Google, open a website, fill in a form, etc. It was an amazing project that I did for one of my clients.
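For readers curious what such a project looks like, here is a minimal sketch of the Selenium half (the BERT intent-parsing step is omitted; the site, field name, and query are illustrative):

```python
# A minimal sketch of the browser-automation side of such a project: a parsed
# voice command like "search for verbi" could drive these Selenium calls.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # requires a local Chrome/chromedriver install
driver.get("https://www.google.com")

# Find the search form's text field by name and submit a query.
box = driver.find_element(By.NAME, "q")
box.send_keys("verbi voice assistant")
box.send_keys(Keys.RETURN)

print(driver.title)
driver.quit()
```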
Just look around; this exists through control over the browser. If the UI is browser-based, you can do this and much more. I forget the names, but there are more than three such tools; they are just new. This was a big problem for agents: solving captchas and behaving as a human would.
@@zeusconquers I am frustrated because I HAVE examined the half-hearted attempts at operating UIs. The browser-based solutions are very unsatisfying because they can see only controls that are well marked and easily distinguished as controls by the HTML/JS/CSS. An old-fashioned image map with a JS on-click handler that looks at the x,y coordinate of the click, for example, would be nearly impossible to operate with this kind of solution. The non-browser-based solutions suffer from the same problem because they use OS APIs to query for buttons, text fields, and other controls. If the app doesn't go out of its way to publish these controls the right way, the OS doesn't know about them. Bottom line: everything I have examined for operating UIs can't SEE anything; they are all using APIs or sort-of APIs.
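For the record, a rough sketch of the pixel-level approach being described, using the pyautogui library: it screenshots what is actually rendered and clicks raw screen coordinates. The vision step that maps a screenshot to a coordinate is a hypothetical placeholder, not an existing API:

```python
# A minimal sketch of screen-level (pixel) UI automation with pyautogui:
# it captures the rendered screen and clicks raw coordinates, so it works
# even on an image map with a JS on-click handler. The vision step that
# turns a screenshot into a coordinate is left as a hypothetical function.
import pyautogui

def locate_control(screenshot, description: str) -> tuple[int, int]:
    """Hypothetical: ask a vision model where `description` is on screen."""
    raise NotImplementedError("plug in your vision model here")

shot = pyautogui.screenshot()            # sees exactly what the user sees
x, y = locate_control(shot, "the Submit button")
pyautogui.click(x, y)                    # raw click; no DOM or OS API needed
pyautogui.write("hello", interval=0.05)  # typing works the same way
```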
👍