- 69 videos
- 37,108 views
Modern Wellbeing Lab
Germany
Joined Nov 1, 2021
Welcome to my channel! Here, we explore one of life’s biggest questions: *What truly makes a fulfilling, happy life?* Through the lens of AI, psychology, and neuroscience, we uncover insights that help us understand and improve well-being in a fast-paced, tech-driven world.
This channel isn’t just about the incredible “superpowers” AI provides to solve complex problems; it’s about using these tools to enhance what matters most: meaning, connection, and personal growth. Together, we’ll look at the opportunities and challenges of making advanced technologies accessible to everyone and empowering people to use them in ways that align with their own values and bring genuine happiness.
Eleven Labs Voice Design Hollywood Performance with Dialogue Lines: Infatuation
54 views
Videos
Wow... Super Emotional Casting Session with OpenAI Advanced Voice Mode
46 views · 28 days ago
Finally I can apply what I learned during my Method Acting studies for ML. :)
Emotional Casting Session with OpenAI Advanced Voice Mode ... lol :)
36 views · 28 days ago
BUD-E V1.0 DEMO - FULLY OPEN SOURCE VOICE ASSISTANT :)
105 views · 1 month ago
github.com/LAION-AI/BUD-E_V1.0 discord.gg/pCPJJXP7Qx
IMPROVED OPEN SOURCE TTS & EN, DE, FR, ES - TTS Support for BUD-E :)
64 views · 1 month ago
discord.gg/pCPJJXP7Qx
BUD-E with nice female voice speaking English and German (using fish audio TTS)
27 views · 1 month ago
discord.gg/pCPJJXP7Qx
BUD-E as potential tool to deploy positive psychology interventions at scale :)
28 views · 1 month ago
greatergood.berkeley.edu/pdfs/GratitudePDFs/6Emmons-BlessingsBurdens.pdf
O1 Preview just refactored the BUD-E Client on the first try (1,353 lines of code)! :)
27 views · 1 month ago
discord.gg/pCPJJXP7Qx
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
4.1K views · 1 month ago
arxiv.org/abs/2402.12875 This podcast was generated with AI from: notebooklm.google.com/notebook/ What a time to be alive! :)
CiteME: Can Language Models Accurately Cite Scientific Claims?
80 views · 1 month ago
arxiv.org/abs/2407.12861 This podcast was generated with AI from: notebooklm.google.com/notebook/ What a time to be alive! :)
Fastpitch TTS 50x faster than real-time on a Colab T4 :D
88 views · 1 month ago
colab.research.google.com/drive/1AXHfzF-jLBDGr0aftufDjV4_hpssFp_M?usp=sharing
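The linked notebook isn't reproduced here, but as a rough sketch of what FastPitch inference typically looks like with NVIDIA's NeMo toolkit (the pretrained checkpoint names below are NVIDIA's published ones, assumed rather than taken from the notebook):

```python
# Sketch of FastPitch + HiFi-GAN inference via NVIDIA NeMo (pip install "nemo_toolkit[tts]").
# Not the notebook's exact code; checkpoint names are NVIDIA's published pretrained models.
import torch
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

spec_gen = FastPitchModel.from_pretrained("tts_en_fastpitch").eval().cuda()
vocoder = HifiGanModel.from_pretrained("tts_en_hifigan").eval().cuda()

with torch.no_grad():
    tokens = spec_gen.parse("FastPitch can run far faster than real time on a T4.")
    spectrogram = spec_gen.generate_spectrogram(tokens=tokens)      # text -> mel spectrogram
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)  # mel -> waveform

sf.write("sample.wav", audio.squeeze().cpu().numpy(), samplerate=22050)
```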
BUD-E V1.0 - Now with User Interface :)
49 views · 1 month ago
BUD-E: CLIPBOARD ACCESS, SERVER-SIDE & CLIENT-SIDE SKILLS WORKING :)
32 views · 1 month ago
BUD-E V1.0: Wake- & Stop-Word Detection working nicely! :)
39 views · 2 months ago
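BUD-E's own detection code isn't shown here, but the general idea can be sketched as simple matching of streaming ASR transcripts against configurable wake and stop words (the phrases and function below are hypothetical, not BUD-E's implementation):

```python
# Illustrative wake-/stop-word gating over streaming ASR output (hypothetical sketch).
WAKE_WORDS = {"hey buddy"}            # hypothetical wake phrase
STOP_WORDS = {"stop", "never mind"}   # hypothetical stop phrases

def update_state(transcript: str, listening: bool) -> bool:
    """Return the new listening state given the latest transcript chunk."""
    text = transcript.lower()
    if not listening and any(w in text for w in WAKE_WORDS):
        return True    # wake word heard: start handling user speech
    if listening and any(w in text for w in STOP_WORDS):
        return False   # stop word heard: halt playback / go idle
    return listening

state = False
for chunk in ["background chatter", "hey buddy, what's the weather?", "stop"]:
    state = update_state(chunk, state)
    print(f"{chunk!r} -> {'listening' if state else 'idle'}")
```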
BUD-E with ASR, LM & TTS all on the server (3090 RTX in Romania), Client in Germany
21 views · 2 months ago
Vokan TTS Tests (0-shot voice cloning)
111 views · 2 months ago
School BUD-E web-browser Voice Assistant
87 views · 2 months ago
BUD-E discussing the role of emotions & self determination in learning :)
22 views · 2 months ago
BUD-E Conversation demo & hints how to get it running on your own
24 views · 2 months ago
BUD-E Latency via Internet ~ 2 sec & local latency under 1 sec
20 views · 2 months ago
BUD-E V1.0 UPDATE: ALL OPEN SOURCE MODELS & LATENCY ~ 2.8 SEC
58 views · 2 months ago
BUD-E V 1.0 Update: All components open source models & client-server system
74 views · 3 months ago
Server & client dummy scripts - update
14 views · 3 months ago
Making BUD-E skills with GPT4 & BUD-E roleplaying Shakespeare & Julia
35 views · 3 months ago
Nice!
Congratulations! Keep up the great work.
Better than GPT-4o Voice Mode for sure. I like it.
Wow awesome work
"Catchy title right?" "It really grabs you." -- spoken like someone who's only ever been forced to read AI research papers
Better than most human podcasts.
the girl's voice is not only enjoyable to listen to but kinda hypes up the podcast lol
"co'tee", almost..
Ce Oh Teh, seems like this bot hasn't yet learned how to pronounce new terms
This is more accurate than you think. SQL, GUI, nginx? Tech is full of mispronounced acronyms
I just noticed that it's a bit hard to listen to NotebookLM because there are no pauses at all
"us humans" 🙄
cool summary!
It kinda freaks me out to hear it take a breath while speaking... Why does it need to take a breath???
taking breaths is part of the training data. it's trained to replicate what's in the training data. so it's going to output taking breaths.
Next version will cough and sneeze during winter
Because it makes it sound like people, which is the point. It's like why ChatGPT says please... improves the conversational flow, it's not about what it 'needs' to be able to do.
Isn't it obvious? You almost had the answer right in your own question. Why does the model need to laugh? Make jokes? Have emotions like joy or excitement or surprise? The answer to all these is the same, because it is trained to replicate human data. It's literally doing "next token prediction" on human voice. And taking breaths is much the same. Note: There might be a way to RLHF it out in future, but we haven't gotten to that stage yet. Also notice these AIs completely think they are human (at one point they say "us humans"), unlike text-based models like chatGPT which are aware they are AI.
Nothing of value is said.
Honestly, I was thinking about putting this paper through NotebookLM just to listen to it as a podcast. I'm actually kind of spooked at how quickly the algorithm pointed me to *exactly* what I was thinking about. Spoooky lol. Ty op.
NotebookLM
this is narrated by AI? oh boy
Obviously, it’s not.
@@augusdin read the description
@@augusdin NotebookLM - by Google
@@augusdin it is using Google NotebookLM
@@augusdin It is though. There's a few tools online now, utilizing stuff like Suno, to turn a transcript into a podcast with AI voices. This is pretty much exactly what it sounds like in every case.
Notebooklm is cool
Wow
🎉
Is someone working on implementing something like this in developing countries? I would drop everything to come and help
People like those behind LAION belong in prison for life! In my opinion, LAION is an absolutely despicable organization, and I'm saying this regardless of the child pornography content, the private recordings, the stolen works of art and all the other shit that was found in their data sets. The AIs that have been fed with LAION's data sets serve as a tool for thousands of fraudsters, perverts, criminals and similar people to defraud, deceive or denigrate others. No good can or will come from this. 😣
Awesome work man keep it up
What's the license?
would be nice to be able to select an accent too :-P
cluster 1: happy chat
cluster 2: news reporting
cluster 3: unsure, something about the cadence, perhaps recital-y but it's not completely clean
cluster 4: strong emphasis / enunciation
cluster 5: that was all one speaker. Accent? Calm voice?
cluster 6: louder talking
I thought the second cluster might be about a newsroom cadence, mostly, although not all of them quite had that
Does it support interruptions? Like if the assistant is providing a response but you interrupt it, will it accept that?
Children at birthday parties, playing with friends, newborns on the maternity ward. There are images of moments like these, shared on social media and even on old blogs or in closed groups, that are used without permission by artificial-intelligence training platforms. The complaint followed a finding by the international human rights organization Human Rights Watch.
2:40
3:00 plenty of animals have culture
3:05 I see no direct connection between culture and death anxiety and to claim that no animal but us fears dying is to be blind lol
3:34 believe not
3:40 Kierkegaard was wrong about that
4:35 I feel like that's more about social anxiety than death anxiety
5:36 I don't feel particularly meaningful nor am I contemplating my own death almost ever.
Amazing project! How fast is it? Can it stream in real time at 50 ms? Can you encode emotion into it?
#1
Very interesting
Really well thought out, I've been thinking through memory arch designs and love your idea there, and the whole thing of course. Thanks for sharing your ideas!
Seems like you also need to have the underlying language model fine-tuned on actual speech as well. Preference tuning always seems to produce text that sounds like it was written by a PR team, and no amount of vocal inflection is going to make it sound emotionally resonant.
Yes, that is true. This is already part of our roadmap and we plan to collect a large dataset of conversational text and speech data to fine-tune the model to cover a more realistic conversational style.
I think one aspect that would have to be explored is the difference between written and spoken language. The various text/chat-AIs are obviously trained on written data (duh) but that can be quite different from how you'd actually say things. So I wonder whether it'd be possible to train chat-like AIs to specifically target a response style that matches transcripts from spontaneous conversations rather than literature, so they feel more natural once they go through TTS. And the transcripts would have to be pretty high quality too, including annotated noises that only exist to convey "I'm listening, I'm paying attention, I understand / I'm trying to understand" or whatever, which would be missing from classic transcripts.
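One way to picture such a dataset: keep the backchannels and fillers in the target text so the model learns to emit them. Everything below (field names, file format, the example itself) is a hypothetical sketch, not LAION's actual schema:

```python
# Hypothetical JSONL construction for spoken-style fine-tuning data.
# Fillers and backchannels stay in the response so the model learns to produce them.
import json

examples = [
    {
        "context": "User: I finally got the demo running, but it took all weekend.",
        "response": "Mhm... oh nice, so it's working now? That's, um, honestly a big step.",
    },
]

with open("spoken_style.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```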
It's great, but I wouldn't use it unless it can run locally on an affordable PC.
It is one of our main goals to reduce the latencies as much as possible to make BUD-E available on affordable hardware in the future.
Wow! That's fantastic!!
I would like to run this on my pc too lol, any tutorial/guide?
Sure! Here is a link to the github repository. A detailed installation guide is included in the readme. github.com/LAION-AI/natural_voice_assistant
I wonder if the speed is only possible because the document being used for alignment is already downloaded. I noticed how the information being asked about was under the same topic. Nonetheless it's pretty cool 👏👏👏
The latencies are independent of the topic of the conversation. Even when switching to a different topic, this has no effect on the speed of the assistant.
That is so fast, how many tokens per second are you generating for the model?
The LLM generates around 200 tokens per second, and it takes 80-100 milliseconds to synthesize a full sentence into speech.
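A back-of-envelope check of what those figures imply per sentence (the 30-token sentence length is an assumption for illustration):

```python
# Rough per-sentence latency estimate from the numbers above.
tokens_per_second = 200      # reported LLM generation speed
tts_latency_s = 0.09         # reported 80-100 ms per sentence (midpoint)
sentence_tokens = 30         # assumed typical sentence length

generation_s = sentence_tokens / tokens_per_second    # 0.15 s
total_s = generation_s + tts_latency_s                # ~0.24 s
print(f"~{total_s * 1000:.0f} ms from prompt to audible sentence")
```

Streaming the LLM output sentence by sentence would shrink this further, since TTS can start on the first sentence while later ones are still being generated.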
Adding confirmations like "Sure, ...", "right, ..", "okay", "totally", ... at the start of the sentence would make it feel more like a conversation. Also instructing the LLM to add filler words like "um", "er", "uh", "hmm", "like", "mhm", "so", ... would make the voice sound more natural.
Prebaked initial responses would also allow masking latency in the LLM
Right? Those little inflections we don't usually notice would go a long way.
I think this is really tricky, but I fully agree. The issue is that LLMs are trained on text data which, of course, is meant to be read, not spoken, and the language modalities between those contexts are completely different. And moreover, while the text data surely contains some transcripts of spontaneous conversations (which would precisely be full of such markers), I doubt very many, if even *any*, of them annotated such extra sounds at the level of detail required.
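The prebaked-opener idea suggested above can be sketched in a few lines; everything here (the canned phrases, the playback stub) is hypothetical scaffolding, not BUD-E's code:

```python
# Hypothetical latency masking: speak a canned acknowledgment while the LLM works.
import random
import threading

CANNED = ["Sure, ", "Right, ", "Okay, ", "Hmm, let me think... "]  # prebaked openers

def play_audio(text: str) -> None:
    print(f"[speaking] {text}")  # stand-in for a real TTS + playback call

def respond(user_utterance: str, generate_reply) -> None:
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(reply=generate_reply(user_utterance))
    )
    worker.start()                     # slow LLM call runs in the background...
    play_audio(random.choice(CANNED))  # ...while a prebaked opener masks the wait
    worker.join()
    play_audio(result["reply"])

respond("What's the capital of France?", lambda q: "The capital of France is Paris.")
```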
What would you say is the best open-source TTS model right now that allows for streaming? I'm looking for an alternative to ElevenLabs because I don't want to stimulate non-democratic AI..
Hi, I love this. I have a small AI team, how can I help make more of this happen Finn?
Very interesting. I just have a question about copyright friendliness of the graph -- were you suggesting that such a graph can be built only over open-sourced papers? Or how would the alternative scenario work? Thank you! :)
We have lawyers in Germany who are very confident that this is legal if we do not store the original documents, but only extract structured information.
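A toy illustration of that distinction, i.e. retaining only extracted structure while discarding the source text (the fields and the placeholder extractor are invented for this sketch):

```python
# Toy sketch: store extracted structured facts, never the original document.

def extract_structure(paper_text: str) -> dict:
    # Placeholder: a real pipeline would extract these fields with an LLM or parser.
    return {
        "title": "Example Paper",
        "claims": ["Claim A", "Claim B"],
        "cites": ["arxiv.org/abs/2402.12875"],
    }

graph: dict[str, dict] = {}

def ingest(paper_id: str, paper_text: str) -> None:
    graph[paper_id] = extract_structure(paper_text)
    # The full text is never stored; only the structured record enters the graph.

ingest("paper-001", "...full text, processed then discarded...")
print(graph["paper-001"]["cites"])
```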
Incidentally, GPT-4's take on your post: The proposed solution is an innovative approach to the challenge of effectively training language models on long text contexts. Here are some aspects and comparisons to consider:

Category and keyword generation: This approach is particularly useful because it gives the model context information up front, which can improve the accuracy of text generation. Similar techniques are used in some language models to increase the relevance and accuracy of responses.

Age rating: Integrating an age rating into the training process is unique and can help generate age-appropriate content. This approach is especially important for addressing ethical considerations and the safety of AI systems.

Summaries and question-answer pairs: This is reminiscent of techniques used in some advanced Q&A systems and summarization tools. The innovation here lies in applying the method to longer contexts, which is a challenge for many current models.

Extended summaries for long texts: This method resembles the approach used in some text-analysis systems, but with the additional step of generating complete stories from the summaries. That could considerably expand the models' capacity for text synthesis and creative writing.

Dialogue-based scenarios: The emphasis on remembering and referring back to earlier parts of long dialogues is a demanding and innovative approach. It mirrors real conversational dynamics and could drive the development of AI systems that can hold natural, coherent conversations.

As for novelty, the main advantage lies in combining different techniques to improve the handling of long texts. While some of the individual techniques already exist in various contexts, applying them to long text contexts and combining them in one comprehensive training scheme is new. These methods could be particularly useful in areas such as creative writing, education, therapy, and customer service, where long, context-rich dialogues play an important role. The challenge is to evaluate the effectiveness and accuracy of these approaches in practice and to ensure the models meet ethical and safety standards.
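To make the prepended-metadata part of that scheme concrete, here is a toy construction of a single training example (the tag format and field names are invented for illustration):

```python
# Toy long-context training example with prepended category/keyword/age-rating
# metadata, mirroring the scheme described above. Tags and fields are invented.

def build_example(long_text: str, category: str, keywords: list[str], age_rating: str) -> str:
    header = (
        f"<category>{category}</category>\n"
        f"<keywords>{', '.join(keywords)}</keywords>\n"
        f"<age_rating>{age_rating}</age_rating>\n"
    )
    return header + long_text

sample = build_example(
    "Once upon a time...",
    category="children's story",
    keywords=["friendship", "adventure"],
    age_rating="6+",
)
print(sample)
```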
this is so awesome!