- 69 videos
- 37,108 views
Modern Wellbeing Lab
Germany
Joined Nov 1, 2021
Welcome to my channel! Here, we explore one of life’s biggest questions: *What truly makes a fulfilling, happy life?* Through the lens of AI, psychology, and neuroscience, we uncover insights that help us understand and improve well-being in a fast-paced, tech-driven world.
This channel isn’t just about the incredible “superpowers” AI provides to solve complex problems; it’s about using these tools to enhance what matters most: meaning, connection, and personal growth. Together, we’ll look at the opportunities and challenges of making advanced technologies accessible to everyone and empowering people to use them in ways that align with their own values and bring genuine happiness.
Eleven Labs Voice Design Hollywood Performance with Dialogue Lines: Infatuation
54 views
Videos
Wow... Super Emotional Casting Session with OpenAI Advanced Voice Mode
46 views · 28 days ago
Finally I can apply what I learned during my Method Acting studies for ML. :)
Emotional Casting Session with OpenAI Advanced Voice Mode ... lol :)
36 views · 28 days ago
BUD-E V1.0 DEMO - FULLY OPEN SOURCE VOICE ASSISTANT :)
105 views · 1 month ago
github.com/LAION-AI/BUD-E_V1.0 discord.gg/pCPJJXP7Qx
IMPROVED OPEN SOURCE TTS & EN, DE, FR, ES - TTS Support for BUD-E :)
64 views · 1 month ago
discord.gg/pCPJJXP7Qx
BUD-E with nice female voice speaking English and German (using fish audio TTS)
27 views · 1 month ago
discord.gg/pCPJJXP7Qx
BUD-E as potential tool to deploy positive psychology interventions at scale :)
28 views · 1 month ago
greatergood.berkeley.edu/pdfs/GratitudePDFs/6Emmons-BlessingsBurdens.pdf
O1 Preview just refactored the BUD-E Client on the first try (1,353 lines of code)! :)
27 views · 1 month ago
discord.gg/pCPJJXP7Qx
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
4.1K views · 1 month ago
arxiv.org/abs/2402.12875 This podcast was generated with AI from: notebooklm.google.com/notebook/ What a time to be alive! :)
CiteME: Can Language Models Accurately Cite Scientific Claims?
80 views · 1 month ago
arxiv.org/abs/2407.12861 This podcast was generated with AI from: notebooklm.google.com/notebook/ What a time to be alive! :)
Fastpitch TTS 50x faster than real-time on a Colab T4 :D
88 views · 1 month ago
colab.research.google.com/drive/1AXHfzF-jLBDGr0aftufDjV4_hpssFp_M?usp=sharing
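The linked notebook isn't reproduced here, but as a rough sketch of what FastPitch inference typically looks like with NVIDIA's NeMo toolkit (the pretrained checkpoint names below are NVIDIA's published ones, assumed rather than taken from the notebook):

```python
# Sketch of FastPitch + HiFi-GAN inference via NVIDIA NeMo (pip install "nemo_toolkit[tts]").
# Not the notebook's exact code; checkpoint names are NVIDIA's published pretrained models.
import torch
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

spec_gen = FastPitchModel.from_pretrained("tts_en_fastpitch").eval().cuda()
vocoder = HifiGanModel.from_pretrained("tts_en_hifigan").eval().cuda()

with torch.no_grad():
    tokens = spec_gen.parse("FastPitch can run far faster than real time on a T4.")
    spectrogram = spec_gen.generate_spectrogram(tokens=tokens)      # text -> mel spectrogram
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)  # mel -> waveform

sf.write("sample.wav", audio.squeeze().cpu().numpy(), samplerate=22050)
```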
BUD-E V1.0 - Now with User Interface :)
49 views · 1 month ago
BUD-E: CLIPBOARD ACCESS, SERVER-SIDE & CLIENT-SIDE SKILLS WORKING :)
32 views · 1 month ago
BUD-E V1.0: Wake- & Stop-Word Detection working nicely! :)
39 views · 2 months ago
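BUD-E's own detection code isn't shown here, but the general idea can be sketched as simple matching of streaming ASR transcripts against configurable wake and stop words (the phrases and function below are hypothetical, not BUD-E's implementation):

```python
# Illustrative wake-/stop-word gating over streaming ASR output (hypothetical sketch).
WAKE_WORDS = {"hey buddy"}            # hypothetical wake phrase
STOP_WORDS = {"stop", "never mind"}   # hypothetical stop phrases

def update_state(transcript: str, listening: bool) -> bool:
    """Return the new listening state given the latest transcript chunk."""
    text = transcript.lower()
    if not listening and any(w in text for w in WAKE_WORDS):
        return True    # wake word heard: start handling user speech
    if listening and any(w in text for w in STOP_WORDS):
        return False   # stop word heard: halt playback / go idle
    return listening

state = False
for chunk in ["background chatter", "hey buddy, what's the weather?", "stop"]:
    state = update_state(chunk, state)
    print(f"{chunk!r} -> {'listening' if state else 'idle'}")
```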
BUD-E with ASR, LM & TTS all on the server (3090 RTX in Romania), Client in Germany
21 views · 2 months ago
Vokan TTS Tests (0-shot voice cloning)
111 views · 2 months ago
School BUD-E web-browser Voice Assistant
87 views · 2 months ago
BUD-E discussing the role of emotions & self determination in learning :)
22 views · 2 months ago
BUD-E Conversation demo & hints how to get it running on your own
24 views · 2 months ago
BUD-E Latency via Internet ~ 2 sec & local latency under 1 sec
20 views · 2 months ago
BUD-E V1.0 UPDATE: ALL OPEN SOURCE MODELS & LATENCY ~ 2.8 SEC
58 views · 2 months ago
BUD-E V 1.0 Update: All components open source models & client-server system
74 views · 3 months ago
Server & client dummy scripts - update
14 views · 3 months ago
Making BUD-E skills with GPT4 & BUD-E roleplaying Shakespeare & Julia
35 views · 3 months ago
Nice!
Congratulations! Keep up the great work.
Better than GPT-4o Voice Mode for sure. I like it.
Wow awesome work
"Catchy title right?" "It really grabs you." -- spoken like someone who's only ever been forced to read AI research papers
Better than most human podcasts.
the girl's voice is not only enjoyable to listen to but kinda hypes up the podcast lol
"co'tee", almost..
Ce Oh Teh, seems like this bot hasn't yet learned how to pronounce new terms
This is more accurate than you think. SQL, GUI, nginx? Tech is full of mispronounced acronyms
I just noticed that it's a bit hard to listen to NotebookLM because there are no pauses at all
"us humans" 🙄
cool summary!
It kinda freaks me out to hear it take a breath while speaking... Why does it need to take a breath???
taking breaths is part of the training data. it's trained to replicate what's in the training data. so it's going to output taking breaths.
Next version will cough and sneeze during winter
Because it makes it sound like people, which is the point. It's like why ChatGPT says please... improves the conversational flow, it's not about what it 'needs' to be able to do.
Isn't it obvious? You almost had the answer right in your own question. Why does the model need to laugh? Make jokes? Have emotions like joy or excitement or surprise? The answer to all these is the same, because it is trained to replicate human data. It's literally doing "next token prediction" on human voice. And taking breaths is much the same. Note: There might be a way to RLHF it out in future, but we haven't gotten to that stage yet. Also notice these AIs completely think they are human (at one point they say "us humans"), unlike text-based models like chatGPT which are aware they are AI.
Nothing of value is said.
Honestly, I was thinking about putting this paper through NotebookLM just to listen to it as a podcast. I'm actually kind of spooked at how quickly the algorithm pointed me to *exactly* what I was thinking about. Spoooky lol. Ty op.
NotebookLM
this is narrated by AI? oh boy
Obviously, it’s not.
@@augusdin read the description
@@augusdin NotebookLM - by Google
@@augusdin it is using Google NotebookLM
@@augusdin It is though. There's a few tools online now, utilizing stuff like Suno, to turn a transcript into a podcast with AI voices. This is pretty much exactly what it sounds like in every case.
Notebooklm is cool
Wow
🎉
Is someone working on implementing something like this in developing countries? I would drop everything to come and help
People like those behind LAION belong in prison for life! In my opinion, LAION is an absolutely despicable organization, and I'm saying this regardless of the child pornography content, the private recordings, the stolen works of art and all the other shit that was found in their data sets. The AIs that have been fed with LAION's data sets serve as a tool for thousands of fraudsters, perverts, criminals and similar people to defraud, deceive or denigrate others. No good can or will come from this. 😣
Awesome work man keep it up
What's the license?
would be nice to be able to select an accent too :-P
cluster 1: happy chat
cluster 2: news reporting
cluster 3: unsure, something about the cadence, perhaps recital-y but it's not completely clean
cluster 4: strong emphasis / enunciation
cluster 5: that was all one speaker. Accent? Calm voice?
cluster 6: louder talking
I thought the second cluster might be about a newsroom cadence, mostly, although not all of them quite had that
Does it support interruptions? Like if the assistant is providing a response but you interrupt it, will it accept that?
Children at birthday parties, playing with friends, newborns on the maternity ward. There are images of moments like these, shared on social media and even on old blogs or in closed groups, that are used without permission by artificial-intelligence training platforms. The complaint followed a finding by the international human rights organization Human Rights Watch.
2:40
3:00 plenty of animals have culture
3:05 I see no direct connection between culture and death anxiety and to claim that no animal but us fears dying is to be blind lol
3:34 believe not
3:40 Kierkegaard was wrong about that
4:35 I feel like that's more about social anxiety than death anxiety
5:36 I don't feel particularly meaningful nor am I contemplating my own death almost ever.
Amazing project! How fast is it? Can it stream in real time at 50 ms? Can you encode emotion into it?
#1
Very interesting
Really well thought out, I've been thinking through memory arch designs and love your idea there, and the whole thing of course. Thanks for sharing your ideas!
Seems like you also need to have the underlying language model fine-tuned on actual speech as well. Preference tuning always seems to produce text that sounds like it was written by a PR team, and no amount of vocal inflection is going to make it sound emotionally resonant.
Yes, that is true. This is already part of our roadmap and we plan to collect a large dataset of conversational text and speech data to fine-tune the model to cover a more realistic conversational style.
I think one aspect that would have to be explored is the difference between written and spoken language. The various text/chat-AIs are obviously trained on written data (duh) but that can be quite different from how you'd actually say things. So I wonder whether it'd be possible to train chat-like AIs to specifically target a response style that matches transcripts from spontaneous conversations rather than literature, so they feel more natural once they go through TTS. And the transcripts would have to be pretty high quality too, including annotated noises that only exist to convey "I'm listening, I'm paying attention, I understand / I'm trying to understand" or whatever, which would be missing from classic transcripts.
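One way to picture such a dataset: keep the backchannels and fillers in the target text so the model learns to emit them. Everything below (field names, file format, the example itself) is a hypothetical sketch, not LAION's actual schema:

```python
# Hypothetical JSONL construction for spoken-style fine-tuning data.
# Fillers and backchannels stay in the response so the model learns to produce them.
import json

examples = [
    {
        "context": "User: I finally got the demo running, but it took all weekend.",
        "response": "Mhm... oh nice, so it's working now? That's, um, honestly a big step.",
    },
]

with open("spoken_style.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```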
It's great, but I wouldn't use it unless it can run locally on an affordable PC.
It is one of our main goals to reduce the latencies as much as possible to make BUD-E available on affordable hardware in the future.
Wow! That's fantastic!!
I would like to run this on my pc too lol, any tutorial/guide?
Sure! Here is a link to the github repository. A detailed installation guide is included in the readme. github.com/LAION-AI/natural_voice_assistant
I wonder if the speed is only possible because the document being used for alignment is already downloaded. I noticed how the information being asked about was under the same topic. Nonetheless it's pretty cool 👏👏👏
The latencies are independent of the topic of the conversation. Even when switching to a different topic, this has no effect on the speed of the assistant.
That is so fast, how many tokens per second are you generating for the model?
The LLM generates around 200 tokens per second, and it takes 80-100 milliseconds to synthesize a full sentence into speech.
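A back-of-envelope check of what those figures imply per sentence (the 30-token sentence length is an assumption for illustration):

```python
# Rough per-sentence latency estimate from the numbers above.
tokens_per_second = 200      # reported LLM generation speed
tts_latency_s = 0.09         # reported 80-100 ms per sentence (midpoint)
sentence_tokens = 30         # assumed typical sentence length

generation_s = sentence_tokens / tokens_per_second    # 0.15 s
total_s = generation_s + tts_latency_s                # ~0.24 s
print(f"~{total_s * 1000:.0f} ms from prompt to audible sentence")
```

Streaming the LLM output sentence by sentence would shrink this further, since TTS can start on the first sentence while later ones are still being generated.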
Adding confirmations like "Sure, ...", "right, ..", "okay", "totally", ... at the start of the sentence would make it feel more like a conversation. Also instructing the LLM to add filler words like "um", "er", "uh", "hmm", "like", "mhm", "so", ... would make the voice sound more natural.
Prebaked initial responses would also allow masking latency in the LLM
Right? Those little inflections we don't usually notice would go a long way.
I think this is really tricky, but I fully agree. The issue is that LLMs are trained on text data which, of course, is meant to be read, not spoken, and the language modalities between those contexts are completely different. And moreover, while the text data surely contains some transcripts of spontaneous conversations (which would precisely be full of such markers), I doubt very many, if even *any*, of them annotated such extra sounds at the level of detail required.
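The prebaked-opener idea suggested above can be sketched in a few lines; everything here (the canned phrases, the playback stub) is hypothetical scaffolding, not BUD-E's code:

```python
# Hypothetical latency masking: speak a canned acknowledgment while the LLM works.
import random
import threading

CANNED = ["Sure, ", "Right, ", "Okay, ", "Hmm, let me think... "]  # prebaked openers

def play_audio(text: str) -> None:
    print(f"[speaking] {text}")  # stand-in for a real TTS + playback call

def respond(user_utterance: str, generate_reply) -> None:
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(reply=generate_reply(user_utterance))
    )
    worker.start()                     # slow LLM call runs in the background...
    play_audio(random.choice(CANNED))  # ...while a prebaked opener masks the wait
    worker.join()
    play_audio(result["reply"])

respond("What's the capital of France?", lambda q: "The capital of France is Paris.")
```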
What would you say is the best open-source TTS model right now that allows for streaming? I'm looking for an alternative to ElevenLabs because I don't want to stimulate non-democratic AI..
Hi, I love this. I have a small AI team, how can I help make more of this happen Finn?
Very interesting. I just have a question about copyright friendliness of the graph -- were you suggesting that such a graph can be built only over open-sourced papers? Or how would the alternative scenario work? Thank you! :)
We have lawyers in Germany who are very confident that this is legal if we do not store the original documents, but only extract structured information.
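A toy illustration of that distinction, i.e. retaining only extracted structure while discarding the source text (the fields and the placeholder extractor are invented for this sketch):

```python
# Toy sketch: store extracted structured facts, never the original document.

def extract_structure(paper_text: str) -> dict:
    # Placeholder: a real pipeline would extract these fields with an LLM or parser.
    return {
        "title": "Example Paper",
        "claims": ["Claim A", "Claim B"],
        "cites": ["arxiv.org/abs/2402.12875"],
    }

graph: dict[str, dict] = {}

def ingest(paper_id: str, paper_text: str) -> None:
    graph[paper_id] = extract_structure(paper_text)
    # The full text is never stored; only the structured record enters the graph.

ingest("paper-001", "...full text, processed then discarded...")
print(graph["paper-001"]["cites"])
```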
Incidentally, GPT-4's take on your post: The proposed solution is an innovative approach to the challenge of effectively training language models on long text contexts. Here are some aspects and comparisons to consider:

Category and keyword generation: This approach is particularly useful because it gives the model context information up front, which can improve the accuracy of text generation. Similar techniques are used in some language models to increase the relevance and accuracy of responses.

Age rating: Integrating an age rating into the training process is unique and can help generate age-appropriate content. This approach is especially important for addressing ethical considerations and the safety of AI systems.

Summaries and question-answer pairs: This is reminiscent of techniques used in some advanced Q&A systems and summarization tools. The innovation here lies in applying the method to longer contexts, which is a challenge for many current models.

Extended summaries for long texts: This method resembles the approach used in some text-analysis systems, but with the additional step of generating complete stories from the summaries. That could considerably expand the models' capacity for text synthesis and creative writing.

Dialogue-based scenarios: The emphasis on remembering and referring back to earlier parts of long dialogues is a demanding and innovative approach. It mirrors real conversational dynamics and could drive the development of AI systems that can hold natural, coherent conversations.

As for novelty, the main advantage lies in combining different techniques to improve the handling of long texts. While some of the individual techniques already exist in various contexts, applying them to long text contexts and combining them in one comprehensive training scheme is new. These methods could be particularly useful in areas such as creative writing, education, therapy, and customer service, where long, context-rich dialogues play an important role. The challenge is to evaluate the effectiveness and accuracy of these approaches in practice and to ensure the models meet ethical and safety standards.
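To make the prepended-metadata part of that scheme concrete, here is a toy construction of a single training example (the tag format and field names are invented for illustration):

```python
# Toy long-context training example with prepended category/keyword/age-rating
# metadata, mirroring the scheme described above. Tags and fields are invented.

def build_example(long_text: str, category: str, keywords: list[str], age_rating: str) -> str:
    header = (
        f"<category>{category}</category>\n"
        f"<keywords>{', '.join(keywords)}</keywords>\n"
        f"<age_rating>{age_rating}</age_rating>\n"
    )
    return header + long_text

sample = build_example(
    "Once upon a time...",
    category="children's story",
    keywords=["friendship", "adventure"],
    age_rating="6+",
)
print(sample)
```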
this is so awesome!