Modern Wellbeing Lab
  • 69 videos
  • 37,108 views

Videos

Advanced Cookie Monster Mode :D
31 views · 14 days ago

Wow... Super Emotional Casting Session with OpenAI Advanced Voice Mode
46 views · 28 days ago
Finally I can apply what I learned during my Method Acting studies for ML. :)

Emotional Casting Session with OpenAI Advanced Voice Mode ... lol :)
36 views · 28 days ago

BUD-E V1.0 DEMO - FULLY OPEN SOURCE VOICE ASSISTANT :)
105 views · 1 month ago
github.com/LAION-AI/BUD-E_V1.0 discord.gg/pCPJJXP7Qx

IMPROVED OPEN SOURCE TTS & EN, DE, FR, ES - TTS Support for BUD-E :)
64 views · 1 month ago
discord.gg/pCPJJXP7Qx

BUD-E with nice female voice speaking English and German (using fish audio TTS)
27 views · 1 month ago
discord.gg/pCPJJXP7Qx

Me talking to BUD-E which is using my own voice! :D
97 views · 1 month ago
discord.gg/pCPJJXP7Qx

BUD-E as potential tool to deploy positive psychology interventions at scale :)
28 views · 1 month ago
greatergood.berkeley.edu/pdfs/GratitudePDFs/6Emmons-BlessingsBurdens.pdf

O1 Preview just refactored the BUD-E Client with the first try (1353 lines of code)! :
27 views · 1 month ago
discord.gg/pCPJJXP7Qx

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
4.1K views · 1 month ago
arxiv.org/abs/2402.12875 This podcast was generated with AI from: notebooklm.google.com/notebook/ What a time to be alive! :)

CiteME: Can Language Models Accurately Cite Scientific Claims?
80 views · 1 month ago
arxiv.org/abs/2407.12861 This podcast was generated with AI from: notebooklm.google.com/notebook/ What a time to be alive! :)

Fastpitch TTS 50x faster than real-time on a Colab T4 :D
88 views · 1 month ago
colab.research.google.com/drive/1AXHfzF-jLBDGr0aftufDjV4_hpssFp_M?usp=sharing

BUD-E V1.0 - Now with User Interface :)
49 views · 1 month ago

BUD-E: CLIPBOARD ACCESS, SERVER-SIDE & CLIENT-SIDE SKILLS WORKING :)
32 views · 1 month ago

BUD-E V1.0: Wake- & Stop-Word Detection working nicely! :)
39 views · 2 months ago

BUD-E with ASR, LM & TTS all on the server (3090 RTX in Romania), Client in Germany
21 views · 2 months ago

Vokan TTS Tests (0-shot voice cloning)
111 views · 2 months ago

School BUD-E web-browser Voice Assistant
87 views · 2 months ago

BUD-E discussing the role of emotions & self determination in learning :)
22 views · 2 months ago

BUD-E Conversation demo & hints how to get it running on your own
24 views · 2 months ago

BUD-E Latency via Internet ~ 2 sec & local latency under 1 sec
20 views · 2 months ago

BUD-E V1.0 UPDATE: ALL OPEN SOURCE MODELS & LATENCY ~ 2.8 SEC
58 views · 2 months ago

BUD-E Update: 2024 08 02
20 views · 3 months ago

BUD-E V 1.0 Update: All components open source models & client-server system
74 views · 3 months ago

Server - client - dummy scripts - update
14 views · 3 months ago

BUD E V1.0 - Architecture Blueprint
56 views · 3 months ago

BUD-E - Demo
687 views · 3 months ago

Making BUD-E skills with GPT4 & BUD-E roleplaying Shakespeare & Julia
35 views · 3 months ago

BUD-E explains itself
228 views · 3 months ago

Comments

  • @adityaravbouddh3872 · 7 days ago

    Nice.

  • @Karl-Asger · 1 month ago

    Congratulations! Keep up the great work.

  • @CrowleyBlack2 · 1 month ago

    Better than GPT4-o Voice Mode for sure. I like it.

  • @Karl-Asger · 1 month ago

    Wow awesome work

  • @ClydeWright · 1 month ago

    "Catchy title right?" "It really grabs you." -- spoken like someone who's only ever been forced to read AI research papers

  • @joshuatreefinancial · 1 month ago

    Better than most human podcasts.

  • @mriz · 1 month ago

    The girl's voice is not only enjoyable to listen to but kinda hypes up the podcast lol

  • @HenkPoley · 1 month ago

    "co'tee", almost..

    • @mriz · 1 month ago

      Ce Oh Teh... seems like this bot has not yet learned how to pronounce new terms

    • @Wubwub772 · 12 days ago

      This is more accurate than you think. SQL, GUI, nginx? Tech is full of mispronounced acronyms

  • @ahtoshkaa · 1 month ago

    I just noticed that it's a bit hard to listen to NotebookLM because there are no pauses at all

  • @coreyh144 · 1 month ago

    "us humans" 🙄

  • @GigaMarou · 1 month ago

    cool summary!

  • @PirateFromSpace · 1 month ago

    It kinda freaks me out to hear it take a breath while speaking... Why does it need to take a breath???

    • @vaendryl · 1 month ago

      Taking breaths is part of the training data. It's trained to replicate what's in the training data, so it's going to output taking breaths.

    • @LucidKDB · 1 month ago

      Next version will cough and sneeze during winter

    • @Vakisari · 1 month ago

      Because it makes it sound like people, which is the point. It's like why ChatGPT says please... improves the conversational flow, it's not about what it 'needs' to be able to do.

    • @bhannirav · 25 days ago

      Isn't it obvious? You almost had the answer right in your own question. Why does the model need to laugh? Make jokes? Have emotions like joy or excitement or surprise? The answer to all these is the same, because it is trained to replicate human data. It's literally doing "next token prediction" on human voice. And taking breaths is much the same. Note: There might be a way to RLHF it out in future, but we haven't gotten to that stage yet. Also notice these AIs completely think they are human (at one point they say "us humans"), unlike text-based models like chatGPT which are aware they are AI.

  • @moso00 · 1 month ago

    Nothing of value is said.

  • @EnigmaticEncounters420 · 1 month ago

    Honestly, I was thinking about putting this paper through NotebookLM just to listen to it as a podcast. I'm actually kind of spooked at how quickly the algorithm pointed me to *exactly* what I was thinking about. Spoooky lol. Ty op.

  • @yvainepan6988 · 1 month ago

    notebookllm

  • @maxsch.2367 · 1 month ago

    this is narrated by AI? oh boy

    • @augusdin · 1 month ago

      Obviously, it’s not.

    • @vaendryl · 1 month ago

      @@augusdin read the description

    • @ahtoshkaa · 1 month ago

      @@augusdin NotebookLM - by Google

    • @mriz · 1 month ago

      @@augusdin it is using Google NotebookLM

    • @Vakisari · 1 month ago

      @@augusdin It is though. There's a few tools online now, utilizing stuff like Suno, to turn a transcript into a podcast with AI voices. This is pretty much exactly what it sounds like in every case.

  • @marshallmcluhan33 · 1 month ago

    Notebooklm is cool

  • @tedpunt6146 · 1 month ago

    Wow

  • @julienblanchon6082 · 1 month ago

    🎉

  • @JasperBusschers · 2 months ago

    Is someone working on implementing something like this in developing countries? I would drop everything to come and help

  • @jeremiahmorelli4559 · 3 months ago

    People like those behind LAION belong in prison for life! In my opinion, LAION is an absolutely despicable organization, and I'm saying this regardless of the child pornography content, the private recordings, the stolen works of art and all the other shit that was found in their data sets. The AIs that have been fed with LAION's data sets serve as a tool for thousands of fraudsters, perverts, criminals and similar people to defraud, deceive or denigrate others. No good can or will come from this. 😣

  • @Karl-Asger · 3 months ago

    Awesome work man keep it up

  • @FenrirRobu · 4 months ago

    What's the license?

  • @runtdegroot · 4 months ago

    would be nice to be able to select an accent too :-P

  • @Kram1032 · 4 months ago

    cluster 1: happy chat
    cluster 2: news reporting
    cluster 3: unsure, something about the cadence, perhaps recital-y but it's not completely clean
    cluster 4: strong emphasis / enunciation
    cluster 5: that was all one speaker. Accent? Calm voice?
    cluster 6: louder talking

  • @Kram1032 · 4 months ago

    I thought the second cluster might be about a newsroom cadence, mostly, although not all of them quite had that

  • @kapilkevlani145 · 4 months ago

    Does it support interruptions? Like, if the assistant is providing a response but you interrupt, will it accept that?

  • @brondi1967 · 4 months ago

    Children at birthday parties, playing with friends, a newborn in the maternity ward. There are pictures of moments like these, shared on social media and even on old blogs or in closed groups, that are used without permission by artificial intelligence training platforms. The complaint follows a finding by the international human rights organization Human Rights Watch.

  • @Kram1032 · 5 months ago

    2:40 3:00 plenty of animals have culture
    3:05 I see no direct connection between culture and death anxiety, and to claim that no animal but us fears dying is to be blind lol
    3:34 believe not
    3:40 Kierkegaard was wrong about that
    4:35 I feel like that's more about social anxiety than death anxiety
    5:36 I don't feel particularly meaningful, nor am I contemplating my own death almost ever.

  • @shake6321 · 5 months ago

    Amazing project! How fast is it? Can it stream in real time, in 50 ms? Can you encode emotion into it?

  • @chriswendler5464 · 5 months ago

    #1

  • @Karl-Asger · 5 months ago

    Very interesting

  • @Karl-Asger · 6 months ago

    Really well thought out, I've been thinking through memory arch designs and love your idea there, and the whole thing of course. Thanks for sharing your ideas!

  • @TheRyulord · 8 months ago

    Seems like you also need to have the underlying language model fine tuned on actual speech as well. Preference tuning always seems to produce text that sounds like it was written by a PR team and no amount of vocal inflection is going to make it sound emotionally resonant.

    • @Modern-Wellbeing-Lab · 8 months ago

      Yes, that is true. This is already part of our roadmap and we plan to collect a large dataset of conversational text and speech data to fine-tune the model to cover a more realistic conversational style.

  • @Kram1032 · 8 months ago

    I think one aspect that would have to be explored is the difference between written and spoken language. The various text/chat-AIs are obviously trained on written data (duh) but that can be quite different from how you'd actually say things. So I wonder whether it'd be possible to train chat-like AIs to specifically target a response style that matches transcripts from spontaneous conversations rather than literature, so they feel more natural once they go through TTS. And the transcripts would have to be pretty high quality too, including annotated noises that only exist to convey "I'm listening, I'm paying attention, I understand / I'm trying to understand" or whatever, which would be missing from classic transcripts.

  • @karlderkafer3269 · 8 months ago

    It's great, but I wouldn't use it unless it can run locally on an affordable PC.

    • @Modern-Wellbeing-Lab · 8 months ago

      It is one of our main goals to reduce the latencies as much as possible to make BUD-E available on affordable hardware in the future.

  • @GlobalCommunityMinersForum · 8 months ago

    Wow! That's fantastic!!

  • @kungfooman · 8 months ago

    I would like to run this on my PC too lol, any tutorial/guide?

    • @Modern-Wellbeing-Lab · 8 months ago

      Sure! Here is a link to the github repository. A detailed installation guide is included in the readme. github.com/LAION-AI/natural_voice_assistant

  • @RadiantNij · 8 months ago

    I wonder if the speed is only possible because the document being used for alignment is already downloaded. I noticed how the information being asked about was under the same topic. Nonetheless it's pretty cool 👏👏👏

    • @Modern-Wellbeing-Lab · 8 months ago

      The latencies are independent of the topic of the conversation. Even switching to a different topic has no effect on the speed of the assistant.

  • @SongStudios · 8 months ago

    That is so fast, how many tokens per second are you generating for the model?

    • @Modern-Wellbeing-Lab · 8 months ago

      The LLM generates around 200 tokens per second, and it takes 80-100 milliseconds to synthesize a full sentence into speech.
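The throughput figures in the reply above imply a rough first-audio latency budget. A minimal back-of-the-envelope sketch in Python, assuming TTS starts only once a full sentence has been generated; the function name and the 15-token example sentence are illustrative, not taken from BUD-E:

```python
# Rough latency budget from the figures quoted above: ~200 tokens/s
# generation and 80-100 ms of TTS per sentence. All numbers are
# illustrative assumptions, not measurements of BUD-E itself.

def first_sentence_latency_ms(sentence_tokens: int,
                              tokens_per_second: float = 200.0,
                              tts_ms_per_sentence: float = 100.0) -> float:
    """Time until the first synthesized sentence could start playing,
    assuming TTS begins only after the sentence is fully generated."""
    generation_ms = sentence_tokens / tokens_per_second * 1000.0
    return generation_ms + tts_ms_per_sentence

# A typical ~15-token opening sentence: 75 ms generation + 100 ms TTS.
print(first_sentence_latency_ms(15))  # → 175.0
```

Synthesizing sentence by sentence like this is one way such a pipeline can keep perceived latency far below the time needed to generate the whole reply.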

  • @spenhouet · 8 months ago

    Adding confirmations like "Sure, ...", "right, ..", "okay", "totally", ... at the start of the sentence would make it feel more like a conversation. Also instructing the LLM to add filler words like "um", "er", "uh", "hmm", "like", "mhm", "so", ... would make the voice sound more natural.

    • @NickMcCarthy-z9d · 8 months ago

      Prebaked initial responses would also allow masking latency in the LLM

    • @Yoseqlo1 · 8 months ago

      Right? Those little inflections we don't usually notice would go a long way.

    • @Kram1032 · 8 months ago

      I think this is really tricky, but I fully agree. The issue is that LLMs are trained on text data which, of course, is supposed to be read, not spoken, and the language modalities between those contexts are completely different. And moreover, while the text data surely contains some transcripts of spontaneous conversations (which would precisely be full of such markers), I doubt very many, even *any* of them, decided to annotate such extra sounds at the level of detail required.
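The confirmation-word idea in this thread could be prototyped as a simple post-processing step before the reply is handed to TTS. A minimal sketch; the function name, word list, and behavior are illustrative assumptions, not part of BUD-E:

```python
import random

# Sketch of the suggestion above: prepend a short confirmation so the
# synthesized voice sounds more conversational. The opener list is
# taken from the comment; everything else is illustrative.

CONFIRMATIONS = ["Sure,", "Right,", "Okay,", "Totally,"]

def add_confirmation(reply: str, rng=None) -> str:
    """Prepend a random confirmation unless the reply already opens with one."""
    rng = rng or random
    words = reply.split(maxsplit=1)
    first = words[0].rstrip(",.").lower() if words else ""
    if first in {c.rstrip(",").lower() for c in CONFIRMATIONS}:
        return reply  # already conversational; leave it alone
    return f"{rng.choice(CONFIRMATIONS)} {reply}"

print(add_confirmation("The weather in Berlin is sunny today."))
```

A more robust version would let the LLM itself emit such markers (as the parent comment suggests), since a fixed list quickly sounds repetitive.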

  • @otesalgate3306 · 10 months ago

    *promo sm*

  • @henkjekel4081 · 10 months ago

    What would you say is the best open-source TTS model right now that allows for streaming? I'm looking for an alternative to ElevenLabs because I don't want to stimulate non-democratic AI..

  • @eugenetapang · 10 months ago

    Hi, I love this. I have a small AI team, how can I help make more of this happen Finn?

  • @jenniferdsouza7708 · 10 months ago

    Very interesting. I just have a question about copyright friendliness of the graph -- were you suggesting that such a graph can be built only over open-sourced papers? Or how would the alternative scenario work? Thank you! :)

    • @Modern-Wellbeing-Lab · 10 months ago

      We have lawyers in Germany who are very confident that this is legal if we do not store the original documents, but only extract structured information.

  • @binichnich8517 · 11 months ago

    By the way, GPT-4 says this about your post: The presented solution is an innovative approach to the challenge of effectively training language models on long text contexts. Here are some aspects and comparisons to consider:

    Inclusion of category and keyword generation: This approach is particularly useful because it gives the model context information up front, which can improve the accuracy of text generation. Similar techniques are used in some language models to increase the relevance and accuracy of responses.

    Age rating: Integrating an age rating into the training process is unique and can help generate age-appropriate content. This approach is particularly important for taking ethical considerations and the safety of AI systems into account.

    Summaries and question-answer pairs: This is reminiscent of techniques used in some advanced Q&A systems and summarization tools. The innovation here lies in applying the method to longer contexts, which is a challenge for many current models.

    Extended summaries for long texts: This method resembles the approach used in some text-analysis systems, but with the additional step of generating complete stories from the summaries. This could significantly expand the models' capacity for text synthesis and creative writing.

    Dialogue-based scenarios: The emphasis on remembering and referring back to earlier parts of long dialogues is a demanding and innovative approach. It mirrors real conversational dynamics and could drive the development of AI systems capable of natural, coherent conversations.

    As for innovation, the main advantage lies in combining different techniques to improve the processing of long texts. While some of the individual techniques already exist in various contexts, their application to long text contexts and their combined use in one comprehensive training scheme is novel. These methods could be particularly useful in areas such as creative writing, education, therapy, and customer service, where long, context-rich dialogues play an important role. The challenge is to evaluate the effectiveness and accuracy of these approaches in practice and to ensure that the models comply with ethical and safety standards.

  • @Graverman · 11 months ago

    this is so awesome!