Offline Speech Recognition on Meta Quest: Testing Unity Sentis + Whisper AI

  • Published: 13 Jan 2025

Comments • 11

  • @andru5054 • 4 months ago

    Thanks for the video!

  • @xanderlinhares • 4 months ago +2

    What about running the model in a background thread? Is that possible or is this bound to the main thread somehow?

    • @LudicWorlds • 4 months ago +2

      Good point! However, I've not looked into that yet - this video represents a first step in getting Whisper working. I do intend to improve upon this (I'd also like it to detect and process audio without the need to press any buttons). Will add to my todo list!

    • @gamermixer7953 • 1 month ago +2

      I tried it, but the issue is that it completely overwhelms the CPU, to the point where all threads are delayed (freezing). I tested it on a Pico 4 headset with an XR2 Gen 1, which is less capable than the Quest 3 with the newer XR2 Gen 2 (much improved CPU and GPU). I also tried running the process with Unity Jobs on separate threads, along with some other optimizations, but the results are still more or less the same. It feels a bit better this way, but there is still significant lag and freezing while the encoder is running; it's just unbearable and unusable on my hardware. Maybe converting Whisper to the TensorFlow Lite format and running it with that library would help. I'll update once I find a good enough solution.
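[Editor's note] The frame-spreading idea discussed in this thread can be sketched roughly as below. This is a minimal, illustrative sketch only, assuming the Unity Sentis 1.x API (`WorkerFactory`, `IWorker.StartManualSchedule`, `PeekOutput`); method names changed in later Sentis releases, and `layersPerFrame` is a hypothetical tuning knob, not anything from the video:

```csharp
using System.Collections;
using Unity.Sentis;
using UnityEngine;

public class WhisperFrameSpread : MonoBehaviour
{
    public ModelAsset modelAsset; // assumed: the Whisper encoder ONNX, imported into the project
    IWorker worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    // Run inference a few layers per frame instead of all at once,
    // yielding back to the main loop so tracking/rendering keep running.
    public IEnumerator RunSpread(Tensor input, int layersPerFrame = 20)
    {
        IEnumerator schedule = worker.StartManualSchedule(input);
        int layer = 0;
        while (schedule.MoveNext())
        {
            if (++layer % layersPerFrame == 0)
                yield return null; // hand control back to Unity for this frame
        }
        Tensor output = worker.PeekOutput();
        // ... read back / post-process the output here
    }

    void OnDestroy() => worker?.Dispose();
}
```

Note that spreading layers across frames only amortizes the cost; as the comment above reports, the total CPU/GPU load is unchanged, so on a weaker chip the encoder can still starve the tracking system.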

  • @홍성찬-j3q • 4 months ago +1

    I'm curious, are the models that can be run on Unity Sentis limited?

    • @LudicWorlds • 4 months ago

      AFAIK, the models you can run on Unity Sentis are somewhat limited. It's designed to work on-device within your Unity app, so it's best suited for smaller, optimized models. You need to consider the capabilities of your target platform. You can use models exported to the ONNX format, but more complex models might not run efficiently.
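[Editor's note] For reference, loading an ONNX-exported model looks roughly like this in Unity Sentis 1.x. This is a hypothetical sketch, not code from the video; `modelAsset` is assumed to be assigned in the Inspector after dropping the .onnx file into the project, and API names may differ in other Sentis versions:

```csharp
using Unity.Sentis;
using UnityEngine;

public class OnnxModelLoader : MonoBehaviour
{
    // Unity creates a ModelAsset when an .onnx file is added to the project
    public ModelAsset modelAsset;
    IWorker worker;

    void Start()
    {
        Model model = ModelLoader.Load(modelAsset);
        // Pick a backend suited to the target device (GPUCompute, CPU, ...)
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    void OnDestroy() => worker?.Dispose();
}
```

Unsupported ONNX operators are the usual reason a "more complex" model fails to import or runs on a slow fallback path, which is worth checking before targeting a standalone headset.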

    • @홍성찬-j3q • 3 months ago

      @LudicWorlds I wanted to use a TTS model specialized for Korean and wondered if it was possible. Thank you!

  • @emmanouildrivas4961 • 2 months ago

    Hi, good introduction video. I was thinking maybe you could create a tutorial on how to use Unity Sentis to build AI NPCs that generate dialogue based on a knowledge database? I saw a video about Inworld AI (ruclips.net/video/gUCahPSAut4/видео.html), but right now it doesn't exist in the Asset Store, so I was wondering whether I could build an AI with similar abilities using Unity Sentis instead. What do you think? Also, is there a tutorial on how to create models for Unity Sentis?

    • @LudicWorlds • 2 months ago +1

      Thanks for the suggestion! I've just checked the Inworld website, and it seems they've changed their licensing - no more free tier! :( I do, in fact, plan to expand upon this tutorial: I'd like to feed the results from Whisper into a locally running LLM. It probably wouldn't be feasible to run this on a standalone headset, but it may be OK for PC-based VR.

    • @gamermixer7953 • 1 month ago +1

      @LudicWorlds I tried running local quantized 2B/1.5B-parameter models, and yeah, it didn't go well. The tracking system, along with the heavy background tasks, is always overloading the CPU. There is simply not enough headroom for an LLM (tested on a Pico 4 with XR2 Gen 1), at least at the moment.

    • @LudicWorlds • 1 month ago +1

      @gamermixer7953 Yep, I was expecting that the Quest would struggle with even the Tiny model - but I was keen to try it out regardless. ;) Give it a couple of hardware generations, though...