Thanks for the video!
What about running the model in a background thread? Is that possible or is this bound to the main thread somehow?
Good point! However, I've not looked into that yet - this video represents a first step in getting Whisper working. I do intend to improve upon this (I'd also like it to detect and process audio without the need to press any buttons). Will add it to my to-do list!
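In case it's useful, the naive version of that "no buttons" idea is just watching the microphone level and flagging speech when it crosses a threshold. This is only a sketch: the 1024-sample window and 0.02 RMS threshold are guesses you'd have to tune per device, and a proper voice-activity detector would be much smarter:

```csharp
using UnityEngine;

// Naive push-to-talk replacement: flag "speech" whenever the mic's
// RMS level crosses a threshold. All constants are guesses to tune.
public class SimpleVoiceActivation : MonoBehaviour
{
    const int SampleRate = 16000;   // Whisper models expect 16 kHz audio
    const float Threshold = 0.02f;  // RMS level that counts as speech
    AudioClip micClip;
    readonly float[] window = new float[1024];

    void Start()
    {
        // null = default microphone; 10-second looping buffer
        micClip = Microphone.Start(null, true, 10, SampleRate);
    }

    void Update()
    {
        // Read the most recent window of samples behind the write head
        int pos = Microphone.GetPosition(null) - window.Length;
        if (pos < 0) return;
        micClip.GetData(window, pos);

        float sum = 0f;
        foreach (float s in window) sum += s * s;
        float rms = Mathf.Sqrt(sum / window.Length);

        if (rms > Threshold)
            Debug.Log("Speech detected - start buffering audio for Whisper");
    }
}
```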
I tried it, but the issue is that it completely overwhelms the CPU, to the point where all threads are delayed (freezing). I tested it on a Pico 4 headset with an XR2 Gen 1, which is less capable than the Q3 with the newer XR2 Gen 2 (much improved CPU and GPU). I also tried running the process using Unity Jobs on separate threads, along with some other optimizations, but the results are still more or less the same. It feels a bit better this way, but there is still significant lag and freezing while the encoding is running. It's just unbearable and unusable on my hardware. Maybe I'll try the TensorFlow Lite library, converting Whisper to that format and running it that way. I will update once I find a good enough solution.
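One more thing still on my list to try is Sentis's option to split model execution over several frames, so a single frame never has to swallow the whole encoder. Roughly like this - just a sketch, assuming the Sentis 1.x IWorker API, with modelAsset and the input tensor as placeholders:

```csharp
using System.Collections;
using Unity.Sentis;
using UnityEngine;

public class SplitWhisperEncode : MonoBehaviour
{
    public ModelAsset modelAsset;  // the imported ONNX encoder
    IWorker worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    // Run a handful of layers per frame instead of the whole model
    // in one Execute() call, so the main thread gets breathing room.
    public IEnumerator RunSplit(TensorFloat input, int layersPerFrame = 5)
    {
        IEnumerator schedule = worker.StartManualSchedule(input);
        int layer = 0;
        while (schedule.MoveNext())
        {
            if (++layer % layersPerFrame == 0)
                yield return null;  // give Unity the rest of the frame
        }
        var output = worker.PeekOutput() as TensorFloat;
        Debug.Log($"Done, output shape: {output.shape}");
    }

    void OnDestroy() => worker?.Dispose();
}
```

It spreads the cost out rather than reducing it, so it may still not be enough on an XR2 Gen 1, but it should at least stop the hard freezes.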
"I'm curious, are the models that can be run on Unity Sentis limited?"
AFAIK, the models you can run on Unity Sentis are somewhat limited. It's designed to work on-device within your Unity app, so it's best suited for smaller, optimized models. You need to consider the capabilities of your target platform. You can use models exported to the ONNX format, but more complex models might not run efficiently.
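To give a concrete picture, the basic loop is small: import a .onnx file, create a worker for a backend the device supports, and execute. A minimal sketch, assuming the Sentis 1.x API - the input shape here is hypothetical and depends entirely on the model you exported:

```csharp
using Unity.Sentis;
using UnityEngine;

public class MinimalSentisRun : MonoBehaviour
{
    public ModelAsset modelAsset;  // drag the imported .onnx asset here

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        // CPU is the safe fallback; try BackendType.GPUCompute where supported
        using var worker = WorkerFactory.CreateWorker(BackendType.CPU, model);

        // Hypothetical shape - use whatever your exported model expects
        using var input = new TensorFloat(new TensorShape(1, 80, 3000),
                                          new float[1 * 80 * 3000]);
        worker.Execute(input);

        var output = worker.PeekOutput() as TensorFloat;
        output.MakeReadable();  // copy back from backend memory before indexing
        Debug.Log($"First output value: {output[0]}");
    }
}
```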
@@LudicWorlds I wanted to put in a TTS model specialized for Korean and wondered if it was possible!
Thank you!
Hi, good introduction video. I was thinking maybe you could create a tutorial on how to use Unity Sentis to create AI NPCs that generate dialogue based on a knowledge database? I saw a video for the Inworld AI ruclips.net/video/gUCahPSAut4/видео.html but right now it doesn't exist in the asset store, so I was wondering if I could create an AI with similar abilities using Unity Sentis instead. What do you think about it? Also, is there a tutorial on how to create models for Unity Sentis?
Thanks for the suggestion! I've just checked the 'Inworld' website, and it seems they've changed their licensing - no more free tier! :( I do, in fact, plan to expand upon this tutorial: I would like to feed the results from 'Whisper' into a locally running LLM. It probably wouldn't be feasible to run this on a standalone headset, but it may be OK for PC-based VR.
@@LudicWorlds I tried running local quantized 2B/1.5B parameter models, and yeah, it didn't go well. The tracking system, along with the heavy background tasks, is always overloading the CPU. There is simply not enough room for an LLM (tested on a Pico 4 with XR2 Gen 1), at least at the moment.
@@gamermixer7953 Yep, I was expecting that the Quest would struggle with even the Tiny Model - but I was keen to try it out regardless. ;) Give it a couple generations though...