Awesome content
Very interesting video, as usual!
Wow, this is exactly what I've been looking for; subscribed instantly. Would you be interested in covering more models, such as Kyutai Moshi or hertz-dev? They seem to use a different architecture.
Great suggestions! I haven't looked at these two, but they are certainly relevant.
@EfficientNLP Awesome, can't wait for the next video. Well, they are pretty similar, but I think the internal architecture is different; however, they aren't as smart as the OpenAI Realtime API. Oh, there's also LLaMA-Omni, which is based on Llama 3 and offers similar real-time AI conversation.
Can you also make a video about Moshi or Mimi and how they have been trained?
Edit: maybe also mini-omni2?
Thanks for the suggestion; I will keep it in mind for the next video!
Didn't you check out Moshi from Kyutai?
You are correct; this is a relevant model, and the field is evolving rapidly. However, the principles in this video should still apply.
Can it support RAG?
Neither of the two models in this video has RAG built in, but it is possible to add a retrieval system before generation, since text tokens can be interleaved into speech LLMs.
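To make the idea concrete, here is a rough sketch of what retrieval before generation could look like. Everything in it is an assumption for illustration: the word-overlap retriever, the `<context>` markers, the placeholder speech token IDs, and the premise that a text transcript of the spoken question is available for the lookup. It does not reflect how either model in the video actually works.

```python
from collections import Counter

def retrieve(query, passages, k=1):
    """Toy retriever (hypothetical): rank passages by word overlap with the query."""
    q = Counter(query.lower().split())

    def score(passage):
        return sum((q & Counter(passage.lower().split())).values())

    return sorted(passages, key=score, reverse=True)[:k]

def build_prompt(retrieved, speech_tokens):
    """Place retrieved text tokens ahead of the speech tokens in one sequence."""
    seq = [("text", "<context>")]  # marker name is an assumption
    for passage in retrieved:
        seq += [("text", word) for word in passage.split()]
    seq.append(("text", "</context>"))
    seq += [("speech", tok) for tok in speech_tokens]
    return seq

passages = [
    "Moshi is a speech model from Kyutai.",
    "Paris is the capital of France.",
]
query = "what is the capital of France"  # assumed transcript of the spoken question
speech_tokens = [101, 57, 988]           # placeholder discrete audio codes

top = retrieve(query, passages)
prompt = build_prompt(top, speech_tokens)
```

A real system would swap in a proper retriever and the model's own tokenizer; the point is only that the retrieved text and the speech tokens end up in the same input sequence.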