OpenAI's GPT-4o

  • Published: 13 May 2024
  • GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.
    In this video, I cover the following: a summary of GPT-4o's capabilities; several use cases supported by GPT-4o; and a comparison of GPT-4o with other models on text, audio ASR, audio translation, M3Exam (multimodal and multilingual QA), vision understanding, and language tokenization.
    For more details, please see openai.com/index/hello-gpt-4o/ (a minimal API call sketch follows below).
  • Science
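
As a rough illustration of the multimodal input described above, here is a minimal sketch of calling GPT-4o through the OpenAI Python SDK. The prompt and image URL are illustrative placeholders, not taken from the video; per the linked announcement, text and image input were the first capabilities available in the API, with audio slated to roll out later, so only text and image are shown here.

```python
# Minimal sketch: sending combined text + image input to GPT-4o via the
# OpenAI Python SDK. The prompt and image URL below are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Text part of the multimodal message
                {"type": "text", "text": "Describe what is happening in this image."},
                # Image part, passed by URL (placeholder address)
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```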

Comments • 1

  • @4141462
    @4141462 20 days ago

    Would love to understand how the single-stage voice-to-text-to-audio pipeline works. Interesting to know if it's clever programming or a new method.