
Deduct OpenAI GPT-4o's Neural Network Architecture

  • Published: 16 Aug 2024
  • “GPT-4o, ... new flagship model that can reason across audio, vision, and text in real time.”
    GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on English text and code, with significant improvement on non-English text, while also being much faster and 50% cheaper in the API. GPT-4o is notably stronger at vision and audio understanding than existing models.
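The "any combination of text, audio, image, and video" input described above can be illustrated with a minimal sketch of a Chat Completions request payload mixing text and image content parts. The image URL is a placeholder and no request is actually sent; this only shows the shape of a multimodal message.

```python
# Sketch of a multimodal request payload for GPT-4o via the OpenAI
# Chat Completions API. The image URL below is a placeholder; the
# payload is only constructed, never sent.
def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat-completions payload mixing text and image inputs."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_multimodal_request(
    "Describe this chart.", "https://example.com/chart.png"
)
print(payload["model"])                        # gpt-4o
print(len(payload["messages"][0]["content"]))  # 2
```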
    #google #gpt4o #gpt-4o #openai #whisper #multimodal #audio #voicemode #voiceengine #voice #speech #projectastra #astra
    ==
    Olewave offers avant-garde bespoke solutions for proprietary data labeling, normalization, and transformation.
    Tired of inaccurate transcriptions and frustrating APIs? Olewave offers a superior solution with:
    • AI-powered Accuracy: Transcribe any audio, regardless of language, dialect, accent, or topic, with exceptional accuracy. We surpass the competition in understanding even the most challenging recordings.
    • Detailed Insights: Gain valuable insights with word/character-level confidence scores, precise timestamps, and advanced speech analytics.
    • Privacy Guaranteed: Keep your data secure. Integrate our powerful data labeling tool directly into your platform, eliminating risks associated with external APIs.
    • Competitive Pricing: Enjoy high-quality service at accessible prices, outperforming both tech giants and human-intensive transcription solutions.
    Ready to experience the difference? Don't settle for mediocrity. Contact info@olewave.com and give us a try!
    Customized Large-Scale Datasets
    Olewave delivers customized, labeled, and validated large-scale real-world NLP/CV/speech/multimodal datasets covering scenarios such as dictation and conversation in multiple accents, dialects, and languages, and diverse topics such as education, finance, legal, entertainment, healthcare, retail, and customer service.
    Our datasets include:
    • topic-specific text datasets for training your own LLM/ChatGPT/LLaMA model;
    • visual/video/image datasets with tags/prompts for training your own CV/SAM model;
    • speech/audio datasets in different languages and dialects for training your own ASR/Whisper/SeamlessM4T/TTS model;
    • and multimodal datasets.
    We continuously collect timely data in languages including Brazilian Portuguese, Latin American Spanish, Arabic, Southeast Asian languages, Chinese, Japanese, and Korean.
    Faster and more affordable data delivery than traditional data vendors;
    more effective and efficient than traditional data vendors.

Comments • 5

  • @reynoldsVincent · 2 months ago

    Working my way through this; it looks like you did a lot of insightful work. Sub'd, thanks!

  • @haibinwu1568 · 2 months ago · +1

    What do you think is the most likely tokenizer for audio in GPT-4O? What do you think are the possible solutions? Could it be BASE TTS style tokens or Codec-style tokens?

    • @olewave · 2 months ago · +1

      What do you think is the most likely tokenizer for audio in GPT-4o? What do you think are the possible solutions?
      > I think you meant 'encoder' rather than 'tokenizer'. I covered that in the video; please watch it.
      Could it be BASE TTS style tokens or Codec-style tokens?
      > That is very specific; I am not an OpenAI employee. I can only infer their architecture, not their implementation.
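For readers unfamiliar with the "codec-style tokens" mentioned in this exchange, here is a toy, hedged illustration of the idea: each fixed-size frame of audio samples is mapped to the index of its nearest codebook vector (plain vector quantization). Real neural audio codecs use learned encoders and residual vector quantization; the tiny codebook and frame size here are purely hypothetical.

```python
# Toy "codec-style" audio tokenization: one codebook index per frame via
# nearest-neighbor vector quantization. The codebook is hypothetical;
# real codecs learn theirs and stack multiple residual quantizers.
from math import dist

CODEBOOK = [
    (0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0),
]

def tokenize(samples, frame_size=2):
    """Split samples into frames and emit one codebook index per frame."""
    tokens = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = tuple(samples[i:i + frame_size])
        tokens.append(min(range(len(CODEBOOK)),
                          key=lambda k: dist(frame, CODEBOOK[k])))
    return tokens

def detokenize(tokens):
    """Reconstruct a (lossy) waveform by concatenating codebook vectors."""
    out = []
    for t in tokens:
        out.extend(CODEBOOK[t])
    return out

audio = [0.1, -0.2, 0.9, 1.1, -0.8, -1.2]
toks = tokenize(audio)
print(toks)              # [0, 1, 2]
print(detokenize(toks))  # [0.0, 0.0, 1.0, 1.0, -1.0, -1.0]
```

The reconstruction is deliberately lossy: a language model would operate on the integer token sequence, and a decoder would map tokens back to audio.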

  • @augmentos · 2 months ago

    Great video ser thanks subd

    • @augmentos · 2 months ago

      Would love you to expand on how they ‘optimize heavily’ to run very fast