Warning GPT-4o: DON'T translate to Chinese (MIT)

  • Published: 17 May 2024
  • MIT Technology Review reports massive problems with GPT-4o and the Chinese language, having found heavy tokenizer "pollution".
    Be careful if you use this AI to translate business correspondence into Chinese, since the report describes heavy data pollution among the Chinese tokens.
    For now, double-check GPT-4o's translation results against an independent source, especially business communication with your Chinese partners. Otherwise you might find your company and yourself in a strange business situation ...
    All rights with the authors:
    GPT-4o’s Chinese token-training data is polluted by spam and porn websites
    www.technologyreview.com/2024...
    #airesearch
    #gpt4o
  • Science
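
For readers who want a feel for what checking a tokenizer for "pollution" could look like, here is a minimal sketch, not the method used in the report. It assumes the vocabulary is available as a plain list of token strings and simply lists the longest tokens made up entirely of Chinese characters, since unusually long tokens are where spam phrases would tend to show up. The names `is_cjk`, `longest_cjk_tokens` and `toy_vocab` are hypothetical, and the toy vocabulary is made up for illustration; it does not contain real GPT-4o tokens.

```python
def is_cjk(ch: str) -> bool:
    """Return True if ch lies in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def longest_cjk_tokens(vocab, top_n=3):
    """Return the top_n longest tokens that consist only of CJK ideographs."""
    cjk_only = [t for t in vocab if t and all(is_cjk(c) for c in t)]
    # sorted() is stable, so tokens of equal length keep their original order.
    return sorted(cjk_only, key=len, reverse=True)[:top_n]


# Hypothetical toy vocabulary for illustration only, NOT real GPT-4o tokens.
toy_vocab = ["hello", " the", "你好", "谢谢", "人工智能", "中华人民共和国", "abc中文"]

print(longest_cjk_tokens(toy_vocab))
```

With a real vocabulary, a native speaker would then read through the longest entries and judge whether they are ordinary words or spam phrases; the code only narrows down where to look.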

Comments • 8

  • @mshonle 1 month ago +1

    It’s time we move to something more advanced and curated than BPE for tokenizers. The “just add more data, scale!” crowd seems to have a major blind spot here. The “bitter lesson” applies to neural architectures and generalizing it to other big data tasks is, to put it nicely, a failed experiment.

  • @Quaintcy 1 month ago +2

    I suspect teaming up with Microsoft and using their search data is a mistake.

  • @justindressler5992 1 month ago

    So the new model is meant to be better at multilingual tasks; it was one of the bigger gains. But it fails to translate for 1.2 billion people, classic.

  • @propeacemindfortress 1 month ago +4

    crap in crap out...

  • @dragonbone1020 1 month ago

    I just asked it to write a business letter in Chinese and it was all fine.

    • @code4AI 1 month ago

      Thanks for the update. Running multiple generations of models in parallel on multiple clusters helps with internal switching.