Warning GPT-4o: DON'T translate to Chinese (MIT)
- Published: 17 May 2024
- MIT Technology Review reports massive problems with GPT-4o and the Chinese language, discovering heavy tokenizer "pollution".
A warning if you use this AI to translate business correspondence into Chinese: MIT Technology Review reports heavy data pollution in the model's Chinese tokens.
For now, double-check GPT-4o's translation results against an independent source, especially business communication with your Chinese partners. Otherwise you might find your company and yourself in a strange business situation ....
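For readers wondering what tokenizer "pollution" means in practice, here is a minimal sketch. It assumes a toy corpus and a simplified character-level BPE trainer; it is not OpenAI's tokenizer or training data, and the phrase `SPAM` is a hypothetical placeholder. The idea it illustrates: BPE merges the most frequent adjacent symbols, so a phrase repeated often enough in the training text can end up as one long token.

```python
from collections import Counter


def train_bpe(corpus, num_merges):
    """Toy character-level BPE: repeatedly merge the most frequent adjacent pair."""
    # Every line starts out as a list of single-character symbols.
    sequences = [list(line) for line in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for seq in sequences:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing is repeated any more, so stop merging
        merged = a + b
        merges.append(merged)
        # Rewrite every sequence with the chosen pair fused into one symbol.
        new_sequences = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_sequences.append(out)
        sequences = new_sequences
    return merges


# Hypothetical toy corpus: one spam-like phrase repeated many times,
# next to a little ordinary text. Placeholder data, not real training data.
SPAM = "免费视频在线观看"  # roughly "watch free videos online"
corpus = [SPAM] * 50 + ["今天天气很好", "我们明天开会"]

merges = train_bpe(corpus, num_merges=20)
longest = max(merges, key=len)
print("longest learned token:", longest)
```

In this toy run the repeated phrase is the only thing that gets merged, while the ordinary sentences stay split into single characters. That is the pattern the report describes for the longest Chinese tokens.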
All rights w/ authors:
GPT-4o’s Chinese token-training data is polluted by spam and porn websites
www.technologyreview.com/2024...
#airesearch
#gpt4o Science
It’s time we move to something more advanced and curated than BPE for tokenizers. The “just add more data, scale!” crowd seems to have a major blind spot here. The “bitter lesson” applies to neural architectures, and generalizing it to other big data tasks is, to put it nicely, a failed experiment.
I suspect teaming up with Microsoft and using their search data is a mistake.
So the new model is meant to be better at multilingual tasks; that was one of the bigger gains. But it fails to translate for 1.2 billion people, classic.
crap in crap out...
I just asked it to write a business letter in Chinese and it was all fine.
Thanks for the update. Running multiple generations of models in parallel on multiple clusters helps with internal switching.