Training and adding new tokens in a Pre-trained Tokenizer!!

  • Published: Jan 27, 2025

Comments • 5

  • @tharunbhaskar6795 5 months ago

    Some quality content here. So the new tokens just get appended to the current tokenizer, right?

  • @CantBeSubh 7 months ago

    good shit bro

  • @pranilpatil4109 5 months ago

    I did the same for 22 Indian languages. But when I searched the tokens for a Kannada character as a test, nothing showed up. The tokenizer also separates punctuation. Your method of splitting is not optimal.