Azure OpenAI Service - Rate Limiting, Quotas, and Throughput Optimization

  • Published: Oct 6, 2024
  • This video explains how Azure OpenAI Service's rate limiting and quota configuration works and shows suggestions for optimizing the throughput for a given model.
    Blog post: clemenssiebler...
    #azure #openai #gpt4

Comments • 7

  • @Stateoftheheart 8 months ago +1

    Thank you, Clemens, very helpful! Keep them coming :)

  • @jonathanbarton5243 4 months ago

    Most concise explanation, thank you.

  • @jagadeeskumarlenin5517 9 months ago

    Thanks for this video. May I know what the user hit limit is for 240k tokens? (Per second or per minute?)

    • @Leavinggermany 9 months ago +1

      It’s in TPM, so tokens per minute. There’s now also a dynamic quota feature that lets you go over that limit when spare capacity is available. 👍🏻
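To make the "tokens per minute" unit concrete, here is a minimal client-side sketch of a sliding one-minute token budget. It is only an illustration of what a 240k TPM quota means for a caller, not how the service enforces the limit; the class `TokenBudget` and its methods are hypothetical names introduced for this example.

```python
from collections import deque


class TokenBudget:
    """Hypothetical client-side tracker for a tokens-per-minute (TPM) quota."""

    def __init__(self, tpm_limit, window_seconds=60.0):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens) pairs still inside the window
        self._used = 0          # total tokens currently counted against the quota

    def _evict(self, now):
        # Drop usage that has aged out of the one-minute window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def try_consume(self, tokens, now):
        """Record `tokens` at time `now` (in seconds) if the budget allows it."""
        self._evict(now)
        if self._used + tokens > self.tpm_limit:
            return False  # the caller should wait or retry later
        self._events.append((now, tokens))
        self._used += tokens
        return True


budget = TokenBudget(tpm_limit=240_000)
print(budget.try_consume(200_000, now=0.0))   # fits within the 240k budget
print(budget.try_consume(50_000, now=10.0))   # would exceed 240k within the same minute
print(budget.try_consume(50_000, now=61.0))   # the first request has aged out of the window
```

Passing `now` explicitly keeps the sketch deterministic; in a real client one would typically read the time from `time.monotonic()` instead.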

  • @jagadeeskumarlenin5517 9 months ago

    Is it supported for round robin only?

  • @nclub976 8 months ago

    Hello. I want to use GPT-4 Turbo with Vision for my application, but I am not sure about the charges; the way they are calculated is very confusing to me. Does anyone know for sure what you pay for on Azure OpenAI when using the GPT-4 Turbo with Vision model? Is it just the tokens spent, or something extra, such as hosting? Thank you.

    • @clemenssiebler 8 months ago

      Azure OpenAI just charges you for the tokens you consume when you use pay-as-you-go! 👍🏻
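As a rough illustration of token-based billing, the arithmetic can be sketched as below. The function name `estimate_cost` and the prices are placeholders invented for this example, and the split into separate prompt and completion rates is an assumption; actual rates depend on the model and region and should be taken from the official pricing page.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  price_in_per_1k, price_out_per_1k):
    """Estimate a pay-as-you-go charge from token counts.

    Prices are per 1,000 tokens; the values passed in below are
    placeholders, not real Azure OpenAI prices.
    """
    prompt_cost = prompt_tokens / 1000 * price_in_per_1k
    completion_cost = completion_tokens / 1000 * price_out_per_1k
    return prompt_cost + completion_cost


# Hypothetical request: 1,500 prompt tokens and 500 completion tokens.
cost = estimate_cost(1_500, 500, price_in_per_1k=0.01, price_out_per_1k=0.03)
print(f"Estimated cost: {cost:.4f}")
```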