ChatGPT - but Open Sourced | Running HuggingChat locally (VM) | Chat-UI + Inference Server + LLM

  • Published: 7 Aug 2024
  • In this video you will learn how to run HuggingChat, an open-source ChatGPT alternative, locally (on a VM) and interact with the Open Assistant model, or indeed with any LLM, in two variants.
    Variant 1: Run just the Chat-UI locally and use a remote inference endpoint from Hugging Face (a command sketch follows directly below)
    Variant 2: Run the whole stack, the Chat-UI, the Text Generation Inference Server and the (Open Assistant) LLM, on your Virtual Machine (a command sketch for this variant follows the chapter list)
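    A minimal command sketch for Variant 1, assuming the upstream huggingface/chat-ui repository; the ports and environment variable names are assumptions and differ between chat-ui versions, so adapt them to your setup:
    # Variant 1 (sketch): Chat-UI locally, inference on a remote Hugging Face endpoint
    docker run -d -p 27017:27017 --name mongo-chatui mongo   # chat-ui stores conversations in MongoDB
    git clone https://github.com/huggingface/chat-ui
    cd chat-ui
    nano .env.local          # create the config file with the two settings shown below
    npm install
    npm run dev              # SvelteKit/Vite dev server, by default on http://localhost:5173
    # contents of .env.local:
    MONGODB_URL=mongodb://localhost:27017
    # your Hugging Face API token (named HF_TOKEN in newer chat-ui versions):
    HF_ACCESS_TOKEN=hf_xxx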
    Chapters in this video:
    0:00 - Intro and Explanation
    0:48 - Live Demo: Running HuggingChat locally
    4:53 - Installing the Chat-UI
    10:19 - Run the Chat-UI
    11:26 - Text Generation Inference Server
    12:57 - Set up the Inference Server
    15:37 - Run the Inference Server
    16:22 - Testing the Open Assistant Model
    17:08 - Connect Chat-UI to Inference Server
    18:52 - Outro
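    A corresponding sketch for Variant 2 (chapters 11:26 to 17:08): the Docker image and the --model-id/--num-shard flags are the standard Text Generation Inference ones, while the model id, volume path and ports are assumptions to adapt to your setup:
    # Variant 2 (sketch): run the Text Generation Inference server locally (2 GPUs / 2 shards)
    docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/tgi-data:/data \
        ghcr.io/huggingface/text-generation-inference:latest \
        --model-id OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 \
        --num-shard 2
    # quick test of the local endpoint:
    curl 127.0.0.1:8080/generate -X POST \
        -H 'Content-Type: application/json' \
        -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":50}}'
    # finally, point the Chat-UI at http://127.0.0.1:8080 via the MODELS variable in .env.local
    # (the exact JSON shape of MODELS differs between chat-ui versions)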
    Video related links:
    - SSH into Remote VM with VS Code: • SSH into Remote VM wit...
    - Detailed instructions and Scripts (Free): www.blueantoinette.com/2023/0...
    - Downloadable Installation Scripts (Paid): www.blueantoinette.com/produc...
    About us:
    - Homepage: www.blueantoinette.com/
    - Contact us: www.blueantoinette.com/contac...
    - Twitter: / blueantoinette_
    - Consulting Hour: www.blueantoinette.com/produc...
    Hashtags:
    #chatgpt #opensource #huggingchat #openassistant
  • Science

Comments • 44

  • @BlueAntoinette
    @BlueAntoinette  a year ago +4

    In the meantime we created a HuggingChat plugin for aitom8, which is our professional AI automation software. It lets you install HuggingChat with just one command. Everything explained in this video is still valid and fully functional, but you can further improve your efficiency with this video: ruclips.net/video/HO1V7kLQu6s/видео.html

  • @kaitglynn2472
    @kaitglynn2472 11 months ago +1

    Thank you so much for this wealth of knowledge!! Spectacular job!

  • @itsmith32
    @itsmith32 a year ago +1

    Thank you so much! Great job

  • @BlueAntoinette
    @BlueAntoinette  11 months ago +2

    Update: New video about running Code Llama locally available:
    ruclips.net/video/mhq6BQX0_P0/видео.html

  • @user-fu7er9gl1g
    @user-fu7er9gl1g a year ago +4

    Thanks for this detailed tutorial. Would you mind sharing the scripts that you created?

    • @BlueAntoinette
      @BlueAntoinette  a year ago +6

      Hi, I have now added a link to my instructions and scripts in the video description. You can access it directly on our site at this link: www.blueantoinette.com/2023/05/09/chatgpt-but-open-sourced-running-huggingchat-locally-vm/

  • @chuizitech
    @chuizitech 11 months ago

    Thanks, brother! I'm just about to set up a self-hosted deployment.

  • @MultiTheflyer
    @MultiTheflyer a year ago +1

    Thank you!!! This has been super useful. I'm trying to use this front end, but I'd like to use the OpenAI APIs as a backend, because they currently support function calling (I don't know of any other model that does). I'm quite new to programming in general and have no experience with Docker. My understanding, though, is that the Hugging Face chat-ui front end cannot be "edited" and can only be deployed as-is because it's already in a container. Is that correct?
    I'd like to change it slightly so that it shows when a function is being called, etc., but it seems that's not possible, right?
    Thanks again for the useful tutorial, it really did open up a new world of possibilities for me.

    • @BlueAntoinette
      @BlueAntoinette  a year ago +2

      Not quite right. I do not run the Chat-UI in a container; instead I run its source code directly with npm run - please check this out again in the video. If you want to make changes to the source code, simply clone or fork the repo and adapt it to your needs. The Chat-UI is written in TypeScript and uses Svelte and Tailwind, so you will want to make yourself familiar with these technologies.

    • @MultiTheflyer
      @MultiTheflyer a year ago

      @@BlueAntoinette thank you!

  • @ShravaniSreeRavinuthala
    @ShravaniSreeRavinuthala 2 months ago

    Thank you for this video. I am trying to use the UI with my custom backend server, which has a RAG setup in it, but all it needs as parameters are the queries. From what I have explored, it looks like I have to make changes in the source code. Is there an easier way to achieve this?

    • @BlueAntoinette
      @BlueAntoinette  2 months ago

      I did the same once with my RAG backend and I had to make changes to the source code as well. Learn more about my solution here: aitomChat - Talk with documents | Retrieval Augmented Generation (RAG) | Huggingchat extension
      ruclips.net/video/n63SDeQzwHc/видео.html

  • @eduardmart1237
    @eduardmart1237 a year ago +2

    Is it possible to train it on custom data?
    What are the ways to do it?
    Does it support any languages other than English?

    • @BlueAntoinette
      @BlueAntoinette  a year ago +2

      Theoretically you can run it with any model, however I have only tested it with the Open Assistant model so far.
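      To illustrate swapping in another model: a sketch of the MODELS entry in the Chat-UI's .env.local (the exact JSON shape differs between chat-ui versions, and the model id below is just an example from the Hugging Face Hub, not one tested in the video):
      MODELS=`[
        {
          "name": "tiiuae/falcon-7b-instruct",
          "endpoints": [{ "url": "http://127.0.0.1:8080" }]
        }
      ]`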

  • @oryxchannel
    @oryxchannel a year ago

    I wonder if you could build a privacy filter out of diverse prompt clusters. Tell it that it's all PII or something, so that the VM isn't able to read your data. Tunneling and all that. This might be an added privacy solution, or maybe it won't work at all. But the fact that it's on a Google virtual machine does not mean that it is "local" or "private". Also, if you had the video support, an MMLU AI benchmark would be helpful.

  • @thevadimb
    @thevadimb a year ago +1

    First, thank you for your video and for sharing your experience! A question - why did you allocate two GPUs? Why do you need more than one for simple inference purposes?

    • @BlueAntoinette
      @BlueAntoinette  a year ago +1

      Well, this was a little bit of trial and error. I first increased the number of GPUs and then, because it still did not work, the CPUs and RAM, which eventually turned out to be the deciding factor. So potentially you can get away with just one GPU, but I did not test that.

    • @thevadimb
      @thevadimb a year ago +1

      @@BlueAntoinette Thank you!

    • @BlueAntoinette
      @BlueAntoinette  a year ago +2

      @@thevadimb FYI, I now tried it with just one GPU but this results in an error "AssertionError: Each process is one gpu". Then I tried to reduce the number of shards to 1 but this results in waiting endlessly with the message "Waiting for shard 0 to be ready...". Therefore the only reliable configuration so far is the one that I show in the video (with 2 GPUs).

    • @thevadimb
      @thevadimb a year ago +1

      @@BlueAntoinette Thank you for devoting your time to checking this point. It is a bit weird that it requires at least two GPUs. HF did tremendous work building this server, so it is a bit strange that after all this profound design they ended up with such a strange restriction. I would bet that there is some hidden configuration setting... Probably 🙂

    • @BlueAntoinette
      @BlueAntoinette  a year ago +2

      @@thevadimb Well, apparently they optimized it for their needs. Maybe there are settings for this, or it requires changes to the code and a rebuild of their Docker image. However, that's beyond the time I can spend on it for free.

  • @frankwilder6860
    @frankwilder6860 11 months ago

    Is there an easy way to run the HuggingChat UI on port 80 with SSL encryption?

    • @BlueAntoinette
      @BlueAntoinette  11 months ago +1

      Yes, you can set up an NGINX reverse proxy with SSL encryption. I fully automated this process in this video: ruclips.net/video/v0D2rNHmSD4/видео.htmlsi=xjU2QGt_vQHXaBgj
      With this approach it takes just one command!
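      For those who prefer a manual setup, a minimal sketch of such an NGINX server block (the domain, certificate paths and the Chat-UI port are assumptions; the video above automates all of this):
      # /etc/nginx/sites-available/chat-ui (sketch): terminate SSL and proxy to the Chat-UI
      server {
          listen 80;
          server_name chat.example.com;
          return 301 https://$host$request_uri;        # redirect plain HTTP to HTTPS
      }
      server {
          listen 443 ssl;
          server_name chat.example.com;
          ssl_certificate     /etc/letsencrypt/live/chat.example.com/fullchain.pem;
          ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;
          location / {
              proxy_pass http://127.0.0.1:5173;        # port where the Chat-UI is listening
              proxy_set_header Host $host;
              proxy_set_header X-Forwarded-Proto $scheme;
          }
      }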

  • @user-zz9pb9yg9q
    @user-zz9pb9yg9q 8 months ago

    damn, almost 600 usd monthly for the inference server alone.

    • @BlueAntoinette
      @BlueAntoinette  8 months ago +1

      Well, only if you choose variant 2. The high cost of variant 2 is caused by the required hardware; the inference server (the software) itself comes at no cost (apart from integration, automation, maintenance, etc.).
      LLMs are very resource-intensive and the cloud providers charge a lot for the required GPUs.
      Alternatively you can stick with the remote endpoints (variant 1).

  • @deathybrs
    @deathybrs a year ago +1

    I am a little curious - why a VM?

    • @BlueAntoinette
      @BlueAntoinette  a year ago +2

      You mean in contrast to running it on your local machine? Well, there are several reasons. For example, if you do not have sufficient hardware resources on your local machine, which is especially likely when you choose variant 2. Or if you want to make it publicly available with SSL encryption and a reverse proxy.

    • @fredguth1315
      @fredguth1315 a year ago

      Also, if you are developing as a team, a VM is handy for keeping the environment in sync.

    • @deathybrs
      @deathybrs a year ago

      At the end of the day, I think maybe I should have been more clear in my question.
      Why a VM *before* explaining how to set it up *without* a VM? I understand the value of a VM, but there aren't many videos explaining how to do this, so why *start* with the VM explanation rather than explaining how to get it set up in our native environment first?

    • @BlueAntoinette
      @BlueAntoinette  a year ago +1

      @@deathybrs When it comes down to the Chat-UI, there is no difference between installing it on a VM and on a local Linux-compatible machine. If you don't want to use a VM, then you don't have to for the UI part. If you run Windows locally, you could use WSL as well. If you want to discuss your situation in more detail, feel free to share your native environment.

    • @deathybrs
      @deathybrs a year ago +1

      @@BlueAntoinette I really appreciate that, thanks!
      At this point, I am actually not ready right now to set it up as my use-case for AI is 90% Diffusion rather than LLM, and I suspect that unless I need it soon, the tech will have changed shape so much that by the time I do get there, a video made now would not be applicable enough for it to be worth your time.
      But as I said, your kindness is certainly appreciated!

  • @GAMINGDEADCRACKER
    @GAMINGDEADCRACKER a year ago +1

    May I get your email address? I would like to know more about it.

    • @BlueAntoinette
      @BlueAntoinette  a year ago +1

      Yes, please find my contact details here: www.blueantoinette.com/contact-us/
      Thx!