GPT scrapes + answers from any sites (ft. Chromadb, Trafilatura)

  • Published: Dec 3, 2024

Comments • 24

  • @moreshk
    @moreshk 1 year ago

    Another great video in this series!

  • @MrNootka
    @MrNootka 1 year ago +1

    Love your tutorials, thanks!
    Tip: making your cam smaller and circular would be a great upgrade to your videos :)

    • @SamuelChan
      @SamuelChan  1 year ago

      Good tip! And relatively easy to implement! Thank you! :)

  • @arnaudlacour1188
    @arnaudlacour1188 1 year ago +2

    When I try this exact thing I get an error that GPTChromaIndex is not in llama_index. Can you think of a reason why?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Yes! When this lesson was published, the latest version of LlamaIndex was 0.5.7.
      Two months later it's 0.6.x, which is why the import fails.
      You can either downgrade to 0.5.7 to follow along, or create a fresh environment and run pip install -r requirements.txt from the GitHub repo.
      I'm in the middle of upgrading the codebase to the latest version, but I have limited time alongside my day job, so we'll see! :)
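A minimal sketch for checking which situation you are in before running the lesson code. It assumes the package is installed under the distribution name `llama-index`, and it takes the 0.5.x / 0.6.x cutoff from the reply above rather than from the LlamaIndex changelog. The helper names `parse_version`, `has_legacy_api`, and `installed_version` are made up for this example.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.5.7' into (0, 5, 7); a suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_legacy_api(version_text):
    """True for releases before 0.6 (cutoff assumed from the thread)."""
    return parse_version(version_text)[:2] < (0, 6)


def installed_version(package="llama-index"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("llama-index is not installed in this environment")
elif has_legacy_api(current):
    print(f"llama-index {current}: the lesson code should import as shown")
else:
    print(f"llama-index {current}: newer than the lesson; pin 0.5.7 or use the repo's requirements.txt")
```

If the check reports a newer release, installing the pinned version in a fresh virtual environment keeps it from clashing with other projects.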

    • @arnaudlacour1188
      @arnaudlacour1188 1 year ago

      @@SamuelChan very awesome of you to reply so quickly! Much appreciated, thank you!

  • @SivaKumar-of7mu
    @SivaKumar-of7mu 1 month ago

    I also can't find the repo on your GitHub.

  • @TheShreyas10
    @TheShreyas10 2 months ago

    Hey, can you please share the repo? I can't find it on your GitHub.

  • @noualiibrahimyassine1336
    @noualiibrahimyassine1336 1 year ago

    Great tutorial, thank you.
    Question: in my terminal window I'm only getting the question/answer. I'm not getting the additional information such as LLM token usage, SentenceTransformer, PyTorch device, etc. How can I get that information?

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you!
      You can set up logging in many different ways, and I show several of them in later videos in this series. For example, in "building a GPT-powered journal system":
      ruclips.net/video/OzDhJOR5IfQ/видео.htmlsi=SZXzbH1hLeJ0QFzH
      I use the following technique to send log output to the terminal:
      import logging
      import sys
      logging.basicConfig(stream=sys.stdout, level=logging.INFO)
      logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
      LangChain also has its own tracking utilities:
      from langchain.callbacks import get_openai_callback
      # llm is an already-constructed LangChain LLM object
      with get_openai_callback() as cb:
          result = llm("Your query")
          print(cb)
      Printing the callback object (cb) shows:
      Tokens Used: 42
      Prompt Tokens: 4
      Completion Tokens: 38
      Successful Requests: 1
      Total Cost (USD): $0.00084
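One detail worth knowing about the logging snippet above: `basicConfig` already attaches a stream handler to the root logger, so adding a second `StreamHandler` prints every record twice. A minimal standard-library sketch with a single handler follows. The logger name `demo` and the message text are placeholders, and the output is captured in memory only so it can be inspected.

```python
import io
import logging
import sys


def configure_logging(stream=None, level=logging.INFO):
    """Send root-logger output to one stream (stdout by default)."""
    logging.basicConfig(
        stream=stream if stream is not None else sys.stdout,
        level=level,
        format="%(levelname)s:%(name)s:%(message)s",
        force=True,  # replace handlers from earlier configuration (Python 3.8+)
    )


# Demo: log into an in-memory buffer instead of the terminal.
buffer = io.StringIO()
configure_logging(stream=buffer)
logging.getLogger("demo").info("token usage: 42")  # placeholder message
captured = buffer.getvalue()
print(captured, end="")
```

Calling `configure_logging()` with no arguments sends the same records to the terminal, where INFO-level messages emitted by libraries through the standard `logging` module will also appear.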

    • @noualiibrahimyassine1336
      @noualiibrahimyassine1336 1 year ago

      @@SamuelChan Thank you!

  • @llmia-n2x
    @llmia-n2x 1 year ago +1

    Could you please make a similar video with an open-source (free) LLM?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
      I have a lot of videos where I use open source LLMs from huggingface. I also have a video that shows how to use a locally-hosted LLM on your machine! Check out the playlist above! :)

    • @llmia-n2x
      @llmia-n2x 1 year ago +1

      @@SamuelChan Thanks a lot. I'll check.

  • @utkarshpandey8967
    @utkarshpandey8967 1 year ago

    I am not able to use GPTChromaIndex in Python 3.10. Can you suggest an alternative?

    • @SamuelChan
      @SamuelChan  1 year ago

      What does "not able to use" mean? Did you fork the GitHub repo? If you install the dependencies it will work with Python 3.10 (I try to keep it up to date with every major version update of LangChain and LlamaIndex), so I can't see any reason why it won't work.

  • @ramp2011
    @ramp2011 1 year ago

    Thank you for the video. I just checked your GitHub and I do not see the code there. Could you please add it? Thank you

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey, it's here in the GitHub repo!
      github.com/onlyphantom/llm-python/blob/main/6_team.py

  • @8eck
    @8eck 1 year ago

    Is Trafilatura able to read JavaScript websites? I mean, can it read React-based websites?

    • @SamuelChan
      @SamuelChan  1 year ago

      It depends on whether the React site uses SSG (static site generation), SSR (server-side rendering), or CSR (client-side rendering). It works like any other web crawler / scraper :)
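To illustrate the distinction, here is a small sketch that uses the standard-library `HTMLParser` as a stand-in for Trafilatura (it is not Trafilatura's extraction logic). The two HTML strings are made-up examples: one is what a server-rendered page might return, the other is the empty shell a client-rendered app typically sends before its JavaScript runs.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical responses: the content is already in the SSR markup,
# while the CSR markup only carries a mount point and a script tag.
SSR_HTML = "<html><body><div id='root'><h1>Pricing</h1><p>Plans start at $5.</p></div></body></html>"
CSR_HTML = "<html><body><div id='root'></div><script src='/bundle.js'></script></body></html>"

ssr_text = visible_text(SSR_HTML)
csr_text = visible_text(CSR_HTML)
print(repr(ssr_text))
print(repr(csr_text))
```

Any scraper that only downloads and parses the HTML response sees the same thing: text for SSG/SSR pages, and nothing useful for a CSR shell until a browser has executed the script.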

    • @8eck
      @8eck 1 year ago

      @@SamuelChan Nah, I was talking specifically about sites that are not SSR or statically generated.

    • @8eck
      @8eck 1 year ago

      @@SamuelChan Guess it can only read non-JS content.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Yeah, not with Trafilatura, I don't think.
      I think for those cases you can use an automation tool like Selenium: wait 1 second for the content to load, then retrieve it. If the div id isn't found, wait another second, and so on, maybe in a while loop with a break statement?
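The wait-and-retry loop described above can be sketched without a browser. `poll_until` and `FakePage` are hypothetical names for this example, and `FakePage` only simulates content that appears on the third lookup. With Selenium, the `fetch` callable would wrap an element lookup on the driver, and Selenium's own `WebDriverWait` helper does this kind of polling for you.

```python
import time


def poll_until(fetch, timeout=10.0, interval=1.0):
    """Call fetch() until it returns something other than None, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result  # content found: leave the loop
        if time.monotonic() >= deadline:
            raise TimeoutError(f"content not available after {timeout} seconds")
        time.sleep(interval)  # wait before the next lookup


class FakePage:
    """Stand-in for a browser page whose content loads after a few lookups."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_content(self):
        self.calls += 1
        if self.calls >= self.ready_after:
            return "<div id='content'>hello</div>"
        return None


page = FakePage(ready_after=3)
# A short interval keeps the demo fast; the comment above suggests 1 second.
html = poll_until(page.find_content, timeout=2.0, interval=0.01)
print(html, "after", page.calls, "lookups")
```

Bounding the loop with a timeout matters in practice: if the element never appears, an unbounded loop would hang the scraper indefinitely.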