Learn Data with Mark
  • Videos: 99
  • Views: 427,320
Full-Text Search vs Vector Search (RAG with DuckDB)
In this video, we're going to compare Full-Text Search and Vector Search using DuckDB. We'll learn how to load data and run queries for both approaches, as well as see the types of queries where each thrives. A minimal sketch of both query styles follows the links below.
#duckdb #vectorsearch #databases #llamacpp
Code - github.com/mneedham/LearnDataWithMark/blob/main/fts-vs-vector-search/fts_vector.ipynb
llama.cpp - llama-cpp-python.readthedocs.io/en/latest/
langchain-text-splitters - pypi.org/project/langchain-text-splitters/
DuckDB fts - duckdb.org/docs/extensions/full_text_search.html
DuckDB vector search - duckdb.org/2024/05/03/vector-similarity-search-vss.html
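
Here is that sketch - a rough illustration of the two query styles from Python, not the linked notebook. The table name, column names, and embedding model (a 1024-dimension model such as mxbai-embed-large) are assumptions for illustration.

```python
import duckdb
from llama_cpp import Llama

# Assumes a table built elsewhere: chunks(id INTEGER, text VARCHAR, embedding FLOAT[1024])
con = duckdb.connect("podcast.duckdb")

# --- Full-Text Search: build a BM25 index over the text column and query it with keywords ---
con.execute("INSTALL fts")
con.execute("LOAD fts")
con.execute("PRAGMA create_fts_index('chunks', 'id', 'text')")
fts_results = con.execute("""
    SELECT id, text, score
    FROM (SELECT *, fts_main_chunks.match_bm25(id, 'vector similarity search') AS score FROM chunks)
    WHERE score IS NOT NULL
    ORDER BY score DESC
    LIMIT 5
""").fetchall()

# --- Vector Search: embed the question, then rank chunks by cosine similarity ---
embedder = Llama(model_path="./models/mxbai-embed-large-v1.gguf", embedding=True)
question_embedding = embedder.create_embedding("vector similarity search")["data"][0]["embedding"]
vector_results = con.execute("""
    SELECT id, text, array_cosine_similarity(embedding, ?::FLOAT[1024]) AS score
    FROM chunks
    ORDER BY score DESC
    LIMIT 5
""", [question_embedding]).fetchall()
```

The full-text query only matches the literal terms (plus stemming), while the vector query can surface chunks that are semantically related but share no keywords with the question.
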
Views: 967

Videos

Search-Based RAG with DuckDB and GLiNER
647 views · 1 day ago
In this video, we're going to learn how to do Search-Based RAG on a podcast transcript. Instead of doing vector search to retrieve appropriate text chunks, we'll use Full-Text Search with DuckDB. We'll also use GLiNER to extract keywords from the user's prompt. #gliner #duckdb #llamacpp Import code - github.com/mneedham/LearnDataWithMark/blob/main/search-based-rag/import.py Search code - gith...
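
As a rough idea of the GLiNER step described above (the model name, labels, and threshold are assumptions, not necessarily what the video uses), the extracted keywords can then be fed to a DuckDB full-text index:

```python
from gliner import GLiNER

# Load a pretrained GLiNER model (model name is an assumption for illustration)
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

prompt = "What did the hosts say about DuckDB and full-text search?"
labels = ["technology", "person", "topic"]

# Extract entities from the user's prompt; these become the keywords for FTS
entities = model.predict_entities(prompt, labels, threshold=0.4)
keywords = [entity["text"] for entity in entities]
print(keywords)  # e.g. ['DuckDB', 'full-text search']
```
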
Local RAG with llama.cpp
1.3K views · 14 days ago
In this video, we're going to learn how to do naive/basic RAG (Retrieval Augmented Generation) with llama.cpp on our own machine. * Mixed Bread AI - huggingface.co/mixedbread-ai/mxbai-embed-large-v1 * Llama3 - huggingface.co/bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF * llama.cpp - llama-cpp-python.readthedocs.io/en/latest/ * Qdrant - github.com/qdrant/qdrant-client * langchain-text-splitters...
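
A compressed sketch of the naive RAG flow described above, built from the pieces linked in that description; the model paths, chunk sizes, file names, and collection name are assumptions for illustration:

```python
from llama_cpp import Llama
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Embedding model and chat model (paths are assumptions)
embedder = Llama(model_path="./models/mxbai-embed-large-v1.gguf", embedding=True)
llm = Llama(model_path="./models/Llama-3-Instruct-8B.Q4_K_M.gguf", n_ctx=4096)

# 1. Chunk the source text
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("transcript.txt").read())

# 2. Embed the chunks and store them in an in-memory Qdrant collection
client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=1024, distance=Distance.COSINE))
points = [
    PointStruct(id=i,
                vector=embedder.create_embedding(chunk)["data"][0]["embedding"],
                payload={"text": chunk})
    for i, chunk in enumerate(chunks)
]
client.upsert("docs", points=points)

# 3. Retrieve the most similar chunks for a question and generate an answer from them
question = "What is discussed about vector search?"
query_vector = embedder.create_embedding(question)["data"][0]["embedding"]
hits = client.search("docs", query_vector=query_vector, limit=3)
context = "\n".join(hit.payload["text"] for hit in hits)

answer = llm.create_chat_completion(messages=[
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": question},
])
print(answer["choices"][0]["message"]["content"])
```
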
A UI to quantize Hugging Face LLMs
439 views · 21 days ago
In this video, we're going to learn about a Hugging Face Space called gguf-my-repo that makes it super easy to quantize LLMs to GGUF format so that you can run them on your laptop. #huggingface #llms #gguf #llamacpp GGUF my repo - huggingface.co/spaces/ggml-org/gguf-my-repo llama.cpp - github.com/ggerganov/llama.cpp GGUF - huggingface.co/docs/hub/en/gguf#viewer-for-metadata tensors-info
Mistral 7B Function Calling with llama.cpp
902 views · 1 month ago
In this video, we'll learn how to do Mistral 7B function calling using llama.cpp. And it works much better than my experiments with Ollama. #llms #mistralai #llamacpp Blog - www.markhneedham.com/blog/2024/06/23/mistral-7b-function-calling-llama-cpp/ Model Downloader - github.com/bodaay/HuggingFaceModelDownloader llama.cpp Python - llama-cpp-python.readthedocs.io/en/latest/ Mistral 7B v3 - huggi...
Does Mistral 7B function calling ACTUALLY work?
704 views · 1 month ago
Since making an Intro to Mistral 7B with support for function calling video last week, I've been playing around with it a bit more with different parameters to see how well it works beyond the Hello World example. The results aren't promising and in this video I'll share what I tested and the output that I saw. #functioncalling #llms #mistralai Code - github.com/mneedham/LearnDataWithMark/tree/...
Mistral 7B Function Calling with Ollama
2.2K views · 1 month ago
In this video, we're going to learn how to do function calling with the latest Mistral 7B model, which has been fine tuned for just that purpose. #mistralai #llms #functioncalling Code - github.com/mneedham/LearnDataWithMark/blob/main/mistral-v3/app.py Mistral - ollama.com/library/mistral Ollama - ollama.com/
Hugging Face SafeTensors LLMs in Ollama
1.9K views · 1 month ago
In this video, we're going to learn how to use Hugging Face safetensors models with Ollama on our own machine. We'll also learn how to quantize the model to reduce the memory required and increase the number of tokens generated per second. #llms #ollama #safetensors Code from video - github.com/mneedham/LearnDataWithMark/tree/main/ollama-own-model Ollama Quantization options - github.com/ollama...
Are LLaVA variants better than original?
1.3K views · 1 month ago
LLaVA is an open-source large multi-modal model that uses a combination of the Vicuna LLM and the CLIP vision encoder. In this video, we're going to compare the initial LLaVA model with more recently trained LLaVA models based on Meta's Llama 3 and Microsoft's Phi-3. We'll see if they can extract code from a SQL query, tell us who Cristiano Ronaldo is, understand a graph/network diagram, and more! #lmmsson...
An Ollama Chatbot Arena (with Streamlit)
1K views · 2 months ago
In this video, we introduce a Chatbot Arena for Ollama models, written using Streamlit. You can use the arena to do blind comparisons of your local LLMs answering the same prompts and then vote on which one you think answered the question best. #ollama #llms #streamlit Chatbot Arena - github.com/mneedham/chatbot-arena/tree/main Ollama - ollama.com/ Streamlit - streamlit.io/ structlog - www.struc...
Ollama can run LLMs in parallel!
3.5K views · 2 months ago
In this video, we're going to learn how to run LLMs in parallel on our local machine using Ollama version 0.1.33. #ollama #llms #llama3 #phi3 Code - github.com/mneedham/LearnDataWithMark/blob/main/ollama-parallel/app.py Ollama 0.1.33 - github.com/ollama/ollama/releases/tag/v0.1.33 Blog post - www.markhneedham.com/blog/2024/05/11/side-by-side-local-llms-ollama-streamlit/
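
A rough sketch of the idea (the environment-variable values, model, and prompts are assumptions; the video's actual code is at the link above). Ollama 0.1.33 introduced the OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS settings for the server:

```python
# Start the server with parallel request slots enabled (Ollama >= 0.1.33), e.g.:
#   OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve
from concurrent.futures import ThreadPoolExecutor
import ollama

prompts = ["Why is the sky blue?", "Explain BM25 in one sentence.",
           "What is an embedding?", "Name three SQL window functions."]

def ask(prompt: str) -> str:
    response = ollama.chat(model="llama3",
                           messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]

# Fire the prompts concurrently; the server works on them in parallel
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, prompts):
        print(answer[:80], "...")
```
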
Serverless GenAI with Beam (GPU as a service)
619 views · 2 months ago
In this video, we're going to learn about Beam.Cloud, a service that offers serverless infrastructure for Generative AI. I like to think of it as being like AWS Lambda but with GPUs. We'll get it set up and then learn how to deploy a Python function to Beam and call it from a Python script. #whisper #llms #beam #serverless Beam - www.beam.cloud/ Pricing Page - www.beam.cloud/pricing App Code - gi...
Voice to Text on a Mac with insanely-fast-whisper
1.8K views · 2 months ago
In this video, we'll learn how to use the insanely-fast-whisper tool to generate a transcript for an episode from the Marketing Against The Grain podcast. We'll then feed that transcript to ChatGPT and ask some questions. Blog post - www.markhneedham.com/blog/2023/12/23/insanely-fast-whisper-experiments/ insanely-fast-whisper - github.com/Vaibhavs10/insanely-fast-whisper/tree/main
How does OpenAI Function Calling work?
8K views · 3 months ago
In this video, we're going to dig into OpenAI function calling. We'll explore what's happening under the hood before working through an example of how to use it to call a Weather API to see how warm (or not!) it is right now. #llms #openai #functioncalling Code - github.com/mneedham/LearnDataWithMark/blob/main/openai-function-calling/function_calling_openai.py OpenAI Documentation - platform.op...
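
As a minimal sketch of the flow described above (the weather function, its parameters, and the model name are illustrative assumptions): the model doesn't execute anything itself - it returns the function name plus JSON arguments, and your own code then calls the Weather API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe the function so the model can decide when to call it
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # hypothetical local function
        "description": "Get the current temperature for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
            },
            "required": ["latitude", "longitude"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How warm is it in London right now?"}],
    tools=tools,
)

# The model returns the chosen function name and its arguments as a JSON string
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)
print(json.loads(tool_call.function.arguments))  # e.g. {'latitude': 51.5, 'longitude': -0.1}
```
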
Semantic Router: No more rogue LLM chatbots?
957 views · 3 months ago
In this video, we're going to explore Semantic Router and its role in keeping LLM-based chatbots on the rails. Semantic Router describes itself as 'a superfast decision-making layer for your LLMs and agents', and to me it's a bit like defining the entry points for a web application. We're going to explore a tiny subset of its functionality to see if we can create a chatbot that will only talk abo...
Running LLMs on a Mac with llama.cpp
5K views · 3 months ago
GLiNER: Easiest way to do Entity Extraction in 2024?
1.8K views · 3 months ago
Visualising embeddings with t-SNE
1.5K views · 4 months ago
Exploring the comments of AI YouTube channels
688 views · 4 months ago
Google Gemma 2B vs 7B with Ollama
2.1K views · 5 months ago
SLIM: Small models for specific tasks by LLMWare
1.7K views · 5 months ago
Ollama adds OpenAI API support
5K views · 5 months ago
Content Discovery with Embeddings (ft. Qdrant/FastEmbed)
714 views · 5 months ago
LLaVA 1.6 is here...but is it any good? (via Ollama)
11K views · 5 months ago
Ollama has a Python library!
15K views · 5 months ago
Langroid: Chat to a CSV file using Mixtral (via Ollama)
6K views · 6 months ago
User-Selected metadata in RAG Applications with Qdrant
2.1K views · 6 months ago
Building a local ChatGPT with Chainlit, Mixtral, and Ollama
6K views · 6 months ago
Constraining LLMs with Guidance AI
2.8K views · 6 months ago
Running Mixtral on your machine with Ollama
6K views · 7 months ago

Comments

  • @Seedlinux
    @Seedlinux · 1 day ago

    Thanks a lot for this! I was looking for a video that explained this exact topic and you did it in such a simple and efficient way. Kudos!❤

  • @inf-c9o
    @inf-c9o · 1 day ago

    Hi Mark, great tutorial. I have been playing around a bit and tried to use my already existing ChromaDb as a retriever. Unfortunately simply changing the context to my retrieverDB did not work. I received "ValueError: Requested tokens (941) exceed context window of 512". Do you happen to know how to expand the context window or how to fix this?

    • @learndatawithmark
      @learndatawithmark · 1 day ago

      I think it should be the n_ctx parameter e.g. llm = llama_cpp.Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)

  • @michaelvilarino5464
    @michaelvilarino5464 · 2 days ago

    I have several functions implemented in my project, each responsible for a specific task. However, frequently, when the user requests the extraction of information that should trigger function A, other functions (like B or C) are called instead. Although function A is correctly triggered on some occasions, this does not happen consistently. Why is this happening? I am using OpenAI's GPT-4 model.

  • @CMAZZONI
    @CMAZZONI · 3 days ago

    Has anyone used this via a cloud endpoint? It's so custom that it's not even compatible with the Hugging Face cloud endpoint.

  • @hugh_martin
    @hugh_martin · 3 days ago

    Thank you. This helped me solve the last step in a process to generate consistently sorted JSON data.

  • @user-rp2vv5oq6e
    @user-rp2vv5oq6e · 4 days ago

    Use vector search with function calling. Before the vector search, prompt the LLM to form a search query that is perfect for vector search, then use that query for the vector search. This gave the best results for me. You could also use function calling to form the query for the vector search.

    • @learndatawithmark
      @learndatawithmark · 3 days ago

      Cool idea! Thanks - I'll have to give that a try. Why do we need function calling though - is that to make sure that we get back just a query rather than the LLM going off and writing a bunch of other text?

    • @user-rp2vv5oq6e
      @user-rp2vv5oq6e · 3 days ago

      @@learndatawithmark Exactly. Because function calling guarantees you get back just the query.
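
A minimal sketch of the query-rewriting idea from this thread, using function calling to force the model to return nothing but a search query (the function schema and model name are assumptions):

```python
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "vector_search",  # hypothetical retrieval function
        "description": "Search the document store with a short, focused query",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

user_prompt = "I vaguely remember them comparing BM25 to embeddings - what was said?"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_prompt}],
    tools=tools,
    # Force the model to call vector_search, so we get structured arguments back
    tool_choice={"type": "function", "function": {"name": "vector_search"}},
)

args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
search_query = args["query"]  # e.g. "BM25 vs embedding similarity"
# ...embed search_query and run it against the vector store as usual
```
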

  • @beliebigerusername
    @beliebigerusername · 4 days ago

    Mark, you're cool! Awesome tutorial - clean and on point. Do you have a tutorial on what you mentioned - "today I do everything with Docker"? Docker is quite overwhelming! Thanks, mate!

  • @cosmos177
    @cosmos177 · 5 days ago

    I'd give the features a ranking of 2,3,4,4,5 out of 10. cheers.

    • @learndatawithmark
      @learndatawithmark · 5 days ago

      You are a tough one to please! 😄

    • @cosmos177
      @cosmos177 · 5 days ago

      @@learndatawithmark Sorry ... felt most were cosmetic... a bit of fluff. However I later saw your PIVOT and UNPIVOT videos... and they were really excellent, both in terms of DuckDB syntax/features and also your example/content. Thanks for the videos. New subscriber. Cheers.

  • @Sendero-yp5gi
    @Sendero-yp5gi · 6 days ago

    Hi Mark, I don't understand why you're chunking the documents with your function "chunk". Can we not just feed all 247 documents to the llm to create the embeddings? Like: document_embeddings = llm.create_embedding([item.page_content for item in documents]). We get the embedding back for each document and that's it. Am I missing something? Are you doing it just to have 3 batches (100, 100 and 47) and embed them in parallel?

    • @learndatawithmark
      @learndatawithmark · 6 days ago

      No need to do that batching - I recorded it all and realised that it was a bit pointless afterwards. I do want to play around and see whether at some sort of batch size it gets slower. Good question about doing it in parallel though. I dunno whether the API supports it, but if it doesn't, the model is tiny, so we could just spin it up a bunch of times and call each in parallel?

  • @user-gh2rv1in3h
    @user-gh2rv1in3h · 7 days ago

    Hi, I get error "Error: unknown data type: U8", has anyone solved similar problems?

    • @learndatawithmark
      @learndatawithmark · 6 days ago

      When do you get that error?

    • @user-gh2rv1in3h
      @user-gh2rv1in3h · 4 days ago

      @@learndatawithmark on converting model stage, after run "ollama create ..."

    • @user-gh2rv1in3h
      @user-gh2rv1in3h · 4 days ago

      Figured out the problem. The problem was that I did the fine-tuning on a quantized model, and that is not good :)

  • @antonpictures
    @antonpictures · 7 days ago

    HELP: YES, I need it for LLAVA !!! to store images in chunks in the vector database? same method? Can you help? antonpictures.wordpress.com/2016/07/28/donald-trump-calls-on-russia-to-find-hillary-clintons-missing-free-cinema-movies-httpanton-pictures-3/

  • @dexterlee1199
    @dexterlee1199 · 7 days ago

    Thanks for the great sharing! Quick question - my embedding part doesn't seem to work and I'm not sure why. Can you guide me a bit? Thanks

    • @dexterlee1199
      @dexterlee1199 · 7 days ago

      I'm using Azure OpenAI and put my parameters in .env - shall I do anything else?

    • @learndatawithmark
      @learndatawithmark · 4 days ago

      What error do you see?

  • @DihelsonMendonca
    @DihelsonMendonca · 8 days ago

    There must be a new and easy method currently to run GGUF models in Ollama or in Open WebUI. Please update this method. 🎉❤

    • @learndatawithmark
      @learndatawithmark · 8 days ago

      As far as I know, this is still the way to run GGUF models with Ollama. I wish you could use GGUF files directly, it would be so much easier! I haven't used Open WebUI, I'll take a look at that. If you want command line tools that can run GGUF files directly, take a look at llamafile or llama.cpp github.com/ggerganov/llama.cpp github.com/Mozilla-Ocho/llamafile

    • @DihelsonMendonca
      @DihelsonMendonca · 8 days ago

      ​@@learndatawithmark Thanks for answering. Indeed, my interest is in running them on Ollama, due to the new Open WebUI, which is the most marvelous thing invented. Open WebUI is a frontend to Ollama, presenting an interface like Chatgpt, with history of the conversations, talks with LLMs completely hands free, you talk and listen, and you can input your texts, PDFs, RAG, makes LLMs access internet in real time, upload images, multimodality, it's fantastic. You definitely need to test it. The problem with it is that it's based on Ollama, so, I use LM Studio, with dozens of Hugging face models, and I love them. I would like these models to be used in Open WebUI, but they are in GGUF format, that's why I found your video, in order to use gguf models in Ollama. 🙏👍💥

  • @MuhammadZubair-fl7wd
    @MuhammadZubair-fl7wd · 9 days ago

    Hi Mark, I can't find your chunk function on the GitHub page you mentioned in the description. Could you help me with that? Sorry, I'm new to all this so it might be a silly ask at the moment. Thanks a lot

    • @learndatawithmark
      @learndatawithmark · 9 days ago

      Sorry - my mistake, I forgot to add it. It's here - github.com/mneedham/LearnDataWithMark/blob/main/llamacpp-rag/utils.py

  • @not_a_human_being
    @not_a_human_being · 9 days ago

    Well done Mark, keep doing content like this, we surely need it. Refreshing clarity! Genuinely struggling with some other tutorials!

  • @mbikangruth5630
    @mbikangruth5630 · 10 days ago

    I have done as you say, but running the model pipeline is taking forever to work. It still has not worked, please what can I do?

    • @learndatawithmark
      @learndatawithmark · 8 days ago

      If it's running too slowly then maybe it'd make sense to try out some of the quantised models instead. Those ones are smaller and better suited for running on consumer hardware. I quite like Ollama and I've made a few videos on that. This is probably the best place to start - ruclips.net/video/NFgEgqua-fg/видео.html

  • @imaginarybuddy
    @imaginarybuddy · 10 days ago

    hi, thanks for the video. May I ask what's the meaning of legacy=False when using the pretrained model?

  • @ranjit9427
    @ranjit9427 · 10 days ago

    Can we combine both Text Search and Vector Search? And can you create a complete implementation video?

    • @learndatawithmark
      @learndatawithmark · 8 days ago

      We can! This would be hybrid search which I need to read a bit more about. Once I've done that, I'll make another video - thanks for the idea :)

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w · 11 days ago

    why would you do full text search over rag?

    • @learndatawithmark
      @learndatawithmark · 8 days ago

      You'd use Full-Text Search (FTS) if you were searching for specific terms that you know are in the text. But it will only find those terms or any variants close to those words. So if you know exactly what you're looking for it should (in theory) work better than doing vector search, where you'd be matching an embedding of your search query/keywords against embeddings of paragraphs of text. Of course this technique won't find other bits of text that have the same semantic meaning. But your question makes me curious to compare the two techniques on the same dataset to see if/how the query results differ.

  • @Sendero-yp5gi
    @Sendero-yp5gi · 14 days ago

    Hi Mark, how quicker should inference be when setting n_gpu_layers = 1? I am on a Mac M1 pro with 16GB GPU, and if I set n_gpu_layers = 1 it is actually slower than not using it. Do you have an explanation for that or a way to check what is happening? Cheers!

    • @learndatawithmark
      @learndatawithmark · 13 days ago

      Should be -1 rather than 1. Then it will put as many layers as possible to the GPU rather than CPU.

    • @Sendero-yp5gi
      @Sendero-yp5gi · 8 days ago

      @@learndatawithmark Yep, silly mistake by me. Could you make a video recapping the different frameworks/libraries for LLMs inference and their pros and cons? Like HF transformers VS Llama-cpp-python VS Ollama
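
For reference, a minimal llama-cpp-python sketch of the setting discussed above (the model path is an assumption):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads as many layers as possible to the GPU (Metal on Apple Silicon);
# 0 keeps everything on the CPU, and a small positive number offloads only that many layers.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

output = llm("Q: What is DuckDB? A:", max_tokens=64)
print(output["choices"][0]["text"])
```
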

  • @ravishmahajan9314
    @ravishmahajan9314 · 14 days ago

    Great video! Can we have a video on a similar framework called LMQL? I have seen people talking about it in terms of its simplicity and being closer to SQL than Guidance.

    • @learndatawithmark
      @learndatawithmark · 6 days ago

      not heard of that one - let me look into it, good idea!

  • @davidtindell950
    @davidtindell950 · 15 days ago

    Thank You. Very Useful and Very Timely!

  • @bithigh8301
    @bithigh8301 · 15 days ago

    Very nice channel, your videos and tutorials are really great! Thanks for sharing. Let's comment and subscribe, Mark deserves to have 100M subscribers!

  • @goktugkoksal8643
    @goktugkoksal8643 · 17 days ago

    Super

  • @chenghung0510
    @chenghung0510 · 17 days ago

    Very clear introduction to OpenAI function calls - this video was super useful for helping me understand function calls.

    • @learndatawithmark
      @learndatawithmark · 15 days ago

      Great! Glad it was useful - let me know if there are any other topics in this area you'd like me to cover next.

  • @parthwagh3607
    @parthwagh3607 · 17 days ago

    Thank you so much. I am having a problem running models downloaded from Hugging Face that have safetensors files. I have these files in oobabooga/text-generation-webui and I have to use them with Ollama. I followed everything, even created a Modelfile with the path to the safetensors directory, but it is not running >> ollama create model_name -f modelfile. Please help me.

    • @learndatawithmark
      @learndatawithmark · 15 days ago

      What happens when you run the command?

    • @parthwagh3607
      @parthwagh3607 · 13 days ago

      @@learndatawithmark Thank you so much for the quick response. Your videos have helped me a lot. I am running this on Windows 11 and followed these steps:
      1) Created a Modelfile with this script: "FROM C:\Users\PARTH\Downloads\text-generation-webui-main \text-generation-webui-main\models\TheBloke_dolphin-2.7- mixtral-8x7b-AWQQ TEMPLATE = """{{ if .System }}<|im_start|>system {{ .System }}<|im_end|>{{ end }} <|im_start|>user {{ .Prompt }}<|im_end|> <|im_start|>assistant """ PARAMETER stop <|start_header_id|> PARAMETER stop <|end_header_id|> PARAMETER stop <|eot_id|>"
      2) Ran the following command in a terminal opened from where this Modelfile is stored: "ollama create mixtral:dolphine -f .\Modelfile"
      3) It showed me this error: "Error: command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message""
      4) I made a file with only the FROM statement, without the parameters and template. It ran, but gave this error: "C:\Users\PARTH\.ollama>ollama create mixtral:dolphine -f .\Modelfile transferring model data unpacking model metadata processing tensors Error: couldn't find a layer name for 'model.layers.0.block_sparse_moe.experts.0.w1.qweight'"
      5) I ran it again with other models, but it gave the same error: "C:\Users\PARTH\.ollama>ollama create slimorca:13b -f .\Modelfile transferring model data unpacking model metadata processing tensors Error: couldn't find a layer name for 'model.layers.0.mlp.down_proj.qweight'"

  • @marcusk7855
    @marcusk7855 · 18 days ago

    Nice tutorial.

  • @AlexanderSuraphel
    @AlexanderSuraphel · 19 days ago

    Mark, I think Phi 3 got the answer for What is this *company* most famous for? Since the company in the image is Microsoft.

  • @SonGoku-pc7jl
    @SonGoku-pc7jl · 20 days ago

    thanks!

  • @RealEstate3D
    @RealEstate3D · 20 days ago

    1. The command is: brew install ggerganov/ggerganov/llama.cpp
    2. It's not clear from the video if it's also downloading llama3, the model. I guess not?! I have the source code of llama.cpp built on my MBP.
    3. ... too many questions here ...

    • @learndatawithmark
      @learndatawithmark · 20 days ago

      1. It was at the time I made the video - but I think it'll now work if you do brew install llama.cpp
      2. No it's not - I had that on my machine already. If you've already compiled llama.cpp, you don't need to do 1.
      3. What's the next one?!

    • @RealEstate3D
      @RealEstate3D · 20 days ago

      @@learndatawithmark Oh ... nice. Well then the brew install only installs the hugging-face CLI? How can it be ensured that models for ollama, llama.cpp and open-webui are only stored once? Is a linked folder enough? Which strategy to adopt for not having the same models several times on disk? In which formats does one have to reformat the models for silicon Macs? How can one use the Apple neural engine? Just some 😉

    • @learndatawithmark
      @learndatawithmark · 15 days ago

      It's not actually a Hugging Face CLI - it's a separate project - but yeh, it downloads a CLI tool. There are then two tools - llama-cli and llama-server. The first is a command line chat interface and the second starts a server.
      > How can it be ensured that models for ollama, llama.cpp and open-webui are only stored once?
      Ollama is kind of a pain IMO as you can't point it directly at GGUF files AFAIK. Instead you have to convert it from GGUF to the Ollama format. If you use llama.cpp or llamafile you can use the same GGUF files with no duplication.
      > In which formats does one have to reformat the models for silicon Macs?
      llama.cpp has support for running files on silicon Macs (I'm using an M1 Max from 2021) so you don't need to do anything there.
      > How can one use the Apple neural engine?
      AFAIK for doing this you seem to need different models. This blog is pretty good on that - medium.com/@manuelescobar-dev/running-large-language-models-llama-3-on-apple-silicon-with-apples-mlx-framework-4f4ee6e15f31

    • @RealEstate3D
      @RealEstate3D · 15 days ago

      @@learndatawithmark Thanks for sharing 👍

  • @tee_iam78
    @tee_iam78 · 20 days ago

    Thank you for the contents.

  • @abhishekprakash4793
    @abhishekprakash4793 · 20 days ago

    Thanks, Mark, for this awesome video and this is very informative

  • @IanTindale
    @IanTindale · 21 days ago

    I keep following along until about 12 seconds in, where you start typing into something and you say let’s open up age something, and carry on typing into whatever it is you’re typing into - I can’t get that far, I don’t know what to type into

    • @learndatawithmark
      @learndatawithmark · 21 days ago

      I'm using a Jupyter Notebook, but the code would work in any Python environment or script jupyter.org/

    • @IanTindale
      @IanTindale · 21 days ago

      @@learndatawithmark ah thanks, that’s interesting - I’ve never heard of that

  • @laurogastonmolina5093
    @laurogastonmolina5093 · 22 days ago

    I have an array with the data 1.500 800.00. How do I convert them into integers? Can you help me? I can't manage to do it with .replace

    • @learndatawithmark
      @learndatawithmark · 20 days ago

      values = [1.500, 800.00]
      [int(value) for value in values]

  • @rubencontesti221
    @rubencontesti221 · 23 days ago

    Another great video Mark! Wish you could share some good options for local hosting large models like Llama 3 70B quantized to 4 bits. I'm curious about the cheapest ways to host these models with my own server. Thank you!

    • @learndatawithmark
      @learndatawithmark · 15 days ago

      If it's fully blown models, I think Hugging Face's inference server is best - github.com/huggingface/text-generation-inference If it's quantized models, llamafile are doing some cool work to make a super fast server - github.com/Mozilla-Ocho/llamafile

  • @Sendero-yp5gi
    @Sendero-yp5gi · 24 days ago

    What is the difference w.r.t. using the classical:
    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    Thanks in advance!

    • @learndatawithmark
      @learndatawithmark · 15 days ago

      I think it's the same thing under the hood - no need to change from your approach!

  • @Sendero-yp5gi
    @Sendero-yp5gi · 24 days ago

    Nice video Mark! Could you do one where you show a quick example using the Python binding llama-cpp-python? Thanks!!

    • @learndatawithmark
      @learndatawithmark · 15 days ago

      Latest video inspired by your comment :D Here we go - ruclips.net/video/gigip1Pxf88/видео.html

    • @Sendero-yp5gi
      @Sendero-yp5gi · 15 days ago

      @@learndatawithmark Amazing, I am also implementing local RAG with llama-cpp-python, so the video is completely spot on ahah

  • @saramirabi1485
    @saramirabi1485 · 24 days ago

    Hello, thanks for the great videos. I've been browsing your channel for several hours now. Just a question: is it possible to use Ollama and do fine-tuning with it?

  • @ingenieroriquelmecagardomo4067
    @ingenieroriquelmecagardomo4067 · 26 days ago

    Good video, Mark.

  • @patriciubogatu7663
    @patriciubogatu7663 · 26 days ago

    Good day. I was wondering whether this technique would be enough to make the GPT-4 LLM solve Arduino circuit projections on the breadboard, for the most basic problems. I want the LLM to be able to pinpoint the exact hole locations on the breadboard where the pins/wires should be placed. I tried just explaining, but it still commits trivial mistakes. I am seeking to make it avoid mistakes by remembering the correct answers to past problems. Is RAG enough, or do I need to fine-tune or create a new LLM? Regards.

  • @rmackay9
    @rmackay9 · 28 days ago

    I just want to say that the past couple of videos on function calling have been very valuable to me (and I'm sure others). Thanks very much!

  • @mo5168
    @mo5168 · 29 days ago

    It would be great if you could mention the CPU and GPU core counts of your Mac. I am trying to decide which Mac to buy :-)

    • @learndatawithmark
      @learndatawithmark · 25 days ago

      I'm using a Mac M1 Max (64 GB RAM) with 10 CPUs. It shares the RAM between CPU and GPU.

    • @mo5168
      @mo5168 · 25 days ago

      @@learndatawithmark What about the GPU core count? A 10-core CPU M1 Max can be configured with 16, 24, or 32 GPU cores. I believe the GPU is the most important part for AI applications... BTW, I went ahead and bought a *specced out* 14 inch M3 Max with 2 TB SSD.

    • @learndatawithmark
      @learndatawithmark · 25 days ago

      @@mo5168 32 apparently!
      Chipset Model: Apple M1 Max
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 32
      Vendor: Apple (0x106b)
      Metal Support: Metal 3

    • @mo5168
      @mo5168 · 25 days ago

      @@learndatawithmark Thank you!

  • @static_frostBRK
    @static_frostBRK · 1 month ago

    Hello there Mark, I was wondering if I could use this method to download other AI models, for example text-to-image models?

    • @learndatawithmark
      @learndatawithmark · 25 days ago

      Yes you should be able to use a similar approach. There's a good guide on image to text over here - huggingface.co/tasks/image-text-to-text

  • @AdandKidda
    @AdandKidda · 1 month ago

    Great comparison. I have used Gemini Pro Vision and am looking for a similar open-source solution. Can you please suggest something for "extracting data as key-value pairs from documents (images and PDFs) like invoices, forms, IDs"? A light model would help, so that it can run with less memory. Thanks in advance. :)

    • @learndatawithmark
      @learndatawithmark · 25 days ago

      I think to use a smaller model for a specific task we might want to fine tune something ourself with a bunch of images and the expected output. I read that PaliGemma is supposed to be a good base image for that, but I haven't tried fine tuning it myself - huggingface.co/google/paligemma-3b-mix-448

  • @OlenaKutsenko
    @OlenaKutsenko · 1 month ago

    So cool!

  • @user-rp2vv5oq6e
    @user-rp2vv5oq6e · 1 month ago

    Thanks for discovering this

  • @fancypnatz
    @fancypnatz · 1 month ago

    Is there some limitation with ollama that prevents good function calling?

    • @learndatawithmark
      @learndatawithmark · 1 month ago

      At the moment they don't seem to support the function calling API but I imagine they will at some point. AFAIK Ollama uses llama.cpp for a lot of stuff, so they should be able to use this feature too

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w · 1 month ago

    Which of the many do you recommend? LangGraph?

    • @learndatawithmark
      @learndatawithmark · 1 month ago

      I've played with Langroid and CrewAI and neither worked that well. I haven't tried AutoGen or LangGraph yet, but I guess I should get on it!

  • @AlexX-xtimes
    @AlexX-xtimes · 1 month ago

    The best simple explanation I have found about Function Calling. Thanks for making it so easy to understand!

  • @vladimirnicolescu1342
    @vladimirnicolescu1342 · 1 month ago

    If I have multiple functions, how does the "for tool_call in tool_calls" loop work with different functions? The function_response parameter has the latitude and longitude hardcoded as arguments. What if my other functions deal with other stuff and don't require lat and long as arguments, but other arguments? I'm really confused by that bit. If I have more functions, do I just add a "match case" check to pass the correct arguments for each function?

    • @learndatawithmark
      @learndatawithmark · 1 month ago

      Yeh this is a bit hardcoded for a single function. The arguments are in that function_args variable, so you could pass them in using the kwargs syntax. Maybe I'll make another video to show how to do that.
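
A rough sketch of how that dispatch could look with more than one function, reading the arguments the model returned instead of hardcoding them (the functions and the mocked tool_calls below are hypothetical stand-ins):

```python
import json
from types import SimpleNamespace

# Hypothetical local functions the model is allowed to call
def get_current_weather(latitude: float, longitude: float) -> str:
    return f"18C at ({latitude}, {longitude})"

def get_stock_price(ticker: str) -> str:
    return f"{ticker}: 123.45"

available_functions = {
    "get_current_weather": get_current_weather,
    "get_stock_price": get_stock_price,
}

# Stand-in for response.choices[0].message.tool_calls from the chat completion response
tool_calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_stock_price", arguments='{"ticker": "MSFT"}')),
]
messages = []

for tool_call in tool_calls:
    # Look up the right local function by the name the model chose
    function_to_call = available_functions[tool_call.function.name]
    # Arguments arrive as a JSON string; parse them and splat them as keyword args
    function_args = json.loads(tool_call.function.arguments)
    function_response = function_to_call(**function_args)
    messages.append({"role": "tool", "tool_call_id": tool_call.id,
                     "content": function_response})

print(messages)
```
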