mildlyoverfitted
  • 27 videos
  • 258,587 views
BentoML SageMaker deployment
In this video, we are going to discuss the basics of BentoML and then go through a hands-on example of taking a scikit-learn model and deploying it on SageMaker with the help of BentoML. A minimal sketch of the service definition is shown after the chapter list below.
The code + sketches from the video can be found here: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml
00:00 Intro
00:52 [diagram] Ideas behind BentoML
03:07 [diagram] Step by step procedure
03:21 [code] Creating a model
06:50 [code] Creating a bento - service.py
14:31 [code] Creating a bento - bentofile.yaml
16:53 [code] bentoctl init
19:34 [code] Inspecting terraform files
21:10 [code] Containerization + pushing to ECR
23:15 [code] Deployment via terraform
25:13 [code] Sending request and run...
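As a companion to the "Creating a bento - service.py" chapter, here is a minimal sketch of what such a service definition can look like with the BentoML 1.x Python API; the model tag "sklearn_model:latest" and the service name are placeholder assumptions, not names from the video:

```python
# A hedged sketch of service.py for a scikit-learn model (BentoML 1.x).
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Load the saved model from the local BentoML store and wrap it in a
# runner, which handles scheduling/batching of inference.
model_ref = bentoml.sklearn.get("sklearn_model:latest")
runner = model_ref.to_runner()

svc = bentoml.Service("sklearn_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Delegate inference to the runner.
    return runner.predict.run(input_array)
```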
Views: 1,357

Videos

Retrieval augmented generation with OpenSearch and reranking
4.5K views · 1 year ago
In this video, we are going to be using OpenSearch and Cohere's Reranker endpoint to implement a minimal Retrieval augmented generation system that is able to perform question answering. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/rag-rerank/mini_tutorials/rag_with_reranking Cohere blogpost: txt.cohere.com/rerank/ 00:00 Intro 00:52 RAG with embeddings (semantic search) 03:16 ...
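For orientation, a minimal sketch of the retrieve-then-rerank flow described above; the index name, field name, and rerank model are assumptions, not taken from the video:

```python
# A hedged sketch: BM25 retrieval from OpenSearch, then Cohere reranking.
import cohere
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
co = cohere.Client("YOUR_COHERE_API_KEY")

query = "What is product quantization?"

# 1) Lexical retrieval: pull candidate passages with a simple match query.
hits = client.search(
    index="documents",
    body={"query": {"match": {"text": query}}, "size": 25},
)["hits"]["hits"]
candidates = [hit["_source"]["text"] for hit in hits]

# 2) Rerank the candidates and keep the best few as context for the LLM.
reranked = co.rerank(
    query=query, documents=candidates, top_n=3, model="rerank-english-v2.0"
)
context = [candidates[r.index] for r in reranked.results]
```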
Named entity recognition (NER) model evaluation
2.6K views · 1 year ago
In this video we are going to talk about different ways one can evaluate an NER (named entity recognition) model. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation github.com/chakki-works/seqeval 00:00 Intro 00:31 Mispredictions 02:31 IOB2 notation 04:03 Evaluation approaches 07:38 [code] HF evaluate seqeval 14:36 [code] Entity-level fro...
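As a taste of the seqeval part, a tiny entity-level evaluation sketch (the label sequences are made up for illustration):

```python
# Entity-level evaluation with seqeval on IOB2-tagged sequences.
from seqeval.metrics import classification_report, f1_score

y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

# seqeval groups IOB2 tags into entities and only credits exact matches.
print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```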
Asynchronous requests and rate limiting (HTTPX and asyncio.Semaphore)
2.6K views · 1 year ago
Today we are going to talk about how to use HTTPX to send requests asynchronously, and also how to perform rate limiting. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/httpx_rate_limiting/ 00:00 Intro 01:15 [Code] Implement async requests WITHOUT rate limiting 07:20 [Code] Trying it out 08:48 [Code] Implement async requests WITH rate lim...
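A minimal sketch of the idea, capping concurrency with asyncio.Semaphore (the URL and the limit of 5 are placeholders); note that, as a commenter points out further down, this is strictly a concurrency limit rather than a per-second rate limit:

```python
# Concurrency limiting with httpx + asyncio.Semaphore.
import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> int:
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        response = await client.get(url)
        return response.status_code

async def main() -> None:
    sem = asyncio.Semaphore(5)  # at most 5 concurrent requests
    async with httpx.AsyncClient() as client:
        urls = ["https://httpbin.org/get"] * 20
        print(await asyncio.gather(*(fetch(client, sem, u) for u in urls)))

asyncio.run(main())
```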
Few-shot text classification with prompts
3.8K views · 1 year ago
In this video, I will talk about a possible way to perform few-shot text classification using prompt engineering and the OpenAI API. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification Inspiration for the video: github.com/explosion/prodigy-openai-recipes/tree/main Chat Completion API from OpenAI: platform.openai.com/docs/guides/g...
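A hedged sketch of the prompt-based few-shot idea, written against the current OpenAI Python SDK (the video may use the older openai.ChatCompletion interface; the labels, examples, and model name are placeholders):

```python
# Few-shot classification via a prompt with in-context examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Classify the text as POSITIVE or NEGATIVE.\n\n"
    "Text: I loved this movie.\nLabel: POSITIVE\n\n"
    "Text: Terrible service, never again.\nLabel: NEGATIVE\n\n"
    "Text: The food was amazing.\nLabel:"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=2,   # the label is a single short token
    temperature=0,  # keep the output deterministic-ish
)
print(response.choices[0].message.content.strip())  # e.g. "POSITIVE"
```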
OpenAI function calling
2.9K views · 1 year ago
In this video we will go through the new feature "Function calling" of the OpenAI API (see more info here: openai.com/blog/function-calling-and-other-api-updates). First, I talk about the concepts and then I code up a small example where we implement a "financial analyst" bot. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/openai_function_calling/example.py...
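A minimal sketch of the function-calling flow; the get_stock_price schema is a made-up example in the spirit of the "financial analyst" bot, not the video's exact code, and it uses the current OpenAI Python SDK:

```python
# The model can ask us to call a declared function instead of answering.
import json
from openai import OpenAI

client = OpenAI()

functions = [{
    "name": "get_stock_price",
    "description": "Get the latest price for a stock ticker.",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is AAPL trading at?"}],
    functions=functions,
)

call = response.choices[0].message.function_call
if call is not None:
    # e.g. get_stock_price {'ticker': 'AAPL'}
    print(call.name, json.loads(call.arguments))
```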
Deploying machine learning models on Kubernetes
18K views · 1 year ago
In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model on the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it on a Kubernetes cluster. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials...
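To illustrate the first step (turning the fill-mask model into a REST API), a small FastAPI sketch; the video itself may wire the service up differently, and the endpoint name is an assumption:

```python
# app.py - a fill-mask model behind a tiny REST endpoint.
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

@app.get("/predict")
def predict(text: str):
    # e.g. GET /predict?text=My name is [MASK].
    return fill_mask(text)
```

Serve it locally with `uvicorn app:app`, bake it into a Docker image, and the Kubernetes side is then a standard Deployment + Service pointing at that image.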
Haiku basics (neural network library from DeepMind)
3.4K views · 2 years ago
In this video, we will go through the basic concepts of Haiku, a deep learning library created by DeepMind. Official repo: github.com/deepmind/dm-haiku Official docs: dm-haiku.readthedocs.io/en/latest/ Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics Chapters: 00:00 Intro 00:35 Cloning the repo and setting things up 01:52 Parameters: hk.transform...
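The core Haiku concept is hk.transform, which turns module-using code into pure init/apply functions; a tiny sketch (shapes and the MLP are arbitrary):

```python
# hk.transform turns module-using code into pure init/apply functions.
import haiku as hk
import jax
import jax.numpy as jnp

def forward(x):
    # Haiku modules may only be created inside a transformed function.
    mlp = hk.nets.MLP([32, 1])
    return mlp(x)

forward_t = hk.transform(forward)

rng = jax.random.PRNGKey(42)
x = jnp.ones((8, 4))
params = forward_t.init(rng, x)        # returns the parameter pytree
out = forward_t.apply(params, rng, x)  # pure forward pass
print(out.shape)  # (8, 1)
```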
Product quantization in Faiss and from scratch
6K views · 2 years ago
In this video, we talk about a vector compression technique called Product quantization. We first explain conceptually what the main ideas are and then show how one can use an existing implementation of it from Faiss (IndexPQ). Finally, we also implement the algorithm from scratch. Last but not least, we run some experiments and compare different methods. Paper: lear.inrialpes.fr/pubs/2011/JDS...
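A minimal IndexPQ sketch in Faiss; the dimensions and PQ settings are illustrative, not the video's exact experiment:

```python
# IndexPQ compresses each vector into m * nbits bits.
import faiss
import numpy as np

d, m, nbits = 64, 8, 8  # dim, subquantizers (d % m == 0), bits each

xb = np.random.rand(10_000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")       # query vectors

index = faiss.IndexPQ(d, m, nbits)
index.train(xb)   # learn the per-subspace codebooks
index.add(xb)     # only the compressed codes are stored

distances, ids = index.search(xq, 5)  # approximate 5-NN per query
print(ids)
```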
GPT in PyTorch
11K views · 2 years ago
In this video, we are going to implement the GPT2 model from scratch. We are only going to focus on the inference and not on the training logic. We will cover concepts like self-attention, decoder blocks, and generating new tokens. Paper: openai.com/blog/better-language-models/ Code minGPT: github.com/karpathy/minGPT Code transformers: github.com/huggingface/transformers/blob/0f69b924fbda6a442d7...
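The heart of a GPT2 decoder block is causal self-attention; a compact single-head sketch (the full model uses multiple heads, layer norm, and MLPs):

```python
# Single-head causal self-attention on random data.
import torch
import torch.nn.functional as F

B, T, C = 2, 10, 64  # batch, sequence length, embedding dim
x = torch.randn(B, T, C)

qkv = torch.nn.Linear(C, 3 * C)
q, k, v = qkv(x).chunk(3, dim=-1)

att = (q @ k.transpose(-2, -1)) / (C ** 0.5)         # (B, T, T) scores
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
att = att.masked_fill(~mask, float("-inf"))          # no peeking ahead
out = F.softmax(att, dim=-1) @ v                     # (B, T, C)
print(out.shape)
```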
The Lottery Ticket Hypothesis and pruning in PyTorch
8K views · 2 years ago
In this video, we are going to explain how one can do pruning in PyTorch. We will then use this knowledge to implement a paper called "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". The paper states that feedforward neural networks have subnetworks (winning tickets) inside of them that perform as well as (or even better than) the original network. It also proposes a ...
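A small sketch of the PyTorch pruning utilities this kind of experiment builds on (layer sizes are arbitrary):

```python
# torch.nn.utils.prune in a nutshell.
import torch.nn as nn
import torch.nn.utils.prune as prune

linear = nn.Linear(100, 10)

# Zero out the 30% smallest-magnitude weights; this attaches weight_orig
# and weight_mask and recomputes .weight on the fly.
prune.l1_unstructured(linear, name="weight", amount=0.3)
print(float(linear.weight_mask.mean()))  # ~0.7 of the weights survive

prune.remove(linear, "weight")  # make the pruning permanent
```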
The Sensory Neuron as a Transformer in PyTorch
3K views · 2 years ago
In this video, we implement a paper called "The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning" in PyTorch. It proposes a permutation invariant module called the Attention Neuron. Its goal is to independently process local information from the features and then combine the local knowledge into a global picture. Paper: arxiv.org/abs/2109.02869 O...
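The key property is permutation invariance: attention pooling with a fixed query gives the same output however the inputs are ordered. A toy sketch of just that idea (not the paper's full Attention Neuron):

```python
# Attention pooling with a fixed query is permutation invariant.
import torch

N, d = 5, 8                   # number of sensory inputs, feature dim
features = torch.randn(N, d)
query = torch.randn(1, d)     # a learnable query in the real model

def pool(f):
    att = torch.softmax(query @ f.T / d ** 0.5, dim=-1)
    return att @ f            # (1, d) summary of all inputs

perm = torch.randperm(N)
print(torch.allclose(pool(features), pool(features[perm])))  # True
```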
Integer embeddings in PyTorch
2.3K views · 3 years ago
In this video, we implement a paper called "Learning Mathematical Properties of Integers". Most notably, we use an LSTM network and an encyclopedia of integer sequences to train custom integer embeddings. At the same time, we also extract integer embeddings from already pretrained models - BERT and GloVe. We then compare how good these embeddings are at encoding mathematical properties of intege...
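A tiny sketch of the extraction side, pulling an integer's embedding out of a pretrained BERT; tokenization details are glossed over (multi-token integers need more care):

```python
# Reading the static input embedding of a (single-token) integer.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

emb_matrix = model.get_input_embeddings().weight  # (vocab_size, hidden)
token_id = tokenizer.convert_tokens_to_ids("7")   # "7" is one token here
print(emb_matrix[token_id].shape)                 # torch.Size([768])
```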
PonderNet in PyTorch
2.3K views · 3 years ago
In this video, we implement the PonderNet that was proposed in the paper "PonderNet: Learning to Ponder". It is a network that dynamically decides on the size of its forward pass. We are going to implement it and experiment with it a little bit on the so-called ParityDataset. Note that the implementation is based on the labml.ai implementation (see link below). I made some modification though s...
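A toy sketch of the pondering idea: at each step the network emits a halting probability and stops stochastically (the shapes and the GRU cell are arbitrary choices, not the paper's exact architecture):

```python
# A toy pondering loop: emit a halting probability at every step.
import torch

max_steps = 10
cell = torch.nn.GRUCell(16, 16)
halt_head = torch.nn.Linear(16, 1)

x = torch.randn(1, 16)        # the (encoded) input, e.g. a parity vector
h = torch.zeros(1, 16)        # recurrent state

for step in range(max_steps):
    h = cell(x, h)
    lam = torch.sigmoid(halt_head(h))  # probability of halting now
    if torch.bernoulli(lam).item() == 1:
        break
print(f"halted after {step + 1} step(s)")
```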
Mixup in PyTorch
3.3K views · 3 years ago
In this video, we implement the (input) mixup and manifold mixup. They are regularization techniques proposed in the papers "mixup: Beyond Empirical Risk Minimization" and "Manifold Mixup: Better Representations by Interpolating Hidden States". We investigate how these two schemes compare against more mainstream regularization methods like dropout and weight decay. Paper (Input mixup): arxiv.or...
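A minimal input-mixup sketch; the batch contents and alpha are illustrative. Note it mixes one-hot labels directly, which for cross-entropy is equivalent to mixing the two per-example losses (a question that comes up in the comments below):

```python
# Input mixup: convex-combine both inputs and (one-hot) targets.
import torch
import torch.nn.functional as F

def mixup(x, y_onehot, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))   # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

x = torch.randn(32, 3, 32, 32)
y = F.one_hot(torch.randint(0, 10, (32,)), num_classes=10).float()
x_mix, y_mix = mixup(x, y)
print(x_mix.shape, y_mix.shape)
```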

Comments

  • @PriyaDas-he4te · 1 month ago

    Can we use this code for change detection in two satellite images?

  • @tirthasg · 1 month ago

    What font and color theme are you using? Looks really nice!

  • @bpac90 · 1 month ago

    Excellent!! I'm curious why my search always shows garbage and videos like this never come up. This was suggested by Gemini when I asked a question about ML model deployment.

  • @SunilSamson-w2l · 2 months ago

    The reason you got ". , ?" as the output for [MASK] is that you didn't end your input request with a full stop. BERT masking models should be prompted that way: "my name is [MASK]." should have been your request.

  • @JorgeGarcia-eg5ps · 3 months ago

    Thank you for sharing this. I was actually looking for results of DINO on smaller compute/data, so this is very helpful.

  • @krishsharma4507 · 3 months ago

    It's printing "Original prediction: 293". How can I check the values or names of this predicted class?

  • @Saevires · 3 months ago

    I am using custom tags, such as InvoiceNumber and GrossTotal. To work on the entity level, does seqeval need tags in the B-/I- format?

  • @Huawei_Jiang · 4 months ago

    Hello, thank you for your video. It helped me a lot. However, I have one question about your code. In the original mixup, which is from the link you provided, the author mixed the loss function instead of mixing the labels. But I noticed you mixed the labels. Could you please explain the reason for this difference? Looking forward to your reply.

  • @shivendrasingh9759 · 4 months ago

    Really helpful as a foundation for MLOps.

  • @larrymckuydee5058 · 4 months ago

    Is this method good if we want to search for a list of products rather than a chat-like response?

    • @mildlyoverfitted · 4 months ago

      Sure:) If you have text descriptions of the products then Elasticsearch/Opensearch + reranking is definitely a great option:)

  • @Larmbs · 4 months ago

    You are incredible, man. - You go at a good pace. - Each project feels well planned. - Nice formatting style. - Good explanations. I've just started really digging into this machine learning space; any recommendations on learning all the different layer types and problem types?

    • @mildlyoverfitted · 4 months ago

      Thanks a ton! ML has changed quite a lot over the past few years. I guess one architecture you should be familiar with nowadays is the transformer:) But I guess you have heard about it by now:D Good luck with your learning!

  • @mmacasual- · 4 months ago

    Great example. Thanks for the information

  • @lucianobatista6295 · 5 months ago

    Hi man, do you offer some training or mentorship?

  • @paolobarba1782 · 5 months ago

    What to do if you want the encoding made by OpenSearch directly?

  • @akk2766 · 5 months ago

    I concur with what everyone is saying - best video on function calling for sure. I really like the laid-back nature of the tutorial - seriously simplifying function calling, even for the uninitiated! Only one suggestion: please move the inset video to the top right so the output can be seen in its entirety. Obviously not for this video, but for future awesome videos you produce.

    • @mildlyoverfitted · 5 months ago

      Glad it was helpful! And thank you for the constructive feedback:)

  • @Munk-tt6tz · 5 months ago

    This is the best video on this topic. Thank you!

  • @swk9015 · 5 months ago

    What's the font you use?

    • @mildlyoverfitted · 5 months ago

      Not sure. I am using this vim theme: github.com/morhetz/gruvbox, so maybe you can find it somewhere in their repo.

  • @mmazher5826 · 5 months ago

    Is there any way of re-running SSL on a pretrained DINO?

  • @danielasefa8087 · 5 months ago

    Thank you so much for helping me to understand ViT!! Great work

  • @PrafulKava · 5 months ago

    Great video! Good explanation. Thanks for all your efforts in making a detailed video along with code!

  • @leeuw6481 · 6 months ago

    wow, this is dangerous xd

  • @prajyotmane9067 · 6 months ago

    Where did you include positional encoding? Or is it not needed when using convolutions for patching and embedding?

  • @neiro314 · 6 months ago

    Great video; as a student, thank you so much! I will say a few lines didn't feel very well explained, though I'm sure to someone with a bit more knowledge than me it would be clearer. Overall 10/10, thanks so much!

    • @mildlyoverfitted · 5 months ago

      Great point actually:) Appreciate your feedback:)

  • @КириллКлимушин · 6 months ago

    I'm a huge fan of implementing algorithms from scratch myself and watched this video with great pleasure. Thanks for your work; it deserves more attention.

  • @danieltello8016 · 6 months ago

    Great video. Can I run the code on a Mac with an M1 chip as-is?

  • @iamragulsurya · 7 months ago

    Name of the font?

    • @mildlyoverfitted · 7 months ago

      So the theme I am using is here: github.com/morhetz/gruvbox. The README talks about the fonts, I believe.

  • @navins2246 · 7 months ago

    Doing ML in vim is absolutely gigachad

  • @harrisnisar5345 · 7 months ago

    Amazing video. Just curious, what keyboard are you using?

    • @mildlyoverfitted · 7 months ago

      Glad you enjoyed it! Logitech MX Keys S

  • @jeffg4686 · 7 months ago

    "mildly overfitted" is how I like to keep my underwear so I don't get the hyena.

  • @davidpratr · 7 months ago

    Really nice video. Would you see any benefit to running the deployment on a single node with an M1 chip? I'd say somehow yes, because an inference might not take all the CPU of the M1 chip, but what about scaling the model in terms of RAM? One of those models might take 4-7 GB of RAM, which makes up to 21 GB of RAM for only 3 pods. What's your opinion on that?

    • @mildlyoverfitted · 7 months ago

      Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.

    • @davidpratr · 7 months ago

      @mildlyoverfitted Got it. So very likely more requests could be served at the same time, but with very limited scalability and probably some performance loss. By the way, what are those fancy combos in the terminal? Is it tmux?

    • @mildlyoverfitted · 7 months ago

      @davidpratr Interesting:) Yes, it is tmux:)

  • @woutderijck5389 · 8 months ago

    When starting out, would you recommend just using embeddings and vector search, or should you also consider the hybrid case of OpenSearch & vector search? In the video it looks like you should go all-in on vector search.

    • @mildlyoverfitted · 7 months ago

      I would recommend just doing Opensearch + reranking. No embeddings (=vector search). Assuming you wanna have something minimal really quickly as demonstrated in the video:)

  • @Ldmp807 · 8 months ago

    Isn't this a concurrency limit, not a rate limit (i.e., a limit per second)?

    • @mildlyoverfitted · 8 months ago

      I think you are right:) The video title is definitely misleading. Sorry about that!

  • @vidinvijay · 8 months ago

    novelty explained in just over 6 minutes. 🙇

  • @kascesar · 8 months ago

    Hi, I'm getting this error: "'sagemaker_service:svc' is not found in BentoML store <osfs '/home/bentoml/bentos'>, you may need to run `bentoml models pull` first". Any idea? Thanks a lot.

    • @mildlyoverfitted · 8 months ago

      Hmmm, if the problem still persists you can create an issue here: github.com/jankrepl/mildlyoverfitted/issues describing exactly what you did, and I can try to help!

    • @kascesar · 8 months ago

      @mildlyoverfitted Solved it. The problem came from the bentoml version; installing bentoml==1.1.11 solved the problem for me.

  • @yuricastro522 · 8 months ago

    Thank you so much, your example helped me to solve some problems :)

  • @macx7760 · 8 months ago

    Why is the 2nd dim of the MLP input shape n_patches + 1? Isn't the MLP just applied to the class token?

    • @mildlyoverfitted · 8 months ago

      So the `MLP` module is used inside of the Transformer block and it takes a 3D tensor as input. See this link for the only place where the CLS is explicitly extracted: github.com/jankrepl/mildlyoverfitted/blob/22f0ecc67cef14267ee91ff2e4df6bf9f6d65bc2/github_adventures/vision_transformer/custom.py#L423-L424 Hope that helps:)

    • @macx7760 · 8 months ago

      @mildlyoverfitted Thanks, yeah, I confused the MLP inside the block with the MLP at the end for classification.

  • @macx7760 · 8 months ago

    Fantastic video, just a quick note: at 16:01 you say that "none of the operations are changing the shape of the tensor", but isn't this wrong? When applying fc2, the last dim should be out_features, not hidden_features, so the shapes are also wrongly commented.

    • @mildlyoverfitted · 8 months ago

      Nice find and sorry for the mistake:)! Somebody already pointed it out a while ago:) Look at the pinned errata comment:)

    • @macx7760 · 8 months ago

      @mildlyoverfitted Ah I see, my bad :D

  • @TwenTV · 8 months ago

    Which frameworks would you recommend if you had to scale to 1000+ models? I am looking at custom FastAPI and MLflow with AWS Lambda, where each inference request loads the model from object storage and calls .predict. The models are generally lightweight and predictions only have to be made on an hourly basis, so I don't think it's necessary to serve them in memory.

    • @mildlyoverfitted · 8 months ago

      If you are not experiencing a cold start (or you don't care) then Lambda is definitely a great solution:)

  • @noedie4973 · 8 months ago

    Thanks for the nice video explanation! Could you please tell me what modifications I can make to get the output in a certain format? Say I want it to output only the label value with no other text?

    • @mildlyoverfitted · 8 months ago

      Thank you! The current template should lead to you only getting the label. However, feel free to prompt engineer it if you are not getting the expected result. You can also request it to give you a valid JSON which you can then easily parse:) Just an idea. Hope that helps:)

    • @noedie4973 · 8 months ago

      @mildlyoverfitted Thanks, it really helped me a lot. I achieved perfect results by restricting my response token limit, so it focuses on outputting the digit label (in flexible forms), from which I can extract it using a simple regex. The JSON method seems very clean too.

  • @idoronen9497 · 8 months ago

    Thank you for the video! I have a question: if I need to make updates to an existing service, do I have to go through the entire process again, or is there a more efficient way? bentoctl build seems quite time-consuming. Appreciate your help!

    • @mildlyoverfitted · 8 months ago

      Appreciate your comment! If the change is inside of your ML model or the serving logic (service.py) you will have to rebuild the image. However, the second time around some layers should be cached (docs.docker.com/build/guide/layers/), so in theory it should be faster (it depends though). Another thing you can do is build the image in some virtual machine rather than locally. A common setup is that you build it + upload it to ECR in your CI (e.g. GitHub Actions). Just some ideas:)

  • @Lithdren · 8 months ago

    Is there a method you can use to rate limit by time? I'm interacting with an API that limits me to no more than 20 requests a minute, and I've been struggling with a way to handle that. Right now I keep track of the time of the last call, and if I made a request within the last 3 seconds I wait until 3 seconds have passed, then send out the next request. I have multiple API keys I can utilize, and each key has a set limit, so I cycle through them, but it feels like there must be a faster way.

    • @mildlyoverfitted · 8 months ago

      One alternative solution is to use some open source package (e.g. github.com/florimondmanca/aiometer ). I don't really know much about it but maybe it can help:)

  • @gunabalang9543 · 9 months ago

    What keyboard are you using?

  • @aditya_01 · 9 months ago

    Great video, thanks a lot, really liked the explanation!!!

  • @nandakishorejoshi3487 · 9 months ago

    Great video. How do I run a text generation model? I tried running a GPT2 model with the below commands.
    Creating the API: transformers-cli serve --task=text-generation --model=gpt2
    Calling the API: curl -X POST localhost:8888/forward -H "accept: application/json" -H "Content-Type: application/json" -d '{"inputs":"What is Deep Learning","parameters":{"max_new_tokens":20}}'
    But I get an error in the response: {"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}

  • @theAhmd · 10 months ago

    Terminal and theme name, please?

  • @kyrylogorbachov3779 · 10 months ago

    Thanks a lot for the content!

  • @thinkman2137 · 10 months ago

    Thank you for the detailed tutorial!

    • @thinkman2137 · 10 months ago

      But TorchServe now has Kubernetes integration.

    • @mildlyoverfitted · 10 months ago

      I will definitely look into it:) Thank you for pointing it out!!

  • @mkamp · 10 months ago

    Using Vim, tmux, and an audible keyboard never gets old!

  • @diegosabajo2182 · 10 months ago

    Thanks for the video, man. There aren't many resources on BentoML, so I appreciate your contribution. Can you please add more in the future?

    • @mildlyoverfitted · 10 months ago

      Appreciate your message:) Thank you! I will very likely do more BentoML related stuff in the future:)

  • @faizasetif1103 · 10 months ago

    Is this code for classifying images?!

    • @mildlyoverfitted · 10 months ago

      Not sure what you mean, but DINO is a self-supervised algorithm:) Not a supervised one (e.g. classification)

    • @faizasetif1103 · 10 months ago

      @mildlyoverfitted I want to use DINO for a classification task. How?!