Using Llama Coder As Your AI Assistant
- Published: Nov 27, 2024
- In a recent video some folks asked which languages these AI assistants support. To answer that, I have to explain how they work. And this video does that.
My Links 🔗
👉🏻 Subscribe (free): / technovangelist
👉🏻 Join and Support: / @technovangelist
👉🏻 Newsletter: technovangelis...
👉🏻 Twitter: / technovangelist
👉🏻 Discord: / discord
👉🏻 Patreon: / technovangelist
👉🏻 Instagram: / technovangelist
👉🏻 Threads: www.threads.ne...
👉🏻 LinkedIn: / technovangelist
👉🏻 All Source Code: github.com/tec...
Want to sponsor this channel? Let me know what your plans are here: technovangelis...
This is by far the best video I have seen for self-hosted AI code assistants! Great, understandable explanation and very important details! Thank you!
Thanks so much for the comment. It is interesting that others are getting a huge number of views by saying things that are just flat out not true...but I need to not compare. Thanks for the watch.
Agreed.
Those other channels are likely buying YT recommendations, because I’ve crashed into all of the others before finding your channel.
Love the clarity and production of your videos. Thank you Matt!
Thanks so much for the comment. They are fun to make so glad folks appreciate them. The mic was a little distorted a few times on this one. Always something to improve.
I haven't really followed this kind of stuff but your explanation is very clear and I am now running a local instance of code assistance AI. You deserve more subscribers.
Great, tell everyone you know.
I just discovered self hosted ai and I’m going through all of your videos. Thank you so much for providing this very useful content.
Particularly liked the part where you show the prompt being passed visible in the logs
Dude, you're awesome: easy to understand, and love the subtle bits of humor. Subbed
Just subscribed. Love how your videos stay on point and how your explanations are easy, taking into account all our (assuming) varying levels of knowledge and experience in using these tools.
I like that you are not chasing clicks. Keep up the good work.
This is a much better way to run models compared to LM Studio. Thanks for the in-depth guide to replacing Copilot!
Llama is good at explaining documentation too
This was great. Clarified some things for me and it's great to be able to get at log messages and see the prompts, etc.
Love the way you do these videos. Thank you very much for creating such niche content for us
“What’s Idris?” Ah yes, a language where you can define the types and the compiler writes the internals of the function. It’s like Haskell, but even more elite
Great video, Matt!
Thanks. It’s come a long way since those first short conversations with you and introducing it on your Discord back in July.
Just wonderful. Much thanks from South Korea.
Incredible ! thank you Matt !
Can you show which tools or process can be used to train or give our context to a model? would be great to tell the model `learn from this documentation` or `seek in my previous projects`
I'm not one of the people *_really_* into AI, but I have really got to genuinely compliment this video.
It's not overly energetic, it's fairly paced, it critiques many over-simplifications without over-detailing either, and just presents a very simple and objective set of facts, all the while seemingly keeping the focus on "how do you self-host an effective assistant to help you in this limited-scope problem" rather than either preaching the divine virtues or condemning the infernal existence of AI.
I very rarely compliment videos, but this is one of the few videos that I think a grandma would understand *_and_* a computer science graduate would glean legitimately useful information from. It's easy to do either of those in isolation, but managing to do both is extremely impressive.
Edit: it's really minor, but even saying exact version numbers & mentioning Linux are those little details that make this video come off like it respects the viewer. A lot of things (videos included) hide information when simplifying or overload with information when not, but this video strikes the balance of saying everything it needs to, such that if you don't know it or it's not relevant you lose nothing, but if you do know it and think it *_is_* relevant you'll learn it. I can't quantify that balance, but I sure as hell can recognize it
Wow thank you so much for taking the time to leave such a detailed comment. I really do appreciate it and am grateful to already find viewers like you. Making these videos is so much fun and it has been wonderful to hear they are resonating with folks out there.
Subscribed and linked. Definitely an underrated video. I have yet to check out your other videos, but will do so too. You mentioned that you were using an M1-based laptop (assuming, due to the reference to battery life). Not sure if you sped up the inference part of the video, but if that is the native inference speed, it really is great. Is there any video where you've covered the specs of this laptop?
I have a M1 Max MacBook Pro with 64gb ram and 4tb disk. It’s great. I’ve had it for about 2.5 years. It was expensive at the time but I see it pop up on eBay for 1600 (usd) or so every now and then.
Good opinion, deeper understanding.
Really informative and great delivery! I see you're just starting your channel (5.71k subs at time of writing). Keep up the great work!
Scroll back and there is a video I posted about RightFax and Outlook from 15 years ago. It’s terrible. But I really started doubling down on the channel a month ago. Thanks for stopping by.
So good I watched it twice. Saved for later. Thanks for the peek under the hood!
I won't be hurt if you watch it a few times every day.... I promise
I really wish you talked about the hardware requirements for running the models locally.
The thumbnail says “better than copilot” - that’s the thing I want to know! I use copilot and ChatGPT daily in my work and it’s hard to find time to set up and maintain the best open-weights models.
So my question is, for popular languages (let’s say TypeScript and python), do the open-weights models match copilot and gpt4? I’m concerned with code quality.
I also often use the chat interfaces to just rubber duck, so that’s valuable too.
Thanks for your insight. I know there probably isn’t an unambiguous answer, but I’d love to hear your thoughts.
So far, my attempts to have it honor my request of // Generate a foreach loop from 1-100 gives me terrible results. Hardly even listens to me about the 1 - 100.
Depends. Sometimes I go round in circles with chatgpt never getting anywhere. And get a quick answer with local models. And sometimes it’s the other way around.
Interesting. First try and it gave me a working solution... basically created an array of 100 vals and then forEach'd through them. Not the way I would have done it, but it works.
@@technovangelist One thing is I just dont have the beefed up computer for it. Just an i7 with an Nvidia GTX 1060 6gb. Its too slow for even auto completion so I have to go back to copilot for now.
I patiently await the large language model that will understand interpretive dance with jazz hands
BEST FUCKING VIDEO OF WHOLE INTERNET
Best comment of my entire career.
@@technovangelist Hey there! We're looking to create a video that breaks down all the concepts of Language Models, focusing on the basics and tailored for those who might find it a bit challenging to grasp. We want explanations on everything, including fine-tuning models, making them understand various languages, and navigating platforms like Hugging Face. Your expertise in simplifying complex topics would be invaluable for those starting in this field. Could you create a video covering these aspects? Thanks a bunch!
Thank you for sharing this topic with us. In the language list I was honestly looking for React; however, that is JavaScript. Do these understand frameworks?
It tends to not know the newest stuff. But old stuff like react should be good
Hi Matt, Amazing content. Llama coder vs Tabby ML. Which one you think is better?
I haven’t tried tabby
I was waiting the last 20 seconds for you to say Subscribe
I sometimes say it, but usually not. I figure most folks know how to subscribe if they are interested
Hey, thank you for making this! You have a very easy to understand pace and style of speaking, and I think this video will come in handy for folks new to these concepts
If I may ask, would you consider doing some sort of periodic update on what the best models to use are? It is kind of hard to keep up with all the updates in this space, and I think people will appreciate comparing different open source models every 2 months or so, so that they know if it's worth switching to a newer one
Thanks for leaving a comment. That’s a great idea. Understanding what are the best models is a challenge. I'll add it to the list
Forth isn't in the list of languages. Darn! :)
Awesome video Matt!!
I didn't hear about real-world work and the performance of running it locally
GPT-4 has high ability to correct itself. Have you tested if Llama coder can be used as reliably as GPT4 to review and correct the code itself if it's incorrect?
I have never seen GPT-4 correct itself. I have to tell it "you are wrong" or "this is the error" or "I don't think so," and it comes up with a correction. I find the local models are often as good or better at that. And sometimes worse. Or are you saying Copilot does this, which isn't GPT but rather the Codex model?
@@technovangelist Sorry, I wasn't clear. I meant that under a 2-agent system, where one reviews and one codes, GPT-4 has a high enough "IQ" in general to be able to find errors and revise itself. That's what I meant. And thanks for the answer.
This looks really cool!
I like your vid more than those of that angry dude. Cheers.
We need a benchmark for energy consumption, response delay, and suggestion quality.
a benchmark of what? Pretty hard to compare this type of thing. For response time and quality, I'd say twinny/continue/llama coder are all on par with copilot. For different options to use it, copilot is still ahead. For privacy and security and offline use, copilot can't even compete.
This is the setup I've also discovered to be the best combination, although the results with Llama Coder have been poor, where the suggestions have been pretty bad. Might be because the template used was wrong or something, not sure. Also, Continue has embedding support which is nice, although no good results with that yet either. Supposedly you can ask more about your whole codebase because of it, but I suppose it needs some improvements there. It certainly can't help you refactor yet...
Still waiting for Llama coder to be better with the context and suggest proper code based on that like copilot does. Things like seeing consts used and repeat those for a switch statement or following the fmt.Printf style used earlier. Copilot is still better with this. Not sure if Llama coder needs embedding too for this to be better.
Yeah, knowing the whole code base is helpful sometimes. The only tool I have seen that does that is Cody from Sourcegraph. But that is not offline and it has to use their backend servers to do that despite what some recent yt videos have claimed. Quinn, the CEO, built the ollama integration mostly in a weekend a few weeks after ollama started back in July and it’s pretty amazing but you need to be connected to the internet to use it.
@@technovangelist Well, when it comes to code, the main idea with running all this locally is privacy. Although, sure, sending vectors based on your code over the internet isn't all that bad maybe, but still. It's more of a principle thing. And also a certain cool factor to have it all locally. And naturally, where you don't have internet, it still works.
This brings me to ollama and I must say that it's such a tremendously great project, so well done on that one! It's such a smart solution. It being so easy to get going and running on command line is so smart.
Thank you very much for sharing this news! Incredible!
Glad you like it!
Matt, I do not see COBOL (of any flavor), especially for PCs; e.g., MicroCOBOL and/or GnuCOBOL 😢😢😢
Which AI code assistant can handle GnuCOBOL?
Is LaTeX not considered a programming language?
Or is LaTeX considered native to all Ai entities?
My only suggestion is to try it. Try each model to see what supports it.
@@technovangelist it’s good to see Ada on that list. I had three RFCs adopted in the early 1980s and it was my primary language at NASA in the early STS project.
This is so amazing!!
I understand there are subscription based AI coding tools like Cody ($10/mo), CoPilot Pro ($20/mo), AWS Code Whisperer, etc..
But do the FREE tools (GPT-Pilot, gptengineer, CodiumAi , BlackBoxAI , BitoAi, CrewAI , Tabnine, Gemini) still require the GPT4 Plus subscription ($20/mo)? Or does it use the pay-per-use Assistant API pricing?
I don't know about the tools you mentioned, but the ones I have been covering use open source models and are also 100% free (open source does not mean free, but the two terms align here). But I don't think cost should be the determining factor of whether you use the tool. If it makes you 20% better every week and month, the cost is probably worth it. But the fact that the free tools are just as good means maybe it's money you don't need to spend.
thank you for the explanation!
I wonder if it is possible to train an offline AI brain for specific programming languages or tasks?
Bert Kreischer if he got a CS degree.
That’s an interesting comment. Because we both did Russian at FSU. He was a year after me. The teacher he talked about in his routine? I know exactly who he means and she has disappeared since then. I went on the first trip to Russia, he was in the second. But I didn’t do CS. My major is in Russian with minors in Math, Physics, and CS.
and much cleaner :)
I will never be taking my shirt off for a video. I promise you that.
LOL. Now THAT would be Bert for sure.@@technovangelist
Got my sub Matt!
Nice sharing 👍
(6:05) Is it possible to expose an API/debug endpoint? I like the debug info that you are showing. It would be cool if I could stream the sentence encoding/vectors for input and output over the API. With that I could build a visualizer using the API. Think WinAmp audio visualizers versus doing any actual debugging. Just something interesting to look at.
That is a neat idea. Have you posted it as an issue?
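For anyone exploring the visualizer idea: Ollama's documented REST API already streams each generation as newline-delimited JSON on port 11434, so a visualizer could consume that directly rather than needing a new debug endpoint. A rough JavaScript sketch, where the endpoint, `model`, `response`, and `done` fields follow Ollama's `/api/generate` docs and the `render` callback is a hypothetical stand-in for whatever visualization you build:

```javascript
// Each line of the /api/generate stream is a JSON object like
// {"model":"codellama","response":"...","done":false}.
// Parse a raw chunk into those objects, tolerating a partial
// trailing line (returned as the leftover buffer).
function parseStreamChunk(buffer, chunk) {
  const lines = (buffer + chunk).split("\n");
  const leftover = lines.pop(); // possibly incomplete last line
  const events = lines.filter((l) => l.trim()).map((l) => JSON.parse(l));
  return { events, leftover };
}

// Hypothetical usage against a local Ollama server (Node 18+):
async function visualize(prompt, render) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model: "codellama", prompt }),
  });
  let buffer = "";
  const decoder = new TextDecoder();
  for await (const chunk of res.body) {
    const parsed = parseStreamChunk(buffer, decoder.decode(chunk));
    buffer = parsed.leftover;
    parsed.events.forEach((e) => render(e.response)); // feed the visualizer
  }
}
```

This only exposes the token stream, not the internal embedding vectors; streaming the raw vectors would indeed need a change in Ollama itself, hence the suggestion to file an issue.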
Any idea why Swift doesn’t appear to be supported?
Good explanation.
I have a question actually. Why is ollama several hundred megabytes (or well, 236MB) in size for linux?
I think it’s either Nvidia or AMD that forces it to be so big. Is that the original package or one of the community installs, like for Arch and apt? They are often behind the real thing, and I think 0.1.21 was huge and 0.1.22 was better. At least in Docker.
But what is that page where you enter those "llama coder" words, actually?
At 3:30? That’s just vscode. It’s an editor. Go to extensions. It’s the way to find extensions. And since the majority of users are on vscode that’s what I showed.
Would love to see how to fine tune a language model with a Lora easily for completion. Ideally would be able to use the model + Lora with ollama but any other local inference would be fine.
Thanks for watching and thanks for the suggestions for a topic. I’ll start looking into some of the easy ways folks accomplish this.
Ollama does not support Windows right now, so there is no way to use this, correct?
It has run on Windows for well over 4 months. You have to go into add/remove features in Windows and enable WSL2. Then you can run the Linux install script and it works great. That said, a native Windows install is probably a week away. But you should be up and running in minutes with WSL, and it's easy to remove when the native version is released.
the preview version on Windows is running smoothly right now
Ollama used with LangChain is getting stuck after about 10 prompts when running on a GPU (didn't test it with CPU). To resolve this I have to restart the server. This happens with all models I have tested. Everything is up to date, Ollama itself and the models. What could cause this behaviour?
Your best bet is to ask on the discord
Amazing video!!
Great video!!!
Here we go again. Ollama...
Ouch, no apple languages besides applescript?
can we host a bigger instance somewhere else, and connect to it via vscode or something?
Subbed enjoyed your video 🙌
Thanks for the sub! I appreciate every single one of them.
Is Windows support on the roadmap? I'm having a very difficult time getting this running in WSL...
Strange, for most people getting it up and running in WSL2 is pretty easy. The big focus for the team has been AMD support. A lot of the work needed for Windows actually helped with AMD too... so once all of that is out the door, it shouldn't be much longer for Windows. But I think I said that in August too.... it's taken way longer than expected, but the team wants to do it right.
@@technovangelist Ok, well that's good to hear. This is a great project and I'll keep watching it for updates. This weekend I'll try and get things working in WSL2. If I get stuck again, I'll submit an issue on GitHub.
The discord is a great place for questions like that. Discord.gg/ollama
So, what model and quantisation level do you suggest running locally on an M1 laptop?
How much memory?
@@technovangelist 64GB M1 Max
You are pretty much open to anything. Depends on how long you are OK with waiting. For inline completion in VS Code I still go with tiny models. But for bigger jobs I tend to stick to 7B models. Maybe a few 30-ish billion. Which specific model depends on the questions you ask, and you should try that yourself.
@@technovangelist thanks for the tip! 30B you mean quantized?
Q4 is usually the best to start with. It’s the best performance generally and really good results too
I have a question. How is Llama better than Copilot? Or was the video thumbnail just clickbait?
Didn’t I answer that? Or did I edit badly? The fact that it’s on par more often than not, and then has all the benefits of Ollama. I can use it when I have no connection, so I can rely on it being there. Then it’s offline, thus no worries about privacy and security. Which opens it up to huge numbers of employees at companies that ban the use of hosted AI.
What is the hardware requirements for running this model locally with good response times?
I have an M1 and it works great, responding with a suggestion within half a second or less
@@technovangelist thanks, how much RAM would you recommend? I’m thinking of going for the 32GB
The more the better. I am very hopeful of the small models. But the big ones will still get lots of love. 32 seems like a good min. 64 is a bit more capable.
Always the best videos, by a pro
thank you excellent!
Glad it was helpful!
Has anybody come up with a competent assistant for embedded development yet?
Now if we can get an AI to spot food on your hoodie before you start filming. 😂
Thanks!
is llama better than copilot?
Not sure what the question is. Llama is a family of models; Copilot is an extension which submits text to the Codex model, which I seem to remember is close to GPT-2.
7:09 Doesn't support UEFN/UE Verse scripting language.
Are you basing that on the description, or did you actually try it? Not many use it, so I wouldn't be surprised if it's not in there, but then again, I wouldn't be shocked if it knew it just fine.
Hi, are these codebase-aware?
Thank you
Some are trying to be.
@@technovangelist Thank you for the answer, I would love to see a video about this topic, just subscribed ^^
It’s a great idea
Dumb question... can I use this to write code for GameMaker??
Best way to know is to try
First one to use it, ha? 🤭
??
so many languages and no 1c xD
Not sure what 1c xd means
Understandable. 1C is a Russian programming language for accounting and business-related applications. @@technovangelist
Interesting. So having a book about C in Russian still won't help. I picked it up at Dom Knigi in St Petersburg when I was in Russia for school back in 1991. Back when you told the person working there what you wanted and they brought it to you and stared at you while you looked at it.
@@technovangelist well, this misunderstanding is understandable, because at a glance it looks like the English C, but it's actually the Russian word «С». Basically two different words that look exactly the same 😂 So it has nothing to do with C, C#, C++ etc. xd
Great video! ❤
Thank you!!