- Videos: 5
- Views: 31,374
Debug with Lewis
Joined 13 Apr 2024
Spin off of my main "Coding with Lewis" channel. Breaking things and going into more detail.
Can DeepSeek R1 Run Locally on an NVIDIA RTX 5090!?
I try DeepSeek R1 locally on the NVIDIA RTX 5090, testing models of different parameter sizes both unquantized and at Q4. I am pretty impressed with the results!
LINKS
---
MY 12K+ DISCORD 💬
discord.gg/GkrFX4zT2C
CONNECT WITH ME ON SOCIAL
📸 Instagram:
lewismenelaws
🎚TikTok:
tiktok.com/@lewismenelaws
🐣 Twitter:
LewisMenelaws
My gear 💻
liinks.co/lewismenelaws
-----
Views: 11,079
Videos
12 Non-Developer Tools That Boost Your Productivity
9K views · 2 months ago
Here are 12 tools that are NOT developer-related that devs can use to improve their productivity. In this video, I go over what I use to help me organize my videos and projects!
Is OpenAI's Realtime API REALLY Worth the Hype?
7K views · 3 months ago
OpenAI has released their Realtime API for developers. Is this real-time ChatGPT experience worth it? Let's find out.
This CHANGED the Way I Use Databases (Atlas)
2.8K views · 8 months ago
Atlas is a tool written in Go that helps you manage your databases by writing your schema as code, similar to existing database migration tools like Alembic and Django migrations. The package is still a bit early and YMMV, but it has been a huge help in my productivity. So in this video, I discuss Atlas and how it works for those who may or may not be familiar with databases. Link...
Why I Stopped Using LangChain
4.4K views · 8 months ago
Here are some reasons why you shouldn't use LangChain in your next AI project. Personally, I have given this framework a shot at least 3 or 4 times now and it's been an absolute struggle every time. Some of the main reasons include tough documentation, hard-to-understand syntax and strange abstractions. In this video, I will show you some examples of LangChain doing this and provide some alternatives if ...
What is the name of the GUI tool you are using?
I ran the 7B on a GTX 1070 Ti. Speed is pretty good.
Thanks to DeepSeek we now know Nvidia is using AI to squeeze these chips, these cards are a rehash with gddr7.
I just installed the full 32B model on my Sapphire RX 7900 XT 20GB Nitro+ and it runs.
what are your PC specs?
Honestly I’m fed up with these 5090 videos. The only fricken people in the world that can actually get their hands on these cards are YouTube reviewers! I think I might start my own channel, just so I can get a GPU. 😂
You can run 32B on a 3090 without any issues, and it runs smoothly.
Rtx 5090 and deepseek in the same title is bound to be viral
Intel B580 24GB with ZLUDA
So you bought a 5090 from the scalpers just to run DeepSeek distilled models locally, not gaming? Seriously?
Why is that an issue?
5090s are better suited at AI workloads than they are at gaming. Like what game even needs anything close to 32GB of vram? 😂 Most games use between 8GB to 12GB, with only a very select few that even use 16GB, which usually involves full path tracing. The 5090 is literally using a binned GB202 die used in AI workstations.
I want you to run more AI models locally on your PC.
It's not about the quantization, it's about the actual model size in parameters: 671B. Even if you run it at Q4 it's still much better than all these distill versions, because the base model for it was DeepSeek V3, which is a very good model. And I know it's not for a home lab, at least for now, but there are ways to run it at 1.58-bit with Unsloth's method, which requires 131GB of VRAM instead of 741GB.
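As a rough sanity check on those numbers: weight memory scales linearly with parameter count and bits per weight. A back-of-envelope sketch (weights only; it ignores KV cache, activations and runtime overhead, so real usage is higher):

```python
# Back-of-envelope VRAM needed just to hold the weights (ignores KV cache,
# activations, and runtime overhead, so real usage is higher).
def weight_vram_gib(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

# The full 671B model at a few quantization levels:
for bits in (16, 8, 4, 1.58):
    print(f"{bits:>5} bits/weight -> {weight_vram_gib(671, bits):7.1f} GiB")
```

At 1.58 bits/weight the weights alone come to roughly 120-130 GiB, which is in the same ballpark as the 131GB figure quoted above.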
I just ran DeepSeek on my mid-2017 MacBook Pro with the worst Intel CPU.
Please run 70B. Also, can we use two GPUs for more speed and accuracy?
I got it running on a TITAN X Pascal. Of course it will run; I even use it in my application.
Which model does the DeepSeek web chat use?
Yes, it can even run locally without a gpu. Clearly performance is affected, but it can run.
FP16 and Q8, do they have a difference in output?
why is 14b using so much of your VRAM? I can run it on a 16gb card with a couple gigs of slack
oh it's not quantized
@@agush22 lmao
Hmm... maybe two Radeon RX 9070 XTs could run it even better.
i can run 7B on my 3070 pretty well, so why pay more
no, you can't. the distills are not "versions" of the model.
They are hybrids of R1 and other models (either Llama or Qwen depending on the one you download), their weights containing information from both models they were created from. I don't think it is unreasonable to say something like DeepSeek R1 Qwen Distill is a "version of R1," and equally I would not think it is very unreasonable to say it is a "version of Qwen," both statements are true since it's a hybrid of the two. It is being oddly nitpicky to try and fight against this.
@@amihartz sure but it cannot be compared to the real R1, they are not the same model.
You are correct, but 99.9% just can't grasp that the distilled models are Qwen or Llama. Heck, it even states the arch in the video as such and people still think it's R1. Notice the other one in this thread yapping about it being a hybrid, etc. Sigh.
@ They are objectively not Qwen or Llama, this is easy to prove just by doing the "diff" command between the models, you will see they are different. The models are R1 Qwen Distill and R1 Llama Distill, not Qwen or Llama, nor are they R1. You are spreading provably false misinformation.
@ They are Qwen- and Llama-based; yes, the weights have been changed, but it does not matter. If you do a distance analysis, they are very, very close.
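The "distance analysis" being argued about can be illustrated with a toy example: compare a base weight vector to a lightly fine-tuned copy. The arrays here are random stand-ins, not real checkpoints, and the small additive delta mimics a fine-tune that nudges existing weights.

```python
import numpy as np

# Toy illustration of comparing a base model's weights to a distilled
# variant. Tiny random arrays stand in for real checkpoints (all names
# and sizes here are illustrative).
rng = np.random.default_rng(0)
base = rng.normal(size=10_000)                         # stand-in base weights
distill = base + rng.normal(scale=0.05, size=10_000)   # small fine-tune delta

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(base, distill)
print(f"cosine similarity: {sim:.4f}")
```

Note that both sides of the argument can be right at once: a similarity near 1.0 supports "very close to the base model," while any nonzero element-wise difference supports "not the same model."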
No, DeepSeek can not only run on an RTX 5090 but even on a Raspberry Pi.
Are you really using the correct DeepSeek R1? I use the one from Ollama, and it had no problem answering the questions on the 7B model. Also, the 32B model is only 20GB.
He might've downloaded the unquantized version.
@@amihartz aahh, yes did not think about that :)
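One likely explanation for the size mismatch: Ollama's default `deepseek-r1` tags are 4-bit quantized, so the 32B tag downloads roughly 20GB rather than the much larger full-precision weights. If you want to be explicit about what you're benchmarking, a Modelfile can pin the tag and parameters (a minimal sketch; the tag and values here are illustrative):

```
# Minimal Ollama Modelfile sketch (tag and parameter values are illustrative).
# Ollama's default tags are quantized (roughly Q4), which is why the 32b tag
# is ~20GB instead of the ~65GB an FP16 32B model would need.
FROM deepseek-r1:32b
# Context window; larger values increase memory use.
PARAMETER num_ctx 8192
PARAMETER temperature 0.6
```

Build it with `ollama create my-r1 -f Modelfile`, then `ollama run my-r1`.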
I have a very old laptop and running the 7B model makes it go bonkers. I am looking to shift to a Mac mini M4. For running the 14B model, will 16GB be enough, or should I go for 24/32?
The more the better honestly. But 16 does me really well. Just can't go any higher than the base sizes.
Macs have unified memory, so the VRAM is also your system RAM. 14B is around 11GB, so you would only have 5GB left for macOS and whatever else you are working on.
@@prof2k Which parameter size are you running right now?
@@agush22 So 24/32GB would be better for running 14B?
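The unified-memory tradeoff in this thread boils down to simple arithmetic: the model's resident size plus OS/app headroom must fit in total RAM. A rough sketch (the ~11GB figure for a quantized 14B comes from the comment above; the 6GB headroom is an assumption):

```python
# Rough fit check for unified memory (all numbers are estimates).
def fits_in_ram(model_gb: float, total_ram_gb: float, headroom_gb: float = 6.0) -> bool:
    """True if the model plus OS/app headroom fits in total RAM."""
    return model_gb + headroom_gb <= total_ram_gb

# A ~11GB quantized 14B model against common Mac mini configurations:
for ram in (16, 24, 32):
    verdict = "fits" if fits_in_ram(11, ram) else "too tight"
    print(f"{ram}GB unified memory: {verdict}")
```

By this estimate 16GB is marginal for 14B once the OS and other apps are loaded, while 24GB or 32GB leaves comfortable room.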
Make a video on how to quantize any DeepSeek model.
I run that 33B model on an RTX 4070 Super; it really has amazing performance.
The fact that the AI reports having a whole internal debate about how many R's are in strawberry... It's six btw.
I got 32B running on my M2. Granted, it's slow as balls, but if I close almost everything it'll run. 14B is almost usable and anything lower runs like the wind. Looking at your memory usage is bizarre; maybe I don't have context windows set up, but my 7700 XT can also run 14B (not 32B), and my Mac has 24GB of RAM letting it pull 32B. Nvm, I have quantised versions of the models.
Even a phone can run 1.5-7B.
Same here, but my Ollama DeepSeek did not have any problems with the questions either. So weird that his could not even answer the strawberry question correctly :)
No joke, in this first release batch of 50-series cards, I think NVIDIA unironically shipped out more review samples to YouTubers than they did to retailers. Maybe not too surprising, I guess. If stock is low, may as well build hype instead of selling a few hundred additional GPUs.
The issue with trusting AI is that we taught it how to process data and trained it to give outputs, but we don't know the processes between the two; it's called "the black box" on some channels. It's interesting that DeepSeek does the thought-process thing before giving you the real output. It's aimed at transparency, to give you insight into the black box, but now there's the question of how the output of the thought process was generated. Still the unknown black-box issue, but a clever idea.
Eh the thought box is not what it actually "thinks". It just answers the prompt a first time, and then summarises it into the "real" answer. There is no thinking going on, we know _how_ it works. We just don't really understand why it works so well.
It's still a black box; it's generating its reasoning the same way GPT generates the final output.
@randomseer Well, before you weren't sure what biases were in play for your final output; now you get a false window into how it came up with what it said. But even that solution has a new black box: we still don't fully understand its interpretation and biases, because the output of its DeepThink process is still unexplainable.
It's so annoying that Nvidia crashed because of DeepSeek R1, a highly hallucinating copy of a copy (trained on GPT's outputs), benchmarked alongside GPT. Why sell Nvidia? You can run it on M2s? Cool, that means you can run it better with 5090s. Nvidia is down 600 billion for what?
It'll probably trickle back up. It's investor panic by people who aren't really informed about technology and the implications of certain things. I do think NVIDIA is pretty risky though. I think if it takes AI too long to become profitable, people will pull out.
It's not about Nvidia consumer GPUs; it's about the fact that it was trained with a lot fewer Nvidia GPUs than people expected. The primary reason Nvidia is valued is for the GPUs used for training.
@@randomseer This. If companies are telling investors they need, say, 2 million GPUs to run ChatGPT but an alternative comes along that shows you only need 1/10th of those, then the demand for said GPUs might be a lot less than the 2 million... That, and the fact that smaller models that are just as accurate can run on competitors' products (say AMD GPUs or Apple M-series stuff), means that demand for Nvidia might be even lower. Lower-than-forecasted demand combined with alternatives means the moat Nvidia has is non-existent. Those could be the reasons Nvidia dropped.
How has nobody found this??
new channel
@DebugWithLewis cool
Well technically that's not the real model 🤓☝
Okay, dip****
🤬🤬🤬🤬
I don't know why people say this, all the models are "real" models, they're just different. It would make more sense to say that it is not the "original" model, because the Distill models were produced by taking things like Llama or Qwen and readjusting their weights based on synthetic data generated from R1, so the weights are a hybrid of the two models (either a hybrid of Qwen+R1 or Llama+R1 depending on which you download), but they are still "real" models, just not the original R1. I don't know what it would even mean to have a "fake" model.
??? So when you train on the output of the o1 model, suddenly the model becomes o1?? Naw, it's just Qwen2 fine-tuned via GRPO.
@ You literally are changing the weights of the model, it is no longer the same model. To claim that a modified qwen2 is literally identical to qwen2 is easily falsified just by running the "diff" command on the two model files. They are different models. If you adjusted qwen2's weights based on the output of o1, it would neither be qwen2 nor o1, but would be a new model that is hybrid between them and would take on characteristics of both, as this literally causes the model to acquire information and properties from o1.
Hi everyone! Thanks for watching :) Let me know how you like this format, trying to post a lot more but not get in my head about it.
Hi, thanks for this video! I am thinking about getting the RTX 5090 (someday!) so these videos will be of great value! I hope you do more!
I think DeepSeek is very overhyped. It has categorical thinking, and no thinking is better than categorical thinking; it assumes new info by itself and produces erroneous results... And it doesn't even understand basic instructions: I told it to translate a copied subtitle text, and even after I taught it to do it right, it still made mistakes when I gave it the entire text.
Thank you. But I think this is a little shallow. Have you used the Realtime API in a production environment? Curious how this works when it's exposed to real customers.
They made GPT-4o mini available for it, so it's much more affordable now.
Great content! What are you using now instead of LangChain? What would you recommend for production?
Are you cutting, skipping, or editing at all to make this feel actually real-time? Because the API is not truly real-time and it averages around 400-700ms to deliver. If you are getting actual real-time performance, could you share your setup?
You forgot a VERY IMPORTANT DRAWBACK... My company experiments a lot with the OpenAI Realtime API. Here is the main outcome: the audio will always drift from the transcript. The spoken audio is UNABLE to follow a list of more than 10 items; it starts generating its own inventions from the 5th or 6th item, mixes the data, replaces it with whatever, skips items... E.g. just ask it to list the planets of the solar system in distance-to-sun order: it messes up totally by the end. For the clever guys: we set minimum temperature and the role states that "data should be listed in sequence from first to last item"... No way 😅
I don't see any difference from using Liquibase or Flyway; additionally, I can use clusters-K9 when I'm in production to save time on pitfalls. Not familiar with Atlas; I'd appreciate your insights.
Notion, TickTick, ClickUp, Discord, Mattermost, Obsidian, Logseq, Thunderbird, Brave, Firefox Developer Edition, Zen browser, Miro, Loom, RemNote, 1Password, Superhuman, and Missive
I focus by watching your vids
notes 📝 all apps summarized, hope it helps.
1. Excalidraw
2. Todoist (natural language processing, REST API); alternatives: Google Calendar for time blocking, Notion databases, pen and paper
3. Note taking
3.1 Notion (has API, databases work best)
3.2 Obsidian (excellent dev support, graph view, loads of plugins, faster to load)
4. AI tools: his friend Dreams of Code does not use GitHub Copilot; he uses AI as a rubber duck rather than a codegen; he loves open-source AI models
4.1 Ollama with models + Open WebUI
4.2 Claude 3.5 Sonnet is best for coding
5. Timer (Pomodoro 25-5); not the best for programmers so he stretches to 50-10
5.1 Pomofocus
5.2 he actually uses a physical kitchen timer he bought on Amazon
6. Notion Calendar app
7. Superhuman (email client, not free, 30usd/m); everything is done using the keyboard; distraction-free and fast
8. Raycast (Spotlight replacement, Mac only)
9. Pastebin *
10. Spotify
11. Hacker News (for insights and latest news); fair warning, people like to complain here a lot
12. Screen dimming
12.1 f.lux: he used to use this app earlier; nowadays both Mac and Windows have this natively (Windows: Night Light); makes the screen warmer
* personally I stopped using Pastebin a long time ago due to some security concerns; check online for more and the current state.
Personal takeaway was Notion Calendar; I was not even aware that Notion had a dedicated calendar app. I use Obsidian and Emacs. I don't use Notion as I like to keep all my data to myself locally. I would like to add: 1. Flameshot for screenshots on Windows. 2. Flow Launcher as a Spotlight alternative on Windows. 3. Espanso for text expansions; those who like to go extreme use AutoHotkey. 4. Check tldraw if you like Excalidraw; it's a more streamlined interface.
I use google docs, chrome notes, and pen and paper. Also google calendar
I liked Obsidian so much, but when I saw I couldn't use it commercially I moved to Joplin immediately. UI- and data-management-wise Obsidian is still the best, but I like open-source software for my peace of mind.
I disagree. The graph is great if you have a very big system. For me, I use Logseq, which has a similar graph view. I've been writing in it for 4 years now and the graph view is just massive. It gives you a whole idea of the stuff that you've explored. It is always fun to generate blogs or content from it.
I started with Notion and Obsidian, but... I merged note-taking and task management in Logseq. Maybe my most indispensable tool.
Haven't you guys heard about Logseq? With the extensions I use, I can replace Excalidraw, Todoist, Notion, and many more. I just use it every day, and the best thing is the linking of pages: just #page links right to that page, and even if you forget it, it still shows up in unlinked references. It's greatttt!!
I've been an Obsidian nerd for a while, but I don't like how you can't just intuitively put a git repo on the vault you use for online storage. I know there are plugins that add this but it's just sort of wonky.
What do you mean? It's just a bunch of markdown files that you can version control with git if you want