Wanna build your own AI Startup? Go here: www.skool.com/new-society
Why aren't you using Msty?
Wait, my data is not private with o1? I didn't know that. Where can I check this? Where is the user notified of this, or did they bury it in the fine print?
nice clickbait
It’s not clickbait tho
@@kylev.8248 You must be King of fools
I just used Llama 3.2 locally and asked about starting a 3D printing business as a beginner. It gave output similar to what you spent a good chunk of this video building... Maybe next time show a before-and-after response from an LLM.
😂
Btw, the concept you reached in this video, an undetermined number of agents, is far superior to the one from your video 5 days ago. Really awesome 👏🏻
Great! Now build a local agent with Llama that can control your computer like Anthropic's.
Very doable with Open Interpreter, which is open source and free (see the sketch below these replies).
@@orthodox_gentleman How much should I pay someone to set it up for me?
I literally did this 6 months ago.
You build it.😂
Skyvern!
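For the thread above: a minimal sketch of pointing Open Interpreter at a local Ollama model so the agent can execute code on your machine. The attribute names (`offline`, `llm.model`, `llm.api_base`) follow Open Interpreter's docs at the time of writing; treat them and the model tag as assumptions and check the current docs.

```python
# Minimal sketch: Open Interpreter driving a local Ollama model.
# pip install open-interpreter  (verify against current docs)
from interpreter import interpreter

interpreter.offline = True                    # keep everything local
interpreter.llm.model = "ollama/llama3.2"     # assumed local model tag
interpreter.llm.api_base = "http://localhost:11434"

# The agent proposes code and, after your confirmation, runs it locally.
interpreter.chat("List the 5 largest files in my home directory.")
```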
You could try "When providing responses, use concise and primary representations. However, include additional details only when needed to ensure clarity and completeness of the task" and you should get short responses without compromising the chain of thought.
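For anyone who wants to try that instruction: a minimal sketch of wiring it in as a system prompt with the `ollama` Python client. The model tag is just an example.

```python
# Minimal sketch: the suggested brevity instruction as a system prompt.
# pip install ollama; assumes the model tag below has been pulled locally.
import ollama

SYSTEM = (
    "When providing responses, use concise and primary representations. "
    "However, include additional details only when needed to ensure "
    "clarity and completeness of the task."
)

response = ollama.chat(
    model="nemotron",  # example tag; use whatever you have pulled
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)
print(response["message"]["content"])
```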
There is a model called Llama3.3B-Overthinker. I think it would fit the task quite nicely.
Is it available on Ollama or Hugging Face? If you don't mind the question. Thanks, by the way, for the pointers.
😂 I love the way you called out your mistake at 4:00. It was so delightful to see you handle it like a boss that I had to replay it more than 3 times to enjoy the moment... You are definitely a smart man!!! I am eager to see the evolution over time!!! 😅
Cool new format with the presentation man
Nice, your contribution to the open source community is awesome!
opensource?
@@ysh7713 Well, kind of ~ better than giving all your data to a faceless big company that will 100% steal it.
That mark on your nose is almost like a signature, something that's so naturally you. 🖖👍
Great video. Which hardware specs do you have? :-)
If you instruct the agent to use the fewest possible lines, it's likely to eliminate comments, which is suboptimal but expected.
"Comments are apologies in code." - Robert C Martin.
Cursor is helping you.
Also, for the price of this machine's spec, you can buy an insane number of tokens from Anthropic or OpenAI. It might be worth getting people started on a hosted service.
You can also use the Ollama streaming output to generate text. This way you can see what the generator is doing (see the sketch below this comment).
Also I think that GPT o1 does more than split up a task and let agents fix the individual tasks. But nevertheless, a nice tutorial on making agents.
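A minimal sketch of the streaming idea, assuming the `ollama` Python client and an example model tag:

```python
# Minimal sketch: stream tokens from Ollama so you can watch what the
# generator agent is writing in real time. pip install ollama
import ollama

stream = ollama.chat(
    model="nemotron",  # example tag
    messages=[{"role": "user", "content": "Draft a plan for a 3D printing business."}],
    stream=True,       # yields chunks instead of one final response
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```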
🎉 you actually made it. Thanks
So, you've used Claude 3.5 (2024 October update) within the Cursor AI editor to develop a (simple) Python script that runs some agent logic on a 70B model in Ollama?
Where's the o1 in here?
o1 is a reasoning model whose reasoning 'recipe' is kept private. This is his take (which resonates with the average user of locally run open source models) on hacking the way the 70B model works and simulating reasoning to enhance the final output: a simple method which actually does provide better replies (a rough sketch of the idea follows below).
He is not sharing a research paper published on arXiv, my man.
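Roughly, the "simulate reasoning" trick boils down to a plan-then-answer chain. A minimal sketch, with illustrative prompts and model tag (not the exact ones from the video):

```python
# Minimal sketch: fake a "reasoning model" by planning first, answering second.
import ollama

def ask(model: str, prompt: str) -> str:
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]

def reasoned_answer(question: str, model: str = "nemotron") -> str:
    # Step 1: elicit an explicit task breakdown (the visible "chain of thought").
    plan = ask(model, f"Break this task into numbered steps. No final answer yet:\n{question}")
    # Step 2: answer with the plan as context.
    return ask(model, f"Task: {question}\n\nFollow this plan and give the final answer:\n{plan}")

print(reasoned_answer("Why is the sky blue?"))
```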
Good stuff, Ondrej!
Dude, there are very few people who can run Nemotron locally…
Awesome video, David! How can I train this model on my own dataset? And how can I give it a nice UI?
exactly what I needed thank you so much David🎉
Maybe I missed it, but what hardware is needed for that nemotron - it is 43GB? Doesn't that mean you need at least that much VRAM? And here I thought I was a baller with my 16GB vram...
What a beast of a MacBook you need to get such a fast response. I have a 7800X3D and an RTX 4080 and it's waaay slower.
Inspiring stuff. Cheers!
I have tested o1, and it is not so smart. People still need to guide its selections. A big problem with these models is censorship: someone else has selected what you can and cannot do with these tools.
Brooooo, there are tools you are behind on. Agent S and Claude computer use?? E2B has an open source version too 😊 Stay blessed, Ondrej.
What a great explanation. Thanks!
How much more accurate are your local o1 results compared to the original Nemotron 70B and Llama 3 3B without chain of thought?
Was there any improvement on benchmarks like HumanEval and MMLU?
Very inspiring! Thanks
You should use Anything LLM and docker / Open WebUI
@DavidOndrej, what are your Mac specs? I have a MacBook Pro M3 Max with 48 GB...
Missing the comparison between the result using multiple agents and the result using just one...
Disappointing. We don't even know if it is worth the work...
You downloaded nemotron and not the 70b version, which is why you got the error.
I want to build an agent swarm to do coin-margined BTC futures trading, with each agent handling a separate part: TA, market sentiment, execution, risk tolerance. Is there a way to keep each model small and only train it to focus on its task?
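One way to approach it (a hedged sketch, not trading advice): give each small model its own pinned role via a system prompt and route subtasks to them. Model tags and prompts here are illustrative; real specialization would come from fine-tuning each small model on its own task data.

```python
# Minimal sketch: a swarm of small, role-pinned local models.
import ollama

AGENTS = {
    "ta":        ("llama3.2:3b", "You analyze BTC price action and technicals only."),
    "sentiment": ("llama3.2:3b", "You summarize crypto market sentiment only."),
    "risk":      ("llama3.2:3b", "You assess position sizing and risk only."),
}

def ask(role: str, task: str) -> str:
    model, system = AGENTS[role]
    reply = ollama.chat(model=model, messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ])
    return reply["message"]["content"]

# Fan the same question out to every specialist and collect their reports.
report = {role: ask(role, "BTC coin-margined futures, next 24h.") for role in AGENTS}
print(report)
```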
We have had AGI for over a year.
Not a chance with my elderly MacBook Pro. Looks like I need some new gear…
Yet we don't have one that could trade Nasdaq futures.
What hardware do you need to run this locally, at minimum? I have 64GB RAM + a 3060 12GB.
Watch the video
You'll need over 40GB of VRAM, so something like 2x RTX 4090 might be a good option. No idea what hardware is being used in the video. Anyone saying "watch the video" should provide a timestamp.
This guy is rich, not even joking.
He is on a MacBook Pro bro…
64GB RAM + 4070 Ti Super (16GB VRAM) = runs Nemotron-70b-instruct-q2_K
Oh yeah, I'm sure OpenAI is quaking in their boots, bro.
Also, I hope the bruise on your nose heals soon. Been a long time now.
I think it’s a medical device that helps him breathe
Hey David, how can we reassure clients that their data is secure and won't be shared with the LLM provider for internal training purposes? What steps can we take to ensure their data privacy and address any concerns they might have?
You'd have to ask that in David's classroom on Skool.
@@cdunne1620 Sure
He mentions that in the video like four times
@@haljohnson6947 Can you mention the specific timestamp where he talks about it?
@@haljohnson6947 Pls mention the timestamp where he mentioned it.
There is also a nemotron-mini model, which is only 4B.
How good is it? On Hugging Face I saw Nemotron ranked poorly.
Really??? Omg that is great
Dadusak!
why do you have that thing on your nose?
For correcting the nasal path / nose bridge (or something like that).
More oxygen bro
Soccer players used to wear them years ago for example Robbie Fowler for Liverpool
Cool!
Have you had a chance to compare your results against GPT4o?
You should have a Discord community where people can share projects and businesses.
Never mind, I found the business model on Skool now. Nice call, thinking about joining it.
clickbait or misleading title
I want to preserve a million-word dialogue between myself and my ChatGPT on multiple threads while upgrading to your recommendations. How do I achieve that?
modern day sham(mer) 👍
What vscode extension are you using for your ai?
Are my prompts on o1-preview used to train the AI even if I opt out? Where do I find this information?
No. Not quite. You have to split the turns.
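A minimal sketch of "splitting the turns", assuming you've already flattened your ChatGPT export into a list of `{role, content}` dicts (the raw `conversations.json` export is more nested than this); word count stands in for real token counting:

```python
# Minimal sketch: split a huge dialogue into chunks that fit a context window.
import json

def chunk_turns(turns, max_words=2000):
    chunks, current, count = [], [], 0
    for turn in turns:
        words = len(turn["content"].split())
        if current and count + words > max_words:
            chunks.append(current)
            current, count = [], 0
        current.append(turn)
        count += words
    if current:
        chunks.append(current)
    return chunks

turns = json.load(open("my_export.json"))  # hypothetical flattened export
for i, chunk in enumerate(chunk_turns(turns)):
    with open(f"chunk_{i:03d}.json", "w") as f:
        json.dump(chunk, f)
```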
27:36 LOL🤪
Already have it lol. Running it on my RX 7900 XTX with q4m, but I think I'll buy myself 1-2 Radeon W7900 Pros to gain a lot more performance. Also, you don't need Ollama for it, because it's available in LM Studio, which downloads it from Hugging Face.
Btw, what PC hardware specs do you have?
He's clearly using a 128GB MacBook Pro, which can use its unified memory as VRAM. He's running it un-quantized. How much VRAM do you have on your gaming GPU? Nobody asked about your hardware, bro.
@@rhadiem Every PC can use RAM as backup for VRAM; it's called virtual memory. If the VRAM fills up, the computer spills into RAM to stay stable and not crash. But RAM is waaaaaaay slower than VRAM, which is why I'm asking what specs he has. My GPU has 24GB of VRAM, and even with the Quant 4M (around 32GB) model of Nemotron 70B my VRAM fills up completely and my RAM climbs to 50GB, which slows the model down so much it's painfully slow. He is running a way bigger model without any issues. If he had a GPU with a huge amount of VRAM that would be totally understandable, but with RAM? I don't understand why lol. 😄
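For context on the numbers being thrown around in this thread: a back-of-the-envelope sketch (rule of thumb only; real GGUF file sizes and runtime overhead such as the KV cache vary):

```python
# Rough memory math: bytes ≈ params * bits_per_weight / 8 (ignores KV cache).
PARAMS = 70e9  # Nemotron 70B

for name, bits in [("fp16", 16.0), ("q8_0", 8.5), ("q4_K_M", 4.9), ("q2_K", 2.6)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name:7s} ~{gb:,.0f} GB")

# The ~43 GB figure mentioned in other comments matches a q4-class quant;
# a 24 GB GPU has to spill the rest into much slower system RAM, while a
# 128 GB unified-memory Mac can hold it all at full GPU bandwidth.
```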
How’d you get composer in the sidebar?
i fkin slep bro
No repo to share the code?
How do you make a local AI with a "backpropagation" feature (if it gets something wrong, the CEO agent explains what's wrong and the local sub-agent improves over time)?
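It wouldn't be literal backpropagation, but a critique-and-revise loop gets at the same idea. A minimal sketch with assumed model tags and prompts:

```python
# Minimal sketch: a "CEO" model critiques a sub-agent's draft; the critique
# is fed back into the next attempt (a feedback loop, not real backpropagation).
import ollama

def ask(model: str, prompt: str) -> str:
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]

def ceo_loop(task: str, worker: str = "llama3.2", ceo: str = "nemotron", rounds: int = 3) -> str:
    draft = ask(worker, task)
    for _ in range(rounds):
        critique = ask(ceo, f"Task: {task}\n\nDraft:\n{draft}\n\n"
                            "List concrete problems, or reply APPROVED if it is good.")
        if "APPROVED" in critique:
            break
        draft = ask(worker, f"Task: {task}\n\nPrevious draft:\n{draft}\n\n"
                            f"Rewrite it, fixing these problems:\n{critique}")
    return draft

print(ceo_loop("Write a one-paragraph launch plan for a 3D printing shop."))
```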
Lol, that's not how O1 works. You can't tell it in the system prompt
1 token per second is too slow for any practical use...
Bro llama is nowhere near o1 wtf
Which computer can you use for a local LLM?
A strong PC.
99% of free stuff sucks, and this video is one of them. 20 minutes to answer "why is the sky blue?"
free stuff has a learning curve, it's not everyone's cup of tea
99% of paid software sucks and it hurts your wallet
😇
😂 You need a graphics card priced like a Tesla to run that model locally; btw you talk like 10,000 words/min 😅
The title is misleading. You are using Llama, which is an LLM but not a reasoning model.
David, I would like to create sales agents, lead generators, receptionists, and appointment setters, and I want to sell them. Can you help? 😢
1st one to comment 😄
Awesome!
I am getting the following error:
bhushan@Bhushans-MacBook-Pro ~ % ollama run nemotron
Error: llama runner process has terminated: signal: killed
After clearing some memory, it started working...