Using o1 for setting up a project and Claude for doing the small edits is the best of both worlds. I love how cleanly formatted o1's code is!
Indeed! The o1 model is definitely superior for coding. So far it has been flawless and has surpassed the preview, as expected. The latency is a small price to pay. However, I am keen to see how Anthropic responds 🙂
@@19LloydG in my experience, claude is still better.
i still get better results with claude. however i agree on the clean formatting.
a really underrated practice is to generate examples with whatever model is best at that and feed them as context to your workhorse model.
having a dataset of good examples works wonders for any llm application.
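A rough sketch of what that can look like in practice: keep a small set of task/solution pairs (written by whichever model does them best) and prepend a few of them to each prompt you send to the cheaper model. Everything here is an assumption for illustration: the `Example` type, the `build_few_shot_prompt` helper, and the `### Task` / `### Solution` layout are made up, and no particular model API is implied.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One stored task/solution pair (hypothetical structure)."""
    task: str
    solution: str


def build_few_shot_prompt(examples, new_task, max_examples=3):
    """Prepend up to max_examples stored pairs to the new task.

    The section markers are an arbitrary choice, not a required format.
    """
    parts = []
    for ex in examples[:max_examples]:
        parts.append(f"### Task\n{ex.task}\n### Solution\n{ex.solution}")
    # The new task ends with an empty solution slot for the model to fill in.
    parts.append(f"### Task\n{new_task}\n### Solution\n")
    return "\n\n".join(parts)


# Hypothetical examples, e.g. generated earlier by the stronger model.
dataset = [
    Example("Reverse a string in Python.", "def rev(s): return s[::-1]"),
    Example("Sum a list of ints.", "def total(xs): return sum(xs)"),
]

prompt = build_few_shot_prompt(dataset, "Count the vowels in a string.")
print(prompt)
```

The resulting string is what you would send to the workhorse model; how you pick which examples to include (most recent, most similar, and so on) is up to you.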
@@Kiki-qh7xk good for you, if you like an AI that can't even reason. Some people don't need reasoning depth, because their work and life lack the depth that they would need from an AI anyway.
There are many variables (use cases, prompting style and so on). I’m primarily interested in Swift coding. Recently, an app enhancement that I wanted to implement was apparently difficult for Claude, but when I gave the task to the o1 model it produced an elegant and robust solution immediately. However, I have had very good results when using Claude Sonnet 3.5 in other projects, so I will continue to use it. Thanks for your YouTube videos, which are always informative, helpful and interesting 👍
BTW you can also use gemini-2.0-flash-exp and gemini-2.0-flash-thinking-exp-1219 in Cursor (you need to add a Google API key and add them manually to the models list).
In my first tests these models perform very, very well :)
So, how much did that o1 usage cost you?
Cursor charges 40 cents per o1 request.
that's the right question.
You have 10 o1-mini requests/day with the Pro subscription, dunno about o1
@@ArcanoIncantatore isn’t it unlimited?
@@multi_variate if true that is actually insanely expensive hahahaha
what was the API cost for these chats, or is it included in the Cursor subscription?
I think, at the moment, the cost per call to o1 (and o1 preview) is 40 cents USD.
10:45 You opened the HTML file directly from the drive instead of accessing it through the node server. IMHO the one-shot worked.
I think you should compare it with Claude for the same tasks. Giving Claude the same tasks would allow you to compare both models in terms of response quality, speed, cost, and overall performance.
However, I don't think the two tasks you provided are very representative, as the documentation appears to be well-structured and comprehensive. These tasks seem to mainly involve reading the documentation and combining the provided examples.
It might also be valuable to test the same tasks on GPT-4 and even less advanced models, to better evaluate the actual level of difficulty.
And what would be even more interesting is to assign O1 and Claude more complex real-world tasks, such as working within an existing codebase to add a new feature or solve a specific problem. Most models are quite good at generating standalone files like HTML, JS, or CSS, but they tend to struggle when dealing with an existing codebase.
You should use Composer rather than Chat. There's also an agent option there which may've worked better for one-shot.
Also, you can add docs in Cursor itself under Features in the settings.
can only use gpt4 or sonnet 3.5 in Composer
@ ah! Good to know, thanks. Wonder if that will change in the future.
@@witchcraft8118 Not true, you can use any model in composer "normal" mode, if you want composer "agent" mode then you can only use 4o or sonnet 3.5
Is Cursor much better than just VS Code with roo-cline?
For example, with roo-cline you can connect an Obsidian MCP so it can search and take in knowledge during its work.
OHH WOW THE AUDIO TRACK TOO??? 🎉🎉😮
Do you know that you can just paste the URL of the website in the cursor chat? No need to copy the entire content over.
It's following Cursor rules much better than any other model. But the cost is high. It's easily $10/day.
thanks. was this costly?
wow it's amazing that you added Indonesian audio.
How do local docs files work?
Looks very impressive. Would be interesting to see a direct comparison with claude with the same prompts
Do you always have to scroll through the chat to find different code snippets and apply them individually one by one? Doesn't Cursor provide any better experience for dealing with this?
Instead of Chat, he could have used Composer, which will edit multiple files per prompt. All you have to do is approve or reject the changes. Sometimes Composer updates files when you're not expecting it, and I often find myself telling it 'don't change anything' when iterating on a strategy for building a feature. I probably could get the same result by switching to Chat, but Composer feels smarter, dunno if that's true. Anyone have thoughts on that?
Does anyone know whether o1 pro mode also will be available through the API and if so how much Cursor might charge for it?
isn't o1 like super expensive?
o1 is cheaper than GPT-4 per token. o1 is $15 input + $60 output per million tokens, while GPT-4 is $30 input + $60 output per million. o1 produces more output tokens though, so in practice its price is closer to GPT-4-32k.
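To make that arithmetic concrete, here is a small sketch using the per-million rates quoted above. The token counts are invented for illustration, and the 3x output multiplier for o1 is purely an assumption standing in for "o1 produces more tokens"; real requests will vary.

```python
# Rates in USD per million tokens, as quoted in the comment above.
O1_INPUT_PER_M, O1_OUTPUT_PER_M = 15.0, 60.0
GPT4_INPUT_PER_M, GPT4_OUTPUT_PER_M = 30.0, 60.0


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in USD of one request at the given per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical request: 2,000 tokens in, 1,000 tokens out on GPT-4.
gpt4_cost = request_cost(2_000, 1_000, GPT4_INPUT_PER_M, GPT4_OUTPUT_PER_M)

# Same input on o1, assuming (made-up figure) it emits 3x the output tokens.
o1_cost = request_cost(2_000, 3_000, O1_INPUT_PER_M, O1_OUTPUT_PER_M)

print(f"GPT-4: ${gpt4_cost:.2f}  o1: ${o1_cost:.2f}")
```

With these made-up numbers the o1 request comes out more expensive despite the lower input rate, because the output side dominates the bill.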
You know I really liked the idea of Claude, and I even coded with it for a time. But holy shit, their business model is annoying as hell within the actual Claude app. Even as a paying customer I had a limit per chat. Meaning I’d get really far into a project, get a cutoff mid-project, and have to start a new conversation with zero context.
I use chats as projects. Funnel my entire limit into one chat for Pete’s sake.
yeah why shouldn't they give everyone the ability to code endlessly for $20 a month :) same with gas in cars - if you paid for a few liters and started your around-the-world trip, why do you have to refuel so many times, it should just work till the end of time :)
damn, people get used to incredible things so fast and feel so entitled.
@@jaxxedbytes that’s weird, I could’ve sworn I said “funnel my limit into one chat”, and not “I should get unlimited access”. ChatGPT 4o does this. It isn’t an unreasonable ask.
You really should use the composer with the agent option enabled in Cursor.
Much better for tasks like the one you did in the video.
If he uses Cursor in Composer mode, the bill for using o1 is higher
@darknessguy4221 that may be true. There is a good alternative btw. The google flash-2.0 models are free at the moment and can be used in cursor (google api key is needed). They seem very very powerful and ultra fast.
Kris, I do not have access either - maybe because we are from the EU.
i have the o1 model available in the models options. i am from germany
@@superlama6452 Woke regulators are going to keep you away from innovation. Bring the change.
@@multi_variate Define woke?
@ontheruntonowhere regulators who don't understand the tech but pretend to.
@@multi_variate Woke is a term used to describe heightened awareness and activism regarding social and racial justice issues, often emphasizing the importance of equity and inclusion. Nothing to do with tech. You're using it as a shorthand to diminish responsible governance, which is wrong because our 'woke' government has created the most powerful country the world has ever known, precisely because the wealthy and corporations must abide by regulations meant to protect the health and well-being of all of us.
Amazing
thats impressive
The composer agent would have been interesting
I think it would be better to use Sonnet on a regular basis and solve the harder problems with o1, otherwise your wallet is gonna be broken 😂😂
how much do I have to pay you guys to advertise about my AI framework ? Mine is the sh1t
You’re all doing AI coding completely wrong! 😂😂😂😂
Please, add arabic to audio track.
All AI is useless, good just for boilerplate.
If a model is trained on open source repos, then this is the type of AI you will get.
As a dev you will get what I'm saying if you're not just copy-pasting.
Just for boiler plate doesn't sound useless, boilerplates are very useful.
😂😂😂😂 wrong