This is a great video, and it could be made into a series of sorts.
Great Video. I NEVER trust AI for coding questions. I do use them to point me in a direction where I can search elsewhere to find an answer to a tough question, but I don't ever trust exactly what they give me.
... and cheers from Ottawa, from a NS Acadian!
That's how I treat it as well, o1 does seem to be quite a bit more accurate though.
I haven't tried any of these other AIs, but I used Llama on a whim to help me learn D3.js and it was immensely helpful. The docs made it really hard to find information and weren't very good at breaking down how things worked. Asking Llama (I don't remember if it was 70B or 405B) questions about things I just couldn't figure out really helped take some of the pain out of the process. I could ask questions about the answers it provided, and it felt a lot like a conversation with someone who knew and understood D3. There were a few instances where the solutions it provided didn't seem to work, but when I pointed that out, it did fix them. I thought that was pretty understandable, since I asked it questions about a project it had never seen, basing its answers only on the context I provided in my prompt, and D3 has changed a lot over the years, so it may have based some answers on material written about previous versions.
On question 6, around 17:40, Claude made up an answer. It selected "B) 1.5rem + 1.5vmin", but in the question, answer B is "B) 1.5rem + 1vmin". They are not the same. The AI made up a brand-new answer; how fun is that?
It's even funnier because the AI then goes on to explain that answer B is incorrect because it is 1.5rem + 1vmin
Kevin, thank you for producing the very educational content you publish. I have been a subscriber for some time, but this is my first comment. I am very interested in learning more about large design models (LDMs). Your expertise in reviewing LDMs would be very interesting and worthwhile.🎉
Yay, KPow had mentioned this video on his discord a while ago and I’ve been [im]patiently waiting 😂 Loving the long format again, and this is a very interesting comparison! I’ve experimented with Copilot, ChatGPT and Claude in terms of JavaScript refactoring (not as much CSS), and in my tests Claude came out on top; Copilot and ChatGPT were dismal and laughable. To be fair, a lot of this was in February or March, and these things change and learn so quickly that the same experiments would most likely yield very different results now in September 😂
The best free ad for Claude 😃
What's interesting with Claude is that one of its answers here said something like "as defined in the CSS specifications". I wonder if that means it was trained using the CSS specs as one of its data sources. That would also explain why it nailed things like the unit definitions.
I really like this! I keep telling my friends not to trust the AI stuff, but some of them insist on "learning" by asking the LLMs. I have always preferred the source materials - W3C and MDN are still the greatest resources on the web. That said, I wonder how Codeium stacks up? (I do use AI assistants in my editors, but I always double- and triple-check their code. Remember: never use someone else's code without understanding it yourself!)
Codeium uses GPT-4o, which is the same as Copilot. Not sure if the paid version uses something else, though
Thank you for this! It gave me the idea to ask Claude a technical question (not CSS, but Docker related) and it nailed it on the first try! That was not my experience previously with Copilot …
Great stuff. I use it most often as a syntax reminder or template generator for programming. Sometimes they work okay, but sometimes when I use GPT or Copilot they either don't understand the question at all or have to be guided in the right direction. IMO, in the case of CSS, these differences result from when the model's training was completed. For example, you can see a big difference between GPT-4 and 4o. By the way, it's time for me to check out Claude.
See this is a great topic to make a video about - hitting multiple relevant modern technologies at once 👏
At 7:45 you got the question wrong. You said, "as long as my selector has equal specificity, it will not work" but the text said "As long as my selector has equal specificity, it will work." That's why Gemini did it so terribly and flipped the specificity because you told it something that was not true and it thought that equal specificity was needed.
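For anyone following along, the rule at issue can be sketched in a few lines: when two matching selectors have equal specificity, the one that comes later in the stylesheet wins.

```css
/* Both selectors have a specificity of (0,1,0). */
.card { color: red; }
.highlight { color: blue; }

/* An element with class="card highlight" matches both rules;
   since specificity is equal, the later rule wins: blue. */
```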
Wow, your knowledge is remarkable!
Always Amazing Videos !
Kevin, This is a great video
Make a series on PDF development and email design.
You accidentally gave a point to Copilot instead of Gemini on question 9 :)
Judging by the overall performance of both, I don't think that matters that much :}
Finally, an AI model has appeared on this channel!! I also felt that Claude was good!
24:24 My guess why Copilot got this wrong is that, like screen readers used by the visually impaired, it read INSIDE as an initialism and didn't understand its meaning. You should have used text-transform:uppercase; PEBKAC 😜
Custom defined units would be kinda cool...didn't know I needed that until now 😂
You should try OpenAI o1; it takes more time to "think" before answering
Vertical media is defined by Wikipedia as trade magazines/journals.
Copilot is based on an older GPT model, would be good to also compare against the latest ChatGPT 4o. Also: LLMs can get confused by large amounts of extraneous chat context. It's better to ask new, unrelated questions in a new chat.
A lot of these fails come from not being strict enough, which in a normal case would be the correct approach. But in the world of software development, one should distinguish a typo from some weird token that could be a CSS unit or part of a new spec. Funny how they can pretend that 1vmin is the same as 1.5vmin, but at the same time don't even bother to ignore case when it comes to the q/Q unit. You could probably require them to perform strict syntax checking, but then you should also avoid errors like omitting NOT in the WordPress question.
I have never clicked on a notification so fast 😂 Great topic for a video!
Glad to hear that! Was very curious if people would be into this type of thing or not!
Great video Kevin! I recently read that Claude 3.5 Sonnet shipped with a lot of coding (and math) in mind. Would love to see a rematch when/if the others make an update. Regarding Q, afaik it is not really part of the metric system but seems to be a unit unique to the web?
Copilot is only trained up to December 2023, I read today, and that's GPT-4o or whatever it's called. Not sure which ChatGPT model Copilot is based on, but it's not that great. Claude has definitely been the best for coding since I've been using it, at least for Python, but it seems like for CSS too
The funny thing with these models is that they’re always, like, two years behind on web technologies. Like I cannot for the life of me get any LLM to touch subgrid with a 10 foot pole
Gemini got the first question wrong. It says that the rem in the media query would be relative to the html font size; in particular, it says that 1rem would be 32px in this case.
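For context, the spec behavior this comment is pointing at: rem in a media query condition is based on the browser's default font size (usually 16px), not on any font-size set on the html element. A minimal sketch:

```css
html { font-size: 200%; } /* 1rem inside the page computes to 32px */

/* The condition below still uses the browser default (16px),
   so it fires at 40 * 16px = 640px, not 1280px. */
@media (min-width: 40rem) {
  body { background: lightblue; }
}
```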
Claude it is. ❤ Can you publish those questions so we can try it?
Oh the struggles of trying to get LLMs to do anything correct in CSS. Not only is it terrible at CSS shapes and such, which is basically just math, but it keeps using rgba(0, 0, 0, 0.5) instead of just rgb(0 0 0 / 0.5) and other outdated approaches to things. I have completely given up on asking an LLM for CSS things outside of "Is there a way to do.." and using the response to figure it out myself.
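For reference, the two syntaxes this comment contrasts; both are valid CSS, the space-separated form is just the newer notation:

```css
.overlay {
  background: rgba(0, 0, 0, 0.5); /* legacy comma syntax */
  background: rgb(0 0 0 / 0.5);   /* modern space-separated syntax */
}
```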
Would the media query behavior change when using nesting?
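For what it's worth, nesting shouldn't change it: the media query condition is still evaluated against the browser default font size, not the font-size of the rule it is nested in. A sketch, assuming a browser with native CSS nesting support:

```css
.hero {
  padding: 1rem;

  /* Still fires at 40 * 16px = 640px even though it is nested. */
  @media (min-width: 40rem) {
    padding: 2rem;
  }
}
```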
Thx, I actually only tried it a few times on some simple CSS tasks, and it got them wrong every time, so I stopped using it for CSS; it's full of nonsense. For JS it looks like it can be more correct; however, it needs to be checked as well. I like using these tools for stupid questions; they can help to find the correct learning materials.
I'm curious how Perplexity would do in this challenge.
I love AI but when it comes to coding, it has lots of flaws.
I just asked ChatGPT about how to display a unit inside an input but it got it absolutely wrong. Even after I told ChatGPT, that it was wrong, the answer was still wrong :)
It’s great for asking short questions like: what’s the + selector doing?
The question would be: can an AI (in the future) understand new CSS features and apply them correctly?
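For readers wondering about the `+` example mentioned above: it is the adjacent-sibling combinator.

```css
/* Selects a <p> that immediately follows an <h2>
   at the same level in the document. */
h2 + p { margin-top: 0; }
```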
The problem with CSS specifically is that nothing really lives in isolation. Context is key, but even giving it access to your entire project, I don't see a time where it'll properly infer all the different things that are going on.
Like you said, simple things it might be able to explain, but I mean, two of the three had no idea how specificity of simple selectors worked, so I have my doubts there as well, lol.
@@KevinPowell Yeah, and that makes LLMs really bad with CSS in general, but a little bit better with Tailwind, because it is more isolated
17:40
Not sure if you should count that as correct.
It said B, which was wrong, but it also rewrote the answer text to be correct (adding .5 to the vmin).
IMO that should be half a point. Inventing an answer not in the options is basically cheating, and if it weren't for you knowing the answers going into the test, it could very easily have convinced you that D was wrong and B was right, even though its explanation actually describes why D was right and B was wrong.
At about 3 minutes you use the phrase "Maybe it knows how it works but got the explanation wrong." This is a mental mistake. LLMs don't know how anything _works._ That isn't how they function. Your earlier assessment that they regurgitate the general consensus of the internet is more correct. But even then, that's not how they work. They are big autocompletes. They merely predict the most likely next words, right or wrong, nonsense or sense. And they are not consistent.
Every time I see people testing LLMs like you are, I see them asking _once_ and that's not how these tests should work. That's not how you should use LLMs, either. You should be asking multiple times because it WILL give different answers each time.
Absolutely agree.
I always hate it when people use words like _intelligence_ and _think_ about AIs. They're advanced calculators, nothing else. The problem here seems to be that because the algorithms are too advanced for most humans to understand, people assume they must be conscious. No. A calculator is a calculator, no matter if it solves 2+2 or advanced algebra. I could also say, well, it is magic to me how the calculator computes 2+2 and gets 4, therefore the calculator must be conscious.
Also, html isn't the root of everything; it's the document.
Instead of html, I could have just used svg or mathml.
what font is he using at 00:03 ??
After YouTube's horrific price increase, many people are talking about leaving the platform. Please share if your content is available somewhere else--otherwise, I'm going to miss the channel very much ❤️!
I'm confused, are you paying for this?
Oh, I clicked on the video within 1 min of it being published
Cool kevin powell gave me a heart on my comment :)
Is there a particular reason you didn't include the very popular chat-gpt? (Sorry I just kind of watched only the first 5 minutes due to time constraint...)
@kevin powel, I have been working as a CSS developer for 12 years.
Recently my company laid me off due to insufficient projects.
I am actively looking for a role where I can fit. Kindly help me.
4 one
Wow
!important is !correct... in most programming languages, putting an exclamation mark in front makes it a NOT operator...
I can't tell if you're trying to tell us that important isn't correct or isn't incorrect
@@daedaluxe It is really !important what I am trying to say. ;)
@@DxBang3D It's not important, got it
It's one of the several mistakes the working group has listed, but can't change it now 😊
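For anyone confused by this thread: in CSS, `!important` is a declaration flag, not negation; actual negation lives in the `:not()` selector.

```css
p { color: red !important; }  /* the ! is not a NOT operator here */
p:not(.muted) { color: red; } /* this is how CSS expresses negation */
```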
32*35
copilot and gemini? why not chatgpt and llama?
Copilot uses the same model as ChatGPT. As for Llama, I could have... Maybe next time?
@@KevinPowell I feel like the responses from Copilot are different though too, idk what Microsoftness they add to it. I would assume raw ChatGPT would be more optimal, but I could be wrong. And Claude! Claude has to be a contender too. I never hear about Gemini or Copilot for coding 🌚 js
I'll stick to W3Schools and Stack Overflow
It's not "clode" lol
I dont have time to watch the video, but i just want to say that ai is terrible at css because it's so visual and implicit unlike programming languages which are based on logic.
openai leaping in to invalidate this video with the new model mere days before release
O1 probably would get most of them correct
discord communities ftw
This is why I don't use these AI chatbots...
AI is amazing at backend, but not so much at frontend, mostly due to the fact that it has a brain but lacks eyes 🙈