I gave three AI models a CSS quiz

Kevin Powell

14K views

Published: Nov 9, 2024

Comments • 77

@only_._gaming 1 month ago ⁺¹⁹
This is a great video, and could be made into a series of sorts.
@tgd-613 1 month ago ⁺¹⁶
Great Video. I NEVER trust AI for coding questions. I do use them to point me in a direction where I can search elsewhere to find an answer to a tough question, but I don't ever trust exactly what they give me.
@tgd-613 1 month ago
... and cheers from Ottawa, from a NS Acadian!
@spaceowl5957 1 month ago
That's how I treat it as well, o1 does seem to be quite a bit more accurate though.
@kingoffongpei 1 month ago ⁺²
I haven't tried any of these other AIs, but I used llama on a whim to help me learn D3.js and it was immensely helpful. The docs were really hard to find information with and not very good at breaking down how it worked. Asking llama (idr if it was 70B or 450B) questions about things I just couldn't figure out really helped take some of the pain out of the process. I could ask questions about the answers it provided and it felt a lot like a conversation with someone who knew and understood D3. There were a few instances where the solutions it provided didn't seem to work but when I pointed it out, it did fix it. I thought that was pretty understandable since I asked it questions about a project that it never saw, only basing it on the context I provided in my prompt, and D3 has changed a lot over the years so it may have based some answers on stuff on the internet written about previous versions.
@mkLee970 1 month ago ⁺¹¹
On Question 6, around 17:40, Claude made up an answer. It selected "B) 1.5rem + 1.5vmin", but in the question, answer B is "B) 1.5rem + 1vmin". They are not the same. The AI made up a new answer. How fun is that?
@FenrirRobu 1 month ago ⁺²
It's even funnier because the AI then goes on to explain that answer B is incorrect because it is 1.5rem + 1vmin
@chriswalker4636 1 month ago ⁺²
Kevin, thank you for producing the very educational content you publish. I have been a subscriber for some time but this is my first comment. I am very interested in learning more about large design models (LDMs). Your expertise in reviewing LDMs would be very interesting and worthwhile.🎉
@clevermissfox 1 month ago ⁺¹
Yay, KPow had mentioned this video on his Discord a while ago and I’ve been [im]patiently waiting 😂 Loving the long format again, and this is a very interesting comparison! I’ve experimented with Copilot, ChatGPT and Claude in terms of JavaScript refactoring, not as much CSS, and in my tests Claude came out on top; Copilot and ChatGPT were dismal and laughable. To be fair, a lot of this was in February or March and these things change and learn so quickly, the same experiments would most likely glean very different results now in September 😂
@HITO-nv4cg 1 month ago ⁺⁴
The best free ad for Claude 😃
@msclrhd 1 month ago ⁺²
What's interesting with Claude is that one of its answers here said something like "as defined in the CSS specifications". I wonder if that means that it was trained using the CSS specs as one of its data sources. That would also explain why it nailed things like the unit definitions.
@SkamanSamTyler 1 month ago ⁺¹
I really like this! I keep telling my friends not to trust the AI stuff, but some of them insist on "learning" by asking the LLMs. I have always preferred the source materials - W3C and MDN are still the greatest resources on the web. That said, I wonder how Codeium stacks up? (I do use AI assistants in my editors, but I always double-triple check their code. Remember: never use someone else's code without understanding it yourself!)
@KevinPowell 1 month ago
Codium uses GPT-4o, which is the same as copilot. Not sure if the paid version uses something else though
@franckdervaux792 1 month ago
Thank you for this! It gave me the idea to ask Claude about a technical question (not CSS but Docker related) and it nailed it on the first try! That was not my experience previously with Copilot…
@Deadgray 1 month ago
Great stuff. I use it most often as a syntax reminder or template generator for programming. Although sometimes they work okay, sometimes when I use GPT or Copilot they either don't understand the question at all or have to be guided in the right direction. IMO in the case of CSS these differences result from when the model training was completed. For example, you can see a big difference between GPT-4 and 4o. By the way, it's time for me to check out Claude.
@anthonybarnes 1 month ago
See, this is a great topic to make a video about - hitting multiple relevant modern technologies at once 👏
@321sas 1 month ago ⁺⁵
At 7:45 you got the question wrong. You said, "as long as my selector has equal specificity, it will not work" but the text said "As long as my selector has equal specificity, it will work." That's why Gemini did so terribly and flipped the specificity: you told it something that was not true, and it thought that equal specificity was needed.
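For reference, the general cascade rule this comment touches on (not the quiz question itself, which isn't reproduced here): among declarations of the same origin and importance, higher specificity wins, and at equal specificity the one that appears later in the source wins. A minimal Python sketch of that tie-break, with made-up selectors and hand-written specificity tuples:

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    selector: str                      # shown only for readability
    specificity: Tuple[int, int, int]  # (ids, classes, types), filled in by hand
    order: int                         # position in the stylesheet, later = larger
    value: str


def winning_rule(rules: List[Rule]) -> Rule:
    # Compare specificity first; source order only breaks ties.
    return max(rules, key=lambda r: (r.specificity, r.order))


rules = [
    Rule(".card .title", (0, 2, 0), order=1, value="red"),
    Rule(".post .title", (0, 2, 0), order=2, value="blue"),  # same specificity, later
    Rule("h2", (0, 0, 1), order=3, value="green"),           # latest, but weaker
]

winner = winning_rule(rules)
print(winner.selector, winner.value)  # .post .title blue
```

The sketch deliberately ignores origin, `!important`, and layers; it only models the specificity-then-order step.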
@compton8301 1 month ago
Wow, your knowledge is remarkable!
@Nova_BG 1 month ago ⁺¹
Always amazing videos!
@SavanSanandiya-p5y 1 month ago
Kevin, this is a great video.
Make a series on PDF development and email design.
@CodingwithNephi-c6r 1 month ago ⁺¹⁴
You accidentally gave a point to Copilot instead of Gemini on question 9 :)
@myartikool 1 month ago ⁺³
Judging by the overall performance of both, I don't think that matters that much :}
@다루루 1 month ago
드디어 이 채널에도 AI model 이 나왔다!! 저도 Claude가 좋다고 느꼈습니다!
Finally, an AI model has appeared on this channel!! I also felt that Claude was good!
@sandy_knight 1 month ago
24:24 My guess why Copilot got this wrong is, like screen readers used by the visually impaired, it read INSIDE as an initialisation and didn't understand its meaning. You should have used text-transform:uppercase; PEBKAC 😜
@joshuamitchell6204 1 month ago
Custom defined units would be kinda cool...didn't know I needed that until now 😂
@viccc.n 1 month ago
You should try with OpenAI o1, it takes more time for "thinking" before answering
@daveturnbull7221 1 month ago
Vertical Media is defined by Wikipedia as Trade Magazines/Journals.
@VaebnKenh 1 month ago
Copilot is based on an older GPT model, would be good to also compare against the latest ChatGPT 4o. Also: LLMs can get confused by large amounts of extraneous chat context. It's better to ask new, unrelated questions in a new chat.
@mendoso 1 month ago
A lot of these fails come from not being strict enough, which in a normal case would be the correct approach. But in the world of software development one should distinguish a typo from some weird token which could be a CSS unit or part of a new spec. Funny how they can pretend that 1vmin is the same as 1.5vmin but at the same time do not even bother to ignore case when it comes to the unit q/Q. You could probably require them to perform strict syntax checking, but then you should also avoid errors like omitting NOT in the WordPress question.
@albedesigns 1 month ago
I have never clicked on a notification so fast 😂 Great topic for a video!
@KevinPowell 1 month ago
Glad to hear that! Was very curious if people would be into this type of thing or not!
@PeterWarholm 1 month ago
Great video Kevin! Recently read that Claude 3.5 Sonnet had shipped with a lot of coding (and math) in mind. Would love to see a rematch when/if the others make an update. Regarding Q, afaik it is not really part of the metric system but seems to be a unique unit for the web?
@SianDoherty 1 month ago
Copilot is only trained to December 2023, I read today, and that's GPT-4o or whatever it's called. Not sure which ChatGPT model Copilot is based on, but it's not that great. Claude has definitely been the best for coding since I've been using it, at least for Python, but it seems like for CSS too.
@samhenrigold 1 month ago
The funny thing with these models is that they’re always, like, two years behind on web technologies. Like I cannot for the life of me get any LLM to touch subgrid with a 10 foot pole
@weisj 1 month ago
Gemini got the first questions wrong. It says that the rem in the media query would be relative to the html font size. In particular it says that 1rem would be 32px in this case.
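The behaviour at issue: relative units such as rem inside a media query are resolved against the initial font size (the browser or user default, commonly 16px), not against a font-size declared on html in the stylesheet. A small Python sketch of the arithmetic; the 32px root size mirrors the figure in this comment, and 16px is an assumed, unchanged browser default:

```python
ASSUMED_BROWSER_DEFAULT_PX = 16.0  # assumption: the user has not changed the default


def rem_in_media_query_px(rem: float, initial_px: float = ASSUMED_BROWSER_DEFAULT_PX) -> float:
    # Media queries ignore author styles, so rem uses the initial font size.
    return rem * initial_px


def rem_in_declaration_px(rem: float, root_font_size_px: float) -> float:
    # In ordinary declarations, rem follows the root element's computed font size.
    return rem * root_font_size_px


root_px = 32.0  # e.g. html { font-size: 32px; }
print(rem_in_media_query_px(1))           # 16.0
print(rem_in_declaration_px(1, root_px))  # 32.0
```

So with a 32px root, `1rem` in a declaration is 32px, while `1rem` in a media query condition still resolves against the default.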
@darwinmanalo5436 1 month ago
Claude it is. ❤ Can you publish those questions so we can try it?
@nustaniel 1 month ago
Oh the struggles of trying to get LLMs to do anything correctly in CSS. Not only is it terrible at CSS shapes and such, which is basically just math, but it keeps using rgba(0, 0, 0, 0.5) instead of just rgb(0 0 0 / 0.5) and other outdated approaches to things. I have completely given up on asking an LLM for CSS things outside of "Is there a way to do.." and using the response to figure it out myself.
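Both notations mentioned here describe the same colour; the space-separated form with a slash before the alpha is the newer syntax. As an illustration only, here is a hypothetical Python helper (not part of any library) that rewrites legacy `rgba()` strings with integer channels into the modern form:

```python
import re

# Matches legacy comma-separated rgba() with integer channels and a numeric alpha.
_LEGACY_RGBA = re.compile(
    r"rgba\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*([\d.]+)\s*\)"
)


def modernize_rgba(css_value: str) -> str:
    # Rewrite each legacy rgba(r, g, b, a) occurrence as rgb(r g b / a).
    return _LEGACY_RGBA.sub(r"rgb(\1 \2 \3 / \4)", css_value)


print(modernize_rgba("rgba(0, 0, 0, 0.5)"))  # rgb(0 0 0 / 0.5)
```

It does not handle percentage channels or percentage alpha; those are left untouched.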
@TheThirdWorldCitizen 1 month ago
Would the media query behavior change when using nesting?
@ElectricKota 1 month ago
Thx, I've actually only tried it several times on some simple CSS tasks, and it got it wrong every time, so I stopped using it for CSS. Full of nonsense. For JS it looks like it can be more correct, however, it needs to be checked as well. I like using these tools for stupid questions; they can help to find the correct learning materials.
@denisds130 1 month ago
I'm curious how Perplexity would do in this challenge.
@xilliman 1 month ago ⁺³
I love AI but when it comes to coding, it has lots of flaws.
I just asked ChatGPT about how to display a unit inside an input but it got it absolutely wrong. Even after I told ChatGPT, that it was wrong, the answer was still wrong :)
It’s great for asking short questions like: what’s the + selector doing?
The question would be: can an AI (in the future) understand new CSS features and apply them correctly?
@KevinPowell 1 month ago ⁺⁵
The problem with CSS specifically is that nothing really lives in isolation. Context is key, but even giving it access to your entire project, I don't see a time where it'll properly infer all the different things that are going on.
Like you said, simple things it might be able to explain, but I mean, two of the three had no idea how specificity of simple selectors worked, so I have my doubts there as well, lol.
@andreilucasgoncalves1416 1 month ago ⁺¹
@KevinPowell Yeah, and because of that LLMs are really bad with CSS in general, but a little bit better with Tailwind because it is more isolated
@scragar 1 month ago
17:40
Not sure if you should count that as correct.
It said B, which was wrong, but it also rewrote the answer text to be correct (adding .5 to the vmin).
IMO that should be half a point. Inventing an answer not in the options is basically cheating, and if it wasn't for you knowing the answers going into the test, it could very easily have convinced you D was wrong and B was right, using the explanation as to why D was right and B was wrong.
@DampeS8N 1 month ago ⁺¹²
At about 3 minutes you use the phrase "Maybe it knows how it works but got the explanation wrong." This is a mental mistake. LLMs don't know how anything _works._ That isn't how they function. Your earlier assessment that they regurgitate the general consensus of the internet is more correct. But even then, that's not how they work. They are big autocompletes. They merely predict the most likely next words, right or wrong. Nonsense or sense. And they are not consistent.
Every time I see people testing LLMs like you are, I see them asking _once_ and that's not how these tests should work. That's not how you should use LLMs, either. You should be asking multiple times because it WILL give different answers each time.
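One way to act on the "ask multiple times" advice is to repeat the same question and see how often each answer comes back. The sketch below is an assumption about how that could look, not something shown in the video: `ask` is a hypothetical callable standing in for whatever model client is used, and the stub here simply rotates through canned answers.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def tally_answers(ask: Callable[[str], str], question: str, runs: int = 5) -> Counter:
    # Ask the same question several times and count each distinct answer.
    return Counter(ask(question) for _ in range(runs))


# Stand-in for a real model call: returns canned answers in rotation.
_canned = cycle(["B", "D", "D", "B", "D"])


def fake_ask(question: str) -> str:
    return next(_canned)


counts = tally_answers(fake_ask, "Which option is correct?", runs=5)
answer, votes = counts.most_common(1)[0]
print(answer, votes)  # D 3
```

A split tally like this one is a cue to check the spec or MDN rather than trust any single reply.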
@Killyspudful 1 month ago ⁺²
Absolutely agree.
@AntiAtheismIsUnstoppable 1 month ago ⁺²
I always hate it when people use words like _intelligence_ and _think_ about AIs. They're advanced calculators, nothing else. The problem here seems to be that, because the algorithms are too advanced for most humans to understand, people assume it must be conscious. No. A calculator is a calculator, no matter if it solves 2+2 or advanced algebra. I could also say, well, it is magic to me how the calculator calculates 2+2 and gets 4, therefore the calculator must be conscious.
@nomadshiba 1 month ago
Also, HTML isn't the root of everything, it's the document.
Instead of HTML I could have just used SVG or MathML.
@icepuddin168 1 month ago
what font is he using at 00:03 ??
@rujor 1 month ago
After YouTube's horrific price increase, many people are talking about leaving the platform. Please share if your content is available somewhere else; otherwise, I'm going to miss the channel very much ❤️!
@bob-p7x6j 1 month ago
I'm confused, are you paying for this?
@kaslmineer7999 1 month ago
Oh, I clicked on the video 1 min after it was published
@kaslmineer7999 1 month ago
Cool, Kevin Powell gave me a heart on my comment :)
@TimeFlyBy 1 month ago
Is there a particular reason you didn't include the very popular ChatGPT? (Sorry, I only watched the first 5 minutes due to time constraints...)
@karthicc7298 1 month ago
@Kevin Powell, I have been working as a CSS developer for 12 years.
Recently my company laid me off due to insufficient projects.
I am actively looking for a role where I can fit. Kindly help me.
@webschool4780 1 month ago
4 one
@shyamfx 1 month ago
Wow
@DxBang3D 1 month ago ⁺⁵
!important is !correct... in most programming languages, putting an exclamation mark in front makes it a NOT operator...
@daedaluxe 1 month ago ⁺²
I can't tell if you're trying to tell us that important isn't correct or isn't incorrect
@DxBang3D 1 month ago
@daedaluxe It is really !important what I am trying to say. ;)
@daedaluxe 1 month ago
@DxBang3D It's not important, got it
@KevinPowell 1 month ago ⁺¹
It's one of the several mistakes the working group has listed, but can't change it now 😊
@fatema8eee 1 month ago
32*35
@Dekutard 1 month ago
copilot and gemini? why not chatgpt and llama?
@KevinPowell 1 month ago
Copilot uses the same model as ChatGPT. As for Llama, I could have... Maybe next time?
@Dekutard 1 month ago
@KevinPowell I feel like the responses from Copilot are different though too, idk what Microsoftness they add to it. I would assume raw ChatGPT would be more optimal but I could be wrong. And Claude! Claude has to be a contender too. I never hear about Gemini or Copilot for coding 🌚 js
@a1white 1 month ago ⁺³
I'll stick to W3Schools and Stack Overflow
@chychywoohoo 1 month ago
It's not "clode" lol
@ibrahimharchiche3590 1 month ago
I don't have time to watch the video, but I just want to say that AI is terrible at CSS because it's so visual and implicit, unlike programming languages, which are based on logic.
@st8113 1 month ago ⁺¹
OpenAI leaping in to invalidate this video with the new model mere days before release
@andreilucasgoncalves1416 1 month ago
o1 probably would get most of them correct
@5alidshammout 1 month ago
discord communities ftw
@MakoSDV 1 month ago
This is why I don't use these AI chatbots...
@samtastic24 1 month ago ⁺¹
AI is amazing at backend, but not so much at frontend, mostly due to the fact that it has a brain but lacks eyes 🙈
