This is LTT using their higher budget compared to most tech channels for something genuinely useful. This is really good to see! Well done LTT, this deserves genuine praise.
@@zypher5876 yeah, true. But Linus has a far greater scale than Gamers Nexus as of now, reviewing everything from CPUs and GPUs to laptops, so it's cool to see that level of standardization on a large scale to make sure the testing is consistent.
The issue is all of these reviewers are doing the factorio benchmark wrong. The game has a 60 tps cap as that's what the engine runs at. What needs to be tested are heavier maps that push the cpu BELOW 60. These above 60 tps benchmarks are as good as a synthetic benchmark in that they're not realistic. And yes, results absolutely do change and so does the cpu hierarchy.
As a data scientist, I absolutely love this video and greatly appreciate this perspective. The silicon lottery is absolutely real, and a sample size of one is far from sufficient. Glad to see someone doing more in-depth analysis. I don't expect the methodology of LTT Labs to dive into frequentist vs bayesian stats, but it would be interesting to use public benchmarks (for an incredibly nerdy perspective/audience) as prior distributions and see how the results differ. Regardless of the depth of it, just seeing more stats in hardware/software benchmarks is a fantastic breath of fresh air. Keep up the good work!
The sample is real; for the end consumer the variation will be even larger, but they'll blame the other components, like the amount of RAM, the SSD, or other factors. AMD only guarantees the clock speed under certain conditions, and at home the ambient temperature also adds noise to the data. The truth is that there's no third party enforcing a legal standard, because it's complex to verify that products are identical.
Agreed! Would also be interested in seeing a bayesian analysis with public benchmarks as a prior (though I'm also one of those nerds who would love more of this kind of stuff). I also appreciate how they detailed their similarity metric of using euclidean distance, along with explaining what that looks like in a higher-dimensional space for those not accustomed to seeing it used that way since it's such a common approach for any sort of clustering problem.
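For anyone curious what that similarity metric looks like in code, here's a minimal sketch (the benchmark names and numbers are invented for illustration, not LTT Labs data): each CPU becomes a point with one axis per benchmark, and the straight-line distance between points is the similarity score.

```python
import math

# Hypothetical scores: one axis per benchmark (real pipelines would
# normalize each axis first so no single test dominates the distance).
chip_a = [142.0, 98.5, 61.2]   # e.g. avg FPS in three games
chip_b = [139.5, 99.1, 60.8]

def euclidean_distance(p, q):
    # Straight-line distance in N-dimensional space
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance(chip_a, chip_b))  # smaller = more similar chips
```

The same function works unchanged whether you test 3 benchmarks or 30; only the dimensionality grows.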
LTT should hire a data team to take care of all this data. It would be great to have it properly stored and analyzed, and maybe you guys could get into testing the neural network performance of GPUs too. I find it hard to find data on how well GPUs run AI tasks.
@@diegoberan7883 I second the request for GPU benchmarks for AI tasks. Looking at some of what Puget Systems has done for their AI/ML benchmarks would probably give a good starting point for the LTT team.
Looks like the transparency we wanted. You clearly stated the challenges, how you worked around them and the variables you can't really control. It was very interesting, at least to me. Good job guys!
Incredible: leveraging YouTube videos to finance an independent lab to perform tests on all kinds of tech hardware, and then making vids showing and explaining all of the data collected, which in turn can finance the next round of tests. Bangin job LTT, I love it.
Yeah, they have the equipment but no clue how to use it properly, as shown in this video. They pulled 11 tray CPUs, probably from the same tray. Does not reflect the consumer experience at all. Of course enterprise hardware will be more consistent, even if it is going in a consumer product. Imagine if a company like HP had people doing side by side tests of the same model and getting a 20% variance in benchmark scores... In other words, the major manufacturers of complete systems get the best and what goes in the consumer boxes is like a random grab bag of everything else that wasn't good enough for HP or Dell. This is why I benchmark test new CPUs and GPUs and have and will send them back solely for unreasonably low benchmark scores. If everyone did that, it would be a completely different market and the variance would become statistically insignificant for everyone, not just HP, Dell and LTT. Oh, and don't think you can get around it buying a tray CPU from AliExpress. Those will be bottom of the barrel, just consistently so. You're a dreamer if you think that was some leftover from HP's order. Those were ordered by some no name company with the name produced by randomly mashing a few keys on the keyboard and were binned and priced accordingly. I'm sure you can buy 5600X CPUs there all day long that are really a 5500. And it is AMD that is the one running the scam there.
@@Lurch-Bot Did you miss the part where 11 of the CPUs came from 11 different stores across 3 countries (Canada, USA, and what looked like the UK)? These 12 chips (including LTT's pre-launch test sample) are going to be from different batches across the 6-month window they were collected in, which will emphasize the differences between wafers, let alone between chips on the same wafer.
@@Lurch-Bot 22 seconds from the start of the video, that's how long it took for them to tell you how the CPUs were acquired, and you somehow still managed to make this comment. Wow.
@@Lurch-Bot You're wrong on many points. 1) They're not from the same tray; they're from various sources, even various countries. 2) Companies that get AMD products are more likely to remedy this by restricting performance rather than only getting 'the good stuff'. You're talking about companies that often don't enable XMP, only use one stick of RAM on dual-channel platforms, AND often use RAM that is slower than what even the CPU manufacturer recommends (although that last point has gotten better in the last few years). Don't believe me? Just see anyone looking into this (LTT, Gamers Nexus, David does Tech Stuff, etc). Not to mention insufficient cooling, which can leave an i9 performing worse than an i7. Consumer level stuff is absolute trash from companies like HP and Dell. Workstation level it's just okay; server level you're actually getting some sampling/additional testing/verification, but you're paying through the nose for it. 3) It's within your right to send things back if you don't like them, but you're basically finding your own golden sample that way; if everyone started doing that, the result would not be what you suggest. Instead they'd be forced to lock things down to the absolute minimum performance any chip can do. 4) There really isn't any sampling as deep as you think there is. Good chips go towards higher end products, bad chips go towards lower end products. But lower end is higher volume, so high quality chips can end up there just because there isn't enough demand for the high end. (They might even do a rough pre-selection and only select the chips originating from the middle of a wafer for the high end, low volume stuff to avoid having to test the rest in depth. The middle of the wafer has a much better chance of churning out high quality chips.) 5) AMD isn't running any scams, and AliExpress CPUs can be of just as fine quality as anything else; nor are tray CPUs any worse or better on average from any testing I've ever seen done.
I never realized how much work goes into ensuring consistency in benchmark testing. It's worrying to think about the lack of oversight in computer hardware compared to other industries and the issues that could ensue.
As nice as it'd be, it kind of admittedly makes sense to an extent - Cars aren't tested and regulated because they're expensive, they're tested and regulated because they can kill people
Genuine praise to Linus for taking the time to address this, and how confusing and weird these companies are becoming when launching their CPUs and GPUs in today's market
@@Blast_HardCheese I think it's always been this bad, but the manufacturers just locked down the maximum potential performance so it won't vary once it reaches the end consumer (so your typical consumer won't feel aggrieved because of it), and you could always unlock it through overclocking. But nowadays they use a different, sophisticated method which almost makes overclocking obsolete and is already unlocked for the end consumer, so we can see the varied results between chips.
@@Blast_HardCheese It's not that bad, all they had to do was find the maximum frequency that all 3 CPUs could run at and fix them at that speed. Then the testing should result in minimal difference. For customers, you should be glad if you get a golden sample and don't care if you didn't because you got what you paid for which are the guaranteed speeds. Everything over the guaranteed speeds is a bonus that you may or may not get.
Disagree, it's more like: it's crazy how good the production process for such delicate tech has become that there's so little deviation and the results are so insanely close!
18:28 To answer your question, Linus: all regulation is written in blood. The reason we have car safety ratings is because before we did, corporations were happy to "self-regulate" and sell cars that killed people. Regulation tends to only come in the critical aftermath of such loss.
i love the fact that you come out and say "yes, our tests are inconsistent, and here's why" in such a fascinating, entertaining fashion. there's a reason i always watch every LTT video that pops up in my feed.
@@lilcheaty In that case let's throw all reviewers' performance results out the window because of a few percent variance. I do agree with you though, there is a huge difference in GN and LTT testing, and that difference is GN can at least be trusted with their numbers but LTT cannot. "Look guys, there is a variance so it's not our fault we put out shonky reviews and fudged numbers". Silicon lottery has always been a thing, but of course Loonus spins this to obfuscate the fact their testing methodology is bogus at best. Wanna see a good correction video? HUB's latest video where Steve apologised for his mistakes and moved on. LTT: "Silicon lottery cause number to go boom".
@@GsrItalia because Gamers Nexus has, to the best of my knowledge, never come out and explicitly confirmed their testing method, and if they have, it's not even comparable to this. No one else in the industry is doing what LTT is doing because they don't have the money to just buy 11 top-of-the-line CPUs and at least 10 4090s just for data accuracy reasons. This isn't necessarily a knock against anyone else on this platform; they just don't have the money to do this
Well done. Possibly the most thorough and well-explained video on the effects* of the Silicon Lottery I've ever seen. *Not covering much on the causes, or it would've been a two hour video! In fact, can we have an updated video on that please?
It's a 1 minute explanation: Each transistor within the silicon will be electrically different. Differences in needed gate voltage (what vCore is) for current to pass, differences in leakage (the amount of current loss through electron migration, aka heat), differences in output capacitance (the inevitable result of having current paths in parallel), etc. With these differences, we have a different required voltage to drive the transistor to operate properly given a clock speed, different heat output given a voltage and different power usage given both. Each modern clockspeed boosting algo for Intel and AMD takes these into account to be both stable and maximize performance (aka, clockspeed) given the current load on the CPU. So with all of that in mind, you can see why CPU A and CPU B will perform slightly differently given the same workload. Their clockspeeds will differ because of power, voltage and temperature differences down to a transistor level.
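As a toy illustration of the above (every constant here is invented; this is not Intel's or AMD's actual boost algorithm): leakier silicon needs more vCore at a given clock, power grows roughly with voltage squared times frequency, and the boost logic stops raising the clock at the power limit, so each chip lands on a slightly different frequency.

```python
import random

random.seed(1)

def max_boost_clock(chip_quality, power_limit=120.0):
    # Toy model: chip_quality > 1.0 means leakier silicon that needs
    # more voltage per GHz, so it hits the power limit at a lower clock.
    clock = 3.0
    while clock < 6.0:
        vcore = 0.9 + 0.08 * clock * chip_quality   # made-up V/f curve
        power = vcore ** 2 * clock * 12.0            # ~ V^2 * f scaling
        if power > power_limit:
            break
        clock += 0.025
    return round(clock, 3)

# Five "identical" SKUs drawn from the silicon lottery boost differently
chips = [max_boost_clock(random.uniform(0.95, 1.05)) for _ in range(5)]
```

The spread in `chips` is the comment-section version of why two copies of the same CPU bench a few percent apart.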
@@winebartender6653 Your 1 minute explanation assumes a fairly deep level of knowledge about how a CPU works, and seeing as LTT's target audience is not scientists and engineers where those kinds of assumptions could be made, your 1 minute explanation will likely turn into a five or 10 minute explanation even if they were being insanely information dense. At the average information density of an LTT video, I would say the hour-long prediction is not unreasonable given all the other stuff they would have to cover to make it a complete video
@@the_undead ? Absolutely not. A simple preface of "A CPU is made up of billions, or even trillions, of transistors, which are essentially tiny electrical switches". You don't need to know anything about voltage, current, capacitance or anything else that you think is a low level concept. The point of the 1 minute explanation is to touch on the fact a transistor will always be different when made in a CPU, that's it. You could absolutely expand it out, but it is unnecessary to get the point across
1 minute explanation is that each chip will not ever be the same as another one. Kind of like a snowflake. They can not be perfect and therefore will have variance. They will meet minimum specs and that is all that is guaranteed out of a good chip. some may be better than others and can therefore be overclocked easier and higher.
@@winebartender6653 This is 100% true..that said, have you seen Intel's new 200 series architecture? They're doing some very interesting things with how voltage and power is delivered to the CPU. There's options to deliver different vcore to a single core along with the ability to "overclock" a single core while other cores and clock speeds can remain lower. It's really fascinating what they're doing even if the CPU itself is behind AMD. Der8auer did a great video explaining this and how it's going to change the way we tweak our systems.
I'm glad to see linus and his team working towards doing better, especially after that whole thing where they were admitting lack of accuracy and that they failed their community. Seeing them be so passionate about this really brings a smile to my face.
You say that like they're the only ones that don't have perfect accuracy. You say that like putting in tens of thousands of dollars in equipment and man-hours is something they can just do overnight. You say that like Gamers Nexus, at the time he made that disgusting, disingenuous video, was in the habit of retesting products to use that data in comparison charts for new product launches. But no, none of those things are true, and I still don't think Gamers Nexus retests. Although if you can provide me with real evidence that he does retest stuff, I will happily change this opinion
GN doesn't have the money to do real hardware testing at any kind of scale anymore. Steve himself admits that they put a huge deal of money into the "investigative journalism" they do, and for most of his sources we just have to take his word for it because "they can't be revealed", which is fine. It's great that he focuses hard on shit companies doing shit things. The recent EKWB crap: yes, they are garbage, yes, the management is terrible and they deserve everything coming. But be honest, the only reason his tech news is successful is because it gives people someone/something to hate. Also, the one thing in common with his investigative videos? He always contacts the company for comments prior to the video even being released. Did he do that with LTT? No. He took what he could from LTT's videos and forums and put them on blast for people to hate, gaining him views. The world today thrives on hatred, and in some ways Steve is using that for his own gains. So yeah, everyone on the internet, LTT or GN, plays on people's emotions and thoughts to get your views. None of them are innocent, but also none are guilty. If LTT failed, so did GN. One for testing data, the other for using hate to garner views.
you can maybe work around the inconsistencies in red dead 2 by using cheat engine with the unrandomizer, it forces the random number generator to always return the same values, so you could end up with a deterministic benchmark run :)
I came down here to check to see if someone had suggested similar. As a former game dev (not for RDR2): This will have a minor impact on performance of games that utilize system random number generators especially (because of the syscall overhead), but importantly the result should be changed the same (incredibly tiny) amount. In exchange, you will get drastically more consistent runs from the behaviors in game. We actually had a debug build that replaced all the calls to random things like this with static values, so that we could apply internal test harnesses, especially while building our tutorials. I am not intimately familiar with how Cheat Engine goes about derandomizing, but all the sane ways I can think of it doing it give you more in repeatability than you lose in cost to run it, or in deviance from player system behavior. (You're simply selecting one player system behavior and repeating it)
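A tiny sketch of the derandomizing idea (purely illustrative; this is not how Cheat Engine or RDR2 actually work): the RNG call, and its overhead, still happens, but the returned value is pinned so every run makes identical decisions.

```python
import random

DETERMINISTIC = True  # flip to False to restore normal behavior

def game_random():
    value = random.random()  # the call (and its cost) still executes
    if DETERMINISTIC:
        return 0.5           # pinned value -> repeatable benchmark runs
    return value

# Two "runs" now take identical branches, so the workload repeats exactly
run_1 = [game_random() for _ in range(5)]
run_2 = [game_random() for _ in range(5)]
```

Note the random number is still generated and then discarded, which is roughly the "keep the load, drop the randomness" approach.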
@@Deveyus this may be a stupid question, but would the (albeit minor) performance difference be mitigated if you leave in the random number generation but just don’t use it and use the static value instead (therefore still creating the “load” of generating a random number while also keeping your consistency)
I am not sure if a result gathered with the default randomized behavior turned off is representative of the average gamer experience. Since the goal of the labs is to provide statistically significant testing for the masses, introducing irregularity is maybe not the right solution.
As a PhD researcher, this is my favorite video that you guys have published in a while. Love the data and scientific process used here, guys. Keep it up, go Labs go!!
It's great to see. All this work will make sure their internal comparisons are reliable. You can look at all LTT Labs testing when you want to compare DIFFERENT products for performance and know it's just the products you're looking at that vary, then look across different reviewers testing for the SAME products to get a glimpse the actual market quality variances for that one product. LTT Labs is actually benefiting the whole techtuber community, not just LMG.
@@chairface1859 now you look a bit daft. See, working in a university I know this is a proper job title. Just because you have little understanding of job roles within a university doesn't make a job title any less real. I have a very unassuming job title: I work within the Estates/Operations department of the university, and my job title is Building Support. But dig a little deeper and I'm actually part of a team that runs three of my uni's flagship research buildings: one doing research into neuroscience (like finding cures for dementia and Huntington's); one that houses the Compound Semiconductor Research Institute (yes, we have the ability to fab our own silicon wafers) along with the Chemical Catalysts Institute, bringing together the school of chemistry, school of physics and school of engineering; and a final building housing the social sciences, with departments that research and help form public policies working with the British government, plus the business side of the university. Out of the 5000-plus staff at my university, about 1000 are academics
Always excited for Labs stuff! That was exceptionally well presented and explained. Really great how you addressed inconsistencies and the multitude of possible causes. Probably one of the best ways I've seen to explain testing methodology, the reasoning behind it and the resulting discrepancies when compared to real world applications. Loved it!
The thing I'm not sure about, is why they couldn't just completely fix the clock frequency to a set value, instead of letting the CPU run AFAP. Sure, you won't get the best performance out of it, but if you want to use them to test a GPU, then it shouldn't be part of the bottleneck either...
@@luisjalabert8366 probably because that isn't very indicative of real world usage; they're not trying to get the most accurate numbers, they're trying to get the most accurate representation of how the average person buying these things is going to use them
@@AK-tf3fc You say that like Gamers Nexus is dramatically better. Does Gamers Nexus upload videos showing the processes they used to get their performance specs? Has Gamers Nexus spent over a million dollars on scientific equipment, and on learning how to use said equipment, to get better, more accurate numbers? I don't f****** think so. Feel free to say he makes more entertaining videos all you want, because that is an entirely subjective point. But the purpose of this video was never entertainment; it was to show their processes and the struggles they are having getting data accuracy to a point where they are happy. Whereas as far as I can tell, Gamers Nexus doesn't even care about accuracy
@@the_undead Gamers Nexus' video is educational while Linus' video is entertainment only. Not to mention only one of them exploits people and sells their merchandise
As a professional test engineer, I am truly impressed with the level of detail, thought and effort that has gone into the LTT Labs. 👏👏👏👏 Now if you could just come and explain this level of dedication to some of my colleagues in other disciplines of engineering, that would be great. They do seem to think that everything will just work, and don't appreciate that us test professionals have to think around corners sometimes…
@@GreySectoid Your comment is not constructive and is worthless. Please share your own brilliant idea to improve the testing methods; otherwise, don't type anything.
Loved this video as a statistician, and really like the approach you’re taking. It may be a bit much, but if you did an equivalence check pre/post for any gpu lineup you’re reviewing, that would be pretty solid evidence that any differences you found in the tests across the gpus is due to differences in the gpus. The other option would be to model the specific test rig in the regression, but then you’d need to put each gpu in each test rig, which would defeat the purpose of parallelizing the test in the first place.
I know this video is already pretty old, but I work as an engineer running tests day in and day out. We have a half dozen test stands which we KNOW are not perfectly consistent with each other (different noise and such), but what we do know is that each stand is consistent with itself. The way we make it possible to run parallel tests is by running *different* tests on each stand, since we only need the numbers within each test to be compared, not across tests. In your case, running Cyberpunk on one chip and Factorio on another. This means that a 5% margin in Cyberpunk is still a 5% margin, and they should all be relatively consistent (i.e. a 5% lead in Cyberpunk is equal on each bench, even if the numbers are offset by 3%). Just food for thought; hopefully the Labs folks are already doing so
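That offset-vs-margin point can be sketched in a few lines (numbers invented, not LMG's): a bench with a systematic +3% bias still reports the same relative margin between two chips, because the bias multiplies both measurements.

```python
baseline_fps = 100.0   # hypothetical reference chip
faster_fps = 105.0     # a chip that is truly 5% faster
bench_b_bias = 1.03    # bench B reads ~3% high on everything

def measured_margin(bench_scale):
    # Relative margin between the two chips as seen on one bench
    a = baseline_fps * bench_scale
    b = faster_fps * bench_scale
    return (b - a) / a * 100

print(measured_margin(1.0), measured_margin(bench_b_bias))  # both ~5%
```

The bias cancels in the ratio, which is exactly why per-stand consistency matters more than cross-stand agreement.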
This is industry defining data. Genuinely. Anyone notice at ~9:50, when the charts fly by, that Corsola goes from being in the top 3 CPUs for Cyberpunk at 1080p to suddenly being the worst CPU in the test by a wide margin at 1440p? I expected different games to stress the CPUs in different ways, but I did NOT expect a single game to completely change the order of a single CPU.

I guess this goes back to the performance controversy with Halo Infinite that Hardware Unboxed covered, where they found that the specific part of the game you benchmarked could completely change the order of the top performing GPUs. I suppose the same logic applies within the same game, for each resolution. That's crazy.

Also, good catch with CS GO. Showing the differences between CPUs is great, but at ~700 frames per second, a microscopic difference in latency has an outsized effect on the FPS value. That's because the time between frames is actually the inverse of the FPS count. Which is fine when you're roughly around AAA game refresh rates between 30 and 120, but that inverse relationship starts skewing your data more and more the higher your FPS number is. High FPS games always have a larger impact on average performance, just because of the way math works.

I'd like to see someone normalize using "time between frames" to measure those differences instead of the actual FPS number. That would probably give a number that better represents how different it feels to use two CPUs. It would also give more weight to games with lower FPS values, which is actually where performance matters the most. Almost no one can notice the difference between 400 or 500 FPS, but people can definitely notice the difference between 40 and 50 FPS, even though the "percent difference" is exactly the same. Absolute values matter, and the percentage difference calculation everyone uses can obscure very important data that actually matters.
When I'm looking at buying parts, I don't care if a CPU gets 5% more FPS in a game that's already maxing out my monitor's refresh rate. Worst case games generally matter more, since the esports titles with high FPS counts are always going to be easy to run.
Uhh... Except, because of the way math works, the inverse relationship between FPS and frame time doesn't affect the percent difference at all, and regardless of which way you show it our perception is a logarithmic scale. ie. The difference 40 to 50 FPS is 25%, and the difference 400 to 500 FPS is also 25%. Invert that and the difference 20 to 25 ms is still 25%, and the difference 2.0 to 2.5 ms is even still (surprise) 25%.
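The invariance is easy to verify (plain arithmetic, nothing LTT-specific): inverting FPS into frame time leaves the percent gap untouched.

```python
def pct_diff(a, b):
    # Percent difference of b relative to a
    return abs(b - a) / a * 100

# FPS and frame time (ms) are inverses, yet the percent gap is identical
assert pct_diff(40, 50) == 25.0
assert pct_diff(400, 500) == 25.0
assert pct_diff(1000 / 50, 1000 / 40) == 25.0    # 20 ms -> 25 ms
assert pct_diff(1000 / 500, 1000 / 400) == 25.0  # 2.0 ms -> 2.5 ms
```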
@@Renegade605 you're getting it but you're not getting it. Say an anomaly increased latency in testing by 2.5 ms. The 20 ms case would increase to 22.5 ms, i.e. 12.5%, and the 2 ms case would increase to 4.5 ms, i.e. 125%
@@RoninD20 what you just said is the exact opposite of the point OP was making. You're correct that a 2.5ms hitch will feel much worse at 2ms average frame times (400 fps) than at 20ms, except that OP said small differences in frame rate matter more at low average frame rates. Both are true, but for very different reasons in very different cases. You just can't simplify the data that much and at some point you have to trust that your audience is capable of understanding when and why the different numbers matter.
Getting "um acktuallyed" in youtube comments by someone who repeated exactly what I said. Perfection. @@Renegade605 For reference, I said "the percent difference is exactly the same". That's a verbatim quote from my unedited post. Check yourself before you wreck yourself. We agree, so I'm not sure why you're telling me I'm wrong about the percent difference. On my other points, I'll clarify. When I mentioned "High FPS games always have a larger impact on average performance", I was specifically talking about average performance charts like the one at 8:37. Charts that take the FPS values from each game, and average the result. That's a graph where the inverse relationship skews the data. Specifically, it heavily weights the graph towards the result of the game with the highest FPS values, and reduces the weight of games with low FPS values. You can compare the graph at 8:34 that included CS GO and the graph at 11:41 which excluded CS GO, to see how much a single outlier game affected the entire average result. That inverse relationship definitely skews the results when you average the FPS values between games, as basically every reviewer does. That's a big problem, because an extra 3fps in CS GO mean basically nothing when it's already maxing out your monitor, but 3fps more in cyberpunk at 4k is a pretty big change. However, both those changes are averaged out on the same graph. That's why the inverse relationship is a problem. Rewatch the 10:43 "CS:GO is Wonky" segment to have Linus explain why games with framerates that high aren't valuable for benchmarking. There's some CS GO specific stuff there, but there's also a lot of statements that can be generalized to all high FPS games. If anyone is still confused, I can clarify further.
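The averaging skew itself is easy to demo (two made-up games, not Labs data): a chip that is 10% faster in a 60 FPS title barely moves a plain FPS average once a 700 FPS title is in the mix, while averaging in frame-time space (the harmonic mean of FPS) keeps the uplift visible.

```python
cpu_a = {"cyberpunk": 60.0, "csgo": 700.0}
cpu_b = {"cyberpunk": 66.0, "csgo": 700.0}  # 10% faster in cyberpunk only

def mean_fps(scores):
    # Plain average: the 700 FPS title dominates the result
    return sum(scores.values()) / len(scores)

def harmonic_mean_fps(scores):
    # Equivalent to averaging frame times, then converting back to FPS
    return len(scores) / sum(1.0 / f for f in scores.values())

def gap_pct(metric):
    return (metric(cpu_b) - metric(cpu_a)) / metric(cpu_a) * 100

print(gap_pct(mean_fps))           # under 1%
print(gap_pct(harmonic_mean_fps))  # roughly 9%
```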
(Tech horror story) There's an Azure server out there with a haunted CPU. We had a lot in the cloud, and on one cluster we always had more errors with exactly the same deployments. We got around it for months by updating to a new cluster. Just a few months later the errors were back; I think the old compute server got moved to the new cluster when the old cluster got decommissioned. And to this day.... I moved to ARM, just because I know that x86 box is going to get me.
@@Jimmy_Jones company policy had reasons I can't remember for keeping availability zones the same. (I think there was a pre purchasing compute agreement ) Fortunately I moved companies and it hasn't found me yet.
Would it be more expensive than it's worth to run a profiler on the containers to check performance? Can your company seek compensation for being placed on faulty hardware?
@@jake20479 I have a 7800x3d. Am I the minority here for buying a chip, and if it runs my programs and games at an acceptable speed that I wouldn't notice a difference unless I lined it up with 11 other chips, that I'm just happy with my purchase and enjoy my games? (Run on sentence, I ran out of breath trying to read it. I'm also not fixing it though.)
@@jake20479 Except you'd be comparing that 'any percentage less' to reviewers who most certainly will not share the same test bench, software variables, temperatures, humidity and several other variables. So, this is good for LTT to keep data similar, but for your average consumer, the rule of 'within margin of error' very much applies.
@@jake20479 I mean, I'd much rather have a cheap CPU than a consistent one. Do you have any idea how expensive it would be for chip makers to get their consistency down under 1%? I guarantee whenever you buy anything at a grocery store there's at least a 1% variance to someone else who bought that same thing; it's just pointless to make stuff that consistent.
@@jake20479 If you want the best, you buy the best. We all know they bin the chips, we all know they don't all turn out identical. This has been the norm for decades.
Superb work! The only thing missing was repeating the testing on another motherboard to check if the CPU performance depends on individual CPU or the combination of motherboard power delivery and CPU. Would the same CPUs perform the best in any motherboard or did those CPUs simply happen to work the best with that specific motherboard?
@@nomadicdragon7157 umm.. assuming you are not joking, just look up the accusations against LTT on Gamers Nexus channel about 5 months ago. Shit went down
I love seeing this kind of testing across multiple channels. And another shining example of why you should always check across multiple review sources. (Loved the Pokemon referencing of the processors)
One of the best videos you've made in a long time. I really like the Labs content and the transparency that you're trying to give. There is a bunch to learn about testing in this video. Really looking forward to more of this content.
Statistically, from experience in the semiconductor industry, it will take a sample size of at least 30 CPUs to establish a proper sigma. Ideally you would want to do this experiment with 1 motherboard first and repeat with other boards to gauge variation. The next step would be to test about 100 CPUs from different time stamps (hopefully getting different lots) to gauge the variability of their process.
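A quick stdlib simulation of the sample-size point (assuming normally distributed scores; none of this is real CPU data): the sigma you estimate from 3 units swings wildly from batch to batch, while 30 units pins it down much better.

```python
import random
import statistics

random.seed(42)

def sigma_estimate_spread(n, trials=2000, true_sigma=1.0):
    # How much the sample standard deviation itself varies when you
    # only test n chips at a time, repeated over many hypothetical labs
    estimates = [
        statistics.stdev([random.gauss(0.0, true_sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

spread_3 = sigma_estimate_spread(3)
spread_30 = sigma_estimate_spread(30)
# spread_3 comes out several times larger than spread_30
```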
Even this relatively small scale test was probably a lot for LMG. So as much as I'm sure Linus would love to get to that point, they're a long way off from having that kind of money
I never got lucky in the silicon lottery but once. That was a Northwood P4 that could be pushed from 1.8 to 2.8 GHz. For RAM I'm usually happy if I get the values printed on the sticks to run stable.
As someone who used to do statistical process control for safety critical electrical components in the automotive industry, I approve of the comprehensiveness of this. Now that you have the process laid out for doing this, it will only get faster and more routine.
This is absolutely what Gamers Nexus wanted to see LTT's testing become, even though LMG seems to have forgotten to mention them at 17:24. (It could just be an oversight, but not including Gamers Nexus, the most thorough tech reviewer on testing before LMG had the Lab, seems a bit intentional.)
@@Hathos9 that was very funny, please continue. People who love tech-not-drama don't watch LTT much; the whole point of this show is ~70% entertainment, 30% info. It's basically a Discovery Channel TV show talking to the lizard brain of its target audience. Nothing bad in it per se, some videos are pretty entertaining, but that's what it is primarily: a tech-related entertainment business.
@@BoraHorzaGobuchul I'm fine with entertainment/tech. It is better than the non-stop drama and negativity of GN. LTT seeks views by making fun videos. GN seeks views by attacking others.
Yeah, after you explained it I'm not nearly as mad about this as I thought I'd be. If they design and manufacture it to achieve a standard, but include features that may deliver even MORE than advertised, then I don't really care about the variability of how much more it gets. It's kind of like rock climbing gear, where a carabiner will be rated to 22kN of force. 99.7%+ of the time it'll be stronger than this, and it could easily vary from 23kN to over 30kN. That'd be like a CPU having a performance spread of nearly 40%. The thing is, the lowest number of 22kN is still 5x stronger than any realistic safety situation, so maybe we should think about CPUs in terms of the demand we expect to place on them, e.g. "This CPU will have a 99.7% chance of running *game* at more than 60fps/144fps/250fps". For your testing, maybe Linux with Wine/Proton would be better in terms of minimising background tasks? The lack of oversight vs automotive and, indeed, rock climbing gear kind of checks out, because if your CPU is 5% slower you aren't going to find yourself in an accident where your brakes fail and you, your partner, your two kids and another random family all fucking die. It is definitely worth doing in a decentralized, nongovernmental fashion though. Maybe a consumer union could be formed where manufacturers have to provide representative samples in order to get a trademarked seal of independent testing.
Yeah, I think you're right. There are margins in probably every industry, so the consumer has to be fine when the product reaches its rated performance. With cars, for example, you can often measure differences between the power of engines. But the average consumer doesn't care that much about fuel quality, tires and regular oil services, so those probably have more impact on actual on-road performance than the engine itself. The same is probably true for PC parts' performance in different countries with different average temperatures. So in the end I guess you should buy a product by its rated speed and be happy if you get more than you paid for, but not necessarily be sad if you don't get that much of a benefit. Your friend might have ended up with a better chip, but maybe he has a low-quality power supply that makes the CPU perform worse than yours. Or maybe there are also margins in PSUs?
@themisterx8660 The margins are in EVERY industry. If it's not federally regulated, it's financially regulated or consumer regulated... except vitamins and supplements...
the problem with using Linux is that it's just not a realistic OS to use for benchmarks. Linux has many problems working with NVIDIA GPUs, but the main thing is that most people just don't run Linux so it's better to just use Windows, even if it's bloated and makes getting reliable test data harder
@@memethief4113 True, but it can be used for non-gaming tests. As it stands, Linux will have to be used only in specific tests. Like you said, for Linux to be usable for ALL tests, we might get there one day, but not today. Not this year either. Maybe next year on the most optimistic timescale, but realistically I think it's more like 5 years, IF it continues to grow like in the last 2 years.
@@memethief4113 I guess the argument would be that it would be for controlled testing. If one chip ran a game with Proton on Linux at 100fps and another at 105, then you could meaningfully say the second is 5% faster, even if on people's actual systems the real-world framerate would be 45 or 145 or whatever. The problem with testing in realistic scenarios is that realistic scenarios have a lot of uncontrolled variables. If you tested on Windows you could get 145fps or you could get 95 depending on what Windows fuckery is happening in the background, and you couldn't know without doing hundreds of tests and averaging them. Also, Linux gaming, even with NVIDIA, is my personal main use case :P.
To be honest, back in the early i7 / AMD Phenom II era of CPUs, the silicon lottery was insaaaane in comparison. Some will be shocked by this and say how bad it is that there's such a difference between chips, whereas others will be amazed it's actually pretty consistent. That said, it doesn't make testing simple at all.
This is a good investigation! I've been waiting for someone to do this investigation because, as a system integrator for CAD PCs, I thought I was losing my mind! Ever since Intel's 13th gen and the X3D line that AMD brought out, I've been seeing these inconsistencies, and I was really worried that something in our testing wasn't right! Glad to see my sanity is in check... But I REALLY think you should also take a look at Intel! The 13th gen was already a real 'silicon lottery' thing, but the 14th gen Intels are... WAY out of spec now and then! PLEASE PLEASE check Intel too!!!
@@ABaumstumpf OK! Nice to know! But the thing is: if I swap the same CPU with the same cooler between different motherboards (of the same type, that is; we use the B760M Aorus Elite a lot, and the B760 Tomahawk), I don't see as many differences as when I swap the CPUs out... When I only swap the mobo, I see a difference of about 2% max in Cinebench. But between different CPUs on the same motherboard I can get up to 7% difference! So what am I doing wrong in testing?
This is some incredible research. Not many review channels that do testing, if any, would go to this level of effort to obtain the results you have here. The amount of effort it takes to obtain these different samples is probably a reason why. I had always heard of the silicon lottery but hadn't considered that the lottery may depend on the piece of silicon being used or the environment of the factory. As an afterthought it seems obvious: of course you would need to obtain units from different manufacturing runs to get samples that perform differently. I appreciate the effort that it took to do this, and it demonstrates the true value of the LTT lab.
Variation is normal in any process. I worked with steel for 9 years; you can order structural grade 50 and get anywhere from a 53 yield strength to a 65. You can't produce everything exactly the same, and the tighter they hold their tolerances on what's acceptable to ship, the more expensive it is for you to get their CPU. Good companies will improve their process parameters to tighten that up, but you don't really want to pay for near-zero variation in a CPU.
Amazing work @LinusTechTips! Your discussion on sources of variability really got my brain going. As someone who works with psychological data, this is something we often discuss (inter-trial vs inter-participant). Have you looked at using linear mixed models (aka hierarchical models, aka multilevel models)? They could allow you to control for multiple sources of variability without having to aggregate data so much.
As someone who works in big data and the analysis of it, the way you guys described all your statistical analysis in a way that's easy to digest for regular people was amazing. 10/10 vid
This video feels like it would be great for teaching high school science students about isolating variables and the impact it can have on results! Such an interesting video!
Awesome video! The Lab is gaining a lot of credibility after delivering such great and in-depth analysis. Impressive! I must admit, however, that it is a bit sad to see no mention of Gamers Nexus at 17:24... I guess the community would love to see a reconciliation... I'll keep dreaming
The ball’s in Steve’s court. “If an article contains personal or serious allegations or claims against an individual, it may be appropriate and necessary to give that individual an opportunity to respond to these claims, or to deny them if they wish” - Independent Press Standards Organization. But they’re UK based, so why should Steve care (:
@@Winnetou17 GN pretty clearly used the situation and drama to drive traffic and profit and weren't entirely honest or genuine in their handling of it. LTT denying them further attention is only appropriate.
@@waldolemmer It keeps filtering my comment, they likely have keywords in place to prevent discussion of it anymore, but there are videos out there with analysis of it, one rhymes with mechmechfotato.
@waldolemmer (We'll see if this goes through) As an example, they were dishonest by not actually giving an unbiased or full picture of the events they were reporting on. They did not contact LTT to get the full picture, presented his one-sided view as fact, and then he refused to respond to further discourse on the subject, claiming to be holier and denying any poor handling of it on his part.
Freaking amazing video, well done everyone. It would be cool to see if the results are still as varied if the clock speed was locked across all of the cpus. I assume this wasn’t looked at because that’s not as equivalent to the real world use of the chips, but still could have been an interesting add.
Love the update on testing methodology! Very detailed, and I do appreciate that! I did notice that Gamers Nexus didn't get a spot in the B-roll alongside the other creators. I honestly hope you two make up, and I strongly recommend looking to them every now and again for sanity checks, because they are really great and highly respected.
I lost all respect for that pile of garbage when he made that hour-long video. As someone who knows a great deal about human psychology, it is obvious to me that that video was not made for any reason other than revenge, and more specifically revenge for an LTT Labs employee calling him out on his data testing procedures and questioning how accurate Gamers Nexus's testing is. Seeing as Steve's response was that hour-long video instead of an hour-long video detailing his processes or the updates he's made to them, I suspect those same concerns are still valid today, and I feel this is partially confirmed by the absence of Gamers Nexus in that B-roll shot. Yvonne and Luke would not let Linus hold a grudge this long, and Linus isn't even the CEO anymore, so I suspect there was a reason other than the controversy for the lack of Gamers Nexus in that shot.
@@the_undead "As someone who knows a great deal about being a fanboi"... If my firm were slandered like that, I'd do exactly the same. And oh, btw, to clear up your delusional POV even more: surely LTT just issued apologies and shut down testing because there wasn't any truth to the claims... surely! If you fire a gun, you'd better be goddamn ready for return fire. Back to school, or wherever you gained that dubious knowledge of human psychology.
This is exactly the kind of video I feel like viewers and employees were missing so much when all the controversy broke out. This doesn't feel rushed, this feels like "we wanted to find the truth, so we did our best, and here's our results".
6:50 Smiled when I saw the only thing you add is Notepad++ because this is truly the only thing I miss since going from Windows to Mac on my work computer...
honestly, HUGE respect for doing this, first of all, the video is hella interesting and second of all knowing that you guys do this is extremely reassuring. great job dudes.
I think I speak for everyone here that this is EXACTLY what we want. We love tests like these, in depth, practical and perfectly executed. Thank you so much.
Intriguing, might have some better results with some other metrics like Mahalanobis distance instead of Euclidean to account for more correlations in the variance trends. (Mahalanobis is just Euclidean distance in PCA space). Loved this detailed video! Hope LTT does more videos like this!
I had a similar thought, there is likely significant correlation between the dimensions in game testing space. My brain went a linear algebra route - the axes in game testing space are (probably) not orthogonal, so you would want a metric tensor to allow for proper definition of distances. The Mahalanobis distance route you propose is probably more realistic to actually use here, but I'd love to see an exploration of how to treat correlations in one way or another. Or to just have access to the dataset to fiddle around with it myself, now that my interest has been piqued. I also loved this video and I look forward to seeing more Labs content as it develops.
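For anyone wanting to see the distinction concretely, here is a tiny hand-rolled 2-D sketch (toy covariance, hypothetical numbers): two chips that are exactly the same Euclidean distance from the mean can be very different Mahalanobis distances once the correlation between tests is accounted for.

```python
def euclidean_2d(p, mean):
    return ((p[0] - mean[0]) ** 2 + (p[1] - mean[1]) ** 2) ** 0.5

def mahalanobis_2d(p, mean, cov):
    """Mahalanobis distance from the mean, with a hand-inverted 2x2 covariance."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    # Quadratic form: delta' * inv(cov) * delta
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return q ** 0.5

mean = (0.0, 0.0)
cov = ((1.0, 0.9), (0.9, 1.0))  # the two benchmarks move strongly together
along = (1.0, 1.0)    # differs in the direction the tests naturally co-vary
across = (1.0, -1.0)  # differs against the correlation: far more unusual

print(euclidean_2d(along, mean) == euclidean_2d(across, mean))               # True
print(mahalanobis_2d(along, mean, cov) < mahalanobis_2d(across, mean, cov))  # True
```

The "across" chip is the one a Mahalanobis-based similarity metric would flag as the odd one out, which matches the intuition that a chip fast in one correlated test but slow in another is the more suspicious sample.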
I found this review of reviews and previews of reviews refreshing. Whenever I've used reviews for purchases in the past, I haven't worried about the cherry-picking theory that was mentioned, but the random variance has been a concern of mine. It's obvious that LTT won't eliminate this completely, but knowing that they are keeping it in mind and checking from time to time gives a good little piece of peace of mind. Thanks LTT!
Dunno exactly about that; the term 'silicon lottery' has been around for a long time. What this video did do was highlight quite well how it actually applies when trying to make a standardized test. That it's basically only LTT that has the resources to actually go and get 11(?) CPUs of the same SKU to do this is pretty crazy. From what I remember of this test, mentioned ages ago on WAN Show, the idea was to use the 3 closest to the centre for the test benches.
Hey, just avoid Euclidean distance for measuring dissimilarity in high-dimensional spaces; the norms usually get pretty big, and the l2-norm ends up not making much sense (as in k-means clustering).
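The concentration effect being alluded to can be seen with a quick simulation (random points in a unit cube, fixed seed, purely illustrative):

```python
import random

def spread(dim, n_points=200, seed=42):
    """Ratio of the farthest to the nearest point distance from the origin."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(sum(x * x for x in p) ** 0.5)
    return max(dists) / min(dists)

# In high dimensions the nearest and farthest points end up nearly the same
# distance away, so "closest sample" carries less and less information.
print(spread(2) > spread(100))  # True
```

With only ~10 benchmark dimensions this is less of a worry than in text or image embeddings, but it is a fair caveat for the clustering step.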
@@skak3000 Dude, if someone criticises your work, would you do nothing? Well, he did, and he also proved with data and examples that his testing was actually better and more reliable than LTT's, which was riddled with errors. Everyone knows Linus is a little narcissistic, he said so himself, and he apologised. That said, I watch them both and other outlets to make my decisions. I just don't understand why people side with anyone in cases like this; research everywhere and make your own decisions. Watching only one outlet and trying to protect them like they're your friend is just the stupidest parasocial shit. Fuck GN, fuck LTT, they are means to an end. Stop thinking emotionally and start thinking rationally.
@@soul_slayer7760 People who say they are thinking rationally are cringingly hilarious - and peak irrational. Anyone who thinks they or other people "think rationally" are so ignorant and arrogant it isn't worth discussing.
Huge respect to the LTT and Labs teams for the level of transparency, effort and time that went into this. While this type of content may not be for everyone, I for one can get behind it and support you all the way. And who knows, maybe the solution to all this is a consortium down the road. With time, enough experience and credibility, there's no reason the Labs team cannot eventually head in that direction.
#1 - this is the labs content I am here for #2 - I hope you swapped the CPUs between the benches they were on, to ensure the issue is on the CPU and not something else on the bench.
This is not a team of data analysts, so they made a couple of mistakes in communicating their process. But if you pay attention throughout the whole video, the process they are using is as follows: the only component that changes from one CPU test to the next is the CPU itself; they're probably using PTM 7950 for their thermal pad material so that thermal paste application isn't a variable; and they're doing this in their thermal chamber with very specific temperature and humidity settings so that even the humidity in the lab building isn't a concern.
@@andoletube Given the things people threatened to do to Linus's kids, I would say it was not worth it. And given what a lot of LTT staff had to deal with because of that video from Gamers Nexus, I don't think many of them would consider it worth it either. The fact that Linus felt the need to announce that he was doubling the mental health budget for each of his employees means there was a lot of harassment going on. All for stuff that was going to be fixed anyway: up to about a year before, and possibly even longer, Linus had talked at length many times on WAN Show about wanting to do all of the things that were forced to be expedited by that whole controversy. Which, funnily enough, probably caused a lot of people to be very overworked trying to fix all these problems, when one of the main things brought up in the controversy was overworking people. Anyone genuinely believing that controversy was for the better is a hypocrite at best, because I am willing to bet $10,000 that at least every single problem people brought up in that controversy is a problem at Gamers Nexus too, given his track record of dealing with his own controversies.
@@the_undead Well, I wasn't aware of any threats to Linus's kids. That is, of course, completely abhorrent. I'm not so onboard with the idea that LTT was necessarily fast tracking the necessary improvements to their testing. It's easy to express your ambitions in WAN show chat, but the proof is in the pudding, and the fact is, nothing was indicating that big improvements in testing protocols were imminent. As you say, the ruthless churn of the LTT video schedule was actually preventing them from meeting this ambition. So, I think the GN controversy is specifically the thing that has lit a fire under LTT to get their act together. We are now seeing the dividends of that. Steve is also being more careful and more thorough since the controversy. That's to our collective benefit. The fact that it has exposed vulnerabilities in the characters of Linus and Steve from GN is fair game because they do like to talk in high and mighty terms about ethics on their shows. It's only fair that we see them for who they are in times of duress - specifically because they are making monetised content for viewers about the same matters.
I love this kind of video, where you show the behind-the-scenes of the lab. This is the most literal "computer science" that could be done. ^^ (Usually, "computer science" is not about computers, as Edsger Dijkstra once said.) As totally unsolicited advice: I might suggest considering the Chebyshev distance instead of the Euclidean distance, since you are interested in avoiding having a sample be an outlier in any one of the tests. (Other higher-order Minkowski distances would trade the exact rejection of outliers for some "average closeness".)
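The suggested trade-off can be sketched with made-up 4-test score deviations: Chebyshev flags the chip that's badly off in one single test, while Euclidean can rank a mildly-off-everywhere chip as "farther".

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

center = (0, 0, 0, 0)   # the "typical chip" in 4-test deviation space
uniform = (3, 3, 3, 3)  # slightly off in every test
outlier = (0, 0, 0, 5)  # badly off in exactly one test

print(euclidean(center, uniform) > euclidean(center, outlier))  # True: 6.0 vs 5.0
print(chebyshev(center, uniform) < chebyshev(center, outlier))  # True: 3 vs 5
```

So if the goal is "no bench may be an outlier in any single test", the max-of-coordinates metric matches the goal more directly than the sum-of-squares one.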
If you ever name something similar to like you did with the Pokemon, I would consider doing it alphabetically. So the first one starts with A, second starts with B, third starts with C and so on and so forth. It's just one of those nice-to-haves that doesn't change much but makes the presentation a bit easier to digest
Guys, when you're flashing through 10 different graphs comparing performance between 8 different chips, you need to keep the chips in the same order. The Productivity charts, in particular, have their orders shuffled between every slide. Makes it harder to track the changes.
In addition to the order being by performance, it also doesn't matter. It's not about the performance of the individual chip, it's about the consistency.
@@MeshJedi Not sure if I misread, but I think that's exactly the point. Keeping the CPUs in the same order over multiple graphs should make it way easier to figure out whether the fluctuation in performance is due to (or scales with) the different tests (aka is consistent) or stems from (possibly faulty) hardware (aka is inconsistent).
As a research scientist, who looks at a lot of large dataset analyses, it’s pretty cool to see the common approaches. Choosing geometric mean, Euclidean data clustering etc. I expect to start seeing some PCA and heat maps soon!
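Since geometric mean came up: the reason it's the standard choice for aggregating benchmark scores is that it weights ratios rather than absolute magnitudes, so one high-fps game can't dominate the average. A two-number sketch (hypothetical fps values):

```python
import math

def geometric_mean(xs):
    # exp of the mean of logs: equal weight to ratios, not raw magnitudes
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

fps = [60.0, 240.0]
print(sum(fps) / len(fps))  # arithmetic mean: 150.0, dominated by the big score
print(geometric_mean(fps))  # ~120.0: a 2x win in one game = a 2x loss in another
```

Under the geometric mean, a chip that is 10% faster in every game scores exactly 10% higher overall, regardless of each game's absolute fps, which is the property you want for a composite score.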
The grocery store example is quite a funny one to choose: one of the few places that sell products which actually do have a small variance in weight, even at the same price.
Everything has variance.. unless refined to a purity not found outside of lab type conditions. Even different F1 engines perform differently. His example of cars being tested is for pollution levels and safety standards.. not performance. V odd position to take from linus.
@@dzzope you're completely correct. I'm specifically pointing out the grocery store analogy because that's somewhere you notice variance in your day-to-day life.
Computer testing isn't required the way car testing is because for cars it's all safety-related; your CPU being 12 percent slower than expected isn't a safety issue.
Regarding AMD's frequency adjustment making it difficult to test thermal performance: you might be able to find some useful information by supplying enough cooling to stay below the 90C threshold, then calculating the watts of heat pulled by measuring coolant temperature and flow. If the measurements can be taken near the CPU inlet and outlet with high enough precision, it might be worth building a rig for it. In the case of the chip that possibly had a heat spreader issue: to run it at the same temperature as the rest of them, it would be outputting more watts of heat.
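The watts-from-flow idea above is just coolant calorimetry, Q = mass flow x specific heat x delta-T. A rough sketch assuming water-like coolant and made-up numbers (a real rig would need calibrated flow and temperature sensors):

```python
def heat_watts(flow_lpm, t_in_c, t_out_c, specific_heat=4186.0, density=1.0):
    """Heat carried away by the coolant: Q = mass_flow * c_p * delta_T.

    Assumes water-like coolant (density in kg/L, c_p in J/(kg*K));
    flow_lpm is coolant flow in litres per minute.
    """
    mass_flow_kg_s = flow_lpm * density / 60.0
    return mass_flow_kg_s * specific_heat * (t_out_c - t_in_c)

# e.g. 1.5 L/min with a 2.0 C rise across the CPU block:
print(round(heat_watts(1.5, 25.0, 27.0)))  # 209 (watts)
```

The catch is precision: at typical loop flows, a 200 W CPU only warms the coolant by a couple of degrees, so sub-0.1 C probe accuracy matters a lot.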
Frequency adjustment is also related to core voltage (each core has its own voltage); that's why different motherboards perform differently, since they have different power delivery and different load-line calibration out of the box. Also, there is a thing called CPPC (Collaborative Processor Performance Control), with an optional Preferred Cores feature (which is ON by default). This can cause performance problems, since preferred Core 1 and preferred Core 2 may sit close to each other and heat each other.
As a mathematician I liked that you took the time to explain euclidean distance for those who've never heard of it! And I thought it was very funny that you did it after doing around 20 comparisons using euclidean distance in 1D or 2D :D
As a guy who teaches data analysis and data science, this video brought a big grin to my face; it'll be great for my students to watch. Great job, Linus and everyone there. I love seeing benchmarking and testing get better and better as time goes by. I used to do this for gaming companies all the time, and it's not easy getting proper quality benchmarks. You and Gamers Nexus and Level1Techs and others are actually pushing this forward and bringing awareness of benchmarking issues to customers and others. Thanks a lot
Noticed that. Also, GN just did this exact same video a few months ago. Sad to see they clearly haven't buried the hatchet, but I can't exactly blame them. Steve wasn't wrong with what he said, but he clearly took a huge swing at Linus' reputation without seeking comment. That comes across pretty malevolent to me. Linus' lab guy who trash talked GN appears to be the instigator, but Steve's response was not proportional, nor particularly professional IMO now that the dust has settled.
@@TheVillainOfTheYear I agree about everything other than Steve not being wrong. He was absolutely wrong, and was stirring drama for both views, and to undermine his main competitor. He didn't want solutions, he wanted to gain market share. That's integrity in the same way as Apple criticizing Google.
Which is funny considering he pronounced Corsola more or less correctly (basically a tomato tomato pronunciation), but royally messed up Raikou's pronunciation by falling for the classic "pronouncing the u" beginners trap.
What if we forced about 8% CPU usage with software, so that all the CPUs are always at 8% usage even when the background tasks aren't that heavy at the moment, and then ran the benchmark?
Personally, I choose to interpret the other option Linus gave in this video quite literally. I don't think someone actually suggested that specifically, but when it comes to something that would actually work in this video, I suspect Pokémon was the only viable option.
As a computer engineering student myself, I want to say something to help Linus while he creates these types of videos. Those processors are exactly the same, and yet not exactly the same at the same time, for multiple reasons; I'll just mention the plausible ones so you guys understand. The first is the die-to-socket contact with the motherboard: even slight changes in the contact surface between the CPU and the motherboard pins can create these kinds of problems. That's why we don't have CPU extension units the way we have PCIe extensions; and even PCIe extensions cost some performance. The second thing is that while you test a CPU, neutrinos (particles from the sun that pass through almost everything) can affect the CPU itself. Yes, I know it's rare, sometimes nearly impossible, but it's not as unlikely as you might think: such particles can flip bits in registers or even RAM. There is a lot more to say about it. We can trust CPU manufacturers on these kinds of things since the chips provide enough performance (nearly)…
The best thing to do would be to cooperate with other YouTube testers to compile a database of results: agree on a standardisation of data for these tests, have each channel test their respective components, then aggregate the data for comparison. The more YouTubers you have, the bigger the test sample.
Thanks for the breakdowns; appreciate the work, team! Can the audio tech please level the volume on his expostulations? I'm listening on headphones because others need silence.
Crash testing interestingly isn't required. It also isn't done by the government. It is a 3rd party body that only tests vehicles that are sufficiently popular for them to warrant it and they buy the vehicles themselves. There has been a lot of complaining that the cybertruck hasn't been fully crash tested lately but that's simply because there aren't enough on the road yet for them to bother (though they will soon). The testing groups IIHS and NCAP are funded by car insurance companies NOT governments. The companies need to know how much to charge for which car in case one is a deathtrap. There is nothing similar for computers, but I could see a group of companies with very large servers paying for proper testing, but that would only be for server/enterprise parts, not consumer products.
I do wonder how much variance there would be if they set specific frequencies on the cpu AND also set all of the memory timings (down to tertiary timings). Even beyond that, I wonder how much tertiary memory timings played into this. Since it’s well known that memory training is BAD on AMD, it could make up for a lot of that difference.
Well, think about how many people are going to actually take the time to set memory timings all the way. It's an even smaller subset of people who will set specific freq's on their CPUs, which is a small subset of people who will even think about what overclockable CPU they want to buy, which is yet another subset of all CPUs sold. My belief is that there should be a 100% benchmark for both price and performance, and that ALL performance numbers should automatically come with a standard variance given in terms of price per unit of performance. Not only will it be more useful from a consumer perspective, it's good math as well. And given the way our math education is going, teaching consumers how to understand statistics and data analysis is a public service.
I've done productions where we buy 10 cameras, and 10 lenses, to test and match 3 cameras with 3 lenses, returning the rest. Most of the best DP's do this if I'm not mistaken.
Kudos for the use of Euclidean distance in determining similarity. Other similarity measures could be used as well: cosine similarity, Jaccard, etc. It's nice to see LTT actually doing data science work. Maybe they could use Principal Component Analysis (PCA) to see which features are the main contributors to the difference between the highest and lowest rated CPUs.
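For reference, cosine similarity as suggested would treat two chips with the same "shape" of results as identical even when one is uniformly faster, unlike Euclidean distance. A minimal sketch with hypothetical benchmark scores:

```python
def cosine_similarity(a, b):
    """Cosine of the angle between two score vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

chip_a = [100.0, 200.0, 50.0]  # scores across three benchmarks
chip_b = [110.0, 220.0, 55.0]  # same shape of results, 10% faster everywhere
print(round(cosine_similarity(chip_a, chip_b), 6))  # 1.0: direction, not magnitude
```

Whether that's the right behaviour depends on the goal: for picking the most "average" chips for test benches, overall magnitude matters too, which may be why Labs went with Euclidean distance.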
17:25 I seem to remember thinking this exact sentiment on a specific notorious call-out video… not knowing why it was such a big deal. Really cool what you guys are doing and cannot wait to geek out at the Labs site! Please keep being humble and doing great work! It speaks for itself.
I was a little disappointed they weren't included in the recommended reviewers montage. It's a shame that whole mess blew up the way it did. I remember while watching the video thinking LTT would fix a few small mistakes and everything would be cool, and when I looked at the comments after that video it seemed like that was everyone else's takeaway too. But then I watched another video covering Steve's vid and noticed a lot of people had the impression that LTT had malevolent intentions to various ends and deserved some sort of retribution, which they saw Steve's video as. By the time LTT had made their official response, many people's minds were made up. I spent hours responding to comments full of outright falsities regarding the content of Steve's original video, Linus's initial response on Reddit, and so, so, SO many blatant misunderstandings of Linus's words in their official response, most of which I attribute to the context being split up in a way that someone responding to Linus's segment immediately after watching it, without waiting 2 *damn* minutes, would simply misunderstand his choice of words. It was a shit show at all stages, and Linus seemingly came out feeling like the victim of a targeted attack by Steve. But I don't think it was Steve that wanted him to hurt. That was us, hoping for some good drama to eat up. That whole experience made me feel like I was watching the end of The Truman Show, or American Arcadia. We did this. Shame on us.
@@nullvoid3545 No, Steve did this, and Steve knew exactly what he was doing. I find it very suspicious that the video came out relatively shortly after a clip of an LTT Labs employee calling out Steve for his questionable practice of data gathering for product comparisons. Combine that with Steve saying he will treat LMG like he treats all other companies, and either this is all one big coincidence, or there is a lot of malice involved in that whole drama. As someone who knows a lot about psychology, I'm on the fence, but the way Steve was talking in that hour-long video is not the way someone talks when their actions are in good faith. That is the way a politician talks when trying to destroy an opponent they know they cannot beat.
@@nullvoid3545 Steve decided not to reach out for journalistic comment. “If an article contains personal or serious allegations or claims against an individual, it may be appropriate and necessary to give that individual an opportunity to respond to these claims, or to deny them if they wish” - Independent Press Standards That wasn’t a journalistic investigation. It was a hit piece against a competitor (which Steve openly refers to LTT as.)
@@nullvoid3545 holy shit I never heard anyone put it so succinctly, it was such a dumb moment. Everyone needed to chill tf out, form their own educated opinion, and move on. Instead it's too easy to get short-sighted and aggressive. I'm so appreciative that we have so many awesome channels out there that do hard work to educate and entertain about tech for free! F the drama!
Next time, there are a lot more similarity measures you could explore, for example cosine similarity or Pearson correlation. But to be honest, with such a limited dataset you could just brute-force it in a few seconds, trying every combination.
It has been shown that degradation will cause these chips to clock lower over time. I would be interested in a retest of the test benches after they have been used extensively, both to validate that they are still identical and to quantify the degradation. Edit: what about SSD performance influencing results, since assets are streamed while gaming?
This is LTT using their higher budget compared to most tech channels for something genuinely useful. This is really good to see! Well done LTT, this deserves genuine praise.
Gamers Nexus did the same thing a while ago
@@zypher5876 Gamers Nexus not listed as "other media". Funny
@@zypher5876 yeah, true. But Linus has a far greater scale than Gamers Nexus, as of now, reviewing everything from CPUs, GPUs, laptops, etc, so it's cool to see that level of standardization on a large scale to make sure the testing is consistent.
Well, those overclockers do lots of binning and probably run through more samples than the average LTT benchmark review
@@joaomendes3907 I'm pretty sure that's not out of malice, since Linus isn't the editor for it 🤷♂️
glad to see factorio used as a gauge for cpus
As it should be lol, it's seriously intense on ticks. Causes some real aggro on multiplayer systems with different specs.
haha, gauge, i get it
Factoriohno.
The issue is all of these reviewers are doing the factorio benchmark wrong.
The game has a 60 tps cap as that's what the engine runs at. What needs to be tested are heavier maps that push the cpu BELOW 60. These above 60 tps benchmarks are as good as a synthetic benchmark in that they're not realistic.
And yes, results absolutely do change and so does the cpu hierarchy.
@@whatistruth_1 not to mention Factorio scales more with memory performance than core performance or cache.
As a data scientist, I absolutely love this video and greatly appreciate this perspective. The silicon lottery is absolutely real, and a sample size of one is far from sufficient. Glad to see someone doing more in-depth analysis. I don't expect the methodology of LTT Labs to dive into frequentist vs Bayesian stats, but it would be interesting to use public benchmarks (for an incredibly nerdy perspective/audience) as prior distributions and see how the results differ. Regardless of the depth of it, just seeing more stats in hardware/software benchmarks is a fantastic breath of fresh air. Keep up the good work!
The sampling issue is real. For the end consumer the variation will be even larger, but they'll blame other components like the amount of RAM, the SSD, or other factors. AMD only guarantees clock speed under certain conditions, and at home the ambient temperature also adds noise to the data. The truth is there's no third party enforcing a legal standard, because verifying that products are identical is complex.
Agreed! Would also be interested in seeing a Bayesian analysis with public benchmarks as a prior (though I'm also one of those nerds who would love more of this kind of stuff). I also appreciate how they detailed their similarity metric using Euclidean distance, along with explaining what that looks like in a higher-dimensional space for those not accustomed to seeing it used that way, since it's such a common approach for any sort of clustering problem.
When the public is put to the test, there's always the possibility of cheating, or of the manufacturer getting in on the game too... if you understand what I mean.
LTT should hire a data team to take care of all this data. It would be great to have it properly stored and analyzed, and maybe you guys could get into testing neural network performance of GPUs too. I find it hard to find data on how well GPUs run AI tasks.
@@diegoberan7883 I second the request for GPU benchmarks for AI tasks. Looking at some of what Puget Systems has done for their AI/ML benchmarks would probably give a good starting point for the LTT team.
Looks like the transparency we wanted. You clearly stated the challenges, how you worked around them and the variables you can't really control. It was very interesting, at least to me. Good job guys!
Incredible: leveraging YouTube videos to finance an independent lab to perform tests on all kinds of tech hardware, then making vids showing and explaining all of the data collected, which in turn can finance the next round of tests. Bangin' job LTT, I love it.
Yeah, they have the equipment but no clue how to use it properly, as shown in this video. They pulled 11 tray CPUs, probably from the same tray. Does not reflect the consumer experience at all. Of course enterprise hardware will be more consistent, even if it is going in a consumer product. Imagine if a company like HP had people doing side by side tests of the same model and getting a 20% variance in benchmark scores...
In other words, the major manufacturers of complete systems get the best and what goes in the consumer boxes is like a random grab bag of everything else that wasn't good enough for HP or Dell.
This is why I benchmark test new CPUs and GPUs and have and will send them back solely for unreasonably low benchmark scores. If everyone did that, it would be a completely different market and the variance would become statistically insignificant for everyone, not just HP, Dell and LTT.
Oh, and don't think you can get around it buying a tray CPU from AliExpress. Those will be bottom of the barrel, just consistently so. You're a dreamer if you think that was some leftover from HP's order. Those were ordered by some no name company with the name produced by randomly mashing a few keys on the keyboard and were binned and priced accordingly. I'm sure you can buy 5600X CPUs there all day long that are really a 5500. And it is AMD that is the one running the scam there.
@@Lurch-Bot Did you miss the part where 11 of the CPUs came from 11 different stores across 3 countries (Canada, USA, and looked like the UK)?
These 12 chips (including LTT's pre-launch test sample) are going to be from different batches across the 6-month window they were collected, which will emphasize the differences between wafers, let alone between chips on the same wafer.
@@Lurch-Bot 22 seconds from the start of the video, that's how long it took for them to tell you how the CPUs were acquired, and you somehow still managed to make this comment. Wow.
Ummm, that's literally how businesses work. Spend money to make something, sell it so you can make more, and sell more.
@@Lurch-Bot You're wrong in many statements.
1) They're not from the same tray, they're from various sources, even from various countries.
2) Companies that get AMD products are more likely to remedy this by restricting performance rather than only getting "the good stuff". You're talking about companies that often don't enable XMP, only use one stick of RAM on dual-channel-supporting platforms, AND often use RAM that is slower than what even the CPU manufacturer recommends (although that last point has gotten better in the last few years). Don't believe me? Just see anyone looking into this (LTT, Gamers Nexus, David does Tech Stuff, etc).
Not to mention insufficient cooling, which can have an i9 performing worse than an i7.
Consumer-level stuff is absolute trash from companies like HP and Dell. Workstation level it's just okay; server level you're actually getting some sampling/additional testing/verification, but you're paying through the nose for it.
3) It's within your rights to send things back if you don't like them, but you're basically finding your own golden sample that way; if everyone started doing that, the result would not be what you suggest. Instead they'd be forced to lock things down to the absolute minimum performance any chip can do.
4) There really isn't any sampling as deep as you think there is. Good chips go towards higher end products bad chips go towards lower end products. But lower end is higher volume so high quality chips can end up there just because there isn't enough demand for the high end. (They might even do a rough pre-selection and only select the chips originating from the middle of a wafer for the high end low volume stuff to avoid having to test the rest in depth. The middle of the wafer has a much better chance of churning out high quality chips).
5) AMD isn't running any scams, and AliExpress CPUs can be of just as fine quality as anything else; nor are tray CPUs any worse or better on average, from any testing I've ever seen done.
I never realized how much work goes into ensuring consistency in benchmark testing. It's worrying to think about the lack of oversight in computer hardware compared to other industries and the issues that could ensue.
Now imagine real scientific testing and trying to get perfectly consistent results and not within 0.25%
As nice as it'd be, it kind of admittedly makes sense to an extent - Cars aren't tested and regulated because they're expensive, they're tested and regulated because they can kill people
Genuine praise to linus for taking time to address this and how confusing and weird these companies are becoming launching their cpus and gpus in today’s market
The silicon lottery's always been a thing. I didn't think it would still be this bad though, and it's nice to see it tested
Yeah, this video was like the old LTT content we used to get.
@@Blast_HardCheese I think it's always been this bad, but the manufacturers just locked the maximum potential performance so it won't vary once it reaches the end consumer (so your typical consumer won't feel aggrieved because of it), and you could always unlock it through overclocking.
But nowadays they use a different sophisticated method which almost makes overclocking obsolete and it is already unlocked for the end consumers, so we can see the various results between each chip.
@@Blast_HardCheese It's not that bad, all they had to do was find the maximum frequency that all 3 CPUs could run at and fix them at that speed. Then the testing should result in minimal difference. For customers, you should be glad if you get a golden sample and don't care if you didn't because you got what you paid for which are the guaranteed speeds. Everything over the guaranteed speeds is a bonus that you may or may not get.
@@alexandruilea915 Agreed, I just assumed they would have it fully reined in by now
The silicon lottery strikes again. It's crazy how much variation there can be between CPUs, especially within the same model
Yep, Linus does all these videos just to keep the best stuff for himself..... You think the stats are for us but they're not.
Disagree, it's more like: it's crazy how good the production process for such delicate tech has become, with so little deviation and the results being so insanely close!
@@AdventuresOfDetroit l o l
There's not much variance, it's ±2%.
especially for more bottom bin chips, since the premium ones are all reserved for the higher end chips
18:28 To answer your question, Linus, all regulation is written in blood. The reason we have car safety ratings is because before we did corporations were happy to "self-regulate," and sell cars that killed people. Regulation tends to only come in the critical aftermath of such loss.
i love the fact that you come out and say "yes, our tests are inconsistent, and here's why" in such a fascinating, entertaining fashion. there's a reason i always watch every LTT video that pops up in my feed.
you do know Gamers Nexus is better than this
@@AK-tf3fc not comparable lmao, this video specifically shows us a big difference in Gamers Nexus testing vs LTT
@@lilcheaty In that case let's throw all reviewers' performance results out the window because of a few percent variance. I do agree with you though, there is a huge difference in GN and LTT testing, and that difference is that GN can at least be trusted with their numbers but LTT cannot. "Look guys, there is a variance, so it's not our fault we put out shonky reviews and fudged numbers." Silicon lottery has always been a thing, but of course Loonus spins this to obfuscate the fact their testing methodology is bogus at best. Wanna see a good correction video? HUB's latest video where Steve apologised for his mistakes and moved on. LTT: "Silicon lottery cause number to go boom".
@@GsrItalia because Gamers Nexus has, to the best of my knowledge, never come out and explicitly confirmed their testing method, and even if they have, it's not comparable to this. No one else in the industry is doing what LTT is doing because they don't have the money to just buy 11 top-of-the-line CPUs and at least 10 4090s just for data accuracy reasons.
This isn't necessarily a knock against anyone else on this platform, they just don't have the money to do this
Yeah, how many times are they gonna have to say it before they realize they don't understand scientific method?
Well done. Possibly the most thorough and well-explained video on the effects* of the Silicon Lottery I've ever seen.
*Not covering much on the causes, or it would've been a two hour video!
In fact, can we have an updated video on that please?
It's a 1 minute explanation:
Each transistor within the silicon will be electrically different. Differences in needed gate voltage (what vCore is) for current to pass, differences in leakage (the amount of current loss through electron migration, aka heat), differences in output capacitance (the inevitable result of having current paths in parallel), etc.
With these differences, we have a different required voltage to drive the transistor to operate properly given a clock speed, different heat output given a voltage and different power usage given both.
Each modern clockspeed boosting algo for Intel and AMD take these into account to be both stable and maximize performance (aka, clockspeed) given the current load on the CPU.
So with all of that in mind, you can see why CPU A and CPU B will perform slightly differently given the same workload. Their clockspeeds will differ because of power, voltage and temperature differences down to a transistor level.
@@winebartender6653 Your 1-minute explanation assumes a fairly deep level of knowledge about how a CPU works, and seeing as LTT's target audience is not scientists and engineers where those kinds of assumptions could be made, your 1-minute explanation would likely turn into a 5 or 10 minute explanation even if they were being insanely information-dense. At the average information density of an LTT video, I would say the hour-long prediction is not unreasonable given all the other stuff they would then have to cover to make it a complete video
@@the_undead ? Absolutely not. A simple preface of "A CPU is made up of billions, or even trillions, of transistors, which are essentially tiny electrical switches" would do. You don't need to know anything about voltage, current, capacitance or anything else you think is a low-level concept.
The point of the 1 minute explanation is to touch on the fact a transistor will always be different when made in a CPU, that's it.
You could absolutely expand it out, but it is unnecessary to get the point across
A 1-minute explanation is that each chip will not ever be the same as another one, kind of like a snowflake. They cannot be perfect and therefore will have variance. They will meet minimum specs, and that is all that is guaranteed out of a good chip. Some may be better than others and can therefore be overclocked easier and higher.
@@winebartender6653 This is 100% true..that said, have you seen Intel's new 200 series architecture? They're doing some very interesting things with how voltage and power is delivered to the CPU. There's options to deliver different vcore to a single core along with the ability to "overclock" a single core while other cores and clock speeds can remain lower. It's really fascinating what they're doing even if the CPU itself is behind AMD. Der8auer did a great video explaining this and how it's going to change the way we tweak our systems.
I'm glad to see linus and his team working towards doing better, especially after that whole thing where they were admitting lack of accuracy and that they failed their community. Seeing them be so passionate about this really brings a smile to my face.
You say that like they're the only ones that don't have perfect accuracy. You say that like putting in the tens of thousands of dollars in equipment and man-hours is something they can just do overnight. You say that like Gamers Nexus, at the time that disgusting disingenuous video was made, was in the habit of retesting products to use that data in comparison charts for new product launches. But no, none of those things are true, and I still don't think Gamers Nexus retests. Although if you can provide me with real evidence that he does retest stuff, I will happily change this opinion
GN doesn't have the money to do real hardware testing at any kind of scale anymore. Steve himself admits that they put a huge deal of money into the "investigative journalism" they do, and for most of his sources we just have to take his word for it because "they can't be revealed", which is fine. It's great that he focuses hard on shit companies doing shit things. The recent EKWB crap: yes they are garbage, yes the management is terrible and they deserve everything coming. But be honest, the only reason his tech news is successful is because it gives people someone/something to hate. Also, the one thing in common with his investigative videos? He always contacts the company for comment prior to the video even being released. Did he do that with LTT? No. He took what he could from LTT's videos and forums and put them on blast for people to hate, gaining him views. The world today thrives on hatred, and in some ways Steve is using that for his own gains. So yeah, everyone on the internet, LTT or GN, plays on people's emotions and thoughts to get your views. None of them are innocent, but also none are guilty. If LTT failed, so did GN. One for testing data, the other for using hate to garner views.
@@the_undead shush
@@car_477 are you upset about this incident over a year and a half later?
you can maybe work around the inconsistencies in Red Dead 2 by using Cheat Engine with the Unrandomizer; it forces the random number generator to always return the same values, so you could end up with a deterministic benchmark run :)
that's a great idea, do it for every game where it's possible or matters
I don't think they can automate it
I came down here to check to see if someone had suggested similar.
As a former game dev (not for RDR2): this will have a minor impact on performance of games that utilize system random number generators especially (because of the syscall overhead), but importantly the result should be changed by the same (incredibly tiny) amount. In exchange, you will get drastically more consistent runs from the behaviors in game. We actually had a debug build that replaced all the calls to random things like this with static values, so that we could apply internal test harnesses, especially during building our tutorials. I am not intimately familiar with how Cheat Engine goes about derandomizing, but all the sane ways I can think of for it to do that would give you more in repeatability than you lose in cost to run it, or in deviance from player system behavior. (You're simply selecting one player system behavior and repeating it.)
@@Deveyus this may be a stupid question, but would the (albeit minor) performance difference be mitigated if you left in the random number generation but just didn't use it, using the static value instead (therefore still creating the "load" of generating a random number while also keeping your consistency)?
I am not sure if a result gathered with the default randomized behavior turned off is representative of the average gamer experience. Since the goal of the labs is to provide statistically significant testing for the masses, introducing irregularity is maybe not the right solution.
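The repeatability idea from this thread is easy to demonstrate in miniature. This is a purely hypothetical sketch (not how Cheat Engine or RDR2 actually work internally): seed the generator identically and every "random" in-game decision repeats exactly between runs:

```python
import random

def simulate_run(seed=1234):
    # Stand-in for randomized in-game behavior (NPC spawns, weather, etc.).
    # A fixed seed makes every decision deterministic across runs, which is
    # what a derandomized benchmark effectively does.
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(5)]

# Two "benchmark runs" make identical decisions, so the workload the CPU
# sees is consistent from run to run:
assert simulate_run() == simulate_run()
print("runs are identical:", simulate_run())
```

The trade-off the thread debates still applies: you get one fixed player-system behavior repeated perfectly, at the cost of no longer sampling the spread of behaviors a real player would see.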
As PhD researcher, this is my favorite video that you guys have published in a while. Love the data and scientific process used here guys. Keep it up, go labs go!!
Waiting for the first LTT Labs paper to hit Science
It's great to see. All this work will make sure their internal comparisons are reliable. You can look at all LTT Labs testing when you want to compare DIFFERENT products for performance and know it's just the products you're looking at that vary, then look across different reviewers testing for the SAME products to get a glimpse the actual market quality variances for that one product. LTT Labs is actually benefiting the whole techtuber community, not just LMG.
lol "PhD researcher"
@@chairface1859yes, one who researches PhD candidates 🧐
@@chairface1859 now you look a bit daft. Working in a university, I know this is a proper job title. Just because you have little understanding of job roles within a university doesn't make a job title any less real. I have a very unassuming job title myself: I work within the Estates/Operations department of the university, and my job title is Building Support. But dig a little deeper and I'm actually part of a team that runs three of my uni's flagship research buildings. One does research into neuroscience (like finding cures for dementia and Huntington's). Another has the Compound Semiconductor Research Institute (yes, we have the ability to fab our own silicon wafers), and in the same building is the Chemical Catalysts Institute; this building brings together the schools of chemistry, physics and engineering. The final building houses the social sciences, with departments that research and help form public policies in work with the British Government, plus the business side of the university. Out of the 5000-plus staff at my university, about 1000 are academics
crazy to think that the 1-2% silicon lottery difference could be the power of thousands of old 6502 CPUs
Always excited for Labs stuff! That was exceptionally well presented and explained. Really great how you addressed inconsistencies and the multitude of possible causes. Probably one of the best ways I've seen to explain testing methodology, the reasoning behind it and the resulting discrepancies when compared to real-world applications. Loved it!
The thing I'm not sure about, is why they couldn't just completely fix the clock frequency to a set value, instead of letting the CPU run AFAP. Sure, you won't get the best performance out of it, but if you want to use them to test a GPU, then it shouldn't be part of the bottleneck either...
@@luisjalabert8366 probably because that isn't very indicative of real-world usage; they're not trying to get the most accurate numbers, they're trying to get the most accurate representation of how the average person buying these things is going to use them
HUGE respect to all members of the team. It must have been really exhausting finishing all these tests.
It is mostly automated at this point.
How many months did they work on it?
@sierra2632 it's like you're naive, watching cringe Linus instead of Gamers Nexus
@@AK-tf3fc You say that like Gamers Nexus is dramatically better. Does Gamers Nexus upload videos showing the processes they used to get the performance specs? Has Gamers Nexus spent over a million dollars on scientific equipment, and on learning how to use said equipment, to get better, more accurate numbers? I don't f****** think so. Feel free to say he makes more entertaining videos all you want, because that is an entirely subjective point. But the purpose of this video was never entertainment; it was to show their processes and the struggles they are having getting data accuracy to a point where they are happy. Whereas, as far as I can tell, Gamers Nexus doesn't even care about accuracy
@@the_undead Gamers Nexus videos are educational while Linus videos are entertainment only. Not to mention only one of them exploits people and sells their merchandise
1:00 yes, silicon lottery. Btw the same thing happened with the 5000 series on AM4: they simply overclock themselves until 90°C
As a professional test engineer, I am truly impressed with the level of detail, thought and effort that has gone into the LTT Labs.
👏👏👏👏
Now if you could just come and explain this level of dedication to some of my colleagues in other disciplines of engineering, that would be great. They do seem to think that everything will just work, and don't appreciate that us test professionals have to think around corners sometimes…
@@GreySectoid what's your issue?
@@GreySectoid yeah its pretty terrible
The majority of the population do engineering for money, not because they are overly passionate. It would be useless to explain
@@GreySectoid Your comment is not constructive and is worthless. Please share your own brilliant idea to improve the testing methods; otherwise don't type anything.
@@JacobsKrąnųg What's so terrible about it?
Loved this video as a statistician, and really like the approach you’re taking. It may be a bit much, but if you did an equivalence check pre/post for any gpu lineup you’re reviewing, that would be pretty solid evidence that any differences you found in the tests across the gpus is due to differences in the gpus.
The other option would be to model the specific test rig in the regression, but then you’d need to put each gpu in each test rig, which would defeat the purpose of parallelizing the test in the first place.
I know this video is already pretty old, but I work as an engineer running tests day in and day out. We have a half dozen test stands which we KNOW are not perfectly consistent with each other (different noise and such), but what we do know is that each stand is consistent with itself.
The way we make it possible to run parallel tests is by running *different* tests on each stand, since we only need the numbers within each test to be compared, not across tests.
In your case, that means running Cyberpunk on one chip and Factorio on another. A 5% margin in Cyberpunk is still a 5% margin, and they should all be relatively consistent (i.e. a 5% lead in Cyberpunk is equal on each bench, even if the numbers are offset by 3%).
Just food for thought, hopefully the labs folks are already doing so
This is industry defining data. Genuinely. Anyone notice at ~9:50 when the charts fly by, that Corsola goes from being in the top 3 CPUs for cyberpunk at 1080p, to suddenly being the worst CPU in the test by a wide margin at 1440p. I expected different games to stress the CPUs in different ways, but I did NOT expect a single game to completely change the order of a single CPU. I guess this goes back to the performance controversy with Halo Infinite that Hardware Unboxed covered, where they found that the specific part of the game you benchmarked could completely change the order of the top performing GPUs. I suppose the same logic applies within the same game, for each resolution. That's crazy.
Also, good catch with CS GO. Showing the differences between CPUs is great, but at ~700 frame per second, a microscopic difference in latency has an outsized effect on the FPS value. That's because the time between frames is actually the inverse of the FPS count. Which is fine when you're roughly around AAA game refresh rates between 30 and 120, but that inverse relationship starts skewing your data more and more the higher your FPS number is. High FPS games always have a larger impact on average performance, just because of the way math works. I'd like to see someone normalize using "time between frames" to measure those differences instead of the actual FPS number. That would probably give a number that better represents how different it feels to use two CPUs. It would also give more weight to games with lower FPS values, which is actually where performance matters the most. Almost no one can notice the difference between 400 or 500 FPS, but people can definitely notice the difference between 40 and 50 FPS, even though the "percent difference" is exactly the same. Absolute values matter, and the percentage difference calculation everyone uses can obscure very important data that actually matters. When I'm looking at buying parts, I don't care if a CPU gets 5% more FPS in a game that's already maxing out my monitor's refresh rate. Worst case games generally matter more, since the esports titles with high FPS counts are always going to be easy to run.
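The averaging skew described above is easy to see with toy numbers. A hedged sketch (all FPS values are made up): a mean over FPS lets the esports title dominate the chart, while a mean over frame times weights the heavy game where performance actually matters:

```python
# Two hypothetical CPUs across three games (FPS). CPU B is 10% faster
# in the high-FPS esports title but 10% slower in the heavy 4K title.
cpu_a = {"esports": 600.0, "aaa_1440p": 100.0, "aaa_4k": 50.0}
cpu_b = {"esports": 660.0, "aaa_1440p": 100.0, "aaa_4k": 45.0}

def mean_fps(scores):
    # arithmetic mean of FPS: dominated by the largest (esports) number
    return sum(scores.values()) / len(scores)

def mean_frametime_ms(scores):
    # mean time between frames: the inverse view weights slow games more
    return sum(1000.0 / fps for fps in scores.values()) / len(scores)

print(mean_fps(cpu_a), mean_fps(cpu_b))                    # B "wins" here
print(mean_frametime_ms(cpu_a), mean_frametime_ms(cpu_b))  # A wins here
```

Same data, opposite verdicts, purely because of which quantity gets averaged; this is the mechanism behind the CS:GO outlier discussion in the video.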
People dog on YouTube comments... But then sometimes you get good ones like this.
Uhh... Except, because of the way math works, the inverse relationship between FPS and frame time doesn't affect the percent difference at all, and regardless of which way you show it our perception is a logarithmic scale.
ie. The difference 40 to 50 FPS is 25%, and the difference 400 to 500 FPS is also 25%. Invert that and the difference 20 to 25 ms is still 25%, and the difference 2.0 to 2.5 ms is even still (surprise) 25%.
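That ratio-preservation is easy to verify with the numbers above (a tiny sanity check, nothing more):

```python
# 40 -> 50 FPS and 400 -> 500 FPS are both a 1.25x (25%) improvement
assert 50 / 40 == 500 / 400 == 1.25

# Inverting to frame times (ms per frame) preserves the ratio exactly:
# 40 FPS = 25 ms and 50 FPS = 20 ms, and 25 / 20 is still 1.25.
frame_time_40 = 1000 / 40   # 25.0 ms
frame_time_50 = 1000 / 50   # 20.0 ms
assert frame_time_40 / frame_time_50 == 1.25

print("the ratio survives the FPS <-> frame-time inversion")
```

So whether you chart FPS or frame times, the *relative* gap between two CPUs is the same number; what changes is how those gaps combine when you average across games.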
@@Renegade605 you're getting it but you're not getting it. Say an anomaly increased latency in testing by 2.5 ms. The 20 ms case would increase to 22.5 ms, i.e. 12.5%, and the 2 ms case would increase to 4.5 ms, i.e. 125%
@@RoninD20 what you just said is the exact opposite of the point OP was making.
You're correct that a 2.5ms hitch will feel much worse at 2ms average frame times (400 fps) than at 20ms, except that OP said small differences in frame rate matter more at low average frame rates.
Both are true, but for very different reasons in very different cases.
You just can't simplify the data that much and at some point you have to trust that your audience is capable of understanding when and why the different numbers matter.
Getting "um acktuallyed" in youtube comments by someone who repeated exactly what I said. Perfection.
@@Renegade605 For reference, I said "the percent difference is exactly the same". That's a verbatim quote from my unedited post. Check yourself before you wreck yourself. We agree, so I'm not sure why you're telling me I'm wrong about the percent difference.
On my other points, I'll clarify. When I mentioned "High FPS games always have a larger impact on average performance", I was specifically talking about average performance charts like the one at 8:37. Charts that take the FPS values from each game, and average the result. That's a graph where the inverse relationship skews the data. Specifically, it heavily weights the graph towards the result of the game with the highest FPS values, and reduces the weight of games with low FPS values.
You can compare the graph at 8:34 that included CS GO and the graph at 11:41 which excluded CS GO, to see how much a single outlier game affected the entire average result. That inverse relationship definitely skews the results when you average the FPS values between games, as basically every reviewer does. That's a big problem, because an extra 3fps in CS GO mean basically nothing when it's already maxing out your monitor, but 3fps more in cyberpunk at 4k is a pretty big change. However, both those changes are averaged out on the same graph. That's why the inverse relationship is a problem. Rewatch the 10:43 "CS:GO is Wonky" segment to have Linus explain why games with framerates that high aren't valuable for benchmarking. There's some CS GO specific stuff there, but there's also a lot of statements that can be generalized to all high FPS games.
If anyone is still confused, I can clarify further.
(Tech horror story) There's an Azure server out there with a haunted CPU. We had a lot in the cloud, on one cluster we always had more errors with exactly the same deployments. We got around it for months by updating to a new cluster. Just a few months later the errors were back. I think that the old compute server got moved to the new cluster when the old cluster got decommissioned. And to this day.... I moved to ARM. Just because i know that x86 box is going to get me.
Change to a different availability zone?
@@Jimmy_Jones company policy had reasons I can't remember for keeping availability zones the same. (I think there was a pre purchasing compute agreement ) Fortunately I moved companies and it hasn't found me yet.
Try getting some techpriests to help
@@duckbilldaniel "it hasn't found me yet" lol
Would it be more expensive than it's worth to run a profiler on the containers to check performance? Can your company seek compensation for being placed on faulty hardware?
@17:24 discreetly showcasing bad blood between Gamers Nexus and LTT 😅😂
I thought the same thing
@@minijag972 Never trust a pirate without a ship. Such a big company, such a small sample.
@@minijag972 Yeah I always recommend you check out my competitor's shop after he smashes my windows 😂
Gamers Nexus incident made this video possible IMO. LTT forced to up their game
@@minijag972 you really think it's genuinely him holding a grudge over that? Linus is not an editor on it 💀
Inb4 copyright strike by Nintendo for using Pokemon names
Nahhh, Linus's 4D chess move of that Corsola pronunciation will evade the Nintendo ninjas.
They're currently overburdened with Palworld, so now's the time to do this.
Nah wouldn't be a copyright strike. A trademark violation cease-and-desist maybe though!
@@XxZannexX😂
nah nintendont is busy with palworld this month
Oh boy, people are going to be losing their minds again over the average 1% differences.
@@jake20479I have a 7800x3d. Am I the minority here for buying a chip, and if it runs my programs and games at an acceptable speed that I wouldn't notice a difference unless I lined it up with 11 other chips, that I'm just happy with my purchase and enjoy my games?
(Run on sentence, I ran out of breath trying to read it. I'm also not fixing it though.)
@@jake20479 you know that's basically impossible, in almost every industry. No one can build a 100% identical product every time.
@@jake20479 Except you'd be comparing that 'any percentage less' to reviewers who most certainly will not share the same test bench, software variables, temperatures, humidity and several other variables.
So, this is good for LTT to keep data similar, but for your average consumer, the rule of 'within margin of error' very much applies.
@@jake20479 I mean, I'd much rather have a cheap CPU than a consistent one. Do you have any idea how expensive it would be for chip makers to get their consistency down under 1%? I guarantee whenever you buy anything at a grocery store there's at least a 1% variance from someone else who bought that same thing; it's just pointless to make stuff that consistent.
@@jake20479 if you want the best, you buy the best. We all know they bin the chips; we all know they don't all turn out identical. This has been the norm for decades.
Superb work! The only thing missing was repeating the testing on another motherboard to check if the CPU performance depends on individual CPU or the combination of motherboard power delivery and CPU. Would the same CPUs perform the best in any motherboard or did those CPUs simply happen to work the best with that specific motherboard?
Seems like Gamers Nexus did not make the list of media outlets in 17:24
LTT Saltmine be like
LTT staying "classy"
Why would they? Gamers Nexus is a backpack warranty review channel, not technical.
Did I miss something between Gamers Nexus & LMG?
@@nomadicdragon7157 umm.. assuming you are not joking, just look up the accusations against LTT on Gamers Nexus channel about 5 months ago. Shit went down
8:31 honestly really disappointed to see that diabetes didn't win out over the other CPUs. I had my bets on this one :(
Well, it makes progress in the real world tests.
I love seeing this kind of testing across multiple channels.
And another shining example of why you should always check across multiple review sources.
(Loved the Pokemon referencing of the processors)
(and I've only known a couple years now that Raikou is long o sound (ish). And i have the gbc Crystal bundle. Best Christmas ever, lol)
One of the best videos you've made in a long time. I really like the labs content and the transparency that you're trying to give. There is a bunch to learn about testing in this video. Really looking forward to more of this content.
Statistically, from experience in the semiconductor industry:
It will take a sample size of at least 30 CPUs to establish a proper sigma. Ideally you would want to do this experiment with one motherboard first and repeat with other boards to gauge variation.
The next step would be to test about 100 CPUs from different date codes (hopefully getting different lots) to gauge the variability of their process.
Also run at JEDEC speeds to remove differences in memory controller overclocking quality.
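The sample-size point above can be illustrated with a quick stdlib simulation. The sigma value is invented and this is not LTT's or anyone's actual procedure, just a sketch of why a handful of chips gives a noisy sigma estimate:

```python
import random
import statistics

random.seed(42)  # deterministic for the example
TRUE_SIGMA = 1.0  # hypothetical true chip-to-chip sigma (arbitrary units)

def sigma_estimate_noise(n, trials=2000):
    """Spread of the sample-sigma estimate when only n chips are measured."""
    estimates = []
    for _ in range(trials):
        sample = [random.gauss(0.0, TRUE_SIGMA) for _ in range(n)]
        estimates.append(statistics.stdev(sample))
    return statistics.stdev(estimates)

# Estimating sigma from ~8 chips is roughly twice as noisy as from 30
# (the theoretical noise scales like sigma / sqrt(2 * (n - 1))).
noise_8 = sigma_estimate_noise(8)
noise_30 = sigma_estimate_noise(30)
print(noise_8, noise_30)
```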
Of course those hundred CPUs are gonna cost about $35,000... for every model of CPU that you test.
Even this relatively small scale test was probably a lot for LMG So as much as I'm sure Linus would love to get to that point, they're a long ways off from having that kind of money
I never got lucky in silicon lottery but once. That was an Northwood P4 that can be pushed from 1.8 to 2.8 Ghz. For RAM I'm usually happy if I get the values printed on the sticks to run stable.
As someone who used to do statistical process control for safety critical electrical components in the automotive industry, I approve of the comprehensiveness of this. Now that you have the process laid out for doing this, it will only get faster and more routine.
11:53 Colours on 1% low and Avg are inverted
Gamers Nexus about to make a third rant video
This is absolutely what Gamers Nexus loved to see LTT's testing become, even though LMG seems to have forgotten to mention them at 17:24.
(It could just be an oversight, but not including Gamers Nexus, the most thorough tech reviewer on testing before LMG had the Lab, seems a bit intentional.)
The issue is that GN isn't trustworthy.
@@Hathos9 how, because they called out your best boy Linus? Go touch grass please
@@martine5923 That is the issue. GN has a legion of toxic sheep that love drama and attack whoever GN says to. I like technology, not drama.
@@Hathos9 that was very funny, please continue. People who love tech-not-drama don't watch LTT much; the whole point of this show is ~70% entertainment, 30% info. It's basically a Discovery Channel TV show talking to the lizard brain of its target audience. Nothing bad in it per se, some videos are pretty entertaining, but that's what it is primarily: a tech-related entertainment business.
@@BoraHorzaGobuchul I'm fine with entertainment/tech. It is better than the non-stop drama and negativity of GN. LTT seeks views by making fun videos. GN seeks views by attacking others.
I loved how much meme-age they worked into their BSOD in the intro. They even made the QR code go to the LTT store lol
Absolutely no surprise re marketing included.
Yeah, after you explained it I am not nearly as mad about this as I thought I'd be. If they design it and manufacture it to achieve a standard, but include features that may be able to get even MORE than advertised, then I don't really care about the variability of how much more it gets.
It's kind of like rock climbing gear, where a carabiner will be rated to 22 kN of force. 99.7%+ of the time it'll be stronger than this, and it could easily vary from 23 kN to over 30 kN. That'd be like a CPU having a performance spread of nearly 40%. The thing is, the lowest number of 22 kN is still 5x stronger than any realistic safety situation, so maybe we should think about CPUs in terms of the demand we expect to place on them. E.g. "This CPU will have a 99.7% chance of running *game* at more than 60fps/144fps/250fps"
For your testing, maybe Linux with Wine/Proton would be better in terms of minimisation of background tasks?
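The "X% chance of clearing a threshold" framing above is just the tail of a normal distribution; a minimal sketch with invented numbers (mean 150 fps, 2 fps part-to-part sigma; real CPU spreads would need to be measured):

```python
import math

def p_exceeds(threshold, mean, sigma):
    """P(a randomly drawn chip scores above threshold), assuming a normal spread."""
    z = (threshold - mean) / sigma
    # Normal survival function via the error function (stdlib only).
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# A threshold 3 sigma below the mean is cleared ~99.87% of the time
# (the one-sided cousin of the familiar two-sided 99.7% three-sigma rule).
print(p_exceeds(144.0, 150.0, 2.0))
```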
The lack of oversight vs automotives and, indeed, rock climbing gear kind of checks out because if your CPU is 5% slower, you aren't going to find yourself in an accident where your brakes fail and you, your partner, your two kids and another random family all fucking die. It is definitely worth doing in a decentralized, nongovernmental fashion though. Maybe a consumer union could be made where manufacturers have to provide representative samples in order to get a trademarked seal of independent testing.
Yeah, I think you are right. There are margins in probably every industry, so I think the consumer has to be fine when the product reaches its rated performance. With cars, for example, you can often measure differences between the power of engines. But the average consumer doesn't care that much about fuel quality, tires, and regular oil services, so those probably have more impact on the actual performance on the road than the engine itself. The same is probably true for PC parts' performance in countries with different average temperatures. So in the end, I guess you should buy a product by its guaranteed rated speed and be happy if you get more than you paid for, but not necessarily be sad if you don't get that much of a benefit. Your friend might have ended up with a better chip itself, but maybe he has a low-quality power supply, which makes the CPU perform worse than yours. Or maybe there are also margins in PSUs?
@themisterx8660 the margins are in EVERY industry. If it's not federally regulated, it's financially regulated or consumer regulated... except vitamins and supplements...
the problem with using Linux is that it's just not a realistic OS to use for benchmarks. Linux has many problems working with NVIDIA GPUs, but the main thing is that most people just don't run Linux so it's better to just use Windows, even if it's bloated and makes getting reliable test data harder
@@memethief4113 True, but it can be used for non-gaming tests. As it stands, Linux will have to be used only in specific tests. Like you said, having Linux be usable for ALL tests, we might get there one day, but not today. Not this year either. Maybe next year on the most optimistic timescale, but realistically I think it's more like 5 years, IF it continues to grow like in the last 2 years.
@@memethief4113 I guess the argument would be that it would be for controlled testing. If one chip ran a game with Proton on Linux at 100fps and another at 105, then you could meaningfully say the second is 5% faster, even if on people's actual systems the real-world framerate would be 45 or 145 or whatever.
The problem with testing in realistic scenarios is that realistic scenarios have a lot of uncontrolled variables. If you tested on windows you could get 145fps or you could get 95 depending on what windows fuckery is happening in the background and you couldn't know without doing 100s of tests and averaging it. Also Linux gaming, even with NVIDIA, is my personal main use case :P.
To be honest, back in the early i7 / AMD Phenom II era of CPUS, the silicon lottery was insaaaane in comparison. Some will be shocked by this and say how bad it is there's such a difference between chips, whereas others will be amazed it's actually pretty consistent. That said, it doesn't make testing simple at all.
This is a good investigation! I've been waiting for someone on this investigation to come out because, as a system integrator for CAD pc's I thought I was losing my mind! Ever since the 13th gen intel and the X3D line that AMD brought out, I've been seeing these inconsistencies. And I was really worried that something in our testing wasn't right! Glad to see my sanity is in check... But I REALLY think you should also take a look at intel! The 13th gen was already a real 'silicon lottery' thing, but the 14th gen intels are... WAY out of spec now and then! PLEASE PLEASE check intel too!!!
@@ABaumstumpf OK! Nice to know! But the thing is, if I swap the same CPU with the same cooler between different motherboards (of the same type, that is; we use the B760M Aorus Elite a lot and the B760 Tomahawk) I don't see as many differences as when I change the CPUs out... When I only swap the mobo, I see a difference of about 2% max in Cinebench. But between different CPUs on the same motherboard I can get up to a 7% difference!
So, what am I doing wrong in testing?
This is some incredible research. Not many, if any, review channels that do testing would go to this level of effort to obtain the results you have here. I think the amount of effort that it takes to obtain these different samples is probably a reason why. I had always heard of the silicon lottery but did not think about how the lottery may come down to the piece of silicon being used or the environment of the factory. As an afterthought it seems obvious, but of course you would need to obtain units from different manufacturing runs to get samples that would perform differently. I appreciate the effort that it took to do this, and it demonstrates the true value of the LTT lab.
Variation is normal in any process. I worked with steel for 9 years; you can order a structural grade 50 and get anywhere from a 53 yield strength to a 65. You can't produce everything exactly the same, and the tighter they hold the tolerances of what's acceptable to ship, the more expensive it is for you to get their CPU. Good companies will improve their process parameters to tighten that up, but you don't really want near-zero variation in a CPU.
Amazing work @LinusTechTips!
Your discussion on sources of variability really got my brain going.
As someone who works with psychological data, this is something we often discuss (inter-trial vs inter-participant).
Have you looked at using linear mixed models (aka hierarchical models, aka multilevel models)? They could allow you to control for multiple sources of variability without having to aggregate data so much.
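The inter-trial vs inter-unit split mentioned above can be sketched without a full mixed-model library, using a toy one-way random-effects variance decomposition. All numbers are invented and this is not LTT's method, just the idea:

```python
import statistics

# Hypothetical repeated runs: 3 benchmark runs for each of 3 CPUs.
runs = {
    "cpu_a": [100.1, 100.3, 99.9],
    "cpu_b": [101.2, 101.0, 101.4],
    "cpu_c": [99.0, 99.2, 98.8],
}

# Within-CPU (run-to-run) variance: average of the per-CPU sample variances.
within = statistics.mean(statistics.variance(v) for v in runs.values())

# Between-CPU variance of the per-CPU means (the "silicon lottery" part),
# corrected for the noise each mean inherits from its n runs
# (method-of-moments one-way random-effects estimate).
n = 3  # runs per CPU
means = [statistics.mean(v) for v in runs.values()]
between = max(statistics.variance(means) - within / n, 0.0)

print(f"run-to-run variance:   {within:.3f}")
print(f"chip-to-chip variance: {between:.3f}")
```

A proper linear mixed model does this jointly (and handles unbalanced data), but the decomposition is the same idea: don't aggregate away the run-level noise before asking how much of the spread is chip-to-chip.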
Respect for the Vegemite shout out.
Love, Australia
except that's not Vegemite, the real stuff is black, not brown
naming them pokemon is fire
as someone who works in big data and the analysis of it, the way you guys described all your statistical analysis in a way that's easy to digest for regular people was amazing. 10/10 vid
This video feels like it would be great for teaching high school science students about isolating variables and the impact it can have on results! Such an interesting video!
"come on guys, we're not that young anymore" im dead lmaooo
Awesome video! Lab is gaining a lot of credibility after delivering such great and in depth analysis. Impressive!
I must admit however that it is a bit sad to see no mention of Gamers Nexus at 17:24... I guess the community would love to see a reconciliation... I'll keep dreaming
The ball’s in Steve’s trunk.
“If an article contains personal or serious allegations or claims against an individual, it may be appropriate and necessary to give that individual an opportunity to respond to these claims, or to deny them if they wish”
- Independent Press Standards Organization
But they’re UK based, so why should Steve care (:
I noticed the GN omission too. Really sad to see.
17:24 Kinda weird not seeing Gamers Nexus... wonder if that relationship will ever recover.
I was saddened by that too. It would have buried the hatchet and allowed everybody to move forward. But it's so hard to have nice things today :(
It makes sense, they have no obligation to mention GN and given what happened it's understandable
@@Winnetou17 GN pretty clearly used the situation and drama to drive traffic and profit and weren't entirely honest or genuine in their handling of it. LTT denying them further attention is only appropriate.
@@waldolemmer It keeps filtering my comment, they likely have keywords in place to prevent discussion of it anymore, but there are videos out there with analysis of it, one rhymes with mechmechfotato.
@waldolemmer (We'll see if this goes through) As an example, they were dishonest by not actually giving an unbiased or full picture on the events they were reporting on, they did not contact LTT to get the full picture, and presented his one sided view as fact, then he refused to respond to further discourse on the subject, claiming to be holier and denying any poor handling of it on his part.
Freaking amazing video, well done everyone. It would be cool to see if the results are still as varied if the clock speed was locked across all of the cpus. I assume this wasn’t looked at because that’s not as equivalent to the real world use of the chips, but still could have been an interesting add.
Love the update on testing methodology! Very detailed and I do appreciate that! I did notice that Gamer's Nexus didn't get a spot in the B-roll alongside the other creators, I honestly hope you two make up and I strongly recommend looking to them every now again for sanity checks because they are really great and highly respected.
I noticed that as well. The Steve's are my go to reviewers for almost all purchases on parts
I lost all respect for that pile of garbage when he made that hour-long video. As someone who knows a great deal about human psychology, it is obvious to me that that video was not made for any reason other than revenge, and more specifically revenge over an LTT Labs employee calling him out on his data testing procedures and calling into question how accurate Gamers Nexus's testing is. Seeing as Steve's response was that hour-long video instead of an hour-long video detailing his processes or the updates he's made to them, I suspect those same concerns are still valid today, and I feel this is partially confirmed by the lack of Gamers Nexus in that B-roll shot. Yvonne and Luke would not allow Linus to hold a grudge for this long, and Linus isn't even the CEO anymore, so I suspect there was a reason other than the controversy for the lack of Gamers Nexus in that shot.
@@the_undead "as someone knows a great deal about being a fanboi"
If my firm were slandered like that, I'd do exactly the same. And oh, btw, to clear up your delusional POV even more... Surely LTT just issued apologies and shut down testing because there wasn't any truth to the claims... surely! If you fire a gun, you'd better be goddamn ready to take return fire. Go back to school, or wherever you gained that dubious knowledge of human psychology.
This is exactly the kind of video I feel like viewers and employees were missing so much when all the controversy broke out. This doesn't feel rushed, this feels like "we wanted to find the truth, so we did our best, and here's our results".
6:50 Smiled when I saw the only thing you add is Notepad++ because this is truly the only thing I miss since going from Windows to Mac on my work computer...
Notepad++ is great for its autosave and multiple tabs, though now the regular Notepad (Windows 11) has the same features 🤔
Same lol.. vscode isn’t nearly as good
@@Seedheh really?
@@Seed sure wish np++ had support for practically any language.
VSCode is way better; the only thing Notepad++ does better is being a bit faster @@Seed
That is what makes synthetic benchmarks more consistent and deterministic. But are less relevant to real life performance which is not deterministic.
Why is there no temperature comparison? I mean, nice engagement... but avg/max temps and also avg/max core voltage would be so interesting
0:05 🤔 there's something wrong with your Vegemite mate. I'd be having words with the bloke or sheila that sold it if I was you.
honestly, HUGE respect for doing this, first of all, the video is hella interesting and second of all knowing that you guys do this is extremely reassuring. great job dudes.
@@boscotheman82 thank you! but I did not ask
I think I speak for everyone here that this is EXACTLY what we want. We love tests like these, in depth, practical and perfectly executed. Thank you so much.
Intriguing; you might get some better results with other metrics like Mahalanobis distance instead of Euclidean, to account for correlations in the variance trends. (Mahalanobis is just Euclidean distance in whitened PCA space.)
Loved this detailed video! Hope LTT does more videos like this!
I had a similar thought, there is likely significant correlation between the dimensions in game testing space. My brain went a linear algebra route - the axes in game testing space are (probably) not orthogonal, so you would want a metric tensor to allow for proper definition of distances. The Mahalanobis distance route you propose is probably more realistic to actually use here, but I'd love to see an exploration of how to treat correlations in one way or another. Or to just have access to the dataset to fiddle around with it myself, now that my interest has been piqued.
I also loved this video and I look forward to seeing more Labs content as it develops.
I was wondering what Mahalanobis distance meant, then I realised it used standard deviation. Then some of my high school math came flooding in.
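For anyone following this thread, here's a minimal 2-D sketch of the point (invented benchmark scores; the closed-form 2x2 covariance inverse stands in for a full matrix inverse in higher dimensions):

```python
import math

# Invented per-chip scores in two strongly correlated benchmarks.
chips = [(100, 200), (101, 202), (99, 198), (102, 204), (98, 197)]
xs = [c[0] for c in chips]
ys = [c[1] for c in chips]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

mx, my = mean(xs), mean(ys)
sxx, syy, sxy = cov(xs, xs), cov(ys, ys), cov(xs, ys)
det = sxx * syy - sxy * sxy  # determinant of the 2x2 covariance matrix

def mahalanobis(x, y):
    # Quadratic form [dx dy] * inv(cov) * [dx dy]^T, expanded for 2x2.
    dx, dy = x - mx, y - my
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

def euclidean(x, y):
    return math.hypot(x - mx, y - my)

on_trend = (103, 206)   # fast in both tests, following the correlation
off_trend = (103, 197)  # fast in one test, slow in the other

# Euclidean calls the off-trend chip "closer" to the mean; Mahalanobis
# flags it as the far bigger outlier because it breaks the correlation.
print(euclidean(*on_trend), euclidean(*off_trend))
print(mahalanobis(*on_trend), mahalanobis(*off_trend))
```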
I found this review of reviews and previews of reviews re-freshing. Whenever I've used reviews for purchases in the past, I haven't worried about the cherry-picking theory that was mentioned, but the random variance has been a concern of mine. I feel like it's obvious to know that LTT won't eliminate this completely, but knowing that they are keeping it in mind and checking from time to time is a good little piece of mind. Thanks LTT!
Dunno exactly about that; the term "silicon lottery" has been around for a long time. What this video did do was highlight quite well how it actually applies when trying to make a standardized test.
That it's basically only LTT that has the resources to actually go and get ~11 CPUs of the same SKU to do this is pretty crazy.
From what I remember of this test, mentioned ages ago on WAN Show, the idea was to use the 3 closest to the centre for the test benches
Hey, just avoid Euclidean distance for measuring dissimilarity in high-dimensional spaces; the norms usually get pretty big and the l2-norm ends up not making much sense (as with k-means clustering)
Very useful video. Interesting that you didn't list Gamers Nexus despite them making an even bigger 68-CPU test back in August.
Also noted that, too bad if they are not friends anymore
Linus' ego is probably still bruised.
@@Shapershift It's Steve's ego; he was insulted big time when an LTT employee on a tour criticized his way of testing. That's what started it all... 😂
@@skak3000 Dude, if someone criticises your work, would you do nothing? Well, he didn't, and he also proved with data and examples that his testing was actually better and more reliable than LTT's, which was riddled with errors. Everyone knows Linus is a little narcissistic, he said so himself, and he apologised.
That said, I watch them both and other outlets to make my decisions, I just don't understand why people side with anyone in cases like this, research everywhere and make your own decisions. Watching only one outlet and trying to protect them like they are your friend is just the stupidest parasocial shit. Fuck GN, fuck LTT, they are means to an end. Stop thinking emotionally and start thinking rationally.
@@soul_slayer7760 People who say they are thinking rationally are cringingly hilarious - and peak irrational. Anyone who thinks they or other people "think rationally" are so ignorant and arrogant it isn't worth discussing.
Huge respects to the LTT and the Labs team for the level of transparency, effort and time that went into this. While this type of content may not be for everyone I for one can get behind this and support you all the way.
And who knows, maybe the solution to all this is a consortium down the road. With time, enough experience, and credibility, there's no reason why the Labs team cannot eventually head in that direction.
#1 - this is the labs content I am here for
#2 - I hope you swapped the CPUs between the benches they were on, to ensure the issue is on the CPU and not something else on the bench.
They didn't do the second, indeed they tested with EXPO which is an *extra* variable that AMD does not list as a stock configuration.
As this is not a team of data analysts, they made a couple of mistakes in communicating their process. But if you pay attention throughout the whole video, the process they are using is as follows: the only component that changes from one CPU test to the next is the CPU itself. They're probably using PTM 7950 for their thermal pad material so that thermal paste application isn't a variable, and they're doing this in their thermal chamber with a very specific temperature and humidity setting so that even the humidity in the lab building isn't a concern.
If there ever was video proof of lessons learned, this is the one. Congratulations to the LTT team, huge work!
Yep, it makes the pain that they, and others went through, worth it in the end. Good signs for the future.
@@andoletube given the things people threatened to do to Linus's kids, I would say it was not worth it. And personally, given what a lot of LTT staff had to deal with because of that video from Gamers Nexus, I don't think a lot of them would consider it worth it either. The fact that Linus felt the need to announce that he was doubling the mental health budget for each of his employees means there was a lot of harassment going on.
All for stuff that was going to be fixed anyway, because for up to about a year now, and possibly even longer, Linus has talked at length many times on WAN Show about wanting to do all of the things that were forced to be expedited by that whole controversy. Which, funnily enough, probably caused a lot of people to be very overworked trying to fix all these problems, when one of the main things brought up in the controversy was overworking people.
Anyone genuinely believing that controversy was for the better is a hypocrite at best, because I am willing to bet at least $10,000 that every single problem people brought up in that controversy is a problem at Gamers Nexus, given his track record for dealing with his own controversies.
@@the_undead Well, I wasn't aware of any threats to Linus's kids. That is, of course, completely abhorrent. I'm not so onboard with the idea that LTT was necessarily fast tracking the necessary improvements to their testing. It's easy to express your ambitions in WAN show chat, but the proof is in the pudding, and the fact is, nothing was indicating that big improvements in testing protocols were imminent. As you say, the ruthless churn of the LTT video schedule was actually preventing them from meeting this ambition. So, I think the GN controversy is specifically the thing that has lit a fire under LTT to get their act together. We are now seeing the dividends of that. Steve is also being more careful and more thorough since the controversy. That's to our collective benefit. The fact that it has exposed vulnerabilities in the characters of Linus and Steve from GN is fair game because they do like to talk in high and mighty terms about ethics on their shows. It's only fair that we see them for who they are in times of duress - specifically because they are making monetised content for viewers about the same matters.
What you mean lesson learned?
That was always the plan!
It takes time to reach such level.
And they are not done.
@@Systox25 some falsehood about that controversy from a couple months ago actually teaching Linus something because he needed to be taught something
I love this kind of video where you show the behind-the-scenes of the lab.
This is the most literal "computer science" that could be done. ^^ (Usually, "computer science" is not about computers as Edsger Dijkstra once said.)
As totally unsolicited advice: I might suggest considering the Chebyshev distance instead of the Euclidean distance, since you are interested in avoiding having a sample be an outlier in any one of the tests. (Other high-order Minkowski distances would trade the exact rejection of outliers for some "average closeness".)
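A tiny sketch of that trade-off, with invented per-test deviations (in percent from the fleet mean):

```python
# Per-test deviations of two hypothetical chips from the fleet average (%).
chips = {
    "A": [0.5, 0.5, 0.5, 0.5],  # mildly off in every test
    "B": [0.0, 0.0, 0.0, 0.9],  # perfect except for one outlier test
}

def euclidean(devs):
    # l2-norm: sums squared deviations across all tests.
    return sum(d * d for d in devs) ** 0.5

def chebyshev(devs):
    # l-infinity norm: only the single worst test matters.
    return max(abs(d) for d in devs)

# Euclidean would pick chip B as the "most typical" sample, while
# Chebyshev rejects it for its single-test outlier and prefers A.
for name, devs in chips.items():
    print(name, euclidean(devs), chebyshev(devs))
```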
Hey, Euclidean distance! I remember that gem from my grad school days. We used it on satellite imagery to classify pixels by land type / change detection
If you ever name things the way you did with the Pokemon, I would consider doing it alphabetically. So the first one starts with A, the second with B, the third with C, and so on. It's just one of those nice-to-haves that doesn't change much but makes the presentation a bit easier to digest
Guys, when you're flashing through 10 different graphs comparing performance between 8 different chips, you need to keep the chips in the same order. The Productivity charts, in particular, have their orders shuffled between every slide. Makes it harder to track the changes.
In addition to the order being by performance, it also doesn't matter. It's not about the performance of the individual chip, it's about the consistency.
@@MeshJedi Not sure if I misread, but I think that's exactly the point. Keeping the CPUs in the same order over multiple graphs should make it way easier to figure out if the fluctuation in performance is due to (or scales with) the different tests (aka is consistent) or stems from (possibly faulty) hardware (aka is inconsistent).
@@VoxVenenatus Exactly. If the chips stayed in the same place, you could easily see the bars jumping up and down as the slides changed.
@@VoxVenenatus that doesn't really matter and only works when you see all the stats of a single CPU
As a research scientist, who looks at a lot of large dataset analyses, it’s pretty cool to see the common approaches. Choosing geometric mean, Euclidean data clustering etc.
I expect to start seeing some PCA and heat maps soon!
The problem is that all the methods in the world can't cover up bad data.
The grocery store example is quite a funny one to choose; it's one of the few places that sell products which actually do have a small variance in weight, even for the same price.
Everything has variance, unless refined to a purity not found outside lab-type conditions. Even different F1 engines perform differently. His example of cars being tested is about pollution levels and safety standards, not performance. V odd position for Linus to take.
@@dzzope you're completely correct. I'm specifically pointing out the grocery store analogy because that's somewhere you notice variance in in your day-to-day life.
Computer testing not being required the way it is for cars is because for cars it is all safety related; your CPU being 12 percent slower than expected isn't a safety issue.
It's crazy how close these CPUs are. Some ancient architectures like Barton had massive overclocking spreads, way over 20% between samples.
Regarding AMD's frequency adjustment making it difficult to test thermal performance, you might be able to find some useful information by supplying them with enough cooling to stay below the 90C threshold, then calculate the watts of heat pulled by measuring temp and flow. If the measurements can be taken near the CPU inlet and outlet with a high enough precision, it might be worth it to make a rig for it. In the case of the chip that possibly had a heat spreader issue: to run it at the same temperature as the rest of them, it would be outputting more watts of heat.
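The watts-from-flow idea above is simple sensible-heat calorimetry; a sketch with assumed water properties and made-up readings (a real rig would also need to account for heat gained or lost elsewhere in the loop):

```python
# Sensible-heat calorimetry: heat picked up by the coolant equals
# mass flow * specific heat * temperature rise across the block.
SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K), assuming pure water
DENSITY_WATER = 0.998         # kg/L near room temperature

def cpu_heat_watts(flow_l_per_min, t_in_c, t_out_c):
    mass_flow_kg_s = flow_l_per_min / 60.0 * DENSITY_WATER
    return mass_flow_kg_s * SPECIFIC_HEAT_WATER * (t_out_c - t_in_c)

# Made-up readings: 1.5 L/min loop flow, 1.2 C rise across the CPU block.
print(round(cpu_heat_watts(1.5, 25.0, 26.2), 1))  # ~125 W of heat output
```

The catch, as noted, is that the inlet/outlet temperature delta across a single block is small, so the temperature probes need to be precise and well placed for the result to mean anything.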
frequency adjustment is also related to core voltage (each core has its own voltage); that's why different motherboards have different performance, since they have different power delivery and different load-line calibration out of the box.
Also there is a thing called CPPC (Collaborative Processor Performance Control), with an optional Preferred Cores feature (which is ON by default). This thing may cause performance problems, since preferred Core 1 and preferred Core 2 may sit close to each other and heat each other.
As a mathematician I liked that you took the time to explain euclidean distance for those who've never heard of it! And I thought it was very funny that you did it after doing around 20 comparisons using euclidean distance in 1D or 2D :D
As a guy who teaches data analysis and data science, this video brought a big grin to my face; it'll be great for my students to watch. Great job Linus and everyone there. I love seeing benchmarking and testing getting better and better as time goes by. I used to do this for gaming companies all the time, and it's not easy getting proper quality benchmarks. You and Gamers Nexus and Level1Techs and others are actually pushing this forward and bringing awareness to customers of the issues related to benchmarking. Thanks a lot
17:30 haha GN conspicuously missing
Noticed that. Also, GN just did this exact same video a few months ago.
Sad to see they clearly haven't buried the hatchet, but I can't exactly blame them. Steve wasn't wrong in what he said, but he clearly took a huge swing at Linus's reputation without seeking comment. That comes across as pretty malevolent to me. Linus's lab guy who trash-talked GN appears to be the instigator, but Steve's response was not proportional, nor particularly professional IMO, now that the dust has settled.
@@TheVillainOfTheYear I agree with everything except Steve not being wrong. He was absolutely wrong, and was stirring drama both for views and to undermine his main competitor. He didn't want solutions, he wanted to gain market share. That's integrity in the same way as Apple criticizing Google.
The only controversy from this will be his pronunciation of Corsola 😭
And not including GN
Which is funny considering he pronounced Corsola more or less correctly (basically a tomato tomato pronunciation), but royally messed up Raikou's pronunciation by falling for the classic "pronouncing the u" beginners trap.
What if we used software to force about 8% CPU usage, so that all the CPUs are always at 8% usage even when the background tasks aren't that heavy, and then ran the benchmark?
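The suggestion above, pinning "background" load at a fixed level so every bench sees the same noise, can be approximated with a duty-cycle busy loop. This is a hypothetical sketch, not anything LTT does: one worker thread burns CPU for a fixed fraction of each short period, then sleeps for the rest.

```python
import time
import threading

def fixed_load(target: float = 0.08, period: float = 0.1, stop=None):
    """Keep one thread at roughly `target` CPU utilisation.

    Each `period` seconds, burn cycles for target*period seconds,
    then sleep for the remainder. Runs until `stop` (an Event) is set.
    """
    busy = period * target
    while stop is None or not stop.is_set():
        end = time.perf_counter() + busy
        while time.perf_counter() < end:
            pass                       # busy slice: burn cycles
        time.sleep(period - busy)      # idle slice: yield the core

# Demo: run the synthetic load for ~half a second, then stop it.
stop = threading.Event()
worker = threading.Thread(target=fixed_load, args=(0.08, 0.05, stop))
worker.start()
time.sleep(0.5)
stop.set()
worker.join()
```

One caveat with the idea itself: a fixed synthetic load only standardises the *amount* of background work, not which cores the scheduler puts it on, so it wouldn't fully remove run-to-run variance.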
Those Pokémon names, I feel, had to be decided by one of the engineers. We like to troll like that, and it is totally something they would do.
Personally, I choose to interpret the other option Linus gave in this video quite literally. I don't think someone actually suggested that specifically, but when it came to something that would actually work for this video, I suspect Pokémon was the only viable option.
Great job, guys! It would be nice to see another video with the same level of detail on the latest 13th-gen and 14th-gen Intel CPUs.
As a computer engineering student myself, I want to add something to help while Linus makes these types of videos. Those processors are exactly the same, yet not exactly the same at the same time, and there are multiple reasons; I'll just mention the plausible ones so you understand. The first is the die/socket contact with the motherboard: even slight changes in the contact surface between the CPU and the motherboard pins are enough to create these kinds of differences. That's also why we don't have CPU extension units the way we have PCIe extensions; PCIe risers do exist, but they cost some performance too. The second thing is that while you test a CPU, neutrinos (particles from the sun that pass through almost everything) can affect the CPU itself. Yes, I know it's rare, sometimes nearly impossible, but it's not as unlikely as you might think: such particles can flip bits in registers or even RAM. There is a lot more to say about it, but we can trust CPU manufacturers on these things, since the performance they provide is (nearly) enough…
The best thing to do would be to cooperate with other YouTube testers to compile a database of results. Agree on a standardisation of data for these tests, then each tests their respective components, then aggregate the data for comparison. The more YouTubers you have, the bigger the test sample.
Love the focus on consistency and variance, and would love to see error bars on future comparisons to quantify this!
Thanks for the breakdowns; appreciate the work, team! Can the audio tech please level the volume on his expostulations? I'm listening on headphones because others need silence.
Glad to see a local chain on your comparison (UK - Computer Orbit)
Crash testing interestingly isn't required. It also isn't done by the government. It is a 3rd party body that only tests vehicles that are sufficiently popular for them to warrant it and they buy the vehicles themselves. There has been a lot of complaining that the cybertruck hasn't been fully crash tested lately but that's simply because there aren't enough on the road yet for them to bother (though they will soon).
The IIHS is funded by car insurance companies, not the government (NCAP programs, by contrast, are typically government-backed). The insurers need to know how much to charge for which car in case one is a deathtrap.
There is nothing similar for computers, but I could see a group of companies with very large servers paying for proper testing, but that would only be for server/enterprise parts, not consumer products.
How about adding Error Bars to your results?
0:52 someone's got good taste in keyboards, I love seeing the Moondrop Dash getting some love!
I do wonder how much variance there would be if they set specific frequencies on the cpu AND also set all of the memory timings (down to tertiary timings). Even beyond that, I wonder how much tertiary memory timings played into this. Since it’s well known that memory training is BAD on AMD, it could make up for a lot of that difference.
Well, think about how many people are going to actually take the time to set memory timings all the way. It's an even smaller subset of people who will set specific freq's on their CPUs, which is a small subset of people who will even think about what overclockable CPU they want to buy, which is yet another subset of all CPUs sold. My belief is that there should be a 100% benchmark for both price and performance, and that ALL performance numbers should automatically come with a standard variance given in terms of price per unit of performance. Not only will it be more useful from a consumer perspective, it's good math as well. And given the way our math education is going, teaching consumers how to understand statistics and data analysis is a public service.
Alright guys, you got me at "test benches" and finding identical parts. You deserve my engagement for this video, take it.
@19:35 wow, i actually have never thought of the issue in this way.
I've done productions where we buy 10 cameras, and 10 lenses, to test and match 3 cameras with 3 lenses, returning the rest. Most of the best DP's do this if I'm not mistaken.
DP?
@@tiagobelo4965 Cinematographer or Director of Photography. They are the person who is usually responsible for the recorded image.
Kudos for the use of Euclidean distance in determining similarity. Other similarity measures could also be used: cosine similarity, Jaccard, etc. It's nice to see LTT actually doing data science work. Maybe they could use Principal Component Analysis (PCA) to see which features are the main contributors to the difference between the highest- and lowest-rated CPUs.
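For anyone curious what the two measures mentioned here look like in code, this is a toy sketch over made-up per-CPU benchmark vectors (the scores are invented, not from the video): Euclidean distance measures absolute gaps between results, while cosine similarity only cares about the direction of the vector.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two benchmark vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way, regardless of magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cpu_a = [142.0, 98.5, 311.0]  # hypothetical fps in three games
cpu_b = [140.5, 99.1, 309.2]

print(euclidean(cpu_a, cpu_b))          # small distance -> similar samples
print(cosine_similarity(cpu_a, cpu_b))  # near 1.0 -> same performance profile
```

The distinction matters for the video's use case: two CPUs could have near-identical cosine similarity (same relative strengths per game) while one is uniformly a few percent slower, which only the Euclidean distance would flag.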
17:25 I seem to remember thinking this exact sentiment on a specific notorious call-out video… not knowing why it was such a big deal. Really cool what you guys are doing and cannot wait to geek out at the Labs site! Please keep being humble and doing great work! It speaks for itself.
I was a little disappointed they weren't included in the recommended reviewers montage.
It's a shame that whole mess blew up the way it did.
I remember, while watching the video, thinking LTT would fix a few small mistakes and everything would be cool, and when I looked at the comments after that video it seemed like that was everyone else's takeaway.
But then I watched another video covering Steve's vid and noticed a lot of people had the impression that LTT had malevolent intentions to various ends and deserved some sort of retribution, which they saw Steve's video as.
By the time LTT had made their official response, many people's minds were made up.
I spent hours responding to comments full of outright falsities regarding the content of Steve's original video, Linus's initial response on Reddit, and so, so, SO many blatant misunderstandings of Linus's words in their official response, most of which I attribute to the context being split up in a way that someone reacting to Linus's segment immediately after watching it, without waiting for two *damn* minutes, would simply misunderstand his choice of words.
It was a shit show at all stages, and Linus seemingly came out feeling like a victim of a targeted attack by Steve.
But I don't think it was Steve that wanted him to hurt.
That was us, hoping for some good drama to eat up.
That whole experience made me feel like I was watching the end of The Truman Show, or American Arcadia.
We did this.
Shame on us.
@@nullvoid3545 No, Steve did this, and Steve knew exactly what he was doing. I find it very suspicious that the video came out relatively shortly after a clip of an LTT Labs employee calling out Steve for his questionable data-gathering practices in product comparisons. Combine that with Steve saying he will treat LMG like he treats all other companies, and either this is all one big coincidence, or there is a lot of malice involved in that whole drama. As someone who knows a lot about psychology, I'm on the fence, but the way Steve talked in that hour-long video is not the way someone talks when their actions are in good faith; that is the way a politician talks when trying to destroy an opponent they know they cannot beat.
@@nullvoid3545
Steve decided not to reach out for journalistic comment.
“If an article contains personal or serious allegations or claims against an individual, it may be appropriate and necessary to give that individual an opportunity to respond to these claims, or to deny them if they wish”
- Independent Press Standards
That wasn’t a journalistic investigation. It was a hit piece against a competitor (which Steve openly refers to LTT as.)
@@nullvoid3545 holy shit I never heard anyone put it so succinctly, it was such a dumb moment. Everyone needed to chill tf out, form their own educated opinion, and move on. Instead it's too easy to get short-sighted and aggressive.
I'm so appreciative that we have so many awesome channels out there that do hard work to educate and entertain about tech for free! F the drama!
Next time, there are a lot more similarity measures you could explore, for example cosine similarity or Pearson correlation. But to be honest, with such a limited dataset you could just brute-force every combination in a few seconds.
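The brute-force point is worth spelling out: with only a handful of samples, you can enumerate every subset and pick the tightest cluster directly instead of using any clustering heuristic. A minimal sketch, with invented two-game benchmark vectors for four hypothetical CPU samples:

```python
import itertools
import math

# Hypothetical (fps_game1, fps_game2) results for four samples of one SKU.
samples = {
    "cpu1": (100.0, 200.0),
    "cpu2": (101.0, 198.0),
    "cpu3": (110.0, 190.0),  # the outlier sample
    "cpu4": (100.5, 199.5),
}

def dist(a, b):
    """Euclidean distance between two samples' benchmark vectors."""
    return math.dist(samples[a], samples[b])

def tightest_trio():
    """Try every 3-sample subset; minimise the sum of pairwise distances."""
    return min(
        itertools.combinations(samples, 3),
        key=lambda trio: sum(dist(a, b) for a, b in itertools.combinations(trio, 2)),
    )

print(tightest_trio())  # cpu3 is excluded as the outlier
```

Exhaustive search like this is exact, and for choosing, say, the 3 most similar CPUs out of a dozen samples, the number of combinations is tiny; it only becomes impractical once the sample count grows well beyond what a lab would realistically buy.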
Ever since the GN X LTT stuff, I feel the quality of LTT vids has drastically improved
Ooofph GN not on the trusted fellow YT sites for in depth reviews 😅
That's only telling us that nothing has changed. Sad... sad but true 😅
It has been shown that degradation will cause these chips to clock lower over time. I would be interested in a retest of the test benches after they have been used extensively. Both to validate that they are still identical, and to quantify the degradation.
Edit: what about SSD performance influencing the results, since assets are streamed while gaming?