Back when I was building machines professionally I used a number of programs for testing each computer. For the thermal testing I used a program called CPU Burn, if I remember correctly. Now this was a very small and quite primitive utility that came in something like six or seven versions, each of them designed for a particular processor family. As the years went by the processors got more advanced, and as the program wasn't updated (AFAIK) I tested each new processor generation with all the different versions of Burn and noted which one caused the highest energy consumption and temperature. Up to about six or seven years ago it was still outperforming Prime95 by about 5 to 10 degrees for all processors. It was good at forcing computers to throttle, which was exactly what we wanted.
For general stability testing I had a version of Prime95 configured to use as much memory as possible. As it was a 32-bit version it topped out at 2 GB for effective testing, so a script spun up a number of processes at least equal to the number of threads available. If there was more memory than n * 2 GB then we increased the number of processes and maxed everything out so that the machine just barely avoided hitting the swap. Prime95 would run its torture test, deallocate the memory, and reallocate it again for the next run. Watching the memory graph in Task Manager you could see it fluctuate wildly, but always returning to close to 100% memory utilization. This hit the memory subsystem quite hard and exposed any weak memory module or slot. Combined with ECC memory it was important to check the system logs to catch any logged ECC errors. One ECC error doesn't necessarily mean the module is bad, but more than that is definitely a red flag, and we replaced any memory that generated an error under testing. This test we ran for 24 hours in 40°C ambient as a standard test.
Hard drives were also stress tested running random access, but that rarely caused any failures. However, the same test run on large RAID arrays tended to expose both drive compatibility problems and controller and driver issues. Our test program was brutal on those, and one time our test results caused a large manufacturer to recall all their release cards for their new series of controllers. Our test exposed a weakness that only showed itself when the system was under heavy stress. Sure it was unrealistic, but it didn't do anything but use standard system calls.
All in all it was way over the top, but the machines that passed were all solid performers and likely to keep working for many years.
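For flavor, here is roughly what that allocate/verify/free churn could look like if scripted today. This is a minimal sketch in Python rather than the Prime95-based setup described above, and the chunk sizes and pass counts are made up and scaled down:

```python
# A minimal re-creation of the allocate/verify/free churn described above,
# sketched in Python (the real setup used Prime95's torture test; sizes and
# pass counts here are invented and scaled down).
import multiprocessing as mp
import os
import random

CHUNK_MB = 256        # per-pass allocation; the original used ~2 GB per process
PASSES = 20           # the original ran for 24 hours; bounded here

def churn(worker_id: int) -> None:
    rng = random.Random(worker_id)                 # deterministic per worker
    chunk_words = CHUNK_MB * 1024 * 1024 // 4
    for n in range(PASSES):
        pattern = rng.getrandbits(32).to_bytes(4, "little") * chunk_words
        buf = bytearray(pattern)                   # allocate + write the whole chunk
        if bytes(buf) != pattern:                  # read it all back
            print(f"worker {worker_id}: mismatch on pass {n}, suspect RAM!")
            return
        del buf                                    # free, then reallocate next pass
    print(f"worker {worker_id}: {PASSES} passes clean")

if __name__ == "__main__":
    procs = [mp.Process(target=churn, args=(i,)) for i in range(os.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

One process per hardware thread mimics the "n processes, each churning its own chunk" shape; the wild fluctuation you'd see in Task Manager comes from the free/reallocate cycle at the end of each pass.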
🖐By ECC memory do you mean normal memory or the ones marketed as ECC (for Xeons and such)? And how did you check logged ECC errors? (Windows Event Viewer?)
Thank you so much :)
@@amb1gamermain22 Yes, I was talking about error-correcting memory. I worked with a lot of workstations and servers, most of which required Registered memory, and most of the time we also used ECC. So a lot of Xeon machines and AMD Opterons, which also support ECC.
Basic ECC can detect and correct single-bit errors and detect multi-bit errors. So if a single bit is flipped, the memory controller will detect and correct that error on the fly; you never notice anything until you take a look in the event log, where these errors will be logged with details of when and what memory module was generating the error.
Bit errors happen even when the hardware is working perfectly. I'm not joking when I say that cosmic radiation can cause a bit flip in your computer's RAM, graphics memory or the caches on the processor. This is a case of "shit happens". When it happens to your computer there's something like a 90-99% chance you will never notice. A lot of the memory is used to store bitmap graphics, sound and temporary data that isn't critical. If a bit in these kinds of resources flips, it's very likely nothing bad happens. It gets hairy when a program instruction gets corrupted, and then a program may crash. What's considered worse is if a bit is flipped in a storage buffer that then gets written to storage. This can corrupt a file or the data in a file. For most home users that isn't very likely to happen, and if it happens it usually isn't a huge catastrophe, but in the business world it can be disastrous.
But you don't need to be all that scared. Even in large server clusters ECC errors aren't detected very often, meaning it's a very rare occasion that a bit really does flip because of cosmic radiation. As a system administrator what you are interested in is the rate of ECC errors. One doesn't really mean anything. Two may occur, but it's certainly not a common occurrence. Three or more means you most probably have a failing memory module. When it's really bad, the system will stop logging the errors, basically just saying there are too many to be logged. But as long as it's just one bit, the machine will still keep running.
Now if two bits go bad, the memory controller can't correct it. Well, at least if it's just basic ECC. There are more advanced schemes that can correct dual-bit errors, but that's more involved than I want to get into right now. Then there's something called a "kill bit" where the memory controller can disable a bad module, reconfigure the memory pool and keep the machine running. But again, it's a bit more advanced than I want to go into here. When a multi-bit error occurs, the machine can either log it and try to keep going, or the OS can be configured to flat out halt everything to avoid the risk of corrupting calculations or data.
And that got way longer than I had planned for...
@@blahorgaslisk7763 Yeah, it did get long 😂 But honestly it was really informative, really appreciate it. But isn't it kinda alarming for memory at JEDEC timings to error? Sadly my experience is around OCing RAM and RAM stressers, and 1 error = bad OC. Though idk if the errors in stress tests are equal to what ECC reports as an error. But if it was the same, then usually the culprit is heat / EMF (or, as you said, cosmic rays xD). Usually for heat, opening the side panel or directing a fan at it is the fix, but for EMF, uhhh, I'm not experienced there, but I've seen people at r/overclocking report that covering the backside of their RAM slots (the bunch of pins) with grounded copper or aluminum improved their OC results/reduced errors. Not sure why I'm saying this to a sysadmin, but I've already written it
@@amb1gamermain22 In servers or workstations we never overclocked anything. And about the cosmic radiation, I wasn't even joking. It's actually possible for a single neutron to flip a bit. So the bit flips I was talking about were not heat or OC related, and yet a bit flip can happen. The density of the memory also plays in: as more advanced processes are used to make denser memory chips, they become more sensitive.
In DDR5 they've integrated ECC functionality on the memory die, if I remember correctly. Frequencies and process nodes are getting high enough and small enough that they have to do this to get the memory stable. And now we're not talking just the occasional bit flip a year or so, but pretty regularly occurring single-bit errors that will be corrected even before the memory controller gets to play. I'm not sure how, or even if, it's possible to get a readout of the number of corrected errors in machines with standard DDR5 memory. I can see memory manufacturers not wanting to worry customers with logs full of errors that are normally occurring with the memory architecture, but at the same time I think it might be interesting if that data could be used to predict whether a memory module is about to fail. Simply keeping count of the frequency and warning if, say, the number of corrected errors per hour goes up dramatically sounds like it could be a good thing. A bit like SMART tries to predict storage failure by keeping tabs on the number of corrected errors and reallocated sectors to warn you when a drive starts to show signs of aging. Perhaps I'm overthinking this, but I've always thought ECC was a good idea for all computers. Unfortunately it makes memory about 12% more expensive, as it requires more memory chips on each module.
Now there are a few things to remember about RAM. What you think of as DDR3 or DDR4 memory is what most PCs used. Servers and a lot of workstations however use Registered DDR3 or 4 memory, and then almost always ECC memory. The register on these sits between the memory chips and the memory bus. When the memory clock ticks over, the register latches down the values so the memory controller gets a clean and clear signal that's either 0 or 1. With unregistered memory the memory controller is connected directly to the pins on the memory chips, and you basically hope the values are nice and stable. But it's cheaper to make memory modules without the registers, and the memory access is slightly faster, as the register adds some latency which makes for looser memory timing.
Earlier memory architectures used buffers instead of the register. The buffer was basically a preamp on each data line, making the memory put a lesser load on the memory bus. This way you could have more memory modules attached to the same memory bus and still not overload the I/O pins on the memory controller. And again this was used for servers and workstations where you needed large amounts of memory. But there has been ECC memory that didn't use buffers or registers, and there has been buffered or registered memory without ECC. There have even been implementations where they added an extra memory stick instead of using ECC memory. But ever since the memory controller moved into the processor, those are no longer an option for a regular x86 PC. They were not common before that either, but there were a few machines that could do that. I don't remember what chipset they used though.
Sorry for the wall of text, but I started writing and it got out of hand.
I don't know if any of it makes sense or if I'm just rambling...
@@blahorgaslisk7763 Yes, it does make sense :) (and it's really informative). DDR5 looks boboo if it's just correcting errors 24/7; they should've somehow fixed the cause of the errors instead of ECCing. What I meant is that a single error while OCing RAM = a bad OC. So if we apply the same logic to a workstation at JEDEC timings, 1 error is alarming (since it's not OCed and still errors). But I guess it's fine since ECC is here to save the day xd. 🤔 I might be exaggerating, and 1 error is bound to happen once in a while
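Since correct-one/detect-two comes up repeatedly in this thread, here is a toy SECDED (single-error-correct, double-error-detect) Hamming code over a 4-bit value, purely to illustrate the mechanism. Real ECC DIMMs do this in hardware over 64-bit words with 8 check bits (and DDR5's on-die ECC is another layer again), not in software like this:

```python
# Toy SECDED codeword: positions 1..7 are a Hamming(7,4) code, position 0 is
# an overall parity bit that upgrades it to double-error *detection*.
from typing import Optional, Tuple

def encode(nibble: int) -> list:
    bits = [0] * 8
    for k, pos in enumerate((3, 5, 6, 7)):        # data at non-power-of-two slots
        bits[pos] = (nibble >> k) & 1
    for p in (1, 2, 4):                           # parity p covers positions with bit p set
        bits[p] = sum(bits[i] for i in range(1, 8) if i & p) & 1
    bits[0] = sum(bits[1:]) & 1                   # overall parity over the codeword
    return bits

def decode(bits: list) -> Tuple[Optional[int], str]:
    syndrome = 0
    for i in range(1, 8):
        if bits[i]:
            syndrome ^= i                         # XOR of set positions points at the error
    overall = sum(bits) & 1                       # should come out even
    if syndrome and not overall:                  # two flips: even parity, nonzero syndrome
        return None, "double-bit error: detected, uncorrectable"
    if overall:                                   # odd flip count: correct the single bit
        bits[syndrome] ^= 1                       # syndrome 0 means bit 0 itself flipped
    nibble = sum(bits[pos] << k for k, pos in enumerate((3, 5, 6, 7)))
    return nibble, "corrected single-bit error" if overall else "clean"

word = encode(0b1011)
word[6] ^= 1                                      # one "cosmic ray" flip
print(decode(word))                               # (11, 'corrected single-bit error')
word = encode(0b1011)
word[3] ^= 1
word[5] ^= 1                                      # two flips in one word
print(decode(word))                               # (None, 'double-bit error: ...')
```

The "rate of errors matters more than one error" point above maps directly onto this: single corrected errors are routine and invisible to software, while the uncorrectable two-flip case is the one the OS has to be told about.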
Rename this to “What’s The Best Way To Stress Test? (CPU)”. Super helpful and I didn’t even expect this lol. If I had missed this video while looking for the best way to stress test, I would be so mad lol
Pretty much an electric heater, not the most cost-effective thing, but a good byproduct for people living in cold climates. I live in a tropical country, so I expend energy two times, once in the CPU and again in the AC to remove the heat from the room. =(
@@Phambleton yeah, a best-in-class gas generator plant has an efficiency of 64%, so 36% losses. Add to that all the losses in transmission. Maybe if you had a heat pump in your house, electricity would be a cheaper way to heat, as those pumps output more heat than input power. Also, I have zero idea about government subsidies in Europe for gas, so this engineering analysis probably isn't enough.
@@dinartd Not to mention not all electricity that goes into a CPU/GPU is output as heat, which is why TDP is so tricky to calculate for CPU coolers. But by already realising all these efficiency losses with basic knowledge, you know it's not a great idea to use this to heat your house! Electricity prices have doubled for me in the last 6 months, so I'm being conscious of stuff that uses significant power.
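To put rough numbers on that reasoning, here is the back-of-envelope math. Only the 64% plant efficiency comes from the comment above; the grid loss, furnace efficiency, and heat-pump COP are illustrative assumptions, not anyone's actual tariffs:

```python
# Heat delivered per kWh of gas burned, depending on the path it takes.
# 64% plant efficiency is from the comment above; ~5% grid loss, 90% furnace,
# and COP 3 heat pump are assumed example values.
PLANT_EFF, GRID_EFF = 0.64, 0.95
paths = {
    "PC as resistive heater": PLANT_EFF * GRID_EFF * 1.0,  # 1 kWh elec -> 1 kWh heat
    "gas furnace at home":    0.90,                        # burn the gas directly
    "heat pump (COP ~3)":     PLANT_EFF * GRID_EFF * 3.0,  # also pumps in outside heat
}
for name, kwh_heat in paths.items():
    print(f"{name:24s} ~{kwh_heat:.2f} kWh heat per kWh of gas")
```

Under these assumptions the PC delivers ~0.61 kWh of heat per kWh of gas, the furnace ~0.90, and the heat pump ~1.82, which matches the thread's conclusion: the gaming rig is the worst heater of the three.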
This only focused on stress-testing CPUs; it didn't cover GPUs, SSDs, or RAM. For GPUs, you should usually be fine with rendering a complex Blender scene and some benchmarks like Cinebench and FurMark. For SSDs, you can test with a program like H2testw that is usually used to test flash media (or just fill the drive with large video files, flush the hardware caches by rebooting, then do a comparison against the original files). For RAM, Memtest86+.
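In the same spirit as the H2testw approach mentioned above, here is a small sketch of the fill-then-verify idea. The target directory, file sizes and counts are placeholders, and as the comment says, you would reboot (or otherwise drop caches) between the two phases so the verify pass reads the drive and not RAM:

```python
# H2testw-style fill-then-verify sketch: write deterministic pseudorandom
# files, then re-derive the same stream and compare. Paths/sizes are placeholders.
import hashlib
import os
import sys

TARGET_DIR = sys.argv[1]       # e.g. a folder on the SSD under test
FILE_MB, COUNT = 512, 8

def stream(seed: int, size: int):
    """Deterministic pseudorandom bytes, so verification needs no stored copy."""
    counter = 0
    while size > 0:
        block = hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        yield block[:min(32, size)]
        size -= 32
        counter += 1

def fill() -> None:
    for n in range(COUNT):
        with open(os.path.join(TARGET_DIR, f"test_{n}.bin"), "wb") as f:
            for block in stream(n, FILE_MB * 1024 * 1024):
                f.write(block)

def verify() -> bool:
    ok = True
    for n in range(COUNT):
        with open(os.path.join(TARGET_DIR, f"test_{n}.bin"), "rb") as f:
            for block in stream(n, FILE_MB * 1024 * 1024):
                if f.read(len(block)) != block:
                    print(f"test_{n}.bin: mismatch, bad media (or bad RAM)")
                    ok = False
                    break
    return ok

if __name__ == "__main__":
    if "--verify" in sys.argv:
        print("all files intact" if verify() else "FAILED")
    else:
        fill()
```

Deriving the data from a seed is the trick that makes this practical: the verifier regenerates the expected bytes on the fly instead of keeping a second copy of everything it wrote.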
My favourite is 3D mark, I bought a license super cheap when it was on offer and I like how it gives me a score at the end, this way once I overclock I can see how much more it helps and when i get a new part I can see how many more points I get :)
@@luckydepressedguy8981 The new AMD CPUs aren't really worth overclocking. I have a 3700X and an NZXT Z62 AIO and I researched overclocks, but apparently I wouldn't get any performance gains. I think it's due to them trying to get as much performance out of the box as they can, leaving no room for overclocking. To add to this, manual overclocking is mostly obsolete these days, since many CPUs auto-overclock themselves if not limited, like AMD's PBO.
Without even watching the video I can say that with the AMD Phenom II(?) Black Editions back in the day, when you overclocked them and overheated them you'd "unlock" more cores, because they used the same die or whatever for the new 6(?) core CPUs as they did for the 4-core CPUs. I found this out by accident as a kid by goofing off and overclocking the chip after I got it, and next thing you know it's reading as a 6-core CPU. It blew my mind back then. Didn't understand what was going on at all. To this day it's one of my favorite computer hardware related stories I got to experience first hand.
Sometimes these things can be so seemingly random... I thought I had a rock solid system for the longest time.. everything ran great on it, Prime, Cinebench, Furmark, 3dmark, etc... and of all things, Cities Skylines was the one thing that exposed some instability.
Yep, way back in the day it took playing Counter Strike for me to realize some very high memory address chips were dead in ways that the BIOS wasn't detecting on startup. Memtest86 to confirm.
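For anyone curious what a Memtest-style pattern pass boils down to, here is a user-space sketch. Real Memtest86 runs bare-metal precisely so it can walk physical addresses (which is how it catches specific dead chips like the "high memory address" failure above); from inside an OS you only ever see virtual memory, so treat this as illustration, not a substitute:

```python
# What a memory-test pattern pass boils down to, as user-space Python.
def pattern_pass(buf: bytearray, pattern: int) -> int:
    buf[:] = bytes([pattern]) * len(buf)      # write the pattern everywhere
    return len(buf) - buf.count(pattern)      # bytes that read back wrong

buf = bytearray(256 * 1024 * 1024)            # ask the OS for 256 MB
for p in (0x00, 0xFF, 0x55, 0xAA):            # zeros, ones, alternating bits
    print(f"pattern {p:#04x}: {pattern_pass(buf, p)} bad bytes")
```

The alternating 0x55/0xAA patterns matter because stuck-at and coupling faults often only show when neighbouring bits hold opposite values.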
@3:59 @4:22 Prime95 absolutely *is* for stability in real-world applications, unless you don't consider science real. :) There are workloads out there that are very hard to number crunch, for which we're specifically optimizing the problems' innermost loops to run in the L1/L2 caches of specific CPU architectures and even models as that can mean the difference between weeks versus years of runtime per data set. Knowing in advance what the CPUs you have in your cluster can and can't handle, when they'll throttle (or even outright die…) is just as important as knowing how many cycles a specific instruction takes, so we can "waste" exactly as many cycles so the CPU can run 24/7/365¼ at full speed while not overwhelming the cooling. Prime95 results are very valuable starting points here. Applications keeping walls of racks of compute power busy may not be commonplace, but they certainly are real.
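The cache-fitting that comment describes usually comes down to loop tiling. Here is a sketch of the structure, in Python purely for readability; the actual payoff shows up in compiled code, where memory traffic dominates, and the block size is a made-up example:

```python
# Loop tiling ("cache blocking"): walk the matrices in block x block tiles so
# the inner loops' working set stays resident in L1/L2 instead of streaming
# from RAM. Shown in Python for clarity only; the win appears in compiled code.
def matmul_tiled(A, B, n, block=64):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):              # tile of C's rows
        for kk in range(0, n, block):          # tile of A's columns / B's rows
            for jj in range(0, n, block):      # tile of C's columns
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        a_ik = A[i][k]         # stays hot while the j-loop runs
                        row_b, row_c = B[k], C[i]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return C

# spot-check one element against the naive definition
import random
n = 96
A = [[random.random() for _ in range(n)] for _ in range(n)]
B = [[random.random() for _ in range(n)] for _ in range(n)]
C = matmul_tiled(A, B, n, block=32)
assert abs(C[5][7] - sum(A[5][k] * B[k][7] for k in range(n))) < 1e-9
```

Choosing the block size so that roughly three tiles fit in the target cache level is exactly the per-architecture tuning the comment is talking about, and why knowing the machine's thermal behaviour under that load matters.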
You wouldn't find the 140A EDC bug unless you use these tests correctly. Awesome work. There are very specific tests you need to do to catch all the problems with the BIOS. You can also slam JUST TWO CORES, rotating around every 140 seconds, Small FFT, so that you catch the hot spots from tweaking.
So I donate all of my spare CPU and GPU cycles to BOINC; I've found that it can find system instability pretty well. It can also turn my 6-year-old budget build into quite the space heater.
FYI, like Prime95, OCCT has different data set modes (small/medium/large) and those dramatically affect thermals; the difference between normal and extreme is a trade-off between more core power vs more memory usage (extreme raises core power but hits the memory system less)
Not only that, internally it even uses a modified, headless Prime95 binary that it temporarily extracts into a temp folder. There are even still some "Prime95" strings inside the binary.
Well this one will freak my apprentice. We were stressing a fan yesterday and I used Prime95. He queried my use as he'd never heard of it but loves Linus so HA, in your face dude! You know who you are. Thanks Linus.
My typical stability test for a CPU and/or RAM is an overnight session of Prime95: Small FFT for CPU, large/blend for RAM. I've caught many stability issues that show up only after 4h of Prime95. Add FurMark overnight for the GPU, and you're set.
This is the equivalent of suiting up your daily driver to go racing. It won't be extremely competitive, but it will teach you the limits of your car and how finely you can ride them before something goes wrong.
I've liked Prime for the heat output, but it isn't realistic, as was noted in this video. Thank you for sharing the other methods to test my new systems.
When I've built a new PC in the past, I've always loaded it up with Cinebench and 3DMark Wild Life at the same time and let it run for about 30 minutes. Once, it did indeed crash on me last year when I had an RX 580 in there with a 550-watt power supply, and after upgrading the power supply, no more problems. So it can be useful for figuring these things out before passing off a computer you built to someone (in this case my wife) or using it yourself for work.
Been overheating my i7 990X since '09, still going strong, still gaming, never had an issue. Been overclocked to 4.2/4.5 and still stable, consistent years of awesome. Minor software issues here n there but nothing too major
It would be very useful to understand how to actually fix common stress test results. Like - what are the characteristics of what you're looking for - and what can you do to fix it.
I used prime 95 to tune the fan curves, CPU voltages, etc because if the system could stay within normal operating temperatures without deafening me while doing prime 95, then the computer would run relatively cool and quiet in other tasks. The result is that the system runs quiet enough that I don't need to use headphones while the CPU is pulling 215W and the GPU is drawing 400W in very demanding games. Prime 95 is also a great way to evaluate the stability of the system. I was happy to see that the VRMs were up to the task of providing my 11900K with enough juice to run at 5.2GHZ on all cores almost indefinitely.
I've had to compile Linpack as a dependency multiple times during my time doing university research on physical simulations; crazy to see it mentioned like that. Solving systems of linear equations is at the heart of most physical simulations, as basically all of our work is to translate real-world physical equations into those. Sometimes things can get complex with nonlinear systems as well though, but those are usually much more experimental.
Ahh...good old Prime95...and FurMark for GPUs. Prime95 has been such a good stress testing tool for years and years. FFT size can be chosen to stress within different cache levels, or big enough to stress memory controller stability. It does require you to know the sizes of your CPU caches to tune it appropriately. The spiky CPU loading profiles would probably be the best way to test VRM stability that would likely expose power based instability that wouldn't necessarily show up under a Prime95 test.
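As a rough rule of thumb for that tuning: a length-N FFT over doubles touches on the order of 8*N bytes, so you can ballpark which cache level a given Prime95 FFT size lands in. The cache sizes below are example values for a modern desktop core, not a lookup for any specific SKU, and Prime95's auxiliary tables are ignored, so treat the output as approximate:

```python
# Ballpark which cache level a Prime95 FFT working set lands in.
# Assumes ~8 bytes per FFT element (doubles); cache sizes are example values.
CACHES_KB = {"L1d": 48, "L2": 1280, "L3 (shared)": 25 * 1024}

def where_it_fits(fft_len_k: int) -> str:
    working_set_kb = fft_len_k * 8                # N K-elements * 8 B = N*8 KB
    for level, size_kb in CACHES_KB.items():
        if working_set_kb <= size_kb:
            return level
    return "RAM (now you're stressing the memory controller)"

for fft_len_k in (4, 84, 1024, 8192):             # small ... very large FFT lengths
    print(f"{fft_len_k}K FFT: ~{fft_len_k * 8} KB working set -> {where_it_fits(fft_len_k)}")
```

That is why Small FFTs run hottest (everything stays in cache, so the cores never stall) while the largest sizes shift the stress onto the memory controller, exactly as the comment describes.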
With Labs coming, stuff's getting more scientific. With these videos, more logs/journals/results in a PDF/XLS file would be awesome.
XLS? 4 real?
CSV ingest brah
I don't watch Linus to learn
@@moneer7139 I’m pretty sure you do, you just don’t realize it. You wouldn’t be here otherwise.
with a scientific lab ,they can also put forth scientific theories to be tested . provided they are recognized by computer scientific community.
@@moneer7139 then who? LMG has the most range of testing and budget for it lol
8:14 You can significantly cut the time down to reach steady state on water by shutting off the fans for 1-2 minutes at the start of testing
Great tip! That's actually what modern cars do with their cooling loop if outside temperatures are cold
@Projit no
@@ibonitog all cars have been doing that for decades: the thermostat stays closed so that water from the engine doesn't go to the radiator until it hits operating temperature (80-90°C), then when it does, the thermostat opens while simultaneously turning on the fan, in order to allow water to circulate around the whole loop
@@chiefdenis even better, mechanical crank-operated clutch-based fans only start blowing once the crank heats up enough to close up the clutch; only then does the fan start seriously blowing. To me these systems are really cool since they're run entirely independently of thermostat sensors or computers. The whole thing is mechanically managed by a water pump, a thermostat valve, and a clutch fan.
@@andrewbreazna what you described with the clutch fans is a very similar working principle to the thermostats in table irons. I didn't know they worked like that, so in a way the clutch fan is itself a thermostat, lol. I love simple old mechanical ingenuity.
When I was younger I used to wonder how automatic transmissions worked in the '20s-'80s, since any computer complex enough to accommodate such logic would be nonexistent, or massive, during those times and likely wouldn't fit in a car. Turns out humans have been pretty crazy for a long time.
Honestly, this is easily one of the most needed videos in PC Building. If it's possible to make the title clearer on what the content is, I'm sure many people would benefit as this will be relevant for many, many years and use cases
@Projit Why did you lie
Exactly what I was thinking. Sounds too clickbaity and I almost didn't click on it because I had no idea what it was actually showing. I understand clickbait, they already did a video on it previously, but they should at least let the viewer know roughly what the video is about. I've probably missed 100s of videos because of this, since I just simply skip over them.
@@madbull4666 Yeah, same here. I only clicked on this specific LTT video because I wasn't in the mood for other things at the time. I know they know they have to do it for the almighty algorithm, but I wish they could at least put some parentheticals at the end of the title. "Why Overheat your CPU on Purpose? (exploring different benchmarking approaches)" or something like that. A lot of their bread-and-butter in this channel is informing people of things they didn't even know they needed to know. I know I can springboard off of LTT to other specialists in most tech areas for greater detail, but that's only if I know about the thing in the first place. I've built a lot of computers, but benchmarking is actually one of those things I've never understood well. This was a decent primer, but I'm glad I stumbled across it. I would have clicked it without fail if I'd known what it was about.
Yeah, I said the same thing recently on their QD-OLED video - they need to work on their title-making first.
Could we have a petition at least? I've seen a lot of comments already mentioning the same problem :/
"You'll never know your limits until you try to push them."
I'm not a hardcore OC enthusiast, but even still I like to see how much headroom I have in my rig, if for no other reason than that I just like to know. Also, it's always nice to be able to disprove people who claim, authoritatively but absolutely without any backing, that "you're doing it wrong if you don't X". You know the type, "You should really water cool that CPU or you'll wreck it!" or "if you use the stock fan curve then your RAM will burn out over time", stupid stuff like that.
True, for me it's that I'm too lazy to do more than Prime95 and the Ryzen Master test
+MSI Kombustor? I forgot if that's the correct name
That's how I got my first anal fissure
@@moiseman I don't think you used enough thermal paste for your testing method 😳
I've never heard anyone say either of those things.....ever.
The hammer hitting the CPU physically hurt me
It would have already been dead.
For those concerned: it was already dead
@@LinusTechTips RIP CPU
@@LinusTechTips I noticed the bent pins before the hit but still
@@LinusTechTips no LIVING CPUs were harmed in the making of this video
A huge recommendation for y-cruncher! It also checks memory stability with different algorithms. I had some systems stable with different stress tests; only y-cruncher discovered the problems (usually the FFT/NTT tests).
Agree.
BTW, the tests shown in the video generally stress only the CPU cores, leaving the memory controller and memory mostly idle.
Use high-FFT Prime95 / memtest86 / TestMem5 / y-cruncher for stressing the memory controller and the memory itself.
I had a setup that was always stable in stress tests, but then actually using the PC it always crashed. Never really looked into it, just put the clocks down a little.
Memtest86+ on a floppy disk in a USB floppy-drive ftw. 😀
My 5600X's overclock was perfectly stable with long R20 loops and the OCCT test, but P95 actually showed me its instability, with one of the cores clocking down to 1 GHz after around 10 minutes.
All it took was a 50 MHz reduction, and in 5 months I've had 0 crashes, but it still goes to show that somehow Prime can be good for stability
Yeah, for my 5600X no Linpack or other tests were able to catch a wrong curve optimizer value (missed by 2-5) on core 2, while Prime95 gets a thread crash either within seconds or on test 2/3
What settings do you both use to find instabilities with curve optimiser offsets? Prime95 Small FFTs with AVX?
@@DimkaTsv which Prime95 setting were you using? Small FFTs with AVX?
@@Luke-fx9gw Yeah: first max threads, then 1/3 (one third of all) threads (it will test the second limit), and then a single thread, manually swapping the thread from core to core in Windows Task Manager (core affinity 0/2/4/6/8/10) for the single-thread limit... Well, that's if you need a full test of frequency stability on PBO.
12-thread and 1/3-thread runs fail more easily
@@Luke-fx9gw But with AVX2... AVX is usually downstepped for Zen 1, MAAAYBE Zen 2, to make it passable, iirc.
AVX2 loads the CPU harder and surely makes it run hotter too
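DimkaTsv's manual core-to-core swap can be automated instead of dragging affinity around in Task Manager. Here is a sketch using psutil; the PID comes from the command line, the core list assumes a 6c/12t part with SMT (hence 0/2/4/6/8/10), and the ~140-second dwell is borrowed from another comment above, so adapt all of it to your chip:

```python
# Rotate a single-threaded stress test (e.g. Prime95 set to 1 worker) across
# physical cores, automating the Task Manager affinity-swap described above.
import sys
import time

import psutil  # pip install psutil

PID = int(sys.argv[1])            # PID of the running 1-worker stress test
CORES = [0, 2, 4, 6, 8, 10]       # one logical CPU per physical core (SMT pairs skipped)
DWELL_S = 140                     # per-core dwell; long enough for boost/thermals to settle

proc = psutil.Process(PID)
while True:
    for core in CORES:
        proc.cpu_affinity([core])                 # pin the whole process to one core
        print(f"PID {PID} pinned to logical CPU {core} for {DWELL_S}s")
        time.sleep(DWELL_S)                       # watch the worker for errors meanwhile
```

Pinning to one logical CPU per physical core is the point: per-core curve optimizer offsets only get exercised when that specific core is the one boosting.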
I remember having this kind of issue back when I got a 6600k. I overclocked it to around 5ghz and was able to get into windows and run cinebench on loop with no problems. But in certain games (especially BF1), around 10 minutes of gameplay would give a BSOD. This kind of content is very useful to people getting into (amateur) overclocking especially, and honestly I would have loved this video back when I was starting to get into PC gaming.
@Projit your content that doesn’t exist? Yea way better
I actually overheat my PC, so the lifetime of my I7-3770k goes down, so I can buy a new one soon.
3770k gng. It just won't die and still plenty to game.
You are going to hate yourself
Get rid of your thermal paste. Ez performance boost
Runs fine on 4.7 Ghz
a fellow 3770k warrior. To be fair, my 3080 is riding along with it just fine. I will only upgrade when 14th gen hits
I remember, when I first got my 4770k, Prime 95 Small FFTs was absolutely brutal in terms of heat output. It was only after a water cooling upgrade, multiple remounts, and a frustrated "crank the fans to max" run that I realized the test exposed Haswell's infamously bad TIM on the IHS and how the solution was devils' canyon.
Or de-lid and go liquid metal or something...
@@Ender240sxS13 And potentially crack your die or something...
Intel should've fixed that in the first place.
@@MiGujack3 It ain't all that hard buddy guy, you'd have to be a downright goof, a goober even, to crack a haswell die de-lidding it
@@Ender240sxS13 Don't even need the liquid metal honestly, I mean it's cool and all but I've seen mad drops on CPUs that don't even produce that much heat (I've de-lidded a few T SKUs to keep mini PCs quiet) with stuff like MX4, and I've used a fair amount of Kryonaut on hotter chips. I also use a thermally conductive adhesive for attaching the IHS back to the substrate but IDK if that actually makes any difference.
@@MiGujack3 Intel didn't have anything to fix, the TIM used was perfectly effective for keeping the chips cool at the clock speeds and power levels they were designed to operate at. The fact that they faced thermal issues when over-clocking isn't really Intel's problem.
Additionally delidding isn't a particularly difficult task, and you aren't going to crack your die, that's more likely to happen on chips where the heat spreader is soldered to the die, in which case you aren't really going to have thermal conductivity issues.
All the test types can be really useful for testing stability, I've found. When trying to find a stable overclock for my Ryzen 1800X I was able to push 4.0 GHz (even topping out at 4.2 GHz, but that resulted in crashes faster) and be stable in Prime95 / AIDA64, but as soon as I did general tasks for a few hours I'd get crashes. I ended up finding a test that exposed this: an AVX workload, just trying to use Handbrake to compress a video, was the easiest way to trigger the crash. With that I was able to finally dial in a stable overclock.
@Projit ez dubz
prime does avx.
Probably because P95 was running AVX2. Ryzen 1/1.5 runs AVX2 at half rate so it will be less power use than AVX1
@@NVMDSTEvil It would actually be at the same rate as the equivalent AVX code path, just less efficient because the register file was only intended for AVX. That requires them to keep values in the register file for twice as long, which is obviously not ideal.
@@NVMDSTEvil Interesting, I didn't know that. That difference just so happened to isolate my blue screens to using ffmpeg/Handbrake
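If you want to reproduce that kind of failure deliberately, the encode can be looped from a script. Here is a sketch using ffmpeg's libx264 path, which exercises similarly AVX-heavy code to Handbrake; the input filename is a placeholder (any long clip works) and the output is discarded so only the CPU load remains:

```python
# Loop a real software encode until it falls over, same idea as the Handbrake
# test above: synthetic tests passed, real encodes crashed.
import subprocess

CMD = ["ffmpeg", "-y", "-i", "input.mp4",        # placeholder source clip
       "-c:v", "libx264", "-preset", "slower",   # heavy software encode
       "-f", "null", "-"]                        # decode + encode, write nothing

run = 0
while True:
    run += 1
    result = subprocess.run(CMD, capture_output=True)
    if result.returncode != 0:
        print(f"encode failed on run {run}: that's your instability reproducer")
        print(result.stderr.decode(errors="replace")[-2000:])
        break
    print(f"run {run} ok")
```

A failing run leaves ffmpeg's own error output behind, which is handy for telling a crashed encoder apart from, say, a whole-system hang.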
You can add spikes to Small FFTs P95 AVX2 simply by waving your mouse around so that it sends interrupts. Buildzoid does this a lot in his videos.
In my own experience, I get crashes in P95 far more easily when at the edge.
Wouldn't you need a PS/2 mouse for that? USB mice don't work with interrupts.
Personally Prime95 doesn't crash even after 12h but certain games crash after 5mins of playing.
@@MrDiarukia USB itself is polling (hence the poll rate settings) but the interaction with the CPU is still interrupt-based (or else USB would be even harder on the CPU than it already is).
@@chronoreverse can you explain how it's still interupt based?
@@jonasls Basically USB itself is polling but once it notices an action from your mouse (movement, click, etc.) how does it tell the CPU so that the CPU can actually take action? It sends an interrupt which the CPU then services.
The USB controller is continuously polling (barring stuff like selective suspend) but the CPU isn't continuously polling the controller.
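Here is a toy model of that split, with the host controller polling and the CPU purely interrupt-driven. Everything in it is invented for the simulation; it is not a real USB API:

```python
# The "host controller" polls the device; the "CPU" only reacts when the
# controller raises an interrupt (modeled here as a threading.Event).
import queue
import threading
import time

events = queue.Queue()       # the wire from the "mouse" to the "controller"
irq = threading.Event()      # the interrupt line to the "CPU"
pending = []

def host_controller():
    while True:
        time.sleep(0.001)                  # poll the bus at ~1000 Hz
        try:
            ev = events.get_nowait()       # anything new from the device?
        except queue.Empty:
            continue                       # nothing: the CPU never hears about it
        pending.append(ev)
        irq.set()                          # raise the interrupt

def cpu():
    while True:
        irq.wait()                         # CPU sleeps or does real work here
        irq.clear()
        while pending:
            print("CPU services interrupt:", pending.pop(0))

threading.Thread(target=host_controller, daemon=True).start()
threading.Thread(target=cpu, daemon=True).start()
for i in range(3):                         # the "mouse" moves three times
    time.sleep(0.5)
    events.put(f"mouse moved #{i}")
time.sleep(0.5)                            # let the last interrupt drain
```

The empty-queue path is the key detail: polling cost is paid continuously by the controller, but the CPU burns nothing until an event actually arrives.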
Just wanted to point out that the link for a "free 240GB SSD" from Microcenter is actually for a 128GB USB Flash Drive & 128GB MicroSD Card
It almost adds up
Pog
lmao
I mean... they are solid state "drives"
Probably didn't update the ad read
It's very eye opening seeing how your home rig(s) deals with synthetic loads, I've had to modify several cases just to make sure the systems don't overheat in hotter months!
@Projit you have no content
1:24 I like how your silver bullet is an unfired round.
Please, forgive us. -CW
At Aperture Science we fire the whole Bullet. That's 65% more Bullet per Bullet
@@insu_na I love that old reference
And with Aperture Desk Job here, it's even better !
Love Prime95. Helped me find out that the fans on my radiator were facing each other. Back in 2016, I was using an FX-9590 in a CM HAF X case.
How did you not die in a fire?
@@naamadossantossilva4736 LMAO
@@naamadossantossilva4736 And here I was in 2013 with an FX-8320, and that was bad enough!
@@naamadossantossilva4736 Well... I'm still alive in 2022 running an FX-9590 @ 4.8 GHz 🥵
@@TinchoX And considering the market right now, you're probably gonna be sweating for quite a while longer unless you hit the lottery. MEGA OOF!
“Some men…just want to watch the PC burn.”
These videos are becoming more and more rigorous and I really love it! Like really, these are University course material! I am thankful for the quality and love put into these videos!
I highly recommend Gamers Nexus for similar content. Steve's videos are extremely well put-together and are very educational. I learned a lot about power supplies from their exploding Gigabyte PSU saga.
@Projit at least Linus doesn't (and never has) advertise himself on some other creator's video.
@@PhantomRetrospective Pretty sure that was a joke comment man
I do it to make my PC a space heater for my room.
Nice little zoom/spatial blur effect there at 1:19. My compliments to the editor
Might want to add these to my testing sweep for used PCs I sell. I usually run AIDA64 for about an hour and play games for a few hours. AIDA64 has helped me find many faulty DDR3 RAM sticks in the past few years, but more testing is better.
@Projit no
A lifesaver. I was just looking for a guide on benchmark programs and, most importantly, a way to graph them. Thanks!
Thanks to this, I was able to stress test my CPU to the extreme, to where its limits are. Other stress test apps don't push enough power to its TDP, but with Prime95 my CPU reached throttling levels and even went beyond TDP. This is a good evaluation for me so I can upgrade my cooling. Even though in real-world applications my CPU won't go as far as what Prime95 showed, at least I have a bit of headroom for better cooling
I still need to upgrade my cooling from the stock cooler, just waiting on parts. My CPU also thermal throttles just doing Cinebench for 1 minute 😣
I studied Fortran in 1985, you no say Fortran old! Now lets translate some formulas!
I remember when I first downloaded Prime95 as that was what I heard most often as the recommendation for max temp and stability testing.
My Haswell CPU shot up to temps I hadn't seen before, and seemed like it would keep going up. I was very confused and worried about my SFF rig.
After a few forum threads I learned why it was so unrealistically hot - after Prime95 v26.6 the test included some really heavy instruction sets that no program I used utilised.
That explained why after downloading the latest version (v28.5 back then) I had a toaster oven instead of a PC.
I ended up with what I considered a nice and stable undervolt. That is until I had to use Cisco Packet Tracer for some homework. Boy, that thing should be sold as a stability test, because no matter what voltage I set, it would crash for no reason after several minutes. The only way to make it work was to set my Vcore back to Auto.
Since then, I've tried long Blender renders, some Linpack tests, Cinebench on a loop, you name it. I ended up guiding several friends whose CPUs I overclocked through the BIOS to raise the Vcore a bit more, because you simply can't beat an actual game or tool doing real compute for hours at a time over several weeks. All of those experiences left a sour taste in my mouth and honestly, I'm glad our hardware can boost by itself nowadays. Running something like a Sandy Bridge chip at 3.4 GHz just because 4.8 GHz crashed after a year of stability at the wrong time and you ran out of patience always felt like wasting a nice piece of silicon, but the alternative was wondering if it would suddenly BSOD in the middle of a video call.
Same with GPUs - running the core on the edge of stability just to make sure your frames don't dip below 60, then capping the framerate to avoid tearing, felt so inefficient.
I love the fact that these days you can get most of the potential of a chip just by cooling it well and letting it do its thing. And you can still play with voltage curves if you want to squeeze the last few % out of it, but if you don't, you're not leaving double-digit improvements on the table.
"after Prime95 v26.6 the test included some really heavy instruction sets" - From what I understand those are AVX. So new prime95 will run super hot. I'm pretty sure these tests are without that AVX enabled, otherwise the power readings would be much higher, like well over 200w. My 9900K does 208w in P95 AVX, stock, reportedly the 12700K is higher.
Good points. At least with a 5800X they have also added PBO and undervolting in the BIOS, which lets you run the CPU a little more efficiently as well.
Cisco Packet Tracer is beyond garbage; it's very likely not your CPU's fault.
Nice. When I bought all of my stuff I had planned for an overclock while purchasing, and had it OC'd for over 2 years before an old HDD failed. I think OC'ing is good if you know the tolerances of your components. In my case I OC'd the CPU to 4.0 out of... 4.3 or 4.6 maximum, if I recall. And I juiced the RAM sticks too, but only up to the power output native to the mobo... again a preplanned thing. Could have gone crazy with the RAM and tried to bring it up to the edge of unstable, but a minor overclock is all I needed to make it run really fast, even off an HDD, for 2 years. I think that is where a lot of people make the mistake: overclocking random parts is a lot harder than buying parts designed to 'overclock' together.
who didn't instantly think it was Prime95 before clicking the video? I don't even bother with it anymore. As a casual gamer, it's not worth the time -- let alone the potential corruption or demise of other hardware. Moreover, sometimes I could stability test great in P95 and burn to the ground using CPUID stress test.
Back when I did a lot more overclocking and custom loop builds, I preferred running OCCT Linpack. Prime95 was nice for initial limit testing, but not realistic for long term stability testing. OCCT was much more in line with reality, meaning: OCCT stable for 4+ hours almost always meant a fully stable system, no matter what. Doing a few hours of P95 might have turned out stable, but very often the system then later crashed during say a simple DVD playback, or in a menu of a game.
OCCT also had (assume it still has) a GPU (Furmark) test which you can run stand alone, or combined with the CPU test. This combined test was excellent for tuning custom loops (fanspeeds), as the combined test would dump maximum heat from the CPU and GPU into the loop. 30-45 mins of that and you'd have your max heat in the loop.
I once did that combined CPU+GPU benchmark on my loop, but for some reason a connector decided that was the time for it to disconnect while putting maximum heat into the system; the water ended up boiling/pressurising the system to the point of collapse and killed my mobo and a 1080 Ti, rip 😣
Though stability is not absolute. The moment you get into RAM OC, you realize that A, there is no test that does everything, and B, the motherboard is screwing with you.
Fast boot is enabled in almost every BIOS by default, and other options like MCH full check are disabled. Higher XMP profiles are unstable with those settings even if you don't notice, and might produce a random crash you will not be able to reproduce quickly, if at all. GSAT (Google's stressapptest) is very good at detecting that stuff.
Oh man!! That zoom effect at 1:15 looks soo good with the floating icons getting blurred
On my old laptop, BeamNG.drive could maintain a pretty solid frame rate (40-50, sometimes 60), but my poor i7 6700HQ would hover around 80-85°C, even hitting 90 before I cleaned the dust out of it
That's not that bad, especially for a laptop. That processor is rated up to 100°C, so you were running well below its max in a laptop with poor cooling.
That's a good stress test
I love beam
@@brandonjohnson4121 yep, my old 650M would go to 120°C+ before I replaced its thermal paste. Then a 3GB 1060M would average 98°C whenever it was used
I'm having problems with my i5 10500HK and it sees 98°C
Got a Ryzen 5 5600X and it was running hotter than I liked. Undervolted by an offset of 0.15 volts and now it's perfect. Used Cinebench to test the temps and stability.
Me watching this on my laptop:
The laptop: dont you dare-
I'd love to see a video on fortran and those ancestral machines
Very interesting to see Intel still running quite a bit hotter than Ryzen.
One of the previous explanations on cooler temps of Ryzen was that it had a bigger IHS, but clearly the architecture plays a bigger factor here.
We also don't see how much work is actually done... the time for a complete P95 run would be interesting. You can also make the CPU a lot hotter if you overclock the cache and even the RAM for large FFTs, because the data load is higher. That's why you OC your RAM first, then the cache, and the CPU last, as the last is only limited by cooling
that one breath in edit at 1:16 was so clean, nice doods
Chrome seems to stress every part of my PC, while also raising my own stress levels. I'd like to see Prime 95 do that.
Toxic domestic-violence relationship. Dump that loser and switch to a less scummy browser that only verbally abuses instead of physically and financially as well. 😒
I like how you guys are turning into a data company
4:00 I'm a mathematician so for me, prime95 actually is a real-world application
We really needed this kind of down to earth, real life usage video. Thanks linus
Back when I was building machines professionally I used a number of programs for testing each computer. For the thermal testing I used a program called CPU Burn, if I remember correctly. Now this was a very small and quite primitive utility that came in something like six or seven versions. Each of them was designed for a particular processor family. As the years went by the processors got more advanced and as the program wasn't updated (AFAIK) I tested each new processor generation with all the different versions of burn and noted which one caused the highest energy consumption and temperature. Up to about six or seven years ago it was still outperforming Prime95 by about 5 to 10 degrees for all processors.
It was good at forcing computers to throttle, which was exactly what we wanted.
For general stability testing I had a version of Prime95 configured to use as much memory as possible. As it was a 32-bit version it topped out at 2 GB for effective testing, so a script spun up a number of processes at least equal to the number of available threads. If there was more memory than n × 2 GB, we increased the number of processes and maxed everything out so that the machine just barely avoided hitting the swap. Prime95 would run its torture test, deallocate the memory, and reallocate it again for the next run. Watching the memory graph in Task Manager you could see it fluctuate wildly, but always return to close to 100% memory utilization. This hit the memory subsystem quite hard and exposed any weak memory module or slot. Combined with ECC memory it was important to check the system logs to catch any logged ECC errors. One ECC error doesn't necessarily mean the module is bad, but more than that is definitely a red flag, and we replaced any memory that generated an error under testing. This test we ran for 24 hours at 40°C ambient as a standard test.
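If anyone wants to replicate that idea today, here's a rough Python sketch of the logic. The total RAM figure, binary name and flags are placeholders (the actual script is long gone), so treat this as an illustration only:

```python
import os
import subprocess

# Rough sketch of the old script's logic. "prime95 -t" is a placeholder
# invocation -- real Prime95/mprime flags differ between versions.
WORKER_RAM_GB = 2          # the 32-bit Prime95 build topped out around 2 GB
TOTAL_RAM_GB = 64          # in practice, query this from the OS
threads = os.cpu_count() or 1

# Start with one worker per hardware thread...
workers = threads
# ...then add workers while there's still RAM to fill, keeping a little
# headroom so the machine just barely avoids hitting the swap file.
while (workers + 1) * WORKER_RAM_GB <= TOTAL_RAM_GB - WORKER_RAM_GB:
    workers += 1

procs = [subprocess.Popen(["prime95", "-t"]) for _ in range(workers)]
for p in procs:
    p.wait()
```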
Hard drives were also stress tested with random access, but that rarely caused any failures. However, the same test run on large RAID arrays tended to expose both drive compatibility problems and controller and driver issues. Our test program was brutal on those, and one time our test results caused a large manufacturer to recall all the release cards for their new series of controllers. Our test exposed a weakness that only showed itself when the system was under heavy stress. Sure, it was unrealistic, but it didn't do anything except use standard system calls.
All in all it was way over the top, but the machines that passed were all solid performers and likely to keep working for many years.
🖐 By ECC memory do you mean normal memory or the ones marketed as ECC (for Xeons and such)? And how did you check the logged ECC errors? (Windows Event Viewer?)
thank you so much :)
@@amb1gamermain22 Yes, I was talking about error-correcting memory. I worked with a lot of workstations and servers, most of which required Registered memory, and most of the time we also used ECC. So a lot of Xeon machines and AMD Opterons, which also support ECC.
Basic ECC can detect and correct single-bit errors and detect multi-bit errors. So if a single bit is switched, the memory controller will detect and correct that error on the fly; you never notice anything until you take a look in the event log, where these errors are logged with details of when they occurred and which memory module generated them.
Bit errors happen even when the hardware is working perfectly. I'm not joking when I say that cosmic radiation can cause a bit flip in your computer's RAM, graphics memory, or the caches on the processor. This is a case of "shit happens". When it happens to your computer there's something like a 90-99% chance you will never notice. A lot of the memory is used to store graphics, sound, and temporary data that isn't critical; if a bit in those kinds of resources flips, it's very likely nothing bad happens. It gets hairy when a program instruction gets corrupted, and then a program may crash. What's considered worse is if a bit flips in a storage buffer that then gets written to storage. This can corrupt a file or the data in a file. For most home users that isn't very likely to happen, and if it does it usually isn't a huge catastrophe, but in the business world it can be disastrous.
But you don't need to be all that scared. Even in large server clusters ECC errors aren't detected very often, meaning it's a very rare occasion that a bit really does flip because of cosmic radiation.
As a system administrator what you are interested in is the rate of ECC errors. One doesn't really mean anything. Two may occur, but it's certainly not a common occurrence. Three or more means you most probably have a failing memory module.
When it's really bad the system will stop logging the errors, basically just saying there are too many to be logged.
But as long as it's just one bit the machine will still keep running.
Now if two bits go bad, the memory controller can't correct it, at least not with basic ECC. There are more advanced schemes that can correct dual-bit errors, but that's more involved than I want to get into right now. Then there's something called a "kill bit", where the memory controller can disable a bad module, reconfigure the memory pool, and keep the machine running. But again, that's a bit more advanced than I want to go into here.
When a multi-bit error occurs, the machine can either log it and try to keep going, or the OS can be configured to flat out halt everything to avoid the risk of corrupting calculations or data.
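If anyone's curious how the single-bit correction actually works, here's a toy Hamming(7,4) sketch in Python. Real SECDED controllers operate on whole 64-bit words in hardware, so this is only to show how a parity "syndrome" pinpoints the flipped bit:

```python
# Toy Hamming(7,4): 4 data bits protected by 3 parity bits.
# Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4

def encode(d):                         # d: list of 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]            # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]            # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]            # covers positions 4,5,6,7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def correct(c):                        # c: list of 7 bits, at most one flipped
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3    # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1           # flip it back
    return c

word = encode([1, 0, 1, 1])
word[4] ^= 1                           # "cosmic ray" flips one bit
assert correct(word) == encode([1, 0, 1, 1])
```

Run it and the assert passes: the syndrome locates and repairs the flipped bit, which is exactly the "detect and correct on the fly" behaviour described above.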
And that got way longer than I had planned for...
@@blahorgaslisk7763 yeah it did get long😂
But honestly was really informative, really appreciate it.
But isn't it kinda alarming for memory at JEDEC timings to error?
Sadly my experience is around OCing RAM and RAM stressers, and 1 error = a bad OC. Though idk if the errors in stress tests are equivalent to what ECC reports as an error. But if they are the same, then usually the culprit is heat / EMF (or, as you said, cosmic rays xD).
Usually for heat, opening the side panel or directing a fan at it is the fix, but for EMF, uhhh, I'm not experienced there. I've seen people at r/overclocking report that covering the backside of their RAM slots (the bunch of pins) with grounded copper or aluminum improved their OC results / reduced errors.
I'm not sure why I'm saying this to a sysadmin, but I've already written it
@@amb1gamermain22 In servers or workstations we never overclocked anything. And about the cosmic radiation, I wasn't even joking: it's actually possible for a single neutron to flip a bit. So the bit flips I was talking about were not heat or OC related. The density of the memory also plays in; as more advanced processes are used to make denser memory chips, they become more sensitive. In DDR5 they've integrated ECC functionality on the memory die, if I remember correctly. Frequencies are getting high enough and process nodes small enough that they have to do this to get the memory stable. And now we're not talking about the occasional bit flip per year, but pretty regularly occurring single-bit errors that are corrected before the memory controller even gets to play. I'm not sure how, or even if, it's possible to get a readout of the number of corrected errors in machines with standard DDR5 memory. I can see memory manufacturers not wanting to worry customers with logs full of errors that occur normally with this memory architecture, but at the same time I think that data could be used to predict whether a memory module is about to fail. Simply keeping count and warning if, say, the number of corrected errors per hour goes up dramatically sounds like it could be a good thing. A bit like how SMART tries to predict storage failure by keeping tabs on the number of corrected errors and reallocated sectors, to warn you when a drive starts to show signs of aging.
Perhaps I'm overthinking this, but I've always thought ECC was a good idea for all computers. Unfortunately it makes memory about 12% more expensive as it requires more memory chips on each module.
Now there are a few things to remember about RAM. What you think of as DDR3 or DDR4 memory is what most PCs use. Servers and a lot of workstations, however, use Registered DDR3 or DDR4 memory, and then almost always with ECC.
The register on these sits between the memory chips and the memory bus. When the memory clock ticks over, the register latches the values so the memory controller gets a clean, clear signal that's either 0 or 1. With unregistered memory the memory controller is connected directly to the pins on the memory chips, and you basically hope the values are nice and stable. But it's cheaper to make memory modules without the registers, and memory access is slightly faster, as the register adds some latency, which makes for looser memory timings.
Earlier memory architectures used buffers instead of the register. The buffer was basically a preamp on each data line, making the memory put less load on the memory bus. This way you could have more memory modules attached to the same memory bus and still not overload the I/O pins on the memory controller. And again, this was used for servers and workstations where you needed large amounts of memory.
But there has been ECC memory that didn't use buffers or registers, and there has been buffered or registered memory without ECC.
There have even been implementations where they added an extra memory stick instead of using ECC memory. But ever since the memory controller moved into the processor those are no longer an option for a regular X86 PC. They were not common before that either, but there were a few machines that could do that. I don't remember what chipset they used though.
Sorry for the wall of text, but I started writing and it got out of hand. I don't know if any of it makes sense or if I'm just rambling...
@@blahorgaslisk7763 Yes it does make sense :). (and it's really informative)
DDR5 looks boboo if it's just correcting errors 24/7; they should've somehow fixed the cause of the errors instead of ECCing around them.
What I meant is that a single error while OCing RAM = a bad OC.
So if we apply the same logic to a workstation at JEDEC timings, 1 error is alarming (since it's not OCed and still errors). But I guess it's fine since ECC is here to save the day xd.
🤔 I might be exaggerating, and 1 error is bound to happen once in a while
Love these kinds of videos, getting to the nitty-gritty. Helpful for us too!
Did the editor forget to insert the intro?
Rename this to “What’s The Best Way To Stress Test? (CPU)” Super helpful and I didn’t even expect this lol, if I missed this video and I was looking for the best way to stress test I would be so mad lol
Really good video. A detailed, accurate look into something I'm sure most of us have wondered. Good stuff.
I was once stress testing a PC after messing with the fan curves; I didn't realize I'd disabled the whole thing, and my PC just crashed XD
Very nice summary of tools. Makes it easier to pick something
With gas prices as high as they are in the UK I think overheating your CPU could help keep the room warm while you work…… no? Only me? Ok
Pretty much an electric heater. Not the most cost-effective thing, but a good byproduct for people living in cold climates. I live in a tropical country, so I expend energy twice: once in the CPU and again on the AC to remove the heat from the room. =(
@@dinartd during the winter it’s useful in the UK haha but I agree it’s definitely not the most efficient
@@bytesizeclu Just use gas for heating, as it will ironically use "less gas" than it takes to generate electricity from gas
@@Phambleton yeah, a best in class gas generator plant has efficiency of 64%, so 36% losses. Add to that, all losses in transmission. Maybe if you had a heat pump in your house, electricity would be a cheaper way to heat, as those pumps output more heat than input power. Also, I have zero idea about government subsidies in Europe for gas, so probably this engineering analysis isn't enough.
@@dinartd Not to mention not all the electricity that goes into a CPU/GPU is output as heat, which is why TDP is so tricky to calculate for CPU coolers. But just by recognising all these efficiency losses with basic knowledge, you know it's not a great idea to use this to heat your house! Electricity prices have doubled for me in the last 6 months, so I'm being conscious of stuff that uses significant power.
7:06 for the summary (just in case someone doesn't want to check description or video chapters)
This only focused on stress-testing CPUs; it didn't cover GPUs, SSDs, or RAM. For GPUs, you should usually be fine rendering a complex Blender scene and running some benchmarks like Cinebench and FurMark. For SSDs, you can test with a program like H2testw that is usually used to test flash media (or just fill the drive with large video files, flush the hardware caches by rebooting, then compare against the original files). For RAM, Memtest86+.
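For the "fill it with files and compare" route, a minimal sketch might look like this; the target path, file count, and sizes are placeholders, and H2testw does all of this far more thoroughly:

```python
import hashlib
import json
import os

# Minimal write-then-verify sketch in the spirit of H2testw.
# TARGET, FILES and FILE_SIZE_MB are made-up illustration values.
TARGET = "/mnt/testdrive"
FILES = 10
FILE_SIZE_MB = 1024
CHUNK = 1024 * 1024

def file_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()

# Phase 1: write pseudo-random files and record their hashes.
hashes = {}
for i in range(FILES):
    path = os.path.join(TARGET, f"test_{i}.bin")
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE_MB):
            f.write(os.urandom(CHUNK))
    hashes[path] = file_hash(path)
with open("hashes.json", "w") as f:
    json.dump(hashes, f)

# Phase 2 (run after a reboot, so you verify the drive and not the
# OS page cache): re-read every file and compare hashes.
with open("hashes.json") as f:
    hashes = json.load(f)
for path, expected in hashes.items():
    print(path, "OK" if file_hash(path) == expected else "CORRUPTED")
```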
My favourite is 3D mark, I bought a license super cheap when it was on offer and I like how it gives me a score at the end, this way once I overclock I can see how much more it helps and when i get a new part I can see how many more points I get :)
btw what's the point of overclocking a CPU? I'm running a 5900X and a Lian Li AIO. Haven't touched the OC yet
@@luckydepressedguy8981 The new AMD CPUs aren't really worth overclocking. I have a 3700X and an NZXT Z62 AIO, and I researched overclocks, but apparently I wouldn't get any performance gains. I think it's due to them trying to get as much performance out of the box as they can, leaving no room for overclocking. To add to this, manual overclocking is mostly obsolete these days because many CPUs overclock themselves automatically if not limited, like AMD's PBO.
my personal CPU stress tests are always Prime95 if I want something to NEVER crash, for my daily I just use Y-Cruncher and Linpack
One of my favorite vids you’ve done in a while.
Without even watching the video I can say that with the AMD Phenom II(?) Black Editions back in the day, when you overclocked and overheated them you could "unlock" more cores, because they used the same die for the new 6(?)-core CPUs as they did for the 4-core ones. I found this out by accident as a kid, goofing off and overclocking the chip after I got it, and next thing you know it's reading as a 6-core CPU. It blew my mind back then. I didn't understand what was going on at all. To this day it's one of my favorite computer hardware stories I got to experience first hand.
@Projit lol ok
i wish washington had a microcenter😥
Sometimes these things can be so seemingly random... I thought I had a rock solid system for the longest time... everything ran great on it, Prime, Cinebench, FurMark, 3DMark, etc... and of all things, Cities: Skylines was the one thing that exposed some instability.
So true, there are some games that will find the smallest instability beyond any stress tests
Yep, way back in the day it took playing Counter Strike for me to realize some very high memory address chips were dead in ways that the BIOS wasn't detecting on startup. Memtest86 to confirm.
I've had the 7-Zip benchmark of all things expose overstretched memory clock / timings. Never seen that before.
i understand very little of this technical jargon but i love watching it
I'm surprised you didn't mention using overheating to set new thermal compound...
not worth mentioning!
@@sirfairplay9153 Why not? You'd be shocked how many novice builders don't think to do this before using their new PC for demanding tasks
@3:59 @4:22 Prime95 absolutely *is* for stability in real-world applications, unless you don't consider science real. :) There are workloads out there that are very hard to number crunch, for which we're specifically optimizing the problems' innermost loops to run in the L1/L2 caches of specific CPU architectures and even models, as that can mean the difference between weeks versus years of runtime per data set. Knowing in advance what the CPUs in your cluster can and can't handle, and when they'll throttle (or even outright die…), is just as important as knowing how many cycles a specific instruction takes, so we can "waste" exactly as many cycles as needed for the CPU to run 24/7/365¼ at full speed without overwhelming the cooling. Prime95 results are very valuable starting points here. Applications keeping walls of racks of compute power busy may not be commonplace, but they certainly are real.
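To make "optimizing the innermost loops to run in cache" a bit more concrete, here's a toy loop-blocking (tiling) sketch; the matrix and block sizes are made-up illustration values, not tuned for any real CPU:

```python
import numpy as np

# Illustrative loop blocking: do a big matrix multiply in tiles small
# enough that each tile's working set stays resident in cache while hot.
# BLOCK = 64 is a made-up example, not a tuned value.
BLOCK = 64

def blocked_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, BLOCK):
        for j in range(0, n, BLOCK):
            for k in range(0, n, BLOCK):
                # Each small product touches ~3 * BLOCK^2 * 8 bytes,
                # sized to fit in L1/L2 instead of streaming from RAM.
                C[i:i+BLOCK, j:j+BLOCK] += (
                    A[i:i+BLOCK, k:k+BLOCK] @ B[k:k+BLOCK, j:j+BLOCK]
                )
    return C

A = np.random.rand(512, 512)
B = np.random.rand(512, 512)
assert np.allclose(blocked_matmul(A, B), A @ B)
```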
"Why Overheat your CPU on Purpose?"
Ask Apple
What!? 2 videos about PC gaming in a row? This is a miracle!
Joke's on you; instead of using fancy programs I can just watch this video to overheat my PC
The PC I built at Microcenter is still my main PC. I asked the guy to give me a future proof PC and he really delivered. Built it in 2015
Man, AMD turned that "hot and loud" stigma around real good
Very informative video, and a good explanation of everything we could have thought to try! Keep 'em coming Linus!!
Gaming
Gaming
@@fnafgamer4367 Gaming
Gaming
Gaming
Gaming
Initially I thought this video was kinda out-of-ideas content, but it really aligns well with the upcoming Lab
BTW: yesterday (13.04) a new version of Generic Log Viewer was released: 6.3 👍
You wouldn't find the 140A EDC bug unless you use these tests correctly. Awesome work. There are very specific tests you need to run to catch all the problems with the BIOS. You can also slam JUST TWO CORES, rotating around every 140 seconds with Small FFTs, so that you catch the hot spots from tweaking.
Thank you for giving credit to the tools you use in your videos by giving them a shoutout.
Linus Tech Tips is slowly becoming Linus Tech University. I like it.
That was really valuable. Would love to see a GPU edition
For college I have to simulate the collision of a thousand galaxies. That would be a pretty good over heating benchmark.
ASUS RoG Real Bench does an excellent job of covering lots of workloads types through automation rather than just running synthetic benchmarks.
OH MAN OH MAN if this is a tease for The Lab I'm so damn hyped
Micro Center is sponsoring all of tech tube and I'm all for it. Make the pilgrimage to your closest Micro Center if you are able to.
I know it's not the whole picture, but thanks for saying the "winner" outright at the beginning.
So I donate all of my spare CPU and GPU cycles to BOINC, I've found that it can find system instability pretty well. It can also turn my 6 year old budget build into quite the space heater.
FYI, like prime95, OCCT has different data set modes (small/medium/large) and those dramatically affect thermals, the difference between normal and extreme is a trade-off between more core power vs more memory usage (extreme raises core power but hits the memory system less)
Not only that, internally it even uses a modified, headless Prime95 binary that it temporarily extracts in a temp folder. There are even still some "Prime95" strings inside the binary.
Well, this one will freak out my apprentice. We were stress testing a fan yesterday and I used Prime95. He questioned my choice as he'd never heard of it, but he loves Linus, so HA, in your face dude! You know who you are. Thanks Linus.
0:15 that’s a pretty nice book style epoxy table you got there
Some of my legacy work is still in Fortran. *insert “this is fine” burning meme*
NASA when Voyager-era craft refuse to die: This is Fine
Thanks for the video!
Cool to know such stuff!
My typical stability test for a CPU and/or RAM is an overnight session of Prime95: Small FFT for the CPU, large/blend for the RAM. I've caught many stability issues that only show up after 4h of Prime95. Add overnight FurMark for the GPU, and you're set.
This is the equivalent of suiting up your daily driver to go racing. It won't be extremely competitive, but it will teach you the limits of your car and how finely you can ride them before something goes wrong.
I've liked Prime for the heat output, but it isn't realistic, as was noted in this video. Thank you for sharing the other methods to test my new systems.
When I've built a new PC in the past, I've always loaded it up with Cinebench and 3DMark Wild Life at the same time and let it run for about 30 minutes. It did indeed crash on me once last year, when I had an RX 580 in there with a 550 W power supply; after upgrading the power supply, no more problems. So it can be useful for figuring these things out before passing the computer you built off to someone else (in this case my wife) or using it yourself for work.
Love the video! Funny to see the plant in the back moving every time linus makes a bigger move though >,
So I could fry an egg on my PC while I'm playing, true multitasking gamer moments
Been overheating my i7 990X since '09 and it's still going strong, still gaming, never had an issue. Overclocked to 4.2/4.5 and still stable; consistent years of awesome. Minor software issues here and there, but nothing too major
No more intro splash screen? I kinda liked it. Made me get all jazzed up about the rest of the video
Just the thing I was looking for my new build
Thank you, this is a great video. I learned a lot about stress test methodology
It would be very useful to understand how to actually fix common stress test results. Like - what are the characteristics of what you're looking for - and what can you do to fix it.
This is soooo Linus Tech Labs material. Something I didn't know I wanted to know. Great content, thanks!
Finally some TECH TIPS from LTT, not just unboxing and reviews
Best video so far this year keep it up 🔥
I used Prime95 to tune the fan curves, CPU voltages, etc., because if the system could stay within normal operating temperatures without deafening me while running Prime95, then it would run relatively cool and quiet in other tasks. The result is that the system runs quiet enough that I don't need headphones while the CPU is pulling 215 W and the GPU is drawing 400 W in very demanding games. Prime95 is also a great way to evaluate the stability of the system. I was happy to see that the VRMs were up to the task of giving my 11900K enough juice to run at 5.2 GHz on all cores almost indefinitely.
I hope you guys paid something to the guy that made Generic Log Viewer, it's an amazing tool.
Seeing Linus simplifying all of linear algebra as Ax = b after having written an exam on it today is something else
I've had to compile Linpack as a dependency multiple times while doing university research on physical simulations; crazy to see it mentioned like that. Solving systems of linear equations is at the heart of most physical simulations, as basically all of our work is translating real-world physical equations into those. Sometimes things get more complex with nonlinear systems as well, but those are usually much more experimental.
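For anyone who hasn't seen it, a miniature of what Linpack-style benchmarks hammer on looks like this (the problem size n is an arbitrary toy value; real HPL runs use sizes that fill all of RAM):

```python
import numpy as np

# Miniature Linpack-style workload: factor and solve a dense random
# system Ax = b, then check the scaled residual to verify correctness.
n = 2000
A = np.random.rand(n, n)
b = np.random.rand(n)

x = np.linalg.solve(A, b)   # LU factorization + solve, the Linpack core

residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
print(f"scaled residual: {residual:.2e}")  # tiny (near machine epsilon) on healthy hardware
```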
Ahh... good old Prime95... and FurMark for GPUs. Prime95 has been such a good stress testing tool for years and years. The FFT size can be chosen to stress within different cache levels, or made big enough to stress memory controller stability. It does require you to know the sizes of your CPU caches to tune it appropriately. Spiky CPU load profiles would probably be the best way to test VRM stability, likely exposing power-based instability that wouldn't necessarily show up under a steady Prime95 test.
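As a rough back-of-the-envelope for matching FFT sizes to cache levels (assuming ~8 bytes per FFT element, which is only a ballpark since Prime95's real working set per FFT is larger, and using example cache sizes):

```python
# Rough estimate of which FFT sizes fit in each cache level.
# BYTES_PER_ELEMENT is a ballpark for double-precision FFT data, and
# the cache sizes below are examples -- substitute your own CPU's.
BYTES_PER_ELEMENT = 8

caches = {
    "L1 (per core)": 32 * 1024,
    "L2 (per core)": 512 * 1024,
    "L3 (shared)":   32 * 1024 * 1024,
}

for level, size in caches.items():
    max_fft = size // BYTES_PER_ELEMENT
    print(f"{level}: FFTs up to ~{max_fft // 1024}K fit; "
          f"larger sizes spill to the next level / RAM")
```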