Intel PR posted an update to... reddit. So check that out, update your bios, and look for another update to your bios coming mid august 2024! www.reddit.com/r/intel/comments/1e9mf04/intel_core_13th14th_gen_desktop_processors/ tho they seem to have deleted some of the better questions about whats covered under rma, are folks that have delidded covered (guessing: lolnope), etc.
@@Level1Techs Sounds to me like they're trying to cover their asses. They're focusing on the voltage issue because that's something they can fix. The oxidation problem is a manufacturing defect, and that potentially means the dreaded R word - RECALL.
@@Level1Techs sh8 i just ordered a laptop with a 14900hx on the acer helios 18...am i screwed? considering i live in the phil and my sister bought it for me there in the u.s...yikes
ummm i saw literally ZERO sick memes here soooo... kinda suspect reporting bro? but fireship just pontificated on the issue and showed speculative-non-primary-source concern AND SICK MEMES, so therefore, issue confirmed
"AMD Is In the Rear-View Mirror" - Pat Gelsinger, Intel CEO "Intel CEO’s Comments of AMD ‘in the Rearview Mirror’ May Indicate Intel is Driving Backwards" - Areej Syed, Hardwaretimes
I saw a funny meme cartoon of Pat driving towards a cliff and Lisa driving away from the cliff. Lisa's car is of course in the rearview mirror from Pat's perspective. v-cliff
As a former aviation maintenance tech and lifelong computer nerd, I approve this message. I won't fly commercial and I won't buy intel. Both Boeing and intel are a complete disaster.
-Talks about a real problem -Explains what he found in true objectivism -Admits data is not big enough -Properly digs through the problem What a gigachad
I work at a gaming marketing agency in the IT department, and I have been absolutely losing my mind over these issues. I have built ten systems over the last six months, and had three of them experience various failures, from One of the more consistent problems I've seen has been system corruptions, where I'm unable to even run simple Windows updates, SFC, DISM -- you name it. Recovery media fails, fresh installs on new drives fail, and the crashes on games have gotten utterly out of control. Thank you for this, it's definitely incredibly helpful for tracking our OWN sanity.
@@chrisx742 Appreciate the thought, and while yes, it does, and we did -- used the Intel XTU, didn't go above 52x, dialed in Intel specified limits all across the board, and had little to no consistent improvement, but definite performance loss. Tried the Intel Processor Diagnostic Tool and failed consistently across all settings. RMA'd one 14900K already, popped in a spare 12900K while waiting for the replacement with none of these issues with all the rest of the same hardware. The point ultimately is that we shouldn't need to do this, and it's infuriating to have to fight with multi-billion dollar companies to get what we're paying for, and have it actually work consistently.
Intel forgot how to make fast cpus, then they forgot how to make efficient cpus, and now they appear to be forgetting how to make properly operational cpus... Not a good character arc...
Typical too quickly developed too little tested, lack of QA because the new lineup has to be released and delivered to keep up with the competition and not upset the shareholders behaviour. I wish we had fewer stock corporations again. A world in which profitable companies are devalued because they don't constantly increase their profits is completely f'd up.
Player: I've got you now Pirate King! PC: (Crashes) Player: Ahhhh you thought a mere catastrophic failure could save you?! Well prepare to doom your DOOM! LMAO
Intel is hanging on for dear life trying to compete with AMD. They'd rather take big risks that can backfire on their customers, then try to silence it, than be honest about the limits of their CPUs and lose their high spot in the benchmark charts.
Exactly, there is no fix for what is plain old silicon degradation caused by too high voltages, clocks, temps and amps pulled through the silicon long-term. There is no software fix, there is no BIOS, there is no microcode that will revert this once it's already degraded. The only fix is "do not exceed 1.3 V and clock them 500-600 MHz lower". And also, do not clock your CPUs so high that they have less than 5% headroom left in them, so even the smallest degradation throws them into the "unstable on stock settings" category. Intel is just making smoke hoping the attention will go away so they do not have to do the thing they should - a full scale recall.
@@seeibe This. Consumers are used to seeing i7 at the top of the chart and Intel will break bones to keep it there, because they desperately need to keep its brand power. I just had to leave a bad review for an "i7 Desktop" selling on my country's Amazon with no GPU for $900(!), because I noticed that it had an HD4500 and realised nobody else could work out that meant it was an 11 year old PC. They just see "i7" and buy. It's those idiots that Intel made the 14900 for. So Google still says "i7 is fastest" and the plebs carry on buying it.
When server providers are starting to actively recommend AMD systems after YEARS of Intel being the "safe bet", this is the point where Intel should start sweating. Because mind-share is a huge factor in what gets bought in large volumes.
I agree with most of your statements as I work for a small SI. I stated on initial videos a few months ago, when the fingers were pointed at motherboard makers, that the problem could be with Intel processors. We mostly provide 14900k to businesses on premium and entry-level B760 boards. So right off the bat, no O.C. Next, we disable motherboard enhancement and limit power to Intel stock. And some boards are weak and can only do Intel stock power. Failures have happened in all these scenarios. These systems can run cinebench all day long or occt stability test or aida64. But try to install the nvidia driver (compression/decompression) and the installer fails. Another way to test this is the tekken 8 demo. No matter what bios we choose, those processors are gone. Replaced. There is no coming back from this. Has happened with ddr4 and ddr5 systems.
@explosivemonkeys I hope you get yours soon. We are not in the U.S. so our rma is a couple of weeks with Intel. But once the system is diagnosed with this problem at the service centre, we replace it with a new processor and return the system back to the customer. So, for a customer, it is basically 2 to 3 days down time. I am also hearing stories of Intel denying rma for these cases. That would make this worse.
The NVidia installer was a big thing for me too. Would crash on install about 95% of the time. At the start I figured it was a corrupt Windows install or maybe a failing HD. Almost dropped a lot of extra money to start replacing components before I started to see all the other issues out there, which thankfully saved me.
Just so it's clear - did replacing the CPU actually fix the issue? Was it properly verified, taking a system exhibiting the problem consistently, replacing only CPU and then testing again with the same exact test and no problem found? Asking because I was around in the past for the FDIV bug in P60/P66 by Intel, where you could actually even get it to calculate wrong in Excel using a specific formula to prove the bug. Initially Intel said there is no issue, then they made RMA's for all who complained trying to shut them up - or at least it looked like that is what went on, then finally they had to stop all the BS - and rework the CPU and announce that people who had affected chips could register to get it replaced with the new fixed chips shipping out as they were coming off the production line. I had one of the FDIV bug P60's for years as a memento of that cluckerf*ck but I traded it eventually for a DEC VAX 11/750 in 1997-8 something like that. Shoulda kept it. Point being with the FDIV bug you could not replace your way out of it until Intel made fixed chips, 100% of the P60/P66 initial production were affected. You could kernel patch your way out of it, but it was a software workaround for a hardware problem.
15:05 "We already replaced a lot of customer's 13900k with 14900k and the issues don't seem fully resolved." This statement is extremely telling. 14:56 "$1,000 extra" for support is insane and really tells the whole story here. The part is not reliable. I am sure a lot of conversations are happening behind the scenes here, but based on what I've seen, Intel has not committed to fixing the issue. We can hope that it is due to incompetence but more likely they do not want to admit fault here due to the cost of "making it right" Thanks for sharing this Wendell, it will be interesting to see other news around this topic in the coming weeks.
It's like how insurance companies are refusing to insure houses in certain states against extreme weather. When even the Capitalists charge more or refuse to cover something, that's a sign that there's a real problem there.
@@monad_tcp The issue is that Intel itself doesn't really know what to fix, and currently the most reliable way to make it run stable enough is to lower the multiplier to something like 53 and run the RAM slower. If that is what it takes, then the CPU wouldn't run as specified and it would also invalidate Intel's marketing around the CPU, so even if Intel wants to do a simple recall and give the customer a fixed product, it can't be done. Assuming running it slower actually does fix the problem, Intel probably needs to replace the product, force the CPU to run slower, and give some monetary compensation.
I don't think it's about being committed to solve the issue, as it is that they have probably realized there is no fix, and they can't afford to recall two whole generations of product. That would kill the company. Trying to mitigate it through microcode and power limits is probably the best they can do until getting a 15th gen out, and it might even be too late to fix the issues for that one, so they're really just holding on for their life for a 16th gen.
@@rezenclowd3 That's a really dumb way to do business, people talk to each other. It's like if I started pumping out car parts, let's say a water pump that kept failing prematurely and blamed everyone but themselves and tried charging for repairs but knew it was self inflicted the whole time... Oh wait, my Golf R had that issue. Oh, there was a class action lawsuit? Nice.
@@Dendodorion I mean it does make sense if you aren't the company producing the faulty part in the first place. Charging more because something is worse to work on seems fair to me.
25% is a ridiculous failure rate for the high volume OEMs. Wonder if intel is getting back the failed chips to do some failure analysis on them with test boards and thermal imaging or something.
My guess is it's simply that power/frequency limits are pushed beyond what the platform can reasonably support, which means this was an intentional choice intel made. It is a known workaround that underclocking and reducing power limits increases stability.... at first. But once it starts happening, it seems that the damage is done, and it's only going to keep getting worse.
@@greebj but we don't have a zero. We have an insignificantly small number: iirc 4 out of ~1500 documented failures, or about 0.27%, when 30% of the population use AMD, is a clear indicator that this is an intel-specific issue.
Right smack in the middle of the Raptor Lake 8+16 die is the ring interconnect fabric, with a ring agent interconnect per P-core and an agent per E-core block. Given you’re seeing a mix of IO related errors, including the out of VRAM errors (which is likely triggered by a PCIe operation failure), my gut says this is the system agent logic in a bad-state, specifically due to failing ring signaling at one or more ring agents or in the system agent itself, blocking and going through some form of error / bad-state recovery. My guess would be that this is due to electromigration degradation of the ring interconnect logic due to the very large Raptor Lake 900MHz top end ring clock bump by Intel, versus Alder Lake. This is why we’re seeing these issues in 13th and 14th gen, but not 12th gen. What’s concerning is that if this is where the problem is occurring, then Intel appears to have not implemented internal telemetry, or not exposed it via CPU or other system driver, to drive WHEA reporting around ring interconnect and the system agent. Given the complex nature of the SOC, with its asymmetric architecture, many clock domains and sophisticated “system agent” IO block, that would be bad if true.
If the ring causes those issues, I wonder if raising ring voltage could solve them or not. For example, I have had 2 i9 13th gen chips: a 13900K that I sold, and my current i9-13900KS. Both run stable at 253W, 400A IccMax with BIOS default settings (I set my power limit lower than 253W initially due to having a 240mm AIO). I did turn off MCE in BIOS and set my custom load line to undervolt the chip. I used the previous chip for about a year with no noticeable degradation. I run both chips with an SA voltage of 1.35V for my DDR4 RAM OC. Both chips have run mining software, Prime95 and other stress tests. I suspect that where the degradation happens, most people did not configure a power limit and let their chip run at high voltage with high current, and that causes fast degradation. Also, their silicon quality is probably average or below average in the first place.
@@kevinzhu5591 it probably would further accelerate degradation of the ring bus and lead to even more crashes my bet is that intel has a design flaw regarding ring bus which got exposed by their lack of regulation regarding power profiles which slowly cooked ring bus on stock speeds and fried it when people did do some overclocking
@@kevinzhu5591 Wouldn't it make more sense to lower ring voltage and clocks a bit? It shouldn't make a huge difference in terms of performance to reduce it by 300mhz or whatever.
@@ahs9674 Doubt it would help. 12th Gen has had problems with the asymmetric uarch (minor and not often, but I've encountered them way too many times...). 13th & 14th not only suffer from that, but you have to keep in mind that they do bend, and that they are made using the Intel 7 process, which took ages to be "mature" enough to even be considered viable for manufacturing, and so much more. There might be a reason (beyond being behind with their own foundries and R&D on advanced nodes) why Intel is hell bent on using TSMC instead of their own process for the CPU tile.
real. I love my 7700x. Such a good chip. I can Craft so many Mines with it
I bought 2 7900X3D's for 100 GbE testing on Supermicro H13SAE-MF. I guess I dodged the Intel enshitification bullet. Meanwhile, my AliExpress craptastic and old Xeon rando recycled parts impulse buy laughs in ECC.
I remember a time when people would say to go with Intel, it's a much more stable platform, how times change eh lol. Ironically the only build I ever had so much hassle with was the one time I decided to jump ship over to Intel back in 2015 with a 6700k skylake cpu due to the crap ipc the fx8350 had that I was running. Always used amd other than that from the athlon days in the early 2000s through to a number of ryzen builds & I've literally never had any problems, all the amd builds have just worked, currently running a 7800x3d.
Heys, I'm one of those who were "tremendously suffering" from the instability, to the point where regular programs such as Discord, Chrome, Davinci Resolve, and games were crashing left and right. I just RMA-d my i9-13900KF. Intel already knows that shit's going down because customer support didn't put up too much of a fight and went straight to the point - RMA. Unfortunate, but it is what it is. Purchased a secondary, i7-12700K (LGA-1700 slot was needed so that I don't need a new MOBO) and it feels like a breeze of fresh air not having to put up with the constant instability.
@@MrXelaim No. 1. I'm a contractor (work from home) and cannot afford to lose a single day of work for making such a switch. This was a carefully orchestrated RMA process where I timed my CPU replacement on the same day the courier would come for the RMA CPU. 2. The motherboard is not faulty. Buyers protection in my country only covers the first 3 days to return a product without a good reason and this is a year old system.
@@owlmostdead9492 Why though? You still have a time window to return a product if you changed your mind. Otherwise, wouldn't people ignore warranty and just return age old products left and right out of greed?
My god this is so useful. I am on my third 13900K in 18 months. First one ‘lasted’ a year (3 months of that was me getting more frustrated trying to diagnose wtf was happening as more and more and more games failed). The pain was that it came out as IO errors, memory errors, everything except the CPU. In the end I found a forum thread where someone mentioned it. I got it swapped under warranty and the second one began to fail identically 5 months in. Third one is still quite new but if this goes as well I’ll be sharing all of this with the builders and telling them they can either refund the entire PC or put something else back in. I have no overclocking. The games that crashed were often games that a 13900K wasn’t even remotely needed for - like WoW - and it was absolutely infuriating
When AMD released its 7000 series, I went with a 7900x. I had buyer's remorse when I saw the better value of Intel 13th gen. But with this situation now, my mind is totally free from that feeling.
And they're likely about to get completely shafted by ARM with the mass migration of both datacenters and consumer computers to ARM chips (or other RISC designs) due to the huge drop in power consumption that ARM brings.
@@samwalker7567 Yep. I wouldn't buy any PC right now if you can hold off. Things are changing, and ARM seems to be poised to dethrone x86 in the coming years. Best to wait and see what the industry looks like once the dust settles.
@@dschwartz783 Disagree, if the transition happens it will: a) take years to acquire significant share in the desktop market, and b) there will still be a large marketshare of x86 for years after that happens, so support will remain for quite some time after. Makes no sense at all to hold off buying a PC in anticipation of this platform.
Crash rates measured in weeks are not acceptable for consumer products. If it's measured in months you still have an iffy product. If you see game logs where crash intervals are close to hourly... the whole house is on freaking fire!
Standard SLA in the industry is 99.9999%, so you can have about 30 seconds of downtime per year. With this high of a failure rate keeping that level of SLA is simply impossible. Even for the cheap providers that have just 99.99% SLA keeping the downtime to 5 minutes per month might be hard.
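For anyone who wants to check the downtime figures above, here is a rough sketch of the arithmetic. The function name is made up for illustration, and the 365-day year and 30-day month are assumptions, not something any provider's contract specifies.

```python
# Rough downtime-budget arithmetic for an availability SLA.
# Assumptions (not from any provider's contract): 365-day year, 30-day month.

SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_MONTH = 30 * 24 * 3600


def allowed_downtime_seconds(availability_pct: float, period_seconds: int) -> float:
    """Seconds of downtime permitted in a period at the given availability."""
    return (1 - availability_pct / 100) * period_seconds


six_nines_per_year = allowed_downtime_seconds(99.9999, SECONDS_PER_YEAR)
four_nines_per_month = allowed_downtime_seconds(99.99, SECONDS_PER_MONTH)

print(f"99.9999% over a year: {six_nines_per_year:.1f} s")
print(f"99.99% over a month: {four_nines_per_month / 60:.2f} min")
```

Under those assumptions the six-nines budget comes out at roughly half a minute per year and the four-nines budget at a bit over four minutes per month, which is in line with the rounded numbers quoted.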
@@hubertnnn that's the marketing and contracts. In actual practice it's often lower than that for pretty much all hosting providers, even big ones like Google/Amazon/Microsoft or OVH. Source: trust me bro. They refuse to acknowledge to us a bunch of small incidents and stuff during their own upgrades so they can keep claiming such high SLA wasn't breached
But to be fair, datacenters shouldn't even use consumer class CPUs. One of the biggest differences between consumer class and enterprise class CPUs is ECC RAM support. If you want 24/7 operation and stability is very critical, ECC RAM is a must have feature which only Xeons can provide.
And it's a lot worse than this video says, because when he says "25% of systems are having errors" he's not counting all the ones that are already dead or RMA'd, because they can't produce bug reports... If we assumed even half of the buggy chips go back to Intel at some point, the math would put it at at least 33% dead, which is absolutely disgusting.
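One way to work through that survivorship arithmetic, as a sketch only. It assumes the 25% figure is the share of systems still in service that show errors, and that some fixed share of all faulty chips has already been pulled and no longer reports. The function name and the model are illustrative, not anything from the video.

```python
# Survivorship-bias sketch: estimate the faulty share of ALL chips shipped,
# given the faulty share seen among chips still in service.
# Assumption: only faulty chips get returned, and returned chips stop reporting.


def true_faulty_fraction(observed_rate: float, returned_share: float) -> float:
    """
    observed_rate:  faulty fraction among systems still reporting (e.g. 0.25)
    returned_share: fraction of all faulty chips already returned (e.g. 0.5)

    With N chips shipped and B faulty, the survivors number N - returned_share*B
    and the faulty survivors number (1 - returned_share)*B. Solving
    observed_rate = faulty_survivors / survivors for B/N gives the expression below.
    """
    return observed_rate / ((1 - returned_share) + observed_rate * returned_share)


estimate = true_faulty_fraction(0.25, 0.5)
print(f"Estimated faulty share of all chips: {estimate:.0%}")
```

Under these assumptions the estimate lands at 40%, which is consistent with the "at least 33%" claim. With no returns at all the function just gives back the observed 25%.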
I nearly cried watching this. Someone else that has spent countless hours going over crash logs!!! Someone else has felt my pain!!! I'm not alone! Anyone considering a job in IT.... This is your possible future!!! and its a thankless job.
On the warranty repair side I have seen a major shift in the willingness to accept that an Intel CPU might be faulty. It went from a multi-hour process to get a processor RMA approved to a self-approved process. I just order the CPU replacement and I'm done. It's akin now to ordering a replacement memory module or SSD under the warranty process (this is at the OEM level, not with Intel). Talking to other techs, we have seen a HUGE increase in CPU failures - almost always seen as general instability with a longer running process (which is a really hard thing to verify during a warranty service, so we do typically take the customer's word for it). I have tracked down the problem on a few of them to the memory controller contact pads - often you can even see visual discoloring of the pads from prolonged heat. The instability in those cases seems to come from the CPU getting hot, changing shape, and the signal integrity to the memory modules shifting... then BOOM, you're unstable... until it cools a little, then you're okay. Basically, I think the problem is the shape of the CPU.
Is this also affecting the 14900hx laptop cpu? I just got a laptop with one and could probably return but got a really good deal and has a good mini led screen and would get so much use out of it for the whole almost a year till next model laptops are out with the 50 series rtx so really don't want to return.
back in the days the cpus were binned with a way looser margin of error, in some cases you could easily make the cpu run 25% faster with a bit more of cooling, but once they caught wind of overclockers pushing the cpus so hot, they started to push power into them themselves, to make these margins extremely tight and cpus that are edging being defective are still sold, most people don't stress test their new cpus so by the time they realize they are defective it's too late and they get away with it
One of the biggest take-aways from this video is the price difference on service agreements between a 7950X ($139) and a 14900k ($1,280) server - "3 years parts & labor, 24/5 - Next business day onsite repair - zone 1". For the 7950X, that isn't covering the cost of a single hour onsite repair technician. For the 14900k, that is slightly less than the cost of an Asus Pro WS W680-ACE ($330) + a new 14900k ($600) + 4 x 48 GB of G.Skill Ripjaws DDR5 5200 RAM (2x$190) (total of $1,310). You can basically do a complete hardware replacement, AT RETAIL COST, for the price of that service agreement.
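The comparison above is easy to double-check. This is just the comment's own prices added up; the variable names are made up for illustration, and the RAM line is read as two kits at $190 each.

```python
# Retail parts total vs. the quoted 14900K service agreement price.
# All prices are the ones quoted in the comment above.

parts_usd = {
    "Asus Pro WS W680-ACE": 330,
    "Intel 14900K": 600,
    "G.Skill Ripjaws DDR5 5200, 4 x 48 GB (two kits at $190)": 2 * 190,
}

service_agreement_usd = 1280

parts_total_usd = sum(parts_usd.values())
difference_usd = parts_total_usd - service_agreement_usd

print(f"Parts total: ${parts_total_usd}")
print(f"Service agreement: ${service_agreement_usd}")
print(f"Difference: ${difference_usd}")
```

So the agreement costs only $30 less than buying the board, CPU and RAM again at retail.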
If 50% of CPUs are failing one way or another but the other 50% running perfectly fine I would closely look at the production starting even from the silicon provider and its quality.
The smaller you go, the more it's a chip lottery. That's why I don't see the point in overclocking. "Oh, I can get 6 GHz on water." Yeah, but how long does the CPU last? It's the equivalent of a top fuel dragster being pinned at its max breaking point. Intel has pushed the silicon far enough that it's getting bad yields: some chips are good, but a lot of chips are degrading faster due to the whole chip getting intense heat. That's the problem. 1nm chips are gonna have problems. We need to move to a better base substrate: graphene or carbon substrates.
@@aminorityofone About 30 seconds later, we're pushing our customers towards the 7950X platform. Which is exactly what I would be doing in their position.
Yeah -- this is somewhat surprising from intel. This is amateurishly like the late 2000s AMD graphic card performance. I bet its heat problems from bad caps or arrays or interdigitated caps. If its design then, that's really really going to be surprising. I've fixed my AMD processor from swapping out dry caps before.
I mean, Intel had to do a massive recall back in the day (1990’s?) when their CPU’s literally couldn’t do simple math instructions consistently. They didn’t exactly fail, they just never worked in the first place because of a critical design flaw.
@@nessotrin If you Google it you might find some information. This was back in the early 90's. The company I worked for had the issue with our SPARC system. SUN made customers sign an NDA if I recall. I don't recall how widespread it was. I did a search and only found limited mention of it from some old forum posts...
@@nessotrin I think its this, en.wikipedia.org/wiki/Pentium_FDIV_bug , Its floating point math would mess up a small amount. But that error could compound and cause problems depending the calculations. I recall having a professor say he had to redo part of his thesis due to this problem.
I am surprised people even buy intel. Even without this issue - you buy intel and 6 month later 50% of performance are gone to fix all the security issues.
I’ve had so many bizarre issues lately and I’m now wondering if they’ll disappear if I swap back in my 12900k instead of my 13900ks 🤔 A couple of them highly repeatable but one that others can’t reproduce
@@ThioJoe Just try it, for science! It won't be that much of a performance drop, and you'd hopefully get a more stable system out of it until this is resolved ;) Post your findings on the L1 Forum if you actually do it :D
@@zodwraith5745 Or they do know, but it is a hardware problem that they can't patch with a microcode update.. So they just try to play the long game until their next gen comes out.
@@kyu9649 I'm not saying that's not a possibility, but to automatically assume as much would be highly unfair. Intel is WAY too big a company to cover up the kind of conspiracy theory you're insinuating without a leak. They haven't done anything to try and cover it up and have acknowledged it publicly. They just haven't provided a solution which just circles back around to my previous comment. They have a hard enough time keeping a lid on leaks they DO want kept secret like future products. This would be way too juicy for a leaker with all the coverage it's had.
@@zodwraith5745 they know, they're trying extremely hard to patch it via microcode so they can come out with a fix at the same time they acknowledge the failure. Welcome to the age of publicly traded companies, their shares will tank for sure, they're just trying to control how much they crash.
Given the fact that 13900K has been on the market since October 2022 and this issue hasn't been fixed, it must be a design flaw. It's shocking that i9 still die at W680 motherboards.
This isn't armchair science my man. This is absolutely real research. You've negotiated access to data not really available to the public and are doing real research on the data instead of making educated guesses as to what's going on, which is what armchair science is.
"The system just gets miserably slow for up to a minute before an actual crash" ...uh oh. I don't get crashes in games, but I *do* get this kind of behavior from time to time just at my windows desktop on the 13900K if I leave it running for a long period of time (like, more than 1 day). I'd start getting weird things like programs hanging in the background, Explorer stops responding to inputs, even CTRL+ALT+DEL doesn't work. It never actually has a hard crash, but it never recovers either. Only fix is a full restart, or hard reset at the power button. Guess I can expect this to get worse?...
The 14900K chips are degrading insanely fast. My 14900K on day one could run at stock frequency with a -0.075 voltage offset on the last 2 steps of the V/F curve, without any WHEA errors either at full load or idle. Every 2 months or so, I would randomly find a couple of WHEA errors which were instantly fixed by increasing the previously mentioned voltage offset by +0.05v. This week WHEA errors began showing up again. Now the CPU needs to run at stock voltage as it is unable to handle any negative voltage offset whatsoever.
@@douglasmurphy3266 I'm using an IceMan Direct Die waterblock with an external MORA3 420 radiator. My 14900K max core temp at full load is 70C. Temps are not an issue.
@@genejones7902 There are specific WHEA error codes that are related to CPU errors. Whenever I get them every couple of months, increasing the voltage slightly is the only way to make them go away. So yeah, its a CPU issue.
Wow, it is great to hear people talking about this. We deployed some i9 machines at work, and after awhile 2 of them started crashing for no reason. The, usually really good, Dell diagnostic software could never catch it. Reformatted, replaced RAM, SSD, etc, no luck. I saw deep on a Reddit thread to disable turbo boost and it resolved the issue. I gave this information to Dell and they ended up replacing the CPU on these machines. I've been helping desking and admin-ing for nearly 10 years in various sectors, and saw less than 5 CPUs die in that time, then I had 2 back to back. Was mind blowing
I am so glad I never upgraded my 12900K, it still works like a beast and I play games around 100FPS plus minus 20, mostly, so I don't really need an ultra fast CPU. My Mobo also has a 5th gen PCIe slot, so probably not gonna upgrade for a few more years, but when I do I will switch my 12900k and Mobo to an AMD setup.
Honestly not surprising at all. RAD game tools are some of the most mature middleware in games (probably as close to bug free as you can realistically get without formally verifying your software). If their software is suddenly failing, they're going to want to know why.
And Oodle is deployed basically everywhere and very stable outside of this situation, otherwise all our games would be crashing constantly no matter the hardware.
I work in commercial IT, we have had 30+ PCs where the intel CPU essentially died, this has been affecting them all from 10th gen to current. Recent one was a month old 12th gen 1255U. Affects desktop and laptop CPUs. Larger companies that deal with more PCs have had a lot more of the same.
I work in laptop repair and our main customer is a major IT firm that probably everybody knows (used to be very innovative in hardware). And we've got Lenovo T14 Gen 3 (12000-series Intel CPU) and T14 Gen 4 (13000-series Intel CPU) dying like flies right now. Sometimes the replacement system board dies within 2-3 months also. Could be the undersized cooling, could be the horrid Lenovo Thermal "Paste" (only at first*) or something else. *We replace the shitty Lenovo stuff with something a lot better as part of our service, especially when system boards/mainboards have to be exchanged. Edit: typos, damn phone!
@@alexanderzawydiwski9534 Its normal for some sata slots to not work, as well as ram not being detected in certain configurations. There is manual included with motherboard for a reason, some configs working well is a miracle (like 3 mismatched RAM sticks, sometimes it works, sometimes it doesnt, game of luck with ANY processor regardless of age, there is also good reason why mobo makers have a list of supported memory modules and configs for each one).
I am also troubled by the lack of clear messaging and solutions from Intel. We do need more transparency to better understand and manage these hitches.
I made a video about this a year back, along with Tech Yes City. A lot of people called me crazy. Stability issues, crashing, latency... List goes on. I appreciate you talking about this.
@@someperson1829 no the problem is Intel and goes well beyond overclocking when it’s seen on boards that actively avoid overclocking. What we were seeing is those systems failed sooner due to it but failures are starting to be seen across the board on systems that were well within safe limits.
@@someperson1829 standard specs for these chips seem to be already quite far off the good envelope and inside a territory of power consumption, stability and longevity that would normally not be accepted from a chip at factory specs.
I bought my 13900k in October 2022. Many 13900k and 14900k owners didn't experience any problems until this year. I find it very odd that so many of us experienced issues pretty much at the same time simultaneously. Most would say the power profile caused the wear and tear but practically all at the same time? Different motherboards, different profiles that are aggressive and non aggressive? Different levels of use? Many working PC gamers aren't on their computers for long periods. Really strange.
I like that they want to keep places honest. I just want a standard. Don't favor giving one company a chance to explain and make steps to fixing things then not go back to other companies that did the same. If said companies didn't change let us know too.
The saddest part about this is that I have been telling users it's their CPU ever since it started popping up en masse with UE5 game releases, so over a year now. Hundreds of cases across a multitude of games, even outside UE5. And even if I link them the articles now that mainstream media picked up the story, they will not even bother reading it and simply reply "but it's only in THIS game".
This is slightly terrifying. A problem that impacts perhaps 0.1% of enthusiast customers is a poor experience, but I can appreciate how difficult it would be for Intel to root-cause the issue. If it's 10%, that speaks to a product design issue which *should* have been caught much earlier. Intel's own internal testing should have caught this and resulted in mitigation before customers noticed, and preferably before the chips shipped.
Thing is, if it's getting worse over time, this smells like a hardware defect in fabrication or a material failure, not a design flaw. I would not be surprised if we find this out down the line.
@@ponocni1 But then you should see the same issue cropping up with i7s and i5s, which we don't. What's so different with the i9s? If it's that 8 e core cluster then why doesn't disabling e cores help?
@@zodwraith5745 i9s are binned for higher max boost clocks and more power, so maybe whatever criteria that requires also makes them more vulnerable to this problem if it's an issue with them having pushed their manufacturing process to the breaking point.
@@zodwraith5745 It is happening with the 14700k. A number of people have reported it and I've seen it with mine. Bios updates helped some and I have run it with Asus MCE disabled the whole time. I've got a custom loop so it's not a cooler issue. (420mm + 120mm + 280mm rads too)
@@LucasHolt But what is the problem specifically? I've heard it seems to occur mostly with specific games. I haven't had a hiccup from my 14700k but I'm on MSI. Keep in mind most of these issues were on ASUS boards, a few with Gigabyte, and I've never heard of an MSI or Asrock occurrence. Although I think the narrative changed fast because Intel is such a juicier target. But what Wendell is talking about in this video isn't Z790, he's talking about W680. *_ASUS_* W680. A SERVER platform that uses consumer CPUs. That means 24/7 full tilt all cores running. That's why the error rates are so much higher. I haven't had an issue but I wouldn't be shocked to see my system throw up an error every few hours under _that_ kind of abuse.
Based off my experience overclocking these things......I'm pretty damn confident in saying the problem lies in either the ring bus or the system agent being just at its damn limits. I would more than be willing to bet that the VAST majority of the true raptor lake dies that are failing are all going to be the larger 16 e-core dies. It never sat right with me how wildly the power consumption on these chips was affected by the ringbus and how minor bumps in speed took absolutely shocking amounts of power and tons of system agent voltage fiddling. In retrospect, this explanation falls in line with how cagey intel has been about the SA voltage in the non-z chipsets. Basically we're seeing the limits of one of these two things, and it's so close to the limit that natural degradation of the silicon will cause faults.
The craziest example I can remember is an alder lake cpu that would do 5ghz ring.....at about 400watts. Drop it down to 4.8?...300 watts. Drop it to 4.6...250...and 4.4 only took 220-230ish. E cores were all disabled and the p core settings were the same on all tests...the only thing that changed was the ringbus/cache.
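To put rough numbers on how steep that ring curve is, here is a minimal sketch using only the figures quoted in the comment above. The 4.4 GHz point is taken as 225 W (the midpoint of "220-230ish"), which is an assumption, and the function name is made up for illustration. It just works out how many extra watts each additional 100 MHz of ring clock cost between the quoted points.

```python
# Ring frequency (MHz) and package power (W) as quoted in the comment above.
# The 4.4 GHz value is an assumed midpoint of the "220-230ish" range.
ring_mhz = [4400, 4600, 4800, 5000]
power_w = [225, 250, 300, 400]


def marginal_watts_per_100mhz(freqs_mhz, watts):
    """Return the extra watts per 100 MHz for each step between adjacent points."""
    steps = []
    for i in range(1, len(freqs_mhz)):
        delta_w = watts[i] - watts[i - 1]
        delta_100mhz = (freqs_mhz[i] - freqs_mhz[i - 1]) / 100
        steps.append(delta_w / delta_100mhz)
    return steps


if __name__ == "__main__":
    for (lo, hi), cost in zip(zip(ring_mhz, ring_mhz[1:]),
                              marginal_watts_per_100mhz(ring_mhz, power_w)):
        print(f"{lo} -> {hi} MHz: about {cost:.1f} W per 100 MHz")
```

With those numbers the cost per 100 MHz roughly doubles at every step (12.5 W, then 25 W, then 50 W), which is the kind of curve you would expect from something running right at its voltage/frequency limit. It is one anecdotal chip, so treat it as an illustration, not a measurement.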
Seconded. I highly suspect uncore interconnect degradation being a proximate root cause. Linpack won't catch it failing until things catastrophically fail later down the line. Harder to validate. "Just shove more voltage into it and it'll be fine. How long do you expect users to run these chips? 15 years? They'll be obsolete in 15 years." Try less than 1 year past intel execs. I run my 13900k with turbo disabled so basically sitting pretty at 0.8v vcore about 95% of the time. Bought it second hand for $300 about 5 months after launch due to "issues hitting ddr5-8000", pre-delidded with a custom copper ihs. Person I bought it from got a second chip, direct die cooled it, and from what I can tell that new chip worked just fine. Likely doesn't anymore. Lol. Probably swapped it out for a 14900ks. My chip still shows signs of degradation. SA was at 1.3V and vdd at 1.35V for 96gb dual rank ddr5-6800 for over a year. Now experiencing random latency spikes in the past 6 weeks. Even reinstalled windows as it's been over a year, but no change, so it's not software related. Memory test passes at 32 hours as I don't have time to test for longer. Here's to hoping whatever releases this year is better than the hot garbage intel released or I'm switching back to AMD. Intel cpus failing were unheard of before really 11th gen. Started going downhill quick. 11th gen also had SA/IMC issues compared to 10th gen. It's telling intel is being hush hush about this as it's a huge blow to their QC reputation.
Could it be that this is because the AVX offset is not triggered properly in light tasks such as gaming? I mean we would never see this in all these stability tests then, because the load is heavy enough in these benchmarks for the AVX offset to kick in.
@@quanlethienminh6002 It's not likely to be an avx offset issue. The way to understand what i've pointed out here (to the best i can oversimplify it)..is the ring bus is like a road/traffic network between all the cpu parts while the system agent is the traffic cop/monitor/system. The pcie controller and memory controller all sit on the ring bus. In the scenario where the SSD corrupts....the data can't make it from the cpu core down to the pcie controller. Add this explanation to what wendell is saying and it should make things much clearer as to wtf exactly is going on. Disabling/slowing down stuff doesn't change the fact that the underlying infrastructure is just pushed to its limits the way it is.
Iirc, the 1700 socketed Xeons... have avx512 by virtue of having no e cores......if the xeons are ecoreless dies and no failures exist..we might have something of a smoking gun. I haven't touched a 1700 xeon...and haven't seen a delidded die shot
I bought a 12900k 2 months before the 13900k. I've been having the same issues with the 12900k since about 5 months after using it. I believe the TDP on these motherboards have degraded the chips a bunch. I also have beefy water cooling that keeps the temps around 80 at full load. - The bios probably called more power because of the cooling efficiency on mine and burnt the chip up.
Thanks for this breakdown of your hard work. It reinforces my decision to change from Intel to AMD for my most recent build. I migrated from a 10900k to a 7800x3d, and so far I am very satisfied.
My first two computers used Intel. And once I saw AMD really supporting their sockets for longer, I decided to make the jump and couldn't be happier. More powerful for what I need and less power hungry than Intel.
This is unfortunately not uncommon when large companies deal with a big design/quality issue. When you have a reputation for quality... the last thing you want to do is openly admit you made bad things (and this also can open you up for lawsuits, like these server support companies looking to recoup all the extra service costs, or the company that lost 100k in their player base due to instability). Say vague things to sound like you care while claiming some type of user error. See how Toyota handled the 3.0 V6 oil sludge issues (but that was more a Toyota cultural thing), and how Apple handles EvErYtHiNg. Intel has been playing catchup to AMD Ryzen for years now.. after making fun of the chiplets being "glued together". All they really have left is their reputation for reliability... so they are intentionally keeping tight lipped and hoping the media coverage of the issue just "goes away" after they release their next CPU generation, one that actually will be new and not just the old ones with more voltage.
@@damara2268 I bet Intel is really glad CrowdStrike pushed out a global Windows crash update file. That gives the media something way meatier to talk about versus Intel making bad CPUs that degrade quickly even when not OC'd.
This is an incredibly wonderful video/investigation. I'm living this myself right now. Actually went through the RMA process with Intel, and they specifically said they don't have ANY replacement processors at this point - all out of stock. They are actually giving me a cash refund instead. There is definitely some major drama going on behind the scenes over there and I really am appreciative of this video starting to uncover it. Hopefully Intel will be forced to disclose some info soon.
Thank you. It's always interesting to hear about modern trends in data centers. I don't work in gaming, so we don't use high end consumer CPU's, but this is worrisome because it shows that the latest processors from intel are truly the bleeding edge.
Maybe you should do something similar to what Steve/GN do sometimes and set up some form of communication so devs/data center admins can share data to hopefully get closer to a real answer? Heck, maybe even get physical examinations of problem CPUs if possible
I'm seeing a lot of people confused. It's not just i9 chips! The 14700k is also a POS. I've seen others with issues as well. People talk about the 14900k more but it's still a problem for the rest of us with 14700k chips too! I have not tried to overclock mine. it's running with DDR5 5600 48GB corsair modules. Custom loop with 420mm + 120mm + 280mm radiator. 6900XT GPU. It crashes in two different operating systems! I dual boot Windows and MidnightBSD. In the latter, LLVM clang will start crashing during long compiles (10+ min). On windows, I mostly game and that is also unstable. I made all the recommended changes that intel proposed. I turned off asus mce on the first day of ownership last november. Worst CPU I've ever owned. It's slower than the 3950x i had before for compiling and with all the nerfs to make it stable, it's starting to lose the gaming advantage. Huge waste of money.
@@ChrisL-d4c 25% failure rate means 75% with no issue. There's no guarantee you're going to keep being this lucky in the future though. For many the problem starts small and progresses.
you don't have to "try" overclocking it, turbo boost is usually turned on by default so it automatically overclocks for you. If you want to avoid overclocking you'd have to turn off turbo boost.
Love that KDE Plasma wallpaper on the screen in the background :) And I am so glad that i went with Ryzen this time. I missed the AM4 era but I sure hope AM5 lasts as long as AM4 did.
Great video, Wendell. I had to swap a 14900K in my desktop that had these problems. Two weeks later the news about the issues started popping up. So far so good with the replacement running at Intel "Extreme" spec.
I love how people ignore the goings on with the 12900K & KF's, which are suffering the same issues, albeit not as bad (lag/stalling but not hard crashing)
I chose to get a drive with on board cache instead of host platform memory specifically so I don't have to be nearly as concerned about drive corrupting internal mapping information just because i got a little spicy with cpu overclock.
I have a 13600K with multiple NVMe SSD drives in Linux and haven't had any issues at all in the 18 months I've had my system. I don't overclock and my DDR4 RAM sticks don't like XMP so they run at stock speeds, so it's quite a vanilla setup (ASUS motherboard with defaults for all the CPU settings).
Most of the comments are saying that there may be design issues, but it got me thinking that maybe it's a test, qualification, and reliability process issue. Those 3 are internal groups in all semiconductor companies that make sure parts are good when they go to the customer. Now it got me thinking: did Intel nerf those groups to resolve the yield issue they were having several years ago?
After watching the video and thinking about your findings, the thing that stands out to me is your observation of the tick rate on the game servers falling by 50% before the crashing. This tells me the CPU pipelines are being invalidated, which is what's impacting performance so badly. Whatever is happening to cause these pipeline invalidations is the real culprit, and I have a suspicion about what it is. I believe the CPUs have new microcode to detect attempted side-channel attacks, and when that is triggered the CPU intentionally invalidates the pipelines on all cores and additionally flushes the L1 cache. This would immediately flush any security keys from the CPU causing the side-channel attack to whiff on stealing something valuable. Next, the L1 cache needs to be reconstituted, including reloading keys. With NVMe drives, you can realistically expect 12+ CPU cores requesting the same key file simultaneously would work fine, but I suspect the IO errors being encountered are OS-level locks causing the cache reconstitution to fail, thus crashing the process, and possibly even the kernel. Another cause could be actual side-channel attacks, and the CPU trying to protect itself. If the defense strategy chosen by Intel is to crash the CPU rather than allow compromising data then they would not want to publicize that because now you have an instant mechanism to launch a DoS attack against a service or service provider. For end users, I think that certain users are affected more frequently because their system has been compromised and is under frequent side-channel attacks. So this begs the question, which is worse? Do we have less active side-channel protection, probably increasing risk of keys being stolen, or do we have active protections that ultimately choose to unalive the CPU rather than allow keys to be leaked? I would see if you can dig up any data on the occurrence of side-channel attacks on these CPUs compared to the older 12900k systems. 
Final thought here, the reason that slowing down the memory improves stability is that at the lower clock speeds, the mechanism in the CPU that monitors for side-channel attacks is able to better analyze the flow of data coming into the CPU and makes less frequent false positives that trigger the failsafe mechanism.
I’m really curious if this will ultimately lead to a recall or a class action lawsuit. Also really curious if AMD sales reps are bringing the instability issues up when they’re talking with data center clients. The fact that this is happening with workstation boards that aren’t overclocked and are much more conservative with power management makes me think this is a flaw in the fundamental design of the chip. But if so, one would think similar problems would be appearing in Xeon chips too since they share core designs, right? Very puzzling…
Mate, after the way Intel behaved in the early noughties against AMD, I sure hope AMD are bringing this up. 1. Prospects need to know that after months, Intel hasn't root caused the issue or, as Wendell said, clearly admitted that affected clients will be made whole. 2. It's glorious when Karma scores against arseholes like Intel. As an aside, I think the game companies who collect telemetry have an opportunity to leverage that in order to help get to the root cause. If it's not sufficient (as Wendell suggests), this could help the Data Scientists to refine the collection to improve this.
xeons have a whole different design philosophy behind them, and bring intel the bulk of its money. they wouldn't sabotage their income for a headline. remember, intel laughed and derided amd's chiplets. then amd slaughtered them, intel panicked and tried to copy them. pretty sure intel thought they could get away with blaming customers (you're not a user, you bought something) just like nvidia did.
I had been an Intel user all my life up to my last one, the Core i7 5930k. It was perfect until it gave up on me after 7 good years of constant OC. I took the plunge and got an AMD Ryzen 7800X3D for my new gaming rig instead of a 14900k and now... I realized that I just narrowly avoided the path of suffering. I hope Intel gets back in the game soon. AMD CPUs are just as good now, or even better in many aspects. My dad (R.I.P), who was a die hard fan of Intel, would never have believed AMD would be able to pull this off.
I've had an innumerable number of issues with my 12700k (Same chip) across 3 different motherboards. I've been waiting for the 14700k to drop in price because of the insane amount of game crashes and bluescreens I have seen. Thank you for the fantastic deep dive into this. Seriously great work.
Try drop the XMP and run stock. I had same issue until I just used default ram speed and been stable for nearly 2 years. The cpu's are garbage for getting higher speed ram working and usually means more voltage which is another reason all these issues are happening. I wouldn't bother going for 14700k, May be even worse looking at all the problems with degradation.
Weird, my 12700k has been rock solid ever since I bought it when it was released and currently running 6400 MHz ram. I'm AI engineer and there were times I left computer processing for weeks and never had any issues with it.
mine is also fine, however I noticed a problem in a single core if I overclock. solution was to just overclock the other cores. but mine is direct die so the temp is like 40c under load.
My 13700k / 4090 build has been pretty much flawless for almost 2 years now. Even started out on a ddr4 board for about 5 months until msi released the z790 tomahawk ddr5 board. But 13700k oc to 5.6ghz on all 8 p cores, all 8 E cores oc to 4.4ghz, ring oc to 5.0ghz at 1.34v. Even the mem controller has been very solid. 32GB Corsair rgb vengeance 6400 cl32 hynix A-Die oc to 7200 34-41-41-83 at 1.45v. On a msi z790 tomahawk. I can't complain. I've also been using a contact frame since day one for both boards I've had this 13700k in. I've heard a lot of the problems could be related to the bend they get with the stock mounting bracket. But until intel releases a statement / fix we really don't know what the issue is. I've used mine hard for over 8 hours a day every single day for work and gaming for almost 2 years now and it runs perfectly fine. I haven't babied it and I won't. I bought a k series for a reason, like all the k series intel cpu's I've had.
If the CPU performance halves before crashing maybe it is the IMC and memory transfers stall causing bandwidth starvation. This would explain why most pure cpu metrics continue to look fine. I'm curious if the issue reported is exclusive to ddr5. Megekko (a big Dutch retailer) has an informal policy to push ddr5 dual dimm over quad dimm for stability reasons but they didn't specify this would be for Intel only.
From my extensive overclocking on my 13600k I've had this issue as well. What I've found from my testing is that it seems related to heat/power spikes. What I've noticed is that when at the edge of the thermal envelope +80c, that there could be times where it spikes over 100c but not record it because of how fast it can spike before the temp is reported. But ultimately, the instability comes down to voltage fluctuations and temp fluctuations. When you lock the voltage and keep the temps below 80c on max prime95 loads you'll have a pretty stable system.
@@EJM07 I am currently in my "summer clocks", which is standard boost clocks on both p and e cores, but massively undervolted. By default my chip wants to run at 1.37v to achieve stock boost clocks. However, I can achieve this with 1.16v on load and 1.08v on idle. Yes, I have my load line calibration set where it adds slightly more voltage when under heavy loads so it doesn't have vdroop. But with these settings I always have a temp 6-7c above ambient on idle and temps not exceeding 75c on full load. This is even with ambient temps of up to 38c. Yes, ambient temps of 38c, that is not a typo, and yes it sucks badly when it's that hot in here. Especially with 70+% humidity.
@@chieftron i feel ya. high temp + high humidity is my pain also, it just hits differently. intel's solution to faster cpus seems to be just more voltage on the cores lately. i've seen some crazy voltages in boosting cores on some 12th gen. always thought it was a misreport from the tool, but now? not so sure anymore.
Needed a new virtualization host for @ home, waited until the company i work for rolled out massive amount of new Intel machines. Listened to what was being told on the work floor about the new hardware over some months. Did buy AMD 7000 series because heard nothing but problems on Intel at work. Even when it did not crash then everyone was complaining about the big little architecture, which yes you can disable or do core affinity. But be serious who wants to do stupid settings like that when the competition just has something that does not need that kind of tinkering at all. On AMD i was very careful as well to buy the right combination of hardware. Special for MB and memory combination had to be a verified set. In the end i did buy a B650M AORUS ELITE AX ICE and CMK192GX5M4B5200C38 and for storage i plunked the board full of these: Lexar NM790 4TB. I can say i am very happy with my home server.
Is there some kind of tech "law" to this? Every time chip Company A has hardware outperforming chip Company B.... A becomes more power efficient as they stagnate towards barely pushing their silicon. B becomes a power hog with frightening stability problems as they aggressively push their silicon for all the performance they can get. Nvidia became B during the Fermi 'n 5000 series days. AMD was B during the Bulldozer days. And now Intel is B.
You do know AMD has needed 3 bios releases with every release of their next gen CPU. 7800x3d still has 10 minute boot times, that the only answer for is to not shut the PC down.
my bulldozer cpus, had 2 of them, never crashed on me. never heard of them crashing either. people complained they weren't fast enough or whatever, never heard a peep about crashes.
@@jmwintenn Yeah, these old things are very solid. Not very fast, hungry for watts, and not even valued that much used, but these will outlive a lot of these newer processors.
Genuinely one of the most interesting videos I've seen in a while. Really love your clear, specific and entertaining communication style. I love how you're very clear about what you do and don't know.
At this stage in the game if I owned a 13th or 14th gen Intel cpu that showed any signs of crashing or performance degradation I’d be sending it back to Intel on the first thing smoking whether it was under warranty or not. Intel needs to admit there is a problem and recall the vast majority of these chips (after correcting the problem, which I’m sure they probably have with newer built chips). They are basically going to pull a Nintendo and never admit there is a problem (think joy con drift). As time marches on I feel for people whose chips are either going to fail out of warranty, or won’t perform as well as they did new because of degradation or bios updates that will gimp performance.
they talked me into the 14700F and I am glad they did. I did notice crashes while using nvidia 4060 and a 4070, so I slapped a 7900 GRE card in my system, and it just works now. Be careful when using nvidia too. Everyone is quick to blame intel, when there are other things at play here. Yes I know the data centers are facing similar issues, but even then they use nvidia gpus.
@@cgwworldministries83 they don't use the same GPUs, the data center ones have a better phase margin and just overall higher reliability construction traits.
As an overclocker, I've noticed the following in my personal system. It's a Z690 Asus M-ITX with a 13700k. 1. Tuning the PLL Core, PLL SA, PLL Cache, and PLL IMC voltages to around ~0.99 helps significantly with stability. Other overclockers have noticed that there seems to be a PLL issue on these chips, so it's not just me. Stability is randomly lost over time when running DDR5 at high speeds; this has been attributed to a phase lock loop issue. 2. My motherboard personally comes with a very droopy LLC. To the tune of 200 mV (1.50 set, 1.30 get). Not only does it droop to an extreme degree, but it's also horrible at catching transient loads. I have to increase my loadline levels by 2 (from 3 to 5) in order to achieve stability at any sensible voltage. At stock, I would need 1.49v > 1.29v droop during AVX2, and even then, it's not always stable. When the LLC is increased to level 5, I can settle for 1.36v > 1.25v droop (-110mV). Even though it's a lower voltage under load, it's far more stable at catching transient loads than LLC 3 at a higher average load voltage. 3. Increasing your memory speed beyond ~6000 MT/s on DDR5 invites a massive stability penalty. Whereas I can run a very low load voltage of about 1.18v @ DDR5 6000 (CL28 100,000 tREFI), setting that up to DDR5 7200 (CL32 65,000 tREFI) necessitates an increase of 0.08 V. So 1.26v core voltage. 4. Increasing the e-core frequency induces a penalty to power draw, and subsequently to the minimum voltage needed to achieve stability. (Example: 4.2 GHz ecores draw around 75 watts, depending on your voltage. Set that to 3.2 GHz, it drops down to 50 watts). When you add all these things together, alongside power lost via inadequate cooler contact, it doesn't shock me that these CPU's are unstable out of the box.
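For anyone trying to follow the droop figures in points 2 and 3 above, here is a small sketch of the arithmetic, reading every set/get pair as volts. The helper names are hypothetical and this is only a sanity check on the quoted numbers, not a tuning tool.

```python
def droop_mv(set_v, load_v):
    """Voltage droop in millivolts between the BIOS set value and the value under load."""
    return round((set_v - load_v) * 1000)


def required_vcore(base_v, increase_v):
    """Load voltage after adding the extra margin needed for faster memory."""
    return round(base_v + increase_v, 2)


# Set/get pairs quoted in the comment above, taken as volts.
quoted = {
    "LLC 3, as shipped": (1.50, 1.30),
    "LLC 3, AVX2 load": (1.49, 1.29),
    "LLC 5, AVX2 load": (1.36, 1.25),
}

if __name__ == "__main__":
    for label, (set_v, load_v) in quoted.items():
        print(f"{label}: {set_v:.2f} V set, {load_v:.2f} V under load, "
              f"droop {droop_mv(set_v, load_v)} mV")
    # DDR5 6000 -> 7200: 1.18 V plus the quoted 0.08 V increase
    print(f"DDR5 7200 load voltage: {required_vcore(1.18, 0.08):.2f} V")
```

Read that way, LLC 3 droops by about 200 mV while LLC 5 droops by about 110 mV, and the memory bump lands on the quoted 1.26 V. That is consistent with the comment's point that a lower set voltage with a stiffer loadline ends up close to the same voltage under load.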
Coming back to more comments and the 1st point matches what I've experienced exactly. Apparent memory corruption with very long power on times ~1 month or longer. 32 hour memory test passes for me no issues. I've read posts where people ran memory tests and like clock work they racked up errors as the weeks went on. A reboot and everything is perfect again. AMD system never experienced that. Not sure if dual rank 96gb ddr5-6800 counts as high clocks but that's what I managed to max out on my 4 dimm board. Will give tuning those voltages a try. Will come back in a few months to report back.
Is there a way for you to check whether the crashes are correlated with AVX work loads? Intel has always had problem with their AVX implementation since Skylake, having to do all these AVX clock offsets. Perhaps, in these games the load is so light that the AVX offset was not triggered even though AVX instructions are being used, leading to some local electromigration degradation around these die areas.
@@quanlethienminh6002 To be clear, AVX support is not guaranteed on 14900(K/KF/etc). I've seen people complaining about it on Intel forums. They RMA a 14900K with AVX and get a replacement without any AVX support at all.
Intel PR posted an update to... reddit. So check that out, update your bios, and look for another update to your bios coming mid august 2024!
www.reddit.com/r/intel/comments/1e9mf04/intel_core_13th14th_gen_desktop_processors/
tho they seem to have deleted some of the better questions about whats covered under rma, are folks that have delidded covered (guessing: lolnope), etc.
@@Level1Techs they posted the update to the Intel website. But yes it’s clear most of you guys get your info from Reddit…
Intel left out the oxidation issue on the website, which is real, interestingly
@@Level1Techs Sounds to me like they're trying to cover their asses. They're focusing on the voltage issue because that's something they can fix.
The oxidation problem is a manufacturing defect, and that potentially means the dreaded R word - RECALL.
does this also apply to i7 14700 hx laptops ?
@@Level1Techs sh8 i just ordered a laptop with a 19400hx on the acer helios 18...am i screwed? considering i live in the phil and my sister bought it for me there in the u.s...yikes
Actual investigative journalism instead of talking about cpu's for 20 minutes just like 99% of other tech channels. Quality content.
Heck yeah
True, I hope this gets more appreciated and we get more
Could be worse, if I had a nickel for every time someone just had to add some sort of awful music to a video I would be rich!!
Same avatar! :D
ummm i saw literally ZERO sick memes here soooo... kinda suspect reporting bro?
but fireship just pontificated on the issue and showed speculative-non-primary-source concern AND SICK MEMES, so therefore, issue confirmed
Quick we need to throw Wendell in witness protection before the Intel hitman finds him!!!
good thing he isn't looking into boeing planes right now
Dr Su needs to send some AMD Bodyguards XD
At least he isn't talking bad about Boeing.
He probably will have BSOD long before he reaches Wendell
Intel is not Boeing, yet.
"AMD Is In the Rear-View Mirror"
- Pat Gelsinger, Intel CEO
"Intel CEO’s Comments of AMD ‘in the Rearview Mirror’ May Indicate Intel is Driving Backwards"
- Areej Syed, Hardwaretimes
😂 lmao
Good one! Thanks for sharing.
I saw a funny meme cartoon of Pat driving towards a cliff and Lisa driving away from the cliff. Lisa's car is of course in the rearview mirror from Pat's perspective.
v-cliff
more like flipped over on the road :D
lool
Intel saw Boeing's crash rate and said *_CHALLENGE ACCEPTED!_*
hahaha
Hold my beer....
Ba DUM!
As a former aviation maintenance tech and lifelong computer nerd, I approve this message.
I won't fly commercial and I won't buy intel. Both Boeing and intel are a complete disaster.
Thanks cold LMAO
-Talks about a real problem
-Explains what he found in true objectivism
-Admits data is not big enough
-Properly digs through the problem
What a gigachad
Terrachad
yea first video I saw of him and he's smart, sympathetic and accurate. should be new president
Yeah, noting limitations in data lends lots of credibility and we can still probably draw some conclusions as he did about it. Mega based.
Yottachad
I work at a gaming marketing agency in the IT department, and I have been absolutely losing my mind over these issues. I have built ten systems over the last six months, and had three of them experience various failures, from One of the more consistent problems I've seen has been system corruptions, where I'm unable to even run simple Windows updates, SFC, DISM -- you name it. Recovery media fails, fresh installs on new drives fail, and the crashes on games have gotten utterly out of control. Thank you for this, it's definitely incredibly helpful for tracking our OWN sanity.
Takes five minutes to downclock.
@@chrisx742 Appreciate the thought, and while yes, it does, and we did -- used the Intel XTU, didn't go above 52x, dialed in Intel specified limits all across the board, and had little to no consistent improvement, but definite performance loss. Tried the Intel Processor Diagnostic Tool and failed consistently across all settings. RMA'd one 14900K already, popped in a spare 12900K while waiting for the replacement with none of these issues with all the rest of the same hardware.
The point ultimately is that we shouldn't need to do this, and it's infuriating to have to fight with multi-billion dollar companies to get what we're paying for, and have it actually work consistently.
@@chrisx742 the fact that you would have to do that is hilarious.
What motherboards used? And ram?
@@chrisx742 The fact people are so lazy these days is proof of how incompetent most computer users these days really are.
Intel forgot how to make fast cpus, then they forgot how to make efficient cpus, and now they appear to be forgetting how to make properly operational cpus... Not a good character arc...
heh
character "Arc"
@@countvonthizzle9623 nice flamebait, if only AMD's CEO wasn't a woman, and her tenure wasn't wildly successful...
@@DarioCastellarin women arent diversity
oh boy i hope this aged well
Typical too quickly developed too little tested, lack of QA because the new lineup has to be released and delivered to keep up with the competition and not upset the shareholders behaviour. I wish we had fewer stock corporations again. A world in which profitable companies are devalued because they don't constantly increase their profits is completely f'd up.
Mad respect to the person who kept playing even though they were crashing every 2 hours. I feel them on a spiritual level.
This was me playing Remnant 2. I would play about 4-6 hours a day and would experience at least 2 or 3 crashes straight to desktop.
That was me lol. I got my replacement CPU 2 days ago
My old ass laptop crashes every 30 mins while playing Roblox :D
Player: I've got you now Pirate King!
PC: (Crashes)
Player: Ahhhh you thought a mere catastrophic failure could save you?! Well prepare to doom your DOOM!
LMAO
Literally my life for months.
You should publicly mention that you are not suicidal in case Intel approaches Boeing for advice and business cards.
Just want to point out this is the kind of investigative work traditional tech journalism doesn't do anymore. Thank you so much for this work!
Steve from Gamers Nexus: "Excuse me?"
@@noxious89123 Are you saying GN is traditional tech journalism?
Frame Chasers 7/14/24 - problem described and solved
Feels like Intel wants to drag out the problem till next gen launches and play the Spider-Man meme till then
Intel is hanging on for dear life trying to compete with AMD. They rather take big risks that can backfire on their customers, then try to silence it, than being honest about the limits of their CPUs and losing their high spot in the benchmark charts.
That feels way too real...
Exactly, there is no fix for what is plain old silicon degradation caused by too-high voltages, clocks, temps and amps pulled through the silicon long-term. There is no software fix, there is no BIOS, there is no microcode that will revert this once it's already degraded. The only fix is "do not exceed 1.3 V and clock them 500-600 MHz lower". And also, do not clock your CPUs so high that they have less than 5% headroom left in them, so even the smallest degradation throws them into the "unstable on stock settings" category. Intel is just making smoke hoping the attention will go away so they do not have to do the thing they should: a full-scale recall.
@@seeibe This. Consumers are used to seeing i7 at the top of the chart and Intel will break bones to keep it there, because they desperately need to keep its brand power.
I just had to leave a bad review for a "i7 Desktop" selling in my country's Amazon with no GPU for $900(!), because I noticed that it had a HD4500 and realised nobody else could work out that meant it was an 11 year old PC. They just see "i7" and buy.
It's those idiots that Intel made the 14900 for. So Google still says "i7 is fastest" and the plebs carry on buying it.
This is me when I was 14, I saw i7 and chose the crappy HP Pavilion HPE H8-1103NL with the i7 2600k haha
When server providers are starting to actively recommend AMD systems after YEARS of Intel being the "safe bet", this is the point where Intel should start sweating.
Because mind-share is a huge factor in what gets bought in large volumes.
I agree with most of your statements as I work for a small SI. I stated on initial videos a few months ago when the fingers were pointed at motherboard makers that the problem could be with Intel processors. We mostly provide 14900k to businesses on premium and entry-level B760 boards. So right off the bat, no O.C. Next, we disable motherboard enhancement and limit power to Intel stock. And some board are weak and can only do intel stock power. Failures have happened in all these scenarios. These systems can run cinebench all day long or occt stability test or aida64. But try to install the nvidia driver(compression/decompression). The installer fails. Another way to test this is tekken 8 demo game. No matter what bios we choose, those processors are gone. Replaced. There is no coming back from this. Has happened with ddr4 and ddr5 systems.
Exact same issues here. Waiting for my RMA CPU to arrive as I type this.
@explosivemonkeys I hope you get yours soon. We are not in the U.S. so our rma is a couple of weeks with Intel. But once the system is diagnosed with this problem at the service centre, we replace it with a new processor and return the system back to the customer. So, for a customer, it is basically 2 to 3 days down time. I am also hearing stories of Intel denying rma for these cases. That would make this worse.
The NVidia installer was a big thing for me too. Would crash on install about 95% of the time. At the start I figured it was a corrupt Windows install or maybe a failing HD. Almost dropped a lot of extra money to start replacing components before I started to see all the other issues out there, which thankfully saved me.
@@explosivemonkeys my i9 13900K is unstable if i set more than intel's baseline specs, Do RMA covers me?
Just so it's clear - did replacing the CPU actually fix the issue? Was it properly verified, taking a system exhibiting the problem consistently, replacing only CPU and then testing again with the same exact test and no problem found? Asking because I was around in the past for the FDIV bug in P60/P66 by Intel, where you could actually even get it to calculate wrong in Excel using a specific formula to prove the bug. Initially Intel said there is no issue, then they made RMA's for all who complained trying to shut them up - or at least it looked like that is what went on, then finally they had to stop all the BS - and rework the CPU and announce that people who had affected chips could register to get it replaced with the new fixed chips shipping out as they were coming off the production line. I had one of the FDIV bug P60's for years as a memento of that cluckerf*ck but I traded it eventually for a DEC VAX 11/750 in 1997-8 something like that. Shoulda kept it.
Point being with the FDIV bug you could not replace your way out of it until Intel made fixed chips, 100% of the P60/P66 initial production were affected. You could kernel patch your way out of it, but it was a software workaround for a hardware problem.
15:05
"We already replaced a lot of customer's 13900k with 14900k and the issues don't seem fully resolved."
This statement is extremely telling.
14:56 "$1,000 extra" for support is insane and really tells the whole story here. The part is not reliable. I am sure a lot of conversations are happening behind the scenes here, but based on what I've seen, Intel has not committed to fixing the issue. We can hope that it is due to incompetence but more likely they do not want to admit fault here due to the cost of "making it right"
Thanks for sharing this Wendell, it will be interesting to see other news around this topic in the coming weeks.
oh yes, instead of issuing a recall, lets damage even more the brand name
It's like how insurance companies are refusing to insure houses in certain states against extreme weather. When even the Capitalists charge more or refuse to cover something, that's a sign that there's a real problem there.
@@monad_tcp The issue is that Intel itself doesn't really know what to fix and currently the most reliable way to make it run stable enough is to lower the multiplier like to 53 and running the RAM slower. If that is what it takes then the CPU wouldn't run as specified and also would invalidate Intel marketing around the CPU, thus even if Intel want to do a simple recall and giving the costumer the fixed product, it can't be done. Assuming actually running it slower does fix the problem, Intel probably need to replace the product, force it to run the CPU slower, and give some monetary compensation.
I think Intel owes every 13900k and 14900k owner a new 15th gen CPU since this is a defective chip.
I don't think it's about being committed to solve the issue, as it is that they have probably realized there is no fix, and they can't afford to recall two whole generations of product. That would kill the company. Trying to mitigate it through microcode and power limits is probably the best they can do until getting a 15th gen out, and it might even be too late to fix the issues for that one, so they're really just holding on for their life for a 16th gen.
The number of Intel fanboys trying to trash Wendell is hilarious.
Never stop calling out anyone doing shady business stuff!
I don't think he cares. He/dr ian. said in lv1 & tech2 potato podcats.
The fact its happening in game servers is pretty crazy and an entire 1000 dollars extra for support.
That's capitalism for you
@OllieHamon so? A company should charge more to work w products w more problems. I do....up to 10x more of my usual150 per hour rate.
the extra 1000 for supporting a cpu that is less than 1000 implies a very large support time sink and replacement cost ...
@@rezenclowd3 That's a really dumb way to do business, people talk to each other. It's like if I started pumping out car parts, let's say a water pump that kept failing prematurely and blamed everyone but themselves and tried charging for repairs but knew it was self inflicted the whole time... Oh wait, my Golf R had that issue. Oh, there was a class action lawsuit? Nice.
@@Dendodorion
I mean it does make sense if you aren't the company producing the faulty part in the first place.
Charging more because something is worse to work on seems fair to me.
Intel PR department: Lets talk about Lunar Lake instead. What a great CPU that is.
Lol that's the Intel way
Also Intel: Muh AI
The mighty days of Intel supremacy have long been over. Now they are trying not to be mediocre.
25% is a ridiculous failure rate for the high volume OEMs. Wonder if intel is getting back the failed chips to do some failure analysis on them with test boards and thermal imaging or something.
So far they're taking the failed cpu's and sweeping them under a very big rug.
@@pf100andahalf maybe that rug will come up in 40 years on an asmr “intel rug cleaning” video 😂
Sounds like it's a dynamic clocking bug getting stuck into a strange mode, causing crashes.
My guess is it's simply that power/frequency limits are pushed beyond what the platform can reasonably support, which means this was an intentional choice intel made. It is a known workaround that underclocking and reducing power limits increases stability.... at first. But once it starts happening, it seems that the damage is done, and it's only going to keep getting worse.
@@rich1051414 I think you're right.
I clicked because Wendell was on the left
He's on the right for me
hes in the middle for me
He's upside down to me
He is on top of me
I don't see him.
The absence of AMD data is also data
kind of... a lot of statistics don't like zeroes e.g. MTBF
@@greebj the lack of failure data points to AMD chips not failing in the same ways, in numbers sufficient to record.
@@greebj but we don't have a zero. We have an insignificantly small number: iirc 4 out of ~1500 documented failures, or 0,25%, when 30% of the population use AMD, is a clear indicator that this is an intel-specific issue.
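To put rough numbers on that comparison, here is a minimal sketch. It takes the figures quoted above at face value (4 AMD entries out of roughly 1500 documented failures, about 30% AMD share, both recalled from memory by the commenter) and assumes, purely for illustration, that failures would be spread in proportion to install share if the issue were vendor-neutral. The variable names are made up for this example.

```python
# Figures as quoted in the comment above ("iirc"), not independently verified.
documented_failures = 1500   # approximate total documented failures
amd_failures = 4             # AMD entries among them
amd_share = 0.30             # quoted AMD share of the population

# Observed share of failures that are AMD systems.
observed_share = amd_failures / documented_failures

# Assumption: if the problem were vendor-neutral, failures would track install share.
expected_amd_failures = documented_failures * amd_share

# How far below the proportional expectation the observed count sits.
shortfall_factor = expected_amd_failures / amd_failures

print(f"observed AMD share of failures: {observed_share:.2%}")
print(f"expected AMD failures if proportional: {expected_amd_failures:.0f}")
print(f"observed count is ~{shortfall_factor:.0f}x lower than expected")
```

With these inputs the observed share works out to a little over a quarter of a percent, against a proportional expectation of several hundred failures, which is the gap the comment is pointing at.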
he already mentioned survivorship bias, look it up if you missed it for not being aware of it
My 13900k was so unstable after about 1.5 years that I couldn't copy data off an SSD without getting a BSOD
Did you try a live Linux USB stick? Just to rule out Windows is the culprit.
lmao that's an amazing product
It's cooked 💀💀
Almost 2 years with 13900KS , since day one.. power unlocked.. 300 plus watts.. no issue whatsoever.. your point is ?
@@Need4FPS "I had no problems, therefore the problems don't exist!"
This guys an idiot.
Right smack in the middle of the Raptor Lake 8+16 die is the ring interconnect fabric, with a ring agent interconnect per P-core and an agent per E-core block. Given you’re seeing a mix of IO related errors, including the out of VRAM errors (which is likely triggered by a PCIe operation failure), my gut says this is the system agent logic in a bad-state, specifically due to failing ring signaling at one or more ring agents or in the system agent itself, blocking and going through some form of error / bad-state recovery. My guess would be that this is due to electromigration degradation of the ring interconnect logic due to the very large Raptor Lake 900MHz top end ring clock bump by Intel, versus Alder Lake. This is why we’re seeing these issues in 13th and 14th gen, but not 12th gen.
What’s concerning is that if this is where the problem is occurring, then Intel appears to have not implemented internal telemetry, or not exposed it via CPU or other system driver, to drive WHEA reporting around ring interconnect and the system agent. Given the complex nature of the SOC, with its asymmetric architecture, many clock domains and sophisticated “system agent” IO block, that would be bad if true.
If the ring causes those issues, I wonder if raising ring voltage could solve those issues or not.
For example, I had 2 i9 13th gen, one is 13900K that I sold for the current i9-13900KS, both runs stable at 253W, 400A IccMax with BIOS default settings (I set my power limit lower than 253W initially due to having a 240mm AIO). I did turn off MCE in BIOS and set my custom load line to undervolt the chip. I used the previous chip for about a year with no noticeable degradation. I run both chips with SA voltage of 1.35V for my DDR4 ram OC.
Both chips have run mining software, Prime95 and other stress tests.
I suspect that for the degradation to happen, most people did not configure a power limit; they let their chip run with high voltage and high current, and that causes fast degradation. Also, their silicon quality is probably average or below average in the first place.
@@kevinzhu5591 it probably would further accelerate degradation of the ring bus and lead to even more crashes
my bet is that intel has a design flaw regarding ring bus which got exposed by their lack of regulation regarding power profiles which slowly cooked ring bus on stock speeds and fried it when people did do some overclocking
@@xthelord1668 that is tragic. I would hate to pay exorbitant prices for a faulty processor
@@kevinzhu5591 Wouldn't it make more sense to lower ring voltage and clocks a bit? It shouldn't make a huge difference in terms of performance to reduce it by 300mhz or whatever.
@@ahs9674 Doubt it would help. 12th Gen has had problems with the asymmetric uarch (minor and not often, but I've encountered them way too many times...). 13th & 14th not only suffer from that, but you have to keep in mind that they do bend, are made using the Intel 7 process that took ages to be "mature" enough to even be considered viable for manufacturing, and so much more.
There might be a reason (beyond being behind with their own foundries and R&D on advanced nodes) why Intel is hell bent on using TSMC instead of their own process for CPU tile.
So glad I got a Ryzen back in 2021. I couldn't get a 5900x, but my 5800x has been running like a beast, couldn't be happier.
Good lord, the analysis and access is just swell. More, please. (Just don't get Boeing'd for your trouble)
"Boeing'd" 😄
If WotC can send the Pinkertons then Intel should be *_looks over shoulder_* I gotta go, guys...
I love that that is a verb now.
We must not lose Wendell 😭🙏
Let's hope Boeing doesn't use K series CPU's in their aircraft.
Damn does it feel like I dodged a bullet by going with AMD. My 7700X-based system has been rock-solid stable.
real. I love my 7700x. Such a good chip. I can Craft so many Mines with it
I bought 2 7900X3D's for 100 GbE testing on Supermicro H13SAE-MF. I guess I dodged the Intel enshitification bullet. Meanwhile, my AliExpress craptastic and old Xeon rando recycled parts impulse buy laughs in ECC.
Ryzen is an awesome cpu. I have a 3600 ryzen 5, I have had it a few years. Absolutely rock solid
@@andrewdonohue1853 I have the 2600x, also rock solid
I remember a time when people would say to go with Intel, it's a much more stable platform, how times change eh lol.
Ironically the only build I ever had so much hassle with was the one time I decided to jump ship over to Intel back in 2015 with a 6700k skylake cpu due to the crap ipc the fx8350 had that I was running.
Always used amd other than that from the athlon days in the early 2000s through to a number of ryzen builds & I've literally never had any problems, all the amd builds have just worked, currently running a 7800x3d.
AMD: We need a fastest and better CPU
Intel: we need more anti-competitive strategies
Heys, I'm one of those who were "tremendously suffering" from the instability, to the point where regular programs such as Discord, Chrome, Davinci Resolve, and games were crashing left and right. I just RMA-d my i9-13900KF. Intel already knows that shit's going down because customer support didn't put up too much of a fight and went straight to the point - RMA. Unfortunate, but it is what it is. Purchased a secondary, i7-12700K (LGA-1700 slot was needed so that I don't need a new MOBO) and it feels like a breeze of fresh air not having to put up with the constant instability.
Couldn't you also return the mobo, and go AMD instead?
@@MrXelaim No.
1. I'm a contractor (work from home) and cannot afford to lose a single day of work for making such a switch. This was a carefully orchestrated RMA process where I timed my CPU replacement on the same day the courier would come for the RMA CPU.
2. The motherboard is not faulty. Buyers protection in my country only covers the first 3 days to return a product without a good reason and this is a year old system.
@@repatomonor21 That's terrible buyers protection
@@owlmostdead9492 Why though? You still have a time window to return a product if you changed your mind. Otherwise, wouldn't people ignore warranty and just return age old products left and right out of greed?
@@repatomonor21 We have a 14 day return period, 3 days is not enough to evaluate a lot of products
My god this is so useful. I am on my third 13900K in 18 months. First one ‘lasted’ a year (3 months of that was me getting more frustrated trying to diagnose wtf was happening as more and more and more games failed). The pain was that it came out as IO errors, memory errors, everything except the CPU. In the end I found a forum thread where someone mentioned it.
I got it swapped under warranty and the second one began to fail identically 5 months in. Third one is still quite new but if this goes as well I’ll be sharing all of this with the builders and telling them they can either refund the entire PC or put something else back in.
I have no overclocking. The games that crashed were often games that a 13900K wasn’t even remotely needed for - like WoW - and it was absolutely infuriating
goodluck with 4th rma.
Have you come across anything mentioning if this is happening to the laptop 14900hx cpu?
@@WaterspoutsOfTheDeep refund the laptop
@@matyasselmek3673 they are fine the laptops arent affected
@@matyasselmek3673 evidence so far shows laptops are not affected
When AMD released its 7000 series, I went with a 7900x. I had buyer's remorse when I saw the better value of Intel 13th gen. But with this situation now, my mind is totally free from that feeling.
you can upgrade to a 7800x3d or a 9000x3d chip when they come out, without having to worry about the CPU killing itself.
@@chubbysumo2230 Well, I am not upgrading unless my current CPU ceases to function 😅
@@klwqwelp my intel 13th gen is dying… time to go team red
AMD x series always scam for price/performance. If you want to buy AMD better waiting for non-x version or 3D version
@@sandiaswara1940 Yeah for my use case the 7900 would have been perfect actually
Intels decades of fuck you buy our shit has not changed
And they're likely about to get completely shafted by ARM with the mass migration of both datacenters and consumer computers to ARM chips (or other RISC designs) due to the huge drop in power consumption that ARM brings.
@@samwalker7567 Yep. I wouldn't buy any PC right now if you can hold off. Things are changing, and ARM seems to be poised to dethrone x86 in the coming years. Best to wait and see what the industry looks like once the dust settles.
@@darthmelbiusFanatics I Guess.
@@dschwartz783 Disagree, if the transition happens it will: a) take years to acquire significant share in the desktop market, and b) there will still be a large marketshare of x86 for years after that happens, so support will remain for quite some time after. Makes no sense at all to hold off buying a PC in anticipation of this platform.
@@prw56 legacy compatibility won't be a problem; the Apple M1 has proven that an appropriately designed ARM processor can competently emulate x86-64.
The crash rate is just absolutely unacceptable especially for OEM and datacenter.
Crash rates measured in weeks are not acceptable for consumer products. If it's measured in months you still have an iffy product. If you see game logs where the crash interval is close to hourly... the whole house is on freaking fire!
Standard SLA in the industry is 99.9999%, so you can have about 30 seconds of downtime per year.
With this high of a failure rate keeping that level of SLA is simply impossible.
Even for the cheap providers that have just 99.99% SLA keeping the downtime to 5 minutes per month might be hard.
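For anyone who wants to check those downtime budgets, a minimal sketch of the arithmetic follows. The function name is made up for this example, and a 365-day year and 30-day month are assumed.

```python
def allowed_downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Downtime budget in seconds for a given availability over a period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - availability_percent / 100)

# "Six nines" over a 365-day year (assumed period).
six_nines_year = allowed_downtime_seconds(99.9999, 365)

# "Four nines" over a 30-day month (assumed period), converted to minutes.
four_nines_month_min = allowed_downtime_seconds(99.99, 30) / 60

print(f"99.9999% per year:  {six_nines_year:.1f} s")
print(f"99.99% per month:   {four_nines_month_min:.2f} min")
```

That lands at roughly half a minute per year for six nines and a bit over four minutes per month for four nines, in line with the figures above.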
@@hubertnnn that's the marketing and contracts. In actual practice it's often lower than that for pretty much all hosting providers, even big ones like Google/Amazon/Microsoft or OVH.
Source: trust me bro. They refuse to acknowledge to us a bunch of small incidents and stuff during their own upgrades so they can keep claiming such high SLA wasn't breached
But to be fair, datacenter shouldn't even use consumer class CPU. 1 of the biggest difference between consumer class and enterprise class CPU is ECC RAM support. If you want 24/7 operation and stability is very critical, ECC RAM is a must have feature which only Xeons can provide .
And it's a lot worse than this video says, because when he says "25% of systems are having errors" he's not counting all the ones that are already dead or RMA'd, because they can't produce bug reports... If we assumed even half of the buggy chips go back to Intel at some point, the math would put it at at least 33% dead, which is absolutely disgusting.
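One way to make that survivorship argument concrete is sketched below. This is only one possible reading, with its assumptions spelled out: the 25% is taken as the share of still-deployed systems showing errors, returned chips are assumed to drop out of the sample entirely, and the function name is made up for this example.

```python
def true_defect_fraction(observed: float, returned: float) -> float:
    """
    Estimate the fraction of all chips that are faulty.

    observed: share of systems still in the sample that show errors
    returned: share of faulty chips already dead/RMA'd and so missing from the sample

    Derivation (f = true faulty fraction of the original population):
        observed = f * (1 - returned) / (1 - f * returned)
    Solving for f gives the expression below.
    """
    return observed / (1 - returned + observed * returned)

# The comment's scenario: 25% observed, half of the faulty chips already gone.
half_returned = true_defect_fraction(0.25, 0.5)

# For comparison: the scenario under which this model gives exactly one third.
third_returned = true_defect_fraction(0.25, 1 / 3)

print(f"half returned:  {half_returned:.1%}")
print(f"third returned: {third_returned:.1%}")
```

Under this particular model, half of the faulty chips being gone already would push the estimate above the "at least 33%" floor mentioned in the comment; a different reading of the 25% figure would give different numbers.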
I nearly cried watching this. Someone else that has spent countless hours going over crash logs!!! Someone else has felt my pain!!! I'm not alone!
Anyone considering a job in IT.... This is your possible future!!! and its a thankless job.
On the warranty repair side I have seen a major shift in the willingness to accept that an Intel CPU might be faulty. It went from a multi-hour process to get a processor RMA approved to a self-approved process. I just order the CPU replacement and I'm done. It's akin now to ordering a replacement memory module or SSD under the warranty process (this is at the OEM level, not with Intel).
Talking to other techs, we have seen a HUGE increase in CPU failures - almost always seen as general instability with a longer running process (which is a really hard thing to verify during a warranty service, so we do typically take the customer's word for it). I have tracked down the problem on a few of them to the memory controller contact pads - often you can even see visual discoloring of the pads from prolonged heat. The instability in those cases seems to come from the CPU getting hot, changing shape, and the signal integrity to the memory modules shifting... then BOOM, you're unstable... until it cools a little, then you're okay. Basically, I think the problem is the shape of the CPU.
I think this comment needs more attention, it's the only hypothesis that accounts for all the observations. I'm convinced this is the case too.
Is this also affecting the 14900hx laptop cpu? I just got a laptop with one and could probably return but got a really good deal and has a good mini led screen and would get so much use out of it for the whole almost a year till next model laptops are out with the 50 series rtx so really don't want to return.
@@WaterspoutsOfTheDeep
The Laptop Cpu is soldered, not socketed
back in the days the cpus were binned with a way looser margin of error, in some cases you could easily make the cpu run 25% faster with a bit more of cooling, but once they caught wind of overclockers pushing the cpus so hot, they started to push power into them themselves, to make these margins extremely tight and cpus that are edging being defective are still sold, most people don't stress test their new cpus so by the time they realize they are defective it's too late and they get away with it
One of the biggest take-aways from this video is the price difference on service agreements between a 7950X ($139) and a 14900k ($1,280) server - "3 years parts & labor, 24/5 - Next business day onsite repair - zone 1".
For the 7950X, that isn't covering the cost of a single hour onsite repair technician.
For the 14900k, that is slightly less than the cost of an Asus Pro WS W680-ACE ($330) + a new 14900k ($600) + 4 x 48 GB of G.Skill Ripjaws DDR5 5200 RAM (2x$190) (total of $1,310). You can basically do a complete hardware replacement, AT RETAIL COST, for the price of that service agreement.
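The sums above can be checked with a few lines. This is just the arithmetic from the comment, using the prices as quoted there; the dictionary keys are labels for this example only.

```python
# Retail prices as quoted in the comment (USD).
parts = {
    "Asus Pro WS W680-ACE": 330,
    "Core i9-14900K": 600,
    "4 x 48 GB DDR5-5200 (two kits at $190)": 2 * 190,
}

# Service agreement prices as quoted (3 years parts & labor, next business day).
agreement_14900k = 1280
agreement_7950x = 139

full_replacement = sum(parts.values())
difference = full_replacement - agreement_14900k
price_ratio = agreement_14900k / agreement_7950x

print(f"full retail replacement: ${full_replacement}")
print(f"replacement minus 14900K agreement: ${difference}")
print(f"14900K agreement costs {price_ratio:.1f}x the 7950X agreement")
```

So the 14900K agreement sits within a few tens of dollars of a full retail rebuild, and costs roughly nine times the 7950X agreement.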
If 50% of CPUs are failing one way or another but the other 50% running perfectly fine I would closely look at the production starting even from the silicon provider and its quality.
I just checked the latest batch, it sucks
The smaller you go, the more of a chip lottery it is. That's why I don't see the point in overclocking. "Oh, I can get 6 GHz on water." Yeah, but how long does the CPU last? It's the equivalent of a top fuel dragster being pinned to its max breaking point. Intel has pushed the silicon far enough that it's getting bad yields: some chips are good, a lot of chips are degrading faster due to the whole chip getting intense heat. That's the problem. 1nm chips are gonna have problems; we need to move to a better base substrate, graphene or carbon substrates.
14:58 - HOLY BALLS!!!! That’s crazy!!! When the data center operators notice issues and you know they collect data…something is up for sure!!!
I'm surprised they didn't leak this earlier.
This is going to drive companies to AMD much faster than before.
@@aminorityofone About 30 seconds later, we're pushing our customers towards the 7950X platform. Which is exactly what I would be doing in their position.
I got my 13900KF 4090 build last year thinking it would last 8 years. Not knowing it'd be having such issues within a year...😂😅🥹🥲😥😢😭
Yeah -- this is somewhat surprising from intel. This is amateurishly like the late 2000s AMD graphic card performance. I bet its heat problems from bad caps or arrays or interdigitated caps. If its design then, that's really really going to be surprising. I've fixed my AMD processor from swapping out dry caps before.
I can't recall an issue like this since Sun accidentally shipped radioactive chips that were unstable.
I mean, Intel had to do a massive recall back in the day (1990’s?) when their CPU’s literally couldn’t do simple math instructions consistently. They didn’t exactly fail, they just never worked in the first place because of a critical design flaw.
I'd like to know more. Do you have a link ?
@@nessotrin If you Google it you might find some information. This was back in the early 90's. The company I worked for had the issue with our SPARC system. SUN made customers sign an NDA if I recall. I don't recall how widespread it was.
I did a search and only found limited mention of it from some old forum posts...
they WHAT
@@nessotrin I think its this, en.wikipedia.org/wiki/Pentium_FDIV_bug , Its floating point math would mess up a small amount. But that error could compound and cause problems depending the calculations. I recall having a professor say he had to redo part of his thesis due to this problem.
I am surprised people even buy intel.
Even without this issue - you buy intel and 6 month later 50% of performance are gone to fix all the security issues.
I’ve had so many bizarre issues lately and I’m now wondering if they’ll disappear if I swap back in my 12900k instead of my 13900ks 🤔
A couple of them highly repeatable but one that others can’t reproduce
Good to see you here!
Not reported Internal high temperatures can lead to undefined behaviour. High Current leakage can be interpreted as a wrong bit.
@@ThioJoe Just try it, for science!
It won't be that much of a performance drop, and you'd hopefully get a more stable system out of it until this is resolved ;)
Post your findings on the L1 Forum if you actually do it :D
Try it and make a video about it!
You make troll videos. Maybe you should stop spreading misinformation and lies about Intel because my i9-14900KS has NEVER crashed once.
This is why you make a statement. The speculation machines rolling now.
*you’re* a statement!!
I suspect Intel themselves probably don't know what's wrong.
@@zodwraith5745 Or they do know, but it is a hardware problem that they can't patch with a microcode update.. So they just try to play the long game until their next gen comes out.
@@kyu9649 I'm not saying that's not a possibility, but to automatically assume as much would be highly unfair. Intel is WAY too big a company to cover up the kind of conspiracy theory you're insinuating without a leak. They haven't done anything to try and cover it up and have acknowledged it publicly. They just haven't provided a solution which just circles back around to my previous comment. They have a hard enough time keeping a lid on leaks they DO want kept secret like future products. This would be way too juicy for a leaker with all the coverage it's had.
@@zodwraith5745 they know, they're trying extremely hard to patch it via microcode so they can come out with a fix at the same time they acknowledge the failure.
Welcome to the age of publicly traded companies, their shares will tank for sure, they're just trying to control how much they crash.
Given the fact that 13900K has been on the market since October 2022 and this issue hasn't been fixed, it must be a design flaw. It's shocking that i9 still die at W680 motherboards.
This isnt armchair science my man. This is absolutely real research. You’ve negotiated access to data not really available to the public and doing real research on the data instead of making educated guesses as to whats going on which is what armchair science is
Could you get the AC/DC LL settings of an unstable Supermicro board?
"The system just gets miserably slow for up to a minute before an actual crash"
...uh oh. I don't get crashes in games, but I *do* get this kind of behavior from time to time just at my Windows desktop on the 13900K if I leave it running for a long period of time (like, more than 1 day). I'd start getting weird things like programs hanging in the background, Explorer stops responding to inputs, even CTRL+ALT+DEL doesn't work. It never actually has a hard crash, but it never recovers either. Only fix is a full restart, or hard reset at the power button. Guess I can expect this to get worse?...
The 14900K chips are degrading insanely fast. My 14900K on day one could run at stock frequency with a -0.075 voltage offset on the last 2 steps of the V/F curve, without any WHEA errors either at full load or idle. Every 2 months or so, I would randomly find a couple of WHEA errors which were instantly fixed by increasing the previously mentioned voltage offset by +0.05v. This week WHEA errors began showing up again. Now the CPU needs to run at stock voltage as it is unable to handle any negative voltage offset whatsoever.
What kind of cooler do you have and what temps would the chip run at full load before it started to worsen?
Source: trust me bro
@@douglasmurphy3266 I'm using an IceMan Direct Die waterblock with an external MORA3 420 radiator. My 14900K max core temp at full load is 70C.
Temps are not an issue.
@@genejones7902 There are specific WHEA error codes that are related to CPU errors. Whenever I get them every couple of months, increasing the voltage slightly is the only way to make them go away. So yeah, its a CPU issue.
@@CyberneticArgumentCreator Username checks out.
Wow, it is great to hear people talking about this. We deployed some i9 machines at work, and after awhile 2 of them started crashing for no reason. The, usually really good, Dell diagnostic software could never catch it. Reformatted, replaced RAM, SSD, etc, no luck.
I saw deep on a Reddit thread to disable turbo boost and it resolved the issue.
I gave this information to Dell and they ended up replacing the CPU on these machines.
I've been help-desking and admin-ing for nearly 10 years in various sectors, and saw fewer than 5 CPUs die in that time, then I had 2 back to back. Was mind blowing
Solution: AMD
Buyer’s regret in two words
I am so glad I never upgraded my 12900K, it still works like a beast and I play games around 100FPS plus minus 20, mostly so I don't really need ultra fast CPU. My Mobo also has a 5th gen PCIe slot, so probably not gonna upgrade for a few more years, but when I do I will switch my 12900k and Mobo to AMD setup.
I gotta say, the intro was a solid hook. You had my interest and now you have my attention.
Crazy that this was happening for so long even the oodle devs chimed in.
Honestly not surprising at all. RAD game tools are some of the most mature middleware in games (probably as close to bug free as you can realistically get without formally verifying your software). If their software is suddenly failing, they're going to want to know why.
And Oodle is deployed basically everywhere and very stable outside of this situation, otherwise all our games would be crashing constantly no matter the hardware.
So this explains why AMD came out with the server board for the AM5 socket a couple of weeks ago.😊
Hmm the data center support information - even tho it’s indirect is very interesting.
I work in commercial IT, we have had 30+ PCs where the intel CPU essentially died, this has been affecting them all from 10th gen to current. Recent one was a month old 12th gen 1255U. Affects desktop and laptop CPUs. Larger companies that deal with more PCs have had a lot more of the same.
How many amd cpus do you have?
I work in laptop repair and our main customer is a major IT firm that probably everybody knows (used to be very innovative in hardware).
And we've got Lenovo T14 Gen 3 (12000-series Intel CPU) and T14 Gen 4 (13000-series Intel CPU) dying like flies right now.
Sometimes the replacement system board dies within 2-3 months also.
Could be the undersized cooling, could be the horrid Lenovo Thermal "Paste" (only at first*) or something else.
*We replace the shitty Lenovo stuff with something a lot better as part of our service, especially when Systemboards/Mainboards have to be exchanged.
Edit: typos, damn phone!
Where I work we use exclusively Intel CPUs.
We're so fucked LMAO
@@alexanderzawydiwski9534 Its normal for some sata slots to not work, as well as ram not being detected in certain configurations. There is manual included with motherboard for a reason, some configs working well is a miracle (like 3 mismatched RAM sticks, sometimes it works, sometimes it doesnt, game of luck with ANY processor regardless of age, there is also good reason why mobo makers have a list of supported memory modules and configs for each one).
"we have had 30+ PCs where the intel CPU essentially died" - Are they Dell? We've gone through so many Latitudes this year it isn't funny anymore.
I am also troubled by the lack of clear messaging and solutions from Intel. We do need more transparency to better understand and manage these hitches.
I made a video about this a year back, along with Tech Yes City. A lot of people called me crazy. Stability issues, crashing, latency... List goes on.
I appreciate you talking about this.
He has also made another video on it recently
@@IntelArcTesting This is true. Thanks man.
**chrome crashes continuously while watching this video with my never overclocked 13900k**
You never overclocked it, but your motherboard did. And that's the problem.
@@someperson1829 no, the problem is Intel and goes well beyond overclocking when it's seen on boards that actively avoid overclocking. What we were seeing is those systems failed sooner due to it, but failures are starting to be seen across the board on systems that were well within safe limits.
@@backupplan6058 Wow. There are LGA 1700 boards that don't actively shove overclocking. How interesting.
@@someperson1829 standard specs for these chips seem to be already quite far off the good envelope and inside a territory of power consumption, stability and longevity that would normally not be accepted from a chip at factory specs.
@@someperson1829 As the video mentions this problem also occurs with server motherboards and no overclocking.
I bought my 13900k in October 2022. Many 13900k and 14900k owners didn't experience any problems until this year. I find it very odd that so many of us experienced issues at pretty much the same time. Most would say the power profile caused the wear and tear, but practically all at the same time? Different motherboards, different profiles that are aggressive and non aggressive? Different levels of use? Many working PC gamers aren't on their computers for long periods. Really strange.
Looking forward to the Gamers Nexus exposé 🤣
3 months later they pull out a 5,000 page report complete with charts and a 2 hour video
And then for all the fanboys and bad faith actors to call it "drama" and dismiss everything they found.
I like that they want to keep places honest. I just want a standard. Don't favor giving one company a chance to explain and make steps to fixing things then not go back to other companies that did the same. If said companies didn't change let us know too.
Intel execs be like: hey who is that long haired jesus looking guy down there filming our hq?
@@niyablake I'd watch it
I'm glad they got the boiler snake a sunning lamp.
This is how you do investigative reporting!
The saddest part about this is that I have been telling users it's their CPU ever since it started popping up en masse with UE5 game releases, so over a year now.
Hundreds of cases across a multitude of games, even outside UE5.
And even if I link them the articles now that mainstream media picked up the story they will not even bother reading it and simply reply "but it's only in THIS game".
they're NPCs, there's no saving them !
Yeah I get similar shrugs
It's only in THIS game ... for now.
Today we learn a bright side of telemetry.
This is slightly terrifying. A problem that impacts perhaps 0.1% of enthusiast customers is a poor experience, but I can appreciate how difficult it would be for Intel to root-cause the issue. If it's 10%, that speaks to a product design issue which *should* have been caught much earlier. Intel's own internal testing should have caught this and resulted in mitigation before customers noticed, and preferably before the chips shipped.
Thing is, if its getting worse over time, this smells like hardware defect in fabrication or material failure, not design flaw. I would not be surprised if we find out this down the line.
@@ponocni1 But then you should see the same issue cropping up with i7s and i5s, which we don't. What's so different with the i9s? If it's that 8 e core cluster then why doesn't disabling e cores help?
@@zodwraith5745 i9s are binned for higher max boost clocks and more power, so maybe whatever criteria that requires also makes them more vulnerable to this problem if it's an issue with them having pushed their manufacturing process to the breaking point.
@@zodwraith5745 It is happening with the 14700k. A number of people have reported it and I've seen it with mine. Bios updates helped some and I have run it with Asus MCE disabled the whole time. I've got a custom loop so it's not a cooler issue. (420mm + 120mm + 280mm rads too)
@@LucasHolt But what is the problem specifically? I've heard it seems to occur mostly with specific games. I haven't had a hiccup from my 14700k but I'm on MSI. Keep in mind most of these issues were on ASUS boards, a few with Gigabyte, and I've never heard of an MSI or Asrock occurrence. Although I think the narrative changed fast because Intel is such a juicier target.
But what Wendell is talking about in this video isn't Z790, he's talking about W680. *_ASUS_* W680. A SERVER platform that uses consumer CPUs. That means 24/7 full tilt all cores running. That's why the error rates are so much higher. I haven't had an issue but I wouldn't be shocked to see my system throw up an error every few hours under _that_ kind of abuse.
Based off my experience overclocking these things......I'm pretty damn confident in saying the problem lies in either the ring bus or the system agent is just at its damn limits. I would more than be willing to bet on the VAST majority of the true raptor lake dies that are failing are all going to be larger 16e core dies. It never sat right with me how wildly the power consumption on these chips was affected by the ringbus and how minor bumps in speed took absolutely shocking amounts of power and tons of system agent voltage fiddling. In retrospect, this explanation falls in line with how cagey intel has been about the SA voltage in the non-z chipsets. Basically we're seeing the limits of one of these two things and that it's so close to the limit that natural degradation of the silicon will cause faults.
The craziest example I can remember is an alder lake cpu that would do 5ghz ring.....at about 400watts. Drop it down to 4.8?...300 watts. Drop it to 4.6...250...and 4.4 only took 220-230ish. E cores were all disabled and the p core settings were the same on all tests...the only thing that changed was the ringbus/cache.
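A rough back-of-envelope on the figures quoted in the comment above, as a minimal Python sketch. It only works out how many extra watts each 100 MHz of ring clock cost on that one chip. The 225 W value is an assumed midpoint for the "220-230ish" figure, and the function name is made up for illustration; nothing here is measured data beyond what the comment states.

```python
# Ring frequency (GHz) -> approximate power (W), as quoted in the comment above.
# 225 W is an assumed midpoint for the "220-230ish" figure.
samples = [(4.4, 225.0), (4.6, 250.0), (4.8, 300.0), (5.0, 400.0)]


def marginal_watts_per_100mhz(points):
    """Return (from_ghz, to_ghz, watts_per_100_mhz) for each adjacent pair."""
    steps = []
    for (f_lo, p_lo), (f_hi, p_hi) in zip(points, points[1:]):
        # Round to whole MHz to avoid floating-point noise in the step size.
        delta_mhz = round((f_hi - f_lo) * 1000)
        steps.append((f_lo, f_hi, (p_hi - p_lo) / (delta_mhz / 100)))
    return steps


for f_lo, f_hi, cost in marginal_watts_per_100mhz(samples):
    print(f"{f_lo:.1f} -> {f_hi:.1f} GHz: about {cost:.1f} W per 100 MHz")
```

On these numbers the cost per 100 MHz doubles with each 200 MHz step, which is the "minor bumps in speed took absolutely shocking amounts of power" pattern described above.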
Seconded. I highly suspect uncore interconnect degradation being a proximate root cause. Linpack won't catch it failing until things catastrophically fail later down the line. Harder to validate. "Just shove more voltage into it and it'll be fine. How long do you expect users to run these chips? 15 years? They'll be obsolete in 15 years." Try less than 1 year past intel execs.
I run my 13900k with turbo disabled so basically sitting pretty at 0.8v vcore about 95% of the time. Bought it second hand for $300 about 5 months after launch due to "issues hitting ddr5-8000" pre-delidded with a custom copper ihs. Person I bought it from got a second chip direct die cooled it and from what I can tell that new chip worked just fine. Likely doesn't anymore. Lol. Probably swapped it out for a 14900ks.
My chip still shows signs of degradation. SA was at 1.3V and vdd at 1.35V for 96gb dual rank ddr5-6800 for over a year. Now experiencing random latency spikes in the past 6 weeks. Even reinstalled windows as it's been over year but no change so it's not software related. Memory test passes at 32 hours as I don't have time to test for longer.
Here's to hoping whatever releases this year is better than the hot garbage intel released or I'm switching back to AMD. Intel cpus failing were unheard of before really 11th gen. Started going downhill quick. 11th gen also had SA/IMC issues compared to 10th gen.
It's telling intel is being hush hush about this as it's a huge blow to their QC reputation.
Could it be that the AVX offset is not triggered properly in light tasks such as gaming?
I mean we would never see this in all these stability tests then, because the load is heavy enough in these benchmarks for the AVX offset to kick in
@@quanlethienminh6002 It's not likely to be an avx offset issue. The way to understand what i've pointed out here (to the best i can oversimplify it)..is the ring bus is like a road/traffic network between all the cpu parts while the system agent is the traffic cop/monitor/system. The pcie controller and memory controller all sit on the ring bus. In the scenario where the SSD corrupts....the data can't make it from the cpu core down to the pcie controller. Add this explanation to what wendell is saying and it should make things much clearer as to wtf exactly is going on. Disabling/slowing down stuff doesn't change the fact underlying infrastructure is just pushed to its limits the way it is.
Iirc, the 1700 socketed Xeons... have avx512 by virtue of having no e cores......if the xeons are ecoreless dies and no failures exist..we might have something of a smoking gun. I haven't touched a 1700 xeon...and haven't seen a delidded die shot
I bought a 12900k 2 months before the 13900k. I've been having the same issues with the 12900k since about 5 months after using it. I believe the TDP on these motherboards have degraded the chips a bunch. I also have beefy water cooling that keeps the temps around 80 at full load. - The bios probably called more power because of the cooling efficiency on mine and burnt the chip up.
It finally gave out yesterday. Any load for about 2minutes - Blue Screen Crash.
I always learn so much whenever Wendell just wants to do something like play Dwarf Fortress.
Thanks for this breakdown of your hard work. It reinforces my decision to change from Intel to AMD for my most recent build. I migrated from a 10900k to a 7800x3d, and so far I am very satisfied.
My first two computers used Intel. And once I saw AMD really supporting their sockets for longer, I decided to make the jump and couldn't be happier. More powerful for what I need and less power hungry than Intel.
@@BeeWhere Yes, less power = less heat = less noise too.
Your intro just hits right, and I don't know exactly what it is, I just really like it.
This is unfortunately not uncommon when large companies deal with a big design/quality issue. When you have a reputation for quality... the last thing you want to do is openly admit you made bad things (and this also can open you up for lawsuits- like these server support companies looking to recoup all the extra service costs, or the company that lost 100k in their player base due to instability). Say vague things to sound like you care while claiming some type of user error. See how Toyota handled the 3.0 V6 oil sludge issues (but that was more a Toyota cultural thing), and how Apple handles EvErYtHiNg. Intel has been playing catchup to AMD Ryzen for years now.. after making fun of the chiplets being "glued together". All they really have left is their reliable reputation... so they are intentionally keeping tight lipped and hoping the media coverage of the issue just "goes away" after they release their next CPU generation - one that actually will be new and not just the old ones with more voltage.
That's exactly why this Intel instability issue should be headline on all big PC news websites
@@damara2268 I bet Intel is really glad CrowdStrike pushed out a global Windows crash update file. That gives the media something way meatier to talk about versus Intel making bad CPUs that degrade quickly even when not OC'd.
This is an incredibly wonderful video/investigation. I'm living this myself right now. Actually went through the RMA process with Intel, and they specifically said they don't have ANY replacement processors at this point - all out of stock. They are actually giving me a cash refund instead. There is definitely some major drama going on behind the scenes over there and I really am appreciative of this video starting to uncover it. Hopefully Intel will be forced to disclose some info soon.
Wow really they have no replacements! I think we all know that Intel have figured out what's wrong and are keeping lips very tight.
Here from the GN video. Thanks for the information.
Thank you. It's always interesting to hear about modern trends in data centers. I don't work in gaming, so we don't use high end consumer CPU's, but this is worrisome because it shows that the latest processors from intel are truly the bleeding edge.
Maybe you should do something similar to what Steve/GN do sometimes and set up some form of communication so devs/data center admins can share data to hopefully get closer to a real answer? Heck, maybe even get physical examinations of problem CPUs if possible
@CMDR_bravoMike That would be great, but I don't want to bankrupt GN. They always spend "yes" money on their investigative journalism.
THE KEY ISSUE IS "INTEL INSIDE" ON THE WARNING LABEL ON THE FRONT OF THE COMPUTER CASE
I'm seeing a lot of people confused. It's not just i9 chips! The 14700k is also a POS. I've seen others with issues as well. People talk about the 14900k more but it's still a problem for the rest of us with 14700k chips too!
I have not tried to overclock mine. it's running with DDR5 5600 48GB corsair modules. Custom loop with 420mm + 120mm + 280mm radiator. 6900XT GPU.
It crashes in two different operating systems! I dual boot Windows and MidnightBSD. In the latter, LLVM clang will start crashing during long compiles (10+ min). On windows, I mostly game and that is also unstable. I made all the recommended changes that intel proposed. I turned off asus mce on the first day of ownership last november.
Worst CPU I've ever owned. It's slower than the 3950x i had before for compiling and with all the nerfs to make it stable, it's starting to lose the gaming advantage. Huge waste of money.
@@ChrisL-d4c 25% failure rate means 75% with no issue. There's no guarantee you're going to keep being this lucky in the future though. For many the problem starts small and progresses.
Hear anything about this also affecting the 14900hx laptop cpu? Or they safe?
@@WaterspoutsOfTheDeep I've not seen anything about laptop CPUs either way.
you don't have to "try" overclocking it, turbo boost is usually turned on by default so it automatically overclocks for you. If you want to avoid overclocking you'd have to turn off turbo boost.
Intel has been counting on their reputation and brand name to carry them through since the 11th gen.
They have been doing that since 9th gen........
You mean 7/8th Gen?
If not longer
Love that KDE Plasma wallpaper on the screen in the background :) And I am so glad that i went with Ryzen this time. I missed the AM4 era but I sure hope AM5 lasts as long as AM4 did.
Great video, Wendell. I had to swap a 14900K in my desktop that had these problems. Two weeks later the news about the issues started popping up. So far so good with the replacement running at Intel "Extreme" spec.
@@aaronmoore3050 Why it makes no sense? You know that even with the same name the silicon is not a copy and paste?
Look at how well that radiator is lit and presented. You can tell Wendell is a radiator connoisseur.
I believe that's the radiator cooling his 14900k.
I love how people ignore the goings on with the 12900K & KF's which are suffering the same issues, albeit not as bad (lag/stalling but not hard crashing)
I have had a ssd drive corrupt on a 13th gen intel 13600k . Was related to IO errors. Can confirm! Was not OC'd and on latest bios.
I chose to get a drive with on board cache instead of host platform memory specifically so I don't have to be nearly as concerned about drive corrupting internal mapping information just because i got a little spicy with cpu overclock.
I have a 13600K with multiple NVMe SSD drives in Linux and haven't had any issues at all in the 18 months I've had my system. I don't overclock and my DDR4 RAM sticks don't like XMP so run at stock speeds, so it's quite a vanilla setup (ASUS motherboard with defaults for all the CPU settings).
I hope you had a backup, that is scary.
Most of the comments are saying that there may be design issues, but it got me thinking that maybe it's a test, qualification, and reliability process issue. Those 3 are internal groups in all semiconductor companies that make sure parts are good when they go to the customer.
Now it got me thinking: did Intel nerf those groups to resolve the yield issue they were having several years ago?
After watching the video and thinking about your findings, the thing that stands out to me is your observation of the tick rate on the game servers falling by 50% before the crashing. This tells me the CPU pipelines are being invalidated, which is what's impacting performance so badly. Whatever is happening to cause these pipeline invalidations is the real culprit, and I have a suspicion about what it is.
I believe the CPUs have new microcode to detect attempted side-channel attacks, and when that is triggered the CPU intentionally invalidates the pipelines on all cores and additionally flushes the L1 cache. This would immediately flush any security keys from the CPU causing the side-channel attack to whiff on stealing something valuable. Next, the L1 cache needs to be reconstituted, including reloading keys. With NVMe drives, you can realistically expect 12+ CPU cores requesting the same key file simultaneously would work fine, but I suspect the IO errors being encountered are OS-level locks causing the cache reconstitution to fail, thus crashing the process, and possibly even the kernel.
Another cause could be actual side-channel attacks, and the CPU trying to protect itself. If the defense strategy chosen by Intel is to crash the CPU rather than allow compromising data then they would not want to publicize that because now you have an instant mechanism to launch a DoS attack against a service or service provider.
For end users, I think that certain users are affected more frequently because their system has been compromised and is under frequent side-channel attacks. So this begs the question, which is worse? Do we have less active side-channel protection, probably increasing risk of keys being stolen, or do we have active protections that ultimately choose to unalive the CPU rather than allow keys to be leaked?
I would see if you can dig up any data on the occurrence of side-channel attacks on these CPUs compared to the older 12900k systems.
Final thought here, the reason that slowing down the memory improves stability is that at the lower clock speeds, the mechanism in the CPU that monitors for side-channel attacks is able to better analyze the flow of data coming into the CPU and makes less frequent false positives that trigger the failsafe mechanism.
Intel has a pretty big problem and that is CEO Pat Gelsinger!
I’m really curious if this will ultimately lead to a recall or a class action lawsuit.
Also really curious if AMD sales reps are bringing the instability issues up when they’re talking with data center clients. The fact that this is happening with workstation boards that aren’t overclocked and are much more conservative with power management makes me think this is a flaw in the fundamental design of the chip. But if so, one would think similar problems would be appearing in Xeon chips too since they share core designs, right? Very puzzling…
Epyc is AMD, you probably meant to write Xeon.
Mate, after the way Intel behaved in the early noughties against AMD, I sure hope AMD are bringing this up. 1. Prospects need to know that after months, Intel hasn't root caused the issue or as Wendell said, clearly admitted that affected clients will be made whole. 2. It's glorious when Karma scores against arseholes like Intel.
As an aside, I think the game companies who collect telemetry have an opportunity to leverage that in order to help get to the root cause. If it's not sufficient (as Wendell suggests), this could help the Data Scientists to refine the collection to improve this.
Xeons are nowhere near pushed to the single thread clocks of the high Ks.
@@petemonster1 I miss the noughties.
xeons have a whole different design philosophy behind them, and bring intel the bulk of its money. they wouldn't sabotage their income for a headline.
remember, intel laughed and derided amd's chiplets. then amd slaughtered them, intel panicked and tried to copy them. pretty sure intel thought they could get away with blaming customers(you're not a user, you bought something) just like nvidia did.
I had been an Intel user all my life up to my last one the Core i7 5930k It was perfect until it gave up on me after 7 good years of constant OC.
I took the plunge and got AMD Ryzen 7800X3D for my new gaming rig instead of 14900k and now... I realized that I just narrowly avoided the path of suffering. I hope Intel gets back in the game soon.
AMD CPU is so good now or even better in many aspects. My dad (R.I.P) who was a die hard fan of Intel would never have believed AMD would be able to pull this off.
I've had an innumerable number of issues with my 12700k (Same chip) across 3 different motherboards. I've been waiting for the 14700k to drop in price because of the insane amount of game crashes and bluescreens I have seen.
Thank you for the fantastic deep dive into this. Seriously great work.
Try drop the XMP and run stock.
I had same issue until I just used default ram speed and been stable for nearly 2 years.
The cpu's are garbage for getting higher speed ram working and usually means more voltage which is another reason all these issues are happening.
I wouldn't bother going for 14700k, May be even worse looking at all the problems with degradation.
14700k crash also.
Weird, my 12700k has been rock solid ever since I bought it when it was released and currently running 6400 MHz ram. I'm AI engineer and there were times I left computer processing for weeks and never had any issues with it.
Your Intel chip sucks so you're waiting to buy another Intel chip that's also having long term issues 🤔 Interesting.
@@Navhkrin that would be impossible with a 14700k. Don't upgrade.
I can't get mine to be stable for 30 minutes doing make -j28 compiles lol
My 13900K (running at stock) has been rock solid for about a year, hopefully I don't have a ticking time bomb 😬
I hope high temperatures and very low size processing technology don't degrade the chip faster than you should expect.
Is this @Intel PR speaking through @JarrodsTech account?
🤑💸💰
mine is also fine, however I noticed a problem in a single core if I overclock. solution was to just overclock the other cores. but mine is direct die so the temp is like 40c under load.
i have a 13900kf at stock settings with intel power limits rock solid since january 2023
My 13700k / 4090 build has been pretty much flawless for almost 2 years now. Even started out on a ddr4 board for about 5 months until msi released the z790 tomahawk ddr5 board. But 13700k oc to 5.6ghz on all 8 p cores, All 8 E core oc to 4.4ghz, Ring oc to 5.0ghz at 1.34v. Even the mem controller has been very solid. 32GB Corsair rgb vengeance 6400 cl32 hynix A-Die oc to 7200 34-41-41-83 at 1.45v. On a msi z790 tomahawk. I can't complain. I've also been using a contact frame since day one for both boards I've had this 13700k in. I've heard a lot of the problems could be related to the bend they get with the stock mounting bracket. But until intel releases a statement / fix we really don't know what the issue is. I've used mine hard for over 8 hours a day every single day for work and gaming for almost 2 years now and it runs perfectly fine. I haven't babied it and I won't. I bought a k series for a reason like all the k series intel cpu's I've had.
If the CPU performance halves before crashing maybe it is the IMC and memory transfers stall causing bandwidth starvation.
This would explain why most pure cpu metrics continue to look fine.
I'm curious if the issue reported is exclusive to ddr5.
Megekko (a big Dutch retailer) has an informal policy to push ddr5 dual dimm over quad dimm for stability reasons but they didn't specify this would be for Intel only.
From my extensive overclocking on my 13600k I've had this issue as well. What I've found from my testing is that it seems related to heat/power spikes. What I've noticed is that when at the edge of the thermal envelope +80c, that there could be times where it spikes over 100c but not record it because of how fast it can spike before the temp is reported. But ultimately, the instability comes down to voltage fluctuations and temp fluctuations. When you lock the voltage and keep the temps below 80c on max prime95 loads you'll have a pretty stable system.
Then maybe that’s the way to go for now on those chips? (At the cost of some performance)
@@EJM07 I am currently in my "summer clocks" which is standard boost clocks on both p and e cores, but massively undervolted. By default my chip wants to run at 1.37v to achieve stock boost clocks. However, I can achieve this with 1.16v on load and 1.08v on idle. Yes, I have my load line calibration set where it adds slightly more voltage when under heavy loads so it doesn't have vdroop. But with these settings I always have a temp 6-7c above ambient on idle and temps not exceeding 75c on full load. This is even with ambient temps of up to 38c. Yes, ambient temps of 38c, that is not a typo, and yes it sucks badly when it's that hot in here. Especially with 70+% humidity.
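For a sense of scale on that undervolt, here is a minimal Python sketch using the textbook approximation that dynamic (switching) power scales with voltage squared at a fixed clock. It ignores leakage, which also rises with voltage and temperature, and it takes the 1.37v and 1.16v figures from the comment above at face value, so treat the result as a rough estimate and not a measurement. The function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Approximate ratio of dynamic power at the same frequency (P ~ C * V^2 * f)."""
    return (v_new / v_old) ** 2


# Voltages quoted in the comment above: 1.37 V default vs 1.16 V under load.
ratio = dynamic_power_ratio(1.16, 1.37)
print(f"Dynamic power at 1.16 V is roughly {ratio:.0%} of the 1.37 V figure")
```

Under that assumption the undervolt cuts switching power by a bit over a quarter at the same clocks, which fits the lower load temperatures reported.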
@@chieftron i feel ya. high temp + high humidity is my pain also, it just hits differently. intel's solution to faster cpus seems to be just more voltage on the cores lately. i've seen some crazy voltages in boosting cores on some 12th gen. always thought it was a misreport from the tool, but now? not so sure anymore.
Needed a new virtualization host for @ home, waited until the company i work for rolled out massive amount of new Intel machines. Listened to what was being told on the work floor about the new hardware over some months. Did buy AMD 7000 series because heard nothing but problems on Intel at work. Even when it did not crash then everyone was complaining about the big little architecture, which yes you can disable or do core affinity. But be serious who wants to do stupid settings like that when the competition just has something that does not need that kind of tinkering at all. On AMD i was very careful as well to buy the right combination of hardware. Special for MB and memory combination had to be a verified set. In the end i did buy a B650M AORUS ELITE AX ICE and CMK192GX5M4B5200C38 and for storage i plunked the board full of these: Lexar NM790 4TB. I can say i am very happy with my home server.
AMD watching this: "Kermit drinking tea" meme.
Is there some kind of tech "law" to this?
Every time chip Company A has hardware outperforming chip Company B....
A becomes more power efficient as they stagnate towards barely pushing their silicon.
B becomes a power hog with frightening stability problems as they aggressively push their silicon for all the performance they can get.
Nvidia became B during the Fermi 'n 5000 series days.
AMD was B during the Bulldozer days.
And now Intel is B.
You do know AMD has needed 3 bios releases with every release of their next gen CPU. 7800x3d still has 10 minute boot times, that the only answer for is to not shut the PC down.
That is the way things go. Things will change up again in a few years.
Shill @@kramnull8962
my bulldozer cpus, had 2 of them, never crashed on me. never heard of them crashing either. people complained they weren't fast enough or whatever, never heard a peep about crashes.
@@jmwintenn Yeah, these old things are very solid. Not very fast, hungry for watts, and not even valued that much used, but these will outlive a lot of these newer processors.
Genuinely one of the most interesting videos I've seen in a while. Really love your clear, specific and entertaining communication style. I love how you're very clear about what you do and don't know.
Very well structured video, and the two-camera work was well done (uncharacteristically :) ).
At this stage in the game if I owned a 13th or 14th gen Intel cpu that showed any signs of crashing or performance degradation I’d be sending it back to Intel on the first thing smoking whether it was under warranty or not. Intel needs to admit there is a problem and recall the vast majority of these chips (after correcting the problem, which I’m sure they probably have with newer built chips).
They are basically going to pull a Nintendo and never admit there is a problem (think joy con drift). As time marches on I feel for people whose chips are either going to fail out of warranty, or won’t perform as well as they did new because of degradation or bios updates that will gimp performance.
Joycon drift is intentional by design to force people into buying new joycons tho...
I'm kinda glad Micro Center talked me down to a 14700K. It's been stable.
they talked me into the 14700F and I am glad they did. I did notice crashes while using nvidia 4060 and a 4070, so I slapped a 7900 GRE card in my system, and it just works now. Be careful when using nvidia too. Everyone is quick to blame intel, when there are other things at play here. Yes I know the data centers are facing similar issues, but even then they use nvidia gpus.
@@cgwworldministries83 they don't use the same GPUs, the data center ones have a better phase margin and just overall higher reliability construction traits.
@@SianaGearz they are still nvidia gpus which offload the scheduler to the CPU.
@@cgwworldministries83 I have a 13700F. I was getting temp spikes, so I limited my throttle to 90C.
My 14700k has been unstable. It's a huge PITA. I'll screw with it and get it working for a month or two and then it starts again.
I'm so glad I went with AMD when I decided to upgrade from my 8700K Christmas 2023. Went with a 7800X3D and haven't had a single issue, love it.
I wouldn't, but then again I use my computer for things other than gaming.
As an overclocker, I've noticed the following in my personal system. It's a Z690 Asus M-ITX with a 13700k.
1. Tuning the PLL Core, PLL SA, PLL Cache, and PLL IMC voltages to around 0.99 V helps significantly with stability. Other overclockers have noticed that there seems to be a PLL issue on these chips, so it's not just me. Stability is randomly lost over time when running DDR5 at high speeds; this has been attributed to a phase-locked loop issue.
2. My motherboard comes with a very droopy LLC (load-line calibration), to the tune of 200 mV (1.50 V set, 1.30 V get). Not only does it droop to an extreme degree, but it's also horrible at catching transient loads. I have to increase my loadline level by 2 (from 3 to 5) in order to achieve stability at any sensible voltage. At stock, I would need 1.49 V set, drooping to 1.29 V during AVX2, and even then it's not always stable. When the LLC is increased to level 5, I can settle for 1.36 V set, drooping to 1.25 V (-110 mV). Even though that's a lower voltage under load, it's far more stable at catching transient loads than LLC 3 at a higher average load voltage.
3. Increasing your memory speed beyond ~6000 MT/s on DDR5 invites a massive stability penalty. Whereas I can run a very low load voltage of about 1.18 V at DDR5-6000 (CL28, 100,000 tREFI), stepping that up to DDR5-7200 (CL32, 65,000 tREFI) necessitates an increase of 0.08 V (80 mV), so 1.26 V core voltage.
4. Increasing the e-core frequency induces a penalty to power draw, and subsequently to the minimum voltage needed to achieve stability. (Example: e-cores at 4.2 GHz draw around 75 watts, depending on your voltage. Set them to 3.2 GHz and that drops to 50 watts.)
When you add all these things together, alongside power lost via inadequate cooler contact, it doesn't shock me that these CPUs are unstable out of the box.
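For anyone who wants to sanity-check the numbers in the four points above, here is a minimal Python sketch of the arithmetic. The helper name delta_mv is made up for illustration; the voltages and wattages are the ones quoted in the comment, not independent measurements.

```python
def delta_mv(high_v: float, low_v: float) -> int:
    """Difference between two voltages, in millivolts (hypothetical helper)."""
    return round((high_v - low_v) * 1000)

# Droop = voltage set in the BIOS minus voltage actually seen under load.
droop_llc3 = delta_mv(1.49, 1.29)   # LLC level 3, AVX2 load
droop_llc5 = delta_mv(1.36, 1.25)   # LLC level 5

# Extra core voltage needed when going from DDR5-6000 to DDR5-7200.
mem_penalty = delta_mv(1.26, 1.18)

# E-core power saved by dropping from 4.2 GHz to 3.2 GHz (watts, as quoted).
ecore_saving_w = 75 - 50

print(f"LLC 3 droop: {droop_llc3} mV, LLC 5 droop: {droop_llc5} mV")
print(f"DDR5-7200 voltage penalty: {mem_penalty} mV")
print(f"E-core power saving: {ecore_saving_w} W")
```

The takeaway matches the comment: LLC 5 roughly halves the droop, so a lower set voltage can still hold a usable load voltage.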
Coming back to more comments and the 1st point matches what I've experienced exactly.
Apparent memory corruption with very long power-on times, ~1 month or longer. A 32-hour memory test passes for me with no issues. I've read posts where people ran memory tests and, like clockwork, they racked up errors as the weeks went on. A reboot and everything is perfect again. My AMD system never experienced that. Not sure if dual-rank 96 GB DDR5-6800 counts as high clocks, but that's what I managed to max out on my 4-DIMM board. Will give tuning those voltages a try.
Will come back in a few months to report back.
Is there a way for you to check whether the crashes are correlated with AVX work loads?
Intel has always had problems with their AVX implementation since Skylake, having to do all these AVX clock offsets. Perhaps in these games the load is so light that the AVX offset was not triggered even though AVX instructions were being used, leading to some local electromigration degradation around these die areas.
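One rough way to look for that correlation, as a Python sketch only: log each test run as a pair (used_avx, crashed) and compare crash rates between the two groups. How you decide whether a run used AVX is left open here and is an assumption (for example, you might only know it from the workload you chose); the function name is made up for illustration.

```python
from collections import Counter

def crash_rate_by_avx(runs):
    """runs: iterable of (used_avx: bool, crashed: bool) pairs.

    Returns {used_avx: crash_rate} for every group that has at least one run.
    """
    totals = Counter()
    crashes = Counter()
    for used_avx, crashed in runs:
        totals[used_avx] += 1
        if crashed:
            crashes[used_avx] += 1
    return {group: crashes[group] / totals[group] for group in totals}
```

A clearly higher crash rate in the AVX group would only be a hint, not proof, and with a handful of runs the difference could easily be noise.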
@@quanlethienminh6002 To be clear, AVX support is not guaranteed on 14900(K/KF/etc).
I've seen people complaining about it on Intel forums. They RMA a 14900K with AVX and get a replacement without any AVX support at all.
@@Gen0cidePTB Oh, you are thinking of AVX-512. The 14900K officially supports the older AVX2 instruction set.
My bad @@quanlethienminh6002 you're correct
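If you want to see which AVX levels your own chip reports rather than going by forum posts, here is a small Python sketch, assuming a Linux x86 system where /proc/cpuinfo has a "flags" line. The function name avx_support is made up for illustration; avx, avx2 and avx512f are the flag names the Linux kernel uses.

```python
def avx_support(cpuinfo_text: str) -> dict:
    """Report which AVX flags appear in the text of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like: "flags : fpu vme ... avx avx2 ..."
            _, _, value = line.partition(":")
            flags.update(value.split())
    return {name: name in flags for name in ("avx", "avx2", "avx512f")}
```

To use it, read the file and pass the contents in, e.g. avx_support(open("/proc/cpuinfo").read()). On Windows you would need a different tool, since this file does not exist there.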