Intel PR posted an update to... reddit. So check that out, update your BIOS, and look for another BIOS update coming mid-August 2024! www.reddit.com/r/intel/comments/1e9mf04/intel_core_13th14th_gen_desktop_processors/ Though they seem to have deleted some of the better questions, like what's covered under RMA, whether folks who have delidded are covered (guessing: lolnope), etc.
@@Level1Techs Sounds to me like they're trying to cover their asses. They're focusing on the voltage issue because that's something they can fix. The oxidation problem is a manufacturing defect, and that potentially means the dreaded R word - RECALL.
@@Level1Techs sh8 i just ordered a laptop with a 19400hx on the acer helios 18...am i screwed? considering i live in the phil and my sister bought it for me there in the u.s...yikes
"AMD Is In the Rear-View Mirror" - Pat Gelsinger, Intel CEO "Intel CEO’s Comments of AMD ‘in the Rearview Mirror’ May Indicate Intel is Driving Backwards" - Areej Syed, Hardwaretimes
I saw a funny meme cartoon of Pat driving towards a cliff and Lisa driving away from the cliff. Lisa's car is of course in the rearview mirror from Pat's perspective. v-cliff
ummm i saw literally ZERO sick memes here soooo... kinda suspect reporting bro? but fireship just pontificated on the issue and showed speculative-non-primary-source concern AND SICK MEMES, so therefore, issue confirmed
Intel forgot how to make fast cpus, then they forgot how to make efficient cpus, and now they appear to be forgetting how to make properly operational cpus... Not a good character arc...
Typical too quickly developed too little tested, lack of QA because the new lineup has to be released and delivered to keep up with the competition and not upset the shareholders behaviour. I wish we had fewer stock corporations again. A world in which profitable companies are devalued because they don't constantly increase their profits is completely f'd up.
-Talks about a real problem -Explains what he found in true objectivism -Admits data is not big enough -Properly digs through the problem What a gigachad
I've had games with bad enough memory leaks that you pretty much need to reload them every 2-3 hours or the game will bog down and eventually crash. Common problem with specific operations in Unity. All you really need to fix that one is a cleanup subroutine - which I actually talked one developer into implementing during Early Access.
Player: I've got you now Pirate King! PC: (Crashes) Player: Ahhhh you thought a mere catastrophic failure could save you?! Well prepare to doom your DOOM! LMAO
Intel is hanging on for dear life trying to compete with AMD. They'd rather take big risks that can backfire on their customers, then try to silence it, than be honest about the limits of their CPUs and lose their high spot in the benchmark charts.
Exactly, there is no fix for what is plain old silicon degradation caused by too-high voltages, clocks, temps and amps pulled through the silicon long-term. There is no software fix, there is no BIOS, there is no microcode that will revert this once it's already degraded. The only fix is "do not exceed 1.3 V and clock them 500-600 MHz lower". And also, do not clock your CPUs so high that they have less than 5% headroom left in them, so that even the smallest degradation throws them into the "unstable on stock settings" category. Intel is just making smoke, hoping the attention will go away so they do not have to do the thing they should: a full-scale recall.
@@seeibe This. Consumers are used to seeing i7 at the top of the chart and Intel will break bones to keep it there, because they desperately need to keep its brand power. I just had to leave a bad review for an "i7 Desktop" selling in my country's Amazon with no GPU for $900(!), because I noticed that it had a HD4500 and realised nobody else could work out that meant it was an 11 year old PC. They just see "i7" and buy. It's those idiots that Intel made the 14900 for. So Google still says "i7 is fastest" and the plebs carry on buying it.
I work at a gaming marketing agency in the IT department, and I have been absolutely losing my mind over these issues. I have built ten systems over the last six months, and had three of them experience various failures. One of the more consistent problems I've seen has been system corruption, where I'm unable to even run simple Windows updates, SFC, DISM -- you name it. Recovery media fails, fresh installs on new drives fail, and the crashes in games have gotten utterly out of control. Thank you for this, it's definitely incredibly helpful for tracking our OWN sanity.
@@chrisx742 Appreciate the thought, and while yes, it does, and we did -- used the Intel XTU, didn't go above 52x, dialed in Intel specified limits all across the board, and had little to no consistent improvement, but definite performance loss. Tried the Intel Processor Diagnostic Tool and failed consistently across all settings. RMA'd one 14900K already, popped in a spare 12900K while waiting for the replacement with none of these issues with all the rest of the same hardware. The point ultimately is that we shouldn't need to do this, and it's infuriating to have to fight with multi-billion dollar companies to get what we're paying for, and have it actually work consistently.
As a former aviation maintenance tech and lifelong computer nerd, I approve this message. I won't fly commercial and I won't buy intel. Both Boeing and intel are a complete disaster.
When server providers are starting to actively recommend AMD systems after YEARS of Intel being the "safe bet", this is the point where Intel should start sweating. Because mind-share is a huge factor in what gets bought in large volumes.
I agree with most of your statements, as I work for a small SI. I stated on the initial videos a few months ago, when the fingers were pointed at motherboard makers, that the problem could be with Intel processors. We mostly provide 14900K systems to businesses on premium and entry-level B760 boards. So right off the bat, no O.C. Next, we disable motherboard enhancements and limit power to Intel stock. And some boards are weak and can only do Intel stock power. Failures have happened in all these scenarios. These systems can run Cinebench all day long, or the OCCT stability test, or AIDA64. But try to install the NVIDIA driver (compression/decompression) and the installer fails. Another way to test this is the Tekken 8 demo. No matter what BIOS we choose, those processors are gone. Replaced. There is no coming back from this. It has happened with DDR4 and DDR5 systems.
@explosivemonkeys I hope you get yours soon. We are not in the U.S. so our rma is a couple of weeks with Intel. But once the system is diagnosed with this problem at the service centre, we replace it with a new processor and return the system back to the customer. So, for a customer, it is basically 2 to 3 days down time. I am also hearing stories of Intel denying rma for these cases. That would make this worse.
The NVidia installer was a big thing for me too. Would crash on install about 95% of the time. At the start I figured it was a corrupt Windows install or maybe a failing HD. Almost dropped a lot of extra money to start replacing components before I started to see all the other issues out there, which thankfully saved me.
Just so it's clear - did replacing the CPU actually fix the issue? Was it properly verified, taking a system exhibiting the problem consistently, replacing only CPU and then testing again with the same exact test and no problem found? Asking because I was around in the past for the FDIV bug in P60/P66 by Intel, where you could actually even get it to calculate wrong in Excel using a specific formula to prove the bug. Initially Intel said there is no issue, then they made RMA's for all who complained trying to shut them up - or at least it looked like that is what went on, then finally they had to stop all the BS - and rework the CPU and announce that people who had affected chips could register to get it replaced with the new fixed chips shipping out as they were coming off the production line. I had one of the FDIV bug P60's for years as a memento of that cluckerf*ck but I traded it eventually for a DEC VAX 11/750 in 1997-8 something like that. Shoulda kept it. Point being with the FDIV bug you could not replace your way out of it until Intel made fixed chips, 100% of the P60/P66 initial production were affected. You could kernel patch your way out of it, but it was a software workaround for a hardware problem.
15:05 "We already replaced a lot of customer's 13900k with 14900k and the issues don't seem fully resolved." This statement is extremely telling. 14:56 "$1,000 extra" for support is insane and really tells the whole story here. The part is not reliable. I am sure a lot of conversations are happening behind the scenes here, but based on what I've seen, Intel has not committed to fixing the issue. We can hope that it is due to incompetence but more likely they do not want to admit fault here due to the cost of "making it right" Thanks for sharing this Wendell, it will be interesting to see other news around this topic in the coming weeks.
It's like how insurance companies are refusing to insure houses in certain states against extreme weather. When even the Capitalists charge more or refuse to cover something, that's a sign that there's a real problem there.
@@monad_tcp The issue is that Intel itself doesn't really know what to fix, and currently the most reliable way to make it run stable enough is to lower the multiplier to something like 53 and run the RAM slower. If that is what it takes, then the CPU wouldn't run as specified and it would also invalidate Intel's marketing around the CPU, so even if Intel wanted to do a simple recall and give the customer a fixed product, it can't be done. Assuming running it slower actually does fix the problem, Intel would probably need to replace the product, force the CPU to run slower, and give some monetary compensation.
I don't think it's about being committed to solve the issue, as it is that they have probably realized there is no fix, and they can't afford to recall two whole generations of product. That would kill the company. Trying to mitigate it through microcode and power limits is probably the best they can do until getting a 15th gen out, and it might even be too late to fix the issues for that one, so they're really just holding on for their life for a 16th gen.
@@rezenclowd3 That's a really dumb way to do business, people talk to each other. It's like if I started pumping out car parts, let's say a water pump that kept failing prematurely and blamed everyone but themselves and tried charging for repairs but knew it was self inflicted the whole time... Oh wait, my Golf R had that issue. Oh, there was a class action lawsuit? Nice.
@@Dendodorion I mean it does make sense if you aren't the company producing the faulty part in the first place. Charging more because something is worse to work on seems fair to me.
25% is a ridiculous failure rate for the high volume OEMs. Wonder if intel is getting back the failed chips to do some failure analysis on them with test boards and thermal imaging or something.
My guess is it's simply that power/frequency limits are pushed beyond what the platform can reasonably support, which means this was an intentional choice intel made. It is a known workaround that underclocking and reducing power limits increases stability.... at first. But once it starts happening, it seems that the damage is done, and it's only going to keep getting worse.
real. I love my 7700x. Such a good chip. I can Craft so many Mines with it
I bought 2 7900X3D's for 100 GbE testing on Supermicro H13SAE-MF. I guess I dodged the Intel enshitification bullet. Meanwhile, my AliExpress craptastic and old Xeon rando recycled parts impulse buy laughs in ECC.
I remember a time when people would say to go with Intel, it's a much more stable platform, how times change eh lol. Ironically the only build I ever had so much hassle with was the one time I decided to jump ship over to Intel back in 2015 with a 6700k skylake cpu due to the crap ipc the fx8350 had that I was running. Always used amd other than that from the athlon days in the early 2000s through to a number of ryzen builds & I've literally never had any problems, all the amd builds have just worked, currently running a 7800x3d.
When AMD released its 7000 series, I went with a 7900x. I had buyer's remorse when I saw the better value of Intel 13th gen. But with this situation now, my mind is totally free from that feeling.
Heys, I'm one of those who were "tremendously suffering" from the instability, to the point where regular programs such as Discord, Chrome, Davinci Resolve, and games were crashing left and right. I just RMA-d my i9-13900KF. Intel already knows that shit's going down because customer support didn't put up too much of a fight and went straight to the point - RMA. Unfortunate, but it is what it is. Purchased a secondary, i7-12700K (LGA-1700 slot was needed so that I don't need a new MOBO) and it feels like a breeze of fresh air not having to put up with the constant instability.
@@MrXelaim No. 1. I'm a contractor (work from home) and cannot afford to lose a single day of work for making such a switch. This was a carefully orchestrated RMA process where I timed my CPU replacement on the same day the courier would come for the RMA CPU. 2. The motherboard is not faulty. Buyer's protection in my country only covers the first 3 days to return a product without a good reason, and this is a year-old system.
@@owlmostdead9492 Why though? You still have a time window to return a product if you changed your mind. Otherwise, wouldn't people ignore warranty and just return age old products left and right out of greed?
Right smack in the middle of the Raptor Lake 8+16 die is the ring interconnect fabric, with a ring agent interconnect per P-core and an agent per E-core block. Given you’re seeing a mix of IO related errors, including the out of VRAM errors (which is likely triggered by a PCIe operation failure), my gut says this is the system agent logic in a bad-state, specifically due to failing ring signaling at one or more ring agents or in the system agent itself, blocking and going through some form of error / bad-state recovery. My guess would be that this is due to electromigration degradation of the ring interconnect logic due to the very large Raptor Lake 900MHz top end ring clock bump by Intel, versus Alder Lake. This is why we’re seeing these issues in 13th and 14th gen, but not 12th gen. What’s concerning is that if this is where the problem is occurring, then Intel appears to have not implemented internal telemetry, or not exposed it via CPU or other system driver, to drive WHEA reporting around ring interconnect and the system agent. Given the complex nature of the SOC, with its asymmetric architecture, many clock domains and sophisticated “system agent” IO block, that would be bad if true.
If the ring causes those issues, I wonder if raising ring voltage could solve them or not. For example, I have had 2 13th gen i9s: a 13900K that I sold for my current i9-13900KS. Both run stable at 253W, 400A IccMax with BIOS default settings (I set my power limit lower than 253W initially due to having a 240mm AIO). I did turn off MCE in BIOS and set my custom load line to undervolt the chip. I used the previous chip for about a year with no noticeable degradation. I run both chips with an SA voltage of 1.35V for my DDR4 RAM OC. Both chips have run mining software, Prime95 and other stress tests. I suspect that for the degradation to happen, most people did not configure a power limit; they let their chip run with high voltage and high current, and that causes fast degradation. Also, their silicon quality is probably average or below average in the first place.
@@kevinzhu5591 It probably would further accelerate degradation of the ring bus and lead to even more crashes. My bet is that Intel has a design flaw in the ring bus, which got exposed by their lack of regulation regarding power profiles. That slowly cooked the ring bus at stock speeds and fried it when people did some overclocking.
@@kevinzhu5591 Wouldn't it make more sense to lower ring voltage and clocks a bit? It shouldn't make a huge difference in terms of performance to reduce it by 300mhz or whatever.
@@ahs9674 Doubt it would help. 12th Gen has had problems with the asymmetric uarch (minor and not often, but I've encountered them way too many times...). 13th & 14th not only suffer from that, but you have to keep in mind that they do bend, and that they are made using the Intel 7 process, which took ages to be "mature" enough to even be considered viable for manufacturing, and so much more. There might be a reason (beyond being behind with their own foundries and R&D on advanced nodes) why Intel is hell-bent on using TSMC instead of their own process for the CPU tile.
@@aminorityofone About 30 seconds later, we're pushing our customers towards the 7950X platform. Which is exactly what I would be doing in their position.
Yeah -- this is somewhat surprising from Intel. This is amateurishly like the late 2000s AMD graphics card performance. I bet it's heat problems from bad caps or arrays or interdigitated caps. If it's design, then that's really really going to be surprising. I've fixed my AMD processor by swapping out dry caps before.
And they're likely about to get completely shafted by ARM with the mass migration of both datacenters and consumer computers to ARM chips (or other RISC designs) due to the huge drop in power consumption that ARM brings.
@@samwalker7567 Yep. I wouldn't buy any PC right now if you can hold off. Things are changing, and ARM seems to be poised to dethrone x86 in the coming years. Best to wait and see what the industry looks like once the dust settles.
@@dschwartz783 Disagree, if the transition happens it will: a) take years to acquire significant share in the desktop market, and b) there will still be a large marketshare of x86 for years after that happens, so support will remain for quite some time after. Makes no sense at all to hold off buying a PC in anticipation of this platform.
On the warranty repair side I have seen a major shift in the willingness to accept that an Intel CPU might be faulty. It went from a multi-hour process to get a processor RMA approved to a self-approved process. I just order the CPU replacement and I'm done. It's akin now to ordering a replacement memory module or SSD under the warranty process (this is at the OEM level, not with Intel). Talking to other techs, we have seen a HUGE increase in CPU failures - almost always seen as general instability with a longer running process (which is a really hard thing to verify during a warranty service, so we do typically take the customer's word for it). I have tracked down the problem on a few of them to the memory controller contact pads - often you can even see visual discoloring of the pads from prolonged heat. The instability in those cases seems to come from the CPU getting hot, changing shape, and the signal integrity to the memory modules shifting... then BOOM, you're unstable... until it cools a little, then you're okay. Basically, I think the problem is the shape of the CPU.
Is this also affecting the 14900hx laptop cpu? I just got a laptop with one and could probably return but got a really good deal and has a good mini led screen and would get so much use out of it for the whole almost a year till next model laptops are out with the 50 series rtx so really don't want to return.
back in the days the cpus were binned with a way looser margin of error, in some cases you could easily make the cpu run 25% faster with a bit more of cooling, but once they caught wind of overclockers pushing the cpus so hot, they started to push power into them themselves, to make these margins extremely tight and cpus that are edging being defective are still sold, most people don't stress test their new cpus so by the time they realize they are defective it's too late and they get away with it
I nearly cried watching this. Someone else that has spent countless hours going over crash logs!!! Someone else has felt my pain!!! I'm not alone! Anyone considering a job in IT.... This is your possible future!!! and its a thankless job.
My god this is so useful. I am on my third 13900K in 18 months. First one ‘lasted’ a year (3 months of that was me getting more frustrated trying to diagnose wtf was happening as more and more and more games failed). The pain was that it came out as IO errors, memory errors, everything except the CPU. In the end I found a forum thread where someone mentioned it. I got it swapped under warranty and the second one began to fail identically 5 months in. Third one is still quite new but if this goes as well I’ll be sharing all of this with the builders and telling them they can either refund the entire PC or put something else back in. I have no overclocking. The games that crashed were often games that a 13900K wasn’t even remotely needed for - like WoW - and it was absolutely infuriating
I am surprised people even buy intel. Even without this issue - you buy intel and 6 month later 50% of performance are gone to fix all the security issues.
The 14900K chips are degrading insanely fast. My 14900K on day one could run at stock frequency with a -0.075 voltage offset on the last 2 steps of the V/F curve, without any WHEA errors either at full load or idle. Every 2 months or so, I would randomly find a couple of WHEA errors which were instantly fixed by increasing the previously mentioned voltage offset by +0.05v. This week WHEA errors began showing up again. Now the CPU needs to run at stock voltage, as it is unable to handle any negative voltage offset whatsoever.
@@douglasmurphy3266 I'm using an IceMan Direct Die waterblock with an external MORA3 420 radiator. My 14900K max core temp at full load is 70C. Temps are not an issue.
@@genejones7902 There are specific WHEA error codes that are related to CPU errors. Whenever I get them every couple of months, increasing the voltage slightly is the only way to make them go away. So yeah, it's a CPU issue.
One of the biggest take-aways from this video is the price difference on service agreements between a 7950X ($139) and a 14900k ($1,280) server - "3 years parts & labor, 24/5 - Next business day onsite repair - zone 1". For the 7950X, that isn't covering the cost of a single hour onsite repair technician. For the 14900k, that is slightly less than the cost of an Asus Pro WS W680-ACE ($330) + a new 14900k ($600) + 4 x 48 GB of G.Skill Ripjaws DDR5 5200 RAM (2x$190) (total of $1,310). You can basically do a complete hardware replacement, AT RETAIL COST, for the price of that service agreement.
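The arithmetic in the comment above can be checked with a short sketch. All prices are the ones quoted in the comment; the variable names are illustrative only, and the RAM is assumed to be two kits at $190 each, as the "(2x$190)" suggests.

```python
# Prices as quoted in the comment above (USD); names are illustrative.
service_7950x = 139
service_14900k = 1280

motherboard = 330        # Asus Pro WS W680-ACE
cpu = 600                # new 14900K
ram = 2 * 190            # assumed: two kits at $190 each (4 x 48 GB total)

retail_replacement = motherboard + cpu + ram
premium_ratio = service_14900k / service_7950x

print(f"Retail replacement: ${retail_replacement}")
print(f"14900K agreement costs {premium_ratio:.1f}x the 7950X agreement")
print(f"Agreement is ${retail_replacement - service_14900k} below retail replacement")
```

Under those quoted prices the service agreement lands within a few percent of a full retail parts swap, which is the point the comment is making.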
If 50% of CPUs are failing one way or another but the other 50% running perfectly fine I would closely look at the production starting even from the silicon provider and its quality.
The smaller you go, the more it's a chip lottery. That's why I don't see the point in overclocking. "Oh, I can get 6 GHz on water." Yeah, and how long does the CPU last? It's the equivalent of a top fuel dragster being pinned to its max breaking point. Intel has pushed the silicon far enough that it's getting bad yields: some chips are good, a lot of chips are degrading faster due to the whole chip getting intense heat. That's the problem. 1nm chips are going to have problems. We need to move to a better base substrate: graphene or carbon substrates.
Crash rates measured in weeks are not acceptable for consumer products. If they're measured in months you still have an iffy product. If you see game logs where crash intervals are close to hourly... the whole house is on freaking fire!
Standard SLA in the industry is 99.9999%, so you can have about 30 seconds of downtime per year. With this high of a failure rate keeping that level of SLA is simply impossible. Even for the cheap providers that have just 99.99% SLA keeping the downtime to 5 minutes per month might be hard.
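For anyone who wants to check the downtime figures quoted above, here is a minimal sketch. It only converts an availability percentage into a downtime budget; it says nothing about which SLA level is actually standard, and it assumes a 365-day year and a month of one twelfth of that.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600        # 31,536,000 (non-leap year)
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # assumed average month

def allowed_downtime_seconds(availability_percent, period_seconds):
    """Downtime budget implied by an availability percentage over a period."""
    return (1 - availability_percent / 100) * period_seconds

six_nines_per_year = allowed_downtime_seconds(99.9999, SECONDS_PER_YEAR)
four_nines_per_month = allowed_downtime_seconds(99.99, SECONDS_PER_MONTH)

print(f"99.9999%: {six_nines_per_year:.1f} s per year")
print(f"99.99%:   {four_nines_per_month / 60:.1f} min per month")
```

So "about 30 seconds per year" and "about 5 minutes per month" are both in the right ballpark (roughly 31.5 s and 4.4 min under these assumptions).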
@@hubertnnn that's the marketing and contracts. In actual practice it's often lower than that for pretty much all hosting providers, even big ones like Google/Amazon/Microsoft or OVH. Source: trust me bro. They refuse to acknowledge to us a bunch of small incidents and stuff during their own upgrades so they can keep claiming such high SLA wasn't breached
But to be fair, datacenters shouldn't even use consumer class CPUs. One of the biggest differences between consumer class and enterprise class CPUs is ECC RAM support. If you want 24/7 operation and stability is very critical, ECC RAM is a must-have feature which only Xeons can provide.
And it's a lot worse than this video says, because when he says "25% of systems are having errors" he's not counting all the ones that are already dead or RMA'd, because they can't produce bug reports... If we assumed even half of the buggy chips go back to Intel at some point, the math would put it at at least 33% dead, which is absolutely disgusting.
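One reading of the arithmetic above, as a sketch: start from 100 systems still reporting, 25 of which show errors, then add the failed chips that have already gone back and no longer report. How many to add is an assumption. The 33% figure matches adding half as many returned chips as there are visibly buggy ones; assuming half of all faulty chips were returned gives a higher number. The function name and the 100-system baseline are illustrative only.

```python
def true_faulty_share(reporting, visibly_faulty, already_returned):
    """Faulty share once chips that no longer report are counted too."""
    total = reporting + already_returned
    faulty = visibly_faulty + already_returned
    return faulty / total

reporting = 100          # systems still sending crash reports (illustrative)
visibly_faulty = 25      # the 25% figure from the video

# Assumption A: returned chips number half of the visibly faulty ones.
share_a = true_faulty_share(reporting, visibly_faulty, visibly_faulty / 2)

# Assumption B: half of ALL faulty chips were returned, i.e. as many
# returned as are still visibly faulty.
share_b = true_faulty_share(reporting, visibly_faulty, visibly_faulty)

print(f"Assumption A: {share_a:.1%} faulty")
print(f"Assumption B: {share_b:.1%} faulty")
```

Either way the point stands: a rate measured only on machines that can still send reports is a lower bound.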
"The system just gets miserably slow for up to a minute before an actual crash" ...uh oh. I don't get crashes in games, but I *do* I get this kind of behavior from time to time just at my windows desktop on the 13900K if I leave it running for a long period of time (like, more than 1 day). I'd start getting weird things like programs hanging in the background, Explorer stops responding to inputs, even CTRL+ALT+DEL doesn't work. It never actually has a hard crash, but it never recovers either. Only fix is a full restart, or hard reset at the power button. Guess I can expect this to get worse?...
I mean, Intel had to do a massive recall back in the day (1990’s?) when their CPU’s literally couldn’t do simple math instructions consistently. They didn’t exactly fail, they just never worked in the first place because of a critical design flaw.
@@nessotrin If you Google it you might find some information. This was back in the early 90's. The company I worked for had the issue with our SPARC system. SUN made customers sign an NDA if I recall. I don't recall how widespread it was. I did a search and only found limited mention of it from some old forum posts...
@@nessotrin I think it's this: en.wikipedia.org/wiki/Pentium_FDIV_bug. Its floating point math would be off by a small amount. But that error could compound and cause problems depending on the calculations. I recall having a professor say he had to redo part of his thesis due to this problem.
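For reference, the check people used to pass around for the FDIV bug looks roughly like the sketch below. The two operands are the widely circulated test case; on an affected Pentium the quotient was reported as about 1.33373906 and the residual as 256, while a correct FPU gives about 1.33382045 and a residual of essentially zero. The 1e-6 tolerance is an arbitrary choice for this sketch.

```python
# Operands from the widely circulated Pentium FDIV test case.
x = 4195835.0
y = 3145727.0

quotient = x / y
residual = x - quotient * y   # essentially 0 on correct hardware

print(f"{x:.0f} / {y:.0f} = {quotient:.12f}")
print(f"residual = {residual}")
print("division looks correct" if abs(residual) < 1e-6 else "division looks wrong")
```

That was a deterministic logic error you could reproduce on demand, which is quite different from chips that start out fine and become unstable over time.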
@@zodwraith5745 Or they do know, but it is a hardware problem that they can't patch with a microcode update.. So they just try to play the long game until their next gen comes out.
@@kyu9649 I'm not saying that's not a possibility, but to automatically assume as much would be highly unfair. Intel is WAY too big a company to cover up the kind of conspiracy theory you're insinuating without a leak. They haven't done anything to try and cover it up and have acknowledged it publicly. They just haven't provided a solution which just circles back around to my previous comment. They have a hard enough time keeping a lid on leaks they DO want kept secret like future products. This would be way too juicy for a leaker with all the coverage it's had.
@@zodwraith5745 They know. They're trying extremely hard to patch it via microcode so they can come out with a fix at the same time they acknowledge the failure. Welcome to the age of publicly traded companies. Their shares will tank for sure; they're just trying to control how much they crash.
@@greebj But we don't have a zero. We have an insignificantly small number: iirc 4 out of ~1500 documented failures, or about 0.27%, when 30% of the population uses AMD, is a clear indicator that this is an Intel-specific issue.
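A rough sketch of the reasoning in the comment above: if failures were independent of vendor, AMD's share of the documented failures should be close to its share of the population. The numbers are the ones quoted from memory in the comment ("iirc"), and the proportionality assumption is the comment's implicit premise, not an established fact. Note that 4 out of 1500 works out to about 0.27%.

```python
documented_failures = 1500   # approximate total cited in the comment
amd_failures = 4             # cited from memory ("iirc") in the comment
amd_population_share = 0.30  # share of systems on AMD, per the comment

observed_share = amd_failures / documented_failures

# Assumption: vendor-independent failures would split like the population.
expected_amd_failures = amd_population_share * documented_failures
shortfall_factor = expected_amd_failures / amd_failures

print(f"Observed AMD share of failures: {observed_share:.2%}")
print(f"Expected AMD failures if vendor-independent: {expected_amd_failures:.0f}")
print(f"Observed count is about {shortfall_factor:.0f}x lower than expected")
```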
I’ve had so many bizarre issues lately and I’m now wondering if they’ll disappear if I swap back in my 12900k instead of my 13900ks 🤔 A couple of them highly repeatable but one that others can’t reproduce
@@ThioJoe Just try it, for science! It won't be that much of a performance drop, and you'd hopefully get a more stable system out of it until this is resolved ;) Post your findings on the L1 Forum if you actually do it :D
I work in commercial IT, we have had 30+ PCs where the intel CPU essentially died, this has been affecting them all from 10th gen to current. Recent one was a month old 12th gen 1255U. Affects desktop and laptop CPUs. Larger companies that deal with more PCs have had a lot more of the same.
I work in laptop repair and our main customer is a major IT firm that probably everybody knows (used to be very innovative in hardware). And we've got Lenovo T14 Gen 3 (12000-series Intel CPU) and T14 Gen 4 (13000-series Intel CPU) dying like flies right now. Sometimes the replacement system board dies within 2-3 months also. Could be the undersized cooling, could be the horrid Lenovo thermal "paste" (only at first*) or something else. *We replace the shitty Lenovo stuff with something a lot better as part of our service, especially when system boards/mainboards have to be exchanged. Edit: typos, damn phone!
@@alexanderzawydiwski9534 It's normal for some SATA slots to not work, as well as RAM not being detected in certain configurations. There is a manual included with the motherboard for a reason. Some configs working well is a miracle (like 3 mismatched RAM sticks: sometimes it works, sometimes it doesn't, a game of luck with ANY processor regardless of age; there is also a good reason why mobo makers have a list of supported memory modules and configs for each one).
Given the fact that 13900K has been on the market since October 2022 and this issue hasn't been fixed, it must be a design flaw. It's shocking that i9 still die at W680 motherboards.
This isn't armchair science, my man. This is absolutely real research. You've negotiated access to data not really available to the public and are doing real research on the data instead of making educated guesses as to what's going on, which is what armchair science is.
Honestly not surprising at all. RAD game tools are some of the most mature middleware in games (probably as close to bug free as you can realistically get without formally verifying your software). If their software is suddenly failing, they're going to want to know why.
And Oodle is deployed basically everywhere and very stable outside of this situation, otherwise all our games would be crashing constantly no matter the hardware.
I made a video about this a year back, along with Tech Yes City. A lot of people called me crazy. Stability issues, crashing, latency... List goes on. I appreciate you talking about this.
I am also troubled by the lack of clear messaging and solutions from Intel. We do need more transparency to better understand and manage these hitches.
The saddest part about this is that I have been telling users it's their CPU ever since it started popping up en masse with UE5 game releases, so over a year now. Hundreds of cases across a multitude of games, even outside UE5. And even if I link them the articles, now that mainstream media has picked up the story, they will not even bother reading them and simply reply "but it's only in THIS game".
I bought a 12900k 2 months before the 13900k. I've been having the same issues with the 12900k since about 5 months after using it. I believe the TDP on these motherboards have degraded the chips a bunch. I also have beefy water cooling that keeps the temps around 80 at full load. - The bios probably called more power because of the cooling efficiency on mine and burnt the chip up.
Wow, it is great to hear people talking about this. We deployed some i9 machines at work, and after a while 2 of them started crashing for no reason. The usually really good Dell diagnostic software could never catch it. Reformatted, replaced RAM, SSD, etc., no luck. I saw deep in a Reddit thread a suggestion to disable turbo boost, and it resolved the issue. I gave this information to Dell and they ended up replacing the CPU on these machines. I've been doing help desk and admin work for nearly 10 years in various sectors, and saw fewer than 5 CPUs die in that time, then I had 2 back to back. Was mind-blowing.
Thanks for this breakdown of your hard work. It reinforces my decision to change from Intel to AMD for my most recent build. I migrated from a 10900k to a 7800x3d, and so far I am very satisfied.
My first two computers used Intel. And once I saw AMD really supporting their sockets for longer, I decided to make the jump and couldn't be happier. More powerful for what I need and less power hungry than Intel.
After watching the video and thinking about your findings, the thing that stands out to me is your observation of the tick rate on the game servers falling by 50% before the crashing. This tells me the CPU pipelines are being invalidated, which is what's impacting performance so badly. Whatever is happening to cause these pipeline invalidations is the real culprit, and I have a suspicion about what it is. I believe the CPUs have new microcode to detect attempted side-channel attacks, and when that is triggered the CPU intentionally invalidates the pipelines on all cores and additionally flushes the L1 cache. This would immediately flush any security keys from the CPU causing the side-channel attack to whiff on stealing something valuable. Next, the L1 cache needs to be reconstituted, including reloading keys. With NVMe drives, you can realistically expect 12+ CPU cores requesting the same key file simultaneously would work fine, but I suspect the IO errors being encountered are OS-level locks causing the cache reconstitution to fail, thus crashing the process, and possibly even the kernel. Another cause could be actual side-channel attacks, and the CPU trying to protect itself. If the defense strategy chosen by Intel is to crash the CPU rather than allow compromising data then they would not want to publicize that because now you have an instant mechanism to launch a DoS attack against a service or service provider. For end users, I think that certain users are affected more frequently because their system has been compromised and is under frequent side-channel attacks. So this begs the question, which is worse? Do we have less active side-channel protection, probably increasing risk of keys being stolen, or do we have active protections that ultimately choose to unalive the CPU rather than allow keys to be leaked? I would see if you can dig up any data on the occurrence of side-channel attacks on these CPUs compared to the older 12900k systems. 
Final thought here, the reason that slowing down the memory improves stability is that at the lower clock speeds, the mechanism in the CPU that monitors for side-channel attacks is able to better analyze the flow of data coming into the CPU and makes less frequent false positives that trigger the failsafe mechanism.
I'm seeing a lot of people confused. It's not just i9 chips! The 14700k is also a POS. I've seen others with issues as well. People talk about the 14900k more but it's still a problem for the rest of us with 14700k chips too! I have not tried to overclock mine. it's running with DDR5 5600 48GB corsair modules. Custom loop with 420mm + 120mm + 280mm radiator. 6900XT GPU. It crashes in two different operating systems! I dual boot Windows and MidnightBSD. In the latter, LLVM clang will start crashing during long compiles (10+ min). On windows, I mostly game and that is also unstable. I made all the recommended changes that intel proposed. I turned off asus mce on the first day of ownership last november. Worst CPU I've ever owned. It's slower than the 3950x i had before for compiling and with all the nerfs to make it stable, it's starting to lose the gaming advantage. Huge waste of money.
@@ChrisL-d4c 25% failure rate means 75% with no issue. There's no guarantee you're going to keep being this lucky in the future though. For many the problem starts small and progresses.
you don't have to "try" overclocking it, turbo boost is usually turned on by default so it automatically overclocks for you. If you want to avoid overclocking you'd have to turn off turbo boost.
I bought my 13900k in October 2022. Many 13900k and 14900k owners didn't experience any problems until this year. I find it very odd that so many of us experienced issues at pretty much the same time. Most would say the power profile caused the wear and tear, but practically all at the same time? Different motherboards, different profiles that are aggressive and non aggressive? Different levels of use? Many working PC gamers aren't on their computers for long periods. Really strange.
This is slightly terrifying. A problem that impacts perhaps 0.1% of enthusiast customers is a poor experience, but I can appreciate how difficult it would be for Intel to root-cause the issue. If it's 10%, that speaks to a product design issue which *should* have been caught much earlier. Intel's own internal testing should have caught this and resulted in mitigation before customers noticed, and preferably before the chips shipped.
Thing is, if it's getting worse over time, this smells like a hardware defect in fabrication or a material failure, not a design flaw. I would not be surprised if we find this out down the line.
@@ponocni1 But then you should see the same issue cropping up with i7s and i5s, which we don't. What's so different with the i9s? If it's that 8 e core cluster then why doesn't disabling e cores help?
@@zodwraith5745 i9s are binned for higher max boost clocks and more power, so maybe whatever criteria that requires also makes them more vulnerable to this problem if it's an issue with them having pushed their manufacturing process to the breaking point.
@@zodwraith5745 It is happening with the 14700k. A number of people have reported it and I've seen it with mine. Bios updates helped some and I have run it with Asus MCE disabled the whole time. I've got a custom loop so it's not a cooler issue. (420mm + 120mm + 280mm rads too)
@@LucasHolt But what is the problem specifically? I've heard it seems to occur mostly with specific games. I haven't had a hiccup from my 14700k but I'm on MSI. Keep in mind most of these issues were on ASUS boards, a few with Gigabyte, and I've never heard of an MSI or Asrock occurrence. Although I think the narrative changed fast because Intel is such a juicy target. But what Wendell is talking about in this video isn't Z790, he's talking about W680. *_ASUS_* W680. A SERVER platform that uses consumer CPUs. That means 24/7 full tilt all cores running. That's why the error rates are so much higher. I haven't had an issue but I wouldn't be shocked to see my system throw up an error every few hours under _that_ kind of abuse.
This is unfortunately not uncommon when large companies deal with a big design/quality issue. When you have a reputation for quality... the last thing you want to do is openly admit you made bad things (and this also can open you up for lawsuits - like these server support companies looking to recoup all the extra service costs, or the company that lost 100k in their player base due to instability). Say vague things to sound like you care while claiming some type of user error. See how Toyota handled the 3.0 V6 oil sludge issues (but that was more a Toyota cultural thing), and how Apple handles EvErYtHiNg. Intel has been playing catchup to AMD Ryzen for years now.. after making fun of the chiplets being "glued together". All they really have left is their reliable reputation... so they are intentionally keeping tight lipped and hoping the media coverage of the issue just "goes away" after they release their next CPU generation - one that actually will be new and not just the old ones with more voltage.
@@damara2268 I bet Intel is really glad CrowdStrike pushed out a global Windows crash update file. That gives the media something way meatier to talk about versus Intel making bad CPUs that degrade quickly even when not OC'd.
I am so glad I never upgraded my 12900K. It still works like a beast and I mostly play games at around 100 FPS, plus or minus 20, so I don't really need an ultra fast CPU. My mobo also has a PCIe 5.0 slot, so I'm probably not gonna upgrade for a few more years, but when I do I will switch my 12900k and mobo to an AMD setup.
Maybe you should do something similar to what Steve/GN do sometimes and set up some form of communication so devs/data center admins can share data to hopefully get closer to a real answer? Heck, maybe even get physical examinations of problem CPUs if possible
Based off my experience overclocking these things......I'm pretty damn confident in saying the problem is that either the ring bus or the system agent is just at its damn limits. I would be more than willing to bet that the VAST majority of the true raptor lake dies that are failing are going to be the larger 16 e core dies. It never sat right with me how wildly the power consumption on these chips was affected by the ringbus and how minor bumps in speed took absolutely shocking amounts of power and tons of system agent voltage fiddling. In retrospect, this explanation falls in line with how cagey intel has been about the SA voltage in the non-z chipsets. Basically we're seeing the limits of one of these two things, and it's so close to the limit that natural degradation of the silicon will cause faults.
The craziest example I can remember is an alder lake cpu that would do 5ghz ring.....at about 400watts. Drop it down to 4.8?...300 watts. Drop it to 4.6...250...and 4.4 only took 220-230ish. E cores were all disabled and the p core settings were the same on all tests...the only thing that changed was the ringbus/cache.
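To put the numbers in the comment above side by side, here is a rough sketch that works out how many extra watts each additional 100 MHz of ring clock cost on that one chip. The only inputs are the figures quoted above; the 4.4 GHz value was given as "220-230ish", so 225 W is an assumed midpoint, and the function name is just for illustration.

```python
# Ring (cache) clock in GHz -> power draw in watts, as quoted in the comment above.
# The 4.4 GHz figure was given as "220-230ish"; 225 W is an assumed midpoint.
ring_power_w = {4.4: 225, 4.6: 250, 4.8: 300, 5.0: 400}


def marginal_watts_per_100mhz(samples):
    """Return (from_ghz, to_ghz, watts_per_100mhz) for each consecutive pair."""
    points = sorted(samples.items())
    steps = []
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        step_mhz = round((f1 - f0) * 1000)  # round away float noise
        steps.append((f0, f1, (p1 - p0) / (step_mhz / 100)))
    return steps


steps = marginal_watts_per_100mhz(ring_power_w)
for f0, f1, watts in steps:
    print(f"{f0:.1f} -> {f1:.1f} GHz: about {watts:.1f} W per extra 100 MHz")
```

With those inputs the cost per 100 MHz roughly doubles at every step, which is the "minor bumps took shocking amounts of power" pattern described above. It is one anecdotal chip, not a measurement of the product line.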
Seconded. I highly suspect uncore interconnect degradation being a proximate root cause. Linpack won't catch it failing until things catastrophically fail later down the line. Harder to validate. "Just shove more voltage into it and it'll be fine. How long do you expect users to run these chips? 15 years? They'll be obsolete in 15 years." Try less than 1 year past intel execs. I run my 13900k with turbo disabled so basically sitting pretty at 0.8v vcore about 95% of the time. Bought it second hand for $300 about 5 months after launch due to "issues hitting ddr5-8000", pre-delidded with a custom copper ihs. Person I bought it from got a second chip, direct die cooled it, and from what I can tell that new chip worked just fine. Likely doesn't anymore. Lol. Probably swapped it out for a 14900ks. My chip still shows signs of degradation. SA was at 1.3V and vdd at 1.35V for 96gb dual rank ddr5-6800 for over a year. Now experiencing random latency spikes in the past 6 weeks. Even reinstalled windows as it's been over a year, but no change, so it's not software related. Memory test passes at 32 hours as I don't have time to test for longer. Here's to hoping whatever releases this year is better than the hot garbage intel released or I'm switching back to AMD. Intel cpus failing were unheard of before really 11th gen. Started going downhill quick. 11th gen also had SA/IMC issues compared to 10th gen. It's telling intel is being hush hush about this as it's a huge blow to their QC reputation.
Could it be that this is due to the AVX offset probably not being triggered in light tasks such as gaming? I mean, we would never see this in all these stability tests then, because the load in these benchmarks is heavy enough for the AVX offset to kick in.
@@quanlethienminh6002 It's not likely to be an avx offset issue. The way to understand what i've pointed out here (to the best i can oversimplify it)..is the ring bus is like a road/traffic network between all the cpu parts while the system agent is the traffic cop/monitor/system. The pcie controller and memory controller all sit on the ring bus. In the scenario where the SSD corrupts....the data can't make it from the cpu core down to the pcie controller. Add this explanation to what wendell is saying and it should make things much clearer as to wtf exactly is going on. Disabling/slowing down stuff doesn't change the fact that the underlying infrastructure is just pushed to its limits the way it is.
Iirc, the 1700 socketed Xeons... have avx512 by virtue of having no e cores......if the xeons are ecoreless dies and no failures exist..we might have something of a smoking gun. I haven't touched a 1700 xeon...and haven't seen a delidded die shot
If the CPU performance halves before crashing maybe it is the IMC and memory transfers stall causing bandwidth starvation. This would explain why most pure cpu metrics continue to look fine. I'm curious if the issue reported is exclusive to ddr5. Megekko (a big Dutch retailer) has an informal policy to push ddr5 dual dimm over quad dimm for stability reasons but they didn't specify this would be for Intel only.
@@someperson1829 no, the problem is Intel and goes well beyond overclocking when it’s seen on boards that actively avoid overclocking. What we were seeing is that those systems failed sooner due to it, but failures are starting to be seen across the board on systems that were well within safe limits.
@@someperson1829 standard specs for these chips seem to be already quite far off the good envelope and inside a territory of power consumption, stability and longevity that would normally not be accepted from a chip at factory specs.
Most of the comments are saying that there may be design issues, but it got me thinking that maybe it's a test, qualification, and reliability process issue. Those 3 are internal groups in all semiconductor companies that make sure parts are good when they go to the customer. Now it got me thinking: did Intel nerf those groups to resolve the yield issue they were having several years ago?
I like that they want to keep places honest. I just want a standard. Don't favor giving one company a chance to explain and make steps to fixing things then not go back to other companies that did the same. If said companies didn't change let us know too.
After a 6 month battle with Intel regarding an RMA for my i9-13900kf I finally received my replacement CPU and I can confirm it is 100% the fault of Intel. I had game crashes, BSOD, and random PC shutdowns. This is a problem that Intel seemingly doesn't want to acknowledge. I have not received any reassurance in regards to the issue not happening again. Super disappointed and dissatisfied
I chose to get a drive with on board cache instead of host platform memory specifically so I don't have to be nearly as concerned about drive corrupting internal mapping information just because i got a little spicy with cpu overclock.
I have a 13600K with multiple NVMe SSD drives in Linux and haven't had any issues at all in the 18 months I've had my system. I don't overclock and my DDR4 RAM sticks don't like XMP so they run at stock speeds, so it's quite a vanilla setup (ASUS motherboard with defaults for all the CPU settings).
Thank you. It's always interesting to hear about modern trends in data centers. I don't work in gaming, so we don't use high end consumer CPU's, but this is worrisome because it shows that the latest processors from intel are truly the bleeding edge.
Love that KDE Plasma wallpaper on the screen in the background :) And I am so glad that i went with Ryzen this time. I missed the AM4 era but I sure hope AM5 lasts as long as AM4 did.
mine is also fine, however I noticed a problem in a single core if I overclock. solution was to just overclock the other cores. but mine is direct die so the temp is like 40c under load.
My 13700k / 4090 build has been pretty much flawless for almost 2 years now. Even started out on a ddr4 board for about 5 months until msi released the z790 tomahawk ddr5 board. But 13700k oc to 5.6ghz on all 8 p cores, all 8 E cores oc to 4.4ghz, ring oc to 5.0ghz at 1.34v. Even the mem controller has been very solid. 32GB Corsair rgb vengeance 6400 cl32 hynix A-Die oc to 7200 34-41-41-83 at 1.45v. On a msi z790 tomahawk. I can't complain. I've also been using a contact frame since day one for both boards I've had this 13700k in. I've heard a lot of the problems could be related to the bend they get with the stock mounting bracket. But until intel releases a statement / fix we really don't know what the issue is. I've used mine hard for over 8 hours a day every single day for work and gaming for almost 2 years now and it runs perfectly fine. I haven't babied it and I won't. I bought a k series for a reason like all the k series intel cpu's I've had.
This is an incredibly wonderful video/investigation. I'm living this myself right now. Actually went through the RMA process with Intel, and they specifically said they don't have ANY replacement processors at this point - all out of stock. They are actually giving me a cash refund instead. There is definitely some major drama going on behind the scenes over there and I really am appreciative of this video starting to uncover it. Hopefully Intel will be forced to disclose some info soon.
It's not just 13th and 14th gen. I had this exact same problem on a 12700k. Just erratic gremlin type behavior that i was never able to resolve until i replaced the cpu. I literally replaced every other component.
At this stage in the game if I owned a 13th or 14th gen Intel cpu that showed any signs of crashing or performance degradation I’d be sending it back to Intel on the first thing smoking whether it was under warranty or not. Intel needs to admit there is a problem and recall the vast majority of these chips (after correcting the problem, which I’m sure they probably have with newer built chips). They are basically going to pull a Nintendo and never admit there is a problem (think joy con drift). As time marches on I feel for people whose chips are either going to fail out of warranty, or won’t perform as well as they did new because of degradation or bios updates that will gimp performance.
A faulty CPU can cause all kinds of funky behavior, since it processes literally everything. VRAM "errors" can be as simple as the CPU failing a free space > allocation check in the DirectX API somewhere, I/O errors can be data inside caches being corrupted/the CPU failing to calculate checksums or whatever. Speaking of weird slowness behavior - kinda sounds like clock stretching, Intel CPUs have it as well, 12th gen and later (maybe even earlier). Slowing down memory does reduce effective CPU load, disabling E-cores lowers the temperature of P-cores, both of which make the CPU more stable at a given voltage+clock speed combination. -Seems a lot like Intel/mobo manufacturers/UEFI f-ed up clock voltage tables badly, or boosting behavior being too trigger happy to reach max clocks.- Would like to see results of manually lowering clock speeds. Some Gigabyte mobo's happily supplying 1.7 V to a 100 C 6 GHz CPU makes damage irreversible (albeit with some weird settings, but it's kinda ridiculous that it can go that high without the user intentionally typing 1.7 into CPU voltage, just by weird interaction of extremely aggressive boosting and LLC). Also, isn't it funny how strong Intel (OEM?) mindshare is? AMD has been crushing gaming since the 3D cache release, and has been competitive for 7 years, yet there's still a 70/30 split there. UPD after watching GN interview: oooh, seems like everything is much worse than I thought. Laptop CPUs are failing, 13600Ks, heck, even non-k SKUs do. I'm wondering how high the system agent and cache voltages are, and how they changed from Alder Lake
yeah, when the cpu starts getting into that fuzzy logic territory where some bits start flipping to the wrong side, memory/pointer errors just launching a program aren't out of the ordinary, i guess it can go to shit with the PCIe addressing all the same :D
@@MrKatoriz are you out of your Vulkan mind? I feel the issue is they're K SKU chips. Intel shouldn't even be making those. They're just asking people to break them.
Needed a new virtualization host for @ home, waited until the company i work for rolled out massive amount of new Intel machines. Listened to what was being told on the work floor about the new hardware over some months. Did buy AMD 7000 series because heard nothing but problems on Intel at work. Even when it did not crash then everyone was complaining about the big little architecture, which yes you can disable or do core affinity. But be serious who wants to do stupid settings like that when the competition just has something that does not need that kind of tinkering at all. On AMD i was very careful as well to buy the right combination of hardware. Special for MB and memory combination had to be a verified set. In the end i did buy a B650M AORUS ELITE AX ICE and CMK192GX5M4B5200C38 and for storage i plunked the board full of these: Lexar NM790 4TB. I can say i am very happy with my home server.
Great video, Wendell. I had to swap a 14900K in my desktop that had these problems. Two weeks later the news about the issues started popping up. So far so good with the replacement running at Intel "Extreme" spec.
I love how people ignore the goings on with the 12900K & KF's, which are suffering the same issues, albeit not as bad (lag/stalling but not hard crashing)
well I think most ppl that know how this stuff works knew something was wrong with the chips beyond just the bios when the engineers did not have an answer within a few weeks ... now, months later with no word of a fix, it's pretty clear that there must be an issue with a batch of chips, or design issues somehow not picked up during the years of design, or contamination as u stated that could be causing issues... I'm so happy to be on my x299 i7-7820x 5.1ghz cpu with up to 10th gen CPUs i could slide right into my mobo and not need a new mobo. Tried and true and no problems like this...
I'm curious if you've found much data involving non-i9 CPUs, or rather, specifically involving the i7 CPUs. Here's why: TL;DR: I have two i7 series, a 13th and 14th gen. The 14th gen is currently experiencing the problems mentioned. I believe it is cooked, and will be RMAing it shortly. Long Version: I've got a 13700(non-K) and a 14700K. The 13 ran so hot with an air cooler that I decided to switch to watercooling for the first time ever. I figured since I had to redo that much, I might as well upgrade and bought the 14 (this was before I found out that the 14s were just rebadged 13s). Got lazy, and kept using the 13+air for a couple more months before I finally put in the work and did the swap. That was back in late December, early January. Before I continue, I should note that even though I bought a K CPU, I did not do any overclocking besides what the default settings on my Gigabyte Z790 Aorus Elite AX mobo was doing. I even had XMP disabled on the RAM because testing with Cinebench showed no difference between XMP Off (4800MHz) and XMP On (6400MHz). I left everything else untouched, including Gigabyte's default "Optimized" settings, which I assumed meant 'optimized for performance + efficiency/power', but turned out it meant 'automatic overclocking'. A couple weeks ago, things started getting funky. The game I've been playing (Sons of the Forest / SOTF) was running fine, but other applications around Windows were running slower than usual. Then I went to host the game one night, and the machine BSOD'd... then began boot-looping. Eventually, I got it back into Windows... briefly. It BSOD'd just loading Discord. It began bootlooping well before getting into Windows, and even got to the point where it was experiencing hard crashes just sitting in the BIOS. I did some BIOS flashing to bring it up to date, and the machine started booting a little further again, but still resetting before loading into Windows. 
I had just made a Ventoy USB stick a day or two before the issues began, which included Linux Mint, which includes Memtest, so I ran that. Tens of thousands of errors began accumulating within 10 seconds of starting the tests. A bunch of troubleshooting and configuration changes later, I was convinced it had to be the CPU, so I dropped the 13700 back in. Worked flawlessly, and faster than the 14700K had been working in at least a couple months. I was ready to call it and see about sending the 14700K into Intel when an annoying little thought popped into my head: What if it's just a pin-contact issue? What if it works if I pop it back in? So to avoid Intel receiving it and telling me there was nothing wrong with it, I tried it. It worked! No BSODs through several Memtest runs, several Cinebench runs, or playing SOTF. So I assumed it was just a bad pin contact or something to that effect. That was a week ago. It has slowly been degrading ever since, to the point where it took 20 seconds to load the Windows calculator. It now fails to boot and the BIOS interrupts and tries to get me to revert to "Optimized Defaults", but it turns out I can choose "Enter the BIOS" and "Save & Exit" and it will boot fine. Choosing "Exit without saving" results in another boot failure, though. Other desktop applications were running obscenely slow, too, but oddly enough, SOTF ran fine. I found Gamers Nexus' video on this topic and watched it. Went into the BIOS, disabled the iGPU (being used for my second, non-gaming screen), disabled TurboBoost, disabled Gigabyte's overclocking, and turned off every other performance-enhancer related to the CPU, along with turning off the E-cores. I even dropped my RAM from 4800MHz down to 4000MHz. Now, instead of sitting around 60ºC while playing SOTF, it sits around 39ºC. According to HWMonitor, the core voltage doesn't go over 0.925v. But it still fails to boot without pulling the trick above. 
So I think my 14700K is just as cooked as the 14900K's you mentioned. I don't think it's limited to the i9 series.
From my extensive overclocking on my 13600k I've had this issue as well. What I've found from my testing is that it seems related to heat/power spikes. What I've noticed is that when at the edge of the thermal envelope +80c, that there could be times where it spikes over 100c but not record it because of how fast it can spike before the temp is reported. But ultimately, the instability comes down to voltage fluctuations and temp fluctuations. When you lock the voltage and keep the temps below 80c on max prime95 loads you'll have a pretty stable system.
@@EJM07 I am currently on my "summer clocks", which is standard boost clocks on both p and e cores, but massively undervolted. By default my chip wants to run at 1.37v to achieve stock boost clocks. However, I can achieve this with 1.16v on load and 1.08v on idle. Yes, I have my load line calibration set so it adds a slight bit more voltage under heavy loads so it doesn't have vdroop. But with these settings I always have a temp 6-7c above ambient on idle and temps not exceeding 75c on full load. This is even with ambient temps of up to 38c. Yes, ambient temps of 38c, that is not a typo, and yes it sucks badly when it's that hot in here. Especially with 70+% humidity.
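For a rough sense of why that undervolt helps temps so much: at a fixed clock, switching (dynamic) power in CMOS logic scales approximately with the square of the supply voltage. The sketch below applies that textbook approximation to the two load voltages quoted above. It ignores leakage power and any change in load line behavior, so treat the result as a ballpark and not a measurement; the function name is only illustrative.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Approximate ratio of dynamic power at the same clock, using P ~ C * V^2 * f."""
    return (v_new / v_old) ** 2


# Load voltages quoted in the comment above: 1.37 V default vs 1.16 V undervolted.
ratio = dynamic_power_ratio(1.16, 1.37)
print(f"Estimated dynamic power at 1.16 V is about {ratio:.0%} of the 1.37 V figure")
```

Under that assumption the undervolt cuts dynamic power by a bit over a quarter at the same clocks, which lines up with the lower load temperatures reported, though the real number depends on the chip and workload.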
@@chieftron i feel ya. high temp + high humidity is my pain also, it just hits differently. intel's solution to faster cpus seems to be just more voltage on the cores lately. i've seen some crazy voltages in boosting cores on some 12th gen. always thought it was a misreport from the tool, but now? not so sure anymore.
Genuinely one of the most interesting videos I've seen in a while. Really love your clear, specific and entertaining communication style. I love how you're very clear about what you do and don't know.
"Making the memory run dog slow seems to be the most effective" >.> This right here is exactly why I even run a xeon even in my gaming system. I had a bunch of random crashing with 1st gen Ryzen with a 3200mhz memory kit, not even particularly fast, but I thought "well it is first gen maybe that's just what I get for giving it a go" so I upgrade to 2nd gen Ryzen, it's not as bad but it still crashed more than I would have liked, I swap the RAM kit and it's the same. When 10th gen intel is starting to see discounts after the 11th gen launch I grab a 10850k on sale and sell my ryzen platform, thinking that maybe the intel chips are just more stable right now since I never had issues with haswell, cascade lake did not improve things at all if anything it was worse. I drop the memory speed down from 3200 to like 2666, better but still not nearly as solid as the system I had 4-5 years prior, I decide well I've changed everything except my GPU at least once and the Vega 64 is getting long in the tooth anyway, so I'll get a 30 series card, nope. Eventually last year out of sheer frustration at frequent crashing in a few games I drop the 10850k and buy a used Xeon W-3235 and a Supermicro board with the most boring 2133mhz 6 channel ECC memory kit imaginable, and I have once again found a level of stability I think is acceptable. Like intellectually I know, and knew, that as speed increases a greater degree of signal integrity and more precise timing are required to make faster speeds function, but since people weren't really complaining about it with DDR4 systems I thought it had to be my hardware, then here it is en-masse with the latest DDR5 based systems. 
In a way I feel vindicated that the crashing issues I've been seeing have been getting worse as systems have gotten faster, it also makes me glad that instead of buying 13th gen intel I went the xeon route because I felt like the underlying issue was at least related to memory speed and there was no way I wanted to enter in to a still dual channel DDR5 system.
I’m really curious if this will ultimately lead to a recall or a class action lawsuit. Also really curious if AMD sales reps are bringing the instability issues up when they’re talking with data center clients. The fact that this is happening with workstation boards that aren’t overclocked and are much more conservative with power management makes me think this is a flaw in the fundamental design of the chip. But if so, one would think similar problems would be appearing in Xeon chips too since they share core designs, right? Very puzzling…
Mate, after the way Intel behaved in the early noughties against AMD, I sure hope AMD are bringing this up. 1. Prospects need to know that after months, Intel hasn't root caused the issue or, as Wendell said, clearly admitted that affected clients will be made whole. 2. It's glorious when Karma scores against arseholes like Intel. As an aside, I think the game companies who collect telemetry have an opportunity to leverage that in order to help get to the root cause. If it's not sufficient (as Wendell suggests), this could help the Data Scientists to refine the collection to improve this.
xeons have a whole different design philosophy behind them, and bring intel the bulk of its money. they wouldn't sabotage their income for a headline. remember, intel laughed and derided amd's chiplets. then amd slaughtered them, intel panicked and tried to copy them. pretty sure intel thought they could get away with blaming customers(you're not a user, you bought something) just like nvidia did.
@@Mr11ESSE111 I was wondering this too, I can't replicate, it's happened a total of 3 times now at 366hours played.. idk, none of the other games I have crash and haven't had the weird/dreaded out of memory error
I have a 13700K and I got crashes in multiple programs from Day 1. Chrome tabs would randomly crash all the time too. I intensely monitored CPU temps and observed that as the CPU started to get stressed the temps would instantly rise to 100+ degrees and then any program using the CPU would crash and the usage and temps would quickly drop back down (I knew that instantly getting to 100+ should be fine for this CPU as it's designed to always perform at its thermal/power limit). So I figured alright, let's turn off a few cores - didn't work. Then I bought a new 360mm AIO cooler hoping it would solve the issue - nope, still the same - my Asus Prime Z690A can deliver a lot of power to the VRMs so the CPU instantly started to consume 235W and temps would rise and it would crash. Furthermore every game that had initial shader compilation would instantly crash on load but the compilation would continue from the point it had crashed - this way I got Hogwarts Legacy to run after 15-20 crashes on load. I got Last of Us to run after around 180 - 200 loads (yeah I kid you not about that number). Once shaders were loaded though the games ran just fine for the most part - they would intermittently crash during loading transitions ingame when fast travelling (Hog Leg and Cyberpunk crashed about 1-2 hours in per session). Last of Us though ran only about 15-20mins per session before crashing. I checked the logs for Last of Us and the error exception was pointing towards Nvidia. Decompressing via 7-zip and Winrar worked just fine but custom decompressors that stressed the CPU would not go beyond 5-15% before crashing, the time of crash would always vary somewhere between those percentages and I finally figured out that the CPU was basically shutting off at an arbitrary point during decomp/shader load etc. when temps were getting too high and hence no exact error was being thrown. 
If it would always crash at lets say 12% during decomp then they would point more towards a software issue, but this had to be the hardware. Freaking Discord would crash. Cinebench would Crash. How the hell do Chrome tabs crash?? This meant spikes in CPU usage/temp would make this stop working. I finally gave up trying to play that game and figured the CPU I had received is just a bad die off the line. I was pretty much ensured that whatever was supposed to monitor the temps and adjust the clocks was not doing its job. And then - after many months of suffering (I bought a new SSD - bought a new GPU - bought a new Cooler - multiple OS reinstalls - almost bought new RAM sticks) the oodle decompression failure article came out and I downclocked only my P-core multiplier to 51x and that instantly solved everything. For 5-6 months I went through a crapload of frustration - it was my first high end build and I got ripped off by Intel - all because they were out chasing Benchmarks numbers against AMD - is there really a point to doubling power consumption for 10% extra performance and lack of stability? Unreal - how the heck does something like this pass Quality Checks? From what I know every CPU off the line goes through atleast one round of stress test. Anyways - rant over - my trust in Intel has been severely eroded - X3D is going to be in my next build.
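The reasoning in the comment above (a crash that lands at the same progress point every time suggests software or bad input, while a crash point that moves around suggests unstable hardware) can be sketched as a rough heuristic. This is only an illustration: the function name, the 1-point tolerance and the sample percentages are assumptions, not measurements from the comment.

```python
from statistics import pstdev

def crash_pattern(progress_at_crash, tolerance=1.0):
    """Classify crash points (percent progress) as repeatable or scattered.

    `tolerance` is a hypothetical threshold in percentage points.
    """
    if len(progress_at_crash) < 2:
        return "not enough data"
    spread = pstdev(progress_at_crash)  # population standard deviation
    if spread <= tolerance:
        return "repeatable: look at software or the input data first"
    return "scattered: consistent with unstable hardware"

# Illustrative numbers only, not taken from any real log.
print(crash_pattern([12.0, 12.0, 12.1]))
print(crash_pattern([5.3, 14.8, 9.1, 11.6]))
```

A scattered result does not prove a hardware fault on its own; it only says the failure is not tied to one spot in the data.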
Given what we're learning, it's entirely possible that the post-production stress test is actually *causing* some of these problems before the CPUs go out the door = P
Based on this and other videos, locking down P core multiplier to a lower setting is only a temporary solution. Even booting into OS and running all those programs with stock or higher settings, some damage is done. Eventually you'll have to reduce it further. Only way I can think to prevent this damage is undervolt and underclock (I'm talking 4.3GHz all cores) before even first time OS boot. That's what I did on a friend's 13900K system I set up for him. He's reported zero issues since setup. Frankly, these high core count CPU's are begging for a heat offset with a slight undervolt; no more than -100 to -135 mV, because otherwise it becomes unstable. It would probably be fine with a 5.3GHz overclock on all P cores during gaming since the motherboard is regulating 235w max just fine. The key is maintaining no more than a 1GHz delta between all cores and NOT sticking to stock voltage. The whole 6GHz on P cores advertisement by Intel is fraud IMO. Yeah, you can reach it, but you can say goodbye to your CPU sooner or later as a result.
The problem with this is for people who don't have an invoice, like me (I blame myself for buying second hand). The second players here are mobo manufacturers, which allowed insane levels of power and voltage by default for unlocked chips, and the third and most important one is Intel, who surely fucked up in the manufacturing of the chips. Bought almost on release day; at the start the chip was working well, but after 3 months of use things started to go nuts. Everything crashing on Linux, even the most basic things, not to mention games. Mine is unstable at stock settings (even Intel's Defaults), so after reading the news and tons of subreddit posts, I went with reducing the max frequency allowed in the BIOS (turbo ratio offset) to the minimum allowed (which by itself helped but didn't solve the problem), and disabled all thermal and voltage enhancements (TVB, etc). That by itself dropped the voltage from 1.1v to 0.8v, so I upped the AC loadline so chip voltage was 0.85v, and since then, with all those settings, it seems pretty rock solid (but I ended up with a 13900T instead of the 14900K which I paid for).
Maybe ask the original buyer to provide you his, plus the transaction that happened between you two. I rma'd a cpu with AMD and they were cool with that. Cpu was bought for me by a different person due to regional availability and overall cost at that given time.
Remember when one SKU of AMD chips melting because of motherboard configurations was a hot topic? Maybe the algorithm is really tailored, but I got A LOT more of that than I ever did of these two generations of Intel just slowly going back to sand.
@@Peterowsky The AMD issue was much more obvious. Some still consider AMD to be a cheap, unreliable alternative to Intel. (Well, the tables have turned, it seems.) Intel's situation is more subtle, as no physical damage is present, and the user can always be blamed for handling. I believe they are trying to do damage control and downplay the situation. New chips will be coming out after all. However, it seems the issue is so widespread that they won't be able to avoid it.
Who's going to trust Intel after this? I thought they may be on a path to properly compete with AMD CPUs, but now I don't even know if I want to risk getting an Intel CPU even if they do end up getting the tech right.
@@ZombieLincoln666 They get the faster speeds through sheer brute force. Power efficiency is important even for desktops. More power draw means hotter room unless you have a Linus-level setup that dumps the heat outside of the living space.
Intel PR posted an update to... reddit. So check that out, update your bios, and look for another update to your bios coming mid august 2024!
www.reddit.com/r/intel/comments/1e9mf04/intel_core_13th14th_gen_desktop_processors/
tho they seem to have deleted some of the better questions about whats covered under rma, are folks that have delidded covered (guessing: lolnope), etc.
@@Level1Techs they posted the update to the Intel website. But yes it’s clear most of you guys get your info from Reddit…
Intel left out the oxidation issue on the website, which is real, interestingly
@@Level1Techs Sounds to me like they're trying to cover their asses. They're focusing on the voltage issue because that's something they can fix.
The oxidation problem is a manufacturing defect, and that potentially means the dreaded R word - RECALL.
does this also apply to i7 14700 hx laptops ?
@@Level1Techs sh8 i just ordered a laptop with a 19400hx on the acer helios 18...am i screwed? considering i live in the phil and my sister bought it for me there in the u.s...yikes
Quick we need to throw Wendell in witness protection before the Intel hitman finds him!!!
good thing he isn't looking into boeing planes right now
Dr Su needs to send some AMD Bodyguards XD
At least he isn't talking bad about Boeing.
He probably will have BSOD long before he reaches Wendell
Intel is not Boeing, yet.
"AMD Is In the Rear-View Mirror"
- Pat Gelsinger, Intel CEO
"Intel CEO’s Comments of AMD ‘in the Rearview Mirror’ May Indicate Intel is Driving Backwards"
- Areej Syed, Hardwaretimes
😂 lmao
Good one! Thanks for sharing.
I saw a funny meme cartoon of Pat driving towards a cliff and Lisa driving away from the cliff. Lisa's car is of course in the rearview mirror from Pat's perspective.
v-cliff
more like flipped over on the road :D
lool
Actual investigative journalism instead of talking about cpu's for 20 minutes just like 99% of other tech channels. Quality content.
Heck yeah
True, I hope this gets more appreciated and we get more
Could be worse, if I had a nickel for every time someone just had to add some sort of awful music to a video I would be rich!!
Same avatar! :D
ummm i saw literally ZERO sick memes here soooo... kinda suspect reporting bro?
but fireship just pontificated on the issue and showed speculative-non-primary-source concern AND SICK MEMES, so therefore, issue confirmed
Intel forgot how to make fast cpus, then they forgot how to make efficient cpus, and now they appear to be forgetting how to make properly operational cpus... Not a good character arc...
heh
character "Arc"
@@countvonthizzle9623 nice flamebait, if only AMD's CEO wasn't a woman, and her tenure wasn't wildly successful...
@@DarioCastellarin women arent diversity
oh boy i hope this aged well
Typical too quickly developed too little tested, lack of QA because the new lineup has to be released and delivered to keep up with the competition and not upset the shareholders behaviour. I wish we had fewer stock corporations again. A world in which profitable companies are devalued because they don't constantly increase their profits is completely f'd up.
-Talks about a real problem
-Explains what he found in true objectivism
-Admits data is not big enough
-Properly digs through the problem
What a gigachad
Terrachad
Yeah, first video I saw of him and he's smart, sympathetic and accurate. Should be the new president.
Yeah, noting limitations in data lends lots of credibility and we can still probably draw some conclusions as he did about it. Mega based.
Yottachad
You should publicly mention that you are not suicidal in case Intel approaches Boeing for advice and business cards.
Mad respect to the person who kept playing even though they were crashing every 2 hours, I feel them on a spiritual level
This was me playing Remnant 2. I would play about 4-6 hours a day and would experience at least 2 or 3 crashes straight to desktop.
That was me lol. I got my replacement CPU 2 days ago
I've had games with bad enough memory leaks that you pretty much need to reload them every 2-3 hours or the game will bog down and eventually crash. Common problem with specific operations in Unity. All you really need to fix that one is a cleanup subroutine - which I actually talked one developer into implementing during Early Access.
My old ass laptop crashes every 30 mins while playing Roblox :D
Player: I've got you now Pirate King!
PC: (Crashes)
Player: Ahhhh you thought a mere catastrophic failure could save you?! Well prepare to doom your DOOM!
LMAO
Feels like Intel wants to drag out the problem till next gen launches and play the Spider-Man meme till then
Intel is hanging on for dear life trying to compete with AMD. They'd rather take big risks that can backfire on their customers, then try to silence it, than be honest about the limits of their CPUs and lose their high spot in the benchmark charts.
That feels way too real...
Exactly, there is no fix for what is plain old silicon degradation caused by too-high voltages, clocks, temps and amps pulled through the silicon long-term. There is no software fix, there is no BIOS, there is no microcode that will revert this once it's already degraded. The only fix is "do not exceed 1.3v and clock them 500-600MHz lower". And also, do not clock your CPUs so high that they have less than 5% headroom left in them, so that even the smallest degradation throws them into the "unstable on stock settings" category. Intel is just making smoke, hoping the attention will go away so they do not have to do the thing they should - a full-scale recall.
@@seeibe This. Consumers are used to seeing i7 at the top of the chart and Intel will break bones to keep it there, because they desperately need to keep its brand power.
I just had to leave a bad review for a "i7 Desktop" selling in my country's Amazon with no GPU for $900(!), because I noticed that it had a HD4500 and realised nobody else could work out that meant it was an 11 year old PC. They just see "i7" and buy.
It's those idiots that Intel made the 14900 for. So Google still says "i7 is fastest" and the plebs carry on buying it.
This is me when I was 14, I saw i7 and chose the crappy HP Pavilion HPE H8-1103NL with the i7 2600k haha
I work at a gaming marketing agency in the IT department, and I have been absolutely losing my mind over these issues. I have built ten systems over the last six months, and had three of them experience various failures. One of the more consistent problems I've seen has been system corruption, where I'm unable to even run simple Windows updates, SFC, DISM -- you name it. Recovery media fails, fresh installs on new drives fail, and the crashes in games have gotten utterly out of control. Thank you for this, it's definitely incredibly helpful for tracking our OWN sanity.
Takes five minutes to downclock.
@@chrisx742 Appreciate the thought, and while yes, it does, and we did -- used the Intel XTU, didn't go above 52x, dialed in Intel specified limits all across the board, and had little to no consistent improvement, but definite performance loss. Tried the Intel Processor Diagnostic Tool and failed consistently across all settings. RMA'd one 14900K already, popped in a spare 12900K while waiting for the replacement with none of these issues with all the rest of the same hardware.
The point ultimately is that we shouldn't need to do this, and it's infuriating to have to fight with multi-billion dollar companies to get what we're paying for, and have it actually work consistently.
@@chrisx742 the fact that you would have to do that is hilarious.
What motherboards used? And ram?
@@chrisx742 The fact people are so lazy these days is proof of how incompetent most computer users these days really are.
Intel saw Boeing's crash rate and said *_CHALLENGE ACCEPTED!_*
hahaha
Hold my beer....
Ba DUM!
As a former aviation maintenance tech and lifelong computer nerd, I approve this message.
I won't fly commercial and I won't buy intel. Both Boeing and intel are a complete disaster.
Thanks cold LMAO
Just want to point out this is the kind of investigative work traditional tech journalism doesn't do anymore. Thank you so much for this work!
Steve from Gamers Nexus: "Excuse me?"
@@noxious89123 Are you saying GN is traditional tech journalism?
Frame Chasers 7/14/24 - problem described and solved
When server providers are starting to actively recommend AMD systems after YEARS of Intel being the "safe bet", this is the point where Intel should start sweating.
Because mind-share is a huge factor in what gets bought in large volumes.
Good lord, the analysis and access is just swell. More, please. (Just don't get Boeing'd for your trouble)
"Boeing'd" 😄
If WotC can send the Pinkertons then Intel should be *_looks over shoulder_* I gotta go, guys...
I love that that is a verb now.
We must not lose Wendell 😭🙏
Let's hope Boeing doesn't use K series CPU's in their aircraft.
I agree with most of your statements, as I work for a small SI. I stated on the initial videos a few months ago, when the fingers were pointed at motherboard makers, that the problem could be with Intel processors. We mostly provide 14900K systems to businesses, on both premium and entry-level B760 boards. So right off the bat, no O.C. Next, we disable motherboard enhancements and limit power to Intel stock. And some boards are weak and can only do Intel stock power. Failures have happened in all these scenarios. These systems can run Cinebench all day long, or the OCCT stability test, or AIDA64. But try to install the Nvidia driver (compression/decompression) and the installer fails. Another way to test this is the Tekken 8 demo. No matter what BIOS we choose, those processors are gone. Replaced. There is no coming back from this. It has happened with DDR4 and DDR5 systems.
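The kind of compression/decompression failure described above can be approximated with a simple round-trip check. This is a minimal sketch, assuming zlib as a stand-in workload; it is not what the Nvidia installer or any game actually runs, and the function name and parameters are made up for illustration.

```python
import hashlib
import os
import zlib

def roundtrip_errors(rounds=20, size=1 << 20, seed_data=None):
    """Compress and decompress a buffer repeatedly and count mismatches.

    On healthy hardware the count should be 0. zlib is only a stand-in
    for the heavier decompressors mentioned in the thread.
    """
    if seed_data is None:
        # Half random bytes, half zeros, so the compressor has real work to do.
        data = os.urandom(size // 2) + bytes(size // 2)
    else:
        data = seed_data
    expected = hashlib.sha256(data).hexdigest()
    errors = 0
    for _ in range(rounds):
        try:
            restored = zlib.decompress(zlib.compress(data, 9))
        except zlib.error:
            errors += 1  # corrupted stream detected by zlib itself
            continue
        if hashlib.sha256(restored).hexdigest() != expected:
            errors += 1  # silent corruption: output differs from input
    return errors

print("round-trip errors:", roundtrip_errors(rounds=5))
```

A clean run says little, since the comment notes these systems pass Cinebench and OCCT; a non-zero count on otherwise known-good RAM and storage would be worth following up.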
Exact same issues here. Waiting for my RMA CPU to arrive as I type this.
@explosivemonkeys I hope you get yours soon. We are not in the U.S. so our rma is a couple of weeks with Intel. But once the system is diagnosed with this problem at the service centre, we replace it with a new processor and return the system back to the customer. So, for a customer, it is basically 2 to 3 days down time. I am also hearing stories of Intel denying rma for these cases. That would make this worse.
The NVidia installer was a big thing for me too. Would crash on install about 95% of the time. At the start I figured it was a corrupt Windows install or maybe a failing HD. Almost dropped a lot of extra money to start replacing components before I started to see all the other issues out there, which thankfully saved me.
@@explosivemonkeys my i9 13900K is unstable if i set more than intel's baseline specs, Do RMA covers me?
Just so it's clear - did replacing the CPU actually fix the issue? Was it properly verified, taking a system exhibiting the problem consistently, replacing only CPU and then testing again with the same exact test and no problem found? Asking because I was around in the past for the FDIV bug in P60/P66 by Intel, where you could actually even get it to calculate wrong in Excel using a specific formula to prove the bug. Initially Intel said there is no issue, then they made RMA's for all who complained trying to shut them up - or at least it looked like that is what went on, then finally they had to stop all the BS - and rework the CPU and announce that people who had affected chips could register to get it replaced with the new fixed chips shipping out as they were coming off the production line. I had one of the FDIV bug P60's for years as a memento of that cluckerf*ck but I traded it eventually for a DEC VAX 11/750 in 1997-8 something like that. Shoulda kept it.
Point being with the FDIV bug you could not replace your way out of it until Intel made fixed chips, 100% of the P60/P66 initial production were affected. You could kernel patch your way out of it, but it was a software workaround for a hardware problem.
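For reference, the widely circulated FDIV check divides 4195835 by 3145727 and multiplies back. The sketch below reproduces that arithmetic; the function name is made up, and the flawed-Pentium figures in the comments are the commonly reported ones, not something this code can reproduce on a modern CPU.

```python
def fdiv_check():
    """Return the quotient and the residual of the classic FDIV test case."""
    x, y = 4195835.0, 3145727.0
    quotient = x / y
    residual = x - quotient * y  # should be essentially zero on a correct FPU
    return quotient, residual

q, r = fdiv_check()
# A correct FPU gives a quotient of about 1.33382044913624.
# Affected Pentiums were reported to return about 1.33373906890204,
# which leaves a residual of about 256.
print(f"quotient = {q:.14f}, residual = {r}")
```

The contrast with the current situation is that the FDIV error was deterministic: the same inputs failed the same way on every affected chip, which made it easy to prove.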
The number of Intel fanboys trying to trash Wendell is hilarious.
Never stop calling out anyone doing shady business stuff!
I don't think he cares. He/dr ian. said in lv1 & tech2 potato podcats.
15:05
"We already replaced a lot of customer's 13900k with 14900k and the issues don't seem fully resolved."
This statement is extremely telling.
14:56 "$1,000 extra" for support is insane and really tells the whole story here. The part is not reliable. I am sure a lot of conversations are happening behind the scenes here, but based on what I've seen, Intel has not committed to fixing the issue. We can hope that it is due to incompetence but more likely they do not want to admit fault here due to the cost of "making it right"
Thanks for sharing this Wendell, it will be interesting to see other news around this topic in the coming weeks.
oh yes, instead of issuing a recall, lets damage even more the brand name
It's like how insurance companies are refusing to insure houses in certain states against extreme weather. When even the Capitalists charge more or refuse to cover something, that's a sign that there's a real problem there.
@@monad_tcp The issue is that Intel itself doesn't really know what to fix, and currently the most reliable way to make it run stable enough is to lower the multiplier to something like 53 and run the RAM slower. If that is what it takes, then the CPU wouldn't run as specified, which would also invalidate Intel's marketing around the CPU, so even if Intel wanted to do a simple recall and give the customer a fixed product, it can't be done. Assuming running it slower actually does fix the problem, Intel would probably need to replace the product, force the CPU to run slower, and give some monetary compensation.
I think Intel owes every 13900k and 14900k owner a new 15th gen CPU since this is a defective chip.
I don't think it's about being committed to solve the issue, as it is that they have probably realized there is no fix, and they can't afford to recall two whole generations of product. That would kill the company. Trying to mitigate it through microcode and power limits is probably the best they can do until getting a 15th gen out, and it might even be too late to fix the issues for that one, so they're really just holding on for their life for a 16th gen.
The fact its happening in game servers is pretty crazy and an entire 1000 dollars extra for support.
That's capitalism for you
@OllieHamon so? A company should charge more to work w products w more problems. I do....up to 10x more of my usual150 per hour rate.
the extra 1000 for supporting a cpu that is less than 1000 implies a very large support time sink and replacement cost ...
@@rezenclowd3 That's a really dumb way to do business, people talk to each other. It's like if I started pumping out car parts, let's say a water pump that kept failing prematurely and blamed everyone but themselves and tried charging for repairs but knew it was self inflicted the whole time... Oh wait, my Golf R had that issue. Oh, there was a class action lawsuit? Nice.
@@Dendodorion
I mean it does make sense if you aren't the company producing the faulty part in the first place.
Charging more because something is worse to work on seems fair to me.
The mighty days of Intel supremacy have long been over. Now they are trying not to be mediocre.
25% is a ridiculous failure rate for the high volume OEMs. Wonder if intel is getting back the failed chips to do some failure analysis on them with test boards and thermal imaging or something.
So far they're taking the failed cpu's and sweeping them under a very big rug.
@@pf100andahalf maybe that rug will come up in 40 years on an asmr “intel rug cleaning” video 😂
Sounds like it's a dynamic clocking bug getting stuck into a strange mode, causing crashes.
My guess is it's simply that power/frequency limits are pushed beyond what the platform can reasonably support, which means this was an intentional choice intel made. It is a known workaround that underclocking and reducing power limits increases stability.... at first. But once it starts happening, it seems that the damage is done, and it's only going to keep getting worse.
@@rich1051414 I think you're right.
Damn does it feel like I dodged a bullet by going with AMD. My 7700X-based system has been rock-solid stable.
real. I love my 7700x. Such a good chip. I can Craft so many Mines with it
I bought 2 7900X3D's for 100 GbE testing on Supermicro H13SAE-MF. I guess I dodged the Intel enshitification bullet. Meanwhile, my AliExpress craptastic and old Xeon rando recycled parts impulse buy laughs in ECC.
Ryzen is an awesome cpu. I have a 3600 ryzen 5, I have had it a few years. Absolutely rock solid
@@andrewdonohue1853 I have the 2600x, also rock solid
I remember a time when people would say to go with Intel, it's a much more stable platform, how times change eh lol.
Ironically the only build I ever had so much hassle with was the one time I decided to jump ship over to Intel back in 2015 with a 6700k skylake cpu due to the crap ipc the fx8350 had that I was running.
Always used amd other than that from the athlon days in the early 2000s through to a number of ryzen builds & I've literally never had any problems, all the amd builds have just worked, currently running a 7800x3d.
When AMD released its 7000 series, I went with a 7900x. I had buyer's remorse when I saw the better value of Intel 13th gen. But with this situation now, my mind is totally free from that feeling.
you can upgrade to a 7800x3d or a 9000x3d chip when they come out, without having to worry about the CPU killing itself.
@@chubbysumo2230 Well, I am not upgrading unless my current CPU ceases to function 😅
@@klwqwelp my intel 13th gen is dying… time to go team red
AMD X-series chips are always a scam on price/performance. If you want to buy AMD, better to wait for the non-X version or the 3D version
@@sandiaswara1940 Yeah for my use case the 7900 would have been perfect actually
Heys, I'm one of those who were "tremendously suffering" from the instability, to the point where regular programs such as Discord, Chrome, Davinci Resolve, and games were crashing left and right. I just RMA-d my i9-13900KF. Intel already knows that shit's going down because customer support didn't put up too much of a fight and went straight to the point - RMA. Unfortunate, but it is what it is. Purchased a secondary, i7-12700K (LGA-1700 slot was needed so that I don't need a new MOBO) and it feels like a breeze of fresh air not having to put up with the constant instability.
Couldn't you also return the mobo, and go AMD instead?
@@MrXelaim No.
1. I'm a contractor (work from home) and cannot afford to lose a single day of work for making such a switch. This was a carefully orchestrated RMA process where I timed my CPU replacement for the same day the courier would come for the RMA CPU.
2. The motherboard is not faulty. Buyers protection in my country only covers the first 3 days to return a product without a good reason and this is a year old system.
@@repatomonor21 That's terrible buyers protection
@@owlmostdead9492 Why though? You still have a time window to return a product if you changed your mind. Otherwise, wouldn't people ignore warranty and just return age old products left and right out of greed?
@@repatomonor21 We have a 14 day return period, 3 days is not enough to evaluate a lot of products
Intel PR department: Lets talk about Lunar Lake instead. What a great CPU that is.
Lol that's the Intel way
Also Intel: Muh AI
So glad I got a Ryzen back in 2021. I couldn't get a 5900x, but my 5800x has been running like a beast, couldn't be happier.
Right smack in the middle of the Raptor Lake 8+16 die is the ring interconnect fabric, with a ring agent interconnect per P-core and an agent per E-core block. Given you’re seeing a mix of IO related errors, including the out of VRAM errors (which is likely triggered by a PCIe operation failure), my gut says this is the system agent logic in a bad-state, specifically due to failing ring signaling at one or more ring agents or in the system agent itself, blocking and going through some form of error / bad-state recovery. My guess would be that this is due to electromigration degradation of the ring interconnect logic due to the very large Raptor Lake 900MHz top end ring clock bump by Intel, versus Alder Lake. This is why we’re seeing these issues in 13th and 14th gen, but not 12th gen.
What’s concerning is that if this is where the problem is occurring, then Intel appears to have not implemented internal telemetry, or not exposed it via the CPU or another system driver, to drive WHEA reporting around the ring interconnect and the system agent. Given the complex nature of the SoC, with its asymmetric architecture, many clock domains and sophisticated “system agent” IO block, that would be bad if true.
If the ring causes those issues, I wonder if raising ring voltage could solve those issues or not.
For example, I have had two 13th gen i9s: a 13900K that I sold, and my current i9-13900KS. Both run stable at 253W, 400A IccMax with BIOS default settings (I set my power limit lower than 253W initially due to having a 240mm AIO). I did turn off MCE in the BIOS and set a custom load line to undervolt the chip. I used the previous chip for about a year with no noticeable degradation. I run both chips with an SA voltage of 1.35V for my DDR4 RAM OC.
Both chips have run mining software, Prime95 and other stress tests.
I suspect that where the degradation happens, most people did not configure a power limit and let their chip run at high voltage with high current, and that causes fast degradation. Also, their silicon quality is probably average or below average in the first place.
@@kevinzhu5591 it probably would further accelerate degradation of the ring bus and lead to even more crashes
my bet is that intel has a design flaw regarding ring bus which got exposed by their lack of regulation regarding power profiles which slowly cooked ring bus on stock speeds and fried it when people did do some overclocking
@@xthelord1668 that is tragic. I would hate to pay exorbitant prices for a faulty processor
@@kevinzhu5591 Wouldn't it make more sense to lower ring voltage and clocks a bit? It shouldn't make a huge difference in terms of performance to reduce it by 300mhz or whatever.
@@ahs9674 Doubt it would help. 12th Gen has had problems with the asymmetric uarch (minor and not often, but I've encountered them way too many times...). 13th & 14th not only suffer from that, but you have to keep in mind that they do bend, and that they are made using the Intel 7 process, which took ages to be "mature" enough to even be considered viable for manufacturing, and so much more.
There might be a reason (beyond being behind with their own foundries and R&D on advanced nodes) why Intel is hell bent on using TSMC instead of their own process for CPU tile.
14:58 - HOLY BALLS!!!! That’s crazy!!! When the data center operators notice issues and you know they collect data…something is up for sure!!!
I'm surprised they didn't leak this earlier.
This is going to drive companies to AMD much faster than before.
@@aminorityofone About 30 seconds later, we're pushing our customers towards the 7950X platform. Which is exactly what I would be doing in their position.
I got my 13900KF 4090 build last year thinking it would last 8 years. Not knowing it'd be having such issues within a year...😂😅🥹🥲😥😢😭
Yeah -- this is somewhat surprising from intel. This is amateurishly like the late 2000s AMD graphic card performance. I bet its heat problems from bad caps or arrays or interdigitated caps. If its design then, that's really really going to be surprising. I've fixed my AMD processor from swapping out dry caps before.
AMD: We need a fastest and better CPU
Intel: we need more anti-competitive strategies
Intels decades of fuck you buy our shit has not changed
And they're likely about to get completely shafted by ARM with the mass migration of both datacenters and consumer computers to ARM chips (or other RISC designs) due to the huge drop in power consumption that ARM brings.
@@samwalker7567 Yep. I wouldn't buy any PC right now if you can hold off. Things are changing, and ARM seems to be poised to dethrone x86 in the coming years. Best to wait and see what the industry looks like once the dust settles.
@@darthmelbius Fanatics, I guess.
@@dschwartz783 Disagree, if the transition happens it will: a) take years to acquire significant share in the desktop market, and b) there will still be a large marketshare of x86 for years after that happens, so support will remain for quite some time after. Makes no sense at all to hold off buying a PC in anticipation of this platform.
@@prw56legacy compatibility won't be a problem; the Apple M1 has proven that an appropriately designed ARM processor can competently emulate x86-64.
On the warranty repair side I have seen a major shift in the willingness to accept that an Intel CPU might be faulty. It went from a multi-hour process to get a processor RMA approved to a self-approved process. I just order the CPU replacement and I'm done. It's akin now to ordering a replacement memory module or SSD under the warranty process (this is at the OEM level, not with Intel).
Talking to other techs, we have seen a HUGE increase in CPU failures - almost always seen as general instability with a longer running process (which is a really hard thing to verify during a warranty service, so we do typically take the customer's word for it). I have tracked down the problem on a few of them to the memory controller contact pads - often you can even see visual discoloring of the pads from prolonged heat. The instability in those cases seems to come from the CPU getting hot, changing shape, and the signal integrity to the memory modules shifting... then BOOM, you're unstable... until it cools a little, then you're okay. Basically, I think the problem is the shape of the CPU.
I think this comment needs more attention, it's the only hypothesis that accounts for all the observations. I'm convinced this is the case too.
Is this also affecting the 14900hx laptop cpu? I just got a laptop with one and could probably return but got a really good deal and has a good mini led screen and would get so much use out of it for the whole almost a year till next model laptops are out with the 50 series rtx so really don't want to return.
@@WaterspoutsOfTheDeep
The Laptop Cpu is soldered, not socketed
back in the days the cpus were binned with a way looser margin of error, in some cases you could easily make the cpu run 25% faster with a bit more of cooling, but once they caught wind of overclockers pushing the cpus so hot, they started to push power into them themselves, to make these margins extremely tight and cpus that are edging being defective are still sold, most people don't stress test their new cpus so by the time they realize they are defective it's too late and they get away with it
I nearly cried watching this. Someone else that has spent countless hours going over crash logs!!! Someone else has felt my pain!!! I'm not alone!
Anyone considering a job in IT.... This is your possible future!!! And it's a thankless job.
My god this is so useful. I am on my third 13900K in 18 months. First one ‘lasted’ a year (3 months of that was me getting more frustrated trying to diagnose wtf was happening as more and more and more games failed). The pain was that it came out as IO errors, memory errors, everything except the CPU. In the end I found a forum thread where someone mentioned it.
I got it swapped under warranty and the second one began to fail identically 5 months in. Third one is still quite new but if this goes as well I’ll be sharing all of this with the builders and telling them they can either refund the entire PC or put something else back in.
I have no overclocking. The games that crashed were often games that a 13900K wasn’t even remotely needed for - like WoW - and it was absolutely infuriating
Good luck with the 4th RMA.
Have you come across anything mentioning if this is happening to the laptop 14900hx cpu?
@@WaterspoutsOfTheDeep refund the laptop
@@matyasselmek3673 they are fine, the laptops aren't affected
@@matyasselmek3673 evidence so far shows laptops are not affected
My 13900k was so unstable after about 1.5 years that I couldn't copy data off an SSD without getting a BSOD
Did you try a live Linux USB stick? Just to rule out Windows is the culprit.
lmao that's an amazing product
It's cooked 💀💀
Almost 2 years with 13900KS , since day one.. power unlocked.. 300 plus watts.. no issue whatsoever.. your point is ?
@@Need4FPS "I had no problems, therefore the problems don't exist!"
This guy's an idiot.
I am surprised people even buy intel.
Even without this issue - you buy Intel and 6 months later 50% of the performance is gone to fix all the security issues.
The 14900K chips are degrading insanely fast. My 14900K on day one could run at stock frequency with a -0.075 V offset on the last 2 steps of the V/F curve, without any WHEA errors either at full load or idle. Every 2 months or so, I would randomly find a couple of WHEA errors which were instantly fixed by increasing the previously mentioned voltage offset by +0.05v. This week WHEA errors began showing up again. Now the CPU needs to run at stock voltage, as it is unable to handle any negative voltage offset whatsoever.
What kind of cooler do you have and what temps would the chip run at full load before it started to worsen?
Source: trust me bro
@@douglasmurphy3266 I'm using an IceMan Direct Die waterblock with an external MORA3 420 radiator. My 14900K max core temp at full load is 70C.
Temps are not an issue.
@@genejones7902 There are specific WHEA error codes that are related to CPU errors. Whenever I get them every couple of months, increasing the voltage slightly is the only way to make them go away. So yeah, it's a CPU issue.
@@CyberneticArgumentCreator Username checks out.
One of the biggest take-aways from this video is the price difference on service agreements between a 7950X ($139) and a 14900k ($1,280) server - "3 years parts & labor, 24/5 - Next business day onsite repair - zone 1".
For the 7950X, that isn't covering the cost of a single hour onsite repair technician.
For the 14900k, that is slightly less than the cost of an Asus Pro WS W680-ACE ($330) + a new 14900k ($600) + 4 x 48 GB of G.Skill Ripjaws DDR5 5200 RAM (2x$190) (total of $1,310). You can basically do a complete hardware replacement, AT RETAIL COST, for the price of that service agreement.
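The comparison above can be checked with a few lines of arithmetic. This is only a sketch using the prices quoted in the comment; reading "(2x$190)" as two 2 x 48 GB kits at $190 each is an assumption.

```python
# Retail prices quoted in the comment above (USD).
# Assumption: "(2x$190)" means two 2 x 48 GB kits at $190 each.
parts = {
    "Asus Pro WS W680-ACE motherboard": 330,
    "Core i9-14900K": 600,
    "4 x 48 GB DDR5-5200 (two kits)": 2 * 190,
}

# 3-year service agreement prices quoted in the comment (USD).
service_agreement = {"7950X": 139, "14900K": 1280}

retail_total = sum(parts.values())
premium = service_agreement["14900K"] / service_agreement["7950X"]

print(f"Retail replacement of board + CPU + RAM: ${retail_total}")
print(f"14900K agreement costs {premium:.1f}x the 7950X agreement")
```

On these numbers the 14900K agreement is $30 short of a full retail replacement and roughly nine times the price of the 7950X agreement.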
If 50% of CPUs are failing one way or another but the other 50% are running perfectly fine, I would look closely at production, starting with the silicon provider and its quality.
I just checked the latest batch, it sucks
The smaller you go, the more it's a chip lottery. That's why I don't see the point in overclocking. "Oh, I can get 6 GHz on water." Yeah, and how long does the CPU last? It's the equivalent of a top fuel dragster being pinned at its max breaking point. Intel has pushed the silicon far enough that it's getting bad yields: some chips are good, a lot of chips are degrading faster because the whole chip is getting intense heat. That's the problem. 1 nm chips are going to have problems; we need to move to a better base substrate, like graphene or carbon substrates.
Could you get the AC/DC LL settings of an unstable Supermicro board?
The crash rate is just absolutely unacceptable especially for OEM and datacenter.
Crash rates measured in weeks are not acceptable for consumer products. If it's measured in months you still have an iffy product. If you see game logs where crash intervals are close to hourly... the whole house is on freaking fire!
Standard SLA in the industry is 99.9999%, so you can have about 30 seconds of downtime per year.
With this high of a failure rate keeping that level of SLA is simply impossible.
Even for the cheap providers that have just 99.99% SLA keeping the downtime to 5 minutes per month might be hard.
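For reference, the downtime budgets implied by those availability figures can be worked out directly. A minimal sketch, assuming a 365-day year and a month of one twelfth of a year; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # assumption: average month


def downtime_budget(availability_pct, period_seconds):
    """Seconds of downtime allowed in a period at a given availability (%)."""
    return period_seconds * (1 - availability_pct / 100)


six_nines_per_year = downtime_budget(99.9999, SECONDS_PER_YEAR)
four_nines_per_month = downtime_budget(99.99, SECONDS_PER_MONTH)

print(f"99.9999%: {six_nines_per_year:.1f} s of downtime per year")
print(f"99.99%:   {four_nines_per_month / 60:.2f} min of downtime per month")
```

That works out to about 31.5 seconds per year at six nines and a little under 4.4 minutes per month at four nines, in line with the rough figures quoted above.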
@@hubertnnn that's the marketing and contracts. In actual practice it's often lower than that for pretty much all hosting providers, even big ones like Google/Amazon/Microsoft or OVH.
Source: trust me bro. They refuse to acknowledge to us a bunch of small incidents and stuff during their own upgrades so they can keep claiming such high SLA wasn't breached
But to be fair, datacenters shouldn't even use consumer-class CPUs. One of the biggest differences between consumer-class and enterprise-class CPUs is ECC RAM support. If you want 24/7 operation and stability is critical, ECC RAM is a must-have feature which only Xeons can provide.
And it's a lot worse than this video says, because when he says "25% of systems are having errors" he's not counting all the ones that are already dead or RMA'd, because they can't produce bug reports... If we assumed even half of the buggy chips go back to Intel at some point, the math would put it at at least 33% dead, which is absolutely disgusting.
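The correction described above can be sketched as follows. The model is an assumption, not something from the video: a fraction of the faulty chips has already been returned and has simply dropped out of the sample, and nothing else distorts the observed 25%.

```python
def true_failure_rate(observed_rate, returned_fraction):
    """Estimate the real share of faulty chips.

    observed_rate:     share of systems still in the sample showing errors
    returned_fraction: assumed share of faulty chips already dead or RMA'd,
                       and therefore missing from the sample

    Derived from observed = f * (1 - r) / (1 - f * r), solved for f.
    """
    return observed_rate / (1 - returned_fraction + observed_rate * returned_fraction)


for returned in (0.0, 1 / 3, 0.5):
    rate = true_failure_rate(0.25, returned)
    print(f"returned share {returned:.2f} -> true failure rate {rate:.1%}")
```

Under this model a third of faulty chips going back gives about 33%, and half going back gives 40%, so "at least 33%" holds either way.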
"The system just gets miserably slow for up to a minute before an actual crash"
...uh oh. I don't get crashes in games, but I *do* get this kind of behavior from time to time just at my Windows desktop on the 13900K if I leave it running for a long period of time (like, more than 1 day). I'd start getting weird things like programs hanging in the background, Explorer not responding to inputs, even CTRL+ALT+DEL not working. It never actually has a hard crash, but it never recovers either. Only fix is a full restart, or a hard reset at the power button. Guess I can expect this to get worse?...
I can't recall an issue like this since Sun accidentally shipped radioactive chips that were unstable.
I mean, Intel had to do a massive recall back in the day (1990s?) when their CPUs literally couldn't do simple math instructions consistently. They didn't exactly fail, they just never worked in the first place because of a critical design flaw.
I'd like to know more. Do you have a link ?
@@nessotrin If you Google it you might find some information. This was back in the early 90's. The company I worked for had the issue with our SPARC system. SUN made customers sign an NDA if I recall. I don't recall how widespread it was.
I did a search and only found limited mention of it from some old forum posts...
they WHAT
@@nessotrin I think it's this: en.wikipedia.org/wiki/Pentium_FDIV_bug. Its floating point math would be off by a small amount, but that error could compound and cause problems depending on the calculations. I recall a professor saying he had to redo part of his thesis due to this problem.
This is why you make a statement. The speculation machine's rolling now.
*you’re* a statement!!
I suspect Intel themselves probably don't know what's wrong.
@@zodwraith5745 Or they do know, but it is a hardware problem that they can't patch with a microcode update.. So they just try to play the long game until their next gen comes out.
@@kyu9649 I'm not saying that's not a possibility, but to automatically assume as much would be highly unfair. Intel is WAY too big a company to cover up the kind of conspiracy theory you're insinuating without a leak. They haven't done anything to try and cover it up and have acknowledged it publicly. They just haven't provided a solution which just circles back around to my previous comment. They have a hard enough time keeping a lid on leaks they DO want kept secret like future products. This would be way too juicy for a leaker with all the coverage it's had.
@@zodwraith5745 they know, they're trying extremely hard to patch it via microcode so they can come out with a fix at the same time they acknowledge the failure.
Welcome to the age of publicly traded companies, their shares will tank for sure, they're just trying to control how much they crash.
The absence of AMD data is also data
kind of... a lot of statistics don't like zeroes e.g. MTBF
@@greebj the lack of failure data points to AMD chips not failing in the same ways, in numbers sufficient to record.
@@greebj but we don't have a zero. We have an insignificantly small number: iirc 4 out of ~1500 documented failures, or roughly 0.27%, when 30% of the population use AMD, is a clear indicator that this is an Intel-specific issue.
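To put that ratio in perspective, here is a rough sketch using the figures as recalled in the comment (4 of ~1500 failures, 30% AMD share). It assumes failures would be spread in proportion to market share if the problem were not vendor-specific, which ignores sampling effects such as the survivorship bias mentioned elsewhere in the thread.

```python
total_failures = 1500      # "~1500 documented failures" (approximate, from memory)
amd_failures = 4           # failures attributed to AMD systems
amd_market_share = 0.30    # share of the population said to use AMD

observed_share = amd_failures / total_failures
# Assumption: a vendor-neutral problem would hit AMD in proportion to its share.
expected_amd_failures = total_failures * amd_market_share
underrepresentation = expected_amd_failures / amd_failures

print(f"Observed AMD share of failures: {observed_share:.2%}")
print(f"Expected AMD failures at 30% share: {expected_amd_failures:.0f}")
print(f"AMD is underrepresented by a factor of about {underrepresentation:.0f}")
```

With these inputs, AMD would be expected to account for around 450 failures rather than 4.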
He already mentioned survivorship bias; look it up if you missed it or aren't familiar with it.
I’ve had so many bizarre issues lately and I’m now wondering if they’ll disappear if I swap back in my 12900k instead of my 13900ks 🤔
A couple of them highly repeatable but one that others can’t reproduce
Good to see you here!
Unreported internal high temperatures can lead to undefined behaviour. High current leakage can be interpreted as a wrong bit.
From the sound of it... yes. Put your not-just-a-refresh 12900k back into service.
@@ThioJoe Just try it, for science!
It won't be that much of a performance drop, and you'd hopefully get a more stable system out of it until this is resolved ;)
Post your findings on the L1 Forum if you actually do it :D
Try it and make a video about it!
I work in commercial IT, we have had 30+ PCs where the intel CPU essentially died, this has been affecting them all from 10th gen to current. Recent one was a month old 12th gen 1255U. Affects desktop and laptop CPUs. Larger companies that deal with more PCs have had a lot more of the same.
How many amd cpus do you have?
I work in laptop repair and our main customer is a major IT firm that probably everybody knows (used to be very innovative in hardware).
And we've got Lenovo T14 Gen 3 (12000-series Intel CPU) and T14 Gen 4 (13000-series Intel CPU) dying like flies right now.
Sometimes the replacement system board also dies within 2-3 months.
Could be the undersized cooling, could be the horrid Lenovo Thermal "Paste" (only at first*) or something else.
*We replace the shitty Lenovo stuff with something a lot better as part of our service, especially when Systemboards/Mainboards have to be exchanged.
Edit: typos, damn phone!
Where I work we use exclusively Intel CPUs.
We're so fucked LMAO
@@alexanderzawydiwski9534 It's normal for some SATA slots to not work, as well as RAM not being detected in certain configurations. There is a manual included with the motherboard for a reason; some configs working well is a miracle (like 3 mismatched RAM sticks: sometimes it works, sometimes it doesn't, it's a game of luck with ANY processor regardless of age; there is also a good reason why mobo makers have a list of supported memory modules and configs for each one).
"we have had 30+ PCs where the intel CPU essentially died" - Are they Dell? We've gone through so many Latitudes this year it isn't funny anymore.
Given the fact that the 13900K has been on the market since October 2022 and this issue hasn't been fixed, it must be a design flaw. It's shocking that i9s still die on W680 motherboards.
Hmm the data center support information - even tho it’s indirect is very interesting.
This isn't armchair science, my man. This is absolutely real research. You've negotiated access to data not really available to the public and are doing real research on the data instead of making educated guesses as to what's going on, which is what armchair science is.
So this explains why AMD came out with the server board for the AM5 socket a couple of weeks ago.😊
Crazy that this was happening for so long even the oodle devs chimed in.
Honestly not surprising at all. RAD game tools are some of the most mature middleware in games (probably as close to bug free as you can realistically get without formally verifying your software). If their software is suddenly failing, they're going to want to know why.
And Oodle is deployed basically everywhere and very stable outside of this situation, otherwise all our games would be crashing constantly no matter the hardware.
I made a video about this a year back, along with Tech Yes City. A lot of people called me crazy. Stability issues, crashing, latency... List goes on.
I appreciate you talking about this.
He has also made another video on it recently
@@IntelArcTesting This is true. Thanks man.
I am also troubled by the lack of clear messaging and solutions from Intel. We do need more transparency to better understand and manage these hitches.
I'm glad they got the boiler snake a sunning lamp.
The saddest part about this is that I have been telling users it's their CPU ever since it started popping up en masse with UE5 game releases, so over a year now.
Hundreds of cases across a multitude of games, even outside UE5.
And even if I link them the articles now that mainstream media picked up the story they will not even bother reading it and simply reply "but it's only in THIS game".
they're NPCs, there's no saving them !
Yeah I get similar shrugs
It's only in THIS game ... for now.
I bought a 12900k 2 months before the 13900k. I've been having the same issues with the 12900k since about 5 months after using it. I believe the TDP on these motherboards have degraded the chips a bunch. I also have beefy water cooling that keeps the temps around 80 at full load. - The bios probably called more power because of the cooling efficiency on mine and burnt the chip up.
It finally gave out yesterday. Any load for about 2 minutes - Blue Screen Crash.
Wow, it is great to hear people talking about this. We deployed some i9 machines at work, and after awhile 2 of them started crashing for no reason. The, usually really good, Dell diagnostic software could never catch it. Reformatted, replaced RAM, SSD, etc, no luck.
I saw deep on a Reddit thread to disable turbo boost and it resolved the issue.
I gave this information to Dell and they ended up replacing the CPU on these machines.
I've been helping desking and admin-ing for nearly 10 years in various sectors, and saw less than 5 CPUs die in that time, then I had 2 back to back. Was mind blowing
I always learn so much whenever Wendell just wants to do something like play Dwarf Fortress.
Today we learn a bright side of telemetry.
Thanks for this breakdown of your hard work. It reinforces my decision to change from Intel to AMD for my most recent build. I migrated from a 10900k to a 7800x3d, and so far I am very satisfied.
My first two computers used Intel. And once I saw AMD really supporting their sockets for longer, I decided to make the jump and couldn't be happier. More powerful for what I need and less power hungry than Intel.
@@BeeWhere Yes, less power = less heat = less noise too.
I gotta say, the intro was a solid hook. You had my interest and now you have my attention.
After watching the video and thinking about your findings, the thing that stands out to me is your observation of the tick rate on the game servers falling by 50% before the crashing. This tells me the CPU pipelines are being invalidated, which is what's impacting performance so badly. Whatever is happening to cause these pipeline invalidations is the real culprit, and I have a suspicion about what it is.
I believe the CPUs have new microcode to detect attempted side-channel attacks, and when that is triggered the CPU intentionally invalidates the pipelines on all cores and additionally flushes the L1 cache. This would immediately flush any security keys from the CPU causing the side-channel attack to whiff on stealing something valuable. Next, the L1 cache needs to be reconstituted, including reloading keys. With NVMe drives, you can realistically expect 12+ CPU cores requesting the same key file simultaneously would work fine, but I suspect the IO errors being encountered are OS-level locks causing the cache reconstitution to fail, thus crashing the process, and possibly even the kernel.
Another cause could be actual side-channel attacks, and the CPU trying to protect itself. If the defense strategy chosen by Intel is to crash the CPU rather than allow compromising data then they would not want to publicize that because now you have an instant mechanism to launch a DoS attack against a service or service provider.
For end users, I think that certain users are affected more frequently because their system has been compromised and is under frequent side-channel attacks. So this begs the question, which is worse? Do we have less active side-channel protection, probably increasing risk of keys being stolen, or do we have active protections that ultimately choose to unalive the CPU rather than allow keys to be leaked?
I would see if you can dig up any data on the occurrence of side-channel attacks on these CPUs compared to the older 12900k systems.
Final thought here, the reason that slowing down the memory improves stability is that at the lower clock speeds, the mechanism in the CPU that monitors for side-channel attacks is able to better analyze the flow of data coming into the CPU and makes less frequent false positives that trigger the failsafe mechanism.
I clicked because Wendell was on the left
He's on the right for me
hes in the middle for me
He's upside down to me
He is on top of me
I don't see him.
I'm seeing a lot of people confused. It's not just i9 chips! The 14700k is also a POS. I've seen others with issues as well. People talk about the 14900k more but it's still a problem for the rest of us with 14700k chips too!
I have not tried to overclock mine. it's running with DDR5 5600 48GB corsair modules. Custom loop with 420mm + 120mm + 280mm radiator. 6900XT GPU.
It crashes in two different operating systems! I dual boot Windows and MidnightBSD. In the latter, LLVM clang will start crashing during long compiles (10+ min). On windows, I mostly game and that is also unstable. I made all the recommended changes that intel proposed. I turned off asus mce on the first day of ownership last november.
Worst CPU I've ever owned. It's slower than the 3950x i had before for compiling and with all the nerfs to make it stable, it's starting to lose the gaming advantage. Huge waste of money.
@@ChrisL-d4c 25% failure rate means 75% with no issue. There's no guarantee you're going to keep being this lucky in the future though. For many the problem starts small and progresses.
Hear anything about this also affecting the 14900HX laptop CPU? Or are they safe?
@@WaterspoutsOfTheDeep I've not seen anything about laptop CPUs either way.
you don't have to "try" overclocking it, turbo boost is usually turned on by default so it automatically overclocks for you. If you want to avoid overclocking you'd have to turn off turbo boost.
I bought my 13900k in October 2022. Many 13900k and 14900k owners didn't experience any problems until this year. I find it very odd that so many of us experienced issues at pretty much the same time. Most would say the power profile caused the wear and tear, but practically all at the same time? Different motherboards, different profiles, aggressive and non-aggressive? Different levels of use? Many working PC gamers aren't on their computers for long periods. Really strange.
This is slightly terrifying. A problem that impacts perhaps 0.1% of enthusiast customers is a poor experience, but I can appreciate how difficult it would be for Intel to root-cause the issue. If it's 10%, that speaks to a product design issue which *should* have been caught much earlier. Intel's own internal testing should have caught this and resulted in mitigation before customers noticed, and preferably before the chips shipped.
Thing is, if it's getting worse over time, this smells like a hardware defect in fabrication or a material failure, not a design flaw. I would not be surprised if we find this out down the line.
@@ponocni1 But then you should see the same issue cropping up with i7s and i5s, which we don't. What's so different with the i9s? If it's that 8 e core cluster then why doesn't disabling e cores help?
@@zodwraith5745 i9s are binned for higher max boost clocks and more power, so maybe whatever criteria that requires also makes them more vulnerable to this problem if it's an issue with them having pushed their manufacturing process to the breaking point.
@@zodwraith5745 It is happening with the 14700k. A number of people have reported it and I've seen it with mine. Bios updates helped some and I have run it with Asus MCE disabled the whole time. I've got a custom loop so it's not a cooler issue. (420mm + 120mm + 280mm rads too)
@@LucasHolt But what is the problem specifically? I've heard it seems to occur mostly with specific games. I haven't had a hiccup from my 14700k but I'm on MSI. Keep in mind most of these issues were on ASUS boards, a few with Gigabyte, and I've never heard of an MSI or Asrock occurrence. Although I think the narrative changed fast because Intel is such a juicier target.
But what Wendell is talking about in this video isn't Z790, he's talking about W680. *_ASUS_* W680. A SERVER platform that uses consumer CPUs. That means 24/7 full tilt all cores running. That's why the error rates are so much higher. I haven't had an issue but I wouldn't be shocked to see my system throw up an error every few hours under _that_ kind of abuse.
This is unfortunately not uncommon when large companies deal with a big design/quality issue. When you have a reputation for quality... the last thing you want to do is openly admit you made bad things (and this also can open you up for lawsuits - like these server support companies looking to recoup all the extra service costs, or the company that lost 100k in their player base due to instability). Say vague things to sound like you care while claiming some type of user error. See how Toyota handled the 3.0 V6 oil sludge issues (but that was more a Toyota cultural thing), and how Apple handles EvErYtHiNg. Intel has been playing catchup to AMD Ryzen for years now.. after making fun of the chiplets being "glued together". All they really have left is their reputation for reliability... so they are intentionally staying tight-lipped and hoping the media coverage of the issue just "goes away" after they release their next CPU generation - one that actually will be new and not just the old ones with more voltage.
That's exactly why this Intel instability issue should be the headline on all big PC news websites
@@damara2268 I bet Intel is really glad CrowdStrike pushed out a global Windows crash update file. That gives the media something way meatier to talk about versus Intel making bad CPUs that degrade quickly even when not OC'd.
Solution: AMD
Buyer’s regret in two words
I am so glad I never upgraded my 12900K. It still works like a beast and I play games at around 100 FPS, plus or minus 20, mostly, so I don't really need an ultra fast CPU. My mobo also has a PCIe 5.0 slot, so I'm probably not going to upgrade for a few more years, but when I do I will switch my 12900K and mobo to an AMD setup.
Maybe you should do something similar to what Steve/GN do sometimes and set up some form of communication so devs/data center admins can share data to hopefully get closer to a real answer? Heck, maybe even get physical examinations of problem CPUs if possible
@CMDR_bravoMike That would be great, but I don't want to bankrupt GN. They always spend "yes" money on their investigative journalism.
Based off my experience overclocking these things......I'm pretty damn confident in saying the problem lies in either the ring bus or the system agent being just at its damn limits. I would be more than willing to bet that the VAST majority of the true Raptor Lake dies that are failing are going to be the larger 16 E-core dies. It never sat right with me how wildly the power consumption on these chips was affected by the ring bus, and how minor bumps in speed took absolutely shocking amounts of power and tons of system agent voltage fiddling. In retrospect, this explanation falls in line with how cagey Intel has been about the SA voltage in the non-Z chipsets. Basically we're seeing the limits of one of these two things, and it's so close to the limit that natural degradation of the silicon will cause faults.
The craziest example I can remember is an alder lake cpu that would do 5ghz ring.....at about 400watts. Drop it down to 4.8?...300 watts. Drop it to 4.6...250...and 4.4 only took 220-230ish. E cores were all disabled and the p core settings were the same on all tests...the only thing that changed was the ringbus/cache.
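Laying those numbers out makes the curve easier to see. A small sketch using only the figures from the comment above; 225 W is an assumed midpoint for the "220-230ish" reading.

```python
# (ring clock in GHz, package power in W) as quoted in the comment.
# Assumption: 225 W stands in for the "220-230ish" reading at 4.4 GHz.
ring_power = [(4.4, 225), (4.6, 250), (4.8, 300), (5.0, 400)]

# Extra watts needed for each 200 MHz step up in ring clock.
steps = [
    (f0, f1, p1 - p0)
    for (f0, p0), (f1, p1) in zip(ring_power, ring_power[1:])
]
for f0, f1, extra_watts in steps:
    print(f"{f0} -> {f1} GHz: +{extra_watts} W")

clock_gain_pct = (ring_power[-1][0] / ring_power[0][0] - 1) * 100
power_gain_pct = (ring_power[-1][1] / ring_power[0][1] - 1) * 100
print(f"+{clock_gain_pct:.0f}% ring clock costs +{power_gain_pct:.0f}% power")
```

Each 200 MHz step costs about twice as much extra power as the one before it, and the last 14% of ring clock comes with roughly 78% more power on these figures.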
Seconded. I highly suspect uncore interconnect degradation being a proximate root cause. Linpack won't catch it failing until things catastrophically fail later down the line. Harder to validate. "Just shove more voltage into it and it'll be fine. How long do you expect users to run these chips? 15 years? They'll be obsolete in 15 years." Try less than 1 year past Intel execs.
I run my 13900k with turbo disabled, so basically sitting pretty at 0.8v vcore about 95% of the time. Bought it second hand for $300 about 5 months after launch due to "issues hitting ddr5-8000", pre-delidded with a custom copper IHS. The person I bought it from got a second chip, direct die cooled it, and from what I can tell that new chip worked just fine. Likely doesn't anymore. Lol. Probably swapped it out for a 14900ks.
My chip still shows signs of degradation. SA was at 1.3V and vdd at 1.35V for 96gb dual rank ddr5-6800 for over a year. Now experiencing random latency spikes in the past 6 weeks. Even reinstalled windows as it's been over year but no change so it's not software related. Memory test passes at 32 hours as I don't have time to test for longer.
Here's to hoping whatever releases this year is better than the hot garbage Intel released, or I'm switching back to AMD. Intel CPUs failing was pretty much unheard of before 11th gen. Started going downhill quick. 11th gen also had SA/IMC issues compared to 10th gen.
It's telling intel is being hush hush about this as it's a huge blow to their QC reputation.
Could it be that this is due to the AVX offset not being triggered properly in light tasks such as gaming?
I mean, we would never see this in all these stability tests then, because the load in these benchmarks is heavy enough for the AVX offset to kick in
@@quanlethienminh6002 It's not likely to be an AVX offset issue. The way to understand what I've pointed out here (to the best I can oversimplify it) is that the ring bus is like a road/traffic network between all the CPU parts, while the system agent is the traffic cop/monitor/system. The PCIe controller and memory controller all sit on the ring bus. In the scenario where the SSD corrupts....the data can't make it from the CPU core down to the PCIe controller. Add this explanation to what Wendell is saying and it should make things much clearer as to wtf exactly is going on. Disabling/slowing down stuff doesn't change the fact that the underlying infrastructure is just pushed to its limits the way it is.
Iirc, the socket 1700 Xeons... have AVX-512 by virtue of having no E-cores......if the Xeons are E-core-less dies and no failures exist..we might have something of a smoking gun. I haven't touched a 1700 Xeon...and haven't seen a delidded die shot
If the CPU performance halves before crashing maybe it is the IMC and memory transfers stall causing bandwidth starvation.
This would explain why most pure cpu metrics continue to look fine.
I'm curious if the issue reported is exclusive to ddr5.
Megekko (a big Dutch retailer) has an informal policy to push ddr5 dual dimm over quad dimm for stability reasons but they didn't specify this would be for Intel only.
This is how you do investigative reporting!
**chrome crashes continuously while watching this video with my never overclocked 13900k**
You never overclocked it, but your motherboard did. And that's the problem.
@@someperson1829 No, the problem is Intel, and it goes well beyond overclocking when it's seen on boards that actively avoid overclocking. What we were seeing is that those systems failed sooner because of it, but failures are starting to be seen across the board on systems that were well within safe limits.
@@backupplan6058 Wow. There are LGA 1700 boards that don't actively shove overclocking. How interesting.
@@someperson1829 standard specs for these chips seem to be already quite far off the good envelope and inside a territory of power consumption, stability and longevity that would normally not be accepted from a chip at factory specs.
@@someperson1829 As the video mentions this problem also occurs with server motherboards and no overclocking.
Most of the comments are saying that there may be design issues, but it got me thinking that maybe it's a test, qualification, and reliability process issue. Those 3 are internal groups in all semiconductor companies that make sure parts are good when they go to the customer.
Now it got me thinking: did Intel nerf those groups to resolve the yield issue they were having several years ago?
Looking forward to the Gamers Nexus exposé 🤣
3 months later they pull out a 5,000 page report complete with charts and a 2 hour video
And then for all the fanboys and bad faith actors to call it "drama" and dismiss everything they found.
I like that they want to keep places honest. I just want a standard. Don't favor giving one company a chance to explain and make steps to fixing things then not go back to other companies that did the same. If said companies didn't change let us know too.
Intel execs be like: hey who is that long haired jesus looking guy down there filming our hq?
@@niyablake I'd watch it
After a 6 month battle with Intel regarding an RMA for my i9-13900kf I finally received my replacement CPU and I can confirm it is 100% the fault of Intel. I had game crashes, BSOD, and random PC shutdowns. This is a problem that Intel seemingly doesn't want to acknowledge. I have not received any reassurance in regards to the issue not happening again. Super disappointed and dissatisfied
I have had a ssd drive corrupt on a 13th gen intel 13600k . Was related to IO errors. Can confirm! Was not OC'd and on latest bios.
I chose to get a drive with on board cache instead of host platform memory specifically so I don't have to be nearly as concerned about drive corrupting internal mapping information just because i got a little spicy with cpu overclock.
I have a 13600K with multiple NVMe SSD drives in Linux and haven't had any issues at all in the 18 months I've had my system. I don't overclock and my DDR4 RAM sticks don't like XMP so they run at stock speeds, so it's quite a vanilla setup (ASUS motherboard with defaults for all the CPU settings).
I hope you had a backup, that is scary.
Thank you. It's always interesting to hear about modern trends in data centers. I don't work in gaming, so we don't use high end consumer CPU's, but this is worrisome because it shows that the latest processors from intel are truly the bleeding edge.
Love that KDE Plasma wallpaper on the screen in the background :) And I am so glad that i went with Ryzen this time. I missed the AM4 era but I sure hope AM5 lasts as long as AM4 did.
My 13900K (running at stock) has been rock solid for about a year, hopefully I don't have a ticking time bomb 😬
I hope high temperatures and very low size processing technology don't degrade the chip faster than you should expect.
Is this @Intel PR speaking through @JarrodsTech account?
🤑💸💰
mine is also fine, however I noticed a problem in a single core if I overclock. solution was to just overclock the other cores. but mine is direct die so the temp is like 40c under load.
i have a 13900kf at stock settings with intel power limits rock solid since january 2023
My 13700K / 4090 build has been pretty much flawless for almost 2 years now. It even started out on a DDR4 board for about 5 months until MSI released the Z790 Tomahawk DDR5 board. The 13700K is OC'd to 5.6GHz on all 8 P-cores, all 8 E-cores to 4.4GHz, and the ring to 5.0GHz at 1.34v. Even the memory controller has been very solid: 32GB Corsair Vengeance RGB 6400 CL32 Hynix A-die OC'd to 7200 34-41-41-83 at 1.45v, on an MSI Z790 Tomahawk. I can't complain. I've also been using a contact frame since day one for both boards I've had this 13700K in. I've heard a lot of the problems could be related to the bend they get with the stock mounting bracket. But until Intel releases a statement / fix we really don't know what the issue is. I've used mine hard for over 8 hours a day every single day for work and gaming for almost 2 years now and it runs perfectly fine. I haven't babied it and I won't. I bought a K series for a reason, like all the K series Intel CPUs I've had.
This is an incredibly wonderful video/investigation. I'm living this myself right now. Actually went through the RMA process with Intel, and they specifically said they don't have ANY replacement processors at this point - all out of stock. They are actually giving me a cash refund instead. There is definitely some major drama going on behind the scenes over there and I really am appreciative of this video starting to uncover it. Hopefully Intel will be forced to disclose some info soon.
Wow really they have no replacements! I think we all know that Intel have figured out what's wrong and are keeping lips very tight.
It's not just 13th and 14th gen. I had this exact same problem on a 12700K. Just erratic gremlin-type behavior that I was never able to resolve until I replaced the CPU. I literally replaced every other component.
Nope.
Must have been something else.
At this stage in the game if I owned a 13th or 14th gen Intel CPU that showed any signs of crashing or performance degradation I’d be sending it back to Intel on the first thing smoking whether it was under warranty or not. Intel needs to admit there is a problem and recall the vast majority of these chips (after correcting the problem, which I’m sure they probably have with newer built chips).
They are basically going to pull a Nintendo and never admit there is a problem (think Joy-Con drift). As time marches on I feel for people whose chips are either going to fail out of warranty, or won’t perform as well as they did new because of degradation or BIOS updates that will gimp performance.
Joycon drift is intentional by design to force people into buying new joycons tho...
A faulty CPU can cause all kinds of funky behavior, since it processes literally everything. VRAM "errors" can be as simple as CPU failing free space > allocation check in DirectX API somewhere, I/O errors can be data inside caches being corrupted/CPU failing to calculate checksums or whatever.
Speaking of weird slowness behavior - kinda sounds like clock stretching, Intel CPUs have it as well, 12th gen and later (maybe even earlier). Slowing down memory does reduce effective CPU load, disabling E-cores lowers the temperature of P-cores, both of which make the CPU more stable at a given voltage+clock speed combination. -Seems a lot like Intel/mobo manufacturers/UEFI f-ed up clock voltage tables badly, or boosting behavior being too trigger happy to reach max clocks.- Would like to see results of manually lowering clock speeds.
Some Gigabyte mobos happily supplying 1.7 V to a 100 C 6 GHz CPU makes the damage irreversible (albeit with some weird settings, but it's kinda ridiculous that it can go that high without the user intentionally typing 1.7 into CPU voltage, just by a weird interaction of extremely aggressive boosting and LLC).
Also, isn't it funny how strong Intel (OEM?) mindshare is? AMD has been crushing gaming since 3D cache release, and has been competitive for 7 years, yet there's still 70/30 split there.
UPD after watching GN interview: oooh, seems like everything is much worse than I thought. Laptop CPUs are failing, 13600Ks, heck, even non-K SKUs do. I'm wondering how high the system agent and cache voltages are, and how they changed from Alder Lake.
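A minimal sketch of the point above about a faulty CPU failing to calculate checksums: hash the same buffer over and over and count how many results disagree with the first one. On healthy hardware the count should be zero. The function name, buffer size and round count are made up for illustration, and a clean run does not prove a chip is fine, since a short hashing loop may never hit the conditions that trigger errors.

```python
import hashlib
import os


def count_checksum_mismatches(data: bytes, rounds: int = 200) -> int:
    """Hash the same bytes repeatedly and count results that differ from the first."""
    reference = hashlib.sha256(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        # The input never changes, so any differing digest means the
        # computation (or the memory holding the data) went wrong.
        if hashlib.sha256(data).hexdigest() != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    buffer = os.urandom(1 << 20)  # 1 MiB of random bytes; the size is arbitrary
    print(count_checksum_mismatches(buffer))
```

The same idea is what memory and stress testers rely on: repeat a deterministic computation and treat any disagreement as a hardware fault rather than a software bug.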
yeah, when the cpu starts getting into that fuzzy logic territory where some bits start flipping to the wrong side, memory/pointer errors just launching a program aren't out of the ordinary,
i guess it can go to shit with the PCIe addressing all the same :D
I'm never going to get any DirectX errors here. That's because I don't do Windows.
@@1pcfred OK, Vulkan and OpenGL probably have similar checks as well. Anyway, something else may get corrupted, it doesn't really matter
@@MrKatoriz are you out of your Vulkan mind? I feel the issue is they're K SKU chips. Intel shouldn't even be making those. They're just asking people to break them.
Here from the GN video. Thanks for the information.
Needed a new virtualization host for home, so I waited until the company I work for rolled out a massive number of new Intel machines and listened to what was being said on the work floor about the new hardware over some months. I bought AMD 7000 series because I heard nothing but problems with Intel at work. Even when it did not crash, everyone was complaining about the big little architecture, which yes, you can disable or work around with core affinity. But be serious, who wants to mess with settings like that when the competition has something that does not need that kind of tinkering at all? On AMD I was very careful as well to buy the right combination of hardware. The motherboard and memory combination especially had to be a verified set. In the end I bought a B650M AORUS ELITE AX ICE and CMK192GX5M4B5200C38, and for storage I filled the board with Lexar NM790 4TB drives. I can say I am very happy with my home server.
Great video, Wendell. I had to swap a 14900K in my desktop that had these problems. Two weeks later the news about the issues started popping up. So far so good with the replacement running at Intel "Extreme" spec.
@@aaronmoore3050 Why it makes no sense? You know that even with the same name the silicon is not a copy and paste?
I love how people ignore the goings on with the 12900K & KFs, which are suffering the same issues, albeit not as bad (lag/stalling but not hard crashing)
Intel is counting on their reputation and brand name to carry them through since the 11th gen.
They have been doing that since 9th gen........
You mean 7/8th Gen?
If not longer
Well, I think most people who know how this stuff works knew something was wrong with the chips beyond just the BIOS when the engineers had no answer within a few weeks... now, after months with no word of a fix, it's pretty clear the chips must have an issue: a bad batch, design issues somehow not picked up during the years of design, or contamination as you stated... I'm so happy to be on my X299 i7-7820X at 5.1GHz, with up to 10th gen CPUs I could slide right into my mobo without needing a new one. Tried and true and no problems like this...
A shame that they don't make new ones, only Xeons after that...
I'm curious if you've found much data involving non-i9 CPUs, or rather, specifically involving the i7 CPUs. Here's why:
TL;DR: I have two i7 series, a 13th and 14th gen. The 14th gen is currently experiencing the problems mentioned. I believe it is cooked, and will be RMAing it shortly.
Long Version:
I've got a 13700(non-K) and a 14700K.
The 13 ran so hot with an air cooler that I decided to switch to watercooling for the first time ever. I figured since I had to redo that much, I might as well upgrade and bought the 14 (this was before I found out that the 14s were just rebadged 13s). Got lazy, and kept using the 13+air for a couple more months before I finally put in the work and did the swap. That was back in late December, early January.
Before I continue, I should note that even though I bought a K CPU, I did not do any overclocking besides what the default settings on my Gigabyte Z790 Aorus Elite AX mobo were doing. I even had XMP disabled on the RAM because testing with Cinebench showed no difference between XMP Off (4800MHz) and XMP On (6400MHz). I left everything else untouched, including Gigabyte's default "Optimized" settings, which I assumed meant 'optimized for performance + efficiency/power', but turned out to mean 'automatic overclocking'.
A couple weeks ago, things started getting funky. The game I've been playing (Sons of the Forest / SOTF) was running fine, but other applications around Windows were running slower than usual. Then I went to host the game one night, and the machine BSOD'd... then began boot-looping. Eventually, I got it back into Windows... briefly. It BSOD'd just loading Discord. It began bootlooping well before getting into Windows, and even got to the point where it was experiencing hard crashes just sitting in the BIOS.
I did some BIOS flashing to bring it up to date, and the machine started booting a little further again, but still resetting before loading into Windows. I had just made a Ventoy USB stick a day or two before the issues began, which included Linux Mint, which includes Memtest, so I ran that.
Tens of thousands of errors began accumulating within 10 seconds of starting the tests.
A bunch of troubleshooting and configuration changes later, I was convinced it had to be the CPU, so I dropped the 13700 back in.
Worked flawlessly, and faster than the 14700K had been working in at least a couple months.
I was ready to call it and see about sending the 14700K into Intel when an annoying little thought popped into my head: What if it's just a pin-contact issue? What if it works if I pop it back in?
So to avoid Intel receiving it and telling me there was nothing wrong with it, I tried it. It worked! No BSODs through several Memtest runs, several Cinebench runs, or playing SOTF.
So I assumed it was just a bad pin contact or something to that effect.
That was a week ago.
It has slowly been degrading ever since, to the point where it took 20 seconds to load the Windows calculator. It now fails to boot and the BIOS interrupts and tries to get me to revert to "Optimized Defaults", but it turns out I can choose "Enter the BIOS" and "Save & Exit" and it will boot fine. Choosing "Exit without saving" results in another boot failure, though.
Other desktop applications were running obscenely slow, too, but oddly enough, SOTF ran fine.
I found Gamers Nexus' video on this topic and watched it.
Went into the BIOS, disabled the iGPU (being used for my second, non-gaming screen), disabled TurboBoost, disabled Gigabyte's overclocking, and turned off every other performance-enhancer related to the CPU, along with turning off the E-cores. I even dropped my RAM from 4800MHz down to 4000MHz.
Now, instead of sitting around 60ºC while playing SOTF, it sits around 39ºC. According to HWMonitor, the core voltage doesn't go over 0.925v.
But it still fails to boot without pulling the trick above.
So I think my 14700K is just as cooked as the 14900K's you mentioned. I don't think it's limited to the i9 series.
From my extensive overclocking on my 13600k I've had this issue as well. What I've found from my testing is that it seems related to heat/power spikes. What I've noticed is that when at the edge of the thermal envelope +80c, that there could be times where it spikes over 100c but not record it because of how fast it can spike before the temp is reported. But ultimately, the instability comes down to voltage fluctuations and temp fluctuations. When you lock the voltage and keep the temps below 80c on max prime95 loads you'll have a pretty stable system.
Then maybe that’s the way to go for now on those chips? (At the cost of some performance)
@@EJM07 I am currently on my "summer clocks", which are standard boost clocks on both P and E cores, but massively undervolted. By default my chip wants to run at 1.37v to achieve stock boost clocks. However, I can achieve this with 1.16v on load and 1.08v on idle. Yes, I have my load line calibration set so it adds slightly more voltage under heavy loads so it doesn't have vdroop. But with these settings I always have a temp 6-7c above ambient on idle and temps not exceeding 75c on full load. This is even with ambient temps of up to 38c. Yes, ambient temps of 38c, that is not a typo, and yes it sucks badly when it's that hot in here. Especially with 70+% humidity.
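For a rough sense of what that undervolt buys, here is a back-of-the-envelope sketch using the textbook approximation that CMOS dynamic power scales with voltage squared at a fixed clock. It ignores leakage, load-line behavior and temperature effects, so treat the result as an estimate, not a measurement. The function name is made up for illustration; the two voltages are the ones quoted above.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Approximate ratio of dynamic power at two voltages, same clock: (v_new / v_old) ** 2."""
    if v_new <= 0 or v_old <= 0:
        raise ValueError("voltages must be positive")
    return (v_new / v_old) ** 2


# 1.37 V is the default quoted above, 1.16 V the undervolted load voltage.
ratio = dynamic_power_ratio(1.16, 1.37)
print(f"dynamic power at 1.16 V is about {ratio:.0%} of the 1.37 V figure")
```

Under that approximation the undervolt cuts dynamic power by a bit more than a quarter at the same clocks, which is in line with the lower load temperatures described.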
@@chieftron i feel ya. high temp + high humidity is my pain also, it just hits differently. intel's solution to faster cpus seems to be just more voltage on the cores lately. i've seen some crazy voltages in boosting cores on some 12th gen. always thought it was a misreport from the tool, but now? not so sure anymore.
Genuinely one of the most interesting videos I've seen in a while. Really love your clear, specific and entertaining communication style. I love how you're very clear about what you do and don't know.
"Making the memory run dog slow seems to be the most effective" >.>
This right here is exactly why I run a Xeon even in my gaming system. I had a bunch of random crashing with 1st gen Ryzen with a 3200MHz memory kit, not even particularly fast, but I thought "well it is first gen, maybe that's just what I get for giving it a go", so I upgraded to 2nd gen Ryzen. It was not as bad but it still crashed more than I would have liked. I swapped the RAM kit and it was the same. When 10th gen Intel started to see discounts after the 11th gen launch I grabbed a 10850K on sale and sold my Ryzen platform, thinking that maybe the Intel chips were just more stable right now since I never had issues with Haswell. Comet Lake did not improve things at all, if anything it was worse. I dropped the memory speed down from 3200 to like 2666: better, but still not nearly as solid as the system I had 4-5 years prior. I decided, well, I've changed everything except my GPU at least once and the Vega 64 is getting long in the tooth anyway, so I'll get a 30 series card. Nope.
Eventually last year out of sheer frustration at frequent crashing in a few games I drop the 10850k and buy a used Xeon W-3235 and a Supermicro board with the most boring 2133mhz 6 channel ECC memory kit imaginable, and I have once again found a level of stability I think is acceptable. Like intellectually I know, and knew, that as speed increases a greater degree of signal integrity and more precise timing are required to make faster speeds function, but since people weren't really complaining about it with DDR4 systems I thought it had to be my hardware, then here it is en-masse with the latest DDR5 based systems.
In a way I feel vindicated that the crashing issues I've been seeing have been getting worse as systems have gotten faster, it also makes me glad that instead of buying 13th gen intel I went the xeon route because I felt like the underlying issue was at least related to memory speed and there was no way I wanted to enter in to a still dual channel DDR5 system.
I’m really curious if this will ultimately lead to a recall or a class action lawsuit.
Also really curious if AMD sales reps are bringing the instability issues up when they’re talking with data center clients. The fact that this is happening with workstation boards that aren’t overclocked and are much more conservative with power management makes me think this is a flaw in the fundamental design of the chip. But if so, one would think similar problems would be appearing in Xeon chips too since they share core designs, right? Very puzzling…
Epyc is AMD, you probably meant to write Xeon.
Mate, after the way Intel behaved in the early noughties against AMD, I sure hope AMD are bringing this up. 1. Prospects need to know that after months, Intel hasn't root-caused the issue or, as Wendell said, clearly admitted that affected clients will be made whole. 2. It's glorious when Karma scores against arseholes like Intel.
As an aside, I think the game companies who collect telemetry have an opportunity to leverage that in order to help get to the root cause. If it's not sufficient (as Wendell suggests), this could help the Data Scientists to refine the collection to improve this.
Xeons are nowhere near pushed to the single thread clocks of the high Ks.
@@petemonster1 I miss the noughties.
xeons have a whole different design philosophy behind them, and bring intel the bulk of its money. they wouldn't sabotage their income for a headline.
remember, intel laughed and derided amd's chiplets. then amd slaughtered them, intel panicked and tried to copy them. pretty sure intel thought they could get away with blaming customers(you're not a user, you bought something) just like nvidia did.
13700kf + 4070super here, experienced a few random and inconsistent crashes in Counterstrike 2 and it's got me concerned
ouch
@@Mr11ESSE111 I was wondering this too, I can't replicate, it's happened a total of 3 times now at 366hours played.. idk, none of the other games I have crash and haven't had the weird/dreaded out of memory error
@@Mr11ESSE111 Never had a crash with my 13700KF + 4070 Super
@@Azerkeux Search Google for the TDP (PL1, PL2) BIOS settings and set them to PL1 125, PL2 253, then try that. Worked for about 4 customers of mine.
Your intro just hits right, and I don't know exactly what it is, I just really like it.
I have a 13700K and I got crashes in multiple programs from day 1. Chrome tabs would randomly crash all the time too. I intensely monitored CPU temps and observed that as the CPU started to get stressed the temps would instantly rise to 100+ degrees and then any program using the CPU would crash and the usage and temps would quickly drop back down (I knew that instantly getting to 100+ should be fine for this CPU as it's designed to always perform at its thermal/power limit). So I figured alright, let's turn off a few cores - didn't work. Then I bought a new 360mm AIO cooler hoping it would solve the issue - nope, still the same - my Asus Prime Z690A can deliver a lot of power to the VRMs so the CPU instantly started to consume 235W and temps would rise and it would crash.
Furthermore every game that had initial shader compilation would instantly crash on load but the compilation would continue from the point it had crashed - this way I got Hogwarts Legacy to run after 15-20 crashes on load. I got Last of Us to run after around 180 - 200 loads (yeah I kid you not about that number). Once shaders were loaded though the games ran just fine for the most part - they would intermittently crash during loading transitions ingame when fast travelling (Hog Leg and Cyberpunk crashed about 1-2 hours in per session). Last of Us though ran only about 15-20 mins per session before crashing. I checked the logs for Last of Us and the error exception was pointing towards Nvidia. Decompressing via 7-Zip and WinRAR worked just fine but custom decompressors that stressed the CPU would not go beyond 5-15% before crashing. The time of the crash would always vary somewhere between those percentages, and I finally figured out that the CPU was basically shutting off at an arbitrary point during decomp/shader load etc. when temps were getting too high, and hence no exact error was being thrown. If it would always crash at, let's say, 12% during decomp then that would point more towards a software issue, but this had to be the hardware.
Freaking Discord would crash. Cinebench would Crash. How the hell do Chrome tabs crash?? This meant spikes in CPU usage/temp would make this stop working.
I finally gave up trying to play that game and figured the CPU I had received was just a bad die off the line. I was pretty much sure that whatever was supposed to monitor the temps and adjust the clocks was not doing its job.
And then - after many months of suffering (I bought a new SSD - bought a new GPU - bought a new Cooler - multiple OS reinstalls - almost bought new RAM sticks) the oodle decompression failure article came out and I downclocked only my P-core multiplier to 51x and that instantly solved everything.
For 5-6 months I went through a crapload of frustration - it was my first high end build and I got ripped off by Intel - all because they were out chasing benchmark numbers against AMD - is there really a point to doubling power consumption for 10% extra performance and lack of stability? Unreal - how the heck does something like this pass quality checks? From what I know every CPU off the line goes through at least one round of stress testing.
Anyways - rant over - my trust in Intel has been severely eroded - X3D is going to be in my next build.
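The reasoning in this comment (a crash that always hits the same point smells like software, one that wanders between 5% and 15% smells like hardware) can be written down as a small heuristic. This is only a sketch: the function name, the 1-point tolerance and the sample numbers are made up for illustration, and a scattered pattern can also come from things like race conditions in software.

```python
from statistics import pstdev


def crash_pattern(progress_at_crash: list[float], tolerance: float = 1.0) -> str:
    """Label a set of crash points (percent progress) as repeatable or scattered."""
    if len(progress_at_crash) < 2:
        return "not enough data"
    # Population standard deviation of the crash points, in percentage points.
    spread = pstdev(progress_at_crash)
    return "repeatable" if spread <= tolerance else "scattered"


# Hypothetical logs: one run set always dies near 12%, the other wanders.
print(crash_pattern([12.0, 12.1, 11.9]))
print(crash_pattern([5.0, 9.5, 14.8, 7.2]))
```

A "repeatable" result suggests looking at the input data or the program first, while "scattered" is the pattern described above, where suspicion shifts to the CPU, memory or power delivery.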
Just lower your TDP 125base,253boost and your CPU will run at 5,4 at max 75°C without any issues.
Given what we're learning, it's entirely possible that the post-production stress test is actually *causing* some of these problems before the CPUs go out the door = P
I have a 13700KF, which custom decompression programs did you use so I can try to replicate this issue?
bro made a whole essay💀
Based on this and other videos, locking down P core multiplier to a lower setting is only a temporary solution. Even booting into OS and running all those programs with stock or higher settings, some damage is done. Eventually you'll have to reduce it further. Only way I can think to prevent this damage is undervolt and underclock (I'm talking 4.3GHz all cores) before even first time OS boot.
That's what I did on a friend's 13900K system I set up for him. He's reported zero issues since setup. Frankly, these high core count CPU's are begging for a heat offset with a slight undervolt; no more than -100 to -135 mV, because otherwise it becomes unstable. It would probably be fine with a 5.3GHz overclock on all P cores during gaming since the motherboard is regulating 235w max just fine.
The key is maintaining no more than a 1GHz delta between all cores and NOT sticking to stock voltage.
The whole 6GHz on P cores advertisement by Intel is fraud IMO. Yeah, you can reach it, but you can say goodbye to your CPU sooner or later as a result.
Problem with this is for people that don't have any invoice like me (blame myself for buying 2nd hand). The second players here are mobo manufacturers, which allowed by default insane levels of power and voltage for unlocked chips and the third one and most important, Intel, who fucked up surely in the manufacturing of the chips.
Bought almost on release day, at start chip was working well but after 3 months of use things started to go nuts. Everything crashing on linux, even the most basic things, not even to talk about games.
Mine is unstable at stock settings (even Intel's defaults), so after reading the news and tons of subreddit posts, I went with reducing the max frequency allowed in the BIOS (turbo ratio offset) to the minimum allowed (which by itself helped but didn't solve the problem), and disabled all thermal and voltage enhancements (TVB, etc). That itself dropped the voltage from 1.1v to 0.8v, so I upped the AC loadline so chip voltage was 0.85v, and since then, with all those settings, it seems pretty rock solid (but I ended up with a 13900T instead of the 14900K which I paid for).
Maybe ask the original buyer to provide you his, plus the transaction that happened between you two.
I rma'd a cpu with AMD and they were cool with that.
Cpu was bought for me by a different person due to regional availability and overall cost at that given time.
Remember when one skew of AMD chips were melting because of motherboard configurations was a hot topic? Maybe the algorithm is really tailored, but I got A LOT more of that than I ever did these two generations of intel just slowly going back to sand.
@@Peterowsky not to be that guy, but you wanted "sku".
@@Peterowsky The AMD issue was much more obvious. Some still consider AMD to be a cheap unreliable alternative to Intel. (Well the tables have turned it seems)
Intel situation is more subtle, as no physical damage is present, and the user can always be blamed for handling.
I believe they are trying to damage control and downplay the situation.
New chips will be coming out after all. However, it seems the issue is so widespread that they won't be able to avoid it.
@@viigraphics Yeah, someone is about to pay looooots of money for replacements...
Nice work...I go back to 8086 era and appreciate your ability and willingness to dig this deep. Thank you!
Who's going to trust Intel after this? I thought they may be on a path to properly compete with AMD CPUs, but now I don't even know if I want to risk getting an Intel CPU even if they do end up getting the tech right.
everybody, this isn't the first time. Nobody remembers the past beyond 5 years ago
lol they already compete with AMD. better single thread performance
@@ZombieLincoln666 not true for years
@@ZombieLincoln666 They get the faster speeds through sheer brute force. Power efficiency is important even for desktops. More power draw means hotter room unless you have a Linus-level setup that dumps the heat outside of the living space.
Me.
Intel was on top for decades. AMD just recently (5 years) started being good.