The way the whole tech news industry went into radio silence, collectively, after Wendell and GN broke the story, was genuinely scary. Louis Rossmann's video was the symbolic bat-signal. After that one, it was absolute silence. Props to Wendell for helping keep up awareness, I wish other outlets like GN and HUB would do the same.
The fact that really good cooling could have had a part in causing an issue really blows my mind. Imagine going all out, spending time and money, and your friend with a stock cooler having fewer issues than you.
The irony of talking about the Intel Greybeards is those are the very people Intel is PAYING to walk out the door. Huge braindrain over there from what I'm hearing.
They created famous CPUs we know and love. Meltdown, Spectre, Foreshadow, Zombieload, SwapGS, Plundervolt, IME, RIDL, Fallout, Lazy FPU restore, Downfall, and now this. I say - keep them men working and create more beautiful CPUs!
@@moamber1 AMD also has its fair share of issues: Sinkclose, Zenbleed, Inception. Not to mention that Spectre affects both Intel and AMD CPUs. The point is that designing CPUs is hard. It's not particularly related to the talent working at either company.
@@moamber1 Bad actors will always exist, and those include the Government agencies responsible for some of the vulnerabilities, What seems like a mistake, sometimes was totally intentional, but what they didn't intend on was people finding it and also abusing it.
My problem with this whole thing is that with the issues happening at all, and the denying of the issues for so long as well as misleading information about them, I don't feel comfortable using anything from them for the servers, desktops or laptops that I manage, at least nothing from 13th gen or later. Sure give them a generation or two of stability and they might earn that trust back, but I simply can't take that risk when the business operations are on the line.
We should be building cpus and gpus with open source designs, open to all and run by the internet community. For profit corporation model is doomed moving forward
@@xlr555usa that won't happen, even most RISC V CPUs are not completely open source. The companies are very protective of their IP and rightly so, there are a lot of patents around CPUs that are significant advantages for anyone able to use them. Not that I would be opposed to the idea, just being realistic about it. Also the design is only one aspect and in this case I think the manufacturing is a significant part of the issues which wouldn't be revealed just from the design.
@@jeffreyparker9396 There was a time when CPU design could have been done using open source, but that's a long time ago. Today a modern CPU has billions of transistors and even the real design geniuses can't really claim to understand the entire thing. Even back in the 486 days we were talking millions of transistors, if I remember correctly. But the big problem then, and especially now, is manufacturing the processors. This is not cheap, and adjusting the design for a particular process isn't as easy as scaling the mask. No, it's big money invested in the design and manufacturing even before the first chips are produced.
Up the Greybeards! You really provide an incredibly valuable resource for the PC enthusiast community and every other PC user, as well! My 14900K suffered weird errors after about 2 months while running with TVB which cleared up once I disabled it. It has been running fine since then. Thank you for all of your work!
We have a 13900K for remote access, and the workload is compiling and synthesis of FPGA designs. Now, after 2 years of constant running, we've seen failures where we get random crashes and freeze-ups.
@@BBWahoo Two years IS pretty new, in CPU terms. Traditionally, CPUs have far greater lifespans than any other system component, and have the lowest failure rates. That said, the information we currently have suggests that the problem does not lie in the physical CPUs themselves, but in the microcode that dictates their behavior. So if you purchased a 13th or 14th gen CPU today, but did not update to the newest BIOS, or your motherboard vendor has not yet released a BIOS for your motherboard that contains the newest microcode update... It's theoretically possible that your CPU could begin to degrade before you get that microcode. Generally, by the time the instability becomes noticeable by a normal user, the CPU has probably already sustained some amount of permanent damage. That doesn't necessarily mean that the degradation can't be stopped once the necessary microcode is applied, or that the CPU will fail within a 2-year period. (We just don't have the data to know for sure yet.) TL;DR: Yes, the issue can affect any 13th or 14th gen CPU, even new ones. If I were in the market for a new CPU, I'd look at AMD or wait for 15th gen. But if you already own a 13th or 14th gen CPU, there's no need to panic, because there are things you can do that will (hopefully) prevent any future instability issues. (Update your BIOS as soon as there's a version available with the newest microcode. In the meantime, you can adjust the boosting behavior/voltages in your BIOS to mitigate the risks.)
Why wasn't 12th gen affected by instability? Alder Lake is mostly the same as Raptor Lake; the only difference is that it runs a little slower and has less L2 cache and fewer E-cores on die.
tl;dr: CPU voltage loading algorithm is inaccurate when in memory-bound execution circumstances resulting in long-term harm @Level1Techs thanks for not letting this bugfix/intel fab issue just die a quiet death, homie. much to be learned, science to be done + lots of parallels from the past complicated by PWM i think. ( @7:16 the PID algorithm that directs the Vpwm for the CPU decides using x86 instructions in-the-pipeline WITHOUT considering the "effort" that differs between them circumstantially - is what i heard?)
compound this with, I think, the chip monitoring its own temperature and changing behavior, but the temperature estimation can be wildly, wildly off. and how that affects intended boost.
It's also something very difficult to test for and also counter intuitive, running tests on what your design is doing is extremely costly in time and resources, so most tests are basically going to be some heavy loads that really kick hard at the cpu but not something with a bunch of cache misses, waiting on memory or interrupts. And even if you try to test for those, it's really hard to reproduce a load similar to a game server
That's common sense if you are overclocking it. A dragster or Formula 1 car engine has to be rebuilt after every single race, because they are pushing them to the limits. That said - I've got 13+ year old Intel systems that've been overclocked all day everyday and are still running, the motherboard capacitors are the weak link there.
@@igelbofh ???? Sorry if I'm wrong, but if the silicon is cooler, wouldn't you need less power, all else being the same? Edit: I also think calling it a problem with direct die cooling is disingenuous. They're perfectly fine on Ryzen. It's purely an Intel (13th/14th gen) issue.
I'm going on my THIRD RMA with issues on the 14900K. I am NOT an overclocker.. I just want a stable product that runs at advertised speeds out of the box for what I paid for and WORK.
I had to take a loss and go 7950X3D. Only 1 minor issue with memory timing before a UEFI update, but now it just works. Go AMD. You will find it's easier to set up. Just set a negative PBO (Precision Boost Overdrive) offset to undervolt and it will auto overclock. Simple.
I was once involved with a product where the system would sometimes leave the output on HIGH. The reason was that some of the drivers were better than others and the target had been reached faster than expected, but the code had no means to recognize this and simply waited for the output to come down, which it did not because the feedback loop had no mechanism to handle excessive output. The hard part was proving to the engineers that the reports were correct, only for them to reply "we saw that in development but figured it was a one-off anomaly and ignored it". If it can happen, it will happen.
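A minimal sketch of the kind of guard that story is missing, in Python: a wait that gives up after a bounded number of polls and forces the output low instead of waiting forever. The names `read_output`, `force_low` and `max_polls` are hypothetical stand-ins, not anything from the actual product.

```python
def wait_for_output_low(read_output, force_low, max_polls=1000):
    """Wait for the output to drop, but never wait forever.

    read_output and force_low are hypothetical callbacks standing in for
    whatever hardware access the real product used.
    """
    for _ in range(max_polls):
        if not read_output():
            return "dropped"  # normal case: the output came down on its own
    force_low()               # overshoot case: stop waiting and intervene
    return "forced"
```

The design choice is simply that the "it should come down by itself" assumption gets a deadline, so the rare overshoot case ends in a defined state rather than a stuck HIGH output.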
I have complained about this since Coffee Lake: the motherboards applying insanely bad settings out of the box: disabling most if not all protection mechanisms, increasing the power limit to infinity, and applying insanely stupid CPU-frying automatic overclocks with very high overvoltage that resulted in nearly no extra performance but a lot of heat. And Intel did nothing. That is what they are to blame for. My 8700K was gulping down in the neighbourhood of 150W when I first powered it up. No settings changed, straight out of the box onto the motherboard and into Cinebench. Then I just enabled all safety features again and set the rest of the settings according to Intel's specs. Lo and behold, the CPU was a tiny bit slower in benchmarks (well, of course, I did limit it to the official 4.3GHz all-core instead of the 4.7GHz overclock) but it stayed below 110W. After some further small tweaking, even under full turbo in CB23, it stays below its TDP, which means it can now run 24/7 at maximum turbo while staying cooler and using less power. Sadly, I did not catch that a UEFI update changed some secondary voltages and fried the iGPU with a nice blistering 1.5V.
We live in an age when undervolting is necessary and both manufacturers are guilty of it. My 5950X at stock settings was underperforming by about 5% against official benchmarks, but after some efficiency tweaks (minus 200MHz boost, minus 30 all core curve optimizer, minus 0.1V core voltage offset) it actually performs 5% HIGHER than expected. It runs 15 degrees cooler and Vcore went from 1.5V down to 1.35V which is much better for longevity. I find that even budget processors like the 5600 need to be detuned a bit to keep the fans running at quiet speeds.
@@tourist6290 I run 5 GHz on my 13700K with wattage limited to 130W for PL1 and 145W for PL2. I can play games at 1080p and 1440p just fine with the most recent patch... before this new one that just came out.
What Wendell completely omits is that Intel, much like NVIDIA, has a complete death grip on the motherboard AIBs, so any "default BIOS mode", performance or not, is directly approved by Intel. Anything done that Intel doesn't like gets immediately axed (there were multiple instances where an AIB gave lower tiers an option that was exclusive to K-series CPUs or Z-series motherboards). The time frame of the crazy microsecond boost clocks just for show on the box, and the outright lying on TDP, aligns perfectly with the Zen launch. I'm sorry, Wendell, but this time I truly think you are either wrong or paid by Intel to directly meatshield them or market them as the victim, which they are not. You said it yourself: more than a year and 2 gens back to back, and they constantly refused to tell the truth until everyone already knew it from the mountains of dead CPUs on public display. And another thing about the AIBs: Intel did try to blame them in May, June and July, trying to make it look like the same "shenanigans" that they did with AMD's Zen 4 X3D, but they immediately took back their accusations. Why? Because all the AIBs refused to take the blame, and all of them had easy ways of exposing the truth and the "REAL" Intel-imposed "DEFAULTS" since Coffee Lake.
@@eX_Arkangel "that Intel much like nVIDIA has a complete death grip on the MB AIB's" Not really - as the AIBs themselves have said. You are mixing up Intel and AMD there. "and the outright lying on TDP" Again, mixing up AMD and Intel there.
Fascinating video, it's helped validate some of the assumptions I'd made and led me to adjust others. One thing I do think is worth noting is that the way (at least some) motherboard manufacturers tuned the settings in order to get those improved boosts was to *undervolt* via the load lines, reducing the voltage and by extension the thermal throttling. Given that most CPUs will have a reasonable factor of safety on the (minimum) voltage, and most workloads won't draw anywhere near the amount of current that would pull the voltage down far enough to need the full design load line to compensate, that probably didn't cause many issues in and of itself. It's reasonable to assume, however, that in the instances where chips with minimal margin were given high current loads, even brand new chips that hadn't had any chance to degrade would see instability that would present similarly to the Vmin Shift instability. Not every unstable chip was necessarily damaged. I think Intel thought so too, and their initial 'fix' of the Intel default profile, which included setting the AC and DC load lines to match the regulator by default (in theory; some board makers overcompensated, I think), was intended to address that, because it was actually fixable simply by changing the settings. Only when they found the other bugs that were actually damaging silicon and needed microcode updates to prevent did they seem to accept there was a problem on their end too, and that they'd have to accept more RMAs and extend the warranties.
About the mining part: I think an aspect worth considering is that mining hardware is run at its limits (be that clocks or undervolting). The goal is not to have 0 errors but rather to balance error rate/uptime against hardware efficiency. So errors might be categorized wrongly or missed due to built-in crash handling or some default invalid-operation expectation due to CPU settings. When I was CPU mining, pushing the memory hard and accepting errors was more profitable than using a stable OC.
bought the 14900k in April and already RMA'd last month and returned back to my 12900k.. microcode updates were too late im afraid. i do have an AIO setup also
@@dustee2680 I haven't personally used any 13th or 14th gen CPUs (still running a 5900x), but as someone whose hobby is tech, I've kept up with the issue since the very first complaints trickled in, a few months after the 13th gen launch. And from the many stories and reports I've heard, that was not an uncommon first sign of instability/degradation. I'm not saying this is FOR SURE what your issue is, since it could be a multitude of other things. But if I were you, I'd submit an RMA request if possible, just because it's better to be safe than sorry.
"You really need to talk and listen to your senior staff that have had decades of experience." Meanwhile, Intel: "We're just laying everyone off. It'll be fine."
Intel either needs to sell off its foundry and stop all the new manufacturing plants in the US, or cut projects and lay off staff. There's no magical way to generate money to keep Intel going for another 2-3 years... IMO it's just American ego that can't let Intel go. I don't think it's a bad idea if all chips from Apple, Qualcomm, Broadcom, AMD, NVIDIA, Amazon, Google and the government come from Taiwan alone. Without Intel, does China have a stronger threat over Taiwan? Yes, but how likely is it really that China is going to just blockade or invade Taiwan? There's no way China will blockade Taiwan and cause a chip shortage. I think it's just fear.
@@ThylineTheGay There was literally a tech conference about AI a while back where an Intel exec said that in his ideal world, AI would do literally everything at the company, and everything would be completely automated and autonomous... With ONE SINGLE human employee, just there to oversee the process. (I'm obviously paraphrasing, and can no longer remember the name of the person who said it, so I might be a little off, but that was the gist.) That sounded so incredibly dystopian to me that I was AMAZED that that person had obviously been promoted to a relatively high position within Intel, but still believed that was the ideal future. Not to mention, he thought that proposing his idea in front of a bunch of reporters, while emphasizing how great it was, would go over well. Then again, it actually did fly under the radar almost entirely, because almost everyone at that conference was there to hype up the possibilities of AI, not to critically examine them, so... What do I know. 😂
While I agree a fixed voltage doesn't really help in solving the problem, doing so back in November is what protected my CPU before all of this information came out.
I think "v_min shift" has a slightly different meaning than your interpretation. Rather than referring to the minimum voltage the CPU requests from the VRM, I think it's just a technical euphemism for "degradation" that describes the exact literal symptom: the minimum voltage for correct operation shifts (upward) until it is greater than the firmware V/f curve, at which point the CPU miscomputes.
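That reading of Vmin shift can be put as a tiny Python sketch: the firmware V/f curve stays fixed while the voltage the silicon actually needs drifts upward, and an operating point becomes unstable once the need exceeds the supply. Every number below is hypothetical and chosen only for illustration; none of them come from Intel.

```python
# Hypothetical numbers only; none of these values come from Intel.
FIRMWARE_VF_CURVE = {4000: 1.10, 5000: 1.25, 5700: 1.40}  # MHz -> volts supplied
FRESH_VMIN = {4000: 1.02, 5000: 1.19, 5700: 1.36}         # MHz -> volts needed when new


def unstable_points(vmin_shift):
    """Frequencies where the needed voltage now exceeds what firmware supplies."""
    return [
        freq
        for freq, supplied in sorted(FIRMWARE_VF_CURVE.items())
        if FRESH_VMIN[freq] + vmin_shift > supplied
    ]
```

With these made-up margins the top boost bin is the first to go, and lower frequencies follow as the shift grows.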
Intel: "we finally ((fixed)) our CPU problem, let's hope nothing happens before the 15th gen release" Wendell: "nobody expects the Spanish Inquisition"
Actually *everyone* did expect the Spanish Inquisition. They'd notify you beforehand that you're going to be on trial. It's also not like Wendell didn't tell them beforehand this was going to happen, he publicly asked for input.
Perhaps it's telling that Intel has now started making chips at TSMC; their aging infrastructure apparently doesn't keep up. Their new Lunar Lake ultra-efficient ARM-killing mobile CPU is made entirely at TSMC on their 3nm node.
All this could have been avoided if Pat Gelsinger or the person responsible with communication between Company and public would have done his job perfectly or just not been given the job at all. Communicating through reddit posts and misleading people is nothing short of 'shady'.
Have greatly appreciated your rational approach from day one and this update is fantastic. I would really love to know the differences between the CPUs that are fine and those that aren't. I know it's likely impossible without huge datasets full of power data and VID requests if even then. It makes sense that worse bins that request more voltage would go first, but how many of the CPUs that are fine are also poor bins. No matter what I can understand how this got through and why it's been so hard to fix, but unfortunately nothing can excuse the response. This almost certainly comes down to management handling the situation from a short term financial stance until they were forced into action.
@bigmack70 For sure, and all the armchair experts in the comments that are still, constantly posting their half witted "solutions" involving BIOS settings alone.
@drewnewby both 13900k cpus that I had passed around 6 months in exact same way. Undervolted 1.29v and 1.28v on each cpu. Watt limited and frequency limited to 5.5/4.3. FrameChasers deletes my comments that say this. He just claims over and over that his community doesn't have this issue. Of course, when you ban people in paid Discord for speaking out. All that was before I knew he existed and Intel's scandal even formulated. Undervolted on launch
@@righteousone8454 I think I might stop watching Jufes altogether. Yesterday he made a video about the 9950X and he was talking in the comments, so I asked him what his 9950X scores on CBR23, just to see how much better/worse it is compared to an overclocked 14900K in terms of performance and power efficiency, and bro just replied "I don't care". Then why did you buy the 9950X in the first place, smh. Because the problem is that, sure, I can find some CBR23 scores online, but those are done with basically zero tweaks, subpar cooling and a bloated OS...
Imagine if motherboard manufacturers can find a way to auto undervolt any CPU installed by default to make them more efficient, cooler, and long lasting.
You can. People don't generally buy for efficiency, they buy for performance. Bigger number sells bigger volume. Thus: Number go up. People will buy a CPU with a faster clock speed, despite it running less instructions per clock by a significant margin because... bigger number, and math hard. And so, motherboard manufacturers are incentivized to... number go up: More voltage, more clocks, as long as it works like that for 4-5 years without issue they are in the clear. So why market for efficiency?
Some motherboards already do this in a way. Some MSI and Gigabyte (not sure of others) boards have load-line settings that undervolt the CPU under load. For my i7 14700K on an MSI board, the CPU was getting around a 50-60mV undervolt from those settings under load. The undervolt can actually be too much for some CPUs, and it is almost too much for mine: with the setting, the CPU has around a 10mV safety margin to remain stable under full load.
In several major tech companies I have insight into the gray beards are a part of every project while being project agnostic. The IPTs have to go before a board of them and justify their design at major milestones. Sitting in on them is always fun... so long as you're not the one defending your design.
Am I able to definitively find out somewhere if my system is affected? I've tried Intel, forums, even bloody ChatGPT :P I bought my system in Sept 2024: i9 14900KF, ROG Strix Z790-H. I've updated to the latest BIOS, but surely if this has been going on since gen 13... at some stage the gen 14s were, and are now, coming out with no issues? No voltage or oxidation issues?
All of this makes me wish for the old way to make processors. Back in the old days, processors came with set multipliers, frequencies and voltages. The motherboard just had to set these right, usually using jumpers, and the processor would work at that setting no matter what it was doing. None of these frequency changes and demands for more power because the temperature is low enough that it can boost more. Sure, there were downsides to this. For instance, power savings weren't as granular or efficient, but there was less risk of the processor suddenly getting more voltage than is healthy. It also left more unused capacity, as all processors of a model had to run at exactly the right speed, so most of them had a decent margin when run at the specified speed. That margin was what overclocking used. Anyway, something like this wouldn't really have been possible back then. Processors had a given frequency they were meant to run at and a given voltage. Motherboard makers either followed those specs or they might get into trouble. ASUS was a bit on the bad side here, as they tended to raise the clock a few hertz. Not enough to really cause problems, but enough to be seen in benchmarks, where ASUS motherboards tended to score a few points higher than those that really followed the specifications. Anyway, they didn't kill processors even then.
5:30 "locking the multiplier to a maximum of x53" okay I've been called out LOL. cuz that's what "fixed" my Intel CPU instability issues for me. going to install the latest microcode tomorrow though. MSI just put it out for my board. thanks for the awesome video Wendell :) you da boss
Thank you Wendell! I appreciate the update. It's welcome news that issues are being addressed and they are finally getting the horses back in front of the cart.
10:44 Why does requesting boost power but not being able to actually boost degrade the CPU? Wouldn't that situation be just fine so long as the voltages are kept to a reasonable level? Or do you mean that requesting boost power but not actually boosting causes the voltage to spike due to lack of vdroop? You mention that the Vcore measured with an oscilloscope didn't exceed 1.55V, so I don't think this is what you meant?
"boost itself to ruination" ... because it was designed to do so. That's what the sales-pitch benchmarks require. The gaming rigs are going to avoid these peaks, sometimes because of just abysmal cooling, as you say. But also because of tdp-limits set at a sane level. But it's not an accident that you could make these Intel-processors draw as much as 900W effect on pl2 with some unhealthy tweaking. The processors allowed that, no problem, and this was part of the sales-pitch. So don't treat this as an unhappy accident when the max temperature on the surface, or the diff on the gpu and cpu reached a certain level over an OEM-set limit. The benchmark targets that these systems were sold on were dependent on the pl2 boosts going incredibly high.
@@laden6675 It absolutely is Intel's fault. They had a production target with their flux-less assembly step ("Foveros") that would compensate for the higher nm-process issues -- by having incredibly higher internal tdp-limits and potential for overclocking. Then it turns out that the already known issues with that process (already reported on, as omitted by Intel's "greybeards" lol years in advance) is sabotaging these higher clock-targets and sustained high boost-targets. This is what the various server-customers who got sour over, because Intel sold them on the processors having these specific capabilities. And then they have to adjust this down, so the boost-algorithm doesn't "boost itself to ruination". Putting the processor kit at a much more modest sustained performance target level, far below what was initially advertised. That's what is going on here. Some people's sources in the "industry"(read: Intel) would of course like to put the blame on OEMs allowing overclocking, or even letting people set all cores to a static clock (which many do - completely oblivious to that the desktop processors do not have, even in optimal conditions, a tdp-budget big enough to keep all cores at max boost forever). And while that is unreasonable to do on a system with a limited tdp, it's still the case that Intel advertised this heavily as being completely possible, given good enough cooling. Which it probably would be.... without the known issues that Intel's flux-less component mounting has. So the whole "oh, if the external sensor trips didn't go off, then the processor would destroy itself" - nah. Not the case. They put in settings as the processors were expected to handle in ideal conditions - stall about the issues for years - and then just change the script as it suits them when there's no hiding the problem. And even fairly intelligent people who are actually interested - like here with lvl1 tech and gn - fall for it when that switch happens at the insider sources. 
You trust these fuckers way too much.
You have real talent. This video should receive millions of views and a lot of followers. Technical knowledge combined with the skill to convey the findings in a very pleasant, easy-to-understand manner is very rare.
who is gonna pay for all the hours of troubleshooting, rma, downtime etc. consumer reports doesn't recommend cars that have reliability > 4% wrt cars. imagine this percentage in the cpu realm.
Did not understand a lot of this tbh but what I did was fascinating - thanks for taking the time to do this. And I have to love a man with grey beard mounting an appeal for men with grey beards 🙂 (disclosure: if I grew a beard it would be at least grey and most likely white)
none of this affects me, I am an AMD only shop, will be as long as it makes sense, but it is fascinating and you are doing a great job reporting on this.
lmao love the "title" computer janitors... certainly feels like it sometimes with some of the PCs I work on. "How did you get this much dust/debris in here?!"
Is this also the case for the i5-1340P and i7-1360P? I'm reviewing the Khadas Mind and wonder whether this device's lifetime is limited or not. It runs a lot lower than the desktop CPUs: 2.4GHz all-core for the performance cores and 1.9GHz all-core for the efficiency cores. It can boost higher with tasks that don't use all cores at once. Thank you Wendell for sharing your knowledge.
I'm also curious about this, as almost every outlet said that this can affect anything 65W+. Well, 13/14400 and 13/14500's are also 65W, and while the issue's usually attributed to high power draw and XMP settings, I never see clear cut answers on those chips.
So me having a 14700K with a low-profile cooler WAS sensible after all? ;) Fortunate, given I can't install the latest BIOS, as MSI claim you have to update the ME first, which I can't, because it's a Linux server and they only have the ME update for Windows.
Such a fascinating subject. I have a 13900K system .. that I just finished building on custom loop with 480mm radiator. I probably should have waited, but when Intel announced they had a solution I decided to go ahead and complete my build. It will be interesting to see how reliable this system turns out, having used only the Intel performance profile and the latest microcode. Fingers crossed.
Are the 13900H mobile chips impacted by this? We have several MS-01's planned for a "mini" Proxmox cluster and I'm wondering if I should be concerned. Haven't seen a recent BIOS update from Minisforum.
The behaviour is the same. I think what will either hold it back or delay degradation is the cooling. More cooling equals faster degradation as it boosts itself to death with more headroom. Oh, and constant load level is better than burst workloads or ramping up and down.
New to your channel, found you through Steve @ GN. Thank you for communicating at a level of understanding above the average id10t out there, it is really great.
I usually do NOT care about server stuff. And yet sometimes, you manage to make it interesting. That's impressive!
Wendell, with regards to your comment regarding board vendors not sharing some of the responsibility here. I believe I'd seen that the VID spec on Intel EDS was originally 1.52V+200mV, which means before this 1.55V limitation, Intel believed the processor was safe to be running within 1.72V.
the ambiguity in the spec document could mean the socket has the capability for a future CPU. or it could be these. only Intel knows what it meant in that aspect
"Intel believed the processor was safe to be running within 1.72V." No - the absolute maximum specification had been 1.7V - and that is not a "safe for 24/7 operation" voltage but the absolute maximum above which all bets are off. that is the voltage where it is specified that if at any point you reach it the CPU might as well already explode.
@@ABaumstumpf I don't think "all bets are off" is something within Intel's EDS. The specification is a maximum voltage. However, most CPUs VID range would fall well within this. Point is, the spec is the spec. It's not there for sh*ts and giggles as they say lol.
The "spec is the spec" is somewhat meaningless if Intel has a history of playing fast and loose to allow motherboard manufacturers to drive up performance. Intel could have been very, VERY specific about all of this, but then they would have been less competitive against AMD. Motherboard manufacturers bear some responsibility, as they COULD have asked for clarification. I'm sure they did at some point, but as long as failures weren't too high they likely went along with it (plus they were competitive against other board manufacturers)... until there were major problems and blame had to be thrown around. At the end of the day, this is Intel's problem to fix through updates and honoring warranties.
"is something within Intel's EDS. " No - absolute maximum ratings are exactly that. They are the limits that must never be breached. If you look at the specs that silicon manufacturers provide for their chips - those that the user is supposed to wire up - yeah, go figure. TI for example says this: "The Absolute Maximum Ratings section specifies the stress levels that, if exceeded, may cause permanent damage to the device. However, these are stress ratings only, and functional operation of the device at these or any other conditions beyond those indicated under Recommended Operating Conditions is not implied." Nothing beyond RECOMMENDED is even guaranteed. The maximum ratings (and you will find the same with all other manufacturers) just literally mean what they say: the maximum.
Trust the gray beards, no matter the industry. There will usually be one who works and knows their trade, their craft and their industry. Aspire to become the Gray Beard.
Hearing you talk about Halt as a "way to use less power" reminds me of the truly old days, back when CPUs actually halting execution in a meaningful way wasn't even a thing and early CPUs would literally execute an idle loop to keep themselves occupied when they weren't needed. True CPU halt actually WAS a power saving method! This resulted in the notorious "Halt and Catch Fire" situation, when an old IBM system with core memory would "halt" by repeatedly reading a particular memory cell... which would overheat that address in the core lattice, as it wasn't meant to be kept "on" continuously like that.
Halt is still relevant for many processors. That's a very standard thing to do in embedded systems. But the special thing with modern x86 chips is many cores. And hyperthreading. And translating x86 instructions into internal RISC-like instructions, complete with instruction reordering. And speculative execution. All targeting a pool of ALUs, FPUs, ... So there is no longer a single program counter owning all CPU hardware.
@@FerralVideo "Wrong" seldom moves a debate forward. Especially since few problems have an absolute yes/no answer. Thanks to you for not going the standard "Reddit" route and end up angry, under the assumption that any response means being challenged to a fight - even responses adding some additional context.
Yes, it's not catching the CPU problems in stress tests. Most of the time. This is what baffled us a year ago with the 13900K, and we thought the problem could be elsewhere, like the OS, memory, motherboard or even the SSD. The CPU handled 350 watts in an unlocked stress test, but it would fail to install NVIDIA drivers 😂 The only way to know this back then was a processor swap.
The RMA experience with intel has been painful to say the least. It took 2.5 weeks to get to the point I would be able to get a shipping label, complete with repeated dialogue because the support representative did not read initial details outlined. It was like having a carrot dragged in front of you to get you to proceed, with hopes you would give up and stop the RMA process.
I don't think it's going to be as bad as you think. The latest _Arrow Lake_ CPUs have fast enough processing cores that they can eliminate the troublesome hyperthreading process that plagued recent Intel CPUs. And the new CPUs will be using TSMC's 3 nm process, too.
Just like the oxidation issue, the "Vmin Shift Instability" is a separate defect from the main design flaw in Raptor Lake. They keep having Thomas blog out the latest distractions. As with the earlier warranty extension list, all Raptor Lake SKUs are affected by the design flaw, not just "K" SKUs. Intel is still denying HX SKUs are affected based on earlier miscommunications. They pushed Alder Lake architecture too far.
It's 0x12B now, since 2024/09/27. I have had no issues whatsoever on my i9 13900K, BUT I waited for the BIOS updates before I did anything heavy on my PC.
So far I am sticking with 0x129 with a voltage limit of 1450mv in the BIOS. My 13700K never goes above 1.25v at light load and drops to 1.16v at load anyway. The reason I don't want to go to 0x12B is the forcing of C1E C-States with no way to disable them. When playing certain games with C1E forced on, I'm seeing some cores drop down to 800mhz and I get microstutters. With C1E disabled on 0x129 my cores will still drop to 1200mhz when idle, but not when I'm actively doing things like playing games. Forcing C1E also causes horrible latency doing anything with audio.
I bought a 13900k in Nov 2022. I ran it at 5.8p/4.5e/5.0ring since day 1 on a 2x360mm custom loop. I did video editing, played Tekken 8 every day, and I've played just about every UE5 game released so far. I never had a single issue. I did stay on an older bios because I figured if it's not broken, then it doesn't need fixing. I started hearing about these instability issues soon after the Tekken 8 demo released on Steam. I figured it was just inexperienced people with bad cooling. Regardless, as time went on I never had a single issue but decided to reach out to Intel for a refund because I didn't trust the CPU long term. Intel immediately approved my request, never asked for proof of instability, sent me a check for $599, and it was all super easy. Now Intel will be paying for my ArrowLake upgrade. I wish this happened every generation!!!
Yeah, I haven't had a single issue on my 13700K, which sees some good use. Intel customer service is bizarrely amazing in an industry where most will screw you over (looking at you, ASUS). This problem doesn't seem as apocalyptic as people make it out to be. And I'm not a fanboy, I couldn't care less about what logo is under the CPU cooler.
@@mattm3023 Only problem is warranty extension does nothing for people buying these second hand. If someone upgrades to Arrow Lake and dumps their 13900K the person buying it is SOL if it has any issues. (Still way better than Asus screwing over their original purchasers)
For me it was annoying getting my waterblock off, but hey, Intel customer service got me a new CPU in a few days from the point where I opened a ticket and I even got a 13900k -> 14900k upgrade for my trouble.
Intel listed the 13600k as one of the problematic cpus. Do you have any data regarding the i5s? Or are they mostly unaffected since they don't hit anywhere near 1.5v?
When I heard about the degradation issues I started running HWINFO in the background and my 13600K peaked 1.32 v vcore even on older microcode , I've had it since Oct 2022 (launch) and haven't had any issues. (MSI Pro Z690-A DDR4) currently running latest bios & 0x129 w/ intel defaults profile to be safe.
Was about to say that "can't" isn't in the Scottish dictionary, but alas at least the actor in the original series does say "can't" instead of "cannae". Probably because he's Canadian.
I have no faith in any management. I have worked at too many companies over the last 30yrs and good management is just a fleeting phase of most companies.
It would be wise and prudent for the greybeards to simply club together and set up a consultancy firm. Then when what they warned about comes to pass. They can then charge the requisite consultancy fees way above what they would get in salary. And to rub salt in the wound, they should name such a firm as "we told you so asshat"
SCAN Pro Audio already exist here in the UK if you're trying to build a system for the rigorous demands of real-time audio playback/recording/production.
BUT, the biggest question is... where can I get that wallpaper? Also, I understand your comparison to the lights dimming, but that is actually not from a lack of power. It comes from an abundance of reactive power drawn by the vacuum cleaner; power factor correction at startup would solve that, not more power :) #thatguy Awesome investigating. I loved feeling like a great gray beard whisperer. I'm a 40-year-old half gray in the making (insane manic diving into these hardest issues for decades, respecting and learning from gray beards along the way). The reason there aren't more people like us is that they get fed up with not being listened to and leave. Over and over and over again.
I'm trying to understand the microcode fix. When you described the voltage bumping to 1.3 from .7 fixed the issue was that just the Linux performance mode as a quick fix or that is what the microcode fix does? I'm concerned for people running this as a 24 x 7 server idle power consumption might be notably higher. Can you clarify if power consumption should be poor with the recent microcode fix under load or idle conditions?
16:00 Intel's boost algo looks at the number of cores active and if they are P or E core, then the voltage they request, then sets the speed limit. Since the E cores are active in this scenario, you are at 1.35V for the E cores (the P cores dont need it, but core voltage is shared sadly). The algo then looks at 1.35V, all cores active, and caps the boost speed at 5.4 for P cores, and 4.2 ish for the E cores. The design is such, because in this state it can run full loads - instead it gets halt instructions. If you were to feed it more power under load, the chip would run hotter. And a hotter chip isnt as stable. So rather the boost algo is designed to assume it is unstable above 5.4 with all cores active and that set voltage. So it does not go there.
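A rough Python sketch of the logic this comment describes, not Intel's actual algorithm. The only numbers taken from the comment are the 1.35 V E-core request and the 5.4 / 4.2 GHz all-core caps; the core counts, the P-core request, the lighter-load cap and the names `shared_rail_voltage` and `boost_cap` are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoostCap:
    p_ghz: float  # frequency cap for P cores
    e_ghz: float  # frequency cap for E cores


# From the comment: with every core active, boost is capped at 5.4 / 4.2 GHz.
ALL_CORE_CAP = BoostCap(p_ghz=5.4, e_ghz=4.2)
# Hypothetical placeholder for lighter loads; not a real Intel figure.
LIGHT_LOAD_CAP = BoostCap(p_ghz=6.0, e_ghz=4.4)


def shared_rail_voltage(requests_v):
    """Core voltage is shared, so the rail follows the highest request."""
    return max(requests_v)


def boost_cap(active_p, active_e, total_p=8, total_e=16):
    """Pick a frequency cap from the number of active P and E cores."""
    if active_p == total_p and active_e == total_e:
        return ALL_CORE_CAP
    return LIGHT_LOAD_CAP


# P cores would be happy with less (1.20 V is an assumed value),
# but the E cores ask for 1.35 V, so everyone gets 1.35 V.
rail = shared_rail_voltage([1.20, 1.35])
cap = boost_cap(active_p=8, active_e=16)
print(rail, cap)
```

The point of the sketch is only the ordering: active core count and type first, then the shared voltage, then the frequency cap.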
Responding based on timeline. It's not the all-core load that's causing the problem. And I can agree that it's related to gaming. As a small SI, this is what we feel too. Another system is coming in for an RMA next week. We have asked the customer a ton of questions and are compiling the information for more details.
My 14700K (possibly OC'd by my board, but not by me) became so unstable that the BIOS would freeze. Now running my 13700 (no letter), and it seems to be doing okay, but... I want the little performance boost I paid for, damnit.
Being an electronics engineer, the idea of core overvoltage due to a lack of workload with poorly optimized code seems very strange to me. Let's try common sense. I understand poorly optimized code as a lot of excess instructions where things could be handled more efficiently. But there should not be a difference in executing code of any quality. If "poor" code results in more cores activating and/or higher core clocks, the CPU's current draw goes up accordingly, which is normal. Higher current means a higher voltage drop on the supply line from the VRM to the silicon die, and even on-die. To compensate for this, a higher core voltage is commanded. Of course, for stability, the voltage has to be ramped up BEFORE the clock frequency increase and/or core activation, and vice versa. This may take some 10 microseconds, during which the CPU briefly runs at a core voltage that is actually too high (due to the lack of workload/current draw). This is not a glitch. It's by design. So there cannot be an argument of "overvoltage" due to cache-induced idle states either. Who decides the core voltage anyway? BIOS? OS? Chipset? CPU microcode? There are buttons and levers everywhere, good enough only to obfuscate responsibilities. Actually, the whole idea of predictive, multi-parameter-dependent core voltage control seems to be a one-of-a-kind nightmare to me. Just think about a core supply current in excess of 300 A over PCB, socket and silicon, producing more than 400 W of on-chip thermal power at roughly 1 V. That's wild. You can do this in a closely controlled environment, but in a consumer scenario? In my opinion, Intel has leaned out of the window a bit too much with that.
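The figures at the end of this comment follow from P = V × I. A small Python sketch of that arithmetic, assuming all electrical power ends up as heat on the die and ignoring VRM and board losses; the function names are made up for illustration.

```python
def package_power_w(core_voltage_v, core_current_a):
    """Electrical power delivered to the die: P = V * I."""
    return core_voltage_v * core_current_a


def current_for_power_a(power_w, core_voltage_v):
    """Current needed to deliver a given power at a given voltage: I = P / V."""
    return power_w / core_voltage_v


# 300 A at roughly 1 V is 300 W.
print(package_power_w(1.0, 300.0))
# Getting to 400 W at roughly 1 V takes about 400 A, i.e. well "in excess of 300 A".
print(current_for_power_a(400.0, 1.0))
```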
Intel really needs to do a recall on these chips. It is going to cost them an enormous amount of money but I see no other choice. A class action lawsuit will cost them more because the lawyers get paid most of it.
Huh? No one's even talking about a full recall anymore. Turns out it's not as big of an issue as tech influencers and 20 year olds are making it seem on YouTube. It's manageable. No one's even talking about a class action lawsuit anymore either. Because it's being handled already. Lmao.
@@DingleBerryschnapps I worked at Intel for 20 years in senior leadership positions. You will see this turn into a class action lawsuit. A recall does not mean every CPU gets returned. A recall can mean they provide a utility to all customers to test their CPU and if it fails they return it.
@@DingleBerryschnapps I have to agree with this one, zero issues with my 14900K since launch and I've given it a serious beating every time I can. CBR23 score = 43.5k at 440W under load. All limits disabled. I've even left this chip running Prime overnight at 1.6v on several occasions and it still works flawlessly. I would try higher voltages, but unfortunately it looks like they have blocked it through microcode. All chips other than the 14900K allow up to 1.7v. The 14900K is locked to 1.6v at a firmware level. Even though you can POST at 2.170v, it's strictly locked to things like navigating the BIOS. When you try booting at higher voltages, ASUS boards give you an overvoltage screen thingy and ASRock boards, for example, simply revert it to default.
Thanks for your vids, I learn a lot! When I ran XTU AFTER the 0x129 microcode and BIOS update on an MSI Edge, I saw constant current/limit throttling on the CPU/software on a 14900KF with pro water cooling!
I see, the algorithm relies that the CPU will consume much current so the elevated voltage will get lost in the resistance of the CPU power lines. But if for some reason the cores are starved for memory access, it won't and the voltage will rise.
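To put numbers on that: a minimal Python sketch of the voltage drop across the power path, V_die = V_set - I × R. The 1.50 V setpoint, the 1.1 mΩ path resistance and both current figures are made-up illustrative values, not measurements or Intel specifications.

```python
def die_voltage(v_set, current_a, path_resistance_ohm):
    """Voltage the cores actually see after the resistive drop in the power path."""
    return v_set - current_a * path_resistance_ohm


V_SET = 1.50      # volts requested from the VRM (assumed)
R_PATH = 0.0011   # ohms, VRM-to-die path resistance (assumed)

# Heavy compute load: lots of current, so a large share of the setpoint is dropped.
busy = die_voltage(V_SET, current_a=200.0, path_resistance_ohm=R_PATH)
# Cores starved for memory: little current, so nearly the full setpoint reaches the die.
stalled = die_voltage(V_SET, current_a=20.0, path_resistance_ohm=R_PATH)

print(f"busy:    {busy:.3f} V")
print(f"stalled: {stalled:.3f} V")
```

With these assumed values, the stalled case sits roughly 0.2 V higher at the die than the busy case, even though the requested voltage is identical.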
5:05 oh that immediately brings to mind, maybe some kind of weakness to thermal stress (physical), which compounds with the other, regular 13/14th gen issues
If the problem is cooling related, gaming laptops should be largely unaffected, as they can't boost high anyway. Outside of maybe the 2024 Neo 16, the failure rate should be the same as every CPU before.
If the workload is single core, that one core can eat the entire power budget and still boost a lot. A 13900hx is 55W, but over 100W peak. If a single core can eat 50W by itself, then you can run full boost on one core no problem.
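The budget argument as a tiny Python sketch. The 55 W and 100 W figures and the 50 W per-core draw are the numbers from the comment above (the 50 W is the commenter's hypothetical, not a measured value); the function name is made up.

```python
def cores_at_full_boost(power_budget_w, per_core_boost_w):
    """How many cores can run at full boost before the package budget is used up."""
    return int(power_budget_w // per_core_boost_w)


print(cores_at_full_boost(55, 50))    # sustained 55 W budget
print(cores_at_full_boost(100, 50))   # 100 W short-term peak
```

So even inside a laptop's sustained budget, a single-threaded workload can still push one core to full boost.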
I'm guessing the microcode for mobile parts was already different enough to not have that thermal algorithm flaw and was also thermally limited so less likely to be able to "boost itself into ruination" 😅
Do you have any data on the TE series? The medical company that I'm working at (so me) is evaluating an upgrade for our embedded systems. Is there any good way to 'burn in' and test for these? Just like with the others, Intel is saying that TE is not affected, but... let's be honest. The package power for these only goes up to ~65W though, and the motherboards haven't had a BIOS update available for a long time either.
nice to see someone who is still digging, while others saw 0x129 release and said "its fixed we can move on"
"its fixed we can move on" = Corporate for there's more dead bodies, but move on so we can claim we have no liability.
Of course he's still digging, he's AMD's number one shill. Bergamo is his middle name.
@@spewp wasnt able to find any coverage of burned x3d chips by this channel
The way the whole tech news industry went into radio silence, collectively, after Wendell and GN broke the story, was genuinely scary. Louis Rossmann's video was the symbolic bat-signal. After that one, it was absolute silence. Props to Wendell for helping keep up awareness, I wish other outlets like GN and HUB would do the same.
It seems that most YouTubers are no longer investigating the Intel CPU issue.
The fact that really good cooling could have had a part in causing an issue really blows my mind. Imagine going all out spending time and money, and your friend with a stock cooler having fewer issues than you.
The irony of talking about the Intel Greybeards is those are the very people Intel is PAYING to walk out the door. Huge braindrain over there from what I'm hearing.
They created famous CPUs we know and love. Meltdown, Spectre, Foreshadow, Zombieload, SwapGS, Plundervolt, IME, RIDL, Fallout, Lazy FPU restore, Downfall, and now this. I say - keep them men working and create more beautiful CPUs!
@@moamber1 AMD also has its fair share of issues. Sinkclose, Zenbleed, Inception. Not to mention that Spectre affects both Intel and AMD CPUs. The point is designing CPUs is hard. Not particularly related to the talents working at either company.
@@moamber1 Bad actors will always exist, and those include the Government agencies responsible for some of the vulnerabilities, What seems like a mistake, sometimes was totally intentional, but what they didn't intend on was people finding it and also abusing it.
Well duh, Intel likes its nice H1B tax write off program. They're not going to axe their Indian project manager tax portfolio.
they just got 8 billion from congress, well 3 billion upfront. and they are laying people off....
A good number of "gray beards" at intel in Hillsboro Oregon have jumped ship. There seems to be a bit of panic around here.
the retirement exit package was nice. those folks were exactly the people needed for this kind of problem solving tho
Intel wants graybeards gone... they're too expensive to keep around.
Take the big bucks as long as they are available and move on to do something you would actually enjoy... sounds reasonable.
@@estyrer2 Sounds like what Hector Ruiz did at AMD when he took over from Jerry Sanders.
you do realize they are laying off 15% of the workforce right now?
My problem with this whole thing is that with the issues happening at all, and the denying of the issues for so long as well as misleading information about them, I don't feel comfortable using anything from them for the servers, desktops or laptops that I manage, at least nothing from 13th gen or later. Sure give them a generation or two of stability and they might earn that trust back, but I simply can't take that risk when the business operations are on the line.
Especially the denying and finger pointing was annoying. I'm very happy we're still on 12th gen 😅
It takes many years to build up a reputation that can be destroyed in a matter of weeks.
Something they don't teach the MBA's on their idiot course.
We should be building cpus and gpus with open source designs, open to all and run by the internet community. For profit corporation model is doomed moving forward
@@xlr555usa that won't happen, even most RISC V CPUs are not completely open source. The companies are very protective of their IP and rightly so, there are a lot of patents around CPUs that are significant advantages for anyone able to use them. Not that I would be opposed to the idea, just being realistic about it. Also the design is only one aspect and in this case I think the manufacturing is a significant part of the issues which wouldn't be revealed just from the design.
@@jeffreyparker9396 There was a time when CPU design could have been done using open source, but that's a long time ago. Today a modern CPU has billions of transistors and even the real design geniuses can't really claim to understand the entire thing. Even back in the 486 days we were talking millions of transistors, if I remember correctly. But the big problem then, and especially now, is manufacturing the processors. This is not cheap, and adjusting the design for a particular process isn't as easy as scaling the mask. No, it's big money invested in the design and manufacturing even before the first chips are produced.
Up the Greybeards! You really provide an incredibly valuable resource for the PC enthusiast community and every other PC user, as well! My 14900K suffered weird errors after about 2 months while running with TVB which cleared up once I disabled it. It has been running fine since then. Thank you for all of your work!
22% of CPUs being pulled is at least not half... That's not very comforting. At least I don't have any of the Intel CPUs, but for those who do, WTF!!!
We have a 13900K for remote access, and the workload is compiling and synthesis of FPGA designs. Now, after 2 years of constant running, we've seen failures where we get random crashes and freeze-ups.
I'd think you'd want ECC for that kind of work. Is this on W680?
Same on a laptop with 13th
It's the best 💩, without solutions from intel.
Does this happen on new processors too?
@@BBWahoo Two years IS pretty new, in CPU terms. Traditionally, CPUs have far greater lifespans than any other system component, and have the lowest failure rates.
That said, the information we currently have suggests that the problem does not lie in the physical CPUs themselves, but in the microcode that dictates their behavior. So if you purchased a 13th or 14th gen CPU today, but did not update to the newest BIOS, or your motherboard vendor has not yet released a BIOS for your motherboard that contains the newest microcode update... It's theoretically possible that your CPU could begin to degrade before you get that microcode.
Generally, by the time the instability becomes noticeable by a normal user, the CPU has probably already sustained some amount of permanent damage. That doesn't necessarily mean that the degradation can't be stopped once the necessary microcode is applied, or that the CPU will fail within a 2-year period. (We just don't have the data to know for sure yet.)
TL;DR: Yes, the issue can affect any 13th or 14th gen CPU, even new ones. If I were in the market for a new CPU, I'd look at AMD or wait for 15th gen. But if you already own a 13th or 14th gen CPU, there's no need to panic, because there are things you can do that will (hopefully) prevent any future instability issues. (Update your BIOS as soon as there's a version available with the newest microcode. In the meantime, you can adjust the boosting behavior/voltages in your BIOS to mitigate the risks.)
Why wasn't 12th gen affected by instability? Alder Lake is mostly the same as Raptor Lake; the only difference is it runs a little slower and has less L2 cache and fewer E-cores on die.
Well two of my friends who worked in intel(MACD) since uni (2004) just left last month "voluntarily" Do they count as gray beards?
tl;dr: CPU voltage loading algorithm is inaccurate when in memory-bound execution circumstances resulting in long-term harm
@Level1Techs thanks for not letting this bugfix/intel fab issue just die a quiet death, homie. much to be learned, science to be done + lots of parallels from the past complicated by PWM i think.
( @7:16 the PID algorithm that directs the Vpwm for the CPU decides using x86 instructions in-the-pipeline WITHOUT considering the "effort" that differs between them circumstantially - is what i heard?)
compound this with, I think, the chip monitoring its own temperature and changing behavior, but the temperature estimation can be wildly, wildly off. and how that affects intended boost.
It's also something very difficult to test for and also counter intuitive, running tests on what your design is doing is extremely costly in time and resources, so most tests are basically going to be some heavy loads that really kick hard at the cpu but not something with a bunch of cache misses, waiting on memory or interrupts. And even if you try to test for those, it's really hard to reproduce a load similar to a game server
I wish private citizens could get away with lying as much as corporations do on a regular basis. They acted like nothing was wrong forever.
soooo, $1000 EK direct die block allowed Intel cpus to burn even faster? oh irony
Yes, a cooler CPU passes more current. Current does the damage. Pushing more than 120A through that silicon cross-section is a no-no. Keep it safe at 80A.
Those who have direct die mostly don't run stock, so their CPUs won't see the default 1.5v-1.6v voltages.
That's common sense if you are overclocking it.
A dragster or Formula 1 car engine has to be rebuilt after every single race, because they are pushing them to the limits.
That said - I've got 13+ year old Intel systems that've been overclocked all day everyday and are still running, the motherboard capacitors are the weak link there.
@@igelbofh ???? sorry if I'm wrong, but if the silicon is cooler, wouldn't you need less power, all else being the same?
edit: I also think calling it a problem with direct die cooling is disingenuous. They're perfectly fine on Ryzen. It's purely an Intel (13/14 gen) issue.
@@TheHighborn i'm pretty sure everyone here is talking about intel
I'm going on my THIRD RMA with issues on the 14900K. I am NOT an overclocker.. I just want a stable product that runs at advertised speeds out of the box for what I paid for and WORK.
Do a x3d build after it launches?
No, no you dont want that. You SAY you do but you dont, evidently.
@@N4CR My 1st build was a 486DX2 100Mhz.. I'm seriously considering an X3D build after this debacle.
I had to take a loss and go 7950X3D. Only 1 minor issue with memory timing before a UEFI update, but now it just works. Go AMD. You will find it's easier to set up. Just set a negative PBO (Precision Boost Overdrive) offset to undervolt and it will auto overclock. Simple.
I was once involved with a product where the system would sometimes leave the output on HIGH. The reason was that some of the drivers were better than others and the target had been reached faster than expected, but the code had no means to recognize this and simply waited for the output to come down. It never did, because the feedback loop had no mechanism to handle excessive output. The hard part was proving to the engineers that the reports were correct, only for them to reply "we saw that in development but figured it was a one-off anomaly and ignored it".
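For what it's worth, the missing piece in a story like this is usually an explicit overshoot branch plus a bounded wait. A minimal Python sketch under assumptions: `settle`, its thresholds and the sample readings are all invented for illustration and have nothing to do with the actual product.

```python
def settle(samples, target, tolerance=0.05, max_polls=100):
    """Watch output readings and report how the loop ended.

    Returns "settled", "overshoot", or "timeout" instead of waiting forever.
    """
    for n, value in enumerate(samples):
        if n >= max_polls:
            return "timeout"      # bounded wait: never spin indefinitely
        if value > target + tolerance:
            return "overshoot"    # output went past the target: handle it explicitly
        if abs(value - target) <= tolerance:
            return "settled"
    return "timeout"              # ran out of readings without settling


# A slow driver creeps up to the target.
print(settle([0.2, 0.6, 1.0], target=1.0))
# A "better than expected" driver shoots past it and stays high.
print(settle([0.5, 1.4, 1.4], target=1.0))
```

The caller can then decide what to do on "overshoot" (back off, cut the output, raise an alarm) rather than waiting for a value that is never going to come down on its own.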
If it can happen - it will happen
"By the power of Castle Grey Beard, I am SysAdmin!!!!!".... Wendell in a parallel universe.
I have complained about this since Coffee Lake: the motherboards applying insanely bad settings out of the box. Disabling most if not all protection mechanisms, increasing the power limit to infinity, and applying insanely stupid CPU-frying automatic overclocks with very high overvoltage that resulted in nearly no extra performance but a lot of heat. And Intel did nothing. That is what they are to blame for.
My 8700K was gulping down in the neighbourhood of 150W when i first powered it up. No settings changed, straight out of the box onto the motherboard and into cinebench. Then i just enabled all safety-features again and set the rest of the settings according to intels specs. Lo and behold - the CPU was a tiny bit slower in benchmarks (well of course - i did limit it to the official 4.3GHz allcore instead of the 4.7GHz overclock) but it stayed below 110W. Some further small tweaking and now, even under full turbo in CB23, it stays below its TDP - which means it can now run 24/7 at maximum turbo while staying cooler and using less power.
Sadly i did not catch that an UEFI-update changed some secondary voltages and fried the iGPU with a nice blistering 1.5V.
I rather have a cool running system that is rock stable but a bit slower. Honestly don't need 5+GHz.
We live in an age when undervolting is necessary and both manufacturers are guilty of it. My 5950X at stock settings was underperforming by about 5% against official benchmarks, but after some efficiency tweaks (minus 200MHz boost, minus 30 all core curve optimizer, minus 0.1V core voltage offset) it actually performs 5% HIGHER than expected. It runs 15 degrees cooler and Vcore went from 1.5V down to 1.35V which is much better for longevity. I find that even budget processors like the 5600 need to be detuned a bit to keep the fans running at quiet speeds.
@@tourist6290 I run 5 GHz on my 13700K with wattage limited to 130 W for PL1 and 145 W for PL2. I can play games at 1080p and 1440p just fine with the most recent patch... before this new one that just came out.
What Wendell completely omits is that Intel much like nVIDIA has a complete death grip on the MB AIB's, so any "default BIOS mode", performance or not, is directly approved by Intel. Anything Intel doesn't like gets immediately axed (there were multiple instances where an AIB gave lower tiers an option that was exclusive to K-series CPUs or Z-series motherboards). The time frame of the crazy microsecond boost clocks, just for show on the BOX, and the outright lying on TDP aligns perfectly with the Zen launch. I'm sorry, Wendell, but this time I truly think you are either wrong or paid by Intel to directly meatshield them or market them as the victim, which they are not. You said it yourself: more than a year and 2 gens back to back, and they constantly refused to tell the truth until everyone already knew it from the mountains of dead CPUs on public display.
And another thing about the AIBs: Intel did try to blame them in May-June-July, trying to make it look like the same "shenanigans" they pulled with AMD's Zen 4 X3D, but Intel immediately took back the accusations. Why? Because all the AIBs refused to take the blame, and all of them had easy ways of exposing the truth and the "REAL" Intel-imposed "DEFAULTS" since Coffee Lake.
@@eX_Arkangel "that Intel much like nVIDIA has a complete death grip on the MB AIB's"
Not really - as the AIBs themselves have said. You are mixing up Intel and AMD there.
"and the outright lying on TDP"
Again mixing up AMD and Intel there.
It's funny that you called out minecraft. It was exactly the only application that I was having issues with.
Minecraft can do some damage. It killed one of our Intel iMacs years ago.
Fascinating video, it's helped validate some of the assumptions I'd made and led me to adjust others.
One thing I do think is worth noting is that the way (at least some) motherboard manufacturers tuned the settings in order to get those improved boosts was to *undervolt* via the load lines, reducing the voltage and by extension the thermal throttling.
Given that most CPUs will have a reasonable factor of safety on the (minimum) voltage, and most workloads won't draw anywhere near the amount of current that would pull the voltage down far enough to need the full design load line's compensation, that probably didn't cause many issues in and of itself. It's reasonable to assume, however, that in the instances where chips with minimal margin were given high current loads, even brand new chips that hadn't had any chance to degrade would see instability that would present similarly to the Vmin Shift instability.
Not every unstable chip was necessarily damaged.
I think Intel thought so too, and their initial 'fix' of the Intel default profile, which included setting the AC and DC load lines to match the regulator by default (in theory, some board makers overcompensated I think) was intended to address that, because it was actually fixable simply by changing the settings. Only when they found the other bugs that were actually damaging silicon and needed microcode updates to prevent did they seem to accept there was a problem from their end too, and they'd have to accept more RMA's and extend the warranties.
About the mining part: I think an aspect worth considering is that mining hardware is often run at its limits (be that clocks or undervolting). The goal is not to have 0 errors but rather to balance error rate/uptime against hardware efficiency. So errors might be categorized wrongly or missed due to built-in crash handling, or due to some default expectation of invalid operations given the CPU settings. When I was CPU mining, pushing the memory hard and accepting errors was more profitable than using a stable OC.
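That trade-off is easy to put into a formula: useful output is roughly raw rate × (1 - error rate) × uptime. A quick Python sketch with made-up numbers; none of these figures come from a real miner, and `effective_rate` is just an illustrative name.

```python
def effective_rate(raw_rate, error_rate, uptime):
    """Useful work per second after discarding invalid results and downtime."""
    return raw_rate * (1.0 - error_rate) * uptime


# Stable overclock: no errors, never crashes (assumed numbers).
stable = effective_rate(raw_rate=100.0, error_rate=0.00, uptime=1.00)
# Aggressive memory settings: 15% faster, 5% invalid shares, 2% downtime (assumed).
aggressive = effective_rate(raw_rate=115.0, error_rate=0.05, uptime=0.98)

print(f"stable:     {stable:.3f}")
print(f"aggressive: {aggressive:.3f}")
```

With those assumed numbers the error-prone setup still comes out ahead, which is exactly why a miner would tolerate (and stop paying attention to) a steady trickle of errors.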
Love "computer janitors". But for Windows the image I get is someone cleaning out a septic tank.
bought the 14900k in April and already RMA'd last month and returned back to my 12900k.. microcode updates were too late im afraid. i do have an AIO setup also
Well they never said that the Microcode would magically repair the 14900K issues
eww
I have one too, running linux but im unsure if the occasional once-in-two-weeks freeze is because of Intel. What problems was yours having?
@@dustee2680 I haven't personally used any 13th or 14th gen CPUs (still running a 5900x), but as someone whose hobby is tech, I've kept up with the issue since the very first complaints trickled in, a few months after the 13th gen launch. And from the many stories and reports I've heard, that was not an uncommon first sign of instability/degradation. I'm not saying this is FOR SURE what your issue is, since it could be a multitude of other things. But if I were you, I'd submit an RMA request if possible, just because it's better to be safe than sorry.
@@dustee2680 I guarantee you your CPU is toast and needs to be replaced.
I'm taking a shot every time he says data, and I'm not doing well.
Spelling is still immaculate
slured speech to text is getting surprisingly good
are you experiencing BAC_min shift?
@@renevandenbosch9967 Mine stays decent but it takes me five minutes to write a 5 word comment
I fee eel
Go odod
Intel released ANOTHER microcode patch 12B
we need moar
intel 4546B microcode when
@@ThylineTheGay
Programmer socks WHEN
"You really need to talk and listen to your senior staff that have had decades of experience."
Meanwhile, Intel: "We're just laying everyone off. It'll be fine."
"hey i've heard this thing about 'ai', what if we get it to generate CPUs" - some intel exec, probably
Intel either needs to sell off the foundry and stop all the new manufacturing plants in the US, or cut projects and lay people off. There's no magical way to generate money to keep Intel going for another 2-3 years... IMO it's just American ego that can't let Intel go. I don't think it's a bad idea if all chips from Apple, Qualcomm, Broadcom, AMD, NVIDIA, Amazon, Google and the government come from Taiwan alone. Without Intel, does China pose a stronger threat to Taiwan? Yes, but how likely is it really that China is going to blockade or invade Taiwan? There's no way China will blockade Taiwan and cause a chip shortage. I think it's just fear.
Oh... they had a big lay-off?
I am assuming that they laid off the important departments and retained the DEI staff
@@ThylineTheGay There was literally a tech conference about AI a while back where an Intel exec said that in his ideal world, AI would do literally everything at the company, and everything would be completely automated and autonomous... With ONE SINGLE human employee, just there to oversee the process. (I'm obviously paraphrasing, and can no longer remember the name of the person who said it, so I might be a little off, but that was the gist.)
That sounded so incredibly dystopian to me that I was AMAZED that that person had obviously been promoted to a relatively high position within Intel, but still believed that was the ideal future. Not to mention, he thought that proposing his idea in front of a bunch of reporters, while emphasizing how great it was, would go over well.
Then again, it actually did fly under the radar almost entirely, because almost everyone at that conference was there to hype up the possibilities of AI, not to critically examine them, so... What do I know. 😂
While I agree a fixed voltage doesn't really help in solving the problem, doing so back in November is what protected my CPU before all of this information came out.
Back to November
I think "v_min shift" has a slightly different meaning than your interpretation. Rather than referring to the minimum voltage the CPU requests from the VRM, I think it's just a technical euphemism for "degradation" that describes the exact literal symptom: the minimum voltage for correct operation shifts (upward) until it is greater than the firmware V/f curve, at which point the CPU miscomputes.
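A minimal sketch of that reading, not anything Intel has published: treat the firmware V/f curve as a fixed table and the chip's true minimum stable voltage as something that creeps upward with wear. Every number and name below (VF_CURVE_MV, VMIN_NEW_MV, shifted, first_failing_freq) is made up for illustration.

```python
# Hypothetical numbers only; nothing here is measured from a real CPU.
# Voltages are in millivolts so the comparisons stay exact integers.

# Firmware V/f curve: voltage the CPU is given at each frequency (MHz -> mV).
VF_CURVE_MV = {4000: 1100, 5000: 1250, 5800: 1450}

# Minimum voltage a fresh chip actually needs at each frequency (MHz -> mV).
VMIN_NEW_MV = {4000: 1000, 5000: 1180, 5800: 1400}


def shifted(vmin_mv, shift_mv):
    """Return a copy of the Vmin table with every point raised by shift_mv."""
    return {freq: v + shift_mv for freq, v in vmin_mv.items()}


def first_failing_freq(vf_curve_mv, vmin_mv):
    """Lowest frequency where the chip needs more voltage than firmware gives.

    Returns None while every point on the curve still has margin.
    """
    for freq in sorted(vf_curve_mv):
        if vmin_mv[freq] > vf_curve_mv[freq]:
            return freq
    return None


if __name__ == "__main__":
    for shift in (0, 30, 60):
        worn = shifted(VMIN_NEW_MV, shift)
        print(shift, "mV shift ->", first_failing_freq(VF_CURVE_MV, worn))
```

With these made-up margins the top boost bin is the first point to run out of headroom, while the lower clocks keep working. That would fit the "degradation" reading of the term, but the real margins are not public.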
“before you’re pooped….”. GN may have Gamer Jesus. L1 has wholesome gamer dad, and it makes me enjoy Wendell's takes even more.
Intel : "we finally ((fixed)) our CPU problem let's hope nothing happened before the 15th gen release"
Wendell : "nobody expects the Spanish inquisition"
Actually *everyone* did expect the Spanish Inquisition. They'd notify you beforehand that you're going to be on trial.
It's also not like Wendell didn't tell them beforehand this was going to happen, he publicly asked for input.
Perhaps it's telling that Intel has now started making chips at TSMC; their aging infrastructure apparently doesn't keep up. Their new Lunar Lake ultra-efficient ARM-killing mobile CPU is made entirely at TSMC on their 3nm node.
@@smalltime0 that actually makes the joke better. :)
All this could have been avoided if Pat Gelsinger or the person responsible with communication between Company and public would have done his job perfectly or just not been given the job at all. Communicating through reddit posts and misleading people is nothing short of 'shady'.
“Perfectly” isn’t even needed. Just more than “awfully” is all anyone wants/needs. Ideally more but we’ll take anything other than silence
yeah thats not how these things work
Have greatly appreciated your rational approach from day one and this update is fantastic. I would really love to know the differences between the CPUs that are fine and those that aren't. I know it's likely impossible without huge datasets full of power data and VID requests if even then. It makes sense that worse bins that request more voltage would go first, but how many of the CPUs that are fine are also poor bins. No matter what I can understand how this got through and why it's been so hard to fix, but unfortunately nothing can excuse the response. This almost certainly comes down to management handling the situation from a short term financial stance until they were forced into action.
It is so amusing, the spotlight on that cast iron radiator. He did a video long ago and sort of collects those things.
5:30 that shot across the bow at framechasers 😂😂😂
@bigmack70 For sure, and all the armchair experts in the comments that are still, constantly posting their half witted "solutions" involving BIOS settings alone.
"for all the sad persons who have no desire to learn more" ... perfectly said :-)
@drewnewby both 13900k cpus that I had passed around 6 months in exact same way. Undervolted 1.29v and 1.28v on each cpu. Watt limited and frequency limited to 5.5/4.3.
FrameChasers deletes my comments that say this. He just claims over and over that his community doesn't have this issue. Of course, when you ban people in paid Discord for speaking out.
All that was before I knew he existed and Intel's scandal even formulated. Undervolted on launch
@@righteousone8454 he's not disingenuous about everything, but he is a grifter trying to sell his product no matter what
@@righteousone8454 I think I might stop watching Jufes altogether. Yesterday he made a video about the 9950x and he was talking in the comments, so I asked him what his 9950x scores on CBR23, just to see how much better/worse it is compared to an overclocked 14900k in terms of performance and power efficiency, and bro just replied "I don't care". Then why did you buy the 9950x in the first place, smh. The problem is that, sure, I can find some CBR23 scores online, but those are done with basically zero tweaks, subpar cooling and a bloated OS...
Imagine if motherboard manufacturers can find a way to auto undervolt any CPU installed by default to make them more efficient, cooler, and long lasting.
You can. People don't generally buy for efficiency, they buy for performance. Bigger number sells bigger volume. Thus: Number go up.
People will buy a CPU with a faster clock speed, despite it running less instructions per clock by a significant margin because... bigger number, and math hard. And so, motherboard manufacturers are incentivized to... number go up: More voltage, more clocks, as long as it works like that for 4-5 years without issue they are in the clear.
So why market for efficiency?
@@formes2388 Works in the US and wherever else there is cheap energy, but for everywhere else with expensive energy, it doesn’t.
@@formes2388 Nah, it's not the Pentium 4 era anymore. Nobody cares about GHZ they care about FPS.
Some motherboards already do this in a way. Some MSI and Gigabyte (not sure of others) boards have loadline settings that undervolt the CPU under load. For my i7 14700k on an MSI board, the CPU was getting around a 50-60mv undervolt from those settings under load. The undervolt can actually be too much for some CPUs, and it was almost too much for mine: with the setting, the CPU has around a 10mv safety margin to remain stable under full load.
In several major tech companies I have insight into the gray beards are a part of every project while being project agnostic. The IPTs have to go before a board of them and justify their design at major milestones. Sitting in on them is always fun... so long as you're not the one defending your design.
Is there somewhere I can definitively find out if my system is affected? I've tried Intel, forums, even bloody ChatGPT :P I bought my system in Sept 2024: i9 14900KF, ROG Strix Z790-H. I've updated to the latest BIOS, but surely if this has been going on since gen 13... at some stage the gen 14s must have started coming out with no issues? No voltage or oxidation issues?
All of this makes me wish for the old way to make processors. Back in the old days processors came with set multipliers, frequencies and voltages. The motherboard just had to set these right, usually using jumpers, and the processor would work on that setting no matter what it was doing. None of these frequency changes and demanding more power because temperature is low enough it can boost more. Sure, there were downsides to this. For instance, power savings weren't as granular or efficient, but there was less risk of the processor suddenly getting more voltage than what's healthy. It also left more unused capacity, as all processors of a model had to run at exactly the right speed, so most of them had a decent margin when run at the specified speed. Something used for overclocking.
Anyway, something like this wouldn't really have been possible back then. Processors had a given frequency they were meant to run at and a given voltage. Motherboard makers either followed those specs or they might get into trouble. ASUS was a bit on the bad side here as they tended to raise the clock a few hertz. Not enough to really cause problems but enough to be seen in benchmarks, where ASUS motherboards tended to score a few points higher than those that really followed the specifications. Anyway, they didn't kill processors even then.
5:30 "locking the multiplier to a maximum of x53"
okay I've been called out LOL. cuz that's what "fixed" my Intel CPU instability issues for me.
going to install the latest microcode tomorrow though. MSI just put it out for my board.
thanks for the awesome video Wendell :) you da boss
Thank you Wendell! I appreciate the update.
It's welcome news that issues are being addressed and they are finally getting the horses back in front of the cart.
10:44 Why does requesting boost power but not being able to actually boost degrade the CPU? Wouldn't that situation be just fine so long as the voltages are kept to a reasonable level?
Or do you mean requesting boost power but not actually boosting causes the voltage to spike due to lack of vdroop? You mention measuring the vcore with an oscilloscope didn't exceed 1.55v, so i don't think this is what you meant?
Same gripe
The real intel performance is the friends we made along the way.
"boost itself to ruination" ... because it was designed to do so. That's what the sales-pitch benchmarks require. The gaming rigs are going to avoid these peaks, sometimes because of just abysmal cooling, as you say. But also because of tdp-limits set at a sane level.
But it's not an accident that you could make these Intel-processors draw as much as 900W effect on pl2 with some unhealthy tweaking. The processors allowed that, no problem, and this was part of the sales-pitch.
So don't treat this as an unhappy accident when the max temperature on the surface, or the diff on the gpu and cpu reached a certain level over an OEM-set limit. The benchmark targets that these systems were sold on were dependent on the pl2 boosts going incredibly high.
so do you want Intel to ban overclocking? this is not Intel's fault
Sorry for your loss, but the irony makes it funny
@laden6675 holy shit can I have some of that copium, must be premium shit😂
@@laden6675 It absolutely is Intel's fault. They had a production target with their flux-less assembly step ("Foveros") that would compensate for the higher nm-process issues -- by having incredibly higher internal tdp-limits and potential for overclocking.
Then it turns out that the already known issues with that process (already reported on, as omitted by Intel's "greybeards" lol years in advance) are sabotaging these higher clock-targets and sustained high boost-targets. This is what the various server-customers got sour over, because Intel sold them on the processors having these specific capabilities.
And then they have to adjust this down, so the boost-algorithm doesn't "boost itself to ruination". Putting the processor kit at a much more modest sustained performance target level, far below what was initially advertised. That's what is going on here.
Some people's sources in the "industry"(read: Intel) would of course like to put the blame on OEMs allowing overclocking, or even letting people set all cores to a static clock (which many do - completely oblivious to that the desktop processors do not have, even in optimal conditions, a tdp-budget big enough to keep all cores at max boost forever).
And while that is unreasonable to do on a system with a limited tdp, it's still the case that Intel advertised this heavily as being completely possible, given good enough cooling. Which it probably would be.... without the known issues that Intel's flux-less component mounting has.
So the whole "oh, if the external sensor trips didn't go off, then the processor would destroy itself" - nah. Not the case. They put in settings as the processors were expected to handle in ideal conditions - stall about the issues for years - and then just change the script as it suits them when there's no hiding the problem.
And even fairly intelligent people who are actually interested - like here with lvl1 tech and gn - fall for it when that switch happens at the insider sources.
You trust these fuckers way too much.
You have real talent. This video should receive millions of views and lots of followers. Technical knowledge and the skill to convey the findings in a very pleasant, easy to understand manner is very unique
who is gonna pay for all the hours of troubleshooting, rma, downtime etc. consumer reports doesn't recommend cars that have reliability > 4% wrt cars. imagine this percentage in the cpu realm.
Did not understand a lot of this tbh but what I did was fascinating - thanks for taking the time to do this. And I have to love a man with grey beard mounting an appeal for men with grey beards 🙂
(disclosure: if I grew a beard it would be at least grey and most likely white)
none of this affects me, I am an AMD only shop, will be as long as it makes sense, but it is fascinating and you are doing a great job reporting on this.
lmao love the "title" computer janitors... certainly feels like it sometimes with some of the PCs I work on. "How did you get this much dust/debris in here?!"
We switched to the fanless mini PCs... snaps on the back of a monitor & people got their desk space back.
Is this also the case for the i5-1340P and i7-1360P? I'm reviewing the Khadas Mind and wonder whether this device's lifetime is limited or not. It runs a lot lower than the desktop CPUs: 2.4GHz all-core for the performance cores and 1.9GHz all-core for the efficiency cores. It can boost higher with tasks that don't use all cores at once. Thank you Wendell for sharing your knowledge.
I'm also curious about this, as almost every outlet said that this can affect anything 65W+. Well, 13/14400 and 13/14500's are also 65W, and while the issue's usually attributed to high power draw and XMP settings, I never see clear cut answers on those chips.
Without you and Steve and Buildzoid this would be swept under the rug...
So me having a 14700K with a low-profile cooler WAS sensible after all? ;)
Fortunate, given I can't install the latest BIOS, as MSI claim you have to update the ME first, which I can't, because it's a Linux server and they only have the ME update for Windows.
Y'all should be repaid for doing what amounts to a ton of work for Intel.
Such a fascinating subject. I have a 13900K system .. that I just finished building on custom loop with 480mm radiator. I probably should have waited, but when Intel announced they had a solution I decided to go ahead and complete my build. It will be interesting to see how reliable this system turns out, having used only the Intel performance profile and the latest microcode. Fingers crossed.
Are the 13900H mobile chips impacted by this? We have several MS-01's planned for a "mini" Proxmox cluster and I'm wondering if I should be concerned. Haven't seen a recent BIOS update from Minisforum.
The behaviour is the same. I think what will either hold it back or delay degradation is the cooling. More cooling equals faster degradation as it boosts itself to death with more headroom. Oh, and constant load level is better than burst workloads or ramping up and down.
New to your channel, found you through Steve @ GN. Thank you for communicating at a level of understanding above the average id10t out there, it is really great.
The thunderstorm at the end made me slightly panic and look around. I am underground and in a mall...😅
I usually do NOT care about server stuff. And yet sometimes, you manage to make it interesting. That's impressive!
Wendell, with regards to your comment regarding board vendors not sharing some of the responsibility here. I believe I'd seen that the VID spec on Intel EDS was originally 1.52V+200mV, which means before this 1.55V limitation, Intel believed the processor was safe to be running within 1.72V.
the ambiguity in the spec document could mean the socket has the capability for a future CPU. or it could be these. only Intel knows what it meant in that aspect
"Intel believed the processor was safe to be running within 1.72V."
No - the absolute maximum specification had been 1.7V - and that is not a "safe for 24/7 operation" voltage but the absolute maximum above which all bets are off. that is the voltage where it is specified that if at any point you reach it the CPU might as well already explode.
@@ABaumstumpf I don't think "all bets are off" is something within Intel's EDS. The specification is a maximum voltage. However, most CPUs VID range would fall well within this. Point is, the spec is the spec. It's not there for sh*ts and giggles as they say lol.
The "spec is the spec" is somewhat meaningless if Intel has a history of playing fast and loose to allow motherboard manufacturers to drive up performance. Intel could have been very, VERY specific about all of this but then they would have been less competitive against AMD. Motherboard manufacturers bear some responsibility as they COULD have asked for clarification. I'm sure they did at some point, but as long as failures weren't too high they likely went along with it (plus they were competitive against other board manufacturers)... until there were major problems and blame had to be thrown around.
At the end of the day this is Intel's problem to fix through updates and honoring Warranties.
"is something within Intel's EDS. "
No - absolute maximum rating are exactly that. They are the limits that must never be breached.
If you look at the specs that silicon manufacturers provide for their chips - those that the user is supposed to wire up - yeah go figure.
TI for example says this:
"The Absolute Maximum Ratings section specifies the stress levels that, if exceeded, may cause permanent damage to the device. However, these are stress ratings only, and functional operation of the device at these or any other conditions beyond those indicated under Recommended Operating Conditions is not implied."
Nothing beyond RECOMMENDED is even guaranteed. The maximum ratings (and you will find the same with all other manufacturers) just literally mean what they say: the maximum.
Trust the gray beards, no matter the industry. There will usually be one who works and knows their trade, their craft and their industry. Aspire to become the Gray Beard.
Hearing you talk about Halt as a "way to use less power" reminds me of the truly old days, back when CPUs actually halting execution in a meaningful way wasn't even a thing and early CPUs would literally execute an idle loop to keep themselves occupied when they weren't needed. True CPU halt actually WAS a power saving method!
This resulted in the notorious "Halt and Catch Fire" situation, when an old IBM system with core memory would "halt" by repeatedly reading a particular memory cell... which would overheat that address in the core lattice, as it wasn't meant to be kept "on" continuously like that.
Halt is still relevant for many processors. That's a very standard thing to do in embedded systems.
But the special thing with a modern x86 chips is many cores. And hyperthreading. And translating x86 instruction into some internal RISC instructions - complete with instruction reordering. And speculative execution. All targeting a pool of ALU, FPU, ...
So there is no longer a single program counter owning all CPU hardware.
@@perwestermark8920 Thanks for the compsci lesson.
(And also thanks for not just telling me "Wrong." like someone did on another comment I made.)
@@FerralVideo "Wrong" seldom moves a debate forward. Especially since few problems have an absolute yes/no answer. Thanks to you for not going the standard "Reddit" route and end up angry, under the assumption that any response means being challenged to a fight - even responses adding some additional context.
Icarus Corporation.
Yes, its not catching the cpu problems in stress test. Most of the time. This is what baffled us a year ago with 13900K and we thought the problem could be elsewhere, like OS, memory, motherboard or even SSD. The cpu handled 350watts in unlocked stress test, but it would fail to install nvidia drivers 😂
The only way to know this back then was a processor swap.
The RMA experience with intel has been painful to say the least. It took 2.5 weeks to get to the point I would be able to get a shipping label, complete with repeated dialogue because the support representative did not read initial details outlined. It was like having a carrot dragged in front of you to get you to proceed, with hopes you would give up and stop the RMA process.
I haven't used my PC in months... I'm not going to put bandaids on it. IS it possible to RMA without the original CPU box?
@tannercust7976 yep
I don't think it's going to be as bad as you think. The latest _Arrow Lake_ CPUs have fast enough processing cores that they can eliminate the troublesome hyperthreading process that plagued recent Intel CPUs. And the new CPUs will be using TSMC's 3 nm process, too.
Just like the oxidation issue, the "Vmin Shift Instability" is a separate defect from the main design flaw in Raptor Lake. They keep having Thomas blog out the latest distractions. As with the earlier warranty extension list, all Raptor Lake SKUs are affected by the design flaw, not just "K" SKUs. Intel is still denying HX SKUs are affected based on earlier miscommunications. They pushed Alder Lake architecture too far.
It's not a flaw, sorry.
Oh look it's you again, please tell us more about your 13700K, sample size of one.
Yeah got the 35W cpu here MAYBE DEAD! 13600T
@@drewnewby I mean mine isn't dead either 💀
@PixelatedWolf2077 Did you want a cookie, or just to let everyone know you don't get it?
It's 0x12B now, since 2024/09/27. I have had no issues whatsoever on my i9 13900K, BUT I waited for the BIOS updates before I did anything heavy on my PC.
So far I am sticking with 0x129 with a voltage limit of 1450mv in the BIOS. My 13700K never goes above 1.25v at light load and drops to 1.16v at load anyway. The reason I don't want to go to 0x12B is the forcing of C1E C-States with no way to disable them. When playing certain games with C1E forced on, I'm seeing some cores drop down to 800mhz and I get microstutters. With C1E disabled on 0x129 my cores will still drop to 1200mhz when idle, but not when I'm actively doing things like playing games. Forcing C1E also causes horrible latency doing anything with audio.
I don't understand how customers let Intel off the hook for shit like this.
I bought a 13900k in Nov 2022. I ran it at 5.8p/4.5e/5.0ring since day 1 on a 2x360mm custom loop. I did video editing, played Tekken 8 every day, and I've played just about every UE5 game released so far. I never had a single issue. I did stay on an older bios because I figured if it's not broken, then it doesn't need fixing. I started hearing about these instability issues soon after the Tekken 8 demo released on Steam. I figured it was just inexperienced people with bad cooling. Regardless, as time went on I never had a single issue but decided to reach out to Intel for a refund because I didn't trust the CPU long term. Intel immediately approved my request, never asked for proof of instability, sent me a check for $599, and it was all super easy. Now Intel will be paying for my ArrowLake upgrade. I wish this happened every generation!!!
Yeah I havent had a single issue on my 13700k which sees some good use. Intel customer service is bizarrely amazing in an industry where most will screw you over (looking at you, ASUS). This problem doesnt seem as apocalyptic as people make it out to be. And Im not a fanboy, I couldnt care less about what logo is under the CPU cooler
Mine failed hard. At 1.35v it's fine tho
@@mattm3023 Only problem is warranty extension does nothing for people buying these second hand. If someone upgrades to Arrow Lake and dumps their 13900K the person buying it is SOL if it has any issues. (Still way better than Asus screwing over their original purchasers)
My 13700K has been underwater under volted this entire time, I haven't experienced any issues yet.
I haven't heard of a 13700 with problems yet.
Good boi
@@ocoinnigh3933 13/14 gen above 65w CAN be affected
For me it was annoying getting my waterblock off, but hey, Intel customer service got me a new CPU in a few days from the point where I opened a ticket and I even got a 13900k -> 14900k upgrade for my trouble.
Meh, the warranty is non-existent for me since I've delidded my KS, but I'm not worried anyway since I've had mine tuned to stay under 1.4V anyway
Intel listed the 13600k as one of the problematic cpus. Do you have any data regarding the i5s? Or are they mostly unaffected since they don't hit anywhere near 1.5v?
When I heard about the degradation issues I started running HWINFO in the background and my 13600K peaked 1.32 v vcore even on older microcode , I've had it since Oct 2022 (launch) and haven't had any issues. (MSI Pro Z690-A DDR4) currently running latest bios & 0x129 w/ intel defaults profile to be safe.
Guys, a patch never solves a problem. It needs a fix and not a patch. Intel knows what the fix is but just keeps buying time until the warranty is done.
Mansplain it to me Wendell :P .78 volts... Captain, she can't handle this
Was about to say that "can't" isn't in the Scottish dictionary, but alas at least the actor in the original series does say "can't" instead of "cannae". Probably because he's Canadian.
I have no faith in any management. I have worked at too many companies over the last 30yrs and good management is just a fleeting phase of most companies.
So if I'm understanding this correctly, the microcode update stopped the CPU from mana burning itself. (MTG reference)
So, I can untap my CPU? Gonna be annoying to switch to a motherboard w/ a rotated socket.
@@crash.override I'm pretty sure ASUS made one of those.
@@freelancerthe2561 EVGA was the master of rotated sockets
It would be wise and prudent for the greybeards to simply club together and set up a consultancy firm.
Then when what they warned about comes to pass. They can then charge the requisite consultancy fees way above what they would get in salary.
And to rub salt in the wound, they should name such a firm as "we told you so asshat"
SCAN Pro Audio already exist here in the UK if you're trying to build a system for the rigorous demands of real-time audio playback/recording/production.
BUT, the biggest question is....
Where can I get that wallpaper.
Also, I understand your comparison to the lights dimming, but that is actually not from a lack of power; it is from an abundance of reactive power coming from the vacuum cleaner. Power factor correction at startup would solve that, not more power :)
#thatguy
Awesome investigating
I loved feeling like a great gray beard whisperer. I'm a 40 year old half gray in the making. (insane manic diving into these hardest issues for decades, respecting and learning from gray beards along the way)
The reason there aren't more people like us is because they get fed up with not getting listened to and leave. Over and over and over again
We all know Intel was pushing the engineers back and why Bob Swan was removed.... but will Gelsinger pull them out? As of now, doubtful :)
The real sign was when Keller left early citing "personal reasons". When he left AMD early it was because they finished early.
I'm trying to understand the microcode fix. When you described the voltage bumping to 1.3 from .7 fixed the issue was that just the Linux performance mode as a quick fix or that is what the microcode fix does? I'm concerned for people running this as a 24 x 7 server idle power consumption might be notably higher. Can you clarify if power consumption should be poor with the recent microcode fix under load or idle conditions?
16:00 Intel's boost algo looks at the number of cores active and if they are P or E core, then the voltage they request, then sets the speed limit.
Since the E cores are active in this scenario, you are at 1.35V for the E cores (the P cores dont need it, but core voltage is shared sadly).
The algo then looks at 1.35V, all cores active, and caps the boost speed at 5.4 for P cores, and 4.2 ish for the E cores.
The design is such, because in this state it can run full loads - instead it gets halt instructions.
If you were to feed it more power under load, the chip would run hotter. And a hotter chip isnt as stable. So rather the boost algo is designed to assume it is unstable above 5.4 with all cores active and that set voltage. So it does not go there.
Responding based on timeline. Its not the all core load that's causing the problem. And i can agree that it's related to gaming. As a small SI, this is what we feel too. Another system is coming in for an RMA next week. We have asked the customer a ton of questions and compiling the information for more details.
My 14700K (possibly OC'd by my board, but not by me) became so unstable that the BIOS would freeze.
Now running my 13700(no letter), and it seems to be doing okay, but... I want the little performance boost I paid for, damnit.
0:25 "No one likes hard data more than I do"
Have you seen the kind of things Matt Parker (Stand-up Maths) will do to a spreadsheet?
Being an electronics engineer, the idea of core overvoltage due to lack of workload with poorly optimized code seems very strange to me. Let's try common sense. I understand poorly optimized code as a lot of excess instructions where things could be handled more efficiently. But there should not be a difference in executing code of any quality. If "poor" code results in more cores activating and/or higher core clocks, the CPU's current draw goes up respectively, which is normal. Higher current means higher voltage drop on the supply line from the VRM to the silicon die and even on-die. To compensate for this, a higher core voltage is governed. Of course, for stability, the voltage has to be ramped up BEFORE clock frequency increase and/or core activation and vice versa. This may take some 10 microseconds where the CPU briefly runs at a core voltage that is actually too high (due to lack of workload/current draw). This is not a glitch. It's by design. So there cannot be an argument of "overvoltage" due to cache induced idle states either. Who does decide core voltage anyway? BIOS? OS? Chipset? CPU microcode? There are buttons and levers everywhere, good enough only to obfuscate responsibilities. Actually the whole idea of predictive multi parameter dependent core voltage control seems to be a one-of-a-kind nightmare to me. Just think about a core supply current in excess of 300 Amps over PCB, socket and silicon, producing more than 400 Watts of on-chip thermal power at roughly 1 Volt. That's wild. You can do this in a closely controlled environment, but in a consumer scenario? In my opinion, Intel has leaned out of the window a bit too much with that.
Intel really needs to do a recall on these chips. It is going to cost them an enormous amount of money but I see no other choice. A class action lawsuit will cost them more because the lawyers get paid most of it.
Huh?
No one's even talking about a full recall anymore. Turns out it's not as big of an issue as Tech influencers and 20 year olds are making it seem on YouTube. It's manageable.
No one's even talking about a class action lawsuit anymore either. Because it's being handled already.
Lmao.
@@DingleBerryschnapps I worked at Intel for 20 years in senior leadership positions. You will see this turn into a class action lawsuit. A recall does not mean every CPU gets returned. A recall can mean they provide a utility to all customers to test their CPU and if it fails they return it.
@russelmm I wouldn't even bother with a reply, he's in every comment section related to the flaw, trolling constantly.
@@drewnewby roger
@@DingleBerryschnapps I have to agree with this one, zero issues with my 14900k since launch and I've given it a serious beating every time I can. CBR23 score = 43.5k at 440W under load. All limits disabled. I've even let this chip run prime overnight at 1.6v on several occasions and it still works flawlessly. I would try higher voltages but unfortunately it looks like they have blocked it through microcode. All chips other than the 14900k allow up to 1.7v. The 14900k is locked to 1.6v at a firmware level. Even though you can post at 2.170v, it's strictly limited to things like navigating the BIOS. When you try booting at higher voltages, Asus boards give you an overvoltage screen thingy and ASRock boards, for example, simply revert it to default.
Thanks for your vids, I learn a lot! When I ran XTU "AFTER" MCU 129 and a BIOS update on an MSI Edge, I saw constant current/limit throttling on the CPU/software on a 14900KF with pro watercooling!
"other computer janitors" - looool, truer words were never spoken.
I see, the algorithm relies on the CPU consuming a lot of current, so the elevated voltage gets lost in the resistance of the CPU power lines. But if for some reason the cores are starved for memory access, it won't, and the voltage at the cores will rise.
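A rough sketch of that mechanism, assuming a simple resistive model (voltage at the die = requested voltage minus current times path resistance). The 1.1 milliohm resistance, the 1.45 V request and both current values are made-up illustrative numbers, not measurements from any real chip or board:

```python
def die_voltage(v_requested, current_a, r_path_ohm):
    """Voltage seen at the die after the IR drop across the power path."""
    return v_requested - current_a * r_path_ohm

# Hypothetical values, chosen only to illustrate the effect.
R_PATH_OHM = 0.0011   # assumed resistance from VRM to die
V_REQUESTED = 1.45    # assumed voltage request sized for a heavy load

# Cores busy: large current, so a large share of the request is dropped on the way.
v_loaded = die_voltage(V_REQUESTED, 200.0, R_PATH_OHM)

# Cores stalled waiting on memory: little current, so little drop.
v_starved = die_voltage(V_REQUESTED, 40.0, R_PATH_OHM)

print(f"busy cores:    {v_loaded:.3f} V at the die")
print(f"starved cores: {v_starved:.3f} V at the die")
```

Same request, but the starved case lands much closer to the full requested voltage, which is the "voltage will rise" part of the comment above.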
God, I miss the days I could game for 8 to 10 hours at a time....😂
Yo, did you learn what a VID table is? Excited about Arrow Lake?
5:05 oh that immediately brings to mind, maybe some kind of weakness to thermal stress (physical), which compounds with the other, regular 13/14th gen issues
hm, but it's not that much longer than the 12th gen
if the problem is cooling related, gaming laptops should be largely unaffected, as they can't boost high anyway. Outside of maybe the 2024 neo 16, the failure rate should be the same as every cpu before
If the workload is single core, that one core can eat the entire power budget and still boost a lot.
A 13900hx is 55W, but over 100W peak. If a single core can eat 50W by itself, then you can run full boost on one core no problem.
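The arithmetic behind that, as a deliberately oversimplified sketch: the 55 W and 50 W figures come from the comment above, while the `fits_budget` helper is hypothetical and ignores uncore, iGPU and short-term peak power limits entirely:

```python
def fits_budget(per_core_watts, active_cores, budget_watts):
    """True if the active cores together stay within the package power budget."""
    return per_core_watts * active_cores <= budget_watts

SUSTAINED_BUDGET_W = 55.0  # sustained package power quoted in the comment
PER_CORE_BOOST_W = 50.0    # assumed draw of one core at full boost

one_core_ok = fits_budget(PER_CORE_BOOST_W, 1, SUSTAINED_BUDGET_W)
two_cores_ok = fits_budget(PER_CORE_BOOST_W, 2, SUSTAINED_BUDGET_W)

print("one core at full boost fits: ", one_core_ok)
print("two cores at full boost fit: ", two_cores_ok)
```

So a laptop part can still push a single core to high boost, and high voltage, even though it could never sustain that on all cores.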
I'm guessing the microcode for mobile parts was already different enough to not have that thermal algorithm flaw and was also thermally limited so less likely to be able to "boost itself into ruination" 😅
They are also binned for lower voltages, that helps too
On a serious note this right here is one of the major reasons I subscribe to this channel. Awesome presentation of the facts.
Do you have any data on the TE series? The medical company that I'm working at (so me) is evaluating an upgrade for our embedded systems. Is there any good way to 'burn-in' and test for these? Just like the others, Intel is saying that TE is not affected, but... let's be honest. The package power for these only goes up to ~65W though, and the motherboards haven't had a BIOS update available for a long time either.
As a computer engineer, I heard "degradation in the clock tree" and my heart dropped.
12:31
I recognize tmux, but what is the utility showing core frequencies and voltages in the left window?
So the chips that "stayed hot" or C0 had better longevity?
So... I killed my chip because I had a water chiller on it and it never broke 65C under gaming loads? Lol😅😅😅😅😅😅 gg Intel
Skill issue (as in too much skill. If your build had been worse your chip would've lived o7)
@@jackdaniels5538 2x360 rads and a salt water chiller
To run it out of the box stock. 😂😎
Yes, regardless of temperature too much current is too much current
I was an Intel fan. BUT NOTHING CAN BEAT THE 7800X3DXD XD
*in gaming.