"Welp. My first day at CrowdStrike went pretty good. Just now hit send on some code I wrote. Looking forward to the weekend!"
You can't really blame one person for mistakes like these. Yes, one person most likely wrote the code, but mistakes like these are still the responsibility of the entire team. There is a reason why code reviews exist, why you should test in a staging environment before rolling out to production, and why canary releases exist. I think the CrowdStrike development team really needs to re-evaluate its development process for delivering software rather than blame it all on one person. Because if these mistakes CAN happen, they are bound to happen again.
@@rimmer6335 It doesn't matter what security measures or QA process you implement. It's destined to fail like a house of cards, because choosing the never-secure cloud as a platform for critical ops is a very bad choice to begin with. From a simpler perspective, it's like making a major blunder in a chess game against a grandmaster; you may as well resign. 😅
@@projectsspecial9224 I get what you're saying, and while I do agree with you regarding the cloud, my point is that this mess could have easily been prevented if they had at any point tested the driver in a Windows 10 environment before auto-deploying the update to everyone at once.
Not how it works. IT shops don't deploy anything major on Fridays.
@rimmer6335 I would have replied sooner, but I had a BSOD
when companies cheap out and don't hire enough QA engineers.
Or people they hired were hired based on how dark their skin color is instead of their capabilities.
QA engineers for what? Unpaid interns, a.k.a. slaves, are the norm; they vastly outperform QA engineers in profit margin.
@@SamsungGalaxy-ls8ys Golly you drag the race card into this? You sure have one twisted little brain.
Cybersecurity is an illusion. Anyone who depends on the cloud for mission-critical operations is dumb. You get what you pay for. As the saying goes, It's not IF but WHEN it goes down! 😅
How's that DEI policy working out for them?
Sure, global code push to kernel based Windows software agent with zero testing. What could go wrong?
If you think they did zero testing you are silly.
Falcon MDR was an AI update applied a day or so before this happened. It conflicted with the Microsoft version, being Copilot Security or Defender. The null pointer argument and the empty lines are not something that gets accidentally released. That's a cover.
@@questioneverythingalways820 I am being a little silly as I would think it impossible that zero testing was done. There’s definitely a disconnect here and maybe that is as you say. More accurately, perhaps it is better to say zero “effective” testing was completed.
The problem appears to be with Windows 10. Perhaps, the software upgrade wasn't tested on a Windows 10 environment before releasing it to the world. It's a configuration testing issue.
@@imjamming A null dereference in the code, which runs at kernel level. On reboot, it prevents Windows from booting. That's not a Windows problem.
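To make the failure class described in this thread concrete, here is a minimal user-mode sketch in C of a parser that blindly trusts a field read from a content file and ends up calling through a NULL pointer when the file is all zeros. The file layout, field names, and handler table are invented for illustration and are not CrowdStrike's actual code; in user mode this segfaults, while the same access pattern in a kernel-mode driver halts the machine with a bugcheck (BSOD).

```c
/* Hypothetical sketch of a null-pointer dereference triggered by a blank
 * content file. NOT real CrowdStrike code; names and layout are invented. */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-template handlers; slot 0 was never meant to be used. */
typedef int (*handler_fn)(const uint8_t *data, size_t len);
static int handle_regex(const uint8_t *d, size_t n) { (void)d; (void)n; return 0; }
static int handle_hash(const uint8_t *d, size_t n)  { (void)d; (void)n; return 0; }
static handler_fn handlers[] = { NULL, handle_regex, handle_hash };

/* Hypothetical channel-file header. */
struct channel_header {
    uint32_t magic;          /* expected signature */
    uint32_t template_type;  /* index into the handler table */
};

/* Unsafe parse: trusts the file blindly. A zero-filled file yields
 * template_type == 0, and handlers[0] is NULL, which is then called. */
static int parse_unsafe(const uint8_t *buf, size_t len) {
    const struct channel_header *hdr = (const void *)buf;
    handler_fn fn = handlers[hdr->template_type]; /* no bounds/NULL check */
    return fn(buf + sizeof *hdr, len - sizeof *hdr); /* NULL call -> crash */
}

int main(void) {
    uint8_t blank[4096] = {0};  /* simulate the all-zero .sys content file */
    return parse_unsafe(blank, sizeof blank); /* segfault here; in a kernel
                                                 driver this is a bugcheck */
}
```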
When I worked at a "Big Networking Company" (here in the valley), one of the customers, Lufthansa Airlines, DEMANDED that we test our UPDATE on a backup system, and ONLY when the UPDATE executed successfully would Lufthansa allow a LIVE UPDATE, with their engineers monitoring and watching!!! It looks like CrowdStrike DID NOT test their UPDATE before going live.
They failed to test adequately and their customers also rolled it out no questions asked when they really should have done so in a far more measured fashion. Incompetence and complacency on both ends
This is something I truly don't understand, given that testing is an integral part of my IT career. That and DR and backup are the main focus of my job with large enterprises. No update should be rolled out before proper testing by all parties involved. With ING I had to first get clearance from every major application owner, and especially from Hyperion, the financial application that was used to report the financials to the stock exchange and central banks.
They should not even need to ask for that; it should be a matter of course.
Just update part of your system, let it run, and if it is OK proceed with the rest... step by step
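As a rough sketch of the "step by step" approach suggested above, the loop below pushes an update to a small ring of hosts first, checks health telemetry, and only then widens the rollout. The ring sizes and the deploy/health-check hooks are hypothetical placeholders, not any vendor's real deployment API.

```c
/* Hypothetical ring-based rollout gate: widen only if the previous ring
 * stays healthy. The hooks below are placeholders you would wire to your
 * own fleet tooling and telemetry. */
#include <stdbool.h>
#include <stdio.h>

#define RING_COUNT 4

/* Percentage of the fleet in each ring: canary, early, broad, everyone. */
static const int ring_pct[RING_COUNT] = { 1, 5, 25, 100 };

static void deploy_to_ring(int pct)  { printf("deploying to %d%% of fleet\n", pct); }
static bool ring_is_healthy(int pct) { printf("checking crash/boot telemetry for %d%%\n", pct); return true; }

int main(void) {
    for (int i = 0; i < RING_COUNT; i++) {
        deploy_to_ring(ring_pct[i]);
        /* In a real pipeline you would wait out a soak period here
         * (hours or days) before evaluating telemetry. */
        if (!ring_is_healthy(ring_pct[i])) {
            printf("halting rollout at ring %d and rolling back\n", i);
            return 1;  /* stop before the bad update reaches everyone */
        }
    }
    printf("rollout complete\n");
    return 0;
}
```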
According to the reports, the file C-00000291*.sys in the update was filled with zeros and is obviously bogus. Looks like the error occurred when they were assembling the files to deploy. Amateurs.
As a retired 40+ year IT pro, I refer to this as a resume-generating event.
I'm using the Crowdstrike outage on my resume 😂😂😂 Imma be like "Saved my company from a global outage."
Interviewer flowchart:
Read Resume
Crowdstrike Employment? Yes--> Next Resume; Go to Read Resume
No --> Continue . . .
This is a great comment.
No. It highlights the incompetence and unprofessionalism of careless managers...
The problem being experienced globally was not a result of systems being interconnected. It was the result of so many organisations using the same software, which meant that they were all affected in the same way by the faulty software update.
Agreed. More companies should drop Microsoft products.
Most companies should sue CrowdStrike and threaten Microsoft into dropping CrowdStrike products, or else.
@@raydall3734 LOL. So CrowdStrike on Linux would work? Dream on. Using Linux Mint now; it's pretty good, though for real work I would not switch from Windows.
@@raylopez99 Linux wouldn't need CrowdStrike, because no apps can have access to the kernel. Malware and viruses are trapped in the shell, which can be cleansed easily.
@@niranjansm I use Windows, but not CrowdStrike.
Testing your software before sending it out to a billion computers. What an amazing idea! /s
Doesn't matter how brilliant your plans are afterwards if you started with a really dumb idea like a cloud security platform 😂😂😂
Can almost guarantee it was given some non-zero amount of prerelease testing. However, at exactly what point something becomes "sufficiently" tested is kinda more art than science, and clearly this incident fell on the WRONG side of that line.
The number of incidents will go up with more redundancies and layoffs, as things get more complex and fewer and fewer software developers/engineers keep an eye on the software in code reviews, while managers without a degree in computer science get by on confidence borrowed from ChatGPT.
A third party app/company should not be able to brick the OS. Seriously Microsoft?
No testing, no selective group, no layered updating... just ineptness.
To affect so many devices means they made a mistake that was not just fatal but also very basic, something that a very basic lab test would have detected, so they must have applied the update without testing it even once.
It wasn't ineptness, these events usually have a plan behind them. They are just made to look like a mistake.
CrowdStrike is not surprising. It is not even a matter of complexity; it is just a visible sign of a decades-long degradation of workplace culture.
Companies say empty words like "they take responsibility" but promote a work environment where unqualified people are hired, promoted, and put under pressure to deliver faster and to ignore long-held practices of Quality Control. They even take inordinate steps to give the illusion of QC while actively bypassing it and then finding scapegoats.
I'm sure CrowdStrike's contracts are carefully written so that they are not liable for any damages due to the malfunction of their software.
The people stuck at airports didn't sign a contract. Neither did the families of patients who couldn't get medical care, or businesses that rely on the companies that went down. There is huge liability here spread out throughout the world.
Correct, the terms limit the amount to the price of their software, or 60 bucks per device.
Objection: they could still be liable for "gross negligence" which overrides most EULA contracts (which are bound "to the extent permitted by law").
Already happened in 2010 with McAfee (DAT update). Humanity doesn't learn.
And the same person was responsible
It's also due to a monopoly of one operating system in the private business and public sector space.
Nothing like that. We have Apple, Microsoft, Linux; there are so many. There's no monopoly. Microsoft enjoys a larger market share because Windows is easy to use, given the GUI environment. If you use Linux you'll understand how painful it is: for everything you have to type long commands and even remember them. Intel and Microsoft started this PC revolution and offered computers at affordable prices to the masses.
Truth!
@@starupiva Plenty of GUI on Linux these days. Your statement would have been more accurate around 15 years ago. The high-powered types use the command line on Windows and Linux, a sort of trained-scribe conceit I think. Of course, if you were a command-line athlete you would be using Arch or Gentoo while running commands in the Windows terminal for variety.
Best comment
@@tomspencer1364 And a third-party update wouldn't knock the kernel out either.
BTW, Windows is incredibly painful "under the hood". When things go wrong it's "reboot", then "re-image", rather than fix.
Yes, this was baffling. However, the issue was that they pushed out a nulled-out file. They probably did their testing with the correct file before deciding to push it out, and it seems like it got corrupted somewhere along the distribution chain. The loading routine referenced data in the problem file that did not exist and thus caused a hard fail during the boot process for the OS.
This wouldn't have been an issue if the file in question weren't part of the drivers loaded at boot, which it is due to the nature of how AV software works.
MS definitely has to improve how their OS handles these files during the boot process if they're not OS-essential system files. They're allowing third-party vendors to have too much control over the boot process and in turn leaving us at risk of this happening again in the future.
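As a rough illustration of the kind of defensive handling this comment is asking for, here is a minimal user-mode sketch in C that validates a non-essential content file before trusting it and skips it (logging and continuing) rather than crashing. The header layout, magic value, and function names are invented for illustration; this is not a real driver interface.

```c
/* Hypothetical sketch: sanity-check a content file and degrade gracefully
 * if it is blank, truncated, or corrupt, instead of taking the system down. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CONTENT_MAGIC 0xC0DEF11Eu  /* invented signature for illustration */

struct content_header {
    uint32_t magic;
    uint32_t entry_count;
    uint32_t table_offset;
};

/* Returns true only if the buffer can plausibly be parsed further. */
static bool content_file_is_sane(const uint8_t *buf, size_t len) {
    struct content_header hdr;
    if (len < sizeof hdr) return false;            /* truncated */
    memcpy(&hdr, buf, sizeof hdr);                 /* avoid unaligned access */
    if (hdr.magic != CONTENT_MAGIC) return false;  /* blank or corrupt file */
    if (hdr.table_offset >= len) return false;     /* offset out of bounds */
    if (hdr.entry_count == 0) return false;        /* nothing to load */
    return true;
}

/* Caller: reject the file and keep going rather than hard-failing at boot. */
static void load_content(const uint8_t *buf, size_t len) {
    if (!content_file_is_sane(buf, len)) {
        fprintf(stderr, "content file rejected, continuing without it\n");
        return;  /* machine still boots; protection falls back to last-known-good */
    }
    /* ... parse entries here ... */
}

int main(void) {
    uint8_t blank[4096] = {0};  /* simulate the all-zero file that shipped */
    load_content(blank, sizeof blank);
    return 0;
}
```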
Perhaps, but this update was not distributed in the standard way they normally push it. And it was released almost an hour before it was previously scheduled....
MS has definitely become complacent (and has a "so what" attitude from its position of dominance), which will result in increasingly vulnerable situations for its users. Plus the fact that gigabytes of junk are loaded by MS onto these commercial systems that effectively use less than 1% of the total OS. Alternatives will now appear sooner rather than later.
Random programmer maybe: it's just a Continuous Integration pipeline change, dude, no need to stress, reviewing it!
@@vijayakumarpottayil3746 Employees have become complacent overall. It's not Microsoft or CrowdStrike. There is a laziness and sense of entitlement that has spread across industries as a result of the pandemic and initiatives like work from home.
This software actually runs with a higher priority level than the Windows operating system.
Windows runs UNDER this software’s control not the other way around.
So much for tech layoffs and AI.
They'll just use this as an excuse for why employee replacement with AI is critical
Went to work for an international corporation in 1989 which had no QA whatsoever in its software development department. They were baffled by all the blue-screen-of-death crashes their telemetry post processing software was causing. Quickly tracked it down to sloppy use of sprintf and exceeding shared memory segment bounds. Even after reporting these issues in great detail, they couldn’t believe it was root cause until I showed them specific examples of it happening in multiprocess debug sessions. Then the software engineers grew angry with me as if I had written the sloppy code.
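For readers unfamiliar with the class of bug described in this comment, here is a small C sketch of an unbounded sprintf into a fixed-size buffer (the unsafe call is left commented out, since it is undefined behaviour) next to the bounded snprintf fix. The struct and sizes are invented for illustration; the point is that sprintf has no idea how large the destination is.

```c
/* Hypothetical sketch of the sprintf/overrun bug class described above. */
#include <stdio.h>

struct telemetry_record {      /* imagine this lives in a shared memory segment */
    char label[16];
    double value;              /* silently corrupted when label overflows */
};

int main(void) {
    struct telemetry_record rec;
    rec.value = 42.0;

    const char *station = "GROUND-STATION-WEST-LONG-NAME";

    /* Sloppy: writes past label[] and tramples whatever follows it
     * (here rec.value; in the original story, neighbouring shared memory). */
    /* sprintf(rec.label, "%s", station);   -- undefined behaviour, do not do this */

    /* Safe: bounded write, always NUL-terminated, truncates if needed. */
    snprintf(rec.label, sizeof rec.label, "%s", station);

    printf("label='%s' value=%g\n", rec.label, rec.value);
    return 0;
}
```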
Lots of Devs have huge egos. They write buggy code and then get mad when someone points out the bugs.
CrowdStrike outage is the second major tech meltdown that founder and CEO George Kurtz has been involved in. He was also the Chief Technology Officer of McAfee in 2010, when a security update from the antivirus firm crashed tens of thousands of computers.
Cybersecurity company CrowdStrike pushed a faulty software update that bricked thousands of Microsoft Windows computers across the world and brought many services to a screeching halt. Air travel, credit card payments, emergency services, stock markets and much more were affected by the Microsoft outage linked to the disastrous CrowdStrike software update.
It reminded some people of the McAfee blunder of 2010 when the antivirus firm inadvertently triggered a worldwide shutdown of Windows XP PCs across the world.
People were even more surprised to discover that CrowdStrike’s billionaire founder and CEO George Kurtz served as the CTO of McAfee in 2010.
Not only that, but apparently there was at least one prior incident of a Crowdstrike Falcon channel content update causing similar issues _on various Linux distros._
Although there was a bug in the code, it should have been caught using a sandbox or with early adopters before it was deployed worldwide. So this is also a process problem.
The cloud platform it is based on is insecure and prone to bugs and exploits. So it's unreliable to begin with.
@@projectsspecial9224 This particular incident has nothing to do with cloud computing. It affected on-premises PCs too. The common theme is centralisation.
@@rezwhap I know. I was referring to the fact that you should never rely on a cloud platform for cybersecurity, especially for mission-critical systems at L0-L3, including servers, PCs, SCADA PLCs, other devices, etc.
@@asanokatana Actually, that is exactly what they should do! I think these virus companies make more of a hoopla about viruses than is necessary. So what is the big deal if they take a couple of days to test? Anyway, in this particular case, the cure was worse than the (potential) disease.
Not so surprising! It really exposes Crowdstrike’s failures on quality control, reliability, and operation excellence at the expense of quick delivery. Imagine your EV overnight software update resulted in brake failures.
Bizarre to say the least
Not testing before implementation on that scale is absolutely mind-boggling... just unreal...
This is bad development process and bad operation process. This company deserves to go out of business. The CEO is an idiot, I saw him on another channel. Completely clueless about how to run a company like this.
Over the years, so-called proud cybersecurity professionals mocked me and called me an idiot for not implementing cloud services for critical security updates. I got the comical sneering looks and condescending tone during my interviews for cybersecurity analyst positions when I said it's of high importance to focus the budget on verifiable backups, redundancy, and recovery optimization rather than on cloud services, especially for mission-critical ops. Fast forward a few years later: those companies went down hard, and their egotistical IT leadership was running around like chickens with their heads cut off. 😂
I don’t deploy often, but when I do it’s on Fridays!
Despite a world of 8 billion people, there are moments like this that make almost all of humanity feel like we're collectively learning a lesson. There's something oddly unifying about it but hardly the sort of 'unifying experiences' we want.
In IT, the problem is we are not learning through the generations. The same ludicrous mistakes are repeated over and over.
Is the company giving me 8 hours of payroll because I couldn't log in?
Right. I think a lot of us lost a bit of money on that. Some people I know made more money as a result of it. Broken window fallacy though.
By interconnectedness, they mean centralization. Too many people chained themselves to the CrowdStrike wagon so when it fell off a cliff, they went down with it. Where did CrowdStrike's competition go? Mergers and acquisitions.
The CEO of CrowdStrike is an accountant.
😂😂😂 it's obvious 😂😂😂
@@projectsspecial9224 As are the results...
And soon to be unemployed 🤔
@@maxoverridemax To the contrary, watch for the bonus he makes at year-end.
Guy is a pentester. He has several certifications and wrote the pentest BKM playbook for PriceWaterhouse. College degrees are typically useless.
For how long will it last?
Are we going to get the mother of all class actions?
Nope, the EULA and lawyers avoided that.
@@aisle_of_view You cannot use that excuse in front of an EU inquiry. I do expect that, at least here in the EU, it is going to be a pricey adventure.
Here is the thing... if word goes out that CrowdStrike is NOT legally liable in any way for this mega crash, this is only going to FURTHER the stampede to alternatives.
I do not see Crowdstrike coming out of this one unscathed.
Fun fact: the file with the error was a definition file that was replaced by an empty file. This file was sent to all clients but Windows was the only system that crashed. Linux and Mac OS just fixed themselves.
Don't use Microsoft for critical systems...
You don’t have much choice. Mac simply can’t interface to diddly (Apple won’t let you).
Missing sandbox testing is reprehensible in this case. Even greater is the absence of a rollback mechanism. That’s on Microsoft.
Well, they struck down a crowd.
Crowdstrike earned its name ! Scary.
"If you want something done right, learn to code."
I bet the definition file is a standard format, and they just edit it to include the new information, and somehow a typo got in. CrowdStrike needs to do a staggered release so they can stop as soon as an issue is discovered.
An update that bricks systems should not have been released. Clearly no test was performed prior to release. When software becomes central to the correct operation of large scale business and societal systems you cannot afford to be "slap dash" about testing and quality control. I can't help thinking that this may be another instance of putting share price before customers, which doesn't end well in the long term.
Talk to IT Executives, not technical experts.
😂😂😂 they just talk about golf and the latest buzzwords 😂😂😂
@@projectsspecial9224 "AI offers a world of unbelievable advancements!" - yeah, like laying off 50% of the workforce
Any way you stack it, Windows isn’t great software.
It's not complex but, on the contrary, simple... it's horizontal integration, where one player provided cheaper services and companies also cut back on IT costs, and in the end it's this... but on the other hand, since everyone failed at once, no individual business got hurt relative to the rest... no big deal... things will stay the way they are, the world will move on at the same pace, and only a few IT guys will be fired at CrowdStrike, that's it...
It boils down to a quality assurance problem at CrowdStrike. There are people in comments who are claiming that this was unavoidable. Really? The bad patch was rolled out globally, because once it got into the update chain it was rolled out automatically. Old lessons are being learned by a new generation of IT. But don't take it the wrong way. This has happened before, and not just in IT.
I can barely understand a forced update for this kind of thing. But what really staggers my mind is how Windows was utterly unable to boot because of this and how so many companies had no disaster recovery plan.
A disk image, people?
A weekly server backup?
SOMETHING?!
They were probably hacked, then lied about it to protect their business.
You sound like an expert in the field.
He has a point though. Quite a few guys at my work in the IT department agree with this sentiment. They all agree that the excuses and reasoning given for this downfall sound very much like BS. A large company like CrowdStrike can't really admit to having been hacked, so the big cover-up has to take place. With the amount of BS we're fed on a daily schedule, a cover-up of this magnitude isn't too far-fetched. Joys of being totally dependent on something designed to control the population. The old days without computers and mobile phones were great. @@sturgeon2888
The best line in this story--- we use technology that we "take for granted." We're so dependent on computers that glitches like this terrify me. 😧
Crowdstrike should have to test their updates first. Then each company should have someone in their IT department test the update on a few computers. That would have stopped the problem.
CrowdStrike doesn't have QA?
I don't always test my code but when I do, it's in production.
Don't ask, you may not like the answer...
This is a PR video. Look at how many times she mentions the company definition.
My theory is that the file causing the problem became a routine task handed out to lower-level employees. Complacency probably caused this outage, as well as global reliance on a single product. Standardization has its benefits, and its downsides.
This is a test… not a mistake. My opinion.
Looks like Russia is issuing a warning to the West
Yes, there is lots of silliness being expressed online.
Found the tin foil hat people
Test how? What specifically?
I posted this for reaction to the comment only. Never disappointed in the comments that follow.
First rule of IT: test before implementing; second rule of IT: test before implementing; third rule of IT: apply the first and second rules...
Fourth rule of IT: never roll out an update on a Friday.
Technically the first rule of IT is "backup."
The 2nd rule is "back up you idiot!"
The 3rd rule is "have fall back"
THEN you test!
@@jedipadawan7023 Right-o, you should apply for an IT job... I think you have 99.99% of the required IT skills ;o)
@@lacollineenchantee980 Been there, done it. That's how I know!
MX-Linux allows creation of complete install images with users, apps and settings. You get HUGE files but restoration to a known working point takes 30 minutes MAX! Saved me on two critical occasions, once when the laptop I was using MELTED. I had to buy a new laptop at the local "Hartono" - tells you where I now reside - and I had one hour after purchase to have the machine set up with MX-Linux and all apps ready to handle the client of the day! I made it with 2 minutes to spare but I made it! Zero disruption to my work - though I was very tired come the evening.
I also have external hard drives and salvaged laptop drives from old hardware with USB cables, where I periodically back up my essential data across different drives.
In my previous life in IT there were workstation Windows disk images for emergency restoration and daily and weekly server back ups.
Now how come none of these things were in place worldwide? Why could no one boot from an external drive, such as a Windows boot drive or Linux ISO, to delete the offending file?
If CrowdStrike takes responsibility, then they should be made to pay every customer impacted by their bad update.
There should be an option to disallow automatic updates.
I know Professor Levant Ertaul! He's the best! CSU East Bay was an awesome school to attend for computer science.
They certainly struck a huge crowd this time!
All of these working people
This content update was clearly never tested, because it bricked all the latest Windows systems; this could have been caught before its release.
Nowhere in this video does it address or talk about the policy that the headline alludes to.
Who needs Skynet when a simple system upgrade will do?
Updates should never be rolled out on a massive scale; they should be tested on a small scale first.
Security updates have a problem here - staggered rollout can actually alert hackers to the vulnerability being fixed.
not in this case. Crowdstrike truly did a crowd strike. Globally.
@@KR-rs3vn I say Putin
the great reset is near
where do you get news related to it? is there a channel?
@@mrGoldt7x go away bot
Great silliness is here.
More like a FORCED RESET 😂😂😂
@@whlewis9164 can you help me with a related channel to the Great Reset?
CrowdStrike should allow customers' IT teams to configure when sensors or agents receive updates, rather than imposing a "leave everything to us" solution. 😅😅😅
As a CTO, I will never allow such solution in my server farms 😊
Southwest FTW!! (Well, at least this time.)
Human complacency is the culprit. No testing was done and now an entire department was terminated.
The good news is that software QA people just got a career extension.
At UCB they're working on electronic ping-pong? Typical.
Forrest Gump approves!
A small change in the computer control systems of the 737 Max led to two crashes and around 350 deaths.
The 737 never had a flight control computer system until the Max series. It was a brand new piece of hardware that caused the crashes.
Compartmentalize, as the prof says: that's done by a microkernel. A kernel driver with issues takes the whole system down with it.
People DIED because of CrowdStrike's gross negligence and all they offer is "oopsie" while everyone looks the other way, absolutely disgusting
I am a CrowdStrike Shareholder. I want to see an audit of their software design lifecycle SOPs
Southwest Airlines running on Linux FTW. Gotta love that Love.
Years ago, I was speaking to an IT professional acquaintance who worked for the airlines, and I asked him if they were going to install the new version of Windows in their avionics system bus network and in-flight entertainment system. He said, "Heck no! I don't want to go down because of a BSOD! We are installing Linux!" 😂😂😂
@@projectsspecial9224 And then a plane crashed because the in-flight entertainment system was connected to the aircraft control network...
@@allangibson8494 who said it was? Are you an avionics expert?
@@projectsspecial9224 The NTSB crash investigation looked at the electrical interfaces and determined that the entertainment system had been connected to the flight control computer network.
Even if we accept that there was a mistake during deployment which caused file corruption, what is baffling is that it was released worldwide, and the software update occurred at almost the same time globally. Such a simultaneous global release onto millions of computers is simply not acceptable. They should have had staggered releases so feedback was available from the first sets that were deployed.
You know what they say in the field of software: "It always happens on a Friday."
Perfect opportunity to start your own cybersecurity company.
Is this some of many "emergency updates" to plug some newly discovered vulnerability, so all the usual testing got sidelined? I've seen many "emergency updates." There are so many "emergencies" you get numbed to it.
This was a routine virus update - they shipped a BLANK file instead of the normal definition file (and their software died because of bad internal file handling).
NIST SP 800-40
Official industry guidance
It says you should test updates before implementing them. Every company affected has a very poor patch management process and totally disregarded official IT guidance.
"This is a test. It is only a test."
Interestingly enough, only computers running the Microsoft Windows OS were affected by all of this; it never affected ANY ChromeOS, macOS, Linux, or even Amazon Fire OS devices, which is really ironic...
I think the problem was environment parity. They have different pipelines for stage and prod, and a bug in the prod pipeline caused it to ship an all-null .sys file. This is just speculation based on what's publicly available.
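If that speculation is right, the missing safeguard is a cheap one. Here is a minimal sketch of a pre-publish sanity check a release pipeline could run: refuse to ship an artifact that is empty, suspiciously small, or all zeros. The size threshold and usage are hypothetical; a real pipeline would also verify checksums and signatures against the build output.

```c
/* Hypothetical pre-publish artifact check: reject blank or truncated files. */
#include <stdio.h>

#define MIN_EXPECTED_SIZE 1024  /* assumed floor for a real definition file */

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s <artifact>\n", argv[0]); return 2; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 2; }

    long size = 0;
    int nonzero_seen = 0;
    for (int c; (c = fgetc(f)) != EOF; size++)
        if (c != 0) nonzero_seen = 1;   /* any non-zero byte counts as content */
    fclose(f);

    if (size < MIN_EXPECTED_SIZE || !nonzero_seen) {
        fprintf(stderr, "REJECT: %s looks blank or truncated (%ld bytes)\n",
                argv[1], size);
        return 1;  /* fail the pipeline stage; nothing gets published */
    }
    printf("OK: %s (%ld bytes)\n", argv[1], size);
    return 0;
}
```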
I just don't buy that this was an upgrade. More like an attempt to integrate and collect more data subversively.
It was a blank (zero content) virus definition file. Not even a software update.
Having an isolated TEST/LAB part of a Data Center is a pretty basic part of any data center. What a fail.
I wonder what the CrowdStrike -> Client SLA looks like. They going to get class action sued for this? Or nah?
Here, have some credits for future purchases.
Gotta love how companies using dinosaur software were spared from the issue lol.
What is the dinosaur software? I think the people not using CrowdStrike were spared.
Yes, Southwest got blasted a while back for their dinosaur tech but look how they shined yesterday. Sometimes old school is good.
@@motogirlz101 I use MX Linux (Debian Stable) for a reason.
Pioneering means lying in a ditch with arrows in your back. I need my laptops for my work!
[Arch users can go pioneering for me. Thanks guys!]
What if the error didn't immediately disable computers, but instead inserted an unnoticeable glitch that made small errors that compounded over time? It might take hours or days before the stock market became crazy or your bank balance disappeared or thousands of components were suddenly all the wrong size. It could throw our economy into chaos. No single "update" should have that power! But obviously, it can happen.
In the role that I have, I see cases where people get paid six figures to draw up contingency plans for companies. I've seen pay as high as 350-500K! So how in the hell does this happen at these companies? I remember 30 years ago I worked for Greyhound bus lines as a ticket cashier. The system was down and I was the only one working, with a line of people. I had to resort to writing boarding passes by hand so people could board! Managers didn't even have the foresight to do it! If there was no plan or workaround, the people that these companies pay should be fired! There should always be a manual backup process to follow. This is insane in 2024.
"The company takes responsibility."
How? Exactly what actions will they be taking?
Probably nothing, just say a few words then wait for something to distract us.
protip: university compsci profs know diddly about the reality of global corporate IT systems
A test, not a mistake
Yeah tests they failed to do which is a mistake !
More like stupidity 😂😂😂
So many onsite techs armed with bitlocker keys right now just to boot into recovery cmd and delete a .sys file. Wild. Wonder how service desks are handling remote support.
You pay for a service and you expect a service….
Texas based. That is the end. You get B and C talent in Texas.
They do seem to rely on "bargain bin" talent over hiring the best and brightest.
No accountability, no apologies - what a landmark in corporate responsibility !!
Yeah, Southwest routes all of its computer systems to a Commodore 64 in a warehouse in Wisconsin.
If it ain't broke...
What happens when they outlaw cash,
and the computers go down for a day, a week, a month, a year?
Don't ask him... he's a professor... never worked in industry. He lives in an academic bubble...
Will take "responsibility"..... CEO of Crowdstrike, please elaborate...
It's called negligence, darn it, people. What's the matter with you?!
It shows two things: the confidence that CrowdStrike has, and the arrogance it has. And then everybody pays the price but them.
Small-scale deployment and sit-and-wait (for a period of time) are crucial factors for a successful large-scale deployment to a production environment. Are software update policies bulletproof? Are CIOs competent? Are CFOs allocating adequate resources? What about the governance and support from the Risk Management Committee and the Board?
Not a “software update”. It was a bad (blank) virus definition file.