CrowdStrike Outage Explained by Keith Barker CCIE
- Published: 6 Sep 2024
- CBT Nuggets trainer Keith Barker explains the recent global CrowdStrike IT outage.
Everyone can benefit from Microsoft training: training.cbt.g...
Not a CBT Nuggets subscriber? Sign up for a FREE trial today: training.cbt.g...
-----------------
Connect with CBT Nuggets for the latest in IT training:
• LinkedIn - / cbt-nuggets
• X - / cbtnuggets
• Instagram - / cbtnuggets
• Facebook - / cbtnuggets
#crowdstrike #microsoft #cybersecurity #keithbarkerccie #ittraining #itcertifications #itprofessional #adept #cbtnuggets
This global internet outage is insane! All airlines grounded, and I was stuck at the airport, and even banks, media, and offices from the U.S. to Australia were hit. How can CrowdStrike have such a monopoly that it alone could restore such a massive amount of tech?
It's pretty concerning. If they can fix this, what other control do they have over our infrastructure? Or are we truly in the matrix?
Keith, stay with us, we want more appearances from you. ❤
To see this video show up from my favorite CBT Nuggets instructor was beyond wonderful. Last week sucked. This video offsets that a lot.
Well explained! It’s clear now. Thanks a bunch, Keith!
He’s being paid to lie to you why are you thanking him?
You were, are, and will always be the best at explaining complex topics efficiently and fully straight to the point. I like the video so much and am thrilled to watch you again, Keith.
This boggles my mind as an IT professional. I was part of a team that deployed patches and software for years: OS deployment, patch deployment, software deployment, the whole thing, on both workstations and servers. We tested our patches extensively before pushing them out to the entire population of the environment. This first meant a sandbox environment, then a select user/system environment, and then we would stage our patches out over several hours so that if something happened we could back out before catastrophe struck. And honestly, sometimes we would find problems with the patches and be able to immediately stop, suspend, and even back out.
Yes, we would use 3rd-party vendor solutions to help with this, and any time we changed ANYTHING we would follow our testing procedures and matrix, business as usual. We would never shirk our procedure of test first, then deploy. To me this is a total failure of IT Governance and a failure to maintain standards. (IT Governance is setting and maintaining standards and policies for the IT infrastructure.)
I work as a Network Engineer now but it helps to have come from the help desk because you feel their pain when you have to run around fixing issues like this. When I was help desk I always preached to our team TEST TEST TEST before deploying. There are so many resources you can use to test something before deploying it there is no excuse why this should occur.
For example I just used GNS3 to test a new firewall before deploying it (pfSense). Like you said there should be several layers of testing. Maybe start virtual with something like GNS3, then physical sandbox environment, then one user, then one department, then once you’re confident it’s safe, do a global deployment. Just like Keith said this issue could be resolved with proper Quality Assurance, which includes proper testing. TEST TEST TEST!
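The layered testing the comments above describe (sandbox, then one user, then one department, then global) can be sketched as a ring-based rollout. Everything here (`ROLLOUT_RINGS`, `deploy_to`, `staged_rollout`) is a hypothetical illustration of the idea, not CrowdStrike's or any vendor's actual deployment API:

```python
# Minimal sketch of a ring-based (staged) rollout. All names are
# hypothetical -- this is not any vendor's real deployment tooling.

ROLLOUT_RINGS = [
    "sandbox",         # isolated virtual/physical test systems
    "canary_users",    # one user, then a handful of real systems
    "one_department",  # a single business unit
    "global",          # everyone else
]

def deploy_to(ring, build):
    """Stand-in for pushing `build` to every host in `ring`."""
    print(f"deploying {build} to {ring}")

def staged_rollout(build, health_ok=lambda ring: True):
    """Push the build ring by ring, halting and rolling back on failure.

    `health_ok` stands in for real telemetry: crash reports, boot
    success rates, agent heartbeats, and so on.
    """
    completed = []
    for ring in ROLLOUT_RINGS:
        deploy_to(ring, build)
        if not health_ok(ring):
            for done in reversed(completed + [ring]):
                print(f"rolling back {done}")
            return False  # stopped before the bad build went global
        completed.append(ring)
    return True
```

The point of the structure is the early exit: a build that fails health checks in the sandbox or canary ring never reaches the global ring.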
It’s really nice to have a resource like Keith to explain what happened. Reading the news is like getting 1% of what actually happened.
Keith, great video. Would they have not tested the file prior to deployment into the production environment? Sandbox possibly? Keep up the great work❤.
CBT nuggets was my first love and still the same. Thank you for being part of my list since 2012.
You would think that one of the top cybersecurity companies out there, with as large a footprint as they have, would know better: prioritize QA on the code, sandbox it in a test environment, and gradually deploy critical updates before rolling them out at scale. Especially when it involves ring 0. This process would involve change management and be part of their written standards and procedures. Makes you wonder if this was done intentionally by an insider threat or was just plain incompetence and neglect. Thanks for the content, Keith Barker.
Thanks for this explanation Keith! The issue stemmed from a mismatch in the input parameters used in the update, which caused widespread crashes across millions of devices. Since businesses and offices are the most affected, companies should implement rigorous verification, such as an email verification API, to protect company assets.
I don't want to sound malicious, but in my opinion there are two possible factors behind this: many companies around the world hire interns during the summer, and since this wasn't a major update but a routine one, they may have given the interns some "power" to run it. The other possibility is the push for "diversity", which can mean you don't hire the best candidate but the one that matches certain criteria, so maybe the person in charge wasn't the most competent but just checked some boxes. It's a bit difficult to find out what happened because that's internal company information.
This was predicted, so I'd say it was intentional.
Hi Keith. Do you think crowdstrike customers should also test these updates before being automatically deployed? Especially for critical operations (airlines, hospitals, etc), I wouldn't want any updates to go out to my organization without me testing them first. Thanks
Kind of similar to having your own onsite WSUS server. My last data center engineer had one, which meant no Windows updates were applied to our PCs without first being filtered through him and analyzed to make sure they were safe.
Thanks for sharing Keith!
Yes Keith, you are right. Better QA, plus implementing a self-correcting mechanism to prevent the issue from occurring again and avoid the BSOD.
One more simple explanation from you on such a complex issue!! ---Thanks again!
As a technical writing prof, I applaud your presentation. It's a good example of "show, don't tell." The castle analogy was very effective. It helps frame the issue for a non-technical person. I will share the video with my family.
I'd like to see a video on the QA part you mentioned. How should Crowdstrike have done the QA? What's the normal QA procedure? Are smoke and/or regression tests normally done?
Here is what happened at the organization I work for. I am a part of the desktop support group. Mixed messages were sent to our clients. Some people said no, I am going to leave the computer alone. Others got caught up and did either a system restore or a computer reset. Clients ended up putting their desktops back to factory settings. Others did regain access but lost applications critical to their job duties. Management then retracted what they had sent out to the clients, warning them not to do anything and to let IT support handle the problem. Keith, it was such an easy fix, even before Microsoft sent out their fix tool. My team and I were using Hiren's BootCD, and I was banging out at least 10 to 15 computers in under an hour. No problems. No loss of data. No loss of apps. DO NOT DO A SYSTEM RESTORE OR RESET; you will delay your uptime by doing either of the two. Just sharing my experience with this. Oh, and if you do a system restore, you will see a link on your desktop that says removed apps. Thanks for your knowledge on this topic, Keith.
Wow, Hirens boot CD, I haven’t heard that name in years I used to have one when I was in school. It’s good to know this is still a useful tool, I will look into it again. Good job keeping your cool and thinking critically to resolve the issue. I’m sure it saved a lot of headaches. Sometimes it’s best to simply isolate an affected computer from the rest of the network and take the wait and see approach with issues like this. Nice work!
@user-rr3fo6hy9q Thank you. Hiren's is still the best Swiss Army knife to have.
Thank you! Once again Keith nailed the explanation!
Does Falcon update its minor version when they make any updates? There have to be daily *automated* sanity tests run against all Windows flavors with the latest Falcon updates *before* they're pushed out to end users. Feels like I'm giving suggestions to a high school kid who is just venturing into programming! 😄😄😄
Remoting into every computer would be a nightmare. Having to touch each one to apply the fix is incomprehensible, only to be surpassed by a longer fix or having to rebuild the computer.
Hey Keith, can you explain what a channel file is and an input field as talked about in Crowdstrike's RCA? Thanks.
Ring zero is the foundation of the building that you can't see; ring 1 is the house you can see. If a window falls out, you still have a functional house, but if the foundation collapses, the whole house collapses!!!
Thanks so much, Keith.
FINALLY, someone in the comments is asking: Why was this not tested at Crowdstrike customers before the update was deployed to the client computers?
Is this because there are so many updates daily that there is no time to do proper change management, even though it is very important with bleeding-edge updates all the time?
What kind of legal exposure are they facing? Are they liable?
It’s hard to believe that Crowdstrike clients allow software updates to be downloaded directly to their information systems without any gatekeeping.
To protect against zero day attacks. They outsource their security to CrowdStrike.
This was an amazing explanation of the CrowdStrike incident, thank you! I love the visual aids!
Fantastic work, Keith. Thank you! This was a great explanation of what happened. Great Work!
Thanks very much, you have just gotten yourself a new subscriber.
No question it's a CrowdStrike issue, but there is a need for improvement in the MS WHQL certification process.
Great explanation Keith! You are the best!
Mostly the big three US Airlines were affected, so it was a local problem. ❤❤
MS really needs to put a padlock on the NT kernel to prevent these kinds of system crashes. I think the area where the kernel sits should really be only for the kernel and nothing else. Messing with the core code of the OS is a really bad idea and only means more trouble for the IT people who have to maintain the systems and servers that basically have to run non-stop.
This comes back to the IT managers at each company?
Thanks Keith, you gave a simple explanation as always. Thanks for your simplicity.
So the Falcon software overrides all the local policies and update controls? What am I missing here?
What unit testing was done? Was it automated? Did the test environment tank as well?
Still does not really make sense.
Once the Falcon software changes (detail 3), it is not the same software that went through WHQL.
As soon as "detail 3" was changed, it became "detail 3.1", and that means the whole software is technically different.
Why was this particular version, with "detail 3.1", not tested?
The actual falcon process was signed. Falcon loads whatever signatures it needs for threat detections from the content updates it receives probably nightly from the server.
The falcon process was never modified. The libraries the process uses, and the actual falcon process are decoupled.
Very well explained ❤
Great video, cheers Keith
Very nice video. But still there is this little question unanswered. How was the Falcon driver allowed to load a faulty .sys file? What does a faulty .sys file even mean?
It cannot be by human error. This would mean that the entire world hangs on the afternoon shift of one guy at CS.
In order to load a .sys file into the Kernel ring, surely it has to pass multiple hash checks. cryptographic checks etc.
What checks does CS perform? At this point, how can it, or for that matter any other AV company that utilizes boot drivers, be trusted from now on?
Not only due to negligence, but also an intentional attack: if it's that easy to load a faulty file into its Falcon driver, then what is there to stop a bad actor from doing the same?
Easy. The .sys driver that has gone through the validation and certification is actually csagent.sys. That driver works flawlessly. However it uses definition files that are separate. It literally is the same as putting diesel fuel in a perfectly good gasoline engine.
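The diesel-in-a-gasoline-engine point above can be illustrated with a toy validator: a signed, certified driver (the engine) consuming unsigned content/definition files (the fuel). The file format and checks below are invented purely for illustration; this is not CrowdStrike's actual channel-file format:

```python
# Toy illustration of a signed "engine" validating unsigned "fuel"
# (content files) before parsing. Format and checks are hypothetical.

MAGIC = b"CHAN"  # hypothetical 4-byte header a well-formed file would carry

def validate_content(blob: bytes) -> bool:
    """Reject obviously bad content before the parser ever touches it."""
    if not blob or all(b == 0 for b in blob):
        return False  # a file of zeros carries no valid definitions
    if not blob.startswith(MAGIC):
        return False
    return True

def load_content(blob: bytes) -> bytes:
    if not validate_content(blob):
        # A user-mode tool can fail gracefully like this; code running
        # in ring 0 that parses garbage instead takes the whole OS down.
        raise ValueError("malformed content file rejected")
    return blob[len(MAGIC):]  # pretend-parse the payload
```

The design point: because the content files are decoupled from the signed driver, the driver itself has to defend against bad input; failing closed in a kernel component means a crash, which is why validation before parsing matters so much there.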
a great explanation👍
The Windows 8, 9, 10 series have always been crap, trash even. Managers opted for a single point of failure with no contingency. I still do my desktop publishing on Win7 workstations, and I am currently watching this video on one of my Dells.
I really think Keith Barker needs to cover the Google BGP outage from last year.
More videos Keith 👏🏽👏🏽👏🏽
Or how too much power with one company is a bad thing.
Funny to find out that this CEO, while leading another company, was the cause of another national computer blackout due to a software screw-up there as well!!!
Don’t give access to the kernel.
What if my computer turned to a blue screen and I already turned it off, can my computer be fixed?
Because when I turn it on, my system unit powers on but there is no output on my monitor.
Great explanation - someone really, REALLY screwed up here. How on earth did this get past QA? Additionally, most updates should, in a perfect world, be applied to a pre-patch test environment within any critical business before being pushed out; I bet that didn't happen or couldn't happen.
Nice. A real CBT Nut :)
Keith you missed the opportunity to use a moat in your castle diagram and analogy 😂. Fire breathing dragon too
n0t an inc0rrect file.... I just didn't finish the signing.....
At least the screen is a nice blue colour....... there's that.
Actually it’s called a stop screen.
This is surface-level BS.... The mistake they made is one only the greenest of rookie programmers would ever make, and they likely had to rewrite unit test code in order to even push this out to customers.
Why such a long and complicated explanation??? They basically pushed an update file full of zeros, and the executing program, down at the most basic level, crashed because it could not make sense of a file full of empty zeros. This locked up the whole operating system, and the error would repeat itself even if you rebooted, because the system would reload the bad file 0000291 automatically and crash again every time. So the solution was to boot into Safe Mode, which loads only the absolute bare minimum files needed to run basic systems and so avoids loading the offending file, then navigate to the CrowdStrike folder, delete the update file 0000291, and reboot. That fixed it because the offending file was gone and not reloaded, so the system ran the same as before the file was downloaded, with no loss of data besides any unsaved work that was open when it crashed!!!
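The delete step of the widely reported workaround (boot into Safe Mode, then remove the bad channel file matching `C-00000291*.sys` from the CrowdStrike driver folder) can be sketched as a small helper. This is a hypothetical automation of that one step, not an official remediation tool; it would need to be run from Safe Mode with admin rights, and the Windows path shown is the commonly reported default:

```python
# Sketch of the delete step of the manual fix. Hypothetical helper,
# not an official CrowdStrike or Microsoft remediation tool.
import glob
import os

def remove_bad_channel_files(driver_dir, pattern="C-00000291*.sys"):
    """Delete files matching the bad channel-file pattern; return what was removed."""
    removed = []
    for path in glob.glob(os.path.join(driver_dir, pattern)):
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed

# Typical invocation on an affected machine (from Safe Mode, then reboot):
# remove_bad_channel_files(r"C:\Windows\System32\drivers\CrowdStrike")
```

Taking a list of removed filenames back out makes it easy to log exactly what was touched on each machine, which matters when you are banging out fixes across a whole fleet.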
"Haha, hospital and emergency services no worky because CrowdStrikey updatey. Oh well" - US Govt
sneaky falcon ....
Heh, Blue Falcon
This is the FIX in case Kamala Tanks in November...
wtf i got one during this video
You have all bots commenting!! Also, Why are you running cover for them? You know there was no excuse for this!!
Y2K4
...whoops! should have used a GOSUB instead of a GOTO. Sorry... 😉
CrowdStrike needs to go bankrupt and pay out.