the crowdstrike situation is wild
- Published: 18 Sep 2024
- The title says it all lol. Watch this video to find out more about how they messed up so bad.
🏫 COURSES 🏫 Learn to code in C at lowlevel.academy
🛒 GREAT BOOKS FOR THE LOWEST LEVEL🛒
Blue Fox: Arm Assembly Internals and Reverse Engineering: amzn.to/4394t87
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation: amzn.to/3C1z4sk
Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software: amzn.to/3C1daFy
The Ghidra Book: The Definitive Guide: amzn.to/3WC2Vkg
🔥 SOCIALS 🔥
Come hang out at lowlevel.tv
wow haha that video was really cool, I heard these courses at lowlevel.academy are also really cool
haha yeah
Thank you! Best on the internet
@@LowLevelLearning please make a video on the details and what actually caused the issue !!
What if I'm already a low level god?
Do you know if customers have the ability to not receive instant updates? Or is it something crowdstrike requires at contract time
Crowd-Strike: Global Offensive
Perfection
Terrific pun
hahahah lmao
lol 🎉
BAHAHAHA 😂😂
a newsletter email had the following greeting:
"Good morning and happy Friday to everyone who doesn’t work in IT."
Look on the bright side, demand for IT guys has skyrocketed, and what does a sudden high demand spike mean? That's right, IT guys can charge whatever they want now because every company is completely desperate for them.
To be fair, that's how people treat IT workers, even when things are working perfectly.
@@SolidIncMedia sir you're on call and im working late, bring me some mcdonalds would ya?
@@SolidIncMedia Lol
@@Patterner Which newsletter? Wanna see this.
They certainly made a lot of machines unhackable
now no one can steal their data
Not until somebody plugs in Serial
That leaves us with the philosophical question: Is a hacked machine unhackable? Or can a dead animal be killed? (Disregarding for the purposes of this discussion that the "hack" was presumably unintentional - it was effective, nonetheless...)
A lot like speed limits, No IT is Safe IT.
@@gerdd6692 you got me at ‘can a dead animal be killed’
Crowdstrike, according to their name, worked perfectly.
why they choose a name like this?
@@justinlinotte2981 all part of the CIA backdoor testing
@@justinlinotte2981 To take down the Internet for real soon, dummy.
@@justinlinotte2981 for fun? Why not
¯\_(ツ)_/¯
@@justinlinotte2981 No idea but it’s painfully poetic
So the lesson learned is never push code to production on a Friday
that's been an industry-wide standard for decades
Yes.. you need to be "young & bold" to do some things. Like climbing HUGE radio towers. As you get older, you have a tendency not to want to rock the boat. Which has its own drawbacks. So yeah, I've morphed into a chicken 🐔
The real lesson tbh is for managers and executives to stop allowing the overworked IT guy to push code to prod on a friday.
Some gurus actually go to great lengths to "demystify this myth" and now they must be in shambles that we have ultimate proof to never do that
read only Friday!
"The only safe computer is the one that does not boot" - Crowdstrike CEO.
Technically true
😂😂😂
To be fair, this violates one of the three tenets of cybersecurity, and it's one that people outside (and sometimes even inside) of the field forget the most. Confidentiality, Integrity, and *_AVAILABILITY._* If you're missing any of these three, you haven't got security.
The fact that a segfault just caused Y2K to happen 24.5 years late is wildly amusing to me
HAHAAHAHA didn't think about it like that lol
oh damn ur right, this IS what they were afraid of!
Except, we had to update BIOSs to prepare for Y2K. This was: reboot, delete a file, reboot. I'm thankful that the fix was so easy (albeit time consuming).
@@NickRoman Unfortunately this did not work for every affected system. Throughout the organization I work for we had to use restore points because either the files wouldn't delete themselves or deleting said file wouldn't resolve the issue. What an absolute fucking nightmare.
Edit: To clarify, we still got all of it fixed. It just took a hell of a lot longer than wanted and many of us stayed overnight. Corporate straight up expensed all of our food orders, no questions asked.
We prepare for a primary election while identity theft is rampant. Magnificent! 🤦 ... The oligarchy of American credit scoring companies might as well be assigning random credit scores to each citizen within legal ramifications again. #VoteMillennial in 2024!
🪙💵💳🤖🇺🇸
All the times I've had to explain to management why we should wait a few days before implementing an update, only to be met with blank stares. I'm loving every second of this.
@@aisle_of_view they probably conveniently forgot
The only thing is this happened via unattended upgrades
I agree and disagree.
For example a server should probably try to be up to date with security. It can happen that a security issue has been released then there will be an entire army of bots sniffing every server that hasn’t been patched yet in an attempt to hack them. So you want to be fast in plugging holes because those bots will be very fast in finding your unpatched servers.
Any other update like windows workstations can probably wait a few days without issues.
@@CoderDBF This particular case would be considered a security update, as it is an update to endpoint security software.
@@CoderDBF Hard disagree. Even patches targeting critical security flaws have enough time to at least be tested on non-critical pilot servers or clients. If a security issue is found, it's usually already been known for a while by bad actors; the issue becoming well known will increase attacker probing, but a few hours of testing will hardly change anything compared to auto-pushing to every server/client.
Today was not the best day for me to wear my Crowdstrike t-shirt...
@@MenaceInc No, don't think that.. Crowdstrike is saving the world from hackers. Mistakes happen
Or if you enjoy small talk, it's the best day--great conversation starter
this is most likely due to open source software
@@araz911 what
@@shauas4224 the shutdown due to open source libs most likely
its cool to see pure technical explanation of how it happens, its far from common medias that only use shocking words to get as much audience as possible while they don't know a thing about what happened
I totally agree! I for one barely read the "regular" news because of how it almost always feels like they have one sentence of information to deliver and extrapolate it to a whole article with a bunch of word poop and no real info.
So I prefer this type of delivery every time! Informative, deep dive into the interesting bit, short and quick to the point!
@@justinlinotte2981 Definitely! I was quite confused about how this could have happened in the first place - and none of those other channel had covered that - but seeing that the actually delivered driver was all nulls explained it perfectly well. And this is likely why this passed all testing and everything.
@@TheStoneMountain1 Yes, and it's not just tech. Today I was reading in the New York Times about how there were violent protests in Bangladesh over a "quota system" for government jobs, but they declined to explain what this "quota system" was. I was curious to know what would provoke such protests. I found better information on Wikipedia, in articles such as "Quota system of Bangladesh Civil Service" and "2018 Bangladesh quota reform movement". MSM reporting is so vague and dumbed-down. (They also do "fact checks" on statements, e.g., by Trump or a random conspiracy theorist on social media, that anyone with a brain would know are _obviously_ false.) It insults my intelligence.
@@TheStoneMountain1 Yeah for 4 pages, and "you'll never guess the shocking reason" but never even mentions it once while you fight through ads and trying to click the next button.
100% I love this explanation - typical media always exaggerates everything
They literally waited for most YouTubers to go on vacation before rolling out the update lol
'Literally' as opposed to what? Figuratively?
@@prezadent1 using "literally" in this way is a form of hyperbole. English is cool like that 😎
@prezadent1 if you google the definition of 'literally' the second definition of literally is literally 'not literally'
@@Maxawa0851 Yep, a ton of English words have contradictory meanings like this unfortunately lmao
They probably push out updates every few days.
"//Just a small fix, no need to test it"
Famous last words.
LGTM
this was so bad that bro had to make a YT video while being on vacation. what a legend
This is just viral marketing for Captain Crunch's new cereal, "OOPS! ALL NULL!"
😂😂😂 thank you, please take my like, I cannot even 😂😂😂💀⚰️🪦
HAPPY VACATION ED
WHAT A DAY (/WEEKEND/WEEK/MONTH) 💥
Can we rename it to Blue Falcon?
Bro was definitely having beers by the pool before this. Spring break feels for sure
@_JohnHammond you don't get any rest at the moment do you 😆
Why are you not the top comment wtf? Love you both ❤
you're also gonna farm videos out of that as usually, aren't you
don't forget to mention that while CS released a faulty package tons of morons allowed it to update itself (or even worse - pushed updates themselves) on a fucking Friday. they deserve that, it's basic sysadmin knowledge
I work at an airline, you can't imagine the mess. Oh Jesus, today was a nightmare. Hope tomorrow gets better.
I feel for you buddy
it will happen again as long as airline companies use windows
@@nicejungle And if they'd used Linux, they would have gone down in April with kernel panics. It's not the OS that's the problem, it's Crowdstrike.
@@malavoy1 And if they'd used Linux, one reboot and you switch back to the previous kernel
Down time : one reboot
Compare to windows : you're screwed
@@nicejungle But Linux users are tech savvy. Most users of Windows are not, so MS hides safe mode behind multiple reboots to prevent them from ruining their system (and they would blame MS if they did ruin their system). Once in safe mode you can roll back system changes.
This will absolutely *not* be the last time something like this happens. When I first started in the industry, everything was packaged on disk/disc. Fixing a bug after shipping was EXPENSIVE, so we got the product to a 99% stable place and then kept trying to squeeze that last 1% of bugs out. Now? People just throw garbage over the fence, figuring they can just ship a patch later. Kernel mode software just *cannot* be developed that way. But... cyber security companies have time pressure that other kernel mode developers don't necessarily face. To be useful, such an app must be updated and deployed amazingly quickly, especially for kernel space, but that agility comes at the cost of stability. When the cure is worse than the disease, though... there's a problem.
At the very least (assuming it's not an immediate security risk) then updates should be delayed a couple hours by region or something, so if this happens then it's a smaller section of customers that get screwed over and they have time to cancel it and get it fixed for the rest of the world.
@@MSThalamus-gj9oi exactly! that's why i bought some CS stocks after the fall. CS is a great vendor but this kind of thing unfortunately can happen.
@@kugelblitz1557 Just apply CI/CD techniques. The first rule is "Only package once, at the beginning of the release, same package tested is the same deployed".
While I agree that this sort of software can't be developed in the same way as a lot of stuff, there's also not really any evidence here that it was.
The fact the update file was completely zeroed out points to a failure way past a dev shipping a bad code update... I can't see any way this happens without it being a build or deploy failure.
@@Bubblessss420 I was thinking of doing the same. The stock price dropped 20%, but you know it'll bounce back. It's a bargain right now. (No, I'm not a shilling bot. :D)
"The entire internet" for as far as it runs Windows. If it had taken down Linux the actual internet would have gone down.
Apparently in April, Crowdstrike for Debian actually went down in a similar manner (kernel panics). Thankfully I guess no-one actually uses Crowdstrike for Linux so no-one actually cared.
Facts.
Yeah, my internet was just fine, so not sure what he meant. Airports are not 'the internet'.
@@TimothyWhiteheadzm , well, I'm thankful that YouTube, Netflix, HBO... were all fine or fixed quickly.
Worldwide outages + clickbait = "The internet is going down". They claimed the same with the recent massive Facebook outage.
Crowdstrike, tests code only once, at production
Not only once - they do millions of parallel tests on a vast array of systems - without making the code platform independent they could only better this by cranking up a few zillion virtual machines or container "farms" ...
@@gerdd6692 Yeah, this one sounds like a problem with deployment instead, doesn't it?
I don't think we can dismiss the problem with such a "simple" explanation. Most certainly, they did test their code very properly and extensively.
But they missed one of the most important factors: that things could possibly go very wrong in transmission.
@Brahvim that's what I was thinking - the build. Development can have a different config for build/deploy and you may not see it until you've pushed to the environment. Even with a UAT, sometimes the config can be slightly different than prod, even though it should be as close if not near identical to prod.
@@gerdd6692 yeah im just joking
I watched a ton of videos on WTF even happened here. This was the only one that actually explained what went wrong in any kind of detail, and you're on vacation.
Absolutely amazing channel here.
This. It's a super-simple error, yet no mainstream media explains it in a comprehensible way.
The crowd was striked by a blue screen of death
Struck.
@@samiraperi467 Stricken
@@samiraperi467 Moron, your bus is leaving… 🚌
It’s CrowdStrike not CrowdStruck
@@samiraperi467 Never heard of CrowdStruck… is that some tech company or something?
@@vizionthing Stricketh
The fact that it was an antivirus that performed the single most successful malware attack ever is just pure poetry.
Another win for the "remind me later" to every update gang.
My dad: Come on, it's just an internship, what's the worst that could happen?
Me: "You can't hack a system if the system doesn't work! "
- Cybersecurity
The alpha move of doing something that would make your stock value crash, but simultaneously freezing the stock market so that it can't.
Windows update doesn't matter. This is not a win for "remind me later" as you cannot remind-me-later with this terrible Crowdstrike rootkit.
I am aware of that, it just sounded like a good joke so I added it up.
if you would like to make a bet, many futures brokers are still working, so when the market opens you can go short or long with some margin if you think this will have an effect on the worldwide markets (it will)
So you just compiled the top comments across youtube over this topic and copy pasted them here for likes. Cool.
@@ajinkyamogre8515 with how internet speech is i didnt even realize it, i just assumed all of this stuff was one full sarcastic comment
I'm sure lots of people are looking for vulnerabilities in crowdstrike now. I don't know how public it was before this that SO many companies are using their product....
they just showed us the vulnerability, creating a sys file full of nulls.
@@deeiks12 it is very public that literally everyone (in corporate IT) uses their products, the thing is that is transparent for most non IT people and they do not have a consumer version. And unfortunately they are ( or were) the best in the business.
@@cbaesemanai you had to actually be them to do that in the first place, so.. that's not it. Unless maybe if you hack the provider of their update pipeline, which might actually be what happened. But I guess if you did that you could break so much more.
@@renato360a I mean using it as a local exploit.
Crowdstrike is well known. They've had hordes of people looking for vulnerabilities in their software for years.
25 years ago, operating systems started signing drivers. 15 years ago, the same thing happened with the bootloader. A few years ago I heard that all PCs had to be replaced so that Windows could guarantee security. Now I understand that it gives total control over the computer during the initialization phase to a program without the slightest verification, just because it is in the right place in the filesystem and the name seems familiar? Live and learn...
But why did they roll out this update to every computer in the world all at once? Why didn't they run a canary? Why didn't they do the rollout in phases?
It's low level code running in the kernel, and it is deployed in machines all around the world by thousands of businesses, why on earth wouldn't they be more cautious with the rollout? This is quite possibly the most reckless deployment in the entire history of software.
Because AFAIK they didn't update the software itself, but rather they just pushed a new virus signature database file. The real issue is that the Falcon program can't handle invalid files
@@Pipe0481 That's still doable with a slow roll out, canary, dog fooding, etc. Anything at this scale should be done with extreme care. Heck, even if not at this scale, there's no reason to be so reckless.
@@MattGreer simple answer - they are morons
@@Pipe0481 more technically windows can't handle invalid files or any program for that matter
@@MattGreer If you canary then all the non-canaries are vulnerable to zero day from the new virus.
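The staged rollout this thread keeps asking for can be sketched in a few lines. This is a toy illustration with invented cohort sizes and thresholds, not CrowdStrike's actual pipeline: push to a small cohort first, watch crash telemetry, and halt before the next, larger cohort if the crash rate spikes.

```python
def staged_rollout(machines, apply_update, is_crashed,
                   stages=(0.01, 0.10, 1.0), max_crash_rate=0.01):
    """Push an update cohort by cohort, halting if crash telemetry spikes.

    stages: cumulative fractions of the fleet to reach at each step.
    """
    done = 0
    for frac in stages:
        # Grow the rolled-out set to `frac` of the fleet (at least one more machine).
        target = min(len(machines), max(done + 1, int(len(machines) * frac)))
        for m in machines[done:target]:
            apply_update(m)
        done = target
        # Check telemetry for everything updated so far before going wider.
        crashed = sum(1 for m in machines[:done] if is_crashed(m))
        if crashed / done > max_crash_rate:
            return ("halted", frac)   # stop before the next, larger cohort
    return ("complete", 1.0)
```

With an update that bricks every machine it touches, this halts after the first 1% cohort instead of taking down the whole fleet; the zero-day-coverage objection above is a real trade-off, but it is a trade-off, not a reason to skip canaries entirely.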
My vscode tunnel into my dev machine randomly stopped working yesterday and made me realize how much it sucks to be dependent on someone else for your own setup.
Dude. You're using vscode. Welcome to Microsoft dependence inc.
linux, emacs, vim, helix are waiting for you with wide arms
I bet one of the reasons for this is LEAN. A plague of a corporate efficiency goal that ends up ruining workplaces if allowed to continue too long. I can guarantee we will later hear, if the company isn't able to hide it, that workers at Crowdstrike were overworked, overstressed, always pushed to rush, and denied the time for critical quality assurance checks and tests that would have caught this error.
I feel like all the blame is going to be pushed onto the employee that coded in the mistake, while the CEO/manager that made them do so by overworking them etc. will get away scot-free.
So community notes on Twitter is saying the viral tweet that claimed it was a null pointer dereference is BS.
And crowdstrike put out a blog statement that “This is not related to null bytes contained within Channel File 291 or any other Channel File.”
Hmm interesting
It baffles me that people would modify any piece of code that sensitive without testing, on a friday no less; hell, even I check 3 times that my KBPs are correct before restarting
I'm sure they did test it.. the issue must have happened when actually pushing the update live. Doesn't excuse a friday update though. That's just asking for trouble
@@c_ornato there was still a network connection, right?
@@lion21297 Doesn't excuse pushing to every user at the same time either, you'd think the devs for a highly-used expensive piece of software would be more rigorous but it seems the dev instinct to push stuff fast does not discriminate.
@@lion21297 You need to understand that it is a security software. Maybe they've implemented protection from new attack vector. And hackers don't rest on weekends.
As we see, problem happened at some unpredictable late stage (file became all zeroes, it's not compiler output). Even if they released it on any other weekday effect would be the same.
@@lion21297 Sounds like there's a gap in their deployment testing...
How the hell does a multi-billion-dollar company not have basic error handling like a null check!? Like seriously, do they not take functional safety seriously!?
How do they not have a pre-update push setup that acts as if the machines are actual client computers, to test all updates before they're pushed to real clients?
@@Shocker99 My thinking exactly, this is basic stuff.
Almost nobody checks the result of many functions, e.g. malloc or printf. Defensive programming techniques are clunky and cumbersome, but still don't save you from errors (ref: "The Art of Software Testing" book).
The only way to get rid of whole classes of errors is with a good type system, in other words detection by the compiler.
cue elitist C++ dev entering the scene and saying "no no this isn't a technology problem, this was a skill issue haha, I'd NEVER do this, I'm too skilled"
"error handling like null check!"
It should have been caught much earlier. Seems they have no integrity check of their binaries during the build-test-distribution process.
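The integrity check that comment is describing is cheap to add anywhere in a build-test-distribution pipeline. A minimal sketch with made-up packaging (not CrowdStrike's real format): record a digest of the exact bytes QA tested at build time, and refuse to apply anything that no longer matches — an all-zero file would fail immediately.

```python
import hashlib

def package(artifact: bytes) -> dict:
    """Build step: record the digest of the exact bytes that passed QA."""
    return {"data": artifact, "sha256": hashlib.sha256(artifact).hexdigest()}

def apply_update(pkg: dict) -> bytes:
    """Deploy step: refuse bytes that differ from what was built and tested."""
    if hashlib.sha256(pkg["data"]).hexdigest() != pkg["sha256"]:
        raise ValueError("artifact corrupted between build and distribution")
    return pkg["data"]
```

If the corruption really did happen after QA signed off, a check like this at the distribution edge (or in the agent itself) would have turned a global outage into a failed download.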
Excellent video!
My wife has high confidence in her coworkers following the instructions and fixing their desktops and laptops. I think the person who replied all to the instructions asking for his encryption key is proof that this is going to be one long weekend for people in IT. Never been happier to be retired.
Me too. Retired two months ago from a law firm who uses Crowd Strike through a consultant/VAR. I was imagining the entire firm losing their mind today due to this. Funny/not funny, but so happy it wasn't me having to deal with it.
@@edwardallenthree you're spot-on with that 😅
As an IT guy, I have been working non-stop all day today. It's utter insanity here at this company. Those dumb bastards
Hey at least now they realize how important you are.
Oh my god, they killed kernel! The bastards!
At least we got our Servers up and running in 2 hours this morning, overtime cash, I feel you, Brother.
@@haroldcruz8550 They are rather going to blame him for the problems he's trying to fix. Unfortunate truth of working in IT.
@@pieterbezuidenhout3757 yall have windows in your servers? what the fuck are yall smoking
How did this roll out this widely?
Is there no canary? Is there no QA?
How is business and airports the first wave of roll out...
How do you put all trust in a single third party?
@@Veptis I’m wondering the exact same thing.
From the perspective of a DevOps/Infra engineer - What kind of update deployment strategy is “just hit all billion machines at once”?
No canary? No region by region?
More questions to be asked here than just what went wrong in the actual code…
If this is how they deploy code with a bug, imagine if they deployed code with a serious vulnerability?
We need to hear about Q/A and deployment strategy at CrowdStrike!
good questions, and someone is damnwell going to have to answer them
@@samniechcial8493just stop using windows they're not worth the security issues.
"How is business and airports the first wave of roll out..."
someone else in the comments answered that already: they are the only ones to roll out to. it's big business software. there is no consumer grade product
There's no canary because what is pushed are virus signatures. If you canary then all the non-canaries will be vulnerable to the zero-day virus. Getting instant updates is the entire point of the product.
QA probably happened, but after the QA they deployed the tested file to some file server and the file got corrupted in transmit into all-zeros, which causes a crash loop.
By the way, this is the second BSOD software update push George Kurtz, CEO of CrowdStrike, has presided over. The first was April 21, 2010, as CTO of McAfee, when an update inadvertently deleted svchost.exe from Windows XP machines. That one would have been even more massive if always-auto-updating devices had been as common then as they are now.
"Who needs svchost.exe anyways? I'm an ordinary Windows user, I don't use all those nerdy tools. Just trust the authorities. Safe and effective!"
- All my friends.
I just got done writing a comment under Fireship's Code Review saying how Ed is bound to release a video on this as well soon enough, I reload my YT start page and I can see it up there, from 11 minutes ago.
My dad is a senior developer and we watched this happen in real-time. We spent the day installing Linux instead.
That isn't the fix. CrowdStrike also has a Linux agent. EDRs MUST work at the kernel level to do their job. It is just that in this case, they messed up the file for Windows, not for Linux or Mac.
@@jorper2526 although it is easier to monitor such stuff from userlevel in linux, i heard
@@jorper2526 is the Linux agent even Kernel-level? I recall hearing somewhere that it doesn't go nearly as deep as the Windows version.
@@rikuleinonen it depends. But by default, yes.
They did crash numerous linux machines in April.
@@jorper2526 thanks for the info.
cloudstroke
jfc lmao
clownstrike
Clownstroke
Clownstruck
Cloudstrife
That was just like the *Y2K* bug... but this time nobody was expecting it LOL
And the Y2K bug probably would have been less disruptive back then compared to a Crowdstrike error nuking system operations worldwide now, because 24 years ago you didn't have mission-critical systems that require "always online" connections. It was a transitional phase and older fallbacks were at the ready. Now it's more damaging because the more modern systems ARE the fallback.
"should have used rust.. " 😂😂
these rust ads are getting crazy
@@mushroomcrepes4780 damn best rust ad ever 🤣🤣
wouldn't have helped in this case
Thank you so much for actual getting into technical details. All other articles just repeat "well, everything is down".
Big hugs to all the people having to manually recover systems today.
I wonder what the Linux download sites are seeing?
yay overtime
@@rcstl8815 probably not much, corporations and enterprises don't tend to knee-jerk quite that fast, if at all.
"looks like I picked the wrong week to stop sniffing glue"
That the most important piece of software doesn't have a rollback mechanism for when an update is broken is mind-blowing to me. 😂😂
@@misogear Oh, there is - Crowdstrike's update was just designed in a way that doesn't make use of that Windows feature.
@@Xehlwan except Windows' rollback is completely useless because you need to boot into the system, then jump through menus to finally roll back. The whole purpose of rollback is for when an update is BROKEN. If an update is broken and you can't roll back, then you don't have rollback. On Linux, you can still roll back your kernel after a kernel panic (their BSOD) with a single reboot
Just use a closed source piece of software on our closed source OS for our critical application, everything will be fine.
Boss, the closed source OS and hardware you have on your desk is not good enough to act as a dumb TV or kiosk by itself. First install the closed source drivers from a bunch of random hardware vendors. Then add some tooling to actually install and configure the host to do its thing. Add some more software to manage the truckloads of host security settings for all the stuff we don't need anyway, but can't remove. Add closed source kiosk software or maybe the POS application, which is just a wrapped browser. Buy some more security software since we can't trust any of the previous bits to work.
Don't think about deploying a cheaper Open Source & Open Hardware solution, like a Raspberry Pi.
Open/closed isn't the issue in this case. Corrupt auto software updates are, and they can (and do) happen to both.
@@football42241 Except you'd be ripped to shreds in an open source project if you committed code that runs in kernel mode, downloads dynamic code off the internet and runs it in kernel mode, doesn't have any sort of integrity check on what it downloads off the internet, like a digital signature or even a CRC32, AND the virtual machine / interpreter which runs the code that was downloaded off the internet isn't sandboxed and lets the dynamic code use naked pointers. How many freaking basic mistakes did this "cybersecurity" company make here?
Not to mention that all this time, their "cybersecurity" software has been one giant RCE waiting to happen if you manage to spoof the DNS of the update server, or MITM that HTTP (maybe S) request that we all know doesn't check for a specific root authority. I wonder how long the NSA has known about this one. I'd hope not as long as they kept EternalBlue under wraps.
@@football42241 nope, auto software updates are extremely rare on open source operating systems. since they're made by devs, for devs, and all of us devs hate that shit. This crowdstrike thing could've very well happened even if the whole world ran on linux. but at least, it'd only happen to people who ran the update command, and even then all they'd need to do to get their computer back is rollback to previous kernel and reboot. The same problem could've happened, but it would've been in a smaller scale and easier to fix.
How was this .sys file signed? If it was all 0s, how was Windows able to load it? Why are there no checks in place?
It seems that it is a submodule which is loaded by the CrowdStrike agent itself. That means they don't do basic checks...
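A basic loader-side check amounts to validating the content file before trusting anything parsed out of it, and failing safe instead of crash-looping the machine. A toy sketch — the 16-byte layout and the b"CSCF" magic are invented for illustration, not the real channel-file format:

```python
def parse_header(blob: bytes):
    """Return (offset, length) from a toy 12-byte header, or None if unusable."""
    magic = b"CSCF"
    if len(blob) < 12 or not blob.startswith(magic):
        return None                          # truncated, all-zero, or wrong magic
    offset = int.from_bytes(blob[4:8], "little")
    length = int.from_bytes(blob[8:12], "little")
    if offset + length > len(blob):
        return None                          # header points past the buffer
    return (offset, length)

def load_content(blob: bytes) -> bytes:
    parsed = parse_header(blob)
    if parsed is None:                       # the null check: never deref a bad parse
        return b""                           # fail safe, keep the machine booting
    offset, length = parsed
    return blob[offset:offset + length]
```

A file of all null bytes fails the magic check on the first line, so the loader degrades gracefully instead of dereferencing garbage in kernel mode.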
I can feel a disturbance, as if millions of crowdstrike memes are being made
Also note this story: *"Major Microsoft 365 outage caused by Azure configuration change"*
... _"Microsoft says an Azure configuration change caused a major Microsoft 365 outage on Thursday, affecting customers across the Central US region. This massive outage started around 6:00 PM EST and prevented users from accessing various Microsoft 365 apps and services."_
This happened hours *before* the Crowdstrike issue surfaced, and I also wonder if there might have been some connection.
We had issues on Azure with HTTP2 traffic in South Africa. I'm wondering if they are related.
yeah I noticed the Azure outage too. Funny how nobody cares or even blames MS for the Crowdstrike/Windows interaction.
@@hellowill why blaming ms for something another company did wrong, it's their responsibility to not ship something bad. Windows offers you to boot in safe mode to remove the driver that start prioritized
The Azure issue caused some of their compute to not be able to access storage. Then hours later, crowdstrike pushes an update that's completely zeroed out.
If there isn't some connection and cascading failure discovered in postmortem it's going to be one hell of a coincidence 😅
Still a pretty massive failure on crowdstrike's part to manage to drop the file onto so many machines without some verification raising an alert that this file is screwed
@@workmad3 Also, updates are usually pushed to a small selection of devices first, and when nothing bad happens, the number is increased
I was supposed to fly to Japan this morning. Crowdstrike canceled my flight. I’m glad you’re already at your destination, Ed. Have fun!
So funny, my first thought was who tf pushes out an update on a Friday..
Everyone does (working in software development in a small company here). If something goes wrong you will have time to fix it over the weekend. Something so critical should have been tested better, though.
i find it incredible crowdstrike didn't do a staged rollout considering the risks. this is an acute reminder that lab testing can't cover all scenarios and that a fallback plan is always necessary no matter how remote the probability of mishaps. defn: risks = probability(event) × cost of damage (event)
They actually did, apparently. It's just that they didn't notice until it had rolled out to the majority of their userbase. Meaning they did a staged rollout, but it was too fast / no-one was monitoring it to catch the problem.
I mean, that probably happens when you're doing it on a Friday.
Or testing
@@Sandromatic They apparently didn't notice at all. It was Google who pointed the finger, and even then it took them two hours to say "ehm.. yeah, it was us."
Nobody can hack into your network if your entire network is down - I say this is mission accomplished 😎
Today was the most secure Windows has ever been
This is what I call a "paradigm shift" in cyber-security.
Kinda neat that so much of the internet depends on a few people uploading critical files.
@@DJJOOLZDE The internet is fine, it's the computers running Microsoft that have crashed.
Crowdstrike switched to MaS: Malware as Service.
Pretty wild that a cybersecurity company would deploy bad kernel-level software
not really that wild. This is exactly why I have been fighting against using this crap. It does not increase security in any way.
@@Dead_Goat If it didn't increase security, people wouldn't use it. Who are you fooling lol?
@@rythem2257 do you understand the concept of snake oil? That's exactly it.
@@PvtAnonymous Explain then. I'm sure you have TONS of great information on this subject, and not just some "hurr durr linux" type of reply.
@@jorper2526 what is there to explain? As with any AV, you introduce a single point of failure with extensive privileges into your OS or even your kernel. As seen in the last few days, there seems to be a lack of testing on CrowdStrike's end, which resulted in - again - a single point of failure. Threat actors could just as well find a gullible employee or even infiltrate the whole company and introduce malicious code that could take over all of the machines and do whatever they want with them, basically turning Falcon into a rootkit. This false sense of security is, you guessed it, snake oil.
Where the fuck was the test team? Oh wait... The modus operandi of a modern hackshop is fuck QA.
Yep. But remote updates are evil anyway. That's nonsense from a sysadmin POV.
@@joseoncrack They are not paying for QA. Do you think they'll pay for a staging environment, and a team to manage it, where the update is tested before it rolls out to the rest of the org?
This is so sadly true. My CEO, who likes to pretend he's a dev but isn't and has no background, operates like this. I swear it is due to the development of CI/CD pipelines and how relatively "easy" it is to write some APIs: just have the devs write tests for them and throw out the latest update to your microservice. I mean, all software functions like that, doesn't it? The cloud systems you run for your little API services, or pushing critical and sensitive updates to customer systems - what's the difference? So what if it is a kernel driver, just keep those updates rolling!
QA is not very cost effective. Why test when you can do it live
@@haroldcruz8550 Maybe to prevent a 20% stock drop and a big fuck you from the customers?
Thanks for the video. The file being corrupted does seem to explain how it could get past testing, if the failure to write only happened after they verified the file was safe.
*Hackers: try, and fail to take various systems down*
*Crowdstroke: Fine. I'll do it myself.*
As an F1 fan I always wonder what crowdstrike was. Now I know lol
So, for the few companies Microsoft trusts to operate at the sys/kernel level, all updates should be funneled through Microsoft test channels. They cannot have companies breaking their install base.
THIS. I was ranting to my wife yesterday evening about this. Bless her, she knows nothing about IT and made all the right noises.
Appreciate getting news from a source that actually knows what they're talking about. Props to LLL for taking the time to make a video while on vacation, and hopefully nothing equally newsworthy happens so he can just relax...
I, for one, am shocked that the company called "CrowdStrike" which thinks it's a great idea to advertise on race cars would ever do something so ill-advised.
That's memorable! Good advertising, honestly.
I remember seeing an article about how CrowdStrike's CEO regretted not firing people earlier... In 2020.
I guess he's reaping now eh?
So... the CEO might regret _regretting_ the thought of _firing people_ who may now provide support to customers?
...Or is it that he would've found that firing these guys early - the ones who couldn't deploy this crashing update correctly, as beneficial?
yeah, I don't know how to read this. Should he have fired them earlier, or should he have kept them?
I guess he fired them later. I mean at the end of the day even someone who was fired could have planted this in a way that it looks just like a technical f-up.
You people have misunderstood the context.. the CEO laid off a bunch of people around 2020 and boasted how he probably should have done this earlier. Like Microsoft laying off most of their Windows QA teams so that they could use customers as Beta testers; when you reduce your employees you reduce the amount of slack the remaining people have to pick up on problems and head them off early. So a piece of code ends up in prod without at least 1 person catching a fatal bug. And some poor schmuck or team of schmucks who had to meet some kind of performance metric deadline push code on Friday that should have been tested more than just a few times on CrowdStrike's internal systems.
Your look is giving IT person they had to bring back from the middle of his holiday to fix this. It feels so authentic :D
An intern writing tests with ChatGPT!
That was the reason for kicking Kaspersky out!
They wanted one for a while and they got it, even though every other AV vendor has more privileged access to your computer than Kaspersky did!
Security nuts make things worse
@@henson2k Nassim Taleb will have a good take on this.
"Crowd Strike" literally 😂
@@boy_deploy alex jones predicted this
*Make the SKINNIEST reference at bootdriver position, make sure it works... NEVER change it. Load everything else after the system is stable.*
"we're kind of dependent on these companies, and when they get it wrong, the whole world collapses. kind of makes you think."
such a cheerful delivery of that truly terrifying statement.
This incident feels like something out of a Tom Scott spec video (re: "that time Google forgot to check passwords") ... or the definition of an "onosecond".
2:55 - Wow. Just wow. I've personally experienced a few cases of complete individual file loss leaving behind "all nulls" (presumably from a failed deferred write-to-disk). But those were just two or three personal userfiles -- I couldn't imagine this occurring with a critical driver or system file.
For the CrowdStrike outage, I went to mainstream media for the human-readable explanation, to Fireship for the system-level explanation, and now I'm blessed with low-level-learning for the autopsy :D
I guess CrowdStrike doesn't believe in testing, or using canary systems LOL
Fireship for the snarky summary, Theo for the details, LLL for the disassembly. I missed out on all of this today, but I feel caught up now.
Blistex Inc is proud to announce they've teamed up with Linux to solve the blue screen of death. The "Tucks Medicated Pads" you already know and trust have been rebranded "Tux Crowdstrike Relief Pads" for Microsoft Windows users to relieve that burning sensation.
So here's a question, why does any company allow Microsoft or Crowdstrike to push programs onto their production system, ever? Back when I was in IT we would never let anyone do that. We would take the software ourselves and try it on a test system first before putting it into production.
tfw title starts with lowercase "lol"
Another lesson to learn from all this is, "don't put all your eggs in one basket." Meaning, diversify your tech stack, operating system, and EDR provider. "Easier said than done", but would you rather have multiple points of failure? "More expensive to maintain"; more than being shut down for a prolonged period of time?
Greed is incompatible with logic
😂😂😂
@luketurner314 A lot of people operate on the security blanket of "well, if this basket drops it would be SO MANY eggs breaking across the world that that company is NEVER GONNA LET THE EGGS BREAK and that really is probably the safest basket to be in. At least we're all in it together!"
On Linux, a module file with all nulls will not crash the kernel, because the kernel performs multiple checks on a module file before loading it. Also, modules are single files (sometimes a module can request another module, but it will be loaded the same way) instead of multiple files. A couple of months ago I tried to read the module-loading code, but it's too poorly documented to be easily readable (for somebody who has worked with this code for a long time it would be much easier).
and to be fair, running CrowdStrike on Linux isn't needed; now with Windows...
@@vilian9185 only updates, but that can be done automatically in most cases (sometimes it will reboot some services, or you have to reboot the system manually when the kernel or libc has security patches).
@@vilian9185 but possible. CrowdStrike pushed a buggy Debian update a few months ago with a similar result (but fewer affected users).
@@vilian9185 you do understand that linux is very hackable and much more likely to actually need something like crowdstrike than a windows system right?
@@Dead_Goat no? lmao, wtf are you talking about
It's 100% safe to say that CrowdStrike is not doing mutation testing for their kernel mode driver, and that's scary!
What puzzles me is why deleting the file solves the problem.
There should be no difference between a missing file and a zeroed-out file.
An all-zero driver file is missing all the standard exe structures: it has no header and no import, export, or relocation tables.
And no digital signature.
So Windows should refuse to load such a "driver", which would technically be the same as a missing file.
Why a null pointer for the zeroed-out file but not for the missing file? Can't load is can't load, no?
I really suspect something more is going on here...
I am wondering that too; it might have got signed as all nulls. A CI/CD failure
@@MathewBoorman yes, but the signature should also be part of the file, so at least that would not be zeroes.
Same for the exe header: if it is all zeroes, it would not load.
Unless the description of a zeroed-out file is not 100% accurate and the header is present, the signature is present and correct, etc.
Which then raises the question of what kind of processes they have to create such a file, sign it, and push it out with no testing.
This is not a case of "it works on my machine" :-)
Some big red flags about their processes...
Because if the file is missing, opening it fails with a proper error. Once it opens, the contents are trusted to be executed on the CPU.
the clownstrike background driver that listens for updates should have done at least a hash/CRC check of the updated file before initiating a reboot. That's what I don't understand.
I use ZFS to store all my data, because I do *not* trust hardware. I always generate and verify MD5 hashes of my files and I have found so many _silent_ data corruption faults when a hard drive will randomly flip a single bit, or return 512 null bytes... with no hardware errors being detected/reported. SMART self-check says everything's ok; Linux device driver says everything is ok; etc.
I mention this, because I have been given NTFS drives by other people, and upon checking the files, there will unexpectedly be 4096 null bytes where I was expecting data. (And I had a second copy of the file for comparison.)
There were "no errors" reported at the hardware, filesystem, or OS levels. A file was silently replaced with a bunch of nulls... and I immediately wondered if Crowdstrike's build system is using NTFS.
First of all, thank you so very much for covering this during your vacations! I didn't even know this company existed until this morning, and quickly realized I'd only possibly get an explanation as to what happened from somebody like you, who knows about this stuff.
One important lesson we need to learn from this is to do transaction-based updates and to ensure the integrity of each update with a cryptographic signature. They likely performed intensive in-house testing of that update - but just didn't account for the possibility of the update being corrupted / tampered with in transit.
However, it also brings up the question of whether we can really exclude the possibility of this being a dedicated cyber-attack that quickly!
Because, to my knowledge, we don't know yet how exactly the version of that update that was installed on these billions of devices came up all zeroes. Surely, it shouldn't have happened - and likely wouldn't have if they had used even the most basic CRC approach to verify integrity.
In that regard, the blame lies with that company alone!
However, I wouldn't necessarily exclude the possibility that some bad actor knew about this vulnerability and exploited it.
I love your channel. so cool that you took a vacation from your vacation for this :)
I need more Tank Top Beach Boy Low Level Learning. Good vibes
Quite a few people already commented about the lack of testing. We don't need to speculate about what kind of in-house testing they may or may not have conducted, because we can just look at the facts that we have so far, from a neutral point-of-view:
Somehow, the final product / update that was shipped to billions of computers world-wide apparently didn't use even the most basic checksum algorithm - let alone a cryptographic signature, as one would certainly expect from something of this magnitude.
It doesn't matter at all how careful or negligent their engineers were in regards to developing, testing, or anything - the problem happened _AFTER_ the final product left their house.
*BUT* they were negligent - and thus responsible - by shipping a product of this impact and magnitude without *ANY* checks that would prevent even an accidental tampering in process.
You're absolutely right. Bizarre bugs and errors can occur at almost any point, so any sane software would check the integrity of an update before installing it. That goes double for kernel level software.
I've been looking forward to you making a video about this since I saw the first Australian news reports this morning! Thanks for taking time out of your trip to film this!
It's ridiculous that they can't even save a single boolean or something to disk and clear it once the OS has booted, so next time the driver loads it can check whether the boolean is still set and just not load (because you can safely assume the PC crashed), and while in the OS the non-driver part of the code could check for updates...
or even better: do it Linux-style, and don't take down the entire system if there is a recoverable error at boot
@@az-kalaak6215 I'm pretty sure a corrupted kernel module like this would crash Linux as well.
This is the best and most direct answer to the question. Every other big news outlet goes on and on and never actually tells us what went wrong. Well done
Wait, how could this happen? Don't they digitally sign driver files? If so, an all-null file should not pass signature verification, and how did Windows even load the driver? Windows rejects unsigned drivers, doesn't it? This makes no sense.
According to the video, it wasn't Windows. It was CrowdStrike kernel mode driver loading a submodule without any NULL checks and error handling.
@@iljaseklervl That is dumb. How could a security application not check the integrity of its module files? This is not just a simple distribution mistake, but fundamentally amateurish coding.
Agree, looks like the gist of the issue to me. Mistakes happen, but shipping without any sanity-check verification is pure negligence.
if there are no checks.. sounds like a nice backdoor waiting to be exploited.
@@john6372 sounds like it just was
"the wildest things in my 10 years" i feel like you are saying it every week :DD
it's Friday and the computer knows it
If anything Crowdstrike should lose all their credibility. Being trusted to deploy code that executes at the kernel level carries with it a huge responsibility. But we live in a world where people are okay with anti-cheat software having that sort of privilege so that they can play a game. Your operating system is supposed to be the last holdout against malicious and faulty applications (that's what we teach at school) but we are bending the rules for no good reason.
Rip the 5 hours of production data I lost at work
cool, appreciate the technical take on the issue. This will be a perfect example of what not to do.
So glad I'm making games today and not doing my usual IT gig.
The real Y2K was the C/C++ code vulnerabilities we wrote along the way
Crowdstrike is absolutely useless. All it does is monitor your employees; it doesn't help.
Your explanation of how the driver integrates into the system, making the system depend on it, makes sense. Thanks for giving informative details at an understandable level.
cybersecurity is such a meme industry. scaremongering followed by complete incompetence when it comes to the threats that matter most
Cybersecurity companies have very much a "Who watches the watchmen?" problem. They are given a ton of power in the name of security and they can do a lot of damage.
Literally a "Come on you guys! There it is right in front of you the whole time! You're dereferencing a null pointer!" moment
What if crowdstrike runs falcon on their own workstations? Would it detect its own update as a virus?
No, all zeros is not a virus