Major IT outage affecting banks, airlines, media outlets across the world (www.abc.net.au)

submitted 2 years ago* (last edited 2 years ago) by rxxrc@lemmy.ml to c/technology@lemmy.world

All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It's all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We'll see if that changes over the weekend...

[-] Saik0Shinigami@lemmy.saik0.com 11 points 2 years ago* (last edited 2 years ago)

I mean - this is just a giant test of disaster recovery plans.

Anyone who starts DR operations due to this did 0 research into the issue. For those running into the news here...

CrowdStrike Blue Screen solution

CrowdStrike blue screen of death error occurred after an update. The CrowdStrike team recommends that you follow these methods to fix the error and restore your Windows computer to normal usage.

Rename the CrowdStrike folder
Delete the “C-00000291*.sys” file in the CrowdStrike directory
Disable CSAgent service using the Registry Editor

No need to roll full backups... As they'll likely try to update again anyway and bsod again. Caching servers are a bitch...

[-] Monument@lemmy.sdf.org 13 points 2 years ago

I think we’re defining disaster differently. This is a disaster. It’s just not one that necessitates restoring from backup.

Disaster recovery is about the plan(s), not necessarily specific actions. I would hope that companies recognize rerolling the server from backup isn’t the only option for every possible problem.
I imagine CrowdStrike pulled the update, but that would be a nightmare of epic dumbness if organizations got trapped in a loop.

[-] Saik0Shinigami@lemmy.saik0.com 8 points 2 years ago

> I think we’re defining disaster differently. This is a disaster.

I've not read a single DR document that says "research potential options". DR stuff tends to go into play AFTER you've done the research that states the system is unrecoverable. You shouldn't be rolling DR plans here in this case at all as it's recoverable.

> I imagine CrowdStrike pulled the update

I also would imagine that they'd test updates before rolling them out. But we're here... I honestly don't know though. None of the systems under my control use it.

[-] Skimflux@lemmy.world 3 points 2 years ago

Right, "research potential options" is usually part of Crisis Management, which should precede any application of the DR procedures.

But there's a wide range for the scope of those procedures: they might go from switching to secondary servers to a full rebuild from data backups on tape. In some cases they might be the best option even if the system is easily recoverable (e.g. if the DR procedure is faster than the recovery options).

Just the 'figuring out what the hell is going on' phase can take several hours. If you can get the DR system up in less than that, it's certainly a good idea to roll it out. And if it turns out that you can fix the main system with a couple of lines of code, that's great, but no one should be getting chastised for switching the DR system on to keep the business going while the main machines are borked.
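
A rough sketch of that comparison, in Python. The class, the field names, and the idea of reducing the decision to three hour estimates are all illustrative assumptions, not part of any real DR framework; actual plans weigh much more than elapsed time.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    # All three values are hypothetical estimates supplied by whoever is on call.
    diagnose_hours: float   # time to figure out what is going on
    repair_hours: float     # time to fix the primary system once diagnosed
    failover_hours: float   # time to bring the DR system up


def should_fail_over(est: RecoveryEstimate) -> bool:
    """Fail over when the DR system is expected up before the primary is back."""
    return est.failover_hours < est.diagnose_hours + est.repair_hours
```

With four hours of diagnosis ahead and a one-hour failover, `should_fail_over(RecoveryEstimate(4.0, 0.5, 1.0))` comes out true; with a quick known fix and a slow failover it does not.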

[-] Monument@lemmy.sdf.org 2 points 2 years ago

That’s a really astute observation - I threw out disaster recovery when I probably ought to have used crisis management instead. Imprecise on my part.

[-] Monument@lemmy.sdf.org 2 points 2 years ago* (last edited 2 years ago)

The other commenter on this pointed out that I should have said crisis management rather than disaster recovery, and they’re right - and so were you, but I wasn’t thinking about that this morning.

[-] Saik0Shinigami@lemmy.saik0.com 3 points 2 years ago

Nah, it's fair enough. I'm not trying to start an argument about any of this. But ya gotta talk in the terms that the insurance people use (because those are the terms your C-suite understands). If you say DR... and didn't actually DR... That can cause some auditing problems later. I unfortunately (or fortunately... I dunno) hold the C-suite position in a few companies. DR is a nasty word. Just like "security incident" is a VERY nasty phrase.

[-] jj4211@lemmy.world 5 points 2 years ago

Note this is easy enough to do if systems are booting or you're dealing with a handful, but if you have hundreds of poorly managed systems, discard and do again.

[-] Saik0Shinigami@lemmy.saik0.com 2 points 2 years ago

Yeah, I can only imagine trying to walk someone through an offsite system that got bitlocked because you need to get into safe mode. Reimaging from scratch might just be a faster process, assuming that your infrastructure is set up to do it automatically over the network.

[-] StaySquared@lemmy.world 2 points 2 years ago

Nah... just boot into safemode > cmd prompt: CD C:\Windows\System32\drivers\CrowdStrike

Then: del C-00000291*.sys

Exit/reboot.
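
For anyone who would rather script the delete step than type it, here is a minimal Python sketch of the same thing. The directory and file pattern are the ones quoted in this thread; the function name and the `dry_run` flag are made up for illustration, and it assumes the machine is already booted far enough to run Python with admin rights, which will not be true on a box stuck in a boot loop.

```python
from pathlib import Path

# Directory and pattern as quoted in the thread; verify both for your environment.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_matching_files(directory: Path = DEFAULT_DIR, dry_run: bool = True) -> list[str]:
    """Return the names of files matching PATTERN; delete them unless dry_run is True."""
    matches = sorted(p for p in directory.glob(PATTERN) if p.is_file())
    for path in matches:
        if not dry_run:
            path.unlink()  # equivalent of `del` for this one file
    return [p.name for p in matches]
```

Calling `remove_matching_files()` with the default `dry_run=True` only lists what would be removed; pass `dry_run=False` to actually delete, then reboot.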

[-] Saik0Shinigami@lemmy.saik0.com 1 points 2 years ago

The stuff I copied into the end of my comment is direct from CrowdStrike.

[-] StaySquared@lemmy.world 1 points 2 years ago

Hm.. yeah what I provided worked for us.

this post was submitted on 19 Jul 2024

1202 points (99.5% liked)
