Dev Loses $440 Million in 28 minutes, Chaos Ensues

  • Published: 27 Sep 2024

Comments • 908

  • @DanielBoctor
    @DanielBoctor  9 months ago +201

    Thanks for watching!
    I thought I would try something slightly different with this video, focusing a bit more on telling a story. I had a lot of fun with it. Open to feedback from y'all, as well as suggestions for future videos (vulnerabilities, breaches, exploits, anything really). I'm doing a bit of travelling next week, so it might be a bit longer until my next upload.
    JOIN THE COMMUNITY ➤ discord.gg/WYqqp7DXbm
    Also, I think I finally fixed my intonations LOL
    Thank you for all of the support, I love all of you ♥
    *EDIT* - At 2:23 those timestamps were meant to be 9:35am, my apologies for the mistake. I thought I fixed it, however I must have ended up uploading the wrong render.
    *EDIT #2* - 2:00 should have been "two and a half minutes", rather than seconds. Thanks to those who pointed this out!

    • @ImperialRoads
      @ImperialRoads 9 months ago

      I love pegging

    • @BillAnt
      @BillAnt 8 months ago +4

      Fascinating, love it. :)

    • @DanielBoctor
      @DanielBoctor  8 months ago +3

      @@BillAnt love you more

    • @renakunisaki
      @renakunisaki 7 months ago +2

      I enjoyed it. Good pace and explanations.

    • @stockstreamtwitch
      @stockstreamtwitch 6 months ago +1

      Great work dude. Glad this video fell into my recommended.

  • @sanderbos4243
    @sanderbos4243 9 months ago +4142

    Imagine the stress of the engineers trying to identify the problem, knowing their company is losing 2.5 MILLION dollars per second

    • @DanielBoctor
      @DanielBoctor  9 months ago +630

      At least they ordered pizza LOL

    • @BillAnt
      @BillAnt 8 months ago +181

      ​@@DanielBoctor- Give new meaning to "When the sh*t hits the server fan". Ha-Ha

    • @supersat
      @supersat 7 months ago +180

      I'm a little surprised NYSE doesn't have a mechanism to block trades when something is obviously going wrong

    • @DanielBoctor
      @DanielBoctor  7 months ago +141

      They actually do, however they were not that helpful for Knight as they were designed for price swings, not trading volume. Mary Schapiro (the SEC chairman at the time) did end up reversing 6 of Knight's transactions, as they reached the cancellation thresholds outlined below:
      The SEC required more specific conditions governing the cancellation of trades. For events involving between five and 20 stocks, trades could be cancelled if they were at least 10 percent away from the “reference price,” the last sale before pricing was disrupted; for events involving more than 20 stocks, trades could be cancelled if they deviated more than 30 percent from the reference price.
      You can read more about this at Henrico Dolfing's report linked in my description.
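The thresholds quoted above can be written as a small check. This is only a sketch of the rule as the comment states it: the function names are made up for illustration, and the quoted text says nothing about events with fewer than five stocks, so the code treats those as not covered.

```python
from typing import Optional


def cancellation_threshold(num_stocks: int) -> Optional[float]:
    """Return the deviation threshold for an event, per the quoted rule."""
    if 5 <= num_stocks <= 20:
        return 0.10
    if num_stocks > 20:
        return 0.30
    return None  # fewer than five stocks: not covered by the quoted text


def can_cancel(num_stocks: int, trade_price: float, reference_price: float) -> bool:
    """True if a trade is far enough from the reference price to be cancelled."""
    threshold = cancellation_threshold(num_stocks)
    if threshold is None:
        return False
    deviation = abs(trade_price - reference_price) / reference_price
    if num_stocks > 20:
        return deviation > threshold   # "more than 30 percent"
    return deviation >= threshold      # "at least 10 percent"


# A 20% move is cancellable in a 10-stock event but not in a 25-stock event.
print(can_cancel(10, 12.0, 10.0))
print(can_cancel(25, 12.0, 10.0))
```

Read this way, the larger the event, the further a trade has to be from the reference price before it qualifies, which fits the point that these rules target price swings rather than volume.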

    • @cambrown5777
      @cambrown5777 7 months ago +121

      It's actually about $262,000 per second, if the title is correct (440M / (28*60)). Still absurd.
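For anyone who wants to check the per-second figure, here is the division spelled out in Python. It uses only the two numbers from the video title and assumes the loss was spread evenly over the 28 minutes, which it almost certainly wasn't.

```python
# Back-of-the-envelope loss rate, using only the figures in the video title.
TOTAL_LOSS_USD = 440_000_000
DURATION_MINUTES = 28


def loss_per_second(total_usd: float, minutes: float) -> float:
    """Average loss per second over the whole window."""
    return total_usd / (minutes * 60)


rate = loss_per_second(TOTAL_LOSS_USD, DURATION_MINUTES)
print(f"${rate:,.0f} per second")  # prints "$261,905 per second"
```

Either way the average is in the hundreds of thousands of dollars per second, well below the per-second peak figures mentioned elsewhere in the thread.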

  • @TheEnlightenedMalignancy
    @TheEnlightenedMalignancy 7 months ago +2028

    Heavy engineer: “It costs thirty million dollars to run my code for twelve seconds”

    • @slomnim
      @slomnim 7 months ago +56

      Put that on your resume xD

    • @TheNefastor
      @TheNefastor 7 months ago +18

      Underrated comment ! 😂

    • @Fossil_Frank
      @Fossil_Frank 7 months ago +9

      Well, on a good day it probably earns them more than that. It's part of the risk game, you accept having a robot making transactions for you and earning money at lightspeed, so you have to accept that sometimes it might lose money at lightspeed too.

    • @JohnSmith-ox3gy
      @JohnSmith-ox3gy 6 months ago +12

      ​@@Fossil_Frank Just no. These are very low margin services. In 45 minutes they spent 4 years of profits of the entire thousand employee business. Your comment is off by a factor of over 1000.

    • @Fossil_Frank
      @Fossil_Frank 6 months ago +1

      @@JohnSmith-ox3gy I don't know where you're from, but on my home turf any kind of stock transaction costs a hefty premium to perform. Granted, these kinds of automated services probably discount it for the customers, but those customers are more than likely exclusively high rollers who bring in truck loads of money. I find it hard to believe they would charge them flat rates.

  • @orterves
    @orterves 7 months ago +2456

    Blaming the devs for losing the money when the company pushed for the release in a month, through procedures that involved manual unverified deployments, classic.

    • @spacemanmat
      @spacemanmat 7 months ago +268

      You know that Devs complained and weren’t listened to. Wouldn't be surprised if the company had a history of doing this. This time the company paid the price.

    • @tr7zw
      @tr7zw 7 months ago +158

      I mean, it was a failure on all fronts. 1 Month to implement this, no kill switch, broken deploy scripts, at that point 7-year-old dead and dangerous legacy code in the codebase, being able to "reuse" a flag that causes "dead" code to revive, no plans in case of emergencies... There's a lot at fault here.

    • @snorman1911
      @snorman1911 7 months ago +83

      Similar to my company. Each rush job builds on the tech debt of the previous rush job, with the whole system getting worse each time. Then management demands to know why everything doesn't work optimally. Every objection is met with "If we don't get this out by next week, we'll miss the market!"

    • @Gideonrex1
      @Gideonrex1 7 months ago +86

      I’m a dev ops engineer and my employer made me delete our dev environment bc he didn’t see how it was needed and was costing money. So I could see a company literally just having prod and the devs have no say.

    • @voidspirit111
      @voidspirit111 6 months ago +10

      @@tr7zw there was a kill switch. There always is. They just wanted to keep operations going. The handling of the situation is more of a management and risk management failure.
      But in America management is rarely blamed.
      Based on the story they wanted to recover while being online so they don't lose face as a MM.
      They always had a hardware kill switch. A server kill switch would have been a nicer option.
      I say it's a crisis management issue, because they were doing live debugging and troubleshooting while losing so much money and apparently nobody in the chain of command said "Stop, take it offline". Besides the fact that they had to be contacted from outside... like.. nobody was supervising that???

  • @LordHonkInc
    @LordHonkInc 6 months ago +1817

    If you're losing ~$150M a minute, there is a kill-switch. It's called the server room breaker

    • @ratulsaha9487
      @ratulsaha9487 6 months ago +415

      Exactly. Don't know why they didn't shut down all operations until they figured it out. They wanted to continue business as usual and ended up losing the entire company. What a bunch of dim wits.

    • @eyeofthepyramid2596
      @eyeofthepyramid2596 6 months ago +13

      What does that do, never heard of it ?

    • @mortyrosenstein4211
      @mortyrosenstein4211 6 months ago +296

      It’s the power circuit for the room. A light switch essentially. You just literally turn off the power for everything in the server room so the servers immediately stop melting the company down.
      The most basic and sure fire way to stop the problem. You just hit the big power button.

    • @astr0man573
      @astr0man573 6 months ago +285

      These days it would be in the cloud and no one would have the credentials to nuke the account 😂

    • @AccountInactive
      @AccountInactive 6 months ago +46

      ​@@eyeofthepyramid2596Same as the breaker in your home. Turns off power to any given room or circuit (like laundry machines or stove)

  • @zdrux
    @zdrux 9 months ago +3181

    At first I thought, how could anyone be this stupid?.. Then I got to the point in video where they were given a month to design and deploy a whole new piece of software, and everything made sense.

    • @DanielBoctor
      @DanielBoctor  9 months ago +315

      yep, that will do it

    • @ghutkamukesh
      @ghutkamukesh 9 months ago +10

      ​@@DanielBoctor😂😂😂

    • @paraax
      @paraax 7 months ago +151

      Yes, getting the software wrong is understandable. Not knowing how to turn your software off is however not understandable. The new routines worked within their existing framework. Deploying and decommissioning should be literally one of the first things they learned. Turn it off, and don't let it out of test mode again until you are sure it works.

    • @exponentialcomplexity3051
      @exponentialcomplexity3051 7 months ago +47

      ​@@paraax I am still confused. Why couldn't they shut it down? Just pull the plug in worst case no?

    • @BAmalakas
      @BAmalakas 7 months ago

      ​@@exponentialcomplexity3051distributed system

  • @KandyWrongIncognito
    @KandyWrongIncognito 7 months ago +1448

    The part where the engineers were engaged in live debugging on a production system made me cringe into the next dimension. That's like trying to perform open heart surgery on a marathon runner as they're running the race. What an absolute disaster.
    Great video.

    • @DanielBoctor
      @DanielBoctor  7 months ago +57

      well said. glad you liked it

    • @Entropy67
      @Entropy67 7 months ago +87

      it must have been torture, considering that there was no bug in their new code, it was a deployment issue 🤣

    • @gingeral253
      @gingeral253 7 months ago +43

      @@Entropy67 The worst situation. Everything looks like it’s right and the problem turns out to be somewhere you never looked.

    • @thomquiri9860
      @thomquiri9860 7 months ago +15

      not gonna lie, that's a new fear unlocked for me as a future software engineer

    • @LesserAndrew
      @LesserAndrew 6 months ago +5

      I'm surprised they couldn't figure out how to kill the servers. Once, our CTO fixed a production issue by driving to the colo and unplugging a network connection.

  • @thenayancat8802
    @thenayancat8802 8 months ago +818

    Seems like the "buy high sell low" code is not something you want on your production machines, but then I'm not a financial expert like these folks

    • @JacobSantosDev
      @JacobSantosDev 6 months ago +24

      Well, they did have it behind a feature flag. But reusing a feature flag was a huge mistake.

    • @thenayancat8802
      @thenayancat8802 6 months ago +21

      @@JacobSantosDev Again seems like just simply not pushing the test code to production machines is the safer option, but I'm not a financial expert/CS wizard like they are

    • @JacobSantosDev
      @JacobSantosDev 6 months ago

      @@thenayancat8802 oh sorry. The entire purpose of a feature flag is to be able to turn on and off features that you are testing in production. Just because you tested something in other environments does not mean the feature will work as expected in the production environment. The point of a feature flag is to let a feature run in a live environment where you may want to turn it off. Technically, it is the "kill switch" and, based on the limited information, turning that switch off would have saved them. Except it sounds like nobody had the training or the access to flip the feature flag. Different teams are going to have different procedures for how feature toggles are switched. Better if it is a page that product can manage, but it might be entirely engineer-owned.
      "Not deploy test code" is a non sequitur as, depending on how you define it, all code is test code. The correct terminology would be "dead code", as the code should never run, but because there existed a condition where it could, once it is revived, fuckery happens. You never want dead code to revive. I have never heard of good things happening when dead code suddenly runs.

    • @jajordan2106
      @jajordan2106 6 months ago +1

      It depends. Borrowing a stock at a high price, selling it at that high price, then buying it back when the price falls allows you to make some money, although the potential losses are infinite

    • @cheesesniper473
      @cheesesniper473 6 months ago +15

      The problem here was that there was no distinction made between user and development software. This PowerPeg development software should have never been on a live program server at any point in time. It belongs on a dev server or stored on a HD somewhere.
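To make the reused-flag failure from this thread concrete, here is a toy model in Python. Everything in it is hypothetical (the flag name, the handler names, the eight-server layout that other comments here describe); it is not Knight's actual code, only a sketch of how one stale server plus a repurposed flag can revive dead code, and why rolling back the code without rolling back the flag makes things worse.

```python
# Toy model of a reused feature flag. All names are hypothetical.

def old_build(flags: dict) -> str:
    """Legacy build: the flag still switches on the retired test routine."""
    return "power_peg" if flags.get("reused_flag") else "normal_routing"


def new_build(flags: dict) -> str:
    """New build: the same flag now switches on the new feature."""
    return "new_feature" if flags.get("reused_flag") else "normal_routing"


def run_fleet(builds: dict, flags: dict) -> dict:
    """Ask every server which code path it takes for the given flags."""
    return {server: build(flags) for server, build in builds.items()}


flags = {"reused_flag": True}

# Seven servers got the new build; the eighth was missed by the manual deploy.
fleet = {f"server-{i}": new_build for i in range(1, 8)}
fleet["server-8"] = old_build
after_deploy = run_fleet(fleet, flags)

# "Rollback": put the old build everywhere but leave the flag switched on.
rolled_back = {server: old_build for server in fleet}
after_rollback = run_fleet(rolled_back, flags)

# Turning the flag off is what actually stops the dead code.
after_flag_off = run_fleet(rolled_back, {"reused_flag": False})

for label, result in [("deploy", after_deploy),
                      ("rollback", after_rollback),
                      ("flag off", after_flag_off)]:
    count = sum(path == "power_peg" for path in result.values())
    print(f"{label}: {count} server(s) running the dead code")
```

In this model one server runs the dead code after the deploy, all eight run it after the code-only rollback, and none run it once the flag is cleared, which matches the "8 times worse" rollback described elsewhere in the comments.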

  • @wieliewiel2630
    @wieliewiel2630 6 months ago +452

    "I don't always test my code, but when I do.. it's in Production" - 😅

    • @RudolfJvVuuren
      @RudolfJvVuuren 1 month ago +3

      This made me lol

    • @beyondfubar
      @beyondfubar 1 month ago +4

      Crowdstrike? Is that you?

    • @BrumBrumBryn
      @BrumBrumBryn 1 month ago +2

      Everyone has a test server, it's just sometimes people are lucky enough to have a separate Production server as well.

    • @JohnGardnerAlhadis
      @JohnGardnerAlhadis 1 month ago

      "You either die a bug, or live long enough to become a feature".

  • @Lyokou
    @Lyokou 7 months ago +180

    As a software engineer as soon as I heard one month, I was like yep. Been there, done that.

  • @Nope_handlesaretrash
    @Nope_handlesaretrash 7 months ago +199

    Oh no, won't someone PLEASE think of the poor high frequency traders. Lol.

    • @counterleo
      @counterleo 6 months ago

      Any donation link?

    • @Bramble20322
      @Bramble20322 5 months ago +1

      @natmarelnam4871 Stock markets are literally leeches on society and bring literally no value. Prove me wrong.

    • @EnFuego79
      @EnFuego79 1 day ago

      Say that while watching your 401k go to zero.

  • @dale3478
    @dale3478 7 months ago +140

    Other rollback: "finally sh*t has stopped hitting the fan"
    SMARS rollback: "holy sh*t! There's now 8 times more sh*t hitting the fan!"
    But seriously, for software that runs at a scale of thousands of requests per second and works with millions of dollars, there should definitely be some sort of kill switch or feature toggle built in from the start. Although "rollback causes even more problems" is definitely a first for me

    • @MarcosAlexandre-no3qx
      @MarcosAlexandre-no3qx 7 months ago +6

      probably someone saw every problem and said that they needed more time to work on solutions, and the company probably said: just ship it anyway, it won't happen and we'll put it in our next update, which never came.

    • @user0K
      @user0K 3 months ago +1

      there should've been a metric, showing volume of orders from each server.
      but yeaah, probably timing issues first
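A per-server order count like the one suggested above could look roughly like this. It is a minimal sketch: the log format, the `noisy_servers` name, and the 5x-median cutoff are all assumptions picked for the example, not anything from the actual system.

```python
from collections import Counter
from statistics import median


def noisy_servers(order_log, factor=5.0):
    """Return servers whose order count exceeds `factor` times the median.

    `order_log` is an iterable of (server_name, order_id) pairs.
    """
    counts = Counter(server for server, _ in order_log)
    typical = median(counts.values())
    return sorted(s for s, n in counts.items() if n > factor * typical)


# Hypothetical one-second window: seven servers send 10 orders each,
# while an eighth sends 1,000.
log = [(f"server-{i}", n) for i in range(1, 8) for n in range(10)]
log += [("server-8", n) for n in range(1000)]

print(noisy_servers(log))  # prints ['server-8']
```

Using the median instead of the mean keeps a single runaway server from dragging the baseline up with it.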

  • @MonsieurSansHonte
    @MonsieurSansHonte 7 months ago +75

    Deploying code from dev to prod without a QA staging environment or subsequent smoke testing, is a recipe for disaster!

    • @Stettafire
      @Stettafire 1 month ago

      Yup. No DR env either

    • @beepbop6697
      @beepbop6697 1 month ago +1

      I've always wondered: does the NYSE (and other exchanges) have dummy/test environments for these HFTs to test their algorithms against?
      If not: these guys are always "testing in production" -- hopefully with a very small account/budget limit in case the new code goes haywire.
      It is nuts that the NYSE allowed them to place orders that couldn't be filled -- no circuit breakers in any of it. Try to do any of this on Robinhood or other trading app, and that app will prevent you from making trades that your account can't cover.

  • @shubashuba9209
    @shubashuba9209 6 months ago +51

    South Park: "Annnd it's gone."

  • @ryanpalo
    @ryanpalo 7 months ago +44

    This makes all of the times I screwed up prod feel so much better. Thanks for the in-depth analysis on this.

    • @DanielBoctor
      @DanielBoctor  7 months ago +4

      thanks for watching!

    • @MangaGamified
      @MangaGamified 6 months ago

      You can't screw like them..
      **loses 100k per second **

  • @wwbcwp
    @wwbcwp 6 months ago +15

    This reminds me of a time I accidentally uploaded an older version of a report I'd been working on in college, overwriting the new one and setting me back days.

  • @ychentt
    @ychentt 7 months ago +49

    Laughed so hard. Just subbed. Can't believe for this quality you only have 11k subs.

    • @DanielBoctor
      @DanielBoctor  7 months ago +2

      thank you for the support ❤

  • @ТимофейЧерников-щ2х
    @ТимофейЧерников-щ2х 7 months ago +79

    I still don't understand why they didn't just stop all their servers and cancel all the orders that weren't filled. That would probably take a couple of minutes, instead of half an hour

    • @chunkyMunky329
      @chunkyMunky329 7 months ago +22

      They could certainly have cut the power, which would have been fastest but I don't think it would have been possible to cancel the orders that have already gone through

    • @ME0WMERE
      @ME0WMERE 7 months ago +26

      @@chunkyMunky329 better to cut their losses than continue to lose $2.5 M per second

    • @chunkyMunky329
      @chunkyMunky329 7 months ago +13

      @@ME0WMERE That's what I'm saying. Except I was saying that they should cut the mains power to the building instead of manually switching off each server

    • @SianaGearz
      @SianaGearz 7 months ago +6

      Do you know where the master breaker is for your whole building? Well apparently they didn't either.

    • @sandworm9528
      @sandworm9528 7 months ago +16

      ​@@SianaGearz Yep I do, and if we were losing 2.5 million a second I'd probably hear someone yelling 'kill the power' and i would flip it

  • @spacetime3
    @spacetime3 5 months ago +8

    That hit hard... "Hit the Kill switch (for the love of god !!) ...... There is no Kill switch....."

  • @ninjaasmoke
    @ninjaasmoke 7 months ago +39

    God! Imagine rolling back to a stable software and losing even more money. That would have sent people nuts!!!!
    Losing close to 20m every second.

    • @darrennew8211
      @darrennew8211 6 months ago

      They didn't roll back the flags, though. That was their mistake. The problem started when they flipped the flag, and then they left the flag and replaced the code.

    • @ninjaasmoke
      @ninjaasmoke 6 months ago +1

      @@darrennew8211 thank you captain obvious

  • @alexander1989x
    @alexander1989x 3 months ago +8

    *uncomments the code
    *casually loses $440 Mil.

  • @voidsp
    @voidsp 7 months ago +41

    2:23 that's a bit of a fallacy IMO. There is a kill switch almost always. If affected services are on-prem - kill their Internet connection. Pull a plug on whole office/building if you have to. And if it's a datacenter, do basically the same - DC support can disconnect your servers/racks from the Internet.

    • @EatMyAstro
      @EatMyAstro 7 months ago +1

      Market-making (Knight's entire business) means being the middleman for every trade possible... with their infrastructure and resources, they were the #1 and were making a killing. IIRC Knight was responsible for nearly half of the volume across all exchanges in the stock market. If they ceased operations here, they would lose everything. Their job is to remove friction between buyers and sellers by being a middleman, and as their reputation grew (along with their systems), it was paramount to always be online. It is not as easy as unplugging a box, and in many cases doing this would only make matters worse, logistics and business-wise.
      At the end of the day, yes there should've been a killswitch, and yes it should've been engaged. This was one of the first (if not the first) blowups in electronic markets that the industry had ever seen. And from the sounds of it, Knight was understaffed in their engineer/ops departments.

    • @MrAntiKnowledge
      @MrAntiKnowledge 7 months ago +10

      I remember an old video where they switched a telephone network over and had to take the old one offline first.
      It involved 30 or so people with bolt cutters :D

    • @samramdebest
      @samramdebest 7 months ago +3

      I remember that video too
      Went looking for it a while back
      Couldn't find it anymore

    • @DanielBoctor
      @DanielBoctor  7 months ago +5

      Is this what you are referring to? This is fascinating
      ruclips.net/video/saRir95iIWk/видео.htmlsi=uBbpgRjyGvrHR1_S

    • @samramdebest
      @samramdebest 7 months ago +3

      Yes it is
      And now I'm confused as to why I couldn't find it, apparently it was already in my likes?
      I must have been pretty tired when I went looking for it last time (or maybe it was set to private for a while 🤷‍♀️)

  • @loszhor
    @loszhor 1 month ago +4

    This is why you review your builds BEFORE launching!

  • @mariuszmoraw3571
    @mariuszmoraw3571 6 months ago +5

    If you don't have a software kill switch, you always have a hardware kill switch. Disconnect the malfunctioning algorithm from the web and run tests until the reason is found and fixed.

  • @kylebroussard5952
    @kylebroussard5952 6 months ago +3

    *If you have $440M and you can lose even 10% of it in a single day, you've done a terrible job. This right here is abominable*

  • @miss_adventure
    @miss_adventure 6 months ago +2

    Production tip: the endings of these episodes feel so abrupt, it’s kinda jarring. I think it would be lovely to have more of an intentional outro - maybe summarize the topics discussed, or talk about some takeaways and how things might be improved in the future or something. Also a pause between the end of the script and the start of the “if you’ve made it this far” to give an indication that we’ve reached the end. Love how well you talk about these topics!

  • @Dr_Larken
    @Dr_Larken 3 months ago +3

    2:26 I can only imagine how that phone call to customer support went!??
    “Thank you for calling customer support, Knight in shining armor! How may I help you?”
    “OMG, We’re literally losing millions every second and can’t figure out WTF is going on…HELP!!”
    “Um oh dear, Okay sir…um.. Have you tried unplugging your router and plugging it back in after 10 seconds?”

  • @rightwingsafetysquad9872
    @rightwingsafetysquad9872 6 months ago +9

    No kill switch? Like there was no circuit breaker to flip, no power cord or network cable to unplug? If I were the acting executive I would have walked into the computer room with cable cutters or an ax or something and just started chopping. I understand there could be large penalties for failing to complete market orders left pending, but it can’t be worse than $2.5 million per second.

    • @eadweard.
      @eadweard. 6 months ago

      What music would have been playing as you did it?

    • @rightwingsafetysquad9872
      @rightwingsafetysquad9872 6 months ago

      @@eadweard. ruclips.net/video/I8EOAEYgsE0/видео.html

    • @hillaryclinton1314
      @hillaryclinton1314 6 months ago

      I blame usa education system

    • @rightwingsafetysquad9872
      @rightwingsafetysquad9872 6 months ago +1

      @@eadweard. Bat Out of Hell by Meatloaf.

    • @elobiretv
      @elobiretv 1 month ago +1

      Server probably is not even on the premises and they might not even have access to it.

  • @Ch17638
    @Ch17638 6 months ago +6

    Wait, whoa, hold on..... A major deployment left to one person? And then when trading started not a single engineer was monitoring trades just to check if everything was working as expected? Then the CEO takes a break on launch date. We once deployed a new process service for company payroll, and on day one we had all leads and seniors monitor the system with several layers of safety introduced (limits to transaction amounts, limits to the number of transactions on the first run). That data got checked to within an inch of its life, then the next set and the next, with reports filled out by the dev managers that had to be signed by the CIO before the remainder of the transactions could go through. Even then, as it ran at a staggered rate, we had someone ready to pull the cord if anything seemed off. There were redundancies for redundancies, as this system could empty 3 bank accounts in no time.
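The first-run limits described above (a cap on each transaction amount and a cap on how many transactions go through) can be sketched in a few lines. The `FirstRunGuard` class and the numbers are invented for illustration; a real payroll system would add the sign-off steps the comment mentions, which are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class FirstRunGuard:
    """Allow only a limited number of limited-size transactions."""
    max_amount: float
    max_count: int
    processed: int = 0

    def allow(self, amount: float) -> bool:
        """Return True and count the transaction if it is within both limits."""
        if self.processed >= self.max_count:
            return False  # batch quota used up: wait for a human sign-off
        if amount > self.max_amount:
            return False  # single transaction too large for a first run
        self.processed += 1
        return True


guard = FirstRunGuard(max_amount=5_000, max_count=3)
payments = [1_200, 9_000, 800, 700, 600]
released = [p for p in payments if guard.allow(p)]
print(released)  # prints [1200, 800, 700]
```

The point of the design is that a bug can only do as much damage as the guard lets through before someone looks at the results.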

    • @mrgyani
      @mrgyani 6 months ago +1

      Yeah. Makes no sense. 😂 What a clown-show it was.

  • @yashshende2786
    @yashshende2786 7 months ago +13

    That's why chaos engineering and DR testing is important... They will surely build a kill switch now 😂

    • @Stettafire
      @Stettafire 1 month ago

      Proper version control...

  • @spyr0guy
    @spyr0guy 6 months ago +6

    "Software will handle it!"
    Software:

  • @gswdeclan
    @gswdeclan 1 month ago +1

    Losing over $100 million in a minute? Even Cathie Wood couldn't do that, truly impressive.

  • @somebodythatiusedtoknoooooooow
    @somebodythatiusedtoknoooooooow 4 months ago +2

    That's the issue with arbitrage bots, when they fail they lose years of profits in just minutes.
    Title should be "Manual Deployment ends up costing $440 Million. Maybe we need to hire some devops? "

    • @Stettafire
      @Stettafire 1 month ago

      Hire DevOps. QA. More than 1 dev. Good working practices. Estimation sessions from devs (no crunch time allowed outside of a P1). Proper pipelines and proper version control.

  • @JorkForkenson
    @JorkForkenson 3 months ago +1

    Having worked in American Fintech for over a decade I can tell you exactly where this came from. There is zero room for nepotism in IT but I see it all the time. People who have no clue how anything technical works are placed in the highest ranked positions because they are buddies with someone or are great at blowing smoke up people's rears.

  • @LevelofClarity
    @LevelofClarity 9 months ago +17

    Great video. Would love to see some follow-up stories relating to HFT industry. Read Flash Boys years ago and absolutely loved it. I hope to see more of this from you in the future 😎

    • @DanielBoctor
      @DanielBoctor  9 months ago +4

      Thank you! It's definitely an area that I want to dive into. Thanks for sharing - I actually never heard of Flash Boys before ❤️

    • @LevelofClarity
      @LevelofClarity 9 months ago +1

      @@DanielBoctor Flash Boys is a great book. If you’ve never read anything from Michael Lewis that would be a great one to start with. The last couple of years I’ve mostly been listening to Audible. Hopefully you’re able to check it out.

  • @ronanoke
    @ronanoke 7 months ago +6

    Fantastic vid! A complex topic made simple, great job

  • @josephduenas4718
    @josephduenas4718 6 months ago +3

    No kill switch or procedure to resolve this? Sounds like the devs got 2 days to brainstorm and the company said "yup! autobots roll out" 😂😂😂😂😂😂

  • @Xezlec
    @Xezlec 1 month ago

    My jaw just crashed through the floor. You're telling me a company that writes code managing BILLIONS of dollars automatically not only doesn't use proof systems to prove the code is correct before deploying it, not only doesn't take their time to carefully vet any code changes through multiple levels of review, but actually pushes developers to rush out code and deploy it as fast as possible with no review, and even has no way to turn it off?! This level of incompetence is beyond what I would have thought humanly possible.

  • @danser_theplayer01
    @danser_theplayer01 7 months ago +4

    Imagine getting Power Pegged for -440 000 000 dollars💀

  • @wulfbak
    @wulfbak 1 month ago

    This is like Skynet becoming self-aware. Engineers in a panic try to pull the plug, but are unable to.

  • @AryanKumar-jo1pz
    @AryanKumar-jo1pz 6 months ago +3

    The editing and effects are amazing. Reminds me of Lemmino
    really well done

    • @DanielBoctor
      @DanielBoctor  6 months ago +2

      wow, I never thought that my content itself would be compared to the legend himself. thank you for the support ❤️

  • @susanoo2.042
    @susanoo2.042 1 month ago +1

    God, the worst part is thinking the new code was the problem and then reverting all the servers back to the old code, only to lose cash 8x faster. From 2.5 mil a second to 20 mil a second!

  • @debasishraychawdhuri
    @debasishraychawdhuri 7 months ago +7

    why didn't they turn off the computers? just pull the cables.

  • @zamoqi
    @zamoqi 7 months ago +3

    Enjoying your content big time. Appreciate the work that you put in!

    • @DanielBoctor
      @DanielBoctor  7 months ago +2

      thank you for the kind words - glad you like it ❤️

  • @Orionbae
    @Orionbae 6 months ago +2

    It's pretty cool to actually see what quant firms do behind the scenes. Great video 🔥

  • @mikethespike7579
    @mikethespike7579 6 months ago +1

    One of my former professors once told us that computers are just hard working idiots. They will readily wipe you and your assets off the face of the earth if just one line of code tells them to.

  • @weakend
    @weakend 6 months ago +3

    What a crazy story... so insane to me to run a company moving that much money and not have integration testing. On first glance you'd want to blame the engineers here, but the majority of the blame would have to be on engineering management/upper management to allow prod code on financial systems to be deployed sans integration testing. This story is a great anecdote as to why infrastructure as code/virtualization is so critical.

    • @Micke12312
      @Micke12312 24 days ago

      It's very hard to simulate the load that exists in prod. Add to that a code base that had grown and that no one really understands anymore.

    • @weakend
      @weakend 24 days ago

      @@Micke12312 sure; not $440 million hard

  • @Hebdomad7
    @Hebdomad7 6 months ago +1

    The problem was management all along. But they got golden parachutes as punishment ... The sooner these Wall Street types become personally liable for these kinds of screw ups the better.

  • @gn0my
    @gn0my 1 month ago

    As a software engineer, I'm sometimes horrified at the practices other companies have. It's SO easy to keep the power peg algorithm, but not have it in production. Things like that are just astonishing to me.

  • @shea455
    @shea455 1 month ago +1

    They could have literally disconnected the network faster. It does not have to be automated. Even if in the cloud, a black-hole route is easy to create. As for process, this is classic: "Technology is a COST center" thinking. Cut budgets reduce time to deliver, and reduce talent in the technology pool as the best technology people have the easiest time replacing their employer.

  • @danielklimek6320
    @danielklimek6320 7 months ago +10

    How they couldn’t hit ctrl+c or kill the processes in the background remains a mystery

  • @daverei1211
    @daverei1211 7 months ago +2

    At some point they should have just put in a firewall rule to block connections with the trading server so no more trades could be made while they’d figure it out. Better to do this at $40m than $400m…..
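The "stop at $40m instead of $400m" idea is essentially a loss-limit circuit breaker. Here is a rough sketch of one in Python; the class name, the way fills are reported, and the fill sizes are all hypothetical, and the $40m limit is simply the figure from the comment above.

```python
class LossLimitBreaker:
    """Stop sending orders once net losses reach a fixed limit."""

    def __init__(self, max_loss_usd: float):
        self.max_loss_usd = max_loss_usd
        self.net_pnl_usd = 0.0
        self.tripped = False

    def record_fill(self, pnl_usd: float) -> None:
        """Add the profit or loss of one fill and trip if the limit is hit."""
        self.net_pnl_usd += pnl_usd
        if self.net_pnl_usd <= -self.max_loss_usd:
            self.tripped = True  # stays tripped until a human resets it

    def may_send(self) -> bool:
        """Orders may only go out while the breaker has not tripped."""
        return not self.tripped


breaker = LossLimitBreaker(max_loss_usd=40_000_000)
orders_sent = 0
for pnl in [-15_000_000] * 10:  # ten hypothetical losing fills
    if not breaker.may_send():
        break
    orders_sent += 1
    breaker.record_fill(pnl)

print(orders_sent, breaker.net_pnl_usd)  # prints 3 -45000000.0
```

In this run the breaker trips on the third fill, so the last seven orders never leave. A check like this would typically sit in front of the connection to the exchange, which is also where a firewall rule would cut things off.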

  • @miguelmartins3864
    @miguelmartins3864 9 months ago +3

    Excellent video - very informative! I enjoyed the blend of finance and software. Given how intertwined they are these days, there are likely many more topics to explore!

    • @DanielBoctor
      @DanielBoctor  9 months ago +2

      THANK YOU MIGUEL! I completely agree as well ❤

  • @pierce9019
    @pierce9019 1 month ago +1

    Fuck a kill switch, I'd trigger the fire alarm in the server room

  • @gFamWeb
    @gFamWeb 7 months ago +10

    2:00 "two and a half seconds" did you mean minutes?

    • @DanielBoctor
      @DanielBoctor  6 months ago +1

      Oops, good catch. Should have been minutes. Updated the pinned comment.

  • @brucelee-hl8zn
    @brucelee-hl8zn 1 month ago

    Funny thing is, I am an IT engineer, and just before this incident I told them during a job interview for a VP position that I would 100% secure the deployment of all their new projects. I just asked to be paid $1 million per minute and they all laughed in my face. Who is laughing now?

  • @taralalram
    @taralalram 7 months ago +1

    As someone who's worked in IT for 40 years, from machine code programmer to head of engineering, this is definitely the CEO's fault. Testing takes time, and yes-men are too afraid to speak truth to power, so all the advice from engineering gets ignored. No one died here; it didn't work out so well for the people on the space shuttle.

  • @ForkCandle123
    @ForkCandle123 6 months ago +1

    It's the fault of the regulator allowing firms to buy shares that aren't actually available at that time. And it's the fault of the firm doing such a thing. Incompetence of the firm involved not to have taken proper precautions. Such profiteering shouldn't be allowed. As it was, they lost out. They'd not have complained if they'd accidentally made that sum rather than lost it.

    • @QuicksilverSG
      @QuicksilverSG 6 months ago +1

      So you're saying the regulators shouldn't have allowed them to build a house of cards out of a house of cards?

  • @maxbrooks5468
    @maxbrooks5468 1 month ago

    “Well clearly it’s the new code that’s the problem! Let’s just roll it back to stop the bleeding.”
    “… oh no!”

  • @dustondoesit3913
    @dustondoesit3913 2 months ago

    This video was very well produced and executed, great content. Easy sub.

  • @richardgilson3512
    @richardgilson3512 7 months ago +3

    Old server number 8 is the hero we need ;)

  • @DudeWatIsThis
    @DudeWatIsThis 6 months ago +1

    The day Knight Capital got power-pegged.

  • @HarshAnalysis
    @HarshAnalysis 9 months ago +4

    I feel like your channel is going to blow up soon. Great video and editing. Can I ask what editing software you use?

    • @DanielBoctor
      @DanielBoctor  9 months ago +1

      Thanks! For sure, I use DaVinci Resolve 😊

  • @gurupartapkhalsa6565
    @gurupartapkhalsa6565 1 month ago

    No one else found that this was a really obvious case of corporate espionage?

  • @JoeSmith-cy9wj
    @JoeSmith-cy9wj 6 months ago +1

    In the end, the blame is on the CEO. Mistakes happen. The problem was greed, which prevented proper procedure in a rush to grab more money.
    The entire stock exchange is there so a bucket of leeches can suck the lifeblood out of people who actually provide labor and goods, in favor of those too lazy, stupid, or entitled to work for a living.
    Betting on others' fortunes should be outlawed.

  • @hgbugalou
    @hgbugalou 7 months ago +5

    No kill switch? Why can't you tell the dude at the datacenter to start pulling network cables? What am I missing here? Analysis paralysis?

  • @zemm9003
    @zemm9003 2 months ago

    FCC approving this just shows corruption runs rampant in the USA and it's just a matter now of when the printer slows down.

  • @TheJensss
    @TheJensss 2 months ago +1

    Why did they not cut the power to the servers, the floor, or the building, or cut the fiber or something? This could have been stopped in minutes if someone had just made the decision to take the measures needed to get the servers offline.

  • @yashpatel261
    @yashpatel261 2 months ago

    The management caused this, not the devs. They make obscene demands of regular people, cut corners, and then expect everything to work out fine. I hope I get to be in charge someday so I can tell C-level guys to have some humility in the demands they make and the resources they allocate to IT, which is super CRITICAL to the health of the business.

  • @rollingrock5143
    @rollingrock5143 6 months ago +3

    Should bot trading be banned?

    • @guerra_dos_bichos
      @guerra_dos_bichos 6 months ago

      Yes, keep trying, it will stop them 😂

    • @QuicksilverSG
      @QuicksilverSG 6 months ago

      Just wait patiently while the bot programmers are replaced by AI.

    • @warmike
      @warmike 5 months ago

      No. It provides liquidity to the market (i.e. makes it easier for investors to buy or sell at a fair price).

  • @KertaDrake
    @KertaDrake 6 months ago +1

    Should have just run into the server room and started yanking cables when it came to that level of money loss...

    • @guerra_dos_bichos
      @guerra_dos_bichos 6 months ago

      It's hard to do that when your servers are spread all over and owned by some cloud provider... companies at that level use high-redundancy cloud providers.

    • @Micke12312
      @Micke12312 24 days ago

      The server room is rarely in the room next door...

  • @SassyOnline
    @SassyOnline 1 month ago

    Losing that much money and still ordering pizza is crazyyyyyy those staff didn't give a fuck. Ignoring monitoring, then not even being aware of the amount of orders they were placing? No sign off of such an important production release? Crazy man.

  • @hari1408
    @hari1408 6 months ago +1

    I don't feel stupid anymore after watching this

  • @mkulak0
    @mkulak0 7 months ago +3

    Perfect. Nothing is better than corporations losing money

  • @jacksnipe2441
    @jacksnipe2441 1 month ago

    The most screwed up part of all this is if a firm with a FINRA governing seat had made this error, it would have been voted TCE and unwound.

  • @Yesnaught
    @Yesnaught 1 month ago

    Never trust an algorithm, bot, or AI you can't take a physical axe to.

  • @gerradkp
    @gerradkp 2 months ago

    Amazing, sober, deeply technical analysis. Brilliant

  • @Denni55
    @Denni55 1 month ago +1

    "You think it's a bad idea to let a few big firms manipulate the entire market?"
    "Nah, it will go great."

  • @milesbowen9433
    @milesbowen9433 1 month ago

    The fact that they found the problem by making it 8x worse is just a little funny to me

  • @danielhobbyist
    @danielhobbyist 6 months ago +2

    Why did they not just turn the servers off? I do not understand.

  • @JFirn86Q
    @JFirn86Q 1 month ago

    Ah yes, I always ensure my production code has a Buy High, Sell Low program somewhere in it. All the best ones do.

  • @ooplesoft
    @ooplesoft 6 months ago +1

    Excellent video! What a great summary dude well done.

  • @Mordecrox
    @Mordecrox 2 months ago

    5:40 high risk high reward. Bold move, Cotton.

  • @Osirus1156
    @Osirus1156 5 months ago

    I use this example all the time when telling higher ups why you shouldn't try to force the devs to move faster.

  • @DefinitelyNotaRussianSpy
    @DefinitelyNotaRussianSpy 1 month ago +1

    4:34 you got the wrong stock footage; that's the Gold Coast, Australia, not New York, USA

  • @arandomguy46
    @arandomguy46 1 month ago +1

    Why didn't they shut down the servers? At least that would have prevented any further trades from going through.

  • @devzozo
    @devzozo 6 months ago +1

    I get like 30 useless emails daily about some system error for something I don't support, interspersed with random PTO notifications from coworkers and company/organization wide announcements that aren't relevant to me. I can totally understand just ignoring those emails.

  • @mathew2214
    @mathew2214 6 months ago +1

    Computers shouldn't be allowed to trade stonks. Only people. And you should require a signature for each trade.

  • @Talik13
    @Talik13 6 months ago +1

    At least the good news is that it's just the stock market; America's favorite government-endorsed gambling den that used to be for evaluating companies.
    "MAKE LINE GO UP" so that the people who 'invested' can keep getting more of a return off the backs of the people who work at said company.

  • @Shogoeu
    @Shogoeu 6 months ago +1

    "There's no kill switch" - just pull the power cord...

  • @NoBubbles
    @NoBubbles 7 months ago +1

    I put a kill switch on the program I use to automatically open webpages for the factory I work in. Who would forget a kill switch on a financial tool?!?

    • @Stettafire
      @Stettafire 1 month ago +1

      How didn't the auditors screech when they had their yearly audit? In my country you're not allowed to do anything financial without a proper audit, and they have requirements for the codebase and working practices too, like release documents, version control, PRs, etc.

  • @ccctube5721
    @ccctube5721 7 months ago +1

    Hi Dan, I really enjoy the format of video you make, I think you may even be the person who pioneered this genre. Please keep them coming.

    • @DanielBoctor
      @DanielBoctor  7 months ago +1

      Thank you so much. I can't say I pioneered the genre, but I appreciate the words

  • @kucingoyen1
    @kucingoyen1 7 months ago +2

    Deployment without internal testing in all of the live servers? Lmao

  • @renakunisaki
    @renakunisaki 7 months ago +2

    Reusing a flag like that is a terrible idea.

  • @dvs6121
    @dvs6121 1 month ago

    2:01 @148M/minute, 2.5 MINUTES was enough to bankrupt the company, NOT "2.5 seconds".

  • @bblegacy
    @bblegacy 1 month ago +1

    Win some / lose some I guess. It's just too bad this can't happen to more banks and brokerages that operate on a basic premise of greed at any price; like Goldman, Sachs itself.

  • @MeaHeaR
    @MeaHeaR 1 month ago +1

    I got confused in the first 10 Seconds

  • @FraggnAUT
    @FraggnAUT 6 months ago +1

    I still don't understand how they couldn't just kill their connection?

  • @uncleweirdbeard86
    @uncleweirdbeard86 1 month ago +1

    How in the heck are you gonna make software to buy and sell stocks and not have a kill switch? That's like driving a car with no brakes

    • @Hyvexx
      @Hyvexx 1 month ago +1

      Had no kill switch, didn't take the time to review the code, rushed the developers and didn't even test it in a controlled setting before pushing. They really stripped all safety features and weight out of a car, put in a formula engine and didn't consider what would happen if it crashed.