"Bush hid the facts" Bug EXPLAINED

  • Published: Oct 17, 2024

Comments • 997

  • @nielot0330
    @nielot0330 1 year ago +2458

    I love computers.

  • @catomajorcensor
    @catomajorcensor 1 year ago +4201

    The question we should be asking is "how did Windows developers come up with the worst way to detect Unicode?"

    • @ОннокорОктябрь
      @ОннокорОктябрь 1 year ago +713

      - So... how are we going to detect Unicode?
      - -Meth- Math

    • @RDCST
      @RDCST 1 year ago +41

      @@ОннокорОктябрь Meth?

    • @Cyberfishofant
      @Cyberfishofant 1 year ago +496

      @@RDCST i was gonna _crack_ a joke, but that wouldn't be funny

    • @FriedMonkey362
      @FriedMonkey362 1 year ago +207

      I mean it was good enough because it "rarely" occurred, still the number of false positives is indeed too high and yeah it was pretty bad

    • @intron9
      @intron9 1 year ago +77

      It acts like a detector for windows's "Unicode" files that only have English characters, but with a lot of false positives.

  • @avitus27
    @avitus27 1 year ago +2357

    I can't believe it took me this long to realise: this channel is a fly explaining bugs.

    • @Axcyantol
      @Axcyantol 1 year ago +78

      wait what

    • @amberwingthefairycat
      @amberwingthefairycat 1 year ago +192

      @@Axcyantol A fly (the type of insect) explaining bugs (computer errors, but the word also means insect)

    • @TDXC------
      @TDXC------ 1 year ago +55

      He also... Well creates bugs too
      (By that I mean destroying windows)

    • @Axcyantol
      @Axcyantol 1 year ago +27

      @@amberwingthefairycat no i understood it, i was just surprised about it being a fly explaining bugs

    • @amberwingthefairycat
      @amberwingthefairycat 1 year ago +9

      @@Axcyantol Oops, sorry, haha.

  • @shockwave952
    @shockwave952 1 year ago +1135

    15 years ago I made a "Windows XP Easter Eggs" video featuring this bug, and now I feel this strange sense of satisfaction finally knowing why it happens. Thanks FlyTech!

    • @Dog_Dogs
      @Dog_Dogs 1 year ago +13

      That video is best video.

    • @elesqueleto2010
      @elesqueleto2010 8 months ago

      i think you made the video a little too popular, still, good one 👍

  • @MadsterV
    @MadsterV 1 year ago +560

    Note: What Windows Write calls "unicode" is UCS-2, a now-obsolete 16-bit encoding.
    What today we call unicode is usually UTF-8, a variable-width encoding that conveniently matches US-ASCII, though there's also UTF-16 and UTF-32.

    • @billy65bob
      @billy65bob 1 year ago +27

      There's not just those variants either.
      There's the obsolete UTF-7 too (for some reason), and the multi-byte encodings come in both Little Endian and Big Endian flavours.

    • @freyja5800
      @freyja5800 1 year ago +13

      @@weakspirit_ internally, yes. but when saving text it does default to utf8, since a text in 7-bit ascii and in utf8 are the same, and since in english you rarely need characters beyond that, using utf8 is more space efficient

    • @laurentverweijen9195
      @laurentverweijen9195 1 year ago +9

      UCS-2 and UTF-16 are more or less the same (and what Windows / C-sharp incorrectly call "unicode encoding".)

    • @Spartan322
      @Spartan322 1 year ago +9

      Technically ANSI Windows-1252 (what is functionally always used when Windows refers to "ANSI") is incompatible with UTF-8 as there are numerous UTF-8 bytes which are "reserved" (usually for multi-byte characters) which Windows-1252 uses as printable (single-byte) characters. If any of those bytes exist in the string when read as UTF-8 the encoding will break, which in a well developed system will merely produce a few erroneous characters. Now since Windows-1252 extends standard ASCII, most of the bytes in Windows-1252 will be read well in UTF-8, specifically all the commonly used American characters. The problem with the encoding only occurs when you have these non-ASCII characters either in Windows-1252 or in UTF-8, in which case if you try to read either UTF-8 as Windows-1252 or Windows-1252 as UTF-8, you will get problems. Experienced this exact issue when dealing with a game that reads and writes Windows-1252 but which has resources written in UTF-8, causing all sorts of weird problems.

    • @hellterminator
      @hellterminator 1 year ago +18

      @@laurentverweijen9195 UCS-2 can only represent ~6% of the reserved Unicode codepoints and ~69% of the ones already assigned, whereas UTF-16 can represent them all through surrogates. Don't get me wrong, UTF-16 is the worst Unicode encoding, but at least it *is* a Unicode encoding, unlike UCS-2.
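
Editor's sketch for this thread: a minimal Python illustration of the two points above, namely that UTF-8 and Windows-1252 only disagree on non-ASCII characters, and that UTF-16 reaches code points above U+FFFF through surrogate pairs. The sample string "café" and the emoji are arbitrary choices, not taken from the video.

```python
text = "café"
utf8_bytes = text.encode("utf-8")       # 'é' becomes two bytes: C3 A9
cp1252_bytes = text.encode("cp1252")    # 'é' becomes one byte: E9

# UTF-8 bytes read as Windows-1252: C3 and A9 are both printable there,
# so nothing fails, the text just turns into mojibake.
mojibake = utf8_bytes.decode("cp1252")

# Windows-1252 bytes read as UTF-8: E9 announces a multi-byte sequence
# that never arrives, so a strict decoder rejects it.
try:
    cp1252_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# A lenient decoder substitutes U+FFFD for the bad byte instead.
lenient = cp1252_bytes.decode("utf-8", errors="replace")

# A code point above U+FFFF takes two 16-bit units (a surrogate pair) in UTF-16,
# which a fixed-width 16-bit scheme such as UCS-2 cannot express.
emoji_units = len("😀".encode("utf-16-le")) // 2

print(mojibake, lenient, strict_decode_failed, emoji_units)
```

The ASCII part ("caf") survives every round trip; only the "é" is affected, which matches the "most of the bytes will be read well" observation.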

  • @TwoNumbahNiens
    @TwoNumbahNiens 1 year ago +1242

    I can't believe George W. Bush would do this.

    • @alexandermorozov8593
      @alexandermorozov8593 1 year ago +203

      "W." stands for WordPad

    • @JeyPeyy
      @JeyPeyy 1 year ago +74

      Amazing how he started hiding the facts at least 7 years before it happened

    • @blueschnabeltier_
      @blueschnabeltier_ 1 year ago

      Now it's George L. Bush, 'cause he hid the facts

    • @johndododoe1411
      @johndododoe1411 1 year ago

      George H. W. Bush (Bush Sr.) ruled 1989 to 1993, and did secret government stuff with Nixon and Reagan.

    • @dylanbksp
      @dylanbksp 1 year ago +4

      do you know what i can believe bush did?

  • @rogehmarbi
    @rogehmarbi 1 year ago +984

    I almost completely forgot WordPad exists, despite it being more powerful than Notepad.

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +319

      WordPad is too powerful for us mortals

    • @npoaccount9154
      @npoaccount9154 1 year ago +38

      @@FlyTechVideos yea, nothing can break it

    • @masonboone4307
      @masonboone4307 1 year ago +11

      Notepad is a descendant of wordpad

    • @zUltra3D
      @zUltra3D 1 year ago +137

      @@npoaccount9154 wasn't there a video about corrupting the heck out of windows 10 and everything was broken except for wordpad

    • @Fidumo
      @Fidumo 1 year ago +50

      @@npoaccount9154 as soon as i read that, i tried breaking it. i ended up creating a document filled with a creepy smiley face image with the size of 2,306,727,936 bytes (that's 2.3 gb). i cant open it because it's too big, and i think i corrupted the file by closing wordpad while it was saving the file. i still dont think this counts as breaking it, i think its just the file thats broken.

  • @y7o4ka
    @y7o4ka 1 year ago +506

    Dude really dug into windows api's assembly to uncover a strange bug from 1994
    Good job!

    • @thevorkman_6551
      @thevorkman_6551 1 year ago +7

      I think he didn't dive in, he just wrote his own that works similarly...
      Maybe, idk

    • @aetimes2
      @aetimes2 1 year ago +43

      @@thevorkman_6551 He went into the code to figure out how it worked, and then wrote his own that worked the same

    • @mstech-gamingandmore1827
      @mstech-gamingandmore1827 1 year ago +56

      Don't forget the special... _sources_ ;)

    • @thevorkman_6551
      @thevorkman_6551 1 year ago +1

      @@mstech-gamingandmore1827 Yes)))

    • @mskiptr
      @mskiptr 1 year ago +35

      XP source leak probably

  • @Fasguy
    @Fasguy 1 year ago +192

    The amount of times i've seen this spewed about as an "easter egg" is nuts.
    Just like the "Bill Gates Sucks" "easter egg" in C64 BASIC.

    • @chri-k
      @chri-k 1 year ago +24

      what’s that?

    • @brinleyhamer729
      @brinleyhamer729 1 year ago +2

      yeah what’s that?

    • @Dj_Theorema
      @Dj_Theorema 1 year ago +27

      @@chri-k To keep it simple, C64 BASIC has a random number generator that always produces the same "random" sequence of floating-point numbers, all between 0 and 1, every time you turn on the machine. Using this "feature", someone wrote a small program (4 lines of code) that prints the sentence "Bill Gates Sucks" on screen

  • @jhgvvetyjj6589
    @jhgvvetyjj6589 1 year ago +180

    Aside from the statistical check, another heuristic uses the newline as a way to rule out Unicode since a word-aligned newline has the bytes 0D 0A, but U+0A0D is not assigned in Unicode. Also it apparently only detects based on the first 256 bytes or so, which might make the longest string challenge futile beyond that point.

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +89

      oh wait, but that does mean that a non-word aligned newline could technically trigger it, right? should have spent more time researching nooooo

    • @Renteks-
      @Renteks- 1 year ago +19

      just make this comment your string so you can win the meta award
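
Editor's sketch for this thread: the newline observation can be checked in a few lines of Python. This only shows how the bytes 0D 0A pair up when read as little-endian 16-bit units; whether and how Windows actually uses this as a rule-out is taken from the comment above, not verified here. The helper name `utf16le_units` is made up for the illustration.

```python
import struct
import unicodedata

def utf16le_units(data: bytes) -> list[int]:
    """Read a byte string as little-endian 16-bit units, ignoring a trailing odd byte."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}H", data[:count * 2]))

# A word-aligned CR LF becomes the single unit 0x0A0D.
aligned = utf16le_units(b"\r\n")
# 'Cn' is the general category Python reports for unassigned code points.
aligned_category = unicodedata.category(chr(aligned[0]))

# A CR LF that straddles two units never produces 0x0A0D.
straddling = utf16le_units(b"a\r\nb")

print([hex(u) for u in aligned], aligned_category, [hex(u) for u in straddling])
```

In the straddling case the CR ends up as the high byte of one unit and the LF as the low byte of the next, which is the situation the reply above is asking about.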

  • @ivirius.parody
    @ivirius.parody 1 year ago +294

    Ah yes. As a developer, I love to see people finding bugs in our lazy work

    • @AlphaFruit-hx4cw
      @AlphaFruit-hx4cw 1 year ago +13

      I'm a programmer in my free time, and this is actually facts. 😂

    • @dubl33_27
      @dubl33_27 1 year ago +12

      @@AlphaFruit-hx4cw which Bush hid

    • @nt-authority-system666
      @nt-authority-system666 1 year ago +2

      you're spitting FACTS

    • @UnixTMDev
      @UnixTMDev 10 months ago +1

      @@nt-authority-system666 bush wasn't

  • @egeakan7276
    @egeakan7276 1 year ago +166

    God, when I tell you I thought Bush was a literal bush from nature and I couldn't figure it out until this day...

    • @mucookul
      @mucookul 1 year ago +4

      Same

    • @Aeduo
      @Aeduo 1 year ago +14

      Sounds like it'd be related to homer simpson creeping out of that bush gif.

    • @WindowsDrawer
      @WindowsDrawer 1 year ago +1

      Same

    • @sadpeperoni7508
      @sadpeperoni7508 1 year ago +3

      My wife's bush hid some facts. I'm catholic, so I discovered the truth only after the marriage

    • @Liggliluff
      @Liggliluff 1 year ago +12

      Yeah, for people not from USA, or not familiar with USA's presidents, the name "Bush" is likely going to refer to an actual bush instead.

  • @Kiwifruit00
    @Kiwifruit00 1 year ago +48

    4:19 for anyone wondering why fly wrote it as 0x75 0x42 when the hex editor shows 0x42 0x75: it's because the file is encoded as "little endian", which means the byte that comes second in the hex editor is the high (most significant) byte when the computer reads each 16-bit unicode character.
    i dont know why the computer does this, nor am i an expert in these kinds of things but i just wanted to share in case someone wants to know

    • @johndododoe1411
      @johndododoe1411 1 year ago +4

      Windows uses little endian because the x86 CPUs do so.

    • @leap123_
      @leap123_ 1 year ago

      because x86 (which windows and every single os that supports x86 runs on) use little endian, so fuck intel i guess

    • @RedstoneNguyen
      @RedstoneNguyen 1 year ago +3

      Write a simple program converting from integer to string and you will find out why little endian is a thing. Btw, the numbers we write every day are big endian.

    • @johndododoe1411
      @johndododoe1411 1 year ago +2

      @@RedstoneNguyen Both storage directions make perfect sense. Little endian for decimal digits is how Arabs write Arab numerals, big endian is how westerners write the same numbers with the same digits. Because computers gulp up entire binary numbers in one memory clock cycle it's entirely cultural for them too. The x86 and x80 CPU families belong to little endian design cultures, the 68000 and SPARC families belong to big endian design cultures. ARM and PowerPC hardware is bilingual in this matter.

    • @RedstoneNguyen
      @RedstoneNguyen 1 year ago

      @@johndododoe1411 i didnt say anything about culture. My idea is, little endian is mathematically simpler to implement than big endian.
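
Editor's sketch for this thread: the byte-order point from the top comment, shown with the first two bytes of "Bush" (0x42 0x75). This is plain Python and says nothing about Notepad's internals; it only shows how the same two bytes give different 16-bit values depending on byte order.

```python
raw = b"Bu"  # bytes 0x42 0x75, in the order a hex editor shows them

# Little endian: the first byte is the low byte, the second is the high byte.
little = int.from_bytes(raw, "little")
# Big endian: the first byte is the high byte.
big = int.from_bytes(raw, "big")

# Decoding the same two bytes as one UTF-16 character in each byte order.
as_utf16_le = raw.decode("utf-16-le")
as_utf16_be = raw.decode("utf-16-be")

print(hex(little), hex(big), as_utf16_le, as_utf16_be)
```

The little-endian reading, 0x7542, is the value the video writes as 0x75 0x42.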

  • @SuperCaitball
    @SuperCaitball 1 year ago +45

    The "oracle" not guessing correctly on newlines might be due to differences in newline coding; given it's a Python script it may be using only LF line endings, but famously Windows always uses CRLF as its line endings.

  • @e_g..
    @e_g.. 1 year ago +26

    "This comment has the challenge shown for the longest strings that triggers the Windows glitch from the video you recorded. The video's bug shows that it's difficult for doing accidentally. Specially for the challenge proposed. I try using workarounds and the hardest one probably is the odd character words I have to usually put for those requirements. But sometimes I don't use odd length, which usually happens because the symbols are joined with the word. I created a small story for the text: The bus was not there for a bad reason, probably. Maybe someone found the reasons but I don't know? Which increases the words variety I can use for this! The current character counter is at 690 and I think I could add non-sense but that wouldn't get interesting enough. I'm a bit close here, 200 character distance. So, I could get another story using odd character count words only. There was a man named "e_g.". The challenge was waiting for him!! And so he broke the world record! FlyTech himself saw the text, and got amazed! The comment had too many characters and broke the record! I could not believe it! The story ended and thank you for reading this."
    this whole text has 1156 characters, which beats the previous record of 1016.
    the remaining balance was -399, which means you could add 2 more characters (following the rules from the challenge) without failing.
    a proof was made in a windows xp vm, and you can see the video proof in my channel or test it by yourself.

  • @2520WasTaken
    @2520WasTaken 1 year ago +88

    Have you heard about the Russian city: Seversk? It has a humid continental climate maintaining a low temperature and receiving 530mm precipitation every year. Through its presence, nuclear weapons have been assembled there and stored. One serious nuclear catastrophe would occur in 1993 because a container holding a dangerous and radioactive substance exploded.
    Character count: 362
    Edit: I didn't actually test this on Windows XP notepad, but I used a script, and the script gave 7640 and 2542, and 7640 is just barely greater than 3*2542

    • @AdachiVlogsFIN
      @AdachiVlogsFIN 1 year ago +7

      Saves And Does Not Corrupt On Windows XP Media Center Edition 2005.

    • @moregirl4585
      @moregirl4585 1 year ago

      @@AdachiVlogsFIN Confirmed on mine

    • @Lolph_NG
      @Lolph_NG 1 year ago

      whats M69 doing here

    • @Sypaka
      @Sypaka 1 year ago

      NO WAY LMAO.

    • @fr4ctalz638
      @fr4ctalz638 1 year ago +1

      tried it on my windows XP vm it worked
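
Editor's sketch for this thread: the two numbers quoted in the top comment (7640 and 2542) read like sums of byte-to-byte differences for the low and high byte of each 16-bit pair, compared with a factor of 3. The code below is a reconstruction under that assumption, pieced together from the comments on this page. It is not the Windows implementation, which reportedly applies other checks as well, and the function names are invented for the illustration. Rejecting odd-length input is likewise an assumption based on the 6:40 remark quoted further down the page.

```python
def byte_difference_sums(data: bytes) -> tuple[int, int]:
    """Sum the absolute differences between consecutive low bytes and consecutive high bytes."""
    low = data[0::2]    # even offsets: low byte of each little-endian 16-bit unit
    high = data[1::2]   # odd offsets: high byte of each unit
    low_sum = sum(abs(b - a) for a, b in zip(low, low[1:]))
    high_sum = sum(abs(b - a) for a, b in zip(high, high[1:]))
    return low_sum, high_sum

def exceeds_threshold(low_sum: int, high_sum: int) -> bool:
    """Assumed rule: low bytes must vary more than three times as much as high bytes."""
    return low_sum > 3 * high_sum

def guessed_as_utf16(data: bytes) -> bool:
    if len(data) % 2:   # assumption: odd-length input is never treated as 16-bit text
        return False
    return exceeds_threshold(*byte_difference_sums(data))

print(byte_difference_sums(b"Bush hid the facts"))
print(guessed_as_utf16(b"Bush hid the facts"))
print(exceeds_threshold(7640, 2542))
```

Under this rule the margin for the Seversk text is 7640 - 3 * 2542 = 14, which fits the "just barely" in the comment.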

  • @Blaineworld
    @Blaineworld 1 year ago +80

    now i kind of want to know how the current unicode detection works

    • @keltrm
      @keltrm 1 year ago +11

      I tried disassembling it, but it seems to have been moved to the kernel (RtlIsTextUnicode)

    • @Milennium1902
      @Milennium1902 1 year ago +41

      It uses the 2 bytes FF and FE shown at 5:24. To make the glitch happen on modern Windows you put ÿþ into the beginning of a text file, then save it, and voila! You don't even need to input any special text after it.

    • @lunafoxfire
      @lunafoxfire 1 year ago +15

      they probably do the sane standard thing and look for 0xFFFE at the start

    • @chri-k
      @chri-k 1 year ago +4

      @@Milennium1902 **Icelandic obtains ÿ**

    • @hellterminator
      @hellterminator 1 year ago +18

      @@lunafoxfire They did that back then, too, but that's not nearly enough.
      The _presence_ of the BOM _confirms_ a file is unicode.
      The _absence_ of the BOM _does not_ mean a file is _not_ Unicode.
      That is to say, if there is no BOM, you still have to check if it's Unicode.
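
Editor's sketch for this thread: why typing ÿþ at the start of a file has the effect described above. The assumption is that the file is saved in a single-byte "ANSI" code page where ÿ is 0xFF and þ is 0xFE (Windows-1252 is used here); the rest of the example text is arbitrary.

```python
import codecs

# In Windows-1252, 'ÿ' is byte FF and 'þ' is byte FE.
prefix = "ÿþ".encode("cp1252")
is_utf16_le_bom = prefix == codecs.BOM_UTF16_LE

# A file containing "ÿþhi" saved as ANSI is the four bytes FF FE 68 69.
saved = "ÿþhi".encode("cp1252")

# Python's "utf-16" codec honours a leading BOM, consumes it,
# and reads the remaining bytes 68 69 as the single unit 0x6968.
reopened = saved.decode("utf-16")

print(is_utf16_le_bom, saved.hex(" "), reopened)
```

So the two typed characters are byte-for-byte identical to the UTF-16 little-endian byte order mark, and a reader that trusts the BOM has no way to tell the difference.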

  • @maelmauron7530
    @maelmauron7530 1 year ago +39

    The 00 padding on non-unicode characters explains the fact that in a lot of files that are not text, some strings have spaces between each character... I've had this question since 2017 XD

    • @maelmauron7530
      @maelmauron7530 1 year ago +4

      By not text I mean archives, executables, etc.

    • @chri-k
      @chri-k 1 year ago +14

      but most modern applications ( one exception being Windows itself ) use UTF-8 and not UTF-16. UTF-8 is fully backwards-compatible with ASCII, so this may not be the only reason.

    • @johndododoe1411
      @johndododoe1411 1 year ago +7

      It's not padding, it's the page number. Page 00 is mostly the same as Western ANSI code, there are about 7000 other pages to keep track of, including the ones with smiley faces.

    • @Liggliluff
      @Liggliluff 1 year ago +1

      All characters are Unicode characters. Some characters are also in ANSI, but they are also in Unicode.

  • @Biaanca5036
    @Biaanca5036 1 year ago +57

    I was never able to reproduce this bug when I was a little kid..
    But well, my boxed copy of XP was Service Pack 2 so I guess that's a given :P
    But I also don't remember what OS I was using at the time either.

  • @hhkl3bhhksm466
    @hhkl3bhhksm466 1 year ago +69

    Your videos are always entertaining and informative. Keep it up!

  • @cathacker13
    @cathacker13 1 year ago +23

    for my whole life i thought bush hid the facts was an intentional easter egg so this was a very interesting video to me personally just because of that

  • @623-x7b
    @623-x7b 7 months ago +6

    When I was a kid I used to open up exe files in a text editor. I thought programmers had to remember a lot of characters and had special keyboards.

  • @aprilnya
    @aprilnya 1 year ago +61

    mans casually drops "my windows crashes when opening notepad" like HUH

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +9

      ruclips.net/user/shortsAtu7atNw-kw

    • @zerotwoisreal
      @zerotwoisreal 1 year ago +5

      what more can you expect from windows 11

    • @UltraCenterHQ
      @UltraCenterHQ 1 year ago

      I mean, it's an insider build. You would expect bugs

  • @boalbads
    @boalbads 1 year ago +46

    "we are warned not to change it" "So let's change it" I love this channel.

  • @proCaylak
    @proCaylak 1 year ago +14

    2:05 you don't have to disappoint them. there also was another Bush who was president between '89 and '93. In fact, he's the father of the Bush we all know and hate. I have no explanations for "~Flytech" though. 😅

    • @mossadgynist
      @mossadgynist 7 months ago

      He hid the fact that he personally assassinated JFK

  • @russelllukenbill
    @russelllukenbill 1 year ago +7

    When 7/11 (I say 7/11 because if I wrote the right date my comment won't post) happened, the very next day in school someone came in to school and showed a group of us in the computer lab if you typed 9 and then 11 into Word using wingdings, it was a plane flying into two buildings. Wingdings has been changed since then.

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +7

      i'm pretty sure nothing stops you from posting 9/11 in this comment section

    • @Xnoob545
      @Xnoob545 1 year ago +1

      @@FlyTechVideos a different commenter said that "11:44 I can't do that as any link (sometimes even links to RUclips) cause the immediate deletion of the comment, meaning it's visible to its author until page reload, edits fail ("Unknown error") and reloading the page makes it disappear. Only the creator of a video can post links in their own comments section without needing to be worried"
      youtube likes censoring the comments

  • @selfSplintered
    @selfSplintered 1 year ago +13

    Congratulations on being an official Wikipedia source! :D

  • @CattopyTheWeb
    @CattopyTheWeb 1 year ago +108

    Very interesting bug, Fly! Thanks for the video

    • @Tocinos
      @Tocinos 8 months ago

      69th like

  • @FlyTechVideos
    @FlyTechVideos 1 year ago +8

    flies.sh/discord

  • @keiyakins
    @keiyakins 1 year ago +5

    the correct way to do this is, of course:
    1. is there a BOM? If so, respect it.
    2. try to decode it as UTF-8. If it worked, you're done, its UTF-8 (or us-ascii but thats a proper subset so whatever)
    3. If you get here, complain to the user and make them figure it out.
    (if you're loading a document with more metadata you may of course use that too, I'm assuming plain text)

    • @paradoxmo
      @paradoxmo 8 months ago

      UTF-8 didn’t exist when IsTextUnicode() was written. The Unicode encoding in use was UTF-16 (based on the earlier UCS-2). So this is legacy code from the early days of Unicode that was never updated.
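
Editor's sketch for this thread: the three steps from the top comment (respect a BOM, otherwise try UTF-8, otherwise give up and ask), written as a small Python function. It uses only the BOM constants from the standard `codecs` module; the function name `decode_plain_text` and the error message are invented for the illustration, and this is one possible reading of the comment rather than how any version of Notepad works.

```python
import codecs

# UTF-32 BOMs are checked first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_plain_text(data: bytes) -> tuple[str, str]:
    """Return (encoding, text), or raise ValueError if the encoding cannot be determined."""
    # Step 1: a BOM, if present, decides the encoding.
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):].decode(encoding)
    # Step 2: no BOM, so try strict UTF-8 (plain ASCII passes as a subset).
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # Step 3: make the user figure it out.
        raise ValueError("no BOM and not valid UTF-8; ask the user for the encoding") from None

print(decode_plain_text(b"Bush hid the facts"))
print(decode_plain_text(codecs.BOM_UTF16_LE + "Bu".encode("utf-16-le")))
```

With this approach "Bush hid the facts" is never guessed to be 16-bit text, because no statistical guess is made at all; the trade-off is that BOM-less UTF-16 files are rejected instead of opened.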

  • @toydotgame
    @toydotgame 1 year ago +9

    This also explains what happens when you load Unicode-encoded files such as .lnk and .url in Notepad or Vim etc, and it displays the weird spaced out lettering as the program assumes ANSI plaintext. Cool!

  • @JustPyroYT
    @JustPyroYT 1 year ago +32

    Great Video! Very detailed explanation of the bug! :D

  • @ChloekabanOfficial
    @ChloekabanOfficial 1 year ago +6

    The first time I found out about the "Bush hid the facts" bug, I thought the text was referring to a literal bush.

  • @DavidWonn
    @DavidWonn 1 year ago +11

    NT4 Notepad runs on NT 3.51, though it will abruptly close at times. On these older versions, you can change Notepad's global font, and in some cases you may even be able to read the erroneous Unicode characters!

  • @PanoptesDreams
    @PanoptesDreams 1 year ago +4

    All these years later, this explains why XP notepad was such a pee-pee about opening random text docs.

  • @adrianv.v.4445
    @adrianv.v.4445 1 year ago +10

    Pairing a text-generating transformer with a minimizing function for the unicode check could be funny to see

    • @Brahvim
      @Brahvim 1 year ago

      Exactly what I thought!

  • @kleinesfilmroellchen
    @kleinesfilmroellchen 1 year ago +4

    Even though Windows called it "Unicode", the less confusable and more accurate name is UTF-16.

  • @SsvbxxYT
    @SsvbxxYT 1 year ago +6

    But what if...in Windows 3.5, Bush was referring to George *H.W.* Bush?

  • @MarkSir
    @MarkSir 1 year ago +4

    You didn't dismiss the conspiracy. Bush, the old one, was US president from 1989 to 1993

  • @VukAndrijanic
    @VukAndrijanic 1 year ago +6

    Nice video keep it up! Will you upload more creepypasta videos like you did before?

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +15

      Thank you! I only ever uploaded 2 of them, and no, I am not planning to continue them as I came to dislike even the 2 videos that I already made.

  • @pyromancy8439
    @pyromancy8439 1 year ago +3

    I have moved to Linux completely a long time ago, and every time I stumble upon Windows, I seriously don't understand how the most popular desktop operating system can STILL have significant issues with encoding. Today I was on a video call with my coworker and had the pleasure of witnessing a modern (2021 version) app with cyrillic text display umlauts and diacritics instead of actual text on English-configured Windows. Look, mum, we've had Unicode for 32 years now!
    P.S. yes, I know Windows uses UTF-16, I refer to UTF-8, which is used practically everywhere on the web.

  • @fgregerfeaxcwfeffece
    @fgregerfeaxcwfeffece 1 year ago +3

    Microsoft is well known to preserve traditional bugs. Even the Win10 installer still could not select a partition to install.

    • @Sypaka
      @Sypaka 1 year ago +2

      please explain in detail.

  • @LeonAlkoholik67
    @LeonAlkoholik67 1 year ago +7

    Kinda expected that it's just another encoding issue. Even nowadays Notepad still has similar encoding issues like when you write a script and it contains some special characters in it... and then you realize your script is broken. Aside that you should never use Notepad for scripting anyway...

  • @_AE_EA_
    @_AE_EA_ 1 year ago +7

    Could be wrong on this but I think Windows uses CRLF line endings, so you would need to put \r\n into the oracle to replicate the notepad newline

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +6

      I tried it with \r\n after the video, and the oracle says that it's censored while Notepad still doesn't break
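
Editor's sketch for this thread: one way to make sure a test script sees the same bytes Notepad would write, by normalising every line break to CR LF before encoding. The helper name `to_crlf_bytes` is made up, and this does not explain the mismatch reported in the reply above; it only removes line endings as a variable.

```python
def to_crlf_bytes(text: str) -> bytes:
    """Normalise all line breaks to CR LF, then encode as ASCII."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\r\n").encode("ascii")

lf_only = "ab\ncd".encode("ascii")   # 5 bytes: 'c' sits at offset 3
crlf = to_crlf_bytes("ab\ncd")       # 6 bytes: 'c' sits at offset 4

print(lf_only, crlf)
```

Note the side effect: a bare LF is one byte, so everything after it lands on the opposite odd/even offset compared with CR LF. Any check that looks at even and odd bytes separately can therefore give a different answer for the same visible text.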

  • @lior_haddad
    @lior_haddad 1 year ago +16

    encoder? I hardly know 'er!

  • @intron9
    @intron9 1 year ago +7

    "in unicode encoding, each character is 2 bytes"
    Not exactly,but close enough explanation...

    • @Gameplayer55055
      @Gameplayer55055 1 year ago +6

      UTF-16. Then they wanted some nice 🥵emojis and even ♔chess. Then UTF-32 appeared. yet UTF-8 is variable length which saves the space but not the nervecells of c++ devs

    • @Mnnvint
      @Mnnvint 1 year ago +2

      In the old Windows World's favorite unicode encoding. Which they got stuck with, even though it was a bad idea, because they were too eager to use unicode and more sensible unicode encodings hadn't caught on yet.

    • @jhgvvetyjj6589
      @jhgvvetyjj6589 1 year ago +1

      @@Gameplayer55055 ♔ fits in 16-bit character though

    • @cl00e9ment
      @cl00e9ment 1 year ago +2

      It looks like they encoded the code points directly. They did not use UTF-8 encoding or else ASCII characters would be only one byte, and they did not use UTF-16 encoding either because UTF-16 is not padded with NULL bytes. In other words, they managed to mess up their Unicode implementation and invent a new encoding while Unicode was supposed to unify everything. Oh the irony...

    • @jhgvvetyjj6589
      @jhgvvetyjj6589 1 year ago +1

      @@cl00e9ment All ASCII characters do have null byte in high byte when represented in 16-bit integer though. 0x20 in 8-bit becomes 0x0020 in 16-bit, which becomes 0x20 0x00 in little endian, which is the correct little endian representation of space in UTF-16.
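
Editor's sketch for this thread: the byte counts being argued about, checked with Python's built-in codecs. The three sample characters are the ones mentioned in the replies (an ASCII letter, a space, and the chess king ♔, U+2654).

```python
samples = ["A", " ", "♔"]

# Number of bytes each character takes in the three common Unicode encodings.
sizes = {
    ch: (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )
    for ch in samples
}

# A space in UTF-16 little endian is 20 00: the "padding" is just the high byte.
space_utf16 = " ".encode("utf-16-le")
# The chess king fits in a single 16-bit unit, stored low byte first.
king_utf16 = "♔".encode("utf-16-le")

print(sizes, space_utf16.hex(" "), king_utf16.hex(" "))
```

This agrees with the last reply: the null bytes after ASCII characters are ordinary UTF-16, not a separate home-grown encoding.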

  • @UltimatePerfection
    @UltimatePerfection 1 year ago +4

    2:15 Bush Sr predates Clinton though as the president.

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +2

      The Bush that "hid the facts" refers to the junior one, doesn't it? (Iraq war?)

    • @UltimatePerfection
      @UltimatePerfection 1 year ago +2

      @@FlyTechVideos Bush Sr. hid two facts though: That his son is a liar, and that he was a madman that almost started the WW3.

  • @JustAPersonWhoComments
    @JustAPersonWhoComments 1 year ago

    Plot twist: Turns out, the entire video was just a secret message from aliens, and they were trying to communicate in their own funky beatbox language

  • @skyegibbs4955
    @skyegibbs4955 1 year ago +3

    4:16 Saying that everything in "Unicode encoding" is 2 bytes is a bit misleading. This applies only to the implementation of Unicode used in very old versions of Windows (UCS-2), and does not apply to any modern, variable-width Unicode encoding. Notepad received support for UTF-8 and UTF-16 with Windows 7.

    • @skyegibbs4955
      @skyegibbs4955 1 year ago

      en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
      en.wikipedia.org/wiki/Windows_Notepad

  • @andyhu9542
    @andyhu9542 1 year ago +1

    I can see where this comes from: if someone types in Chinese (or other Unicode-only) language, the characters will be very close to each other in the encoding space. Therefore, the most significant byte would be very close to each other, while the least significant byte would essentially be random.

    • @jacky7204
      @jacky7204 1 year ago

      While if someone types with just a few Unicode characters thrown in, the ASCII values will produce the null bytes seen at 5:11, which also won't add to the most-significant-byte-difference counter.
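
Editor's sketch for this thread: the intuition above, made visible by splitting UTF-16 little-endian text into its low and high bytes. The sample string "你好世界" is an arbitrary choice and the helper name `split_low_high` is invented; the sketch only handles characters that fit in one 16-bit unit.

```python
def split_low_high(text: str) -> tuple[list[int], list[int]]:
    """Encode as UTF-16 LE and return (low bytes, high bytes) of each 16-bit unit."""
    data = text.encode("utf-16-le")
    return list(data[0::2]), list(data[1::2])

# Four common Chinese characters: U+4F60, U+597D, U+4E16, U+754C.
cjk_low, cjk_high = split_low_high("你好世界")
# Plain ASCII: every high byte is zero.
ascii_low, ascii_high = split_low_high("hi!")

print([hex(b) for b in cjk_low], [hex(b) for b in cjk_high])
print([hex(b) for b in ascii_low], [hex(b) for b in ascii_high])
```

In both cases the high bytes stay within a narrow range (0x4E to 0x75 here, or all zero for ASCII) while the low bytes jump around, which is the pattern a "low bytes vary more than high bytes" test would be looking for.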

  • @quantumelle
    @quantumelle 1 year ago +4

    How about this one?
    He erected later a great monastery in which he lived forty years and had eight hundred and eight followers--they bound him tightly and carried him between them on their shoulders
    (-9)

  • @ThatRandomToast
    @ThatRandomToast 1 year ago +4

    2:58 This variant of BSOD is caused by using VMware SVGA 3D

  • @mfaizsyahmi
    @mfaizsyahmi 1 year ago +3

    I'd imagine Raymond Chen would write about this embarrassing algorithm implementation in his blog soon.

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +5

      He already did: devblogs.microsoft.com/oldnewthing/20070417-00/?p=27223

  • @IvyANguyen
    @IvyANguyen 8 months ago +1

    Learnt something new in your video: Mojibake! I never knew this phenomenon had a specific name. I recall seeing it a lot in 1998 to 2001 on WebTV here in the US when viewing Japanese, Chinese, & Korean web pages. (Really, any pages that didn't use the Latin alphabet.)

  • @joveaaron-real
    @joveaaron-real 1 year ago +5

    If Windows is smart enough to remove the first two unicode bytes (0xFF, 0xFE), why the hell didn't they use it to detect unicode as well?

    • @FlyTechVideos
      @FlyTechVideos 1 year ago +14

      They did - 0xFF 0xFE was consistently recognized as Unicode. The problem is that they assumed that text without this prefix can be Unicode as well, and they used the presented heuristic to guess

    • @joveaaron-real
      @joveaaron-real 1 year ago

      @@FlyTechVideos sounds like somebody in Microsoft didn't read the documentation all the way 🤣

    • @jhgvvetyjj6589
      @jhgvvetyjj6589 1 year ago +7

      Microsoft detects Unicode even without the byte order mark by design since other programs and/or platforms may save text files like that.

  • @Liggliluff
    @Liggliluff 1 year ago +2

    (4:15) Incorrect, Unicode itself doesn't require 2 bytes per character. Unicode is just a list of characters. It depends on what encoding you use. UTF-16 is what Windows uses, which requires 2 or 4 bytes per character, and there's also UTF-32 which requires 4 bytes per character, as well as UTF-8 which is a variable number of bytes per character.
    (5:25) FF FE doesn't mark it as Unicode exactly, it does, but it also marks it as UTF-16, which is why it's 2 bytes per character. B is 42 00 as you show, but FE FF is also UTF-16 but reversed, where B is 00 42 instead. EF BB BF is UTF-8, for example.
    (6:40) Since UTF-8 (Unicode) is variable length, why can't this be an odd length? Characters can be variable length from 1 to 4 bytes (the original design allowed up to 6).

  • @ZipplyZane
    @ZipplyZane 1 year ago +6

    I was hoping you'd then follow with how they fixed the bug. How does Notepad in Windows 7 detect Unicode?

    • @chri-k
      @chri-k 1 year ago +4

      by checking if the file starts with 0xFFFE

  • @DacroyleYT
    @DacroyleYT 1 year ago +2

    Dang, I thought the "Bush hid the facts" thing was an easter egg until I saw this

  • @liniarc
    @liniarc 1 year ago +4

    So I ran a few tests and found the exact letters and words don't influence notepad's unicode detection algorithm too significantly. By far the biggest factor is the location of the space and punctuation symbols. If the space and punctuation symbols occur primarily at an odd index, then the unicode detection algorithm can get a large bonus towards the odd/lower bytes since the ascii distances between the punctuation symbols and lowercase letters greatly exceeds the ascii distances between any two lowercase letters. This means that if you use words with an odd number of letters, most space symbols end up on the odd indices. However, writing sentences using exclusively odd number of letters for all the words isn't easy. You can therefore sometimes use pairs of even words for a nicer structure. By using these tips, you can write quite lengthy sentences which sound almost completely natural without having to recalculate a new score every other character. PS. This whole comment would get censored by notepad

    • @ZephyrysBaum
      @ZephyrysBaum 1 year ago

      Wow

    • @liniarc
      @liniarc 1 year ago

      Character count: 1016
      3 * Higher Diff - Lower: -85
      Tested successfully on Windows 2000 pro
      With the heuristic outlined, it's fairly easy to make arbitrarily long strings, especially if you aren't overly concerned about clarity, word flow, or long term sentence structures. I wrote the comment without significant scripting/code assistance (only checking the isUnicode value every sentence or so).

    • @ZephyrysBaum
      @ZephyrysBaum 1 year ago

      @@liniarc I’ll try making one I think

  • @jiaan100
    @jiaan100 7 months ago +1

    Did you know conspiracy and conspiracy theory were words in the 16th and 17th centuries, but conspiracist and conspiracy theorist weren't words until the 60s and 70s?

  • @stephaniethebatter7975
    @stephaniethebatter7975 1 year ago +3

    Bush didn't hide the facts, but Windows certainly did.
    (not part of the challenge, just a joke)

  • @hackdesigner
    @hackdesigner 1 year ago +1

    Visual C++ 2008 Express Edition? I see a man of culture!

  • @kaninchengaming-inactive-6529
    @kaninchengaming-inactive-6529 1 year ago +4

    Least broken microsoft product:

  • @AAFREAK
    @AAFREAK 8 months ago

    I just remembered that Microsoft dropped support for WordPad, and this video gives me yet another reason to be infuriated about that. WordPad has been faithful to me for other reasons, but this is a reminder of something I could still have benefitted from when migrating to different OSes. Plus, seeing that even W11 crashed while using Notepad just tells me that it was a bad decision in the first place, as it furthers the incompetence.
    Plus, I'm certain I remember this old bug from the days of the drink holder prank (which they've since patched over). It's nice this still can be done. Nostalgic, at least.

  • @Lopoi
    @Lopoi 1 year ago +7

    Can videos be used as citation? I thought it had to be academic articles or something like that

    • @FlyTechVideos
      @FlyTechVideos  1 year ago +11

      Not sure if they _can_, but I think I've seen some. Don't take my word for it though

    • @rizkyadiyanto7922
      @rizkyadiyanto7922 1 year ago +1

      even blog posts can.

    • @shepardpower
      @shepardpower 1 year ago

      I think so

    • @wiger_
      @wiger_ 1 year ago +4

      generally self-published content is not supposed to be used as a reliable source for a citation, but in this case, i guess it could be used as a showcase of a behavior mentioned in the article

    • @renerpho
      @renerpho 1 year ago +3

      @@FlyTechVideos I've been working on that exact same question for a different video lately (a Karl Jobst documentary).
      The short version is that the video cannot be used unless it is published by a reliable source. Since YouTube videos are self-published, they don't count. An exception can be made if the person who posts the video is considered a subject matter expert. We've discussed that for Karl Jobst, but determined he doesn't qualify. For it to work, Jobst would have to have published articles about his work in trusted sources, outside of YouTube, and he hasn't done so.
      What that means for your video:
      Have you published journal articles about PC bugs, under your name? (Just being cited by them is not enough.) If the answer is "yes" then that's great! Please give us a link to that. With some luck, that will make you pass as a subject matter expert, and THEN we can start thinking about citing your YouTube video.
      P.S. For Jobst's video, we could avoid citing the video in the end. Jobst presented all his sources in the video. References to reliable sources that demonstrate that his conclusions were correct. Can you share such a source for your topic? In that case, we can start working on the Wikipedia article as well.

  • @GrizlikD
    @GrizlikD 1 year ago +2

    FlyTech, after he held a Microsoft employee in his basement for the past year:
    _"I am legally not allowed to tell you how I figured out. Let's say, I consulted some trustworthy sources for this."_

    • @SirAU
      @SirAU 1 year ago +1

      Yes.

    • @SkigBiggler
      @SkigBiggler 1 year ago +2

      More likely he used the leaked XP kernel source

    • @SirAU
      @SirAU 1 year ago +1

      Wait, is that my uncle?

  • @Foxy_AR
    @Foxy_AR 1 year ago +19

    9:45 if you were to spam a lot of these blocks, could you write secret messages?

    • @FlyTechVideos
      @FlyTechVideos  1 year ago +38

      If you consider mojibake secret, then yes

    • @nothing-lo8lh
      @nothing-lo8lh 1 year ago +1

      @@FlyTechVideos Well no because it's fixed in newer versions of windows.

    • @Gameplayer55055
      @Gameplayer55055 1 year ago +5

      I live in ukraine and i've seen many university pages full of them (idiot devs, no )
      After making essays in vscode i can see some mysterious п»ї too. its shit from BOM as i know

    • @aurastrike
      @aurastrike 1 year ago +2

      @@nothing-lo8lh You can cause it by adding ÿþ to the start of a .txt file
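      The effect of putting ÿþ in front of the text can be reproduced outside Notepad. The snippet below is a small sketch, not Notepad's own code path: it assumes the file was saved as plain ASCII and uses Python's standard `utf-16` codec to stand in for an editor that trusts the byte order mark.

```python
ansi_bytes = "Bush hid the facts".encode("ascii")  # 18 bytes, as saved on disk

# FF FE is the UTF-16 little-endian byte order mark; in Windows-1252
# those two bytes are displayed as the characters "ÿþ".
forced = b"\xff\xfe" + ansi_bytes

# The "utf-16" codec reads the BOM, selects little-endian and strips it,
# so each pair of ASCII bytes becomes one 16-bit code unit.
mojibake = forced.decode("utf-16")
print(mojibake)

# The mapping is reversible as long as the byte count is even.
assert mojibake.encode("utf-16-le") == ansi_bytes
```

      Because the round trip is lossless, the mojibake works as a (very weak) disguise for even-length ASCII text, which is the "secret message" idea raised at the top of this thread.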

  • @MMMMMMarco
    @MMMMMMarco 1 year ago +1

    In Unicode, a character isn't 2 bytes. Unicode itself is not an encoding, just a standard. UTF-8 is a Unicode encoding where each character uses 1 to 4 bytes, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes for each character.
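    The size differences are easy to see by encoding a few characters from different ranges. This is a quick sketch using Python's built-in codecs; the sample characters are arbitrary picks, and the `-le` codec variants are used so that no byte order mark is added to the counts.

```python
# U+0041 "A", U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for char in samples:
    sizes = {
        name: len(char.encode(name))
        for name in ("utf-8", "utf-16-le", "utf-32-le")
    }
    print(f"U+{ord(char):04X}", sizes)
```

    UTF-8 output can have any byte length, odd or even, while UTF-16 output is always a whole number of 2-byte units. Notepad's "Unicode" option refers to UTF-16, which is why an odd file size rules it out.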

  • @Neubulae
    @Neubulae 1 year ago +3

    Try "联通"; IIRC this single word also caused issues, so similar "rumors", or rather memes, arose among Chinese communities about how China Unicom had a vendetta against Microsoft or something.

  • @subparlario4916
    @subparlario4916 1 year ago

    "Feel free to use this video as a citation! :)" bruh i audibly CACKLED when i saw that 💀

  • @CrushedAsian255
    @CrushedAsian255 1 year ago +4

    Before watching, let me guess: is it some kind of Unicode auto-detect bug?

  • @WalnutBun
    @WalnutBun 8 months ago

    So, fun fact: As the video says, UTF-16 ("Unicode" encoding according to Notepad) text files always start with either 0xFFFE or 0xFEFF (to indicate endianness). 0xFFFE and 0xFEFF don't make *any* sense as ANSI-encoded text (they display as ÿþ and þÿ respectively), meaning it's far safer to just look for those patterns to detect whether or not a file is encoded in UTF-16.

    • @FlyTechVideos
      @FlyTechVideos  8 months ago

      What you are referring to is called "UTF-16 with BOM". The BOM (Byte Order Mark), however, is not mandatory (in fact it is discouraged). Read more here en.m.wikipedia.org/wiki/Byte_order_mark
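      A BOM check of the kind discussed here could look roughly like the following. It is a minimal sketch, not how any Notepad version is known to do it: `sniff_utf16_bom` is a hypothetical helper, and the `None` branch is exactly the case the reply points out, where a file has no BOM and the encoding still has to be guessed.

```python
import codecs
from typing import Optional


def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a UTF-16 BOM, else None."""
    # Note: a UTF-32 little-endian BOM (FF FE 00 00) also starts with FF FE;
    # this sketch ignores that ambiguity.
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "utf-16-be"
    return None  # no BOM: nothing can be concluded from the first two bytes


print(sniff_utf16_bom(b"\xff\xfeB\x00u\x00"))
print(sniff_utf16_bom(b"Bush hid the facts"))
```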

  • @cmyk8964
    @cmyk8964 1 year ago +6

    But why was that “low > 3*high” heuristic chosen?

    • @FlyTechVideos
      @FlyTechVideos  1 year ago +15

      A unicode string with "simple" characters usually has a lot of null bytes, e.g. 42 00 43 00 44 00 ... and the heuristic is engineered to detect exactly this. As we can see, this leads to false positives
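      The rule quoted in this thread can be sketched in a few lines. This is a reconstruction from the comments ("low > 3*high", applied to the summed differences between neighbouring bytes), not the actual Windows implementation; the function name `looks_like_utf16le` and the minimum-length check are assumptions made for the example.

```python
def looks_like_utf16le(data: bytes) -> bool:
    """Guess UTF-16LE using the 'low > 3 * high' rule quoted in this thread."""
    if len(data) < 4 or len(data) % 2 != 0:
        return False  # assumption: need at least two whole 16-bit units
    low = data[0::2]   # first byte of each pair (low byte in little-endian)
    high = data[1::2]  # second byte of each pair (high byte)
    low_change = sum(abs(a - b) for a, b in zip(low, low[1:]))
    high_change = sum(abs(a - b) for a, b in zip(high, high[1:]))
    return low_change > 3 * high_change


print(looks_like_utf16le(bytes.fromhex("420043004400")))  # genuine UTF-16LE "BCD"
print(looks_like_utf16le(b"Bush hid the facts"))          # ANSI text, same verdict
print(looks_like_utf16le(b"hello world!"))                # ANSI text, rejected
```

      For "Bush hid the facts" the low-byte column sums to 506 and the high-byte column to 68, so 506 > 3 * 68 holds and the ANSI text is taken for Unicode. For "hello world!" the sums are 34 and 240, so the test fails as intended.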

    • @TheGodOfAllThatWas
      @TheGodOfAllThatWas 1 year ago +3

      ​@@FlyTechVideos Is that logic true across most languages or is this an English bias? Are there any countries where it's not true?

  • @RabbitEarsCh
    @RabbitEarsCh 1 year ago +1

    Real unicode detection is pretty expensive, so this NT bug got to live for quite a long time...

  • @netkv
    @netkv 1 year ago +15

    i completely forgot some insane systems dont use utf8

    • @Mnnvint
      @Mnnvint 1 year ago +2

      The windows world started using unicode before utf8 was invented. The Java world too. Sometimes it pays to be slow (although I remember switching my old redhat/mandrake systems over to default to utf8 was not fun either).

    • @johndododoe1411
      @johndododoe1411 1 year ago +1

      Unicode was created in the late 1980s. Microsoft and Java chose the early 16 bit Unicode and then had to use UTF-16 to encode the next few thousand pages. Then someone decided that any characters not handled by UTF-16 should be banned from all other encodings.

  • @JohnDlugosz
    @JohnDlugosz 1 year ago

    I posted an analysis of the algorithm as an answer to the post "how did Windows developers come up with...".
    TLDR; it can distinguish the patterns found in multi-byte ANSI vs 16-bit Unicode for Asian encoded text, and other languages fall into subsets of those distributions.

  • @the-Gammaron
    @the-Gammaron 1 year ago +6

    "Legally not allowed to tell"
    Let me guess, reading leaked source code?
    Idk. You can't probably reply to this anyway.
    Or maybe it was a joke.

    • @Milenakos
      @Milenakos 1 year ago +6

      the word "sources" is italic

    • @the-Gammaron
      @the-Gammaron 1 year ago +1

      @@Milenakos Ahhhhhhh ok I'm dumb

    • @maximiliano_sv
      @maximiliano_sv 1 year ago +1

      @@the-Gammaron I still don't understand that about the sources, can someone explain it to me? is it some joke or something?

    • @ruben_balea
      @ruben_balea 1 year ago +1

      @@maximiliano_sv Basically Microsoft does not allow everybody to see the Windows source code or disassemble Windows to get the closest thing to the original source code, ChatGPT explains those things better than me:
      The source code of Windows is a set of instructions and programs written by Microsoft developers, forming the foundation of the operating system. While there are several reasons why an ordinary person cannot legally read the source code of Windows, here are some of the main ones:
      - Intellectual property rights: The source code of Windows is owned by Microsoft. The company invests significant resources, time, and effort in developing and improving their operating system. As a result, they have intellectual property rights over the code and can control who has access to it. This means they cannot allow just anyone to access and read their code without authorization.
      - Protection of trade secrets: The source code of Windows contains valuable and strategic information for Microsoft. Disclosing that code to unauthorized individuals could compromise the security, stability, and competitiveness of their operating system. Protecting their trade secrets is crucial for maintaining their market position and competitive advantage.
      - Risk of vulnerabilities and hacking: If the source code of Windows were available to the general public, it could be scrutinized by malicious individuals, including hackers and malware developers. This would increase the risk of discovering vulnerabilities and creating targeted attacks on the operating system, jeopardizing the security of millions of users.
      - License and end-user agreements: By using Windows, users agree to the terms and conditions set by Microsoft in their license and end-user agreement. These documents clearly state that users do not have the right to access, modify, or distribute the source code of the operating system.
      - In summary, the source code of Windows is owned by Microsoft and is protected by intellectual property rights and trade secrets. Its widespread access is not permitted to safeguard the security, competitiveness, and legal rights of Microsoft, as well as to prevent potential risks to users of the operating system.
      Disassembling parts of Windows to figure out the source code presents legal and practical challenges. Here are the reasons why an individual cannot easily disassemble Windows to obtain the source code:
      - Legal restrictions: Reverse engineering, which involves disassembling software to understand its underlying code, is subject to legal restrictions in many jurisdictions. Companies like Microsoft have legal protections in place to prevent unauthorized access, modification, and distribution of their software. Engaging in reverse engineering without proper authorization or a specific legal exception can infringe upon intellectual property laws.
      - Technical complexity: Disassembling software like Windows is a complex process that requires expertise in low-level programming languages, assembly code, and system architecture. It involves converting machine code (binary instructions) back into human-readable form. Understanding the disassembled code and piecing it together to reveal the original source code is a challenging and time-consuming task that requires significant skill and knowledge.
      - Incomplete information: Disassembling Windows and examining the disassembled code may provide insights into specific functions or algorithms, but it does not yield the complete source code. The source code of a complex software system like Windows consists of millions of lines of code, libraries, and dependencies, which cannot be fully reconstructed through disassembly alone.
      - Trade secrets and obfuscation: Software developers often use techniques such as code obfuscation to make disassembly and reverse engineering more difficult. Obfuscation intentionally complicates the disassembled code by adding irrelevant instructions, removing meaningful variable names, or applying other transformations. These measures aim to protect trade secrets and intellectual property by making it harder for unauthorized individuals to understand and reproduce the original source code.
      - In summary, legal restrictions, technical complexity, incomplete information, and code obfuscation make it challenging and legally risky for an individual to disassemble parts of Windows and obtain the complete source code. Reverse engineering is a highly specialized field that requires expertise, and engaging in it without proper authorization can have legal consequences.

    • @safwan6363
      @safwan6363 1 year ago

      Thank you lol

  • @mfessi
    @mfessi 1 year ago +1

    21x johnsparks still does the trick.
    Source: "Windows Notepad - an Old Problem Surfaces Again"

  • @cs127
    @cs127 1 year ago +5

    great video. your explanation was perfect!

  • @ghost_ship_supreme
    @ghost_ship_supreme 1 year ago +1

    This would be an interesting game mechanic for secret codes

  • @Dontlokhere
    @Dontlokhere 1 year ago +3

    Bush hid the facts
    畋样凭摩琠映捡獴
    Picking up mongoose

  • @KnakuanaRka
    @KnakuanaRka 8 months ago +1

    I’m guessing as for why it detected Unicode like this, the idea is that legitimate Unicode text will mostly contain symbols from a single alphabet, which are close together and thus will have high bytes that are close, so the sum of differences in high bytes will be low. If it isn’t, the high bytes (made of every second letter) will be all over the place, making the sum of differences higher. Would be good to have an explanation like that in the video.
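    The guess about high bytes staying close together is easy to check on genuine single-script text. The sketch below uses a Russian word as an arbitrary sample and splits its UTF-16LE bytes into the two columns; it illustrates the idea only and says nothing definite about what the original developers had in mind.

```python
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Russian "Привет" ("hello")
data = text.encode("utf-16-le")

low_bytes = data[0::2]   # position of each letter inside its Unicode block
high_bytes = data[1::2]  # the block itself (Cyrillic is U+04xx)

print(low_bytes.hex(" "))   # varies from letter to letter
print(high_bytes.hex(" "))  # identical for every letter

# Sum of differences between neighbouring high bytes: no variation at all.
print(sum(abs(a - b) for a, b in zip(high_bytes, high_bytes[1:])))
```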

    • @FlyTechVideos
      @FlyTechVideos  8 months ago +1

      You're right... thanks for the suggestion

  • @barfooguy
    @barfooguy 1 year ago +18

    I'm just wondering, how can the windows notepad crash windows?

    • @RandomGeometryDashStuff
      @RandomGeometryDashStuff 1 year ago +1

      buggy gpu driver?

    • @sebastianx708pl
      @sebastianx708pl 1 year ago +4

      When I used RTM version of Win 11 on VMware after release, while using paint crashed into BSOD in few seconds so could be a VM problem

    • @BlueFlame_00
      @BlueFlame_00 1 year ago +4

      10gb+ files

    • @johndododoe1411
      @johndododoe1411 1 year ago +1

      Win11 notepad is very buggy, it can't even display actual text files.

    • @RandomGeometryDashStuff
      @RandomGeometryDashStuff 1 year ago

      @@johndododoe1411 yes, microsoft made notepad slower than notepad++
      same with cmd.exe (old one still exists as conhost.exe)

  • @bonkmaykr
    @bonkmaykr 1 year ago

    I grew up thinking this was an intentional easter egg! I guess people just thought it was intentional because Bush has had conspiracy theories about him and XP came out around back then. Thanks for showing us how this really works, was very cool

  • @coolmannder
    @coolmannder 1 year ago +4

    you can't make a video showing something in windows xp without dreamscape and bandicam

    • @FlyTechVideos
      @FlyTechVideos  1 year ago +7

      dreamscape is content-id protected and because i am not desiring to donate all my income to 007 sound system i would rather not
      but yes

    • @CattopyTheWeb
      @CattopyTheWeb 1 year ago

      lol

  • @MasonSchmidgall
    @MasonSchmidgall 8 months ago

    I think I know why line termination stops the bug from occurring. In Windows, lines are terminated using both the carriage return '\r' character and the newline '\n' character. If the characters ever appear together without a separating byte, then Notepad can instantly deduce that the file is ANSI. If they are separated by a byte, then it is Unicode.
    Of course, this is an unconfirmed theory of mine. However, it makes a lot of sense.
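    The theory above (CR and LF adjacent means ANSI, CR and LF separated by a byte means Unicode) can be written down as a tiny check. This sketches the commenter's unconfirmed idea only; `crlf_hint` is a hypothetical helper, and nothing here is known to match what Notepad really does.

```python
def crlf_hint(data: bytes) -> str:
    """Classify data by how CR (0x0D) and LF (0x0A) are laid out."""
    if b"\r\x00\n\x00" in data:
        return "utf-16-le"  # CR and LF each followed by a zero high byte
    if b"\r\n" in data:
        return "ansi"       # CR immediately followed by LF
    return "unknown"        # no Windows line break to go by


print(crlf_hint("Bush hid\r\nthe facts".encode("utf-16-le")))
print(crlf_hint(b"Bush hid\r\nthe facts"))
print(crlf_hint(b"Bush hid the facts"))
```

    The last call shows why a single-line file gives such a check nothing to work with, which would fit the observation that the bug only appears without a line break.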

  • @CBFNetworksArchive
    @CBFNetworksArchive 1 year ago +5

    Petition to make his videos way more frequent:
    👇

    • @thepikachugamer
      @thepikachugamer 1 year ago +8

      That will overwork him

    • @CBFNetworksArchive
      @CBFNetworksArchive 1 year ago

      @@thepikachugamer Why? He is making videos every 4 months! Is it too rare?

    • @Alexander-oh8ry
      @Alexander-oh8ry 1 year ago +4

      ​@@CBFNetworksArchive Petition for you to 1. not decide that for another person and 2. to do your math

    • @CBFNetworksArchive
      @CBFNetworksArchive 1 year ago

      @@Alexander-oh8ry Petition to you to 1. do not send me messages if you're a alexander (yes, it's on purpose) and 2. not to make me do things that I shouldn't do today or after 2 weeks

    • @Alexander-oh8ry
      @Alexander-oh8ry 1 year ago +3

      @@CBFNetworksArchive dam somebody sounds insulted because they cant force a youtuber to do more videos and is called out for it
      (and because they cant do math)

  • @Drag0nmaster
    @Drag0nmaster 1 year ago

    Fun fact: the "godmode" shortcut "easter egg" does work, but it doesn't require the words "godmode." It is just a control panel with more stuff

    • @Sypaka
      @Sypaka 1 year ago

      you can do this with almost every CLSID, as long as it has a "ShellFolder" tag on it.
      Try "{ED834ED6-4B5A-4bfe-8F11-A626DCB6A921}" and you want to punch MS.

  • @shaunclarke94
    @shaunclarke94 1 year ago +1

    Great info on how Unicode encoding is detected. I've always wondered about this.

  • @MylarDaleToloMDTTV
    @MylarDaleToloMDTTV 1 year ago +1

    The bug is also included in Windows Longhorn pre-reset.

  • @Lampe2020
    @Lampe2020 1 year ago +1

    11:44 I can't do that as any link (sometimes even links to YouTube) causes the immediate deletion of the comment, meaning it's visible to its author until page reload, edits fail ("Unknown error") and reloading the page makes it disappear. Only the creator of a video can post links in their own comments section without needing to be worried.

  • @christopherg2347
    @christopherg2347 1 year ago +1

    I can only point to this 2 decades old article:
    "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)":
    *The Single Most Important Fact About Encodings*
    If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that “plain” text is ASCII.
    *_There Ain’t No Such Thing As Plain Text._*
    If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.

  • @pro-socialsociopath769
    @pro-socialsociopath769 10 months ago

    I love how this was made almost exactly like a classic video from 2008. With the exception of the YouTube Anthem and a Fraps watermark...

  • @Tigrou7777
    @Tigrou7777 1 year ago +2

    It would be interesting to find out how they fixed it in newer versions of Windows (if that encoding is still relevant).
    Was it a quick fix or did they choose a totally different method?

    • @Sypaka
      @Sypaka 1 year ago +1

      doesnt work on Windows 10... but on notepad2.

  • @jetseverschuren
    @jetseverschuren 1 year ago +2

    And that's why everybody just uses UTF-8 nowadays

    • @jhgvvetyjj6589
      @jhgvvetyjj6589 1 year ago +1

      Which is easily mixed up with all the other 8-bit encodings leading to more glitches!

    • @jetseverschuren
      @jetseverschuren 1 year ago +1

      @@jhgvvetyjj6589 for text, really only plain ASCII or UTF-8 is used, at least in sensible systems

  • @gowindows6639
    @gowindows6639 1 year ago +1

    Finally, a video that explains everything! tysm, FlyTech

  • @ron4212
    @ron4212 8 months ago +2

    Lol this video is now a citation in the "Bush hid the facts" wiki page

  • @harrytsang1501
    @harrytsang1501 9 months ago

    I am pretty sure Windows XP notepad messed up this half the time if you are using CJK languages. There's a reason we wait a minute for word to load since it is slightly less prone to this but still not immune.
    The widespread adoption of emoji really made western developers aware of Unicode, and text encoding is rarely a problem anymore

  • @MatthewTehSpartanUpdated
    @MatthewTehSpartanUpdated 1 year ago

    1:50
    This is truth that WordPad is un-breakable.

  • @UdderlyEvelyn
    @UdderlyEvelyn 1 year ago

    I remember when this bug was new when I was young, cool to see why it happened years later when I'd forgotten about it.