Something I didn't quite understand: Wikipedia says that a polyglot "performs the same operations or output independent of the programming language used to compile or interpret it.", but you showed that after "fusing" the files with Mitra and renaming the extension, the PDF and the TXT showed different things, so this isn't the same output. Isn't this a fault on Wikipedia's side? Very cool and informative video! Thanks as always LiveOverflow
A really good video as always. The overall quality looks really sharp. I would prefer to have the windows in dark theme to better match the rest of the video
I remember my frustration when first switching from Windows to Ubuntu for work projects. I didn't understand how the Ubuntu file system structure worked, how I should manage individual files, and how to work with them. I asked people questions like "Where should I install programs in Ubuntu?" and similar. At that time I thought to myself "Gosh, Windows seems like a much cleaner system, everything is neatly organized, I have a dedicated folder for Program Files and the only thing I should do is click shortcuts". But after learning the Ubuntu FS layout, understanding how PATH actually works and what it is intended for, and a lot of other tips and tricks, Windows FS principles feel rather restrictive. Although I now daily-drive Windows for home and work stuff (Windows made MAJOR progress towards being a developer-friendly system in the last few years), I still miss some of that clean simplicity and infinite possibilities that a proper GNU/Linux system provides.
Just checked on my system, /etc/magic does not exist. So at least for Arch Linux it is in /usr/share/file/misc/magic. It reads a compiled version (.mgc) first.
Pretty sure the joke was about the magic numbers, like Sam said. It's the first few bytes of a file (the signature), which you can read to find out the format and other info, like the version of the program used to create that file.
Often PDFs will be read from the end. PDFs are designed to be written linearly, but a linearly written file is inefficient to access in a random fashion. Thus, after writing out the contents of a PDF, the writer will usually append an index at the end. The index is placed at a known offset from the end of the file, and readers will generally access that first, and then only read the parts of the file that they actually need.
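To make the "read from the end" idea concrete, here is a minimal sketch. It assumes the usual layout where the file ends with a `startxref` keyword, the byte offset of the index (the cross-reference table), and an `%%EOF` marker. The sample bytes are a hand-made stand-in, not a complete PDF, and `find_startxref` is a hypothetical helper name.

```python
import re


def find_startxref(data: bytes, tail: int = 1024) -> int:
    """Return the offset announced by the last 'startxref' near the end of the data."""
    chunk = data[-tail:]
    match = None
    # Keep the last match, since a file may carry several trailers.
    for match in re.finditer(rb"startxref\s+(\d+)\s+%%EOF", chunk):
        pass
    if match is None:
        raise ValueError("no startxref found in the tail of the data")
    return int(match.group(1))


# Toy stand-in for a PDF: some body bytes, then an index, then the trailer.
body = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n"
xref_pos = len(body)  # the index starts right after the body
sample = (
    body
    + b"xref\n0 1\n0000000000 65535 f \n"
    + b"trailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(xref_pos).encode() + b"\n%%EOF\n"
)

offset = find_startxref(sample)
index_start = sample[offset:offset + 4]
```

A reader working this way never has to look at the first bytes to find the index, which is one reason extra data elsewhere in the file can go unnoticed.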
PoC||GTFO PDFs are all zip polyglots, and some of the PDFs are also SNES ROMs and boot sectors... They also explore how it is done in the journal volumes. I highly recommend reading that journal.
I'm currently creating a custom file type for my company, nearly finished it. While I don't disagree entirely, I would say a better way to look at it is that a file type is designed, and the tools or libraries that interact with it are coded with an understanding of the designed structure. The file type is a concept and needs code to create it or read it; it technically doesn't exist beyond the concept of the structure. In simplest terms it's a structure in which data is stored.
Nice! that should be uploaded to Wikipedia it would make it visually clear! en.wikipedia.org/wiki/Help:Adding_image en.wikipedia.org/wiki/Polyglot_(computing)
These sorts of subjective-filetype files usually involve an archive because they ignore stuff they don't understand more than other types of programs, but steganography can involve most formats (though plain-text is a lot harder to pull off as the container).
This video is pretty fascinating. With many years of experience using Linux and the command line, I am already familiar with the fact that a file format is just a trick (every file is just a binary stream anyway), but I am still surprised that you can easily craft files to be interpreted differently by multiple programs. I am not sure whether it's a good thing or a bad thing. From a developer's perspective, we want the file format to be unambiguous, because we know from experience that ambiguity is a common source of bugs and unexpected behavior. However, sometimes we also want flexibility and tolerance. For example, we want to add more features to a file format without breaking older versions of a program, which means we shouldn't be overly strict about recognizing the format. These two design principles sometimes conflict with each other, and I think that is the main cause of the issue.
@@00O3O1B a word file is also just a zip file
this is why magic numbers exist
It also doesn't sound very efficient to put different "files" into the same file for most use cases.
I'm someone in the category of people freaking out about closed source binwalk so I see files agnostically already. But I thought - "Hey, If I give this video a chance I'm sure LiveOverFlow will teach me something new" - All I can say is WOW The idea of not encapsulating but "programming" a file into the zip format is a complete paradigm shift. I will never be able to look at files in the same way again holy shit bro you just blew my mind.
The scene with dark background, a table and a simple t-shirt makes this feel like an interrogation scene where police is asking the criminal questions.
I really like this scene. To me it doesn't look like that at all, just a minimalist and well-filmed setting. No more of that typical YouTuber background bullshit with their setup behind them
#hackersroom
"We know you work with the File Format Cartel! Who is your leader?!?"
Except instead, he’s answering questions nobody asked.
I remember my first time learning about this file format trick was probably about 2005-2006 on 4chan of all places. Someone uploaded an image, it was the cover of a C++ textbook, or some C language. I can't quite remember, but what I do remember was you could download the image, and extract that exact document from it. They embedded the textbook within the image itself, and used any image hosting site to discreetly share it with people. I was blown away.
How dark should the background be?
Live overflow: YES
I really like this type of video. The explanations were really good, the quality is very high, and overall I can confidently say that I've learnt something I didn't know before!
Roger, LiveOverflow gone Dark Mode
Why use my name?
I was searching for this comment🤣🤣
I love dark mode. I wonder what % of 0.5M viewers are on OLED and how much energy was saved.
Black lives matter
I honestly like the light mode more. Not only it feels more reminiscent with what I associate with the channel, keeping it light at all times prevents eye whiplash like at 0:55 where the light fills the entire video instantly.
omggg u finally made a dark mode intro. 🙏🏿🙏🏿🙏🏿
My mind was absolutely blown away. I've never thought that the same file could be interpreted differently. This is eye-opening for me.
Aah yes! the Schrodinger's zip file.
I laughed at this a little too hard.
That cat example doesn't seem that random...
Awesome
@@thefridge6913 you can never laugh too hard :D
Schrödinger doesn't like this trick
The conversation went like this:
- WTF did you do?
- You dog...
- Zip it man!
The 010 tool is pretty awesome, reminds me of something similar I made for terminals - but more advanced and with a hex editor. Thanks for sharing that!
If anyone has seen this image going around on Discord; "Please don't open me in the browser".
Basically a 2-image PNG animation. Renamed to .zip, it gives a .mp3 file inside, containing some metadata about opening the audio in an image viewer.
Pretty cool, took half of the day to get to the end of it.
Yes. I'm not the only one. I just posted this video in the discord server.
Looks interesting. Can you post a link here?
@Nigel YING No. It's not quite the end. Try to do "file" on the happiness file
I saw the one which only works on VLC (not the UWP version)
LiveOverflow 2016 - finding a parser differential in loading ELF
LiveOverflow 2020 - what is a file format
just joking. top notch stuff I didn't know.
This whole channel is a blessing in my Cybersecurity journey! thankyou soo much for creating such level of content...
Don’t listen to the weirdos. Loving the low light setup. I see you clearly and nothing else. Minimal and clean
Files being source code is so blatantly obvious I never thought of it, but when you pointed it out it instantly made sense how one could play with the file.
and especially when you showed that ansicphpbash "file format" :)
How many of us have struggled with one piece of code trying to pass a bit of other code as a string/variable, or whatever, only to realize we forgot to reformat it so that the thing we're trying to pass is not being interpreted as actual code.
Wow, you make it so easy for people to understand these complex topics!
LiveOverFlow: hiding files in files is not fun
justCTF: yes
010's template feature is the best one I have seen yet in any hex editor. It's really useful for reversing proprietary file formats.
Nothing beats Hex Editor Neo. Unfortunately it's not free.
Presentation is excellent, no background and you in the center explaining and gesticulating is a very good idea.
Ideal video to start reading my digital forensic course. As if you know I am procrastinating
love the bit about CTF at the end -- that makes so much sense!
This is great, the education was great, and the whole building up to the message about the CTFs was hilarious, but I 100% agree.
I'd like to point out some flaw in the video
1. The Person A and Person B analogy, rather than just "liking" it should've been "only knowing" or "ignoring except". PDF program would read the PDF code, and not the ZIP code, and the other way around for Zip programs
2. Rather than changing the file name extensions, you could probably just run the file with the program right away. Though I'm not sure since I haven't tried it, but I'm sure it would run as is.
Anyway, great video as usual, thanks for sharing this information with us!
loving the new brand design
I remember being blown away by file extensions when I played DDLC
Dont We All...................
just monika
ngl LiveOverflow should check out/play games that have neat tricks like what ddlc does, and im pretty sure there are obscure as heck ones out there.
love the example with the "Town Musicians of Bremen" :)
You can do this in windows with the copy command and the /B switch for binary
"copy /B picture.jpg+folder.zip new.jpg"
I learned this when I heard that a promotional desktop wallpaper for Portal had an Easter egg in it. If you opened it as an archive the ending song "Still Alive" mp3 was in there. This was a triumph!
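The same byte-level concatenation can be sketched in Python. This is only an illustration under assumptions: the "picture" is a few placeholder bytes starting with the JPEG start marker rather than a real image, the archive is built in memory, and the file name `still_alive.txt` is made up for the demo.

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Build a small ZIP archive in memory from a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Placeholder bytes standing in for picture.jpg (JPEG start marker plus filler).
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 64
archive = make_zip({"still_alive.txt": b"This was a triumph!"})

# Equivalent of: copy /B picture.jpg+folder.zip new.jpg
combined = fake_jpeg + archive

# Python's zipfile locates the archive from the end of the data,
# so the image bytes in front do not stop it from listing the entry.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    payload = zf.read("still_alive.txt")
```

The result still begins with the image bytes, so an image viewer that stops at the end of the picture data would plausibly show the picture, while an archive tool that reads from the end sees the ZIP.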
Love the musicians of bremen image and the new videos format!
LiveOverflow: „You don’t want to do PDF by hand“
Me: *cries in LaTeX*
i feel you!
What is Latex, some condom ingredient?
A typesetting tool.
LaTeX is fucking incredible. Also if you like meme ways of writing your documents check out groff/troff. Much simpler than LaTeX.
Or, just start converting markdown/emacs org-mode to LaTeX. That takes the pain out of it. I wrote all my Bio notes in org-mode then compiled it into a final LaTeX document without much trouble.
LaTeX by hand is much easier than PDF by hand.
Great video, I love thinking of zip as interpreter for .zip "source code"! I also just love the concept of weird machines
Recently, for educational purposes, I've written a couple of image file formats and also I'm writing an interpreter, so this is right up my alley :)
I like this channel a lot :) I really liked how you explained that zip files actually program zip to make a file, rather than "contain" data. You could have brought in zip-bombs at this point, because then a 1Kb file making a 42Gb file kind of shows how it's generative. Having said that, I think you made one point unclear, which was that programs sometimes "ignore" bytes they don't understand, like scanning for a dog and not seeing the cat. PDF is weird in that it looks for its magic sequence anywhere in the file, ignoring the zip at the beginning. ZIP, for example, won't do this. Python won't do this. etc etc. Nevertheless, through use of commenting, which is like programming zip to deliberately ignore code, you make polyglots. Polyglots almost always use commenting. If commenting wasn't possible, making polyglots would be WAY harder! Programs don't typically ignore anything, unless you trick them into it.
Also this whole video strikes at the heart of a big problem in Europe, what does data privacy/security/illegal information actually mean? What if a picture file, for example, looks like a beautiful sunset in one image viewer application, but child-pron in another. Is the *file* child-pron, or is the image-viewer *making* child-pron when it's displayed? Or both? Do you need to have both on your computer to break the law? TL;DR we all have child-pron and state-secrets on our computers, sometimes in the same file, we just don't have the software to view it.
Wow your production level got upgraded so much since the last time I watched your vids. Looks even better than before.
Wish YT algo would push more of your content to my Recommended..
Now I will go on and "manually" watch your latest stuff.
Super interesting content and approach to digital security
one of my favorite channels on the subject for me as a developer/programmer
Keep it up :)
A true hacker spirit, reminds me of my youth. It pleases me to see young talents, there are so few of them, while I thought 30 years ago that there would be countless hackers far better than us in the future. It never happened, everything has gone down, so these videos are refreshing.
the thing that went in my head when I see 6:26 is C
I didn't realise there's php and bash until you told us
wow! this new format is fantastic ... not to mention the video quality! spot on man!
Loved the format, the topic, the explanation...
Loved everything, great video
The quality of this video is good! Keep it up man! Really appreciate your work.
This gives me new appreciation for the mantra: parse don't validate. If you just look for what you are expecting, you might admit more than you were bargaining for.
So cool to see 010 Editor mentioned! I love it :)
11 minutes in
"I don't know exactly why the pdf isn't shown in the zip file"
Dude that's literally the only reason I watched this video.
The pdf file is probably contained in a place where the zip program doesn't check, and pdf headers don't need to start at the beginning
@@petey5009 Since the PDF is in the Zip record it is probably checked by the Zip program. But since it has no filename it is simply not displayed.
He does know why, just not precisely what the reason is in this particular case. In this particular case it could be anything, depending on the way the zip format is there might be many ways to hide the pdf. You just need to find a way to make the information redundant, like making things a comment in the earlier polyglot C php bash etc example. :)
That's because the PDF file was not zipped, its contents were just combined with the contents of the zip file in the final generated file.
So when the zip program reads the final file, it encounters the only thing that was zipped in that file: the text file.
It's not that hard to get 🙂
@@erickcardozo462 You should watch the video.
Love the new style of videos. Also the video quality is absolutely stunning, so well produced!
Great video. I recalled how many students were amazed when I had students extract an image from a PDF in my seminar course (a course where students teach the class) where I talked about stegosploit. Makes you think what files could be hidden in a PDF.
However, that was only the start of it because the toolkit created by Saumil Shah, the person who created Stegosploit, hid the toolkit inside the image. So you had to rename the image extension to HTML and open it in the browser to obtain the toolkit. I was also very shocked when I first was researching the topic.
No, this type of CTF challenge is not at all annoying. In fact I solved this type of challenge yesterday and learned a lot about file extensions, and that's how I reached this video. You explained everything perfectly. Thanks
I Love that your Videos can be watched by IT guys but still be understood by beginners :)
Great lighting ! Really striking how you made that work despite usually commenting without a camera feed
This is really interesting, I’ve never heard of polyglot computing! This kinda reminds me of steganography.
Because steganography often uses this as the method of embedding data. That said, not all cases of this are steganography, for example, a self-extracting archive is seen by the shell as an executable file but as an archive by the archive program. (An archive program is usually involved with these.)
Greetings from Bremen. Thanks for the interesting Video. As a linux user and C programmer not much was new for me but it was still interesting to listen to you explaining in detail
That was very informative and entertaining. Learn something new today. Thanks :D
Thank you for sharing this knowledge, we truly live in the age of information!
Huge thank you, I was really hoping to find a video like this.
You couldn't do it better than this
THANK YOU SO MUCH FOR THE DARK MODE LOGO!
"grew up with a commandline" yes, you could say that i grew up when i started using linux a few years ago
I came to this video thinking I would just reaffirm my knowledge about files, but then I learned a lot of new stuff. Thanks
Great explanation LiveOverflow .
This is one reason why simple things like the `file` command in Linux are *so useful*
Damn! This is so cool. At first I thought it was about magic bytes but you never fail to surprise!
I think the CTF about finding the hidden stuff in a file would be a great challenge for a stego CTF. And it is valuable experience in identifying stego. And if you're getting into information security, stego experience is very important.
I am glad I actually understood the statue reference!
Thanks, I'll bring this example up at my Formal Language Theory classes. This is a fun way to talk about the intersection of formal languages. :-)
What an outstanding explanation. Just mind blowing. Keep it up man!
2:38
There is magic though! At the first bytes of the file
Love the enthusiasm and great content as always! Keep these coming :)
That was interesting to know! Thanks for making a video about this.
I always used to think that these formats are "strict" as in they wouldn't allow unknowns. Turns out they do and you can play tricks with them.
_[HTML without DOCTYPE has entered the chat]_
A ZIP file's 'header' is located at the end of the file and it includes the list of files in the archive. Any file records which are not specified in that header at the end are simply ignored. This way you can easily modify a ZIP archive by appending data at the end of it (e.g. you can remove files from the list, add new file records and new entries in the list etc.).
This also means that you can just append a ZIP file at the end of another file, and the result will likely work as both. A ZIP program will look at the end of the file for the list of files to extract and will completely ignore whatever is at the beginning of the file. This is how self-extracting archives work.
Your zip tool might warn you if there is some data at the beginning that it doesn't expect, but that depends on the tool. The way the trick works with PDF is that the PDF header does not need to be located at the very beginning of a file. This means that one can add a prefix which will make ZIP think it is a file record, but because the file record is never referenced in the header at the end, the tool will just ignore it. On the other hand, a PDF reader will ignore the prefix while looking for the PDF header.
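The 'header at the end' described above is the end-of-central-directory record. A rough sketch of locating it follows, with some simplifications that are assumptions of this example: no ZIP64, a single-disk archive, and a plain search for the last signature (a real tool has to be more careful, since an archive comment could contain the same bytes). `find_eocd` is a hypothetical helper name.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, disk no., disk with directory, entries on disk, total entries,
# directory size, directory offset, comment length = 22 bytes
EOCD_FORMAT = "<4sHHHHIIH"


def find_eocd(data: bytes):
    """Return (record position, total entries, directory size, directory offset)."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    size = struct.calcsize(EOCD_FORMAT)
    fields = struct.unpack(EOCD_FORMAT, data[pos:pos + size])
    _, _, _, _, total_entries, cd_size, cd_offset, _ = fields
    return pos, total_entries, cd_size, cd_offset


# Build a one-file archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
plain = buf.getvalue()

eocd_pos, entries, cd_size, cd_offset = find_eocd(plain)

# Put unrelated bytes in front, as with a self-extracting stub or a PDF prefix.
prefix = b"%PDF-1.4 placeholder prefix\n"
prefixed = prefix + plain
p_pos, p_entries, p_size, p_offset = find_eocd(prefixed)
```

In the prefixed data the record simply moves further back while the offset stored inside it stays the same, so it no longer matches the real position of the directory. That mismatch is a plausible reason why some tools complain about extra bytes at the beginning and others quietly compensate.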
Very interesting, it really works, no special tools required, just with: cat tmp.zip >> random.pdf
Then random.pdf still works as a PDF and can also be read as a ZIP in my file manager (Dolphin) if renamed. But not in other archive tools, which say that it is not a supported archive, so those would need the prefix trick as well. "file" reports it as PDF, because it reads the magic number at the start, which of course was not modified by simply appending.
@1Hippo, it may be ignorance or security concerns that keep some tools from opening such archives. Info-ZIP will complain about extra bytes but otherwise will just go ahead and extract files correctly.
By the way, some libraries might offer a ‘streaming’ interface for reading archives (e.g. Java’s ZipInputStream) which will read the file entries as they appear in the data ignoring the directory at the end.
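To illustrate the difference between the two reading strategies, here is a rough sketch in Python. It is a hand-rolled walk over local file headers, not Java's `ZipInputStream`, and the helper names (`make_zip`, `central_directory_offset`, `stream_names`) are made up for this example. It assumes uncompressed entries, no archive comment, and no data descriptors, which holds for the toy archives built here.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
# Fixed 30-byte part of a ZIP local file header.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def make_zip(name: str, text: str) -> bytes:
    """Build a one-file, uncompressed ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def central_directory_offset(data: bytes) -> int:
    """Read the directory offset from the 22-byte end record.

    Assumes the archive has no trailing comment (true for make_zip output).
    """
    (offset,) = struct.unpack_from("<I", data, len(data) - 22 + 16)
    return offset


def stream_names(data: bytes) -> list:
    """Walk local file headers from the start, ignoring the directory."""
    names, pos = [], 0
    while data[pos:pos + 4] == LOCAL_SIG and pos + LOCAL_HEADER.size <= len(data):
        fields = LOCAL_HEADER.unpack_from(data, pos)
        flags, csize, name_len, extra_len = fields[2], fields[7], fields[9], fields[10]
        if flags & 0x08:
            break  # sizes are in a trailing data descriptor; not handled here
        start = pos + LOCAL_HEADER.size
        names.append(data[start:start + name_len].decode("utf-8"))
        pos = start + name_len + extra_len + csize
    return names


ghost = make_zip("ghost.txt", "not listed")
listed = make_zip("listed.txt", "listed")

# Keep only ghost's file record (drop its directory), then append a full archive.
combined = ghost[:central_directory_offset(ghost)] + listed

with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    directory_names = zf.namelist()

print(directory_names)         # ['listed.txt']
print(stream_names(combined))  # ['ghost.txt', 'listed.txt']
```

The directory-based reader only sees the entry listed at the end, while the front-to-back walk also finds the unreferenced record.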
I can totally relate to this. Grew up with windows but entirely switched to Linux like 8 years ago. I have a completely different understanding for the filesystem now.
Over the last twelve years, I have tried to switch over to Linux no less than 6 times. I am driven insane by trying to deal with obscure problems, and have to turn back to windows every time. Perhaps it's just because I'm not a coder and am just a power user. But if a power user like me can't take the frustration of Linux, I can't imagine normal people ever being able to take it.
@LeoStaley if you are a windows user who needs windows it will be difficult I guess. I'm a software engineer, so I have a big benefit from using Linux and only downsides when using windows so that made the decision easy for me.
@LeoStaley never fully switch to linux, dont get tricked by the masochistic nerds
This video blew my mind ^^ Thanks a lot for sharing and teaching this ! Keep doing your awesome work !
For a second there I thought this was some bare entry-level tutorial ... saw the channel name ... Wait a second 😂
Nice video btw. I was thinking about something similar and it's nice to know that it has been already researched (cus I'm lazy as fuu)
I love how you turned on dark mode in the intro
Love this filming setup
I'm digging the dark theme! IDK if it's new or not, I haven't watched ALL of your videos ( yet )
Awesome video man...
Very informative.
Something I didn't quite understand: Wikipedia says that a polyglot "performs the same operations or output independent of the programming language used to compile or interpret it", but you showed that after "fusing" files with Mitra and renaming the result, the PDF and the TXT showed different things, which isn't the same output. Isn't this a fault on Wikipedia's side? Very cool and informative video! Thanks as always LiveOverflow
I wouldn’t call it a fault. I would just say that some terms have slightly different meanings depending on who you ask
A really good video as always. The overall quality looks really sharp. I would prefer to have windows in dark theme to better match the rest of the video
"YOU decide what to open the file with"
xdg-open: exists
Yo, thanks.
Really like the new bg.
Epic dark mode intro!
Ohhhhh man... I love your talks🔥🔥🔥🔥 The last part was funny: "please don't give such kinda CTFs"
Thank you for dark mode!
Very informative thanks 👍🙏
Great explanation!!
I remember my frustration when first switching from Windows to Ubuntu for work projects. I didn't understand how the Ubuntu file system structure worked, how I should manage individual files, and how to work with them. I asked people questions like "Where should I install programs in Ubuntu?" and similar. At that time I thought to myself, "Gosh, Windows seems like a much cleaner system, everything is neatly organized, I have a dedicated folder for Program Files and the only thing I should do is click shortcuts". But after learning the Ubuntu FS layout, understanding how PATH actually works and what it is intended for, and picking up a lot of other tips and tricks, Windows FS principles feel rather restrictive. Although I now daily-drive Windows for home and work stuff (Windows made MAJOR progress towards being a developer-friendly system in the last few years), I still miss some of that clean simplicity and infinite possibilities that a proper GNU/Linux system provides.
2:38 LOL at no magic joke 😀
For those new to this, the linux file format recognizer (the file command) is configured in a file called /etc/magic
Anyone ever use that file? I thought it just stayed empty and people used defaults, #!, /usr/share/applications/, or whatever.
The term "magic" comes from en.wikipedia.org/wiki/Magic_number_(programming)
Just checked on my system, /etc/magic does not exist. So at least for Arch Linux it is in /usr/share/file/misc/magic. It reads a compiled version (.mgc) first.
File exists in Ubuntu 😃
Pretty sure the joke was about the magic numbers like Sam said
It's the first few bytes of a file (the signature), which you can read to find out the format and other info, like the version of the program used to create that file
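As a rough illustration of what checking a signature means, here is a tiny Python sketch. It is not how the `file` command or its magic database is implemented; the `SIGNATURES` table and the `sniff` function are made up for this example and cover only a handful of well-known formats.

```python
# A few well-known file signatures (magic numbers) and a label for each.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def sniff(data: bytes) -> str:
    """Guess a format by comparing the first bytes against known signatures."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(sniff(b"%PDF-1.7\n..."))                 # PDF document
print(sniff(b"%PDF-1.4\n...PK\x03\x04..."))    # PDF document
print(sniff(b"just some text"))                # unknown
```

The second call shows why a PDF with a ZIP appended is still reported as a PDF: a check like this only looks at the start of the data.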
Often PDFs will be read from the end. PDFs are designed to be written linearly, but a linearly written file is inefficient to access in a random fashion. Thus, after writing out the contents of a PDF, the writer will usually append an index at the end. The offset of that index is recorded just before the end of the file, and readers will generally access that first, and then only read the parts of the file that they actually need.
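A small Python sketch of that "start from the end" lookup. The PDF bytes below are a toy skeleton built for this example (not a complete, renderable document), `find_startxref` is a hypothetical helper, and the 1024-byte tail size is an assumption. Real readers differ in how far back they search.

```python
import re


def find_startxref(data: bytes, tail_size: int = 1024):
    """Return the index offset recorded after the last 'startxref', or None."""
    tail = data[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    return int(matches[-1]) if matches else None


# Toy PDF skeleton: header, one object, then the index (xref table) and trailer.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)  # the index starts right after the body
xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
trailer = (
    b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n"
    + str(xref_offset).encode("ascii")
    + b"\n%%EOF\n"
)
pdf = body + xref + trailer

offset = find_startxref(pdf)
print(offset == xref_offset)       # True
print(pdf[offset:offset + 4])      # b'xref'
```

Only the tail of the file is inspected to find the index, after which a reader can jump directly to the objects it needs.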
This reminds me of that time I concatenated a shell script unzipping itself with a zip file to have basic self-extracting archives.
PoC||GTFO PDFs are all zip polyglots, some of the PDFs are also SNES roms and boot sectors... They also have an exploration of how it is done in the journal volumes.
Highly recommend reading that journal.
A video topic suggestion: how to make your own file format. That would give us more intuition in the topic.
love ur content man keep it up!
I'm currently creating a custom file type for my company, nearly finished it.
While I don't disagree entirely, I would say a better way to look at it is that a file type is designed, and the tools or libraries that interact with it are coded with an understanding of the designed structure.
The file type is a concept and needs code to create or read it; technically it doesn't exist beyond the concept of the structure.
In simplest terms it's a structure in which data is stored.
File formats based on extension: Windows virgin.
File format based on actual content: Unix file CHAD basedlord
MIME types: Our new web overlords
LiveOverflow has finally enabled Dark mode!
I rarely like a video. This guy got my Like. :D
Good feeling, a new video!
6:26 Here's a triple syntax-highlighting image that can help understand how this can be valid PHP, C and Bash at the same time i.imgur.com/f7a4Uqu.png
Nice! that should be uploaded to Wikipedia it would make it visually clear!
en.wikipedia.org/wiki/Help:Adding_image
en.wikipedia.org/wiki/Polyglot_(computing)
I use binwalk when dissecting a file for its contents.
Great video, thank you for doing it
Broo liveoverflow has predicted elden ring culture faster than anyone
These sorts of subjective-filetype files usually involve an archive, because archive tools ignore stuff they don't understand more readily than other types of programs, but steganography can involve most formats (though plain text is a lot harder to pull off as the container).