1 Handmade Linux x86 executables: ELF header

  • Published: Feb 10, 2025

Comments •

  • @xKaihatsu
    @xKaihatsu 3 years ago +45

    For anyone wanting to create an ELF-64 executable that just exits, here is how I did it.
    ; ELF Header
    7F 45 4C 46 ; magic number
    02 ; ELF-64
    01 ; little endian
    01 ; ELF version
    00 ; System V ABI
    00 ; ABI version
    00 00 00 00 00 00 00 ; unused bytes
    02 00 ; executable object file
    3E 00 ; x86-64 (AMD 64)
    01 00 00 00 ; ELF version
    78 00 40 00 00 00 00 00 ; entry point
    40 00 00 00 00 00 00 00 ; program header offset
    00 00 00 00 00 00 00 00 ; section header table offset
    00 00 00 00 ; flags
    40 00 ; ELF header size
    38 00 ; program header entry size
    01 00 ; program header entry count
    00 00 ; section header table entry size
    00 00 ; section header table entry count
    00 00 ; string table index
    ; Program Header (load address = 0x400000, compared to 0x08048000 on 32-bit x86)
    01 00 00 00 ; loadable segment (PT_LOAD)
    05 00 00 00 ; segment flags (read & execute permissions)
    78 00 00 00 00 00 00 00 ; program offset (ELF header size + this program header size)
    78 00 40 00 00 00 00 00 ; program virtual address (0x400000 + offset)
    00 00 00 00 00 00 00 00 ; physical address (irrelevant for x86-64)
    10 00 00 00 00 00 00 00 ; file size (just count the bytes for your machine instructions)
    10 00 00 00 00 00 00 00 ; memory size (if this is greater than file size, then it zeros out the extra memory)
    00 10 00 00 00 00 00 00 ; alignment
    ; Program
    ; Entry = 0x400078
    48 C7 C0 3C 00 00 00 ; mov rax, 60
    48 C7 C7 2A 00 00 00 ; mov rdi, 42
    0F 05 ; syscall (the x86-64 syscall instruction, which replaces int 0x80 from 32-bit x86)
    Use this assembler whenever you get stuck trying to decode instruction opcodes: defuse.ca/online-x86-assembler.htm
    I'll admit that using the 64-bit registers was unnecessary, because writes to the 32-bit registers are zero-extended into the full 64-bit registers. I want to get used to encoding them, though.
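The listing above can also be generated rather than typed. This is a minimal Python sketch under the assumption that the byte values in the comment are taken exactly as written. The names `build_exit_elf`, `LOAD_BASE`, `EHDR_SIZE` and `PHDR_SIZE` are made up for illustration and do not come from the video or the comment.

```python
import struct

LOAD_BASE = 0x400000                 # load address used in the comment's listing
EHDR_SIZE, PHDR_SIZE = 0x40, 0x38    # ELF-64 header and program header sizes


def build_exit_elf(status: int = 42) -> bytes:
    """Assemble the 136-byte ELF-64 file from the listing above."""
    code = (
        b"\x48\xC7\xC0\x3C\x00\x00\x00"                # mov rax, 60 (exit)
        + b"\x48\xC7\xC7" + struct.pack("<i", status)  # mov rdi, status
        + b"\x0F\x05"                                  # syscall
    )
    code_off = EHDR_SIZE + PHDR_SIZE   # 0x78: code follows both headers
    entry = LOAD_BASE + code_off       # 0x400078

    # e_ident: magic, ELF-64, little endian, version 1, System V ABI, padding
    e_ident = b"\x7FELF" + bytes([2, 1, 1, 0, 0]) + bytes(7)
    ehdr = e_ident + struct.pack(
        "<HHIQQQIHHHHHH",
        2, 0x3E, 1,          # e_type (executable), e_machine (x86-64), e_version
        entry, EHDR_SIZE, 0, # e_entry, e_phoff, e_shoff
        0, EHDR_SIZE,        # e_flags, e_ehsize
        PHDR_SIZE, 1,        # e_phentsize, e_phnum
        0, 0, 0,             # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1, 5,                  # p_type (PT_LOAD), p_flags (read + execute)
        code_off, entry, 0,    # p_offset, p_vaddr, p_paddr
        len(code), len(code),  # p_filesz, p_memsz
        0x1000,                # p_align
    )
    return ehdr + phdr + code
```

Writing the returned bytes to a file and marking it executable should give the same program as the hand-typed version; on an x86-64 Linux machine it would be expected to exit with status 42.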

    • @davidsmith7791
      @davidsmith7791 3 years ago +6

      Very nice! It runs. The file command reports a corrupted section header size in the binary. You can deal with this by changing the section header entry size (e_shentsize) from 00 00 to 40 00.
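For anyone who wants to apply that fix programmatically, here is a small Python sketch. It assumes the standard ELF-64 header layout, in which e_shentsize is the 16-bit field at file offset 0x3A; the helper name `set_shentsize` is hypothetical. The value 0x40 is the size of one ELF-64 section header entry.

```python
import struct

E_SHENTSIZE_OFFSET = 0x3A  # offset of e_shentsize within an ELF-64 header


def set_shentsize(elf: bytes, size: int = 0x40) -> bytes:
    """Return a copy of `elf` with e_shentsize set to `size` (little endian)."""
    patched = bytearray(elf)
    struct.pack_into("<H", patched, E_SHENTSIZE_OFFSET, size)
    return bytes(patched)
```

Only two bytes change, so the entry point and program header are untouched.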

    • @Cowboy8625
      @Cowboy8625 2 years ago

      That disassembler is priceless for learning this! That was an extreme help! Thanks for sharing!

  • @i4xa
    @i4xa 4 years ago +68

    This explanation is amazing. I watched the whole video and only afterwards I realized it has 66 views atm. This deserves much more attention.

    • @mattiviljanen8109
      @mattiviljanen8109 3 years ago +1

      I'm watching this now with 5555 views.

    • @godnyx117
      @godnyx117 3 years ago

      15k now, but still too few for this gem!

  • @FoxLivestreams
    @FoxLivestreams 3 years ago +28

    Ok man, you are hella underrated.
    1. Descriptive
    2. Great word flow
    3. Great video flow
    4. Interesting content
    Keep it up!

  • @lucianoosinaga2980
    @lucianoosinaga2980 3 years ago +13

    this reminds me of that old meme picture of a guy programming with a keyboard with just two big 1 and 0 keys lol Thanks for this video, please do more of these!

  • @1vader
    @1vader 3 years ago +25

    Awesome video! Though you should be able to shave off a few more bytes from the instructions by using xor etc. instead of loading numerical constants. But I guess that might be a little bit harder to understand and those 5 or so bytes probably don't matter.
    For those interested, I think this is the shortest x86 assembly to set eax=1 and ebx=0:
    0: 31 c0 xor eax, eax
    2: 40 inc eax
    3: 31 db xor ebx, ebx
    "xor reg, reg" is a classic trick to zero a register in very few bytes (since it doesn't require a 4-byte immediate constant zero) and it also used to be one of the quickest ways. I'd be surprised if it's actually faster than "mov reg, 0" on modern hardware, but compilers still use it when you do something like "int a = 0" in C code.
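To make the size difference concrete, here is a short Python sketch that compares the two encodings. It assumes the video's original 32-bit program is `mov eax, 1; mov ebx, 0; int 0x80`, which is inferred from other comments in this thread rather than confirmed. The names `MOV_VERSION`, `XOR_VERSION` and `bytes_saved` are illustrative only.

```python
# Assumed original program (inferred from the thread, not from the video itself).
MOV_VERSION = bytes.fromhex(
    "b801000000"  # mov eax, 1
    "bb00000000"  # mov ebx, 0
    "cd80"        # int 0x80
)

# The shorter sequence from the comment above. The one-byte 0x40 encoding of
# "inc eax" is valid only in 32-bit mode; in 64-bit mode 0x40 is a REX prefix.
XOR_VERSION = bytes.fromhex(
    "31c0"  # xor eax, eax
    "40"    # inc eax
    "31db"  # xor ebx, ebx
    "cd80"  # int 0x80
)


def bytes_saved() -> int:
    """Number of bytes the xor/inc version saves over the mov version."""
    return len(MOV_VERSION) - len(XOR_VERSION)
```

Under that assumption, the program shrinks from 12 bytes to 7.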

    • @davidsmith7791
      @davidsmith7791 3 years ago +8

      Yes, good point. As you say, I am keeping our instruction set small for the benefit of beginners.

  • @ushiocheng
    @ushiocheng 3 years ago +16

    When someone goes to write pure mach in 2021, I would call you the master 👍

    • @daveshouldaine2520
      @daveshouldaine2520 3 years ago +2

      Sorry for the probably stupid question, but what does "mach" mean in this case?

    • @ushiocheng
      @ushiocheng 3 years ago +3

      @@daveshouldaine2520 Abbreviation of machine, referring to machine code, which is what he is writing.

    • @daveshouldaine2520
      @daveshouldaine2520 3 years ago +2

      @@ushiocheng thank you very much!

  • @Jennn
    @Jennn 2 years ago +1

    Holy Crap. You made this so easy to understand and straightforward. My YouTube algorithm changed and is showing me 1 new channel per 8 feeds. I try to visit the new ones as much as possible, and I got lucky again today. Yay World~!

  • @cj_ayho
    @cj_ayho 4 years ago +22

    maybe gcc binary will be smaller if you strip it?

    • @davidsmith7791
      @davidsmith7791 3 years ago

      Brian Raiter goes down that path in this nice lecture: www.muppetlabs.com/~breadbox/software/tiny/techtalk.html

  • @AnimeLover-su7jh
    @AnimeLover-su7jh 2 years ago +2

    All I need to know is that you read the technical specifications and provided the links to them. That is enough to earn a like and a sub.

  • @deniismailov1782
    @deniismailov1782 3 years ago +7

    Great work! Keep it up man!

  • @pauloconci4196
    @pauloconci4196 3 years ago +42

    When assembly is too bloated lol

    • @gSys1337
      @gSys1337 7 months ago +1

      Assembly isn't bloated. Compilers are bloated.

  • @rathnec
    @rathnec 3 years ago +1

    I liked your approach of knowing it to the core!!

  • @AJMansfield1
    @AJMansfield1 3 years ago +8

    This is really cool! Also, there's at least 16, possibly 20 more bytes you can shave off that I've spotted:
    First, you can save 6 bytes by using more compact instruction opcodes in the program. Instead of the five-byte "mov eax, 1;" you can instead do "xor eax, eax; inc eax;" (with those instructions encoding as `31 c0` and `ff c0` respectively) and then likewise instead of the five-byte "mov ebx, 0;" you can use just "xor ebx, ebx;" (encoded as `31 db`). Note that xoring a register with itself is actually the preferred way to zero a register in x86 and those opcodes normally execute in zero cycles because internally they actually just trigger a register rename.
    Next, you can trim off another 8 bytes by packing two of those opcodes into the e_ident[EI_PAD] region. Place the `31 c0 ff c0` starting at offset 0A, then add an additional two-byte relative jump encoded as `eb 49` to jump to the remaining `31 db cd 80` back in the "normal" program area at offset 59. (Adjust your entry point field and all that to match, of course.)
    Two more bytes can then come off by aliasing e_shstrndx over top of the first two bytes of p_type - just set those offsets so that those regions overlap; AFAIK the section name string table index doesn't actually do anything at runtime and can be whatever value you want.
    And I haven't spotted precisely where yet, but it's probably possible to save four more bytes if you can find another existing unused bit of header area to pack the last two opcodes `31 db cd 80` into. That way, you could have an ELF file with actually _no_ dedicated program section at all, just by reusing existing unused header regions. Maybe p_paddr could be used this way?
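The jump arithmetic in the second step can be checked with a few lines of Python. This is only a sketch of how a short jump (`EB rel8`) resolves its target, assuming the four opcode bytes sit at offsets 0A to 0D so that the jump itself starts at offset 0E. The function name `rel8_jump_target` is made up for illustration.

```python
def rel8_jump_target(jmp_offset: int, rel8: int) -> int:
    """Target offset of a 2-byte short jump (EB rel8) located at jmp_offset.

    The displacement is a signed byte, counted from the end of the jump
    instruction, which is why 2 is added to the jump's own offset.
    """
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return jmp_offset + 2 + disp
```

With the jump at offset 0E and a displacement byte of 49, the target works out to offset 59, matching the figure quoted in the comment.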

    • @protonjinx
      @protonjinx 3 years ago +3

      xor ebx, ebx
      lea eax, [ebx+1]

    • @AJMansfield1
      @AJMansfield1 3 years ago +4

      @@protonjinx Oh, nice, that shaves off one more byte, now the entire program can fit into the padding region without needing to splice it with a jump!
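Whether the program fits can be checked by counting bytes. The Python sketch below assumes the current ELF layout, where e_ident is 16 bytes long and its padding starts at index 9 (EI_PAD), and it assumes the loader ignores those padding bytes. The names `PROGRAM` and `fits_in_padding` are hypothetical.

```python
EI_NIDENT = 16  # total size of e_ident
EI_PAD = 9      # index of the first padding byte (assumed current layout)

# The xor/lea program from the replies above, followed by the system call.
PROGRAM = bytes.fromhex(
    "31db"    # xor ebx, ebx
    "8d4301"  # lea eax, [ebx+1]
    "cd80"    # int 0x80
)


def fits_in_padding(code: bytes, start: int = EI_PAD) -> bool:
    """True if `code`, placed at offset `start`, ends within e_ident."""
    return start + len(code) <= EI_NIDENT
```

The program is seven bytes, which exactly fills offsets 9 through 15. Starting at offset 0A, as in the earlier suggestion, would leave only six bytes, so the fit depends on using the padding from offset 9.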

  • @jolex_nerd8132
    @jolex_nerd8132 9 months ago

    If you want to save some bytes,
    instead of:
    mov eax, 00000001h
    mov ebx, 00000001h
    int 80h
    you could use:
    mov eax, 00000001h
    mov bl, 01h
    int 80h
    which saves a few bytes of immediate operand, since exit codes are taken modulo 256 anyway.
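The point about exit codes keeping only their low byte can be observed from Python. This is a small sketch that assumes a POSIX system such as Linux; the helper name `observed_exit_status` is made up for illustration.

```python
import subprocess
import sys


def observed_exit_status(code: int) -> int:
    """Run a child Python process that exits with `code`; return what the parent sees."""
    result = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return result.returncode
```

On Linux, a child that exits with 298 is reported to the parent as 42, because only the low 8 bits of the status survive (298 = 256 + 42).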

  • @coolandsmartrr
    @coolandsmartrr 3 years ago +2

    Looking forward to the next video!

  • @triularity
    @triularity 3 years ago +1

    I remember back, a few decades ago, I used a graphical text editor (which could mostly edit existing binary content without corrupting it, aside from adding a newline to the end of file) to merge several static GIF files together and create an animated GIF. I think I also used awk to generate custom binary bytes to copy/paste into the text editor. Luckily, the GIF format didn't seem to care about that erroneous newline tacked on the end of file.

  • @2005kpboy
    @2005kpboy 3 years ago +1

    That's a brave, creative and novel attempt...

  • @alik250
    @alik250 4 years ago +2

    This was so cool

  • @leonhrad
    @leonhrad 4 years ago +2

    really cool video :)

  • @Dude29
    @Dude29 3 years ago

    Very well done!

  • @t74devkw
    @t74devkw 3 years ago +2

    Damn you're underrated

  • @Jennn
    @Jennn 2 years ago

    Thank you boys

  • @TheJackal917
    @TheJackal917 4 years ago +3

    I never understood a word, but I think your vids are helpful, especially for someone like me, who is considering moving from Windows. Thanks! I wish you many subs.

    • @davidhusicka8440
      @davidhusicka8440 3 years ago

      This explains how "exe" files work internally on Linux. How is this useful to someone who wants to switch?

    • @TheJackal917
      @TheJackal917 3 years ago

      @@davidhusicka8440 oh boi. Now how can I explain obvious things?

    • @totally_not_a_bot
      @totally_not_a_bot 3 years ago +1

      I mean, this video shows that vi is a thing? And some basic shell scripting? Other than that, that you can kinda do whatever you feel like on Linux without much restriction provided you have the knowledge. So yeah, Linux is nifty. If you're particularly attached to any Windows-exclusive software I'd give it a pass, but otherwise, load up a virtual machine and give it a spin. You might like it.

    • @TheJackal917
      @TheJackal917 3 years ago

      @@totally_not_a_bot games, man. Games. But even more so, it's privacy and security.

  • @isaackay5887
    @isaackay5887 3 years ago +3

    I feel like I just watched a StackOverflow explanation

  •  3 years ago +6

    I only make executables by hand, just like my grandpa. /s

  • @rathnec
    @rathnec 3 years ago +1

    wow!!

  • @Borodinskyy
    @Borodinskyy 1 year ago

    i have been wanting to do something for a while, first time i have found what i was looking for since most times i search for it i just get assembly stuff

  • @neilmeich
    @neilmeich 11 months ago

    nice

  • @jacquesquipere
    @jacquesquipere 2 years ago

    Ok but gcc hello.c fails immediately with studio.h: No such file or directory.

  • @happygimp0
    @happygimp0 3 years ago

    You can edit binary files with bvi, no need to use xxd

  • @ryanhaart
    @ryanhaart 4 years ago +2

    How to link in libraries?

  • @mikolajkozakiewicz1070
    @mikolajkozakiewicz1070 3 years ago

    🥰

  • @623-x7b
    @623-x7b 3 years ago +1

    It would be amazing if you guys covered ultimate doom's .wad file format - I want to make a random level generator for Doom but lack the skills and the time.

  • @der.Schtefan
    @der.Schtefan 3 years ago +4

    Don't you think that if somebody is interested in this video, he would know what a hex dump is?

    • @davidsmith7791
      @davidsmith7791 3 years ago +2

      I meant for the discussion beginning around 0:30 to be enough definition of hex dump for our purpose.

    • @iwikal
      @iwikal 3 years ago +1

      @@davidsmith7791 I think OP meant the opposite; anyone who would potentially be interested in this video probably already knows what a hex dump is. I disagree with the sentiment, though. I think it's great that you gave this brief explanation, on the off chance that someone doesn't know.

    • @davidsmith7791
      @davidsmith7791 3 years ago +4

      @@iwikal Oh, you are right. Thanks. I hope this video series is accessible even to those who know only a little about programming.

    • @aylen7062
      @aylen7062 3 years ago +1

      I'm interested in this video and a beginner who didn't know what it was. xD

    • @aylen7062
      @aylen7062 3 years ago +2

      @@davidsmith7791 It was, for me. Thank you!

  • @albertvanderhorst4160
    @albertvanderhorst4160 1 year ago +1

    In ciforth (lina/wina/xina) I use a slightly different approach. The Forth is created by an assembler (fasm/gas). I compile a program (lina -c hello.frt) to a runnable binary that executes a word such as HELLO. That is an application of SAVE-SYSTEM, and I merely have to patch the header of the original lina. So I avoid generating an ELF header, leaving it to tools that know how to do it. I don't analyze all the fields, merely the ones I need. Fasm is ideal; unlike the gcc tools, it doesn't generate a plethora of sections that may help dbg. The -c option takes merely 2 screens in the library, including SAVE-SYSTEM.

    • @davidsmith7791
      @davidsmith7791 1 year ago

      Thank you for the FASM recommendation. That's new to me.
      In order to patch an existing ELF header, you have to know which fields to change. It is easy to imagine that I fail to change some field properly; and then something goes wrong when my data space grows over 1 million bytes. Or perhaps I never encounter such a problem, but every time a mysterious bug arises, I wonder whether I have written a bad ELF header. It is freeing to see a Linux executable defined in terms of the sequence of bytes of the file rather than in terms of tools people often use. I never get a good answer when I ask the tools, "what exactly are you doing?"

  • @l2ubio
    @l2ubio 3 years ago +1

    talking about compression...you compressed a lot of information in 11 minutes of video

  • @elementiro
    @elementiro 3 years ago

    𝕤 𝕖 𝕧 𝕖 𝕟