Linux Technical Debt: A Visual Explanation (Directory Entries/struct dirent) - Jody Bruchon Tech

  • Published: Aug 23, 2024
  • This is a technical explanation of why the "record length" used in Linux directory entries is useless nonsense and why having the length of the actual file name instead would be much more useful. In 1995, Linus Torvalds blasted the "d_namlen" extension to a struct dirent, saying "d_reclen" made a lot more sense. He was wrong. He was also young and still learning a lot, so I don't blame him for making this mistake. Unfortunately, this bad decision became a source of "technical debt" in Linux (tech debt is when something is put in place earlier in a project that's not the best way to do something and other things get built on top of it, making it harder and harder to fix or replace it later).
    SUPPORT LINKS
    PayPal: paypal.me/Jody...
    Ko-Fi: ko-fi.com/L3L0...
    Liberapay: liberapay.com/...
    SubscribeStar: www.subscribes...
    Patreon: / jodybruchon
    MY OTHER RUclips CHANNELS
    Jody Bruchon: / jodybruchon
    Gazing Cat Productions: / @gazingcatproductions
    Jody Bruchon's Stock Footage and VHS Archive: / @jodybruchonstockfootage
    FOLLOW ME ON OTHER PLATFORMS
    Telegram: t.me/Jody_Bruchon
    BitChute: www.bitchute.c...
    Odysee: odysee.com/@Jo...
    Rumble: rumble.com/c/J...
    RUclips: / jodybruchon
    Brighteon: www.brighteon....
    Dailymotion: www.dailymotio...
    Minds: www.minds.com/...
    Locals: jodybruchon.lo...
    MY WEBSITES
    Personal/programming site: www.jodybrucho...
    Video production site: www.gazingcat.com/
    Computer repair site: nctritech.com/
    jdupes Duplicate File Finder: www.jdupes.com/
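
The description's point about record length versus name length can be sketched in C. This is a minimal illustration, assuming a Linux-style `struct dirent`; `struct dir_stats` and `scan_dir` are hypothetical names made up for this example, and the `d_reclen` part only compiles in where the libc defines `_DIRENT_HAVE_D_RECLEN`.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical summary type for this sketch (not part of any libc). */
struct dir_stats {
    long entries;       /* number of entries returned by readdir() */
    long name_bytes;    /* sum of strlen(d_name) over all entries */
    long reclen_bytes;  /* sum of d_reclen, where the field exists */
};

/* Walks one directory and fills *out. Returns 0 on success, -1 on error. */
int scan_dir(const char *path, struct dir_stats *out)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    out->entries = 0;
    out->name_bytes = 0;
    out->reclen_bytes = 0;

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* No d_namlen here, so the name is scanned for its NUL byte. */
        out->name_bytes += (long)strlen(ent->d_name);
#ifdef _DIRENT_HAVE_D_RECLEN
        /* On Linux, d_reclen covers the fixed header, the name, its NUL
         * and alignment padding, so it does not give the exact name
         * length back. */
        out->reclen_bytes += (long)ent->d_reclen;
#endif
        out->entries++;
    }
    closedir(dir);
    return 0;
}
```

Calling `scan_dir(".", &stats)` and comparing `reclen_bytes` with `name_bytes` shows how much of each record is header and padding rather than name; the `strlen` call is the per-entry scan that a stored name length would make unnecessary.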

Comments • 25

  • @Error42_
    @Error42_ 1 month ago +3

    I have noticed on Linux dumping a listing of many files and folders to a text file seems much slower than it is to do on Windows. I was wondering why that would be the case, perhaps this explains it to some degree. This is even noticeable on an SSD.

    • @JodyBruchon
      @JodyBruchon 1 month ago +3

      @Error42_ That depends heavily on how you're doing it. Some ways can result in massive forking or reopening of the output file. I'd do it with
      find -type f > list.txt
      But you can do it extremely slowly with
      find -type f | while read -r X; do echo "$X" >> list.txt; done
      This makes the shell open, seek to the end of, write to, and close list.txt for every entry, and if echo isn't a shell builtin it also costs a fork-exec every time.
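
The difference between the two pipelines above can be mirrored in C as a rough analogy. This is a sketch, not what the shell literally executes: `write_list_once` and `write_list_reopen` are hypothetical helper names, and stdio's append mode stands in for the shell's `>>` redirection.

```c
#include <stdio.h>

/* Like `find ... > list.txt`: open once, write every line, close once. */
int write_list_once(const char *path, const char *const *names, size_t n)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%s\n", names[i]);
    return fclose(out) == 0 ? 0 : -1;
}

/* Like `... >> list.txt` inside a loop: reopen in append mode per line. */
int write_list_reopen(const char *path, const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        FILE *out = fopen(path, "a");   /* open, positioned at end of file */
        if (out == NULL)
            return -1;
        fprintf(out, "%s\n", names[i]);
        if (fclose(out) != 0)           /* flush and close every time */
            return -1;
    }
    return 0;
}
```

Both functions produce the same file contents when the target starts out empty or missing, but the second one pays for an open and a close on every entry, which is where the slowdown described above comes from.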

  • @mathis8210
    @mathis8210 1 month ago +1

    Love the professionalism of just scribbling the stuff down chaotically on a paper. :)

    • @okaravan
      @okaravan 1 month ago +1

      Many years ago I searched for good technical drawing software, and after a long search came to the conclusion that it doesn't exist. Nothing surpasses a piece of physical paper with a pencil and an eraser. And the process of physical drawing activates the technical imagination like nothing else does.

  • @vfjpl1
    @vfjpl1 1 month ago +1

  • @s.b.8704
    @s.b.8704 1 month ago

    “Technical debt” is an understatement. Beyond the missing name length field, there is no reason in 2024 to deal with an API that exposes all these details, as if we didn't have optimizing compilers capable of optimizing away the abstractions (opaque types) needed to hide them.

    • @JodyBruchon
      @JodyBruchon 1 month ago +1

      I don't understand what you're trying to say.

    • @s.b.8704
      @s.b.8704 1 month ago

      @JodyBruchon If the structure were an opaque data type, accessible only via functions, one could change the implementation of the string (also opaque) from a zero-terminated string to a counted (length-prefixed) string, which would also change the length function from a loop to a simple field access (and an optimizing compiler could inline it), without impacting programs that use this interface (at least at the source level).
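
The opaque-type idea in this comment can be sketched roughly as follows. Everything here is hypothetical and only for illustration: `entry`, `entry_new`, `entry_name`, `entry_name_len` and `entry_free` are invented names, and in a real library the first part would live in a public header and the second in a private source file.

```c
#include <stdlib.h>
#include <string.h>

/* ---- what a public header would expose (hypothetical API) ---- */
typedef struct entry entry;               /* incomplete type: layout hidden */
entry *entry_new(const char *name);
const char *entry_name(const entry *e);
size_t entry_name_len(const entry *e);
void entry_free(entry *e);

/* ---- private implementation, free to change later ---- */
struct entry {
    size_t len;     /* stored length: the "counted string" part */
    char name[];    /* C99 flexible array member, still NUL-terminated */
};

entry *entry_new(const char *name)
{
    size_t len = strlen(name);            /* measured once, at creation */
    entry *e = malloc(sizeof *e + len + 1);
    if (e == NULL)
        return NULL;
    e->len = len;
    memcpy(e->name, name, len + 1);       /* copy the NUL as well */
    return e;
}

const char *entry_name(const entry *e) { return e->name; }

/* A field read instead of a strlen() loop; callers cannot tell which. */
size_t entry_name_len(const entry *e) { return e->len; }

void entry_free(entry *e) { free(e); }
```

Callers only ever see `entry *` and the four functions, so switching `entry_name_len` between a loop and a field access needs no source changes on their side. Whether the accessor call itself can be inlined depends on the compiler seeing its body, which a single-file sketch like this cannot demonstrate.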

    • @JodyBruchon
      @JodyBruchon 1 month ago +1

      @s.b.8704 But that would add tons of function call overhead to what currently is just a pointer reference.

    • @s.b.8704
      @s.b.8704 1 month ago +1

      @JodyBruchon It shouldn't, especially if the function is simple. At least with GCC and Clang, the function call already disappears at -O1 and the field access is inlined in its place. There is no difference between the code that the compiler produces for direct access to structure fields and for access via a function.
      Try this in Compiler Explorer (godbolt, if only YT would let me post a link…): choose C and x86-64 gcc or clang, and put -O1 in the compiler arguments:
      [Quick and dirty code: of course the structure "S" is not really opaque here, and the string "c" even less so; I just want to compare direct field access with access via functions using as little code as possible.]
      #include <stdio.h>
      #include <string.h>
      typedef struct {float a; long int b; int len; char *c;} S;
      char *sc(S s){ return s.c; }   /* accessor for the string */
      int sl(S s){ return s.len; }   /* accessor for the stored length */
      int main(int argc, char *argv[]) {
      if (argc < 2) return 1;        /* argv[1] is required */
      S s = {0};
      s.c=argv[1];
      s.len=(int)strlen(argv[1]);
      printf("s.c: %s\n", s.c); // 1
      printf("s.c: %s\n", sc(s)); // 2
      printf("Length of string s.c = %zu\n",strlen(s.c)); // 3
      printf("Length of string s.c = %d\n",sl(s)); // 4
      return 0;
      }
      The assembly code produced for 1 and 2 is the same, no function call for 2. The code produced for 3 and 4 is different: in 3 it calls strlen but no function call for 4 (move the cursor over the assembly code and the corresponding C code will be highlighted).
      Note that if you want to execute the code, open the "+Add new..." menu in the source pane, add "Execution Only", and enter a string as the execution argument (argv[1]).

    • @tikabass
      @tikabass 1 month ago +2

      @s.b.8704 If the structure is really opaque, the compiler will not be able to inline the access as you describe, because the accessor's body lives in another translation unit. Link-time optimization could inline the code, but the kernel is not just one big executable.

  • @Visentinel
    @Visentinel 1 month ago

    Why not write this up and send it to the Linux kernel mailing list?

    • @JodyBruchon
      @JodyBruchon 1 month ago +3

      @Visentinel Because they're not going to care.

    • @Visentinel
      @Visentinel 1 month ago

      @JodyBruchon Not with that attitude they won't. You need to actually write it up and send it to find out whether they would.

    • @JodyBruchon
      @JodyBruchon 1 month ago +1

      Changing this structure requires a lot more work than you can imagine. A new syscall would have to be made, glibc would have to be modified to use it, and user software would have to be modified to use d_namlen when available instead of strlen, plus this ignores all compatibility issues. The ship sailed on this in 1995 when Torvalds shot down d_namlen completely. The changes needed to fix it are so invasive that it's almost certainly not worth it.
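
The "use d_namlen when available instead of strlen" step mentioned above might look like this in user code. It is a minimal sketch, assuming the `_DIRENT_HAVE_D_NAMLEN` feature macro described in the glibc manual; `dirent_name_len` is a hypothetical helper name, and on Linux the `strlen` branch is the one that gets compiled.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: length of an entry's name, without the NUL. */
size_t dirent_name_len(const struct dirent *ent)
{
#ifdef _DIRENT_HAVE_D_NAMLEN
    /* The libc says the field exists: a single field read. */
    return ent->d_namlen;
#else
    /* No stored length (the Linux case): scan the name for its NUL. */
    return strlen(ent->d_name);
#endif
}
```

Wrapping the choice in one helper keeps the `#ifdef` out of calling code, but it only helps on platforms that already provide `d_namlen`; it does nothing about the syscall and libc changes described above.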