Writing Garbage Collector in C

Поделиться
HTML-код
  • Опубликовано: 8 сен 2024
  • References:
    - Source Code: github.com/tso...
    - Previous Episode (malloc/free): • Writing My Own Malloc ...
    - Data alignment: Straighten up and fly right: developer.ibm....
    - My JSON Serialization Library: github.com/tso...
    - Print out value of Stack Pointer: stackoverflow....
    - gcc.gnu.org/on...
    Support:
    - Patreon: / tsoding
    - Twitch Subscription: / tsoding
    - Streamlabs Donations: streamlabs.com...
    Feel free to use this video to make highlights and upload them to RUclips (also please put the link to this channel in the description)

Комментарии • 82

  • @InmuAyuayu
    @InmuAyuayu 8 месяцев назад +160

    This guy: writes Garbage Collector in C
    Me: writes Garbage in C

    • @ByteAndGo
      @ByteAndGo Месяц назад +1

      I'm dying 🤣

  • @Gabriel_Pureliani
    @Gabriel_Pureliani Месяц назад +18

    Your garbage collector doesn't work, i'm still here

    • @algj
      @algj Месяц назад +1

      Same here

  • @rogo7330
    @rogo7330 2 года назад +80

    Garbage collector: *collects itself*

    • @Mrkenjoe1
      @Mrkenjoe1 Год назад +9

      "I'm trash, so I took myself out." -The garbage collector probably.
      (P.S. this is a joke)

    • @rogo7330
      @rogo7330 Год назад

      @@Mrkenjoe1 you writing code in C? Probably you doing something silly with your data (I mean variables). If you still have issues with describing what exactly this code is doing and why its here, or if it's here because of "C didn't give me something-something", try to write down what you want this program to do (in user pov, not for programmer) and start again.

    • @Mrkenjoe1
      @Mrkenjoe1 Год назад

      Removed

    • @tomtravis858
      @tomtravis858 13 часов назад

      It's like that old Java joke.
      If Java had true garbage collection most programs would delete themselves upon execution.

  • @OdyseeEnjoyer
    @OdyseeEnjoyer 2 года назад +78

    I've a lot of experience with C and padding (specially with bitfields) but your way of teaching is so interesting that you've made me to stay for the entirety of the video

  • @LuisMatos_ahoy
    @LuisMatos_ahoy 2 года назад +31

    "I am a random person from the internet allowing you to use global variables!"
    Nice 😂😂

  • @Valo_iO
    @Valo_iO 23 дня назад +1

    wasted 5 years in uni, studying CS (I failed some unrelated classes, that's why it took me that long), and I learned more by watching your videos and similar things on YT!

  • @playerguy2
    @playerguy2 2 года назад +16

    14:11 you are mostly correct.
    essentially, the compiler is guaranteeing alignment of the datatypes involved.
    Most machines have memory alignment requirements for most datatypes.
    On x86_64:
    Chars cannot have one nybble in one address and the other in another address.
    Shorts must be aligned to 2 bytes.
    Ints must be aligned to 4 bytes.
    etc, etc..
    Same applies to (hardware-)vector datatypes. (look up vectorized instructions for more info.)
    So you have rightly observed that chars have a smaller alignment requirement than the pointer, and as such can be packed more effectively.
    Printing the addresses would've confirmed that.
    If that behavior is not desirable (for example, if you'd rather pack data as densely as possible) you can use the pack pragma in C/C++.
    The downside is more costly reads and writes.

    • @playerguy2
      @playerguy2 2 года назад +3

      bonus: the exception you mentioned is often called a "bus fault" (similar to a segmentation fault).
      It is an error the cpu catches (or doesn't, depends on the arch) and often calls a vector to handle and cleanup.
      Linux handles this pretty well even on the Raspberry pi.

    • @heraldo623
      @heraldo623 4 месяца назад +1

      Machines have no concept of data types. It operates on words. Registers are wordsized. Memory load reads a word from the memory to a register. Memory store writes a word from a register to the memory.
      Data types are defined by the programming language standard and enforced by its compilers.

    • @chonchjohnch
      @chonchjohnch 13 дней назад

      @@heraldo623he specified that the compiler guarantees the alignment

  • @tresuvesdobles
    @tresuvesdobles 2 года назад +18

    Pretty cool video! BTW: If there is a cycle of pointers among several chunks in the heap (it may not necessarily be a two-cycle) your code will enter a never-ending loop until you reach recursion limit and cause a Stack Overflow. A naive solution would be to set a manual recursion limit on mark_region, but then cyclic chunks would never be deallocated.

    • @beyondcatastrophe_
      @beyondcatastrophe_ 2 года назад +11

      No, it won't, the recursion is only entered if the chunk wasn't reachable before, so it won't try to check in a cycle

  • @nivelis91
    @nivelis91 5 месяцев назад +1

    After watching this I immediately hate this channel for how many videos you have, how long they are, and that I'm not gonna be able to see them all in one weekend.

  • @heraldo623
    @heraldo623 4 месяца назад +1

    You should mark as alive only the heap chunks that can be reached from the stack. It will make self-referenced chunks to be collected too (circular pointers). If you take the heap itself as root of the "references tree" circular refs will stay alive forerer, thus leading to memory leaks.

  • @freestyle85
    @freestyle85 3 года назад +13

    I think you can don’t use builtin gcc function for get the frame address. Just declare on the stack some variable of type uintptr_t and get pointer to it by the “&” operator.

    • @TsodingDaily
      @TsodingDaily  3 года назад +12

      Yep, there are many ways to obtain the stack pointer as discussed in stackoverflow.com/questions/20059673/print-out-value-of-stack-pointer which I linked in the description as the reference. :)

    • @freestyle85
      @freestyle85 3 года назад +7

      @@TsodingDaily How could I have missed this? Sorry! :-)

  • @adamkrawczyk9261
    @adamkrawczyk9261 2 года назад +3

    15:06, just bookmarking but good shit. It's pretty interesting

  • @DatBoi_TheGudBIAS
    @DatBoi_TheGudBIAS 23 дня назад

    The stack is always aligned to 8 bytes, actually to 16 most of the time
    The reason is Dat it requires 16 byte alignment for faster access for mnemonics such as movaps, Cuz movups, the unaligned version, is slower and less optimized
    Many functions internally use movaps and other aligned instructions so the stack is automatically aligned either before the call or on startup (I'm not sure where)
    But it always is

  • @aFlyingElefant
    @aFlyingElefant 3 года назад +16

    In what video did you develop your arena allocator? I cannot find any reference to it in the faq or the repo or anything.

    • @TsodingDaily
      @TsodingDaily  3 года назад +14

      I don't remember. I implemented my own arenas many times in different projects throughout the years, sorry.

  • @xirate7091
    @xirate7091 11 месяцев назад +1

    45:55 there's some Terry Davis energy here

  • @MykolaKlymyuk
    @MykolaKlymyuk 3 дня назад

    This guy would be a great cast for Riddler in the Batman's universe

  • @liebranca
    @liebranca 2 года назад +2

    sizeof(size_t) != sizeof(intptr_t), though generally it doesn't matter c:
    Cool stuff btw. Subscribed.

  • @greob
    @greob 3 года назад +2

    That was really interesting, thanks for sharing.

  • @berndeckenfels
    @berndeckenfels 3 месяца назад

    I debugged the traditional GC of gofer (Haskell variant) on DOS because it did not use pointers but “cell index” which way too often tainted memory for integer value (with only a few hundred cells available that hurt very much).

  • @siddharthsinghchauhan8664
    @siddharthsinghchauhan8664 2 года назад +16

    Although his sessions are not educational per se... But damn they are awesome... I watched this particular one many times.. to figure stuff out for design interviews

  • @cheebadigga4092
    @cheebadigga4092 Год назад

    I think the struct Foo is 16 bytes instead of 9 because of how GCC does structs. It has a special "packed" mechanic for structs which prevents it from aligning to the full next 8 bytes, but only if you're explicit about it (GCC extension to the C standard). Maybe clang would've behaved differently. Maybe not. But either way, it's the compiler who does the alignment, not the kernel or libc.
    Tried it out: typedef struct { char i; void *ptr; } __attribute__((packed)) FooPacked;
    results in 9 bytes instead of 16.
    So essentially as long as you're on x86_64 GCC you can be 99% certain that the alignment is done for you by GCC if you're not explicitly tell it to do otherwise. I don't know about other architectures so I can't say anything about them.

    • @user-oe4id9eu4v
      @user-oe4id9eu4v 11 месяцев назад

      Actually not really ..
      C struct size and field offset is strictly defined by rules. One rule defines that pointer type should always be aligned to architecture size.
      That is C standard well-defined behaviour.

    • @cheebadigga4092
      @cheebadigga4092 11 месяцев назад +1

      @@user-oe4id9eu4v how does that deny anything I said? If GCC implements the C standard correctly, then "how GCC does structs" effectively means the same as "how the C standard defines how structs should behave ".

  • @koftabalady
    @koftabalady Год назад +3

    How can I learn this stuff? can someone give me a simple roadmap or anything?

    • @khuntasaurus88
      @khuntasaurus88 11 месяцев назад

      Learn what stuff

    • @koftabalady
      @koftabalady 11 месяцев назад

      @@khuntasaurus88 Operating systems, compilers, interpreters, garbage collectors, assembly, low level networking, advanced machine learning without libraries and being good at doing all of this...

    • @FF-hy4ro
      @FF-hy4ro 10 месяцев назад +1

      @koftabalady going through Tsoding videos and pausing to try and implement the solutions yourself before he does may be a good way

    • @_start
      @_start 4 месяца назад

      @@koftabalady you have online free book called "Crafting Interpreters" in that you will learn how to make your programming language in C, this will teach you how to create 4 data structures (rope, hashtable with open addressing and simple string hash function, dynamically allocated array and implicit linked list). Additionally you will implement simple mark and sweep garbage collector. Also it will introduce you with the Pratt parsing which is the best parsing method for parsing with precedence. Final product will be garbage collected interpreted programming language that emits its own bytecode, you can also expand on it by making it much more embeddable with C/C++.
      If you wish to learn about low level networking you should probably check out Unix Sockets Programming by Stevens, W. Richard (3rd edition). This will teach you about IPv4/IPv6 (Internet protocol v4/v6), TCP (Transmission Control Protocol), UDP (User datagram protocol), SCTP (Stream Control Transmission Protocol), etc.. You will be coding this in C also, it basically teaches you all of the stuff there is to know about programming with sockets in C and you will know how these protocols work under the hood.
      For operating systems I would recommend the book "Operating Systems: Three Easy Pieces" , you should also write few basic programs in assembly (recomending fasm or nasm assemblers) before trying to write your own OS because you will need to write some assembly. I would also reccomend coding along osdev.org and take it easy, creating your own OS is considered the hardest thing you can do as a programmer (depending on which features your OS will have or what hardware it will support, etc...).

    • @rusi6219
      @rusi6219 2 месяца назад

      ​@@koftabaladytry a computer science degree and lots of practice

  • @GursimarSinghMiglani
    @GursimarSinghMiglani 14 дней назад

    awesome 👍

  • @FsKir
    @FsKir Год назад +1

    Не знаю сишек, но было интересно посмотреть. Спасибо за видео

  • @berndeckenfels
    @berndeckenfels 3 месяца назад

    With struct #pragma pack applies - but that does not apply to the location of data segments

  • @chrisalexthomas
    @chrisalexthomas Год назад +1

    you wrote a what?? wow, you woke up today and chose violence... :D

  • @pechasetentidos5854
    @pechasetentidos5854 10 месяцев назад

    And what if i have in the stack frame some argument passed to some function that is in the range of the heap base address+size?

  • @Local_Nerd
    @Local_Nerd Год назад

    Thanks

  • @galbalandroid
    @galbalandroid 2 года назад +2

    What is the editor you use in the terminal? great vid by the way! :D

  • @RealCatDe
    @RealCatDe Месяц назад

    1:26:56 voidf

  • @soniablanche5672
    @soniablanche5672 Месяц назад

    apparently it's better to have 16 bytes alignment for x64

  • @mixail844
    @mixail844 3 месяца назад

    didn't get, why he made recursion in mark_reachable function?

  • @tsilikitrikis
    @tsilikitrikis Год назад

    I FCNG LOVE YOU

  • @Jurasebastian
    @Jurasebastian Год назад

    i can fill heap with random data and get accidental pointer to heap

  • @bartlomiejodachowski
    @bartlomiejodachowski 2 года назад +1

    18:34

  • @_luisgerardo
    @_luisgerardo 2 года назад

    Good thumbnails haha

  • @alexandrohdez3982
    @alexandrohdez3982 Год назад

    👏👏👏👏

  • @kelali
    @kelali 2 года назад

    9:36 realy weird

  • @sangalamballa6975
    @sangalamballa6975 8 месяцев назад

    i have a question, how can we install it to JDK?

    • @MrChelovek68
      @MrChelovek68 8 месяцев назад

      Я тоже,но на вид это один из лучших языков. Прямая работа с памятью и абсолютная гибкость-привет указателям.

  • @alexandersemionov5790
    @alexandersemionov5790 2 года назад +1

    Can you implement Ownership like in Rust?

  • @brightprogrammer
    @brightprogrammer Год назад

    I love the way u sarcastically praise other languages 😂

  • @user-lt9ey6gw3x
    @user-lt9ey6gw3x 7 месяцев назад

    Not really Garbage Collector but nice video.

  • @FsKir
    @FsKir Год назад

    Hey, what graphic editor you used in this video?

    • @welanduzfullo8496
      @welanduzfullo8496 Год назад +5

      the one he implemented himself xD

    • @FsKir
      @FsKir Год назад

      @@welanduzfullo8496 LOL

  • @dragonlp7144
    @dragonlp7144 2 года назад +2

    which ide is he using

    • @celdaemon
      @celdaemon 2 года назад +2

      It looks like vim

    • @tcroyce8128
      @tcroyce8128 2 года назад +7

      @@celdaemon It's a vanilla Emacs and his own customization IMO. Does not feel like any off the shelf Emacs variant. Vim can do a lot of his text naviation and redirecting cmd output into buffer operations, but that is stretching vim's ability a bit. The distinction comes from how the statusbar is rendered in Emacs.

    • @joekerr5418
      @joekerr5418 2 года назад +1

      the best ide you could ever possibly use in your life

  • @IndellableHatesHandles
    @IndellableHatesHandles Год назад +1

    Python's GC is garbage, so will it collect itself?

  • @meJevin
    @meJevin 2 года назад +1

    дико кринжанул с начала видео.
    в кратце - чел открыл для себя cache lines и выравнивание памяти.
    поздравляю, i guess.

    • @DelphiPro
      @DelphiPro Год назад

      А чтобы не гадать долго, надо было вывести просто дамп занятой памяти структуры в лог, чтобы посмотреть что куда двигается ))) Пишу, посмотрев 18 минут видео

  • @chrisalexthomas
    @chrisalexthomas Год назад +1

    You're only allowed to use global variables, when you know exactly and precisely why you're NOT allowed to use them and all the reasons against using them and can explain the reasons why in detail. Then you qualify. Until then. You aren't allowed. End of discussion :)

    • @pattyspanker8955
      @pattyspanker8955 Год назад

      explainnnn

    • @drednaught608
      @drednaught608 10 месяцев назад

      I don't know, global variable constants seem to be common enough and is not bad practice if immutable.

    • @DatBoi_TheGudBIAS
      @DatBoi_TheGudBIAS 23 дня назад

      As an amateur hobbyist C programmer, I only use global variables when I want something to be absolutely accessible by everyone and not have the pain of including it as an argument to every function I make.
      Stuff like the heap allocation in the videi