A better hash table (in C)

  • Published: 24 Nov 2024

Comments • 66

  • @SimGunther
    @SimGunther 1 year ago +28

    strager had a whole video on hash tables and it turns out that a better hash function based on the understanding of the keys going into the table equals a MUCH faster hash table! 🎉

    • @marcossidoruk8033
      @marcossidoruk8033 1 year ago +13

      That video is completely misleading, or at the very least it seems to have misled you.
      What's being implemented in this video is a general-purpose hash table; what the video you mentioned shows is a perfect hash table that only works with a predefined, hardcoded set of words, because he needed that for a JavaScript compiler.
      Those are two completely different problems and tbh his solution is quite dumb because for such a specific problem and such a limited set of keywords, if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, which is ugly as heck but much faster than a hash table.
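A minimal sketch of the nested-switch idea described in this comment may make it easier to picture. The keyword set ("if", "in", "int", "for") and the name `is_keyword` are hypothetical, picked only for illustration; nothing here is measured, so the speed claim above remains the commenter's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical keyword set, chosen only for illustration:
 * "if", "in", "int", "for". */
bool is_keyword(const char *s, size_t len)
{
    if (len == 0) return false;
    switch (s[0]) {
    case 'f':                                   /* "for" */
        return len == 3 && s[1] == 'o' && s[2] == 'r';
    case 'i':
        if (len < 2) return false;
        switch (s[1]) {
        case 'f':                               /* "if" */
            return len == 2;
        case 'n':                               /* "in" or "int" */
            return len == 2 || (len == 3 && s[2] == 't');
        }
        return false;
    }
    return false;
}
```

Each level of the switch consumes one character, so a non-keyword is usually rejected after one or two comparisons. The cost is that the code has to be regenerated whenever the keyword set changes.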

    • @tommasobonvicini7114
      @tommasobonvicini7114 1 year ago

      Folks, look at the number of thumbs up SimGunther received, then look at marcossidoruk's: welcome to the software industry.

    • @godnyx117
      @godnyx117 1 year ago

      @@marcossidoruk8033 Is it really faster tho? If that is the case, then a general-purpose language with good meta-programming features (like D) can easily create a library that does that!

    • @MaxCoplan
      @MaxCoplan 1 year ago

      it’s pretty obvious strager didn’t really know what he was talking about and just made the video for the clickbait. Did you see the thumbnail? On his stream today he said he didn’t even go to college! How anybody can take software engineering advice from him is beyond me.

    • @strager_
      @strager_ 1 year ago +6

      > if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, which is ugly as heck but much faster than a hash table.
      You should leave a comment on my video with your suggestion.

  • @CodePagesNet
    @CodePagesNet 4 months ago

    Thank you for the helpful video and C videos in general. I encourage and promote the understanding that C has advantages over OO, even though people may not understand that yet (OO is merely a code format, and an inflexible one at that).

  • @adambishop328
    @adambishop328 1 year ago +1

    wow thank you for strcspn, i've been looping through my character arrays for a long time to try and format them into null-terminated strings without any return or newlines. Sweet function
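For readers who have not met `strcspn` yet, here is a small sketch of the trick this comment refers to. `strcspn(s, reject)` is standard C and returns the length of the initial part of `s` that contains none of the characters in `reject`; the helper name `strip_newline` is just an illustrative choice.

```c
#include <string.h>

/* Truncate `line` at the first '\r' or '\n', if there is one.
 * If neither occurs, strcspn returns strlen(line) and the write
 * simply lands on the existing terminator. */
void strip_newline(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

This is handy right after `fgets`, which keeps the trailing newline in the buffer.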

  • @ahmadhadwan
    @ahmadhadwan 1 year ago +1

    Very interesting video, Dr. Jacob. I'm glad you decided to expand on the last video.

  • @surters
    @surters 8 months ago

    If you want to make a generic hash table, you need a lot of helper functions that know the type, and you would need to pass them along, which gives a lot of extra parameters. Or you could just pass along a pointer to a struct with all those extra functions, each of them a function pointer for that type.
    Some of the extra pointers could be print_obj, destroy_obj, initialize_obj, copy_obj, assign_obj etc.
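One way to picture the "struct of function pointers" idea from this comment is sketched below. The struct name `type_ops`, the `equal_obj` member, and the string helpers are assumptions made for illustration; only `copy_obj` and `destroy_obj` come from the comment itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of operations describing one value type.
 * A generic container would store a pointer to one of these. */
typedef struct {
    void *(*copy_obj)(const void *obj);
    void  (*destroy_obj)(void *obj);
    int   (*equal_obj)(const void *a, const void *b);
} type_ops;

/* Operations for NUL-terminated strings. */
static void *str_copy(const void *obj)
{
    const char *s = obj;
    char *dup = malloc(strlen(s) + 1);
    if (dup != NULL) strcpy(dup, s);
    return dup;
}

static void str_destroy(void *obj)
{
    free(obj);
}

static int str_equal(const void *a, const void *b)
{
    return strcmp(a, b) == 0;
}

const type_ops string_ops = { str_copy, str_destroy, str_equal };
```

A table constructor could then take a single `const type_ops *` argument instead of three or more separate callbacks, and call `ops->copy_obj(value)` wherever it needs to own a copy.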

  • @TheSulross
    @TheSulross 11 months ago

    Just had to implement an open addressing hash table using linear probing and double hashing to reduce clustering - and I validated that, yes, double hashing does reduce clustering, and the second hash function can be very cheap, at practically no cost.
    In my case the hash table is allocated up front to some size and does not have to be increased in size over its operational lifetime.
    Only keys are stored in the hash table, so a lookup returns an index. The data resolved to is kept in a separate array that has the same maximum number of entries as the hash table itself. So the very same hash table can be used to look up different data structure values depending on context - something that is the case in my domain.
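The design described in this comment could look roughly like the sketch below: a fixed-size, keys-only table with double hashing, where insert and lookup return a slot index that the caller uses to index its own parallel arrays. All names, the table size, the `EMPTY` sentinel, and both hash functions are assumptions for illustration, not the commenter's code. Deletion is left out on purpose, since open addressing would need tombstones for that.

```c
#include <stdint.h>

#define TABLE_SIZE 16u        /* power of two, fixed up front (assumed) */
#define EMPTY UINT32_MAX      /* sentinel: this key value cannot be stored */

typedef struct {
    uint32_t keys[TABLE_SIZE];
} key_table;

void kt_init(key_table *t)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++) t->keys[i] = EMPTY;
}

/* First hash: multiplicative hash reduced to a slot number. */
static uint32_t hash1(uint32_t k)
{
    return (k * 2654435761u) & (TABLE_SIZE - 1);
}

/* Second hash: a cheap, always-odd step. An odd step is coprime with a
 * power-of-two table size, so the probe sequence visits every slot. */
static uint32_t hash2(uint32_t k)
{
    return ((k >> 16) | 1u) & (TABLE_SIZE - 1);
}

/* Returns the slot index of `key`, inserting it if absent.
 * Returns -1 if the table is full or the key equals the sentinel. */
int kt_insert(key_table *t, uint32_t key)
{
    if (key == EMPTY) return -1;
    uint32_t i = hash1(key), step = hash2(key);
    for (uint32_t n = 0; n < TABLE_SIZE; n++) {
        if (t->keys[i] == key) return (int)i;
        if (t->keys[i] == EMPTY) { t->keys[i] = key; return (int)i; }
        i = (i + step) & (TABLE_SIZE - 1);
    }
    return -1;
}

/* Returns the slot index of `key`, or -1 if it is not present. */
int kt_lookup(const key_table *t, uint32_t key)
{
    if (key == EMPTY) return -1;
    uint32_t i = hash1(key), step = hash2(key);
    for (uint32_t n = 0; n < TABLE_SIZE; n++) {
        if (t->keys[i] == key) return (int)i;
        if (t->keys[i] == EMPTY) return -1;
        i = (i + step) & (TABLE_SIZE - 1);
    }
    return -1;
}
```

Because the table stores only keys, a caller can keep several arrays of `TABLE_SIZE` elements (one per context) and use the returned index with whichever one applies.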

  • @Nohope__
    @Nohope__ 8 months ago

    I'm going to have to watch this 10 times.
    (TYSM for the amazing material <3)

  • @sanderbos4243
    @sanderbos4243 1 year ago +4

    What I really enjoyed programming and found incredibly useful during my 1.5 years of C assignments was to write my own vector implementation. A basic one is only about 50 lines of code. Because my uni also requires us to free() *all* allocated memory manually, I was then able to write void *my_malloc(size_t count, size_t size, char *description): a malloc() wrapper that stores the new address in one of those vectors. I could then call print_allocations() and free_allocations() at the end of my main()! Very nice during debugging.
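A rough sketch of the tracking allocator described in this comment is given below. The function names follow the comment; the `allocation` record, the growable array, and the `const` on `description` are assumptions made so that the example is self-contained.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One record per tracked allocation (hypothetical layout). */
typedef struct {
    void *ptr;
    const char *description;
} allocation;

/* A very small growable array standing in for the "vector". */
static allocation *allocs = NULL;
static size_t alloc_count = 0, alloc_cap = 0;

/* malloc() wrapper that remembers every block it hands out. */
void *my_malloc(size_t count, size_t size, const char *description)
{
    if (size != 0 && count > SIZE_MAX / size) return NULL; /* overflow */
    if (alloc_count == alloc_cap) {
        size_t new_cap = alloc_cap ? alloc_cap * 2 : 8;
        allocation *grown = realloc(allocs, new_cap * sizeof *grown);
        if (grown == NULL) return NULL;
        allocs = grown;
        alloc_cap = new_cap;
    }
    void *p = malloc(count * size);
    if (p == NULL) return NULL;
    allocs[alloc_count++] = (allocation){ p, description };
    return p;
}

/* List every block that is still tracked. */
void print_allocations(void)
{
    for (size_t i = 0; i < alloc_count; i++)
        printf("%p  %s\n", allocs[i].ptr, allocs[i].description);
}

/* Free every tracked block, then the tracking array itself. */
void free_allocations(void)
{
    for (size_t i = 0; i < alloc_count; i++) free(allocs[i].ptr);
    free(allocs);
    allocs = NULL;
    alloc_count = alloc_cap = 0;
}
```

Note that blocks obtained this way must not also be passed to `free()` directly, or `free_allocations()` would free them a second time.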

    • @Urre5
      @Urre5 1 year ago

      Did they say why you had to free stuff at the end of main?

    • @sanderbos4243
      @sanderbos4243 1 year ago

      @@Urre5 I presume it is because in most of our projects we don't have any loops that would force us to free memory. So they just want to make sure we are aware of how to use free() properly. On some systems the OS might not do it for us at the end of the program too.

    • @Urre5
      @Urre5 1 year ago +1

      @@sanderbos4243 yeah I was hoping it's because of the latter part, but they should be explicit, and even in particular teach you not to free on systems which will clean things up, because otherwise you'll waste the user's time when exiting the program. Then again if it's a nice arena or something where you have all your allocations it shouldn't take too long anyway

    • @sanderbos4243
      @sanderbos4243 1 year ago

      @@Urre5 Totally agree, we pretty much learn to use malloc() and free() however we like, as long as we don't have leaks. We're not told basic performance stuff like it maybe being braindead to use malloc() and free() unnecessarily all over the place, and without telling us about stuff like big O. Every exercise is a PDF, and our school (look up Codam or 42 school) deliberately doesn't have any teachers nor books we have to read, so everyone helps each other, and we spend a ton of time reading up online. It's incredibly freeing since the school is open 24/7 and you aren't required to be there for that many hours per week, but it isn't for everyone, since it's your own responsibility to become an awesome programmer. Oh, and it's completely free. :)

    • @Davtd.
      @Davtd. 11 days ago

      @@Urre5 you should still free the memory + I'm not aware why it should waste time

  • @mr.erikchun5863
    @mr.erikchun5863 1 year ago +1

    Thank you Jacob for making these videos.

  • @sortof3337
    @sortof3337 1 year ago +1

    I thought you stopped saying "without further ado". hahah. Very good video. The reason I am good at C is because I have all your videos I can reference. :D

  • @greg4367
    @greg4367 1 year ago +2

    Looking forward to part 2

  • @tiramihai1152
    @tiramihai1152 1 year ago +1

    A 41 minute Jacob Sorber video? I'm in for a ride

  • @69k_gold
    @69k_gold 1 year ago +1

    Please make a video about terminal (termios.h) and different terminal modes etc

  • @randomscribblings
    @randomscribblings 1 year ago +5

    strdup() == malloc() + strcpy()
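Spelled out, the equivalence in this comment is roughly the following. The name `dup_string` is a hypothetical stand-in so the sketch does not clash with the real `strdup` (POSIX, and part of standard C since C23).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): allocate room for the string
 * plus its terminator, then copy it. Returns NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) strcpy(copy, s);
    return copy;
}
```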

  • @zxuiji
    @zxuiji 1 year ago +3

    What I'd like to see is a vid on the new lattice based encryption algorithm, be one I'd definitely save for later and I'm sure a number of peops here would end up using it in future jobs or existing ones if they have them.

  • @russelwestbrick3023
    @russelwestbrick3023 1 year ago +1

    wonderful teaching!!

  • @svenvandevelde1
    @svenvandevelde1 2 months ago

    Just know that malloc and calloc are implemented using a heap structure, which is much more complex than a hash table. Why not create this case study without malloc and calloc, through static memory allocation? The usage of malloc and calloc slows down the logic dramatically. Also, if your hash table size and the structure size can be binary calculated through bit shifts, the key calculation can be made using a rotating random binary calculation, which will result in blazingly fast key calculation.

  • @zxuiji
    @zxuiji 1 year ago

    Considering how you use the allocations, you should've really just used calloc everywhere to avoid runtime issues

  • @JaccovanSchaik
    @JaccovanSchaik 1 year ago +4

    33:12 strdup()!

  • @RobertaPROTO
    @RobertaPROTO 1 month ago

    Hi there, for a school project I have to use a doubly linked list to record frequent items (for each node I'll put the density and the time it was updated); however, the problem is that I have to "organize my doubly linked list as a hash table using a function H". How is that possible? It also says that I have to make the pointer point to the next node in terms of time of updating.

  • @zxuiji
    @zxuiji 1 year ago +2

    10:25, you already typedef'd it, you don't need another typedef, rather that's just asking for compile time errors

  • @aniritri8635
    @aniritri8635 1 year ago +2

    Have you checked Zig yet ? Seems like a nice language to overview and compare to C.

    • @djazz0
      @djazz0 1 year ago

      And Nim! :)

  • @FelixNielsen
    @FelixNielsen 1 year ago

    I have a question that may be relevant or entirely unrelated. I'm not actually sure.
    In short, I have a problem, the solution to which could well be a hash table. My keys are known to be unique and of equal length, no more than a few bytes each; that is to say, they are not strings but rather integers, if you so desire.
    The question then becomes: is there a special category of hash functions (or other method) which can convert these keys into values in a given range, naturally ranging from 0 to n-1 for n items?
    Mind you I can think of other solutions for doing what I need to do, which is basically a runtime defined and/or modified switch case like functionality, but none I can think of are entirely well suited.
    Thanks for your efforts.

  • @austinraney
    @austinraney 1 year ago +2

    Is the calloc call at 13:16 not incorrect? I thought it was number of elements then size of t. Right, it still will allocate the same amount of memory, I just would have expected that you would need to cast to make the compiler happy. What am I missing?

    • @austinraney
      @austinraney 1 year ago +1

      Having thought about it for a second, I guess the calls would be functionally equivalent. Are there ever cases when they aren’t?

    • @Uerdue
      @Uerdue 1 year ago +1

      @@austinraney From the manpage, I cannot find any evidence that swapping the arguments could ever mess things up.
      I would however argue that it could potentially hurt performance, because the `calloc` implementation might use that extra bit of information you provide by specifying what's the amount of items and what's the size.
      For example, it might try to align the memory such that no single item in the array will cross a page boundary. For this, it would need to know what's the element size.
      Interestingly, I have seen more people supplying the arguments in the wrong order than doing it correctly.

    • @Uerdue
      @Uerdue 1 year ago +4

      Update: I checked the libc implementation on my machine, and found that it doesn't care: It multiplies the values, makes sure no overflow happened, and then just goes on to allocate a block of memory as large as the result of the multiplication.
      Other libc implementations might differ.

    • @austinraney
      @austinraney 1 year ago +1

      @@Uerdue thanks for doing some digging and sharing! It’s much appreciated!
      I was curious in particular about potential page alignment problems like you mentioned.

    • @JacobSorber
      @JacobSorber  1 year ago

      Yeah, calloc is typically just a multiply, a malloc call, and a memset (or the equivalent). Sorry if I caused any confusion.
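The behaviour discussed in this thread (multiply, check for overflow, allocate, zero) can be sketched as below. This is not the source of any particular libc; `sketch_calloc` is a hypothetical name, and real implementations may skip the `memset` when the memory is known to be zeroed already.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc-like function: the two arguments are only ever
 * multiplied together, so swapping them gives the same block size. */
void *sketch_calloc(size_t nmemb, size_t size)
{
    /* Refuse requests whose total size would overflow size_t. */
    if (size != 0 && nmemb > SIZE_MAX / size) return NULL;

    size_t total = nmemb * size;
    void *p = malloc(total);
    if (p != NULL) memset(p, 0, total);
    return p;
}
```

With an implementation of this shape, `sketch_calloc(8, 4)` and `sketch_calloc(4, 8)` both yield 32 zeroed bytes, which matches what was observed above.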

  • @Ido-Levy
    @Ido-Levy 1 year ago +1

    Hey, thank you for putting out these videos! I'm learning a lot from you :) Why aren't you checking for memory allocation failures?

    • @mytriumph
      @mytriumph 10 months ago +1

      In my experience, it generally isn't necessary on modern computers. The odds that so much of your computer's total memory is being taken up by other processes that the program fails to allocate a comparatively small amount of memory are small enough on modern computers that you can reasonably rule it out. Now, is it good practice to check anyway? Yes, absolutely. But it ultimately doesn't end up making that big of a difference
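For anyone who does want the check, one common low-effort pattern is a wrapper that fails loudly in one place. The name `checked_malloc` and the choice to exit are assumptions for illustration; a library would more likely return an error to its caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: never returns NULL. On failure it reports the
 * problem and ends the program, so call sites need no NULL checks. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```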

  • @johanngambolputty5351
    @johanngambolputty5351 1 year ago +2

    Just to be cheeky, the thing is, you don't have to type text from video anyway, you can use optical character recognition, I like to do
    `spectacle -r -b -o /tmp/screenshot.png && tesseract /tmp/screenshot.png stdout --psm 6 | xclip -sel clip`
    set to a keybind, then you can just paste into your favourite editor ;)

  • @thomaswillson1107
    @thomaswillson1107 1 year ago +1

    Hi, can you show us how you customized your VS Code (comparison operators like `!=`, etc.)? Thanks for the video!

    • @strager_
      @strager_ 1 year ago +1

      Those look like ligatures. You need a font with code-oriented ligatures, and you need an editor which supports ligatures. I don't know what font Sorber uses, but Fira Code is a popular font which has ligatures.

    • @jvp5000
      @jvp5000 4 months ago

      @@strager_ thanks

  • @Ido-Levy
    @Ido-Levy 1 year ago

    Also, why are you using uint32_t instead of just int?

    • @soniablanche5672
      @soniablanche5672 11 months ago

      uint32_t is always gonna be unsigned 32 bit, int size will depend on your machine
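A short example of why the fixed width matters for hashing: the 32-bit FNV-1a hash below relies on arithmetic that wraps at exactly 32 bits, which `uint32_t` guarantees and plain `int` does not (signed overflow is undefined behaviour in C). FNV-1a is used here purely as an illustration; it is an assumption, not necessarily the hash function from the video.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over `len` bytes. The multiplication is meant to wrap
 * modulo 2^32, which is well defined for uint32_t on every platform. */
uint32_t fnv1a_32(const char *data, size_t len)
{
    uint32_t hash = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= (unsigned char)data[i];
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}
```

With a fixed-width type, the same key hashes to the same value on every machine, which also matters if hashes are ever stored or sent elsewhere.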

  • @greg4367
    @greg4367 1 year ago

    Jacob, the Subscribe button on your WEB page is inop.

  • @IBelieveInCode
    @IBelieveInCode 1 year ago

    Good Game 🙂

    • @IBelieveInCode
      @IBelieveInCode 1 year ago

      I've just written my own C "Hash Table" module. It's on my channel. Without sound. My written English is bad, but it's worse when I try to speak.

  • @_veikkomies
    @_veikkomies 1 year ago

    When you write "tmp = tmp->next" (e.g. lookup function), don't you have to define what "tmp->next" means? Where was that done?

    • @IBelieveInCode
      @IBelieveInCode 1 year ago +3

      "next" is a field of the struct "entry". You probably missed that 🙂

    • @_veikkomies
      @_veikkomies 1 year ago

      @@IBelieveInCode Ahh yeah, probably. Thanks

  • @erbenton07
    @erbenton07 1 year ago

    points-- for unnecessary use of feof
    Jacob, long videos are fine
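On the `feof` remark in this comment: a read loop usually does not need `feof` at all, because `fgets` itself reports when there is nothing left. The sketch below shows that shape; `count_lines` is a hypothetical example function, not code from the video.

```c
#include <stdio.h>
#include <string.h>

/* Count the lines in a stream. The loop condition is the fgets call
 * itself: it returns NULL at end of file or on a read error, so no
 * separate feof() test is needed to stop the loop. */
size_t count_lines(FILE *fp)
{
    char buf[256];
    size_t lines = 0;
    int mid_line = 0;   /* last chunk read did not end in '\n' */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        mid_line = (len == 0 || buf[len - 1] != '\n');
        if (!mid_line) lines++;
    }
    /* A final line without a trailing newline still counts. */
    return lines + (mid_line ? 1 : 0);
}
```

`feof` and `ferror` remain useful after the loop, to tell a clean end of file apart from a read error.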

  • @anon_y_mousse
    @anon_y_mousse 1 year ago +4

    It's not a bad start, but it would help if you made it slightly more generic. You could use _Generic and specialize on a few known types, or you could use a union and associate traits with whatever user defined data gets passed in. It would help to have the user pass in a hashing function and a comparison function and have flags to determine if data should be copied or merely pointed to.
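As a small illustration of the `_Generic` suggestion in this comment, the sketch below picks a hash function from the static type of the key. It assumes a C11 compiler; the names `hash_key`, `hash_u32`, and `hash_str`, and the two hash functions themselves, are hypothetical choices for the example.

```c
#include <stdint.h>

/* Hash for integer keys: a multiplicative hash (wraps modulo 2^32). */
static uint32_t hash_u32(uint32_t k)
{
    return k * 2654435761u;
}

/* Hash for string keys: 32-bit FNV-1a. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Select the hash function at compile time from the key's type.
 * A key of any other type is a compile error, not a silent fallback. */
#define hash_key(k) _Generic((k),      \
        uint32_t:     hash_u32,        \
        char *:       hash_str,        \
        const char *: hash_str)(k)
```

Usage looks like `hash_key((uint32_t)5)` or `hash_key("apple")`. Note that a plain integer literal such as `5` has type `int` and would need the cast, or an extra `int:` association.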

  • @randomscribblings
    @randomscribblings 1 year ago

    In delete you're leaking the key memory.

    • @JacobSorber
      @JacobSorber  1 year ago

      Did you watch the video? 😀 Yeah, I know. The example is unfinished. See you next week.

    • @randomscribblings
      @randomscribblings 1 year ago +1

      @@JacobSorber Yeah... the comment was made as I was watching.
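To make the leak mentioned in this thread concrete, here is a sketch of a chained-bucket delete that releases the duplicated key as well as the node. The `entry` layout (a heap-allocated `key`, an `object` pointer, and the `next` field mentioned elsewhere in these comments) and the name `chain_delete` are assumptions; the video's finished code may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed entry layout for a chained hash table bucket. */
typedef struct entry {
    char *key;              /* heap copy owned by the entry */
    void *object;           /* owned by the caller */
    struct entry *next;
} entry;

/* Remove the entry matching `key` from the chain starting at *head.
 * Frees the entry and its key copy, and returns the stored object
 * (or NULL if the key is not in the chain). */
void *chain_delete(entry **head, const char *key)
{
    for (entry **link = head; *link != NULL; link = &(*link)->next) {
        entry *e = *link;
        if (strcmp(e->key, key) == 0) {
            void *obj = e->object;
            *link = e->next;    /* unlink from the chain */
            free(e->key);       /* the step that is easy to forget */
            free(e);
            return obj;
        }
    }
    return NULL;
}
```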

  • @randomscribblings
    @randomscribblings 1 year ago

    strdup() again