I hope you all liked this comparison video. If there are other comparisons you would like to see explored, let me know.
Also, one commenter noted a bug in the linked list. Lines 30 and 31 in linkedlist.c need to be swapped (oops). Sorry about that. I've fixed the posted code. Unfortunately, YouTube doesn't allow me to update existing videos. So, those of you getting the code from the video, just make that change before you try to use the second insert function (it currently isn't called).
Oohh.. I thought I had gone insane or something xD
I probably spent an hour trying to figure out why your code was that way, as I thought it should be the opposite.
Usually, if a more experienced programmer disagrees with you, you're in the wrong...
So I tried sooo hard to figure out why I was wrong (even tho my code works).
Thought of another approach: keep 2 arrays, one for pointers/objects and another for deleted pointer/object indices; keep a count and total for both (that way the allocation behaviour can be shared via some functions that don't care what the arrays hold), and then just check if the deletion count is greater than 0 to do something like:
objects[indices[deleted_count - 1]] = object
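If it helps, that approach might be sketched like this (all names here are made up, and ints stand in for the objects): deleting just records the slot index, and inserting reuses the most recently freed slot before taking a fresh one.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 100

typedef struct {
    int objects[CAPACITY];     /* the live objects (ints for brevity) */
    size_t count;              /* slots handed out so far */
    size_t indices[CAPACITY];  /* indices of deleted slots */
    size_t deleted_count;
} pool;

/* Reuse a deleted slot if one exists, otherwise take a fresh one.
 * Returns the slot index, or (size_t)-1 when the pool is full. */
size_t pool_insert(pool *p, int object) {
    size_t i;
    if (p->deleted_count > 0) {
        i = p->indices[--p->deleted_count];  /* recycle a deleted slot */
    } else if (p->count < CAPACITY) {
        i = p->count++;                      /* fresh slot */
    } else {
        return (size_t)-1;
    }
    p->objects[i] = object;
    return i;
}

/* Deleting just remembers the slot for later reuse; no free() needed. */
void pool_delete(pool *p, size_t i) {
    p->indices[p->deleted_count++] = i;
}
```

No allocation or deallocation ever happens after the arrays exist, which is the whole point of the trick.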
Hi Jacob, thanks for all the great vids you post, your effort is much appreciated.
I know this video is more than 4 years old now, but I noticed that in main.c line 164 you wrote:
long double traverse_time = tv_to_seconds(&myusage.ru_usage);
I think you should have written:
long double traverse_time = tv_to_seconds(&myusage.ru_usage) - last_time;
Also, line 155 I think should have been:
long double add_delete_time = last_time - start_time;
Am I wrong?
Can you please explain why you computed the values this way?
Thanks again.
Graduated from college last May, and was in your OS class about a year before that. Looks like your video making skills have improved! Hope you're doing well, thank you for your willingness and love for teaching, it made a lasting impact on me. Best wishes.
haha the ending was nice :D
Thanks. We had fun making it.
Yes, you should avoid linked lists according to Bjarne Stroustrup: ruclips.net/video/SfkMiGFVhZo/видео.html
It’s a bit silly to compare speed without optimizations. Especially for sequential access on an array. Compilers just love that kind of thing.
6:51 Is it me or insert_after_node() should have the statements in reverse order? I haven't programmed in C for a while, but I would say that there's a bug in this function.
I was going through the comments to see if someone had already said it, haha. newNode is just pointing to itself in this order, I'm pretty sure.
This function is "dead code" in this video.
You're right about the bug, but this function is not used by the program, so the bug is hiding to bite the unsuspecting...
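For anyone tracing the bug being discussed: with the statements in the original order, node->next is redirected at newNode before it gets copied into newNode->next, so the new node ends up pointing at itself. A corrected sketch (names assumed; this is not the video's exact code):

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Insert newNode immediately after n. The order of the two assignments
 * matters: copy n->next into newNode->next FIRST, otherwise newNode->next
 * ends up pointing at newNode itself (the bug in the video). */
void insert_after_node(node *n, node *newNode) {
    newNode->next = n->next;  /* must come first */
    n->next = newNode;        /* then relink */
}
```

With the two lines swapped back to the buggy order, the first assert below would fail and the list would contain a self-loop.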
Thanks for the video. Thinking about your data structures is a good idea in general, and I really like arrays (contrary to some folks, especially in modern languages). However, I think the comparison is not really fair, particularly for linked lists, because you added quite a bit of memory logic to the array functions. Similarly, you could also request 100 linked items in advance and flag them as used, which would counter the memory locality argument a bit. I guess if you'd picked an example for arrays where you don't want unused gaps in your memory, it'd have performed worse compared to your implementation - not necessarily compared to the linked list, but maybe. I also suppose that realloc sometimes has to copy the old memory block if your memory space is fragmented, but I did not check on that. In the end, it always depends on the use case, and it is good to know what performs how under which circumstances, so thanks again.
Never gonna have a lot of cache...
You say that but when I found out my wife's CAD workstation had as much Lvl3 cache as my first PC had main memory... I'm glad I was wearing my brown pants that day.
I was literally asking myself this question a week ago
I hope my answer helped.
Linked lists are better for updating a lot and mostly accessing sequentially. Dynamic arrays are better for random access but can be expensive to add to, though that is mitigated by having a bigger capacity than the size.
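The "bigger capacity than size" idea is just amortized growth - a minimal C sketch (all names hypothetical, not from the video): double the capacity whenever the array fills, so most appends touch no allocator at all.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* allocated slots */
} dynarray;

/* Append a value, doubling capacity when full so that appends are
 * O(1) amortized. Returns 0 on success, -1 if realloc fails. */
int dynarray_push(dynarray *a, int v) {
    if (a->size == a->capacity) {
        size_t cap = a->capacity ? a->capacity * 2 : 4;
        int *p = realloc(a->data, cap * sizeof *p);
        if (p == NULL)
            return -1;          /* old block is still valid */
        a->data = p;
        a->capacity = cap;
    }
    a->data[a->size++] = v;
    return 0;
}
```

Because capacity only doubles, a sequence of n pushes triggers only O(log n) reallocations.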
One of the best explanatory performance comparison videos I have ever seen. Thank you so much for the delicate job.
You're very welcome! Glad it was helpful.
Yes, it doesn't matter if one links a node that is in the stack to a node in the heap, but one usually shouldn't because then things tend to get complicated. I am not saying it's not plausible but should be avoided in large projects. Please correct me if I'm wrong! Love your videos, by the way.
I crush the button like my code crushes the machine!
I bet sparks shot out when you did. :) Thanks!
Great video once again. One comment though, I know it's not the point of the video but it looks like insert_after_node() at 6:45 is not correct. Am I missing something?
A really good comparison. It encourages me to always consider different approaches for one task. The "guest appearance" rounds off this episode prodigiously. :-)
For me - as a non-native speaker - this series helps me to improve my English skills.
I default to using a (resizable) array unless my use case can take advantage of another data structure's characteristics. With respect to your evaluation here, I do have to point out that the remove function isn't really fair. Remove and insert are a pair performing opposite actions. If you don't do a search in the insert, the remove shouldn't either.
What the linked list is really good at, and the array is terrible at, is inserting or removing elements somewhere in the middle of a long list/array (when you already know the index/pointer).
I also notice you intermix the terms remove/delete. The rule of thumb for naming I like to use is to call it 'delete' when the operation includes freeing the memory associated with the element, and to call it 'remove' if you're just removing the element from the data structure.
Dear Jacob Sorber,
Does there exist a platform independent way to evaluate system Cache Size during run time according to indirect symptoms of program behavior?
Regards,
AB
I hope we are all on the same page...
You should test insertions on the array and then talk about which is slow. Arrays are only useful if you know the capacity up front, only append, and have read-heavy workloads. Otherwise trees and linked lists are your friends.
One method of old was to maintain a parallel array of "next" pointers (integers). Simply chain the elements of the next array together and have a "freep" pointer to the first element. That way, allocating and deallocating a value element is simply a matter of adjusting the freep pointer. Assuming that the value and next arrays are in the same cache, this should be quite fast.
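That scheme might look something like this (a rough sketch with made-up names): values and links live in parallel arrays, the free list is threaded through the next array, and freep points at the first free slot, so allocation and deallocation are just pointer swaps.

```c
#include <assert.h>

#define N 8

int value[N];  /* the stored values */
int next[N];   /* parallel "next" links; -1 terminates a chain */
int freep;     /* index of the first free slot */

/* Chain every slot onto the free list. */
void slots_init(void) {
    for (int i = 0; i < N - 1; i++)
        next[i] = i + 1;
    next[N - 1] = -1;
    freep = 0;
}

/* Pop a slot off the free list and store v there; -1 when full. */
int alloc_slot(int v) {
    if (freep == -1)
        return -1;
    int i = freep;
    freep = next[i];  /* advance the free list */
    value[i] = v;
    next[i] = -1;
    return i;
}

/* Push a slot back onto the free list. */
void free_slot(int i) {
    next[i] = freep;
    freep = i;
}
```

Since both arrays are contiguous, walking a chain tends to stay within a few cache lines, unlike malloc'd nodes scattered across the heap.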
Something about this is so much fun to watch
Ok, I have to ask... are you Matthew McConaughey brother?
great video!
note that keeping things contiguous not only prevents page faults, but also improves performance by taking advantage of spatial locality in the caches, as usually the caching system stores the data at the requested address AND some of the neighboring data.
Nice video. I like that you have stressed its drawbacks well (I don't consider having to hold one more pointer a drawback, because memory is cheap, but memory coherence is very important). I haven't used linked lists in my life though, not necessarily because they are slow, but because 99% of the time they are not useful over a vector (at least in the context of game development, since I'm more accustomed to it than other areas).
Right, depending on what you're doing, arrays (or vectors, which are just wrappers around arrays) might be all you need.
What about: a red/black tree stored in a heap allocated dynamic array (malloc/realloc magic and such)?
Is this ^ data structure end-game?
No, Priority R-Trees are the data structure endgame
linked list gang
You've got a beautiful little girl there mate. Loved the fun family ending. Have you gotten her to share that same passion for coding/CS? I've got a 4 and 6 yr old, but so far they are only interested in the end results...they are NOT impressed by my time to delivery or debugging skills.
Hi Jacob, I do know that this video is not that new but I just stumbled upon it today.
First, I did not get through all of the comments, so maybe my comment will be redundant, but one way to improve LL efficiency could be to keep each 'deleted' block for future use, hence not 'wasting' time deallocating only to reallocate later on. Nothing fancy, and probably already known, but still IMO worth mentioning.
Wow! I adore your wonderful offspring!
Love both of you, keep going!
hahaha that ending. your daughter is adorable.
Yeah, she's a lot of fun. Thanks.
great!
How I think of it, if you don't care about the ordering and don't need to maintain the ordering, then use a hash table with the data stored as arrays internally. An array of chains, each chain is an array, and deletions are a simple swap to the end and decrement. If you need the data in a sorted order, and obviously when that's not often, just call the sort function at that time. If it needs to be sorted all the time, either use a tree or a hybrid hash tree. I hardly ever, if ever at all anymore, use a plain array or a plain linked list to store any significant amount of data. I hope you've taught your daughter C because that would be awesome.
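The swap-and-decrement deletion mentioned here is simple enough to show on a single bucket (hypothetical names; a real table would hash the key to pick the bucket first):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int items[64];
    size_t count;
} bucket;

/* O(1) unordered delete: overwrite slot i with the last item and
 * shrink. Order is not preserved - that's the trade-off this whole
 * approach accepts in exchange for never shifting elements. */
void bucket_remove(bucket *b, size_t i) {
    b->items[i] = b->items[--b->count];
}
```

If ordering ever matters, you sort on demand at that point, exactly as the comment suggests.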
What kind of father would I be if I didn't teach my kids C?
Thanks a lot for this wonderful explanation, please if you make some corrections to the code, could you put a link to the git repository if possible?
You're welcome. Video code (including corrections) is available through Patreon.
Link List, the Basis of Forth!
They are the basis of Lisp. The basis of Forth is the stack.
How about pairing an array of pointers and an array of values? Would it ever be cheaper to move around addresses instead of values and still keep the data blocked together?
what do you mean
I wonder if people tried to tackle this page fault problem for other structures with graphs/trees that also have a lot of indirections.
Yeah, it's an issue for any linked data structure.
At 6:50, inside the function at line 29, you should do line 31 first, then line 30.
You're right. Good catch. Fortunately, I didn't use that code in main. I'll get it fixed in the posted code. Thanks.
@@JacobSorber Keep up the good job, thanks for what you're doing.
Thanks a ton ! You are doing a really great job.
I have so far not ever felt the need to use linked-lists.
They COULD be useful in scenarios where the data really represents a list, but normally you are better off using different data structures.
For larger sets of data it is basically not usable, as it does not benefit from sorting, hashing, or cache locality.
And for your array implementation - why set the values individually instead of using the available operations? You could use memset, for example.
Use C++ and the library functions with a build that also takes advantage of the hardware you have, and it might even turn the array code into vectorised operations - checking several numbers concurrently.
You will never need to use linked lists.
@@sevakantonyan9833 then maybe one day you will dive into functional programming, like in LISP or Scheme, and you will understand how to use linked lists.
You could try defining the elements of your linked list as an array, and have each element's pointer point to the next one in the array. That way, you still have the flexibility of a linked list, whilst most elements are close to each other in memory. Or so I hope.
True, but keep in mind that that only ensures that they start out with good locality. If you are removing nodes from the list and adding them back in, or moving nodes around, then things can still get messed up. You're also still storing the next pointers, and dereferencing them as you go through the list.
You're right, but having 'next' pointers when using an array for storage suggests that you're concerned about the sequence of the data (for some reason). The example shown doesn't care about the sequence of the data stored, just that it is stored and can be found again.
The 'sequence' of the linked list, here, is just the sequence of each node being added as needed, but the data is meaningless.
Excellent video, as always. But man, this would have been good to know sooner...
I know. Sorry. I make them as fast as I can find the time.
@@JacobSorber Don't apologize; your videos are very good quality. I've learned so much about C from you in the past few days.
lol that outro
Can you save a linked list to a file if the member data is pointer to an object on the heap?
You could write a linked-list, as is, to a file (you would need to save both the data and the links), but it's probably not what you want to do. At least, when you load it back into memory, you would have to make some adjustments because the heap won't look the same next run. So, you can't use those old pointers you saved, at least not as is.
@@JacobSorber Thanks , I follow your videos they are brilliant!
yes
/thread
what if we use it for background processes in a shell
I'm not understanding the question. Running it as a background process shouldn't change the performance of the program, just how it interacts with the shell.
@@JacobSorber like if you were implementing your own shell program, would a linked list be a good way to keep track of them (background processes)?
@@ronjeremy1232 A linked list COULD be used as a way to track what processes are running, but it would only be a list of the running processes, not their context state or queue.
In short:
Need to add elements a lot? Consider a linked list.
Otherwise, always use contiguous lists.
3 strikes within 14 seconds. Instant-dislike.
6:35 - Line 17. If you run out of memory, malloc will return NULL and in line 19 you'll get a segmentation fault for dereferencing a null pointer. Therefore, shouldn't you check whether malloc returns NULL?
not in POC code
Yeah, in production code you should check to make sure malloc didn't return NULL. This code exists simply to provide a performance comparison.
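For reference, the production-style check is just a guard after the call - a minimal sketch (names made up; this is not the video's actual code):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Allocate and initialize a node, bailing out cleanly if malloc fails
 * instead of dereferencing NULL on the next line. */
node *make_node(int value) {
    node *n = malloc(sizeof *n);
    if (n == NULL) {
        fprintf(stderr, "make_node: out of memory\n");
        return NULL;  /* or exit(EXIT_FAILURE), depending on the program */
    }
    n->value = value;
    n->next = NULL;
    return n;
}
```

Callers then check the return value once, rather than every site checking malloc directly.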
@@JacobSorber Oh ok. Btw I'm a computer science student and your videos are an excellent complement. Thank you so much. Anyway, cheers from Brazil xD
Welll... yeah, no. malloc does not actually return NULL just because you are out of memory - that depends on the OS, and Linux, by default, uses an optimistic approach and only really allocates the memory when it is used.
@@ABaumstumpf What do you suggest we do then? I mean, how do we use malloc safely without checking for null?
Just allocate a whole bunch of memory at once; no one is going to notice. After all, nowadays we are using text editors that take half a gigabyte of RAM...
Very True
Your daughter is gorgeous!!!