The Linux kernel is switching to modern C! Why? Let's study the source code and see why
- Published: Sep 8, 2024
- The news is "Linux is switching to modern C"! Here, "modern C" seems to mean C99 (1999) or C11 (2011). But why? This is an opportunity for us to have a look at the kernel mailing list, see what they say, check the source code and educate ourselves! Every news piece is an opportunity to dig in! Let's dig.
Key takeaways:
1. Even Linus googles things
2. If you are Linus, you can call the C standard people incompetent
For Linus everyone is incompetent, except him 😅
My name is Linus and I am your god!
@@fr3ddyfr3sh.... and the only projects made that matters are linux(kernel) and git....
3. People who use GOTO are either really bad C programmers or really good C programmers :)
@@vaisakhkm783 lol. his 'only' products that matter are the worlds most popular operating system and the dominant Version control software lol.
The Linux linked list implementation is truly genius in its simplicity. I remember seeing it the first time, and thinking...how is this useful if there's no data in the list_head struct?! Then I had a major 'Ah-Ha' moment when I realized the structure carries no data besides the 2 pointers, because the structure is added TO each structure which needs it, therefore becoming relevant to it and its data. Just beautiful lol
Kpp
can you send a reference to that ?
How do you get the data in the next node though?
say I have struct A object a which has an int and a list_head. How do I get the next int? a->node->next and then what? that is the node which is in the next object
@@m.furkaner7418 well, and I'm no expert for SURE lol, but I guess you don't. It's a very simple linked list implementation, it only serves the purpose of linking any data structures that need to be linked for various purposes and reasons...i.e. the for_each_entry macros, etc. You only need to add a couple of list_head structs to each struct you want in the list. If you want to directly access each member's data, I assume you just access each specific instance, or use a for_each_entry macro to touch each one.
It is quite interesting to see people are still struggling with using loop variables outside of loops in 2022. I still remember that you couldn't have inline variables, but had to declare them outside causing a ton of opportunities to do something really bad. But that was so long ago - I forgot about it. Those things need to be resolved BEFORE the last people that still understand the differences and impact pass away. It also shows that we routinely do not invest enough effort in adapting and EMBRACING progress made available in new versions of languages / standards. There are usually REALLY good reasons why these new features get introduced. Almost always they help solving / preventing issues.
No, sorry. When you are responsible for something of the magnitude of the Linux code, you cannot just jump to whatever is next every time a dependency gets a new version.
You need a reason to change, and you do it only when the work needed is justified, and only for that reason. This case is a perfect example: C99 couldn't be used, so they were forced to go to C11, but they disabled as many of the new C11 features as possible with a flag (maybe all, not sure), so the only changes were the ones C99 introduced.
No worries, people can re-read code, documentation and draw consequences, no remembering needed. Even if is a book from 1978. This is not Arcane Magic.
@@framegrace1 Linus can do whatever he wants with his code.
@@framegrace1 they use C11 features in Linux. Look into something like _Generic
You'll find some uses in Linux source
This was really fun to explore. Never thought I'd be interested in diving into the Linux kernel code, but this walkthrough was very informative. I like your style. It's very easy to follow. I subscribed. The algorithm worked today!
Please forgive me for being pedantic here: there is a lot to be said for being able to declare your variables wherever you want in your code, but strictly speaking the bug is not in the location of the declaration, but in the fact that it was not initialized with a known value before use and not checked for this known value afterwards. Still, the patch itself is an elegant way of working around it in such a fashion that it won't smack you in the face.
Great video btw.
No, the point is that there should be no "defining a known value" (so: which one?), initializing a variable with it, and checking it, all before and after the loop, where the variable is just not needed. No one needs to check &req->req != _req again, because the opposite has already been checked in the loop - if it was available. Lots of unnecessary overhead and potential new error sources.
The only thing that really matters afterwards is whether or not something has been found, hence the flag. Everything else should just not exist any longer - and therefore not bother. Reducing scope of symbols makes by itself safer - and mostly cleaner - code.
You both are wrong. The problem is that the second argument of list_for_each_entry is a pointer to a list HEAD. A list HEAD is not a valid item.
A list with 2 elements in it looks something like this:
HEAD <-> first_item->node <-> second_item->node <-> HEAD
where first_item and second_item are "proper" elements with both data and the next/prev pointers, but HEAD is just a struct of type `list_head` without any data. So after the loop is completed (without a break), the pointer cb points to something like `HEAD - offsetof(my_type_t, node)`, but HEAD is just a list_head without a surrounding struct.
@@ruroruro Actually, that was the first thing that puzzled me as well, the structure with the head and tail of a linked list should not be on either end point of a linked list. In fact, it should not be part of the proper list at all.
My original point still stands though, the iterator should always be initialized with a known value before use.
@@minastaros I beg to differ. It is standard best practice to initialize any variable with a known value before use. The C compiler will even warn you about it, as will tools such as valgrind.
@@damouze @RuRo When I wrote the comment, I did not look up the whole code of function gr_dequeue() which I have now. So I reacted on some buzzwords, slightly sloppily... My apologies... However:
*Initializing any variable* : no, not any variable. Only when chances are that they are _not_ assigned another value prior to their first usage. Here, initializing req (the iterator) with a value would actually be an error, because req _is_ set to the list's head by list_for_each_entry() in any case. Thus, no matter what value you used to initialize it before, that would never be used. It is also a semantic difference for the fellow programmer: by _not_ initializing, you make clear that the variable is assigned some value in any case later. Code checkers would not complain.
*Checking for a known value* : yes, that would be possible, and since req is a pointer, NULL would be an appropriate value. But, since req is already the loop iterator, it must be set to NULL _after_ the loop in case of "_req" is not found, which would force the positive case (_req found) to jump _over_ this NULL assignment, and "break" is no longer sufficient. So doing it the other way round by setting a flag is way simpler.
*req not used outside the loop* : I was just plain wrong, unaware of the complete code. Sorry. Of course, req being the iterator is used afterwards to do some stuff with the found element. ;-)
Here is the complete code: sbexr.rabexc.org/latest/sources/26/03124c1618ca75.html#0069b001006cc001
Here is the definition of list_for_each_entry(): docs.huihoo.com/doxygen/linux/kernel/3.7/include_2linux_2list_8h.html#a9b782fefb5ab71ce9762182e45a615e1
Just wanted to say thanks for the great and very clear video. Reading more source code (and kernel source) has been on my to do list for a while as I'm self taught and don't have a lot of feedback/examples to follow. This was very understandable and encouraging, while also highlighting where the kernel might not be the best example of modern practices. Will definitely check out the 'heavily commented source' link you mentioned.
14:28 You made a subtle mistake here. The problem isn't that the list could be empty. The code is wrong even if the list has items in it. The problem is that `hci_cb_list` (or `&ep->queue` in the original USB code) isn't a list item, but is just a list head (that has no data associated with it).
A doubly linked list with 2 elements in it would look something like this:
HEAD <-> first_item->node <-> second_item->node <-> HEAD
Where first_item and second_item are "proper" elements with extra data and a `node` field of type `list_head` in them.
But HEAD itself isn't a real "proper" element, it is just an entry of type `list_head` without any surrounding data.
As you can see, a doubly linked list is actually a ring of pointers. To iterate over a linked list you dereference `element->next` until you get back to the HEAD you started with. But crucially, that means that (unless you break out of the loop), the element pointer `cb` (or `req`) would be something like `HEAD - offsetof(my_type_t, node)` after the loop is done. !!! However !!! HEAD isn't a proper element, so there is no "containing" struct around it, so now you are pointing into some random memory region.
but the implementation of the doubly linked list in this case is just
head_1 <-> head_2 <-> head_3 <-> head_4 (i would agree the names look kinda wrong)
each "head" is contained inside a structure and they use a macro that uses pointer arithmetic to obtain the structure a particular "head" is in
there's a deeper explanation in this other comment: ruclips.net/video/-G1FuEQqxVI/видео.html&lc=UgwE8eFlByly-Amns3t4AaABAg.9ZLDPIrwcXn9ZPWxzAOQSv
@@juanb936 no. You can check the code in the kernel for yourself (can't post a link or RUclips will delete my comment). Each linked list has a fake element on both "ends" that doesn't have any data associated with it and it is also the "reference" to the list. When you want to pass a list to some function, you pass the pointer of this special "HEAD" element and then the function can iterate over the list until it gets back to the HEAD.
@@ruroruro i mean, the linked list structure used in this case doesn't have data associated to any of its "nodes". you can see the declaration in the video at 6:48
@@juanb936 the linked list elements have data "around" them. You subtract the offset of the `node` field or whatever to get a pointer to the "outer" struct that has actual data in it.
Now! The "HEAD" element of the linked list DOESN'T have data around it. The head of the linked list is just a naked `list_head` struct. It is not included inside another struct as a field. It's just 2 pointers to the first/last element in the list.
So when the loop terminates, you don't get the pointer to the "last" element in the list, you get a pointer to HEAD-offsetof(node) which is not a pointer to a valid element struct.
when a 23 year old C standard is modern lol
Wow, this is a great video, I’m glad RUclips recommended me your channel. Love your enthusiasm, it’s infectious. You inspire me to learn new things!
I suppose linux will become even better, I can't even imagine how better it can be, really excited for the modernized version
It will be more like Microsoft Windows!
@@illegalsmirf so that's good?
@@rodricbr Oh yeah, sure
@@illegalsmirf lol
It's mostly to fix issues and bugs . I don't think performance improvement will happen
Linux source code with all those macros must be literal hell on earth.
It's more so hell to debug, as crashing, faulting or anything of that nature *will* crash the OS and you can't "Just run gdb on it". As such you can never rely on the old crutches of "The system will reclaim all of my leaked memory", "leaking is fine because then I can't have dangling pointers", "crashing is fine if no cleanup is necessary", and so on..
Wolf there is actually a kernel debugger (kdb)
Wolf not to be confused with kbd
Sorry, that pushes my buttons. You are misusing the English adjective "literal." "Literal hell on Earth" has a larger scope than your remark encompasses. See Al Yankovich.
@@playerguy2 The hardware will reclaim my leaked memory when the user shuts it off :p
Beautiful explanation. For a beginner like me, I learned a lot. Thanks for making it beginner friendly.
Very interesting to listen to this on the side. I've actually owned the Understanding the Linux Kernel book for many years. The way I learned was zooming into specific parts of the kernel. Specifically learning about all of the vfs areas, proc, then later cgroups etc.. Once you get a grasp of one area, it really helped me get a better understanding than before. Plus it gave a good feeling of accomplishment in the end.
This was an awesome watch. Love your friendly demeanor and simple explanations. Learned a good amount! Please make more videos like this, I'd love them. Subscribed. Thank you!
I love your enthusiasm and welcoming demeanor. You are a great teacher and make me excited to finish my computer science degree!
To anyone of you guys watching this video and understood it, "You Are My Hero"
I don't get the fuss about this. I've studied C back 20+ years ago and we could define variables in FOR loops just fine.
But yes, you always have to check you have the variable you want to work with. My C code was 10% logic and 90% checks.
The video is pretty easy to follow due to clear explanations
Thank you, I will try to live up to your adoration..
Meanwhile in "C land" a 23 year old version of the language is considered modern. ;) Video title would be far more accurate as "Linux kernel is switching to a less antiquated C!"
As a software architect, one indeed strives to solve things globally instead of locally. It's a bit surprising to me that Linux is still using an old version of C, since C99 has some much-wanted features (e.g. inline functions, flexible variable declarations and the restrict keyword). Linus's rant at the end is totally uncalled for.
6:50 C89 actually means the C language as it was specified in the 1989 standard. C11 means the C language as it was specified in 2011. Those are years, not version numbers. (This is correctly explained later, but this was the first occurrence, and the way you introduced it made it possible to misunderstand the C standard naming style.)
its the millennium bug all over again!
He sort of explains that at 0:17 when he says the versions in their "year" interpretation ("C two thousand eleven" and "C nineteen ninety nine")
Please publish a C course🔥💚
You explain a lot of trivially basic stuff but then fail to explain the core idea of how lists work. In fact what you have shown can be misleading to a beginner.
In the example, the mem_zone_bm_rtree structs are not linked among themselves. nodes does not point to the next and previous mem_zone_bm_rtree, but rather to the nodes member of the next and previous mem_zone_bm_rtree in the list. To get the address of the mem_zone_bm_rtree from nodes, you can calculate the nodes member offset from the start of the struct or use list_entry, which does it for you. Your next video should explain why it is done like this rather than the way CS 101 teaches linked lists (having a void* in the list_head to point to the struct which contains the data) ;)
Do you know when C will vanish from the TIOBE index and from everywhere? Well, 2078, because then you can't numerate the new standards with two digits in an unambiguous way any longer.
Well - just kidding!
The wise men behind the standard voluntarily only published new standards every 10 years or even more seldom. So we have c78, c89, c99, c11, c2? and so on, and in 2094 you can define -c94 meaning 2094 without problems. Of course in 2797 it will be hard to guess whether c96 refers to 2796, 2696, 2596 or something else, but history has shown that standards older than 100 years are rarely used. Or do you remember a C standard from 18-something? There you have it!
If the intervals of new C standards increase, this method can be far wider stretched than 2978. It might be a more easy process, than finding new animal names starting with Q for Ubuntu distributions, which are needed every 13 years.
Not c2, we have c17
There was c17, so not even 10 years
First time seeing this channel, this was a really interesting video!
Just landed in this video, really informative and well explained.
Why does your lower third in the start of the video look like windows XP? Is it your KDE panel or something?
Oh so I won’t be confused trying to read his work from a 2000’s understanding of C.
Don't be confused: the C programming in the kernel is really very conservative and certainly predates the 2000's.
@@HiltonFernandes ok then..
I am going to read the kernel source with only what studied in c++ years ago in my school...XD how difficult it can be
....
@@vaisakhkm783 The C you studied in school will 100% be supported in C89. If you didn't work with C after school, the additions up to C11 will all be new to you.
Dear @@vaisakhkm783, while the use of the syntax of C is limited in the Linux kernel, its architecture is really complex. Only a very experienced C programmer can read it with proficiency.
@@HiltonFernandes I know. I read it recently. Was rather confused
Sweet...I have the book by Kernighan and Ritchie...when I was still in the service I attempted to teach myself C using an ANSI C book on a rare i386 Sun machine. I gave up because nothing compiled...it wasn't until 10 years later I found out that the Sun machines had K&R C, not ANSI C. :-D
So too stupid to compile...
very informative and well put together!
C89, "the 89th version of C" awesome I might have to steal that joke
I do sometimes use the goto label construct.
goto is an extremely sharp double-sided knife without a hilt
Would be nice to see the define of the list_for_each_entry macro, I'm sure it must be a macro because it's followed by a { } section?
That's such a great video. Will forward to all my family members.
I'm a PHP programmer and I never stop reading and learning about C, especially the source code of some open source software. C is a cool language and very simple; it gets you very close to bare-metal CPU instructions.
Somewhat unbelievably, this is the reason given for why my university taught C back in the 90s! The lesson then was not learned: the purpose of a high level language is to abstract, with the goal of preventing bugs. (Back then the purpose was thought to be code reuse! Slightly before that it was programmer productivity. They really had no idea.)
Then you may love Rust even more.
@@techpriest4787 Honest answer, No
@@user-ir2fu4cx6p not for hobby perhaps because of how hard Rust is to learn. Or compile for that matter. My first game engine was written in C. But unlike C++ that can not be kept simple well. Rust can go as low or as high as you want it. The highest I got was C# level. The lowest was under C++ but still a bit higher than C. I do not do embedded so assembly low level Rust but it is also a big thing for Rust. Though since I am quite confident with Rust now. Even hobby grade apps like my backup app is written in Rust. It is not any harder than C# yet more memory efficient and no GC out of the gate. Plus if any high performance code is needed. There is no need to switch to C/C++ and no need for silly extra work to write bindings that only introduce debug and performance issues.
I think it is a common misconception that Rust may only replace C. For me it has already replaced the languages I used before, namely F#/C#/C. I cannot wait to ditch GLSL and HLSL as well.
Can hardly think of anything more modern than a 1999 programming standard.
What software do you use to draw on the screen?
Thanks for making me feel better about googling while programming 😅
reading source is always fun
HELLLL YEAAAHHHH
That was great. Thank you jadi!
Maybe the Linux kernel can move away from "one giant file per module" and crummy looking "save vertical space" code too.
What the hell are you talking about ? There might be several source files per module...
Zig programming language needs to pay attention to this matter - scoped variables for iterating the loop
Nice video, informative content. And what are the software you are using to annotate webpages and on the screen? Thanks.
@@geekingjadi Thanks for your reply. And what's the plugin used to make highlight on webpages?
gnu99, gnu18 and gnu2x are not C99, C18 and C2x. C18 doesn't even exist. gnu99 is GNU C 99. It is extended version of C.
thanks for the explanation man.
C99 is considered modern????? They've been on C89?????
When you're building software that needs to work 100% of the time you are allergic to use new stuff without years or decades of testing and improvements. Using the new hot lib or framework that web developers love is not something you can afford to do in something so critical.
@@nocivolive And we're still using QWERTY keyboards.
Linus living up to his name! A git!
Use the srouce Luke, use the srouce!
I'm really happy to see more than 100k views on this video
Any idea where to get that source code?
KERNAL is the C64 operating system. The kernel is any Operating system's core.
If goto wasn't used, the problem wouldn't exist. I don't understand why people use goto statements.
edit: after watching the rest of the video, I think you would need {} to scope _req around the list_for_each_entry function. But switching to intermingled declarations and code is still the better solution, I think.
I got good information about c programming language.
Thanks jadi
From iran🇮🇷
Love from Shiraz!!!
Thanks for the detailed explanation
I think the USB node example is a somewhat dirty way to promote any struct to a doubly linked list
Yes, what do they do, use offsetof or something? Nonsense.
This is not a dirty way, it's a data structure called an intrusive doubly linked list. It's used in the kernel because it is more tolerant of allocation failures, as opposed to something like having many structs and using a dynamic array to store them.
I am falling in love with linux more and more)
How do you successfully learn an architecture to become more familiar with why the code is written the way it's written?
@@geekingjadi in general, whether it's the usb or network subsystems or anything really. Every tutorial for C is about the features of the language, with toy examples on how to construct some one-off type. If you look up the source for HTTP/2, good luck going from tutorial to that. There's a bigger picture about C as an abstraction over memory and compute, about C APIs and interfaces, and why it's written that way and not a completely different way, and what these large code bases like haproxy for example are trying to do. There's a lot about innovating in C that I've never heard talked about. But I could write you a tax calculator or Fahrenheit to Celsius converter like no one's business
A double linked list? Man...I can hear my CS profs yelling over the decades....
Just return an Int with the number of items in the list, then if greater than zero, take action?
I'm not a kernel developer so I can not comment. There are lots of considerations there.
How are you able to write on your screen with red ink? Which software?
What C version is it moving from? C1960? LOL
If I didn't google it, I likely copied it from previous code that I wrote
When Linus is *googling*, we are mere mortals 😀!! You have to get to an efficient solution and if googling does that, so be it !!
Nice like always ❤
9:29 it's about drive, it's about power.
How do key people involved with kernel manage to keep their inboxes open for anyone to email to, while still keeping them organized and respond to bug reports and have conversations with other key people? Aren't they getting flooded by internet trolls and 4ch
or cancel culture mobs? Do they have a really awesome firewall/filter?
Really curious.
Well, the core contributors are well known, so they can put them on a single list into a single inbox. Then the rest is the rest. There's still possibilities for further sorting.
Linux has been on C11 for a while now
Love from America!
the irony is that this still wouldn't fix the bug, just force people to fix this pattern of bug
stupid question, wasn't linux about to be ported to rust
Not a stupid question. I believe a Rust port is actually underway.
Ll
Lol
No, porting was never on the table.
@@TheCocoaDaddy There is no port happening. What actually happened is Linux now supports module contributions in rust.
C99 and is NEW LOL, no im just OLD
SO COOL!
No comments about "gcc bad, clang good" being vented to Linus more than one time?
wait so what version of C is linux using right now?
gnu89 i think
This is why I prefer C source code compilers that generate assembler code, which is then assembled.
I can read the generated assembler code and verify that the generated code is what I expected from the C source code.
Keep it up amoo jadi🔥🔥🔥
Let's jump to c17 instead
When is this coming out?
Excelente
Amazing
I grew up in Iran... this accent sounds Farsi, but more extreme than I usually hear... ,🙂
❤
Oooh, you get a boolean as well
What about C17, from 2017 or later?
To me, C99 is the big jump. The subsequent ones have been more incremental.
C99 also includes iso646.h (it actually arrived with the 1995 amendment to C90), so you can use “and”, “or”, “not” instead of “&&”, "||”, “!”, like C++.
Thank you for the video. Tres cool.
Why not C17?
"...Srouce..."
This guy has an awesome attitude!
One of the best videos about linux's code!!
i heard linux kernel is moving with Rust! isn't that true ?
The more accurate phrase should be "linux kernel adds rust support; mainly for drivers". I have worked with it and can record a video
@@geekingjadi Great idea for the video! and long life for C
love from Pakistan 🇵🇰
that whole break out of the loop when found or fall out of the bottom of the loop when not found is a clusterfuck way of coding things. the body of code after the loop that processes what ever was found should be INSIDE the loop and then return when done. drop out the bottom is "not found" and NOT A BUG!!!.
Now your loop is crammed full of busy logic and you've introduced another level of nesting to the whole thing. Great
@@maskettaman1488 the logic already exists: if (end condition) break. my solution: if (end condition) { handle condition here; exit; }
falling out the bottom of the loop == condition not met.
as the loop is a macro you can pass a pointer to a callback function for that macro - this would further factor your code into manageable chunks instead of having if/and/but loops nested to the umpteenth level
if (condition met) { (*handle_condition)(); return success; }
at end of loop, return fail
however, they REALLY should make their code MISRA compliant and initialize all variables up front... yea i know, that would mean burning up .0000000000000000000001 of an oxygen molecule for every function call zomg
Wait, it's not c99?
Srouce
699!! my bad its C99 lol
If it ain't in K&R, it don't count.
VS Code is a telemetry malware. Codium has that part cut out. Stll not my cup of tea. I didn't switch to Linux to use ms crapware.
7:56 dæbld
Linux is kernel. Commodore 64 / Vic20 = kernal.
The alphabet has more than 3000 years; ans so C
It is kernel
"kernAl" 😉