4:40 "Surely the transitive property of addition means that a+b=b+a." It is the commutative property of addition. Also, commutativity can be a property of (binary) operations like addition, while transitivity can be a property of relations. E.g. transitivity of < (less than): 1 < 3 and 3 < 5, therefore 1 < 5. Okay, that's my "Ummm Ackchyually" moment.
Was going to comment this. That C arrays decay to pointers was difficult for me to understand as a noob, and it really makes me appreciate C++ std::array.
People always love saying "c is such a simple language". Well it is if you ignore all the more technical parts like value categories, value transformations (including array decay) etc.
One way of looking at this is that a pointer is a variable that _contains_ an address, but an array-variable _is_ the address. The difference is subtle, but can be important. For example, suppose you have an array `char str[] = "string";` in one file that you're trying to access in another via `extern char *str;`. This should work, because arrays and pointers are the same, right? But if you do, say, `printf("%s", str);`, it'll try to interpret the string itself as an address and you get nonsense if not a crash.
The way I have always looked at it is that the index denotes how many elements appear before it. Helped ease my mind back when I was learning programming.
I thought of it as 00000000 being the first positive integer in binary, and thought the reason indexes start at zero was to be able to make arrays one element bigger, since element 11111111 would be element 2^8 and not (2^8) - 1.
@@cigmorfil4101 only in languages like python which apply special rules to negative indexes. Most languages have no such feature as it adds arguably unnecessary levels of code complexity
Minor point: Arrays and pointers are not the same type in C. The reason you can print the address of an array using %p is because arrays decay to a pointer to their first element when accessed. From K&R C "In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously." One important distinction between arrays and pointers is that array names are constant, but pointers are variables: This means assignments like 'mypointer = myarray' and 'mypointer++' are legal, but 'myarray = mypointer' or 'myarray++' are illegal.
In fact, %p is just a format specifier. It doesn’t care what’s passed as a parameter; it will simply try to print it as a pointer. If you pass an integer variable instead, that is technically undefined behavior, but in practice it will usually just print the value of the integer in hex with 0x before it.
@@BlueSheep95 there are some other strange differences. Multi dimensional arrays are quite different than pointers. We had quiz questions in class about these differences and my takeaway was nobody should write such code that could distinguish between arrays and pointers anyway
The point is not so much the data type as what's behind it; the reason is technical. The most important aspect is that we are talking about a fixed-size array, whose size is decided at compile time. In the case of a local variable this adds to the stack allocation size. So rules for what you can and cannot do with that fixed buffer have to be enforced. It cannot be changed, and the variable is directly bound to it, therefore the variable cannot be changed either. And actually the difference between the array itself and a pointer to it has been made even clearer in more modern programming languages, like Rust. In Rust you can get a slice from an array and use that everywhere in your code. This slice is also technically described as a "fat pointer", because it internally contains both the memory address and the length. But the idea is not so much different. By understanding that this difference also actually exists in C even if it doesn't look like it does, it becomes harder to get confused by it.
People have already covered transitive vs. commutative, so I'll leave that alone. However, as someone who writes both C and Fortran, both 0- and 1-based indexing make sense in their respective context. Yes, in C the "first" element of an array is the one which isn't offset by anything, so arr[0]. For a systems language that makes sense. Fortran was written with linear algebra in mind, so arrays are stored in column major order with 1-based indexing, because I want to translate the (i, j) notation to my program, where I want element M(i, j) to make sense.
I just built my first chip8 emulator, and it has been the single most informative "low level" project I ever did. The chip8 may be a virtual CPU, but it taught me many topics like what assembly actually is, how the fetch-decode-execute cycle works, what a program counter does, etc. If you read this, could you maybe do a video on how well such virtual processors compare to real hardware CPUs? :)
That would be interesting. I've heard there was a CPU that ran Java bytecode as its native machine language but it was unsuccessful as an alternative to virtual machines
There was some 8 bit simulator (can't remember the name), it had 4 registers A, B, C, D, and used square brackets for pointer dereference. It was essentially a Z80 with fewer instructions. In terms of speed it was obviously faster since it was being emulated on modern hardware but I'd hesitate to call it better since as far as I remember there were no bit rotates
@@williamdrum9899 that sounds really interesting! You mean like the ones described here: en.m.wikipedia.org/wiki/Java_processor ? I wonder how complex the java virtual machine actually is compared to something like a 6502. :o
1-based ordinals were the first mistake. We could actually keep the words "first" and "second" and just spell them "0st" and "1nd", but I guess it's too late now.
Strictly speaking, an array is its own type, which decays to a pointer when you pass it to a function. You can see this because sizeof applied to a local array gives you the total size in bytes of the array's memory, while sizeof applied to a pointer just gives you the size of a pointer type.
It depends on whether you see a programming language as an abstraction of computer memory (0 based) or an abstraction of mathematics (1 based). What I like about C is you can have an array of struct, and as long as all the fields have a fixed length, then you can grab that block of sizeof(struct) * n as a contiguous block of memory and copy or send it somewhere. It can save a lot of time over languages that make you access 1 element at a time.
Since when is mathematics 1 based? In a polynomial, which is one of the most common objects in math, we have to start at 0 (the smallest term in a typical polynomial is multiplied by x^0, not x^1). Not to mention, when solving equations, we often like to set things equal to 0, for easy equation manipulation. I don't think you can argue that mathematics is 1 based. You can argue that 0 can often be ignored, since there are many applications where 0 simply SHOULD be ignored, but you can't reasonably argue that it is 1 based.
@@simonwillover4175 This is just my memory of the 1 based vs 0 based programming language arguments I've heard over the years. I could have sounded less sure in my comment. I'm not an expert, but you generally hear "this is the first element in ..." rather than "the 0th".
@@Finkelfunk source for "CS literature tends to favor indices starting at 1"? i understand programming langs for math like matlab, wolfram, maple, etc. often are 1-based, but all the big general-purpose langs like c/c++, java, lisp, and their descendants are all 0-based. also see Dijkstra's argument for 0-based.
it makes sense to use the number 1 as "the first element of an array" but when you have a pointer that points to the start of an array the question is "how far away am i from the first element?" and the answer is always 0.
@@Finkelfunk Thanks for pointing that out. Personally, I believe that it doesn't really matter whether the indices start at 0 or 1. Even starting somewhere ridiculous, like -1, is okay, since it would just make most code more verbose. However, I believe the 0 indexed system is superior due to the way rounding works. Consider the 24-hour time system. 5:12 means that 5 hours and 12 minutes have *already* passed since midnight. In fact, even in the AM / PM system, the minute part of the time (in this case, 12 minutes) refers to how many minutes have *already* passed since the start of the hour. Put simply, when we write time in hours and minutes (and seconds), the minute (and second) part of the time is 0-indexed. We start at "00 minutes" in a given hour (and 0 "seconds" in a given minute). Zero indexing is also natural with counting. When we count numbers, we start with infinitely many zeroes on the left of the starting value, and as we count up, those zeroes start being replaced with non-zero digits. When we hit the carry limit, we set the current digit to 0, not 1. Imagine if we had a 1-indexed position format for numbers. We would count like this: 1 2 3 4 5 6 7 8 9 0 21 22 23 24 25 26 27 28 29 20 31 32 ... And each number would have an infinite string of "1"s instead of "0"s on its left. This system would be ridiculous (in my opinion)!
In defense of Lua: - Lua doesn't have arrays and almost everything except for primitives is a table (basically a map or, well, an associative array) and you can make them start with 0, 1, 255, true, 3.14 or any string. It's just that it's a convention to start with 1, and most functions assume that's where your integer-indexed table starts. - In Lua you very rarely have to even use a syntax like array[1] as you can do iterations with pairs() or ipairs(). If you decide to index directly there's an argument to be made that arr[#arr] gets the last element of the array. If you had them 0-indexed you'd always need to do arr[#arr-1]. All of this is not really a big deal, but in the end I feel like if the language isn't very low-level and doesn't operate on raw memory often, 0-based indexing isn't an obvious choice.
tables themselves are associative arrays, it's why Lua describes them as having a "table" part and an "array" part, in actuality they are both the same thing, you're just using different keys to access different values that are stored in the same table, with integers being valid keys which allows you to write syntax like a traditional array :D
That's not quite right. Lua does have arrays. It's just that tables adapt to your usage. And internally Lua actually uses C arrays when your table is used solely as a 1-indexed array. It will only turn it into a hash-table internally if you deviate from that. "In Lua you very rarely have to even use a syntax like array[1]" That entirely depends on the requirements of what you're doing, and on the framework behind it. I use love2d most of the time, and I rarely use ipairs, because I'm usually using 0-indexing and/or doing performance taxing things, and the default loop is quite a bit faster. "If you decide to index directly there's an argument to be made that arr[#arr] gets the last element of the array. If you had them 0-indexed you'd always need to do arr[#arr-1]." Therein lies a problem that you didn't catch: the # operator only counts from 1. If you're 0 indexing, #arr will already give you the length-1, so your code is wrongly overcompensating. But the blame isn't really yours to carry, as the fundamental problem is that 1-indexing introduces traps like that into the language. Ultimately you actually can't use the # operator with 0-index. If the array has < 2 elements, the # op will always report 0 length, but the real length could be 1 or 0, and there's no way to tell. You also have to keep in mind that ipairs assumes base 1, which is also a bit of a trap. And that's actually the main reason why I like avoiding ipairs. This is actually a big deal. Not the worst thing, sure, but still somewhat of a big deal, because it's error prone and annoying. I'll just copy-paste below the comment I just posted on the video, where I tried to lay out some issues succinctly: There's also a lot of indexing math that you have to do yourself that only works if the arrays are 0-indexed.
If you are making a platformer game, you'll have a 2D array of tiles for the levels, and you'll certainly use "index = x+y*width" or "x = i%width" and "y = i/width" to access the tiles. None of that works with 1-indexing unless you spend some time figuring out how to -adapt- overcomplicate the math. I've talked about this with a lot of people over the years, and I've seen many people who confuse indexing with counting, and also many who think 1-indexing is just something you get used to and it becomes a complete non-issue. It doesn't, ever. You just learn to live with it. It's not the worst thing, to be fair, but it's a perpetual rock in your shoe. While Lua (and also Julia) actually allows you to easily 0-index arrays, realistically you won't do that with every single array you ever create, because the language itself pushes for base-1. If you create an array literal, like "a = {1,2,3}", it will be naturally 1-based. The # operator only counts the elements from 1. The _for_ loops include the upper limit, because Lua expects you to loop from 1 to limit, not from 0 to limit-1. All of this plays a part in making it quite annoying and very prone to human mistakes. - You have to worry about not forgetting to -1 the for loop limits when looping from 0, or you get an extra iteration that can cause problems. - Sometimes you have to waste time thinking whether you should 0-index an array or just let Lua have it its way. I've had times I chose the latter, only to then regret it and have to waste even more time carefully changing my code to accommodate to 0-indexing. - Your code becomes inevitably inconsistent, with some 0-based arrays and some 1-based arrays, and then you have to be extra careful to keep in mind the ones that are 1-based, because you might have to +1 or -1 whatever variable carries the index. 
- It's harder to write utility functions that deal with arrays, because you can't predict the base of the arrays users might throw in there, and you have to waste more time making them work for both. - It's harder to port code to and from Lua. It requires extra care and attention, because loops will need corrections, arrays may or may not need to be made 1-based, and consequently some code may need to account for that, etc. And then if the code isn't working, you have to double check all of the above on top of double checking if the translation is correct. I've been coding in Lua for about half a decade, and that's been my experience. Lua is actually a brilliant language, maybe my favorite ever, but this was a really unfortunate design decision that I wish had never happened. My initial months with Lua (not as a beginner programmer) were also quite confusing. It took me quite some time to figure out when I should 0-index and when I shouldn't, and to this day, sometimes I'm still not 100% sure in all cases until I try one of them.
@@skaruts couldnt have said it better myself. i actually also did not know that optimization you mention in the beginning, with using an actual C array until you use the table like an associative array which then turns it into a hashtable. i too really love Lua, it is definitely my favorite language, and the 1-indexing assumption most built-in Lua functions have is irksome. however, you do have a distinct advantage in Lua in that you can *override* these built-in functions and make them work for both 0 and 1-indexing, which helps to address many of the problems you bring up. the other main things i dislike about Lua is the lack of a continue statement and no typing, i think those were not good decisions to make either. ultimately, though, since Lua is free and open source and has reasonably relaxed licensing, you could actually make whatever changes to Lua you like for your own use or even to ship into other products with and i think that's really cool :)
@@Templarfreak I tend to avoid tampering with the standard stuff, because I could forget that I did it. But yea, you can still create your own variations of it. The flexibility of Lua is actually one of my favorite things about it. Also, you can use goto if you really, really need a continue. I think its usage is discouraged, but I've used it when porting code that used continues with very complicated if statements I didn't want to mess with.
for ... do
    if complex_condition then goto continue end
    -- code
    ::continue::
end
@@skaruts yeah, this is like the only way that i know of that you can use to get a continue-like statement using a goto, which i do all the time. i think this particular use-case of goto is perfectly fine. it still sucks that we dont have a more proper solution, though. in some cases, tampering with the built-in functions is also a necessity, though, if you want to implement your own types then certain functions would benefit from being overridden. for example if you want the built-in type function to return the correct value then you have to override it because Lua does not provide a better method of doing so. also by default all usertypes you define C-side that you expose to Lua will always just be considered a usertype by the type function and Lua in general, which may not be appropriate depending on your situation.
Lua actually DOES have a 0th element to their arrays! it's just that all the built-in Lua functions that iterate over arrays all start at 1. you can access 0 perfectly fine with your *own* code, though, because they are simply associative arrays with integers as valid keys, which means 0 is a valid key for an index of an array as well. also, the funny thing about those built-in functions working in that way is that you can also override built-in Lua functions, tables, etc :D
The trick is Lua doesn't have arrays! (well it does but that's niche). They're all hash tables so you can just as easily index -2 billion as you can 0 and start from 150. You can even index starting from "porkypie" if you want. iirc you need to use strings and userdata to get actual arrays. userdata is C binary
@@Mallchadi havent totally fact-checked this yet but as it turns out if you do just use integers as keys Lua will actually initially only make your table an array on the C side until you use something else as a key for it which it will then create the hashtable part of your table
Actually, the type of an array is indeed an array (in your case it's 'int[4]'). But it decays to a pointer when used in an expression. There are 3 cases where it doesn't decay into a pointer: 1) sizeof( my_array ) 2) &my_array 3) typeof( my_array ) (typeof being a GNU extension before C23)
Just as a reminder, sizeof(my_array) can't be used like this: int getArraySize(size_t* my_array) { return sizeof(my_array); } Because then it will just give you the size of a pointer on your machine
My father was interviewing for his second job, and was asked this very same question. He got it right and the job. The guy that asked about arrays/indexes wrote the company's P&L system and used this in some way for a radix tree, and my dad ended up taking the project over.
Love the way that different folks say that arrays are and arrays aren't pointers. Lots of confusion about the meta-confusions about the distinctions of cognitive and standardisation levels. A non assignable 'pointer'. I love the woosh those non-lvalues make as they fly by.
No comment regarding Lua, but Fortran defaults indexing to start with 1, however it can be changed by the programmer. So, yeah you can do some insanely serious number crunching (as many still do) in Fortran and a default 1 indexing. ; )
I learned this stuff on accident while learning about vesa video modes and directly writing to vram. pushing qbasic to its absolute limits and breaking out of it really taught me a lot when I was starting out.
I always enjoy watching these shorts, so keep them coming. Fun fact: as the index into an array is (usually) a signed integer, as far as the C compiler is concerned, 0 is the middle of the array, not the beginning. This actually becomes quite useful for people who do systems programming in C and who need to access hidden bits in system structures, especially if you're doing bare metal programming.
@@williamdrum9899 For instance. But it could also be that a function returns relative indices in an array that was passed to it as a pointer. Items to the left will have negative relative indices and items to the right will have positive indices. The compiler does not force you to use a zero or a one as the first index in an array. As far as it is concerned, the moment it needs to do something with an array, it will add the index to the pointer to the start of the array. Remember: subtracting is just adding a negative (two's complement) value. So: int *p = NULL; int a[100]; /* Let's assume for brevity's sake that this array is actually initialized */ int b = 50; int v; then: v = a[b - 3]; is equivalent to p = &a[b]; v = p[-3]; and: p = &a[b]; v = *(p - 3); I'm not saying that this is always good practice, but the many, many ways one can go about referencing an array (or any other object that is fundamentally a pointer under the hood) and its contents simply warm my heart ;-).
No, not true. 0 is the start of the array. No space is allocated before 0. Sure, you can potentially do negative indexing, but that would be illegal. You might as well say that all arrays are huge because even if you declare a 3 element array you can still attempt to access the 20,000th element (and likely trigger an exception on any system with an MMU).
@@williamdrum9899 If you use malloc to allocate space, then an integer and a pointer are (usually) stored before the space itself to hold the information required by the free() call.
@@cccmmm1234 You are confusing the convention with how the C compiler treats arrays under the hood. An OS or firmware may put boundaries on the memory you are allowed to access, but the C compiler does not care about that, nor does the C language specifically say an array should be 0-based or that indices in an array should always be 0 or positive. For instance:
//------------------------------------------
char *p = "Hello World!";
char *q = NULL;
int i, n = strlen(p);
q = (char *) malloc(n + 1);
p += n - 1; // p now points to the last non-nul character of the string
for(i = 0; i < n; i++) {
    *q++ = *(p - i);
}
*q = 0;
q -= n; // rewind q to the start of the reversed copy
//------------------------------------------
is functionally equivalent to:
//------------------------------------------
char *p = "Hello World!";
char *q = NULL;
int i, n = strlen(p);
q = (char *) malloc(n + 1);
p += n - 1;
for(i = 0; i < n; i++) {
    q[i] = p[0 - i]; // Remember p points to the last non-nul character of the string
}
q[n] = 0;
//------------------------------------------
In C, strings are merely character arrays. By convention we assume 0 as the start of the array, but there are circumstances where a function may return a pointer to a portion of memory where the "left hand side" (negative index) contains data we may want to use as well as the "right hand side" (positive index). In the above example, after the `p += n - 1`, p points to the last non-nul character in the array, but not to the very last character in the array (which is the nul-character). In other words: we have valid data both on the left side of p and on the right side of it. p[0] contains the exclamation mark, p[-1] contains 'd', and as mentioned before p[1] contains the string terminator.
Lua has tables instead of arrays; it's like a dictionary: the indexes are actually keys, and the values are the values assigned to those keys. Also, Lua stores tables on the heap and not the stack, and their size is dynamic, thus it is very possible for a table to be like {9: "9th", 5: "5th", "aString": "AStringValue"}, and when you iterate through it with the pairs method, it goes from the 9 key to the "aString" key.
Turbo Pascal string arrays back in the day were fun; 0 holds the length of the string, 1 is the first character. Now just don’t think too much about text longer than 255 characters, such thoughts are illegal :)
Yep all pointer arithmetic occurs in this fashion (and array style dereferencing is just that with some added syntactic sugar), this is also why pointer arithmetic isn't allowed with void pointers - it doesn't "know" the size/alignment of the underlying data.
Why can't it just default the size of the data to 1 bit or 8 bits? That would be a pretty understandable thing. Or maybe 64-bits, since most systems use 64 bit memory addresses.
@@simonwillover4175 void pointers are intentionally defined as "typeless" so that they may be used to abstract away the underlying type it's pointing to. Assigning any default size is going against that, if you want to inspect the memory byte-wise you can always cast (void*) to (char*) - since their alignment is guaranteed to match. Also bitwise memory access isn't a thing afaik, memory granularity is generally on a byte scale.
Lua is a nice and simple scripting language, but it's good to understand that it has a ton of Pascal (which has almost the same control statements) and VB style design in it, and all those languages have 1 based indexing or they even mix things up. VBA and COM interop stuff on Windows are the worst actually. I have had a lot of headache moments in the past programming code around spreadsheets that have their first cells start at row 1 and index 1 while I started from 0 as I am used to. 😤
I like this, but it has a downside. Let's say I store these two strings: (7) "go home" (13) "Don't go home" Now if these were null terminated I could just store the second string, and still print the first with a little pointer arithmetic. With a pascal string you can't really do that
Lua doesn't use arrays. Instead, it uses tables, which are a more abstract data type separate from arrays (though simple tables are represented as C arrays under the hood). Lua using 1 as the first index in a table isn't necessarily 'incorrect', just different. Since tables in Lua also function as trees, dictionaries, etc., you can start a table at index '0' and implement a custom iterator function to simulate how arrays work in other languages. I do still agree that all array-like structures should start indexing at 0 just out of convention alone, but it's not wrong in any way to index from 1 in Lua's case. Example:
--// Custom iterator
local function zpairs(t)
    local i = -1
    return function()
        i = i + 1
        if t[i] ~= nil then
            return i, t[i]
        end
    end
end

local tab = { [0] = 1, [1] = 2 } --// Table indexed from 0, will not work with ipairs function.

--// Using the custom iterator
for i, v in zpairs(tab) do
    print(i, v)
end

--[[ Expected output:
0 1
1 2
]]
@@skaruts Huh, I never knew basic tables were represented as arrays under the hood. Guess I should change my comment then, though that still doesn't really change the fact that the actual name for this in the Lua programming language is a "Table", not an array. It still functions as a dictionary whose keys increment from 1.
@@BeconIsYeck the name isn't very relevant, though. An array is simply _"an ordered series or arrangement"_ (google), and it can apply to lists or groups of things, like solar panels. The names we use are just conceptual distinctions for arrays with different functionalities. A Set is an array that excludes duplicates. A Deque is an array with a specific mode of access. The name _"associative array"_ is often used to refer to Dictionaries / hash-tables / maps. The Lua table can be made to work as any of the above and more.
I would like to make (what I believe to be) a few important points regarding 1-based indexing: -It is not less optimal than 0-based indexing at a low level. Any optimizing compiler will simply use a pointer that begins 1 index before the start of the array. In fact, whenever you write a loop that contains an expression of the form myArray[constant offset + i], the base address used for the array is the normal base address + constant offset. -It is not less natural than 0-based indexing. Both are arbitrary decisions. Just like pi is an arbitrary multiple of the circumference of a unit circle, 0 is an arbitrary offset into the array. Often it is more convenient to start at 0, but it is also sometimes more convenient to start at 1 or any other number of offsets, depending on the problem. Overall, 0-based indexing is often most convenient. However, it is not objectively "better" than 1-based indexing. Most people are used to using 0-based indexing, of course, so it should stick around for now. However, compilers also do plenty of things that seem less convenient or "natural" at a low level because they are more intuitive.
It does. The reason you can write either one is that architectures access arrays slightly differently but are all capable of doing it; some CPUs just need to take extra steps. For example, in MIPS Assembly you can only use constants as offsets for a memory load. If you want a variable offset you must add it to the array's base pointer first.
C arrays are arrays, not pointers. They are pointer-like types, so their "value" is indeed the address of their content, but if you take &myarray you'll get the same value as myarray, meaning we got the address of the array. Being a specific type allows typing of multi-dimensional arrays, because now you can reason about arrays of arrays (the packed kind, with no multiple indirection). You could not do that if C had no "array of N objects of type T" type and everything was translated to pointers.
True, but it'll really boggle you when you try to use _Generic and it matches every array passed to it as a pointer to the given type instead of an array of any dimension. It's just super annoying because it kind of reduces the utility of the functionality. I can't seem to determine if it's a bug in gcc or if that's accurate to the standard, but I don't like it either way.
Sees the thumbnail "yeah of course that works." Like, it just logically makes sense: you're accessing the array ptr bytes in from 0, that's just accessing the array again.
Indeed, the explanation doesn't make sense to me. If you are indexing from 0 using array syntax I would expect that the 0 would be treated as a void*, so the compiler wouldn't multiply any type size, since that's unknown, and just work with raw bytes instead.
1-based indexing is neither evil nor incorrect. It just so happens that C-style arrays work with math better if they start at 0. Also, nerd font is broken
This is actually the same as writing array[index]. I've always seen the array brackets as another dereferencing method. You can do pretty weird stuff with that, e.g.: typedef struct { int x, y, z; } Vec3; void printFoo(Vec3* foo) { printf("x = %d ", foo->x); printf("y = %d ", *((int*)foo + 1)); printf("z = %d ", ((int*)foo)[2]); } Those dereferencing methods work because you always interpret a block of memory (here that relies on the three int members being laid out without padding).
Technically, if you look closely in just the right way, you’ll see that arrays have the type of array, not pointer. (Big example is with `sizeof`, but there are others). It’s just that they’ll decay to pointers very easily.
I have long wondered why arrays started with zero, this was a good answer. I used to think that we just didn't have any reason to waste that 0th index, so we used it haha. Also that i[a] thing is very cool I didn't know that could work!
index 0 exists in lua, it is used to say "invalid index", since it can't use -1 like in C-like languages for things like indexOf (since -1 is a valid index in lua).
It's a great trick but you have to be careful when writing the assembly for it. Use relative offsets for jumps, and absolute addresses for calls. Otherwise you end up just executing the original code in the former and risk a program counter escaping in the latter
[commenting this before watching the video] It makes sense - the array is a pointer to a block of memory and you're adding x times the size of whatever is in there. And since addition gives the same result in both directions, you can index x with the pointer and still be correct.
array starts from zero because it reduces the time of calculating the address. the formula is: base address + index * sizeof(ex int). if it starts from 1, not zero, the formula would be base address + (index - 1) * sizeof(ex int).
Shorter answer why 0[a] works: Arrays in C are just syntactical sugar. You can make the compiler do the very same thing without ever using array syntax in C. a[x] is just nicer way of writing "*(a + x)" and that's why a[x] is the same as x[a], as addition is commutative (a + x = x + a)
Honestly it would make sense to start at 0 because of -1, which points to the end of the array, but if arrays started at 1, it would be pretty weird (you would use 0 instead)
In C the -1 element is not the end of the array but the element before the address pointed to by the array pointer: int myarray[] = {1, 2, 3, 4, 5, 6, 7, 8}; int *myarrayptr = &myarray[5]; printf("%d ", myarrayptr[-1]); will display the number 5 as myarrayptr is pointing to myarray[5], which contains 6, and the element before it is myarray[4] which contains 5. Similarly printf("%d ", myarrayptr[-5]); will print the value of myarray[0] which is 1. C has no array bounds checking (you are supposed to know what you are doing) so you can quite happily run off _either_ end of any array you've defined. This was used in des.c (which did the [Lucifer] DES encryption, as used by unix password encryption back in the 1980s): it defined two arrays L[] and R[] next to each other and effectively merged them into a single array for processing by using the first array defined (L) until it specifically wanted to use the two halves (Left and Right) separately.
When you are using a thing you should not be forced to think what's going under the hood. As the creator of an array you should be able to decide what indexing you want to use. When I started to learn programming in school they used Pascal to teach and Pascal has this interesting quirk - it can have an array with any arbitrary integer indexing, any step. You can have an array that starts at 100, and then goes 105, 110, 115 and so on. Indexing is an interface and the interface user should decide how does the interface work. You should be able to choose any indexing you want.
I love pointer arithmetic, as soon as you start interpreting everything as a chunk of memory, instead of arrays, structs,... , the possibilities get endless. For example: typedef struct { int x, y, z; } Vec3; void printFoo(Vec3* foo) { printf("x = %d ", foo->x); printf("y = %d ", *((int*)foo + 1)); printf("z = %d ", ((int*)foo)[2]); }
Always remember arr[i] is equal to *(arr + i). And the index always increments by the sizeof() the datatype (int, char, ...). This is valid too: int a = 0xAABBCCDD; int b = (int)(*((unsigned char*)&a + 2)); printf("%x", b); Which will print bb on a little-endian machine, because you only take one byte out of a 4-byte integer, as you interpret the integer memory as unsigned char (with plain char, which is usually signed, the byte would be sign-extended and print as ffffffbb). Pointers are amazing 😅
I found the explanation of why 0[arr] works confusing… explaining how arr[0] works and _then_ how 0[arr] works would’ve made it clearer for me I think. Great video nonetheless!!
5:40 I can’t understand how this holds true for any index that isn’t 0, like I don’t see this working with an index of 1 since with 1[array] => *(1 + array), array is the non pointer type that gets upgraded to an index which would leave us with *(1 + array * 4), which isn’t what we want at all
Lua doesn’t actually have arrays though. It has tables, which are dynamically sized associative containers that can be keyed using almost any data type. In other words, you can think of a Lua table as being like std::map. As such, you *can* use 0 as a key if you want. However, convention is that you don’t.
I have a soft spot for Lua but I do wish the arrays were 0 indexed like they should be lol. Either way, arrays in Lua are insane abstractions that you can index with basically anything, iirc you can do it with a string or function or whatever you want lol
This could also be explained by pointer arithmetic being the same as array arithmetic. In pointers you usually do *(p+i) being “i” the index. This said, you can also do *(array + i) and it would still work, as p[i] also works. Pd: just finished watching the video and you explained this, must watch all the video before commenting hahaha
I feel like 1-based indexing is superior. It is way more intuitive and tbh it also makes more sense when thinking about memory. It is the first part of the allocated memory for the array. Yes, when skipping over to other elements you then multiply by the index-1, but that can’t possibly be a problem for performance or security, right? I feel like 0-based indexing is just a flex of programmers on other people.
Arrays are just pointers in memory to a start, that span an x amount of elements. A pointer + (any integral value or address) = an address (pointer arithmetic hmm yes). Memory is funny, and when we want something we just ask for the address the value starts at. Oh yea we know that we take 4 bytes because it is an integer. So the datatype * (how many items) decides the span, the index * typesize + array pointer will be the actual thing you want. Oh yea just read an x amount of bytes starting from there (where x is the typesize). Tadaaaaa, you have successfully buffered an integer into memory. Incredible yes. I always try to explain to people that index 1 and position 1 are two different things. They do not seem to understand...
Yes, because of the way C does multidimensional arrays. Though, an array of pointers doesn't qualify as a multidimensional array, and in general you shouldn't do it, so don't.
Summary: * indexing is actually a commutative operation * an array (or vector) is actually just a pointer; if we have some array, named `items`, the compiler represents `items` as a pointer; the compiler does know that this pointer is pointing to a list of data, rather than a "single" piece of data, but it treats most pointers just like they are numbers; this is done this way for the sake of simplicity, really; there is no need to differentiate pointers to lists, pointers to single pieces of data, numbers, and booleans in certain contexts! * `array[index]` is actually a shorthand for `*(array + index)`; this accesses the value at the "location" of the sum of `array` and `index`; really, `(array + index)` is just another pointer, to a specific piece of data, and pointers can simply be represented by numbers; * well, addition is obviously commutative; therefore, anything that uses addition in the right way also has the opportunity to be commutative; in our example (of array indexing), the addition is used commutatively; notice that we can swap `array` and `index` in the code: `(array + index) == (index + array)`; this equality obviously holds under a unary operation, such as `*`: `*(array + index) == *(index + array)`; * we can see from the previous conclusion that our indexing shorthand is also commutative: `array[index] == index[array]`;
Thank you for your C explanation and your time. It would be interesting to see C++ also. In "modern" languages like Go or Rust, classes were left out because they reduce code execution speed, and structs are used as a replacement. What do you think about that? Does it affect code execution speed?
Classes use what's called a "vtable" which means they store a function pointer. The youtuber Creel makes a great video explaining it called "Object Oriented Programming is a Dirty Rotten Low-Down Trick." In short, every class object has a hidden variable - a pointer to its "version" of a polymorphic function. This means you have an extra pointer to dereference. Now, this isn't always a bad thing. In fact, this "polymorphic" style is very important in system calls on many 80s computers, to maintain compatibility between different firmware versions
7f is the heap and executable space, on Windows, most of the time. Is it the stack on Linux? I didn't know that. Usually stack addresses are much lower for me.
Type of the array in C is array, not a pointer. Array type degrades to the pointer when operated on it, basically like when you assign integer to float or function name to the pointer to function. You can prove that by taking sizeof of array and you will see that it is of size `basic type * count of objects`.
If you want to call it "index", then you should start at 1, per mathematical tradition and day-to-day experience: when you assign numbers to things -- which is one of the definitions of indexing -- you always start with 1; for example if you tell someone you live in the 4th house from the intersection you expect them to start counting from 1, not 0. If you want to start at 0 then just call it what it is: an "offset".
You're confusing _counting_ with _indexing._ They're not the same thing, neither conceptually nor in practice. Consider these two arrays: [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] -- array with a 10 element count, indexed from 1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- array with a 10 element count, indexed from 0 The actual values are irrelevant, I just used them to illustrate the different indexing. As you can see the count is the same, regardless of the indexing. In practice the indexing math -- that you need to, e.g., convert an index to an X, Y or vice versa -- will only be simple and straightforward if you're indexing from 0. I'm talking about things like this: index = x + y * width x = index % width y = floor(index / width) Pretty simple stuff. But if your array is 1-indexed then you'll have to waste time overcomplicating that math, and you're probably gonna get it wrong too.
@@skaruts Why do you think I'm confusing them? All I'm saying is that in real life indexing (assigning numbers to objects) is TYPICALLY done starting from 1 and counting up. You can show 3 shirts to a friend and tell them: "this is 1, this is 2, this is 3, which one do you think looks best "? Of course you can also say "this is 0, this is 1 and this is 2" or even "this is 5, this is 17 and this is 611" but your friend may find that odd. That is also how it's TYPICALLY done in math. Go to Wikipedia and search for "Row and column vectors" and you'll see it. It's probably why languages like Matlab, Mathematica and Julia are also 1-based. If you're talking about pointer + distance then I think "offset" is a much better name than "index".
There are tons of things in maths that are indexed from zero. Infinite cardinals, base vectors in spacetime algebra, polynomial coefficients, and so on.
This explains why in the C implementation arrays start at 0, but the answer to the question "why were C arrays implemented that way (start from 0)?" is probably mainly because if they started from 1, you'd not only lose 1 index from the addressable integer range (which may not be much today with 32 bit or 64 bit integers, but if you are working on embedded systems with bytes, especially in the old days, that's significant), you'd also have to check for both upper bounds (length) and lower bound (1) when accessing an element, instead of just checking that the index is below length.
0-indexing also simplifies the indexing math a lot. index = x + y * width x = index % width y = floor(index / width) None of that works with base 1. If you really wanted base 1, you'd have to overcomplicate that math, and it's actually quite tricky to get right. And if you're working with 3D grids, I don't even want to think about it.
@@atomgutan8064 hmm, that does work indeed (I've just tested it). It's actually simpler than I thought, but I personally wouldn't have figured it out. What about the conversion from index to x,y, though?
@@atomgutan8064 that won't work. That will never point you to the last index of the array. In a 16x16 matrix, the last element is the 256th. If *_index = 256_* , then *index%width* is 0, which is incorrect. Well, it will break anytime *x == width.* As for the Y, it's also wrong. If *x == width,* then that equation will break as well. My Y was also wrong, as I forgot to floor it. For base 1 you might want to just *ceil(index/width),* perhaps. But this is why I was saying this is quite tricky to get right.
C is basically just a smidgen of syntactic sugar on top of what your CPU instruction set does (that's why it does not hide endianness, for instance, because that would be an awfully inefficient thing to do on CPUs that have the "wrong" byte order). It requires you to think at least a little bit about the hardware that your program runs on. Not too much, just enough to make you a really good programmer over time. If you are trying to avoid that level of knowledge, then you will always stay a mediocre programmer because performance is, whether we like this or not, hardware dependent. Not saying you can't get around low level stuff most of the time by using optimized libraries and LLVM, but nobody ever died from looking underneath the hood of their car.
I did not know that it is possible but in my opinion it is some weird bug in the compiler's parser which is related to token parsing priority. Proof: Ok, you said that: a[i] = *(a + i); i[a] = *(i + a); When I compile: int index; int index0 = *(index + 0); compiling fails, error: invalid type argument of unary '*' (have 'int') but when I compile: int index; index[0]; compiling fails, error: subscripted value is neither array nor pointer nor vector and then when I compile: int index; 0[index]; compiling fails, error: subscripted value is neither array nor pointer nor vector Which clearly states that the id from the lexer before the brackets token can't be a number and the compiler specially checks that rule before doing any optimization. The compiler always checks the variable type of the token before [] otherwise compiling "index[0];" and "0[index]" and "int index0 = *(index + 0);" should generate the same error. So in this case it is a bug not a feature.
There's also indexing math that you have to do yourself that only works if the arrays are 0-indexed. If you are making a platformer game, you'll have a 2D array of tiles for the levels, and you'll certainly use "index = x+y*width" or "x = i%width" and "y = floor(i/width)". None of it works with 1-indexing unless you spend some time figuring out how to -adapt- overcomplicate the math (and I'm not sure it's even possible to make it work). I've talked about this with a lot of people over the years, and I've seen many people who confuse indexing with counting, and also many who think 1-indexing is just something you get used to and it becomes a complete non-issue. It doesn't, ever. You just learn to live with it. It's not the worst thing, to be fair, but it's a perpetual rock in your shoe. While Lua (and also Julia) actually allows you to easily 0-index arrays, realistically you won't do that with every single array you ever create, because the language itself pushes for base-1. If you create an array literal, like "a = {1,2,3}", it will be naturally 1-based. The # operator only counts the elements from 1. The _for_ loops include the upper limit, because Lua expects you to loop from 1 to limit, not from 0 to limit-1. All of this plays a part in making it quite annoying and very prone to human mistakes. - You have to worry about not forgetting to -1 the for loop limits when looping from 0, or you get an extra iteration that can cause problems. - Sometimes you have to waste time thinking whether you should 0-index an array or just let Lua have it its way. I've had times I chose the latter, only to then regret it and have to waste even more time carefully changing my code to accommodate 0-indexing. - Your code becomes inevitably inconsistent, with some 0-based arrays and some 1-based arrays, and then you have to be extra careful to keep in mind the ones that are 1-based, because you might have to +1 or -1 whatever variable carries the index.
- It's harder to do utility functions that deal with arrays, because you can't predict the base of the arrays users might throw in there, and you have to waste more time making them work for both. - It's harder to port code to and from Lua. It requires extra care and attention, because loops will need corrections, arrays may or may not need to be made 1-based, and consequentially some code may need to account for that, etc. And then if the code isn't working, you have to double check all of the above on top of double checking if the translation is correct. I've been coding in Lua for about half a decade, and that's been my experience. Lua is actually a brilliant language, maybe my favorite ever, but this was a really unfortunate design decision that I wish has never happened. My initial months with Lua (not a beginner programmer), were also quite confusing. It took me quite some time to figure out when I should 0-index and when I shouldn't, and to this day, sometimes I'm still not 100% sure in all cases.
As far as I know arrays in lua are what’s called a table. It is the single complex datatype after functions and c object data. All things are handled via tables, there is nothing like tuples, dictionaries, lists or even classes. You want to have OOP? You have to realize it with tables. Tables are simply key value pairs (skipping over modifications you can do with metatables). You can use everything as a key, a string, a number, even boolean values. So there the numbering is not important for the underlying datastructure.
Lua is interesting under the hood. If your table is being used solely like an array with base 1, then it uses a C-array internally. If you deviate from that, then Lua will turn it into a hash-table internally. This is actually explained in Lua 5.1 book by the creator of Lua. I presume that that means Lua has to correct your indices in some way under the hood, when the table is a C-array under the hood.
So would 1[myarray] crash? I presume it would be equivalent to a pointer to the next memory value after where myarray starts, but if myarray contains elements that are not of size 1, I would get gibberish, and the type of the array is kinda lost in my assumption
4:40 "Surely the transitive property of addition means that a+b=b+a."
It is the commutative property of addition. Also, commutativity can be a property of (binary) operations like addition while transitivity can be a property of relations. i.e. Transitivity of < (less than): 1 < 3 and 3 < 5, therefore 1 < 5.
Okay, that's my "Ummm Ackchyually" moment.
i am no bueno at words nor maffs
As a mathematician it also hurt quite a lot when he said "transitivity" 😂
But it doesn't work for 1[a], right?
Mfw the binary operation is a group
@@luwi8125 It does
Actually, an array in C does have its own type, but it will decay into a pointer when it's used in expressions.
was going to comment this. understanding that c arrays decay to pointers was difficult for me to understand as a noob and really makes me appreciate c++ std::array
People always love saying "c is such a simple language". Well it is if you ignore all the more technical parts like value categories, value transformations (including array decay) etc.
@@sinom what do you mean by value categories and value transformations? although I do agree that C arrays can be a little hard to work with
@@sinom value categories are c++
One way of looking at this is that a pointer is a variable that _contains_ an address, but an array-variable _is_ the address.
The difference is subtle, but can be important. For example, suppose you have an array `char str[] = "string";` in one file that you're trying to access it in another via `extern char *str;`. This should work, because arrays and pointers are the same, right? But if you do, say, `printf("%s", str);`, it'll try to interpret the string itself as an address and you get nonsense if not a crash.
The way I have always looked at it is that the index denotes how many elements appear before it. Helped ease my mind back when I was learning programming.
Yeah. It makes a lot of sense. I can't imagine anyone accepting it without first coming to this conclusion.
You do realise you can use things like a[-1]?
What does it mean to have -1 elements before the one you're accessing?
I thought of it as 00000000 being the first positive integer in binary, and thought the reason indexes start at zero was to be able to make arrays one element bigger, since element 11111111 would be element 2^8 and not (2^8) - 1.
@@cigmorfil4101 not in c, or not without unexpected results
@@cigmorfil4101 only in languages like python which apply special rules to negative indexes. Most languages have no such feature as it adds arguably unnecessary levels of code complexity
Minor point: Arrays and pointers are not the same type in C. The reason you can print the address of an array using %p is because arrays decay to a pointer to their first element when accessed. From K&R C "In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously." One important distinction between arrays and pointers is that array names are constant, but pointers are variables: This means assignments like 'mypointer = myarray' and 'mypointer++' are legal, but 'myarray = mypointer' or 'myarray++' are illegal.
If those assembly programmers could read they'd be very upset
The difference is the implication of "const" when defining an array over a pointer. Nothing more.
In fact, %p is just a format specifier. It doesn't care what's passed as a parameter; it will simply try to print it as a pointer. If you try to pass an integer variable instead it will just print the value of the integer in hex with 0x before it.
@@BlueSheep95 there are some other strange differences. Multi dimensional arrays are quite different than pointers. We had quiz questions in class about these differences and my takeaway was nobody should write such code that could distinguish between arrays and pointers anyway
The point is not so much the data type, as more what's behind it, the reason is technical.
The most important aspect is that we are talking about a fixed-size array, whose size is decided at compile time. In the case of a local variable this adds to the stack allocation size. So the rules for what you can and cannot do with that fixed buffer are something that has to be enforced. The buffer cannot be changed and the variable is directly bound to it, therefore the variable cannot be changed either.
And actually the difference between this array itself vs a pointer to it has been made even clearer in more modern programming languages, like Rust.
In Rust you can get a slice from an array and use that everywhere in your code. This slice is also technically described as a "thick pointer", because it internally contains both the memory address as well as the length of the actual array. But the idea is not so much different.
By understanding that this difference also actually exists in C even if it doesn't look like it does, it becomes harder to get confused by it.
People have already covered transitive vs. commutative, so I'll leave that alone.
However, as someone who writes both C and Fortran, both 0- and 1-based indexing make sense in their respective context.
Yes, in C the "first" element of an array is the one which isn't offset by anything, so arr[0]. For a systems language that makes sense.
Fortran was written with linear algebra in mind, so arrays are stored in column major order with 1-based indexing, because I want to translate the (i, j) notation to my program, where I want element M(i, j) to make sense.
Fair enough. Even when working with coordinates, I still prefer zero indexing since I like to think of the "origin" as 0,0
MATLAB also uses 1-based indexes and column-major order.
It's that 1st element that's the bugger.
Does the null pointer point to the first element of an array that starts at the beginning of virtual memory??
I just built my first chip8 emulator, and it has been the single most informative "low level" project I ever did.
The chip8 may be a virtual cpu, but it taught me many topics like what assembly actually is, how the fetch-decode-execute cycle works, what a program counter does, etc etc.
If you read this, could you maybe do a video on how well such virtual processors compare to real hardware CPUs? :)
That would be interesting. I've heard there was a CPU that ran Java bytecode as its native machine language but it was unsuccessful as an alternative to virtual machines
There was some 8 bit simulator (can't remember the name), it had 4 registers A, B, C, D, and used square brackets for pointer dereference. It was essentially a Z80 with fewer instructions. In terms of speed it was obviously faster since it was being emulated on modern hardware but I'd hesitate to call it better since as far as I remember there were no bit rotates
@@williamdrum9899 that sounds really interesting! You mean like the ones described here: en.m.wikipedia.org/wiki/Java_processor ? I wonder how complex the java virtual machine actually is compared to something like a 6502. :o
@@Vancha112 JVM has more instructions. I think it's a stack machine so probably minimal registers
@@Vancha112 Yeah that's the one. Although I have no idea how it would work.
06:25 for more experienced C programmers, it's easy to illustrate this just by saying
#define x[y] *(x+y)
yes, assuming it's a byte array, otherwise
#define x[y] *(x+y*sizeof(whatever type you want to store))
@@louisauffret when adding an integer to a pointer in C, the multiplication by sizeof(T) is done automatically.
@somenameidk5278 so would manually multiplying it by sizeof have the same effect since sizeof is otherwise implied?
@@dspivey_music nope, it would be incorrect, the "implicit" sizeof is always applied
@@dspivey_music no, you would be multiplying it twice
1-based indexing is criminal
why?
1-based ordinals were the first mistake. We could actually keep the words "first" and "second" and just spell them "0st" and "1nd", but I guess it's too late now.
@@notdeep236 Making a for loop over an array leads to more operations with 1 based indexing (by checking for
@@_clemens_ okay okay for languages like c I would agree with all of this but a language like lua. why care? lua is not for the same things.
@@notdeep236 Not sure about lua internals, I've almost never used it. Also when a language is there, it can't be changed anymore for obvious reasons ;)
Strictly speaking an array is a unique type which decays to a pointer when passing it around to a function. You can see this because a sizeof on a local array r value gives you the total size in bytes of the array memory while a pointer just gives you the size of a pointer type.
It depends on whether you see a programming language as an abstraction of computer memory (0 based) or an abstraction of mathematics (1 based). What I like about C is you can have an array of struct, and as long as all the fields have a fixed length, then you can grab that block of sizeof(struct) * n as a continuous block of memory and copy or send it somewhere. It can save a lot of time over languages that make you access 1 element at a time.
Since when is mathematics 1 based? In a polynomial, which is one of the most common objects in math, we have to start at 0 (the smallest term in a typical polynomial is multiplied by x^0, not x^1). Not to mention, when solving equations, we often like to set things equal to 0, for easy equation manipulation. I don't think you can argue that mathematics is 1 based. You can argue that 0 can often be ignored, since there are many applications where 0 simply SHOULD be ignored, but you can't reasonably argue that it is 1 based.
@@simonwillover4175 This is just my memory of the 1 based vs 0 based programming language arguments I've heard over the years. I could have sounded less sure in my comment. I'm not an expert but you generally hear "this is the first element in ..." rather than "0th".
@@Finkelfunk source for "CS literature tends to favor indices starting at 1"? i understand programming langs for math like matlab, wolfram, maple, etc. often are 1-based, but all the big general-purpose langs like c/c++, java, lisp, and their descendants are all 0-based. also see Dijkstra's argument for 0-based.
it makes sense to use the number 1 as "the first element of an array" but when you have a pointer that points to the start of an array the question is "how far away am i from the first element?" and the answer is always 0.
@@Finkelfunk Thanks for pointing that out. Personally, I believe that it doesn't really matter whether the indices start at 0 or 1. Even starting somewhere ridiculous, like -1, is okay, since it would just make most code more verbose.
However, I believe the 0 indexed system is superior due to the way rounding works. Consider the 24-hour time system. 5:12 means that 5 hours and 12 minutes have *already* passed since midnight. In fact, even in the AM / PM system, the minute part of the time (in this case, 12 minutes) refers to how many minutes have *already* passed since the start of the hour. Put simply, when we write time in hours and minutes (and seconds), the minute (and second) part of the time is 0-indexed. We start at "00 minutes" in a given hour (and 0 "seconds" in a given minute).
Zero indexing is also natural with counting. When we count numbers, we start with infinitely many zeroes on the left of the starting value, and as we count up, those zeroes start being replaced with non-zero digits. When we hit the carry limit, we set the current digit to 0, not 1. Imagine if we had a 1-indexed position format for numbers. We would count like this:
1
2
3
4
5
6
7
8
9
0
21
22
23
24
25
26
27
28
29
20
31
32
...
And each number would have an infinite string of "1" instead of "0"s on its left. This system would be ridiculous (in my opinion)!
In defense of Lua:
- Lua doesn't have arrays and almost everything except for primitives is a table (basically a map or well, an associative array) and you can make them start with 0, 1, 255, true, 3.14 or any string. It's just that it's a convention to start with 1 and most functions assume that's where your integer-indexed table starts.
- In Lua you very rarely have to even use a syntax like array[1] as you can do iterations with pairs() or ipairs(). If you decide to index directly there's an argument to be made that arr[#arr] gets the last element of the array. If you had them 0-indexed you'd always need to do arr[#arr-1].
All of this is not really a big deal but in the end I feel like if the language isn't very low-level and doesn't operate on raw memory often, 0 based indexing isn't an obvious choice.
tables themselves are associative arrays, it's why Lua describes them as having a "table" part and an "array" part, in actuality they are both the same thing, you're just using different keys to access different values that are stored in the same table, with integers being valid keys which allows you to write syntax like a traditional array :D
That's not quite right. Lua does have arrays. It's just that tables adapt to your usage. And internally Lua actually uses C arrays when your table is used solely as a 1-indexed array. It will only turn it into a hash-table internally if you deviate from that.
*_"In Lua you very rarely have to even use a syntax like array[1]"_*
That entirely depends on the requirements of what you're doing, and on the framework behind it. I use love2d most of the time, and I rarely use ipairs, because I'm usually using 0-indexing and/or doing performance-heavy things, where the default loop is quite a bit faster.
*_"If you decide to index directly there's an argument to be made that arr[#arr] gets the last element of the array. If you had them 0-indexed you'd always need to do arr[#arr-1]."_*
Therein lies a problem that you didn't catch: the # operator only counts from 1. If you're 0 indexing, #arr will already give you the length-1, so your code is wrongly overcompensating. But the blame isn't really yours to carry, as the fundamental problem is that 1-indexing introduces traps like that into the language.
Ultimately you actually can't use the # operator with 0-index. If the array has < 2 elements, the # op will always report 0 length, but the real length could be 1 or 0, and there's no way to tell.
You also have to keep in mind that _ipairs_ assumes base 1, which is also a bit of a trap. And that's actually the main reason why I like avoiding ipairs.
This is actually a big deal. Not the worst thing, sure, but still somewhat of a big deal, because it's error prone and annoying. I'll just copy-paste below the comment I just posted on the video, where I tried to lay out some issues succinctly:
There's also a lot of indexing math that you have to do yourself that only works if the arrays are 0-indexed. If you are making a platformer game, you'll have a 2D array of tiles for the levels, and you'll certainly use "index = x+y*width" or "x = i%width" and "y = i/width" to access the tiles. None of that works with 1-indexing unless you spend some time figuring out how to -adapt- overcomplicate the math.
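To make that index math concrete, here is a minimal sketch in C rather than Lua. The helper names (tile_index0, tile_x1 and so on) are made up for illustration and don't come from any library. It puts the 0-based formulas next to what they turn into when x, y and the flat index all start at 1:

```c
#include <stddef.h>

/* 0-based: x in [0, width), y in [0, height), index in [0, width*height) */
static size_t tile_index0(size_t x, size_t y, size_t width) { return x + y * width; }
static size_t tile_x0(size_t i, size_t width) { return i % width; }
static size_t tile_y0(size_t i, size_t width) { return i / width; }

/* 1-based: x in [1, width], y in [1, height], index in [1, width*height].
   Every formula has to shift down by one, do the math, then shift back up. */
static size_t tile_index1(size_t x, size_t y, size_t width) { return x + (y - 1) * width; }
static size_t tile_x1(size_t i, size_t width) { return (i - 1) % width + 1; }
static size_t tile_y1(size_t i, size_t width) { return (i - 1) / width + 1; }
```

The 1-based versions work, but each one carries an extra -1 and/or +1 that is easy to forget.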
I've talked about this with a lot of people over the years, and I've seen many people who confuse indexing with counting, and also many who think 1-indexing is just something you get used to and it becomes a complete non-issue.
It doesn't, ever. You just learn to live with it.
It's not the worst thing, to be fair, but it's a perpetual rock in your shoe. While Lua (and also Julia) actually allows you to easily 0-index arrays, realistically you won't do that with every single array you ever create, because the language itself pushes for base-1. If you create an array literal, like "a = {1,2,3}", it will be naturally 1-based. The # operator only counts the elements from 1. The _for_ loops include the upper limit, because Lua expects you to loop from 1 to limit, not from 0 to limit-1.
All of this plays a part in making it quite annoying and very prone to human mistakes.
- You have to worry about not forgetting to -1 the for loop limits when looping from 0, or you get an extra iteration that can cause problems.
- Sometimes you have to waste time thinking whether you should 0-index an array or just let Lua have it its way. I've had times I chose the latter, only to then regret it and have to waste even more time carefully changing my code to accommodate to 0-indexing.
- Your code becomes inevitably inconsistent, with some 0-based arrays and some 1-based arrays, and then you have to be extra careful to keep in mind the ones that are 1-based, because you might have to +1 or -1 whatever variable carries the index.
- It's harder to do utility functions that deal with arrays, because you can't predict the base of the arrays users might throw in there, and you have to waste more time making them work for both.
- It's harder to port code to and from Lua. It requires extra care and attention, because loops will need corrections, arrays may or may not need to be made 1-based, and consequentially some code may need to account for that, etc. And then if the code isn't working, you have to double check all of the above on top of double checking if the translation is correct.
I've been coding in Lua for about half a decade, and that's been my experience. Lua is actually a brilliant language, maybe my favorite ever, but this was a really unfortunate design decision that I wish had never happened.
My initial months with Lua (not a beginner programmer), were also quite confusing. It took me quite some time to figure out when I should 0-index and when I shouldn't, and to this day, sometimes I'm still not 100% sure in all cases until I try one of them.
@@skaruts couldnt have said it better myself. i actually also did not know that optimization you mention in the beginning, with using an actual C array until you use the table like an associative array which then turns it into a hashtable.
i too really love Lua, it is definitely my favorite language, and the 1-indexing assumption most built-in Lua functions have is irksome.
however, you do have a distinct advantage in Lua in that you can *override* these built-in functions and make them work for both 0 and 1-indexing, which helps to address many of the problems you bring up.
the other main things i dislike about Lua is the lack of a continue statement and no typing, i think those were not good decisions to make either. ultimately, though, since Lua is free and open source and has reasonably relaxed licensing, you could actually make whatever changes to Lua you like for your own use or even to ship into other products with and i think that's really cool :)
@@Templarfreak I tend to avoid tampering with the standard stuff, because I could forget that I did it. But yea, you can still create your own variations of it. The flexibility of Lua is actually one of my favorite things about it.
Also, you can use goto if you really, really need a continue. I think its usage is discouraged, but I've used it when porting code that used continues with very complicated if statements I didn't want to mess with.
for ... do
    if complex_condition then
        goto continue
    end
    -- code
    ::continue::
end
@@skaruts yeah, this is like the only way that i know of that you can use to get a continue-like statement using a goto, which i do all the time. i think this particular use-case of goto is perfectly fine. it still sucks that we dont have a more proper solution, though.
in some cases, tampering with the built-in functions is also a necessity, though, if you want to implement your own types then certain functions would benefit from being overridden. for example if you want the built-in type function to return the correct value then you have to override it because Lua does not provide a better method of doing so. also by default all usertypes you define C-side that you expose to Lua will always just be considered a usertype by the type function and Lua in general, which may not be appropriate depending on your situation.
Lua actually DOES have a 0th element to their arrays! it's just that all the built-in Lua functions that iterate over arrays all start at 1. you can access 0 perfectly fine with your *own* code, though, because they are simply associative arrays with integers as valid keys, which means 0 is a valid key for an index of an array as well. also, the funny thing about those built-in functions working in that way is that you can also override built-in Lua functions, tables, etc :D
Yes!
local arr = {[0] = 20, 21, 22, 23}
for i = 0, #arr do
    print(arr[i])
end
The trick is Lua doesn't have arrays! (well it does but that's niche). They're all hash tables so you can just as easily index -2 billion as you can 0 and start from 150. You can even index starting from "porkypie" if you want. iirc you need to use strings and userdata to get actual arrays. userdata is C binary
@@Mallchadi havent totally fact-checked this yet but as it turns out if you do just use integers as keys Lua will actually initially only make your table an array on the C side until you use something else as a key for it which it will then create the hashtable part of your table
@@Templarfreak damn, that sounds very cool
actually the type of an array is indeed an array (in your case it's 'int[4]'). But it decays to a pointer when used in an expression.
There are 3 cases where it doesn't decay into a pointer:
1) sizeof( my_array )
2) &my_array
3) typeof( my_array )
Yep though the third is an extension
Just as a reminder, sizeof(my_array) can't be used like this:
int getArraySize(size_t* my_array)
{
return sizeof(my_array);
}
Because then it will just give you the size of a pointer on your machine
This is, of course, frustrating to deal with when trying to pass arrays, especially multidimensional ones, between functions
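One common way around the sizeof problem above is to pass the element count alongside the pointer. A minimal sketch, where ARRAY_LEN, size_seen_by_callee and sum_ints are names made up for illustration:

```c
#include <stddef.h>

/* Only valid where the real array is in scope (no decay has happened yet). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside a function the parameter is a plain pointer,
   so sizeof gives the size of a pointer, not of the array. */
static size_t size_seen_by_callee(int *my_array)
{
    return sizeof(my_array);
}

/* The usual workaround: the caller passes the element count explicitly. */
static long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

It would be called as sum_ints(my_array, ARRAY_LEN(my_array)) from the scope where my_array is declared as an array.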
@@natnial1 added to C23
My father was interviewing for his second job, and was asked this very same question. He got it right and the job. The guy that asked about arrays/indexes wrote the company's P&L system and used this in some way for a radix tree and my dad ended up taking the project over.
Love the way that different folks say that arrays are and arrays aren't pointers. Lots of confusion about the meta-confusions about the distinctions of cognitive and standardisation levels.
A non assignable 'pointer'. I love the woosh those non-lvalues make as they fly by.
No comment regarding Lua, but Fortran defaults indexing to start with 1, however it can be changed by the programmer. So, yeah you can do some insanely serious number crunching (as many still do) in Fortran and a default 1 indexing. ; )
I learned this stuff on accident while learning about vesa video modes and directly writing to vram. pushing qbasic to its absolute limits and breaking out of it really taught me a lot when I was starting out.
Thanks for the debunk, i also thought it was linked to arithmetic instead of parsing. 0x7f info is also quite relevant!
15 years programming in C/C++ and I didn't know that basic trick. Amazing! Thanks!
I always enjoy watching these shorts, so keep them coming.
Fun fact: as the index into an array is (usually) a signed integer, as far as the C compiler is concerned, 0 is the middle of the array, not the beginning.
This actually becomes quite useful for people who do systems programming in C and who need to access hidden bits in system structures, especially if you're doing bare metal programming.
Interesting. So the array can have metadata before element zero in this setup?
@@williamdrum9899 For instance.
But it could also be that a function returns relative indices in an array that was passed to it as a pointer. Items to the left will have negative relative indices and items to the right will have positive indices.
The compiler does not force you to use a zero or a one as the first index in an array. As far as it is concerned the moment it needs to do something with an array, it will add the index to the pointer to the start of the array. Remember: subtracting is just adding a negative (two's complement) value.
So:
int *p = NULL;
int a[100]; /* Let's assume for brevity's sake that this array is actually initialized */
int b = 50;
int v;
then:
v = a[b - 3];
is equivalent to
p = &a[b];
v= p[-3];
and:
p = &a[b];
v = *(p - 3);
I'm not saying that this is always good practice, but the many, many ways one can go about referencing an array (or any other object that is fundamentally a pointer under the hood) and its contents, simply warms my heart ;-).
No, not true.
0 is the start of the array. No space is allocated before 0.
Sure, you can potentially do negative indexing, but that would be illegal.
You might as well say that all arrays are huge because even if you declare a 3 element array you can still attempt to access the 20,000th element (and likely trigger an exception on any system with an MMU).
@@williamdrum9899 If you use malloc to allocate space, then an integer and a pointer (usually) are stored before the space itself to store the information required by the free() call.
@@cccmmm1234 You are confusing the convention with how the C compiler treats arrays under the hood.
An OS or firmware may put boundaries on the memory you are allowed to access, but the C compiler does not care about that, nor does the C language specifically say an array should be 0-based or that indices in an array should always be 0 or positive.
For instance:
//------------------------------------------
char *p = "Hello World!";
char *q = NULL;
int i, n = strlen(p);
q = (char *) malloc(n + 1);
p += n - 1; // p now points to the last non-nul character ('!')
for(i = 0; i < n; i++)
{
*q++ = *p--;
}
*q = 0; // q has been advanced n times, so it already points at the terminator slot
//------------------------------------------
is functionally equivalent to:
//------------------------------------------
char *p = "Hello World!";
char *q = NULL;
int i, n = strlen(p);
q = (char *) malloc(n + 1);
p += n - 1; // p now points to the last non-nul character ('!')
for(i = 0; i < n; i++)
{
q[i] = p[0 - i]; // Remember p points to the last non-nul character of the string
}
q[n] = 0;
//------------------------------------------
In C, strings are merely character arrays. By convention we assume 0 as the start of the array, but there are circumstances where a function may return a pointer to a portion of memory where the "left hand side" (negative index) contains data we may want to use as well as the "right hand side" (positive index). In the above example, after p += n - 1 (and before the loop runs), p points to the last non-nul character in the array, but not to the very last character in the array (which is the nul character). In other words: we have valid data both on the left side of p and on the right side of it. p[0] contains the exclamation mark, p[-1] contains 'd', and as mentioned before p[1] contains the string terminator.
Lua has tables instead of arrays. It's like a dictionary: the indexes are actually keys, and the values are assigned to those keys. Lua also stores tables on the heap, not the stack, and their size is dynamic, so it is very possible for a table to be like {9: "9th", 5: "5th", "aString": "AStringValue"}, and when you iterate through it with the pairs function, it goes from the 9 key to the "aString" key.
Turbo Pascal string arrays back in the day were fun; 0 holds the length of the string, 1 is the first character. Now just don’t think too much about text longer than 255 characters, such thoughts are illegal :)
i completely forgot about the funky array accessing syntax. i usually do pointer math rather than use square brackets.
Yep all pointer arithmetic occurs in this fashion (and array style dereferencing is just that with some added syntactic sugar),
this is also why pointer arithmetic isn't allowed with void pointers - it doesn't "know" the size/alignment of the underlying data.
Why can't it just default the size of the data to 1 bit or 8 bits? That would be a pretty understandable thing. Or maybe 64-bits, since most systems use 64 bit memory addresses.
@@simonwillover4175 void pointers are intentionally defined as "typeless" so that they may be used to abstract away the underlying type it's pointing to.
Assigning any default size is going against that, if you want to inspect the memory byte-wise you can always cast (void*) to (char*) - since their alignment is guaranteed to match.
Also bitwise memory access isn't a thing afaik, memory granularity is generally on a byte scale.
@@natnial1 Yeah. If the bitwise memory access was a thing, it would just compile into an inefficient mess, probably.
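For anyone wondering what the cast mentioned above looks like in practice, here is a small sketch. byte_at is a made-up helper name, not a standard function:

```c
#include <stddef.h>

/* void* carries no element size, so arithmetic on it is not allowed in
   standard C (GCC accepts it as an extension). Casting to unsigned char*
   makes the step size exactly one byte. */
static unsigned char byte_at(const void *p, size_t offset)
{
    const unsigned char *bytes = (const unsigned char *)p;
    return bytes[offset];
}
```

The caller is responsible for keeping offset inside the object that p points to; nothing here checks bounds.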
Lua is a nice and simple scripting language, but it's good to understand that it has a ton of Pascal (which has almost the same control statements) and VB style design in it, and all those languages have 1 based indexing or they even mix things up.
VBA and COM interop stuff on Windows are the worst actually. I have had a lot of headache moments in the past programming code around spreadsheets that have their first cells start at row 1 and index 1 while I started from 0 as I am used to. 😤
In Pascal, string indexes start from 1. "But where is the zeroth element?" - 0th element stores the size of the string.
I like this, but it has a downside. Let's say I store these two strings:
(7) "go home"
(13) "Don't go home"
Now if these were null terminated I could just store the second string, and still print the first with a little pointer arithmetic. With a pascal string you can't really do that
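A rough C sketch of that difference. The struct pstring layout (one length byte followed by up to 255 characters) is an assumption modelled on the classic Pascal short string, and both function names are made up:

```c
#include <stddef.h>
#include <string.h>

/* Nul-terminated: any pointer into the middle is itself a valid string
   that ends at the same terminator, so the tail can be shared for free. */
static const char *suffix_from(const char *s, size_t skip)
{
    return s + skip;
}

/* Length-prefixed (assumed layout): the length lives in front of the text. */
struct pstring {
    unsigned char len;
    char text[255];
};

/* The tail has no length byte of its own in front of it, so a new
   string has to be built and the characters copied.
   Assumes skip <= p->len. */
static struct pstring pstring_suffix(const struct pstring *p, size_t skip)
{
    struct pstring out = {0};
    out.len = (unsigned char)(p->len - skip);
    memcpy(out.text, p->text + skip, out.len);
    return out;
}
```

With "Don't go home" stored once, suffix_from(s, 6) gives "go home" without any copying, while the length-prefixed version needs a second string.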
Lua doesn't use arrays. Instead, it uses tables, which are a more abstract data type separate from arrays (though simple tables are represented as c-arrays under the hood). Lua using 1 as the first index in a table isn't necessarily 'incorrect', just different. Since tables in Lua also function as trees, dictionaries, etc., you can start a table at index '0' and implement a custom iterator function to simulate how arrays work in other languages. I do still agree that all array-like structures should start indexing at 0 just out of convention alone, but it's not wrong in any way to index from 1 in Lua's case.
Example:
--// Custom iterator
local function zpairs(t)
    local i = -1
    return function()
        i = i + 1
        if t[i] ~= nil then
            return i, t[i]
        end
    end
end
local tab = { [0] = 1, [1] = 2 } --// Table indexed from 0, will not work with ipairs function.
--// Using the custom iterator
for i, v in zpairs(tab) do
    print(i, v)
end
--[[ Expected output:
0 1
1 2
]]
I kinda hate that so many people say lua doesn't have arrays or classes. It does!
a = {1,2,3}
@@skaruts Huh, I never knew basic tables were represented as arrays under the hood. Guess I should change my comment then, though that still doesn't really change the fact that the actual name for this is a "Table", not an array, in the Lua programming language. It still functions as a dictionary whose keys increment from 1.
@@BeconIsYeck the name isn't very relevant, though. An array is simply _"an ordered series or arrangement"_ (google), and it can apply to lists or groups of things, like solar panels. The names we use are just conceptual distinctions for arrays with different functionalities. A Set is an array that excludes duplicates. A Deque is an array with a specific mode of access.
The name _"associative array"_ is often used to refer to Dictionaries / hash-tables / maps.
The Lua table can be made to work as any of the above and more.
I would like to make (what I believe to be) a few important points regarding 1-based indexing:
-It is not less optimal than 0-based indexing at a low level. Any optimizing compiler will simply use a pointer that begins 1 index before the start of the array. In fact, whenever you write a loop that contains an expression of the form myArray[constant offset + i], the base address used for the array is the normal base address + constant offset.
-It is not less natural than 0-based indexing. Both are arbitrary decisions. Just like pi is an arbitrary multiple of the circumference of a unit circle, 0 is an arbitrary offset into the array. Often it is more convenient to start at 0, but it is also sometimes more convenient to start at 1 or any other number of offsets, depending on the problem.
Overall, 0-based indexing is often most convenient. However, it is not objectively "better" than 1-based indexing. Most people are used to using 0-based indexing, of course, so it should stick around for now. However, compilers also do plenty of things that seem less convenient or "natural" at a low level because they are more intuitive.
You can make the same argument for -17 based indexing. ;-)
I assume 0[pointer] compiles to the same as pointer[0] due to how array accesses are just *(array+index) internally.
It does. The reason you can write either one is that architectures access arrays slightly differently but are all capable of doing it, some cpus just need to take extra steps. For example, in MIPS Assembly you can only use constants as offsets for a memory load. If you want a variable offset you must add it to the array's base pointer first.
C arrays are arrays, not pointers. They are pointer-like types, so their "value" is indeed the address of their content, but if you take &myarray you'll get the same value as myarray, meaning we got the address of the array.
Being a specific type allows typing of multiple dimension arrays because now you can reason about array of arrays (packed, no multi-indirection kind) . You could not do it if C had no "array of N objects of type T" type and everything was translated to pointers
True, but it'll really boggle you when you try to use _Generic and it matches every array passed to it as a pointer to the given type instead of an array of any dimension. It's just super annoying because it kind of reduces the utility of the functionality. I can't seem to determine if it's a bug in gcc or if that's accurate to the standard, but I don't like it either way.
@@anon_y_mousse i really don't know, didnt use these features a lot ^^
Sees the thumbnail "yeah of course that works."
Like it just logically makes sense: you're accessing the array ptr bytes in from 0, that's just accessing the array again
Indeed, the explanation doesn't make sense to me. If you are indexing from 0 using array syntax I would expect that the 0 would be treated as a void*, so the compiler wouldn't multiply any type size, since that's unknown, and just work with raw bytes instead.
1-based indexing is not evil nor incorrect. That just happens so C-style arrays can work with math better if they start at 0.
Also, nerd font is broken
"*(array + index)" is also valid in C since an array is simply a sequence of memory and you access each item by their memory address
This is actually the same as writting array[index], I've always seen the array brackets as another dereferencing method.
You can do pretty weird stuff with that, e.g.:
typedef struct {
    int x, y, z;
} Vec3;
void printFoo(Vec3* foo) {
    printf("x = %d\n", foo->x);
    printf("y = %d\n", *((int*)foo + 1));
    printf("z = %d\n", ((int*)foo)[2]);
}
Those dereferencing methods are completely valid, as you always interpret a block of memory.
Pascal's arrays are based since you can define an array from 2018 to 2020 for example. I haven't seen this feature in other languages.
Yeah, I was going to mention Delphi which naturally can do this too.
@@Finkelfunk I think he means an array of size 3 where the indices are just 2018,2019,2020. Afaik you can replicate with a hashmap / dictionary.
@@bayzed Indeed you can, assuming you’re prepared to accept the performance hit.
Lua can also do this. It just starts at 1 by default.
Technically, if you look closely in just the right way, you’ll see that arrays have the type of array, not pointer. (Big example is with `sizeof`, but there are others). It’s just that they’ll decay to pointers very easily.
I have long wondered why arrays started with zero, this was a good answer. I used to think that we just didn't have any reason to waste that 0th index, so we used it haha. Also that i[a] thing is very cool I didn't know that could work!
1:41 can tell it’s a stack based variable because the address starts with 0x7F on a 64-bit architecture
index 0 exists in lua, it is used to say "invalid index", since it can't use -1 like in C-like languages for things like indexOf (since -1 is a valid index in lua).
I always find your videos clear and easy to understand. Thanks for another one!
Wow, this video is incredibly helpful to understand how arrays actually work!
Fun fact: you can malloc an array, feed it with assembled instruction, and execute it. Unless you're using linux-hardened kernel or similar
It's a great trick but you have to be careful when writing the assembly for it. Use relative offsets for jumps, and absolute addresses for calls. Otherwise you end up just executing the original code in the former and risk a program counter escaping in the latter
great explanation. additional👍 for mentioning that 7 in address is related to stack.
basically, `array` is a pointer, `array[0]` gets the value @ address array+0, `array[32]` gets the value @ address array+32
array[index] produces a memory address like:
array_base_address + index * sizeof(array_element_type)
The real question is why do we NOT zero index EVERYTHING
Maybe because ancient numerals didn't have a zero, like Roman numerals don't have a zero at all.
[commenting this before watching the video]
It makes sense - the array is a pointer to a block of memory and you're adding x times the size of whatever is in there. And since addition gives the same result in both directions, you can index x with the pointer and still be correct.
Pascal is beyond your understanding
Array can be indexed as -int32 to +int32
So basically you can index an array from -int32 number
I was on stream when this topic was discussed haha.
3:47 "Plus the size of the array" should be "plus the size of an element in the array"
The way I look at it is just... The Index -1, which is something you have to note sometimes in loops
Arrays start from zero because it reduces the time needed to calculate the address.
The formula is:
base address + index * sizeof(e.g. int)
If it started from 1 instead of zero,
the formula would be:
base address + (index - 1) * sizeof(e.g. int).
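The two formulas can be written out as a small C sketch. The function names are made up, and treating addresses as plain integers via uintptr_t assumes a flat address space, which is typical but not guaranteed by the C standard:

```c
#include <stddef.h>
#include <stdint.h>

/* 0-based: the index is already the number of elements to skip. */
static uintptr_t address_zero_based(uintptr_t base, size_t index, size_t elem_size)
{
    return base + index * elem_size;
}

/* 1-based: one extra subtraction before the multiply. */
static uintptr_t address_one_based(uintptr_t base, size_t index, size_t elem_size)
{
    return base + (index - 1) * elem_size;
}
```

Both give the same address when the 1-based index is one higher than the 0-based one. Whether the extra subtraction costs anything after optimization depends on the compiler.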
Finally got some configured vim with plugins. Tho writing code in raw vim is also pretty dope.
How do you know the memory start with 7F is on stack section ?????
That's some cursed information that will live rent-free in my brain!
Well done -- Readable text size! nice!
I get why 0[myarray] works, but it really shouldn't.
Shorter answer why 0[a] works: Arrays in C are just syntactical sugar. You can make the compiler do the very same thing without ever using array syntax in C. a[x] is just nicer way of writing "*(a + x)" and that's why a[x] is the same as x[a], as addition is commutative (a + x = x + a)
Honestly it would make sense to start at 0 because of -1, which points to the end of the array, but if arrays started at 1, it would be pretty weird (you would use 0 instead)
In C the -1 element is not the end of the array but the element before the address pointed to by the array pointer:
int myarray[] = {1, 2, 3, 4, 5, 6, 7, 8};
int *myarrayptr = &myarray[5];
printf("%d\n", myarrayptr[-1]);
will display the number 5 as myarrayptr is pointing to myarray[5], which contains 6, and the element before it is myarray[4], which contains 5.
Similarly printf("%d\n", myarrayptr[-5]); will print the value of myarray[0], which is 1.
C has no array bounds checking (you are supposed to know what you are doing) so you can quite happily run off _either_ end of any array you've defined. This was used in des.c (which did the [Lucifer] DES encryption, as used by unix password encryption back in the 1980s): it defined two arrays L[] and R[] next to each other and effectively merged them into a single array for processing by using the first array defined (L) until it specifically wanted to use the two halves (Left and Right) separately.
Nifty didn't know that was a thing.
very cool.
What is an array? A miserable little pile of offsets! But enough talk -- have[ye]!
When you are using a thing you should not be forced to think what's going under the hood. As the creator of an array you should be able to decide what indexing you want to use. When I started to learn programming in school they used Pascal to teach and Pascal has this interesting quirk - it can have an array with any arbitrary integer indexing, any step. You can have an array that starts at 100, and then goes 105, 110, 115 and so on. Indexing is an interface and the interface user should decide how does the interface work. You should be able to choose any indexing you want.
I love pointer arithmetic, as soon as you start interpreting everything as a chunk of memory, instead of arrays, structs,... , the possibilities get endless.
For example:
typedef struct {
    int x, y, z;
} Vec3;
void printFoo(Vec3* foo) {
    printf("x = %d\n", foo->x);
    printf("y = %d\n", *((int*)foo + 1));
    printf("z = %d\n", ((int*)foo)[2]);
}
Always remember arr[i] is equal to *(arr + i). And the index always increments by the sizeof() the datatype (int, char, ...).
This is valid too:
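unsigned int a = 0xAABBCCDD;
unsigned int b = *((unsigned char*)&a + 2);
printf("%x", b);
On a little-endian machine with 4-byte ints this prints bb, because you take only one byte (an unsigned char) out of the 4-byte integer by interpreting the integer's memory as chars. (With a plain signed char, the byte 0xBB would be sign-extended and print as ffffffbb.)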
Pointers are amazing 😅
I found the explanation of why 0[arr] works confusing… explaining how arr[0] works and _then_ how 0[arr] works would’ve made it clearer for me I think.
Great video nonetheless!!
Are you gonna bring back low-level code reviews? I have a great project you could feature
1-based array indexing is much better (totally not rage bait)
>:(
The lack of an argument when coming with an opinion speaks for itself ;)
5:40 I can’t understand how this holds true for any index that isn’t 0, like I don’t see this working with an index of 1 since with 1[array] => *(1 + array), array is the non pointer type that gets upgraded to an index which would leave us with *(1 + array * 4), which isn’t what we want at all
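A small sketch that may help with this. In pointer arithmetic the scaling is always applied to the integer operand, whichever side of the + it sits on; the pointer itself is never multiplied. The three function names below are made up for illustration:

```c
#include <stddef.h>

/* All three read the same element: array[i], i[array] and *(i + array)
   are defined to mean the same thing. The integer i is scaled by
   sizeof(int); the pointer is not. */
static int read_normal(const int *array, size_t i)   { return array[i]; }
static int read_swapped(const int *array, size_t i)  { return i[array]; }
static int read_explicit(const int *array, size_t i) { return *(i + array); }
```

So 1[array] becomes *(1 + array), which is the address of array plus 1 * sizeof(int) bytes, exactly the same as array[1].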
If you work with PLCs some platforms let you choose whatever arbitrary array bounds you want
Pointer arithmetic and array to pointer decay are one of the best features of C, contrary to what some C++ fanatics would suggest
have fun debugging bro lol
Lua doesn’t actually have arrays though. It has tables, which are dynamically sized associative containers that can be keyed using almost any data type. In other words, you can think of a Lua table as being like std::map. As such, you *can* use 0 as a key if you want. However, convention is that you don’t.
I have a soft spot for Lua but I do wish the arrays were 0 indexed like they should be lol. Either way, arrays in Lua are insane abstractions that you can index with basically anything, iirc you can do it with a string or function or whatever you want lol
This could also be explained by pointer arithmetic being the same as array arithmetic. In pointers you usually do *(p+i) being “i” the index. This said, you can also do *(array + i) and it would still work, as p[i] also works.
PS: just finished watching the video and you explained this, must watch all the video before commenting hahaha
Under the hood, i is being multiplied by the size in memory of the variable type in both occasions as you explained in the video
didn't expect NVChad here
Have been programming for years. But didn't have an idea about this thing.
I feel like 1-based indexing is superior. It is way more intuitive and tbh it also makes more sense when thinking about memory. It is the first part of the allocated memory for the array. Yes, when skipping over to other elements you then multiply by the index-1, but that can’t possibly be a problem for performance or security, right? I feel like 0-based indexing is just a flex of programmers on other people.
Arrays are just pointers in memory to a start, that span an x amount of elements. A pointer + (any integral value or address) = an address (pointer arithmetic hmm yes). Memory is funny, and when we want something we just ask for the address the value starts at. Oh yea we know that we take 4 bytes because it is an integer. So the datatype * (how many items) decides the span, the index * typesize + array pointer will be the actual thing you want. Oh yea just read an x amount of bytes starting from there (where x is the typesize). Tadaaaaa, you have successfully buffered an integer into memory. Incredible yes. I always try to explain to people that index 1 and position 1 are two different things. They do not seem to understand...
But why does the compiler allow the second syntax? What's the point? And would it work with multi dimensional arrays?
Yes, because of the way C does multidimensional arrays. Though, an array of pointers doesn't qualify as a multidimensional array, and in general you shouldn't do it, so don't.
I never understood these until i started to learn assembly
I had the same nvchad visual bug not showing the bar correctly
Summary:
* indexing is actually a commutative operation
* an array (or vector) is actually just a pointer; if we have some array, named `items`, the compiler represents `items` as a pointer; the compiler does know that this pointer is pointing to a list of data, rather than a "single" piece of data, but it treats most pointers just like they are numbers; this is done this way for the sake of simplicity, really; there is no need to differentiate pointers to lists, pointers to single pieces of data, numbers, and booleans in certain contexts!
* `array[index]` is actually a shorthand for `*(array + index)`; this accesses the value at the "location" of the sum of `array` and `index`; really, `(array + index)` is just another pointer, to a specific piece of data, and pointers can simply be represented by numbers;
* well, addition is obviously commutative; therefore, anything that uses addition in the right way also has the opportunity to be commutative; in our example (of array indexing), the addition is used commutatively; notice that we can swap `array` and `index` in the code: `(array + index) == (index + array)`; this equality obviously holds under an unary operation, such as `*`: `*(array + index) == *(index + array)`;
* we can see from the previous conclusion that our indexing shorthand is also commutative: `array[index] == index[array]`;
Thank you for your C explanation and your time. It would be interesting to see C++ also. In “modern” languages like Go or Rust, classes were dropped because they decrease code execution speed, and structs are used as a replacement. What do you think about that? Does it affect code execution speed?
Classes with virtual functions use what's called a "vtable", which means their objects store a pointer to a table of function pointers. The youtuber Creel makes a great video explaining it called "Object Oriented Programming is a Dirty Rotten Low-Down Trick."
In short, every class object has a hidden variable - a pointer to its "version" of a polymorphic function. This means you have an extra pointer to dereference.
Now, this isn't always a bad thing. In fact, this "polymorphic" style is very important in system calls on many 80s computers, to maintain compatibility between different firmware versions
7f is the heap and executable space, on Windows, most of the time. Is it the stack on Linux? I didn't know that. Usually stack addresses are much lower for me.
The type of an array in C is an array type, not a pointer. An array decays to a pointer in most expressions, basically like the implicit conversion when you assign an integer to a float, or a function name to a pointer to function. You can prove that by taking sizeof of an array: you will see that its size is `sizeof(element type) * count of objects`.
Don't forget the very cursed
int test[] = {1, 2, 3};
long tp = (long) test / sizeof(int);
int* cursed = NULL;
printf("%d\n", cursed[tp]);
If you want to call it "index", then you should start at 1, per mathematical tradition and day-to-day experience: when you assign numbers to things -- which is one of the definitions of indexing -- you always start with 1; for example if you tell someone you live in the 4th house from the intersection you expect them to start counting from 1, not 0.
If you want to start at 0 then just call it what it is: an "offset".
You're confusing _counting_ with _indexing._ They're not the same thing, neither conceptually nor in practice. Consider these two arrays:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0] -- array with a 10 element count, indexed from 1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- array with a 10 element count, indexed from 0
The actual values are irrelevant, I just used them to illustrate the different indexing. As you can see the count is the same, regardless of the indexing.
In practice the indexing math -- that you need to, e.g., convert an index to an X, Y or vice versa -- will only be simple and straightforward if you're indexing from 0. I'm talking about things like this:
index = x + y * width
x = index % width
y = floor(index / width)
Pretty simple stuff. But if your array is 1-indexed then you'll have to waste time overcomplicating that math, and you're probably gonna get it wrong too.
@@skaruts Why do you think I'm confusing them? All I'm saying is that in real life indexing (assigning numbers to objects) is TYPICALLY done starting from 1 and counting up. You can show 3 shirts to a friend and tell them: "this is 1, this is 2, this is 3, which one do you think looks best?" Of course you can also say "this is 0, this is 1 and this is 2" or even "this is 5, this is 17 and this is 611" but your friend may find that odd.
That is also how it's TYPICALLY done in math. Go to Wikipedia and search for "Row and column vectors" and you'll see it. It's probably why languages like Matlab, Mathematica and Julia are also 1-based.
If you're talking about pointer + distance then I think "offset" is a much better name than "index".
There are tons of things in maths that are indexed from zero. Infinite cardinals, base vectors in spacetime algebra, polynomial coefficients, and so on.
@@vytah Sure. And the things that resemble arrays in programming languages the most (row vectors) are indexed from 1.
This explains why in the C implementation arrays start at 0, but the answer to the question "why were C arrays implemented that way (start from 0)?" is probably mainly because if they started from 1, you'd not only lose 1 index from the addressable integer range (which may not be much today with 32 bit or 64 bit integers, but if you are working on embedded systems with bytes, especially in the old days, that's significant), you'd also have to check for both the upper bound (length) and the lower bound (1) when accessing an element, instead of just checking that the index is below length.
0-indexing also simplifies the indexing math a lot.
index = x + y * width
x = index % width
y = floor(index / width)
None of that works with base 1. If you really wanted base 1, you'd have to overcomplicate that math, and it's actually quite tricky to get right. And if you're working with 3D grids, I don't even want to think about it.
@@skaruts It is not really that complicated.
index = x + (y-1) * width
Just a wasteful subtraction.
@@atomgutan8064 hmm, that does work indeed (I've just tested it). It's actually simpler than I thought, but I personally wouldn't have figured it out.
What about the conversion from index to x,y, though?
@@skaruts
x = index % width
y = ((index - x) / width) + 1
again a wasteful addition
@@atomgutan8064 that won't work. That will never point you to the last index of the array. In a 16x16 matrix, the last element is the 256th. If *_index = 256_* , then *index%width* is 0, which is incorrect. Well, it will break anytime *x == width.*
As for the Y, it's also wrong. If *x == width,* then that equation will break as well.
My Y was also wrong, as I forgot to floor it. For base 1 you might want to just *ceil(index/width),* perhaps.
But this is why I was saying this is quite tricky to get right.
my preferred way to index in c is array
MATLAB at least makes a justification of being based around matrices, which start indexing at 1. Not sure what Lua's excuse is though.
probably that it's educational or something. but I agree that if you are getting into programming you should learn 0-based indexing right away.
Bro I don't like C... but I still watch every thing and I don't know why. It's just so impressively difficult to understand C sometimes.
C is basically just a smidgen of syntactic sugar on top of what your CPU instruction set does (that's why it does not hide endianness, for instance, because that would be an awfully inefficient thing to do on CPUs that have the "wrong" byte order). It requires you to think at least a little bit about the hardware that your program runs on. Not too much, just enough to make you a really good programmer over time. If you are trying to avoid that level of knowledge, then you will always stay a mediocre programmer because performance is, whether we like this or not, hardware dependent. Not saying you can't get around low level stuff most of the time by using optimized libraries and LLVM, but nobody ever died from looking underneath the hood of their car.
Matlab and Dreamberd: Hold my beer
friendship with matlab over
I did not know that it is possible but in my opinion
it is some weird bug in compiler parser which is related
to token parsing priority.
Proof:
Ok, you said that:
a[i] = *(a + i);
i[a] = *(i + a);
When I compile:
int index;
int index0 = *(index + 0);
compiling fails, error: invalid type argument of unary '*' (have 'int')
but when I compile:
int index;
index[0];
compiling fails, error: subscripted value is neither array nor pointer nor vector
and then when I compile:
int index;
0[index];
compiling fails, error: subscripted value is neither array nor pointer nor vector
Which clearly shows that the identifier before the bracket token can't be a number, and the compiler
specifically checks that rule before doing any optimization. The compiler always checks the type of the token before [], otherwise compiling "index[0];" and "0[index]" and "int index0 = *(index + 0);" should generate the same error.
So in this case it is a bug, not a feature.
Lua better watch their back. Pissed off the whole gang.
Not to be a Rust soydev but yeah C's "the developer is always right" attitude has been disastrous
What are the Vim plugins that you're using in this video? They look awesome.
There's also indexing math that you have to do yourself that only works if the arrays are 0-indexed. If you are making a platformer game, you'll have a 2D array of tiles for the levels, and you'll certainly use "index = x+y*width" or "x = i%width" and "y = floor(i/width)". None of it works with 1-indexing unless you spend some time figuring out how to -adapt- overcomplicate the math (and I'm not sure it's even possible to make it work).
I've talked about this with a lot of people over the years, and I've seen many people who confuse indexing with counting, and also many who think 1-indexing is just something you get used to and it becomes a complete non-issue.
It doesn't, ever. You just learn to live with it.
It's not the worst thing, to be fair, but it's a perpetual rock in your shoe. While Lua (and also Julia) actually allows you to easily 0-index arrays, realistically you won't do that with every single array you ever create, because the language itself pushes for base-1. If you create an array literal, like "a = {1,2,3}", it will be naturally 1-based. The # operator only counts the elements from 1. The _for_ loops include the upper limit, because Lua expects you to loop from 1 to limit, not from 0 to limit-1.
All of this plays a part in making it quite annoying and very prone to human mistakes.
- You have to worry about not forgetting to -1 the for loop limits when looping from 0, or you get an extra iteration that can cause problems.
- Sometimes you have to waste time thinking whether you should 0-index an array or just let Lua have it its way. I've had times I chose the latter, only to then regret it and have to waste even more time carefully changing my code to accommodate 0-indexing.
- Your code becomes inevitably inconsistent, with some 0-based arrays and some 1-based arrays, and then you have to be extra careful to keep in mind the ones that are 1-based, because you might have to +1 or -1 whatever variable carries the index.
- It's harder to do utility functions that deal with arrays, because you can't predict the base of the arrays users might throw in there, and you have to waste more time making them work for both.
- It's harder to port code to and from Lua. It requires extra care and attention, because loops will need corrections, arrays may or may not need to be made 1-based, and consequently some code may need to account for that, etc. And then if the code isn't working, you have to double check all of the above on top of double checking if the translation is correct.
I've been coding in Lua for about half a decade, and that's been my experience. Lua is actually a brilliant language, maybe my favorite ever, but this was a really unfortunate design decision that I wish had never happened.
My initial months with Lua (not a beginner programmer), were also quite confusing. It took me quite some time to figure out when I should 0-index and when I shouldn't, and to this day, sometimes I'm still not 100% sure in all cases.
I had to test, and actually it works.
I was hoping you'd explain how Lua works under the hood and how it differs.
As far as I know arrays in Lua are what’s called a table. It is the single complex datatype after functions and C object data.
All things are handled via tables, there is nothing like tuples, dictionaries, lists or even classes.
You want to have OOP?
You have to realize it with tables.
Tables are simply key value pairs (skipping over modifications you can do with metatables). You can use everything as a key, a string, a number, even boolean values.
So there the numbering is not important for the underlying datastructure.
Lua is interesting under the hood. If your table is being used solely like an array with base 1, then it uses a C-array internally. If you deviate from that, then Lua will turn it into a hash-table internally. This is actually explained in Lua 5.1 book by the creator of Lua.
I presume that that means Lua has to correct your indices in some way under the hood, when the table is a C-array under the hood.
Honestly the best reason to use 0-indexing is that n%len is always in the array (assuming modulo and not remainder)
Aren't modulo and remainder the exact same?
@@atomgutan8064 remainders can be negative
@@atomgutan8064 for positive numbers yes, but for negative numbers it differs. For example: -1 mod 4 = 3, but -1 rem 4 = -1
@@atomgutan8064 can't remember the exact difference but no
I think it has something to do with negative numbers
C is da best, I feel like writing in C feels very much like python. Hail the evergreen C language!
I once had to index arrays using floats, in a language that distinguished between floats and ints, truly terrible
Why did you have to index an array using floats?
@@ultimatedude5686 I was modding a game and was using its crappy api, the thing barely worked and was held together with duct tape and glue
@@hughjanes4883 Makes sense. My first thought was that if you're doing array indexing using floats something has already gone horribly wrong.
That's actually really intuitive but not something you think about
So would 1[myarray] crash?
I presume it would be equivalent to a pointer to the next memory value after where myarray starts, but if myarray contains stuff that is not of size 1, I would get gibberish, and the type of the array is kinda lost in my assumption