Happy new year everyone! Hope yours is off to a great start. Here are the exercises and a (very small) list of books I used to support this video - www.0de5.net/stimuli/what-is-the-everyday-list-really
12:11 I didn't expect to get jump scared by a theta today lmao Great video btw
Haha same.
The constant time array construction: go through the entire available memory and check one by one whether each cell should be written to. Since the number of operations doesn't change with the input size, this is a constant-time algorithm, but as you said it's very impractical, because in reality that constant will always be at least as large as the linear cost of an O(n) construction.
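A toy Python sketch of what I mean (hypothetical names; the point is just that the loop length is a fixed machine constant, not the input size):

MEMORY_SIZE = 1024              # fixed machine constant
memory = [0] * MEMORY_SIZE

def construct_array(base, values):
    # Always does MEMORY_SIZE iterations regardless of len(values),
    # so it's O(1) in the input size and wildly wasteful in practice.
    for address in range(MEMORY_SIZE):
        if base <= address < base + len(values):
            memory[address] = values[address - base]

construct_array(100, [1, 2, 3])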
I forgot to add that your videos are amazing. The visualizations are beautiful and very instructive. A video on Forth in your style could easily be the best Forth video on YT. There aren't many good ones unfortunately.
New year doesn't really start until Kay uploads a banger.
I really appreciate the exhaustiveness of your explanations.
I'd LOVE a video on Forth. It's such an interesting language. Lisp is cool too, those two really feel like a yin yang type pair.
They're both really made from first principles and Forth starts at the machine, while Lisp starts at lambda calculus, which feel like opposite ends of some spectrum.
Definitely noted :) I have done some of the research already. One thing I discovered is that Chuck Moore (early on) had the internals of Forth so in his head that he would just reimplement it every time he worked on a new machine. And in fact, he didn't like code reuse very much at all! Lots to consider.
@ He's still around, chairing the Green Arrays company. They make a fairly interesting distributed computer architecture, slightly reminiscent of the Transputer, which also has a current descendant in XMOS.
@@neoeno4242 Love to hear it. Very interesting stuff. I saw a Strange Loop talk he did on his green arrays chips, fascinating stuff. I've heard that Forth is (like lisp) a write-once language and I guess he just leaned into that with his ideas about code reuse 😆
babe wake up kay just dropped a new video
your animations are getting really good!!
Great video as always! A fun fact I was reminded of by Kay's assembly implementation of the linked-list. In lisp, the functions to access the head and the tail of each element in a linked-list are called "car" and "cdr", which are shorthand for "contents of the address register" and "contents of the decrement register". An early example of implementation details leaking into an API!
It appears "part" fell out of that description at some point. The IBM 704 had 36 bit words, which could be subdivided into parts, of which those two were 15-bit (the address size). It's similar to the instruction and address parts in PDP-1. Later computers divided words into bytes.
That's not quite 100% correct. The first element is `car`, but `cdr` refers to the rest of the list. So if you have (1 2 3), then car is 1 and cdr is (2 3).
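A tiny Python illustration of that correction, modelling cons cells as pairs (hypothetical helper names, not how any actual Lisp implements it):

def cons(head, tail): return (head, tail)
def car(cell): return cell[0]      # the first element
def cdr(cell): return cell[1]      # the rest of the list

lst = cons(1, cons(2, cons(3, None)))   # the list (1 2 3)
print(car(lst))   # 1
print(cdr(lst))   # (2, (3, None)), i.e. the list (2 3)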
Yay! Good to see you Kay!
Thanks for the ODES series👍
Kay rolls out new video, I hit the like button and watch :)
This channel is fantastic
JavaScript's arrays are not really any of the mentioned structures though. Objects in JavaScript are basically hashmaps of members by string key, with a lookup chain for prototype delegation, so lookup misses on the actual object are delegated up the prototype chain. Keys on Array-typed objects that happen to be non-negative integers, as well as the "length" member, have special getter and setter behaviour which makes them act similar to actual in-memory arrays in closer-to-the-metal languages. I know the Opera browser, back when it was using its own Presto engine, was the only mainstream browser that special-cased integer keys to actually use a dynamic array backend, but that was a while ago, and the other browsers might have optimised their behaviour for integer keys on arrays (or even on any object; it might be easier to just have a single behaviour for lookups) since then.
I checked cpython because I was curious, it uses this formula:
`new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3; `
which gives sizes: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76
It seems to multiply by 9/8, add 6, then round down to a multiple of 4
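If anyone wants to see that sequence fall out of the formula, here's a rough Python sketch that repeatedly "appends" one element (growth only; the real list_resize also handles shrinking and a newsize == 0 special case):

def cpython_growth(appends):
    allocations = [0]          # an empty list owns no item storage
    allocated = size = 0
    for _ in range(appends):
        size += 1
        if size > allocated:   # out of room, apply the over-allocation formula
            allocated = (size + (size >> 3) + 6) & ~3
            allocations.append(allocated)
    return allocations

print(cpython_growth(70))      # [0, 4, 8, 16, 24, 32, 40, 52, 64, 76]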
Looking forward to watching this, thank you for your work and research!
Thanking you most kindly K
22:54 Python does have a prettier way of expressing that now, with pattern matching:
match lst:
    case [head]: return head, []
    case [head, *tail]: return head, construct(tail)
While this is nice, the typing specification is too limited for type-checkers to be able to tell whether this pattern matching is exhaustive (and in fact it isn't, because it doesn't handle the empty-list case).
@ Yeah, I would have done the empty case and skipped the single case (because that’s included in the other one, with an empty tail) but I wanted to follow the video version more closely.
Got sucked into trying a more pythonic way: I was able to split the construction function, change 'retrieve' to use match, and raise an 'IndexError'.
The retrieve could technically be made in the same number of lines as the original, but it's considered more "pythonic" to have the exhaustive "case _:" rather than completing the match and raising the error afterwards.
from typing import Any, Iterable

def _LinkedList(item: Any, *rest) -> tuple:
    return (item, _LinkedList(*rest) if rest else None)

def LinkedList(itr: Iterable):
    return _LinkedList(*itr)

def retrieve(linked_list: tuple, idx: int):
    match *linked_list, idx:
        case head, tail, 0:
            return head
        case head, tail, idx if tail is not None:
            return retrieve(tail, idx - 1)
        case _:
            raise IndexError(f"Failed to traverse linked list: reached {linked_list} at index {idx}")
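A quick sanity check of the above (hypothetical usage, assuming the definitions as written):

ll = LinkedList([1, 2, 3])     # (1, (2, (3, None)))
print(retrieve(ll, 2))         # 3
# retrieve(ll, 5)              # would raise IndexError("Failed to traverse ...")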
21:01 My first computer encounter was with a Soviet clone of PDP. I guess address zero contained „don’t steal - invent!“
For a way to do Array Construction in constant time... assuming we are told to put specific values in it, like list = [1,2,3], I think you could do the construction lazily: only "really" construct it when we actually have to read/write.
Perhaps, for example, you reserve the memory for the array and use a variable to track how much of the array has actually been constructed. If we read/write to a location that is past what we have actually constructed, we construct up to that point, then do the read/write and update the tracking variable. For the case of a hardcoded array literal, to know what we need to construct during the read/write, at compile time we can make a copy of what the array should look like when constructed and refer to that as needed during runtime.
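Something like this sketch, though in real Python the "reservation" itself is already linear, so it only shows the bookkeeping, not genuinely constant-time construction (all names hypothetical):

class LazyArray:
    def __init__(self, literal):
        self._literal = literal                  # compile-time copy of the intended contents
        self._backing = [None] * len(literal)    # "reserved" but not yet constructed cells
        self._constructed = 0                    # how many cells are really built

    def _construct_up_to(self, index):
        while self._constructed <= index:        # build only as far as we need
            self._backing[self._constructed] = self._literal[self._constructed]
            self._constructed += 1

    def __getitem__(self, index):
        self._construct_up_to(index)
        return self._backing[index]

    def __setitem__(self, index, value):
        self._construct_up_to(index)
        self._backing[index] = value

arr = LazyArray([1, 2, 3])
print(arr[1])   # 2, and only cells 0 and 1 got constructed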
I think constant time array construction would just be specifying the bounds of an array and the size of each element. This is assuming that the array is not populated with any useful data, making the method impractical.
What did you use for the visualisation of your pretend computer? I wish I'd had something like this when lecturing low level programming.
Modern Lisp code uses arrays (static or extendable) rather than linked lists for serious things. Obviously linked lists are still important and used for stuff like macros and the structure of the code itself, but it isn't the main data structure for storing sequences of data anymore.
All array construction is technically constant time on an architecture with fixed width pointers. There’s a limit to the size of an array.
Another phenomenal video!
However, the explanation of how the computer/CPU may do it was excellent, great job!
Happy Christmas 🎄🎁🎄🎁 and happy New Year 🕛🎊🎊🎊
The lecture on theta got me by surprise ;)
Hi Kay! Happy New Year!!
And in case this doesn't get filtered like so many posts I make on RUclips, the way to have O(1) array construction is to do it at compile time so that the user of your program doesn't have to wait for said construction as it's already done in the binary. C strings are handled like that by nearly every compiler, for instance, and char *p = "string"; will usually point to constant space.
I've seen a claim that the golden ratio is optimal for sufficiently large resize numbers (i.e. large, non preallocated arrays) as a realloc ratio, which is interesting
Fascinating thank you (for both comments). I looked up that golden ratio point and found this which after puzzling I think makes sense - gist.github.com/jemmanuel/b8277e7922e9b9947e2f171cc85f1d01 . You may already know this but for others who don't - this optimises for being able to fit new bigger arrays into the space held by older, smaller arrays added together.
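A back-of-the-envelope Python sketch of that argument (it ignores all real allocator behaviour): with factor 2 the next block never fits into the sum of everything freed so far, while with a factor below the golden ratio it eventually does.

def first_reusable_resize(factor, start=1, steps=40):
    size, freed = start, 0
    for step in range(steps):
        new_size = int(size * factor) + 1
        if new_size <= freed:     # the new block would fit in space we've already given back
            return step
        freed += size             # the old block is freed after the copy
        size = new_size
    return None

print(first_reusable_resize(2.0))   # None: never reusable
print(first_reusable_resize(1.5))   # reusable after a handful of resizes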
@@neoeno4242 Just as a point of pedantry, powers of two will fit most memory allocation strategies better than oddball sizes, and some dynamic array implementations also have deallocation strategies to scale down when huge gobs of memory aren't needed. A counterpoint is that most memory allocators will consolidate free blocks of memory as they continue processing requests, and therefore the previous allocations won't actually go to waste. In a virtual memory setup, which is pretty much all we have on desktops these days, the operating system may even reorganize what you have stored in RAM, allowing non-contiguous blocks to be consolidated and remapped.
Time for me too learn again
Iirc on x86 the 0 address in physical memory is the start of the interrupt table, and the 0 address in virtual memory is never mapped to anything, to catch dereferencing nullptr, but I could be completely wrong.
Btw, awesome vid
Awesome video, thank you! You mentioned it slightly but I was wondering how it works when, in JavaScript, an "array" is actually a dictionary/hashmap. I hope you can do a video about HashMaps in the future (maybe the most common implementation is to just loop through the map and look for the key, but this seems very inefficient)
you hash the key and then, according to that hash, you look at one specific bucket of elements which will contain the element you're looking for.
That's the gist of it. Of course that "gist" isn't nearly enough to end up with a reasonable data structure, and in practice you'll need at least some amount of trickery, but the core idea should be understandable, I believe.
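A minimal sketch of that bucket idea in Python (fixed bucket count, no resizing or deletion, so nowhere near a real implementation):

class TinyHashMap:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):     # scan only one bucket, not the whole map
            if k == key:
                return v
        raise KeyError(key)

m = TinyHashMap()
m.set("name", "Kay")
print(m.get("name"))   # Kay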
@@Musikvidedo I think Modern Dictionaries by Raymond Hettinger (p33CVV29OG8) is an excellent explanation of the practical tricks.
JS dicts also save the insertion order. So they are not really "just hashmaps" (it's more like an array of dictionary entries and a hashmap of key→entry-index associations).
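Roughly this layout, sketched in Python (using a plain dict for the index part is a bit circular; it's just to show the shape):

class OrderedMap:
    def __init__(self):
        self.entries = []   # (key, value) pairs in insertion order
        self.index = {}     # key -> position in entries

    def set(self, key, value):
        if key in self.index:
            self.entries[self.index[key]] = (key, value)
        else:
            self.index[key] = len(self.entries)
            self.entries.append((key, value))

    def get(self, key):
        return self.entries[self.index[key]][1]

    def keys(self):
        return [k for k, _ in self.entries]   # iteration follows insertion order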
I guess the address 0x000000 in RAM is part of the group of bytes we call the "memory map", or "خريطة الذاكرة" in Arabic.
It holds important information about the memory, such as the size of the full memory, the current address accessed, and other more important stuff.
Either that or:
It holds the holy instance of the TERMINATOR Singleton we call "NULL"
Or maybe maybe just maybe:
A link to the next RAM start OwO
Exactly, 0x0000 is a port (in the 6510 CPU) in the Commodore C-64 which controls the memory map :)
@@AK-vx4dy that's interesting cause when I used to go to a tech university, I learned there that memory maps are usually stored in the last few bytes, but I guess that differs from implementation to implementation.
Either way, thanks for confirming; I really needed that.
If I'm right, the first byte in the x86 address space is the first byte of the divide-by-zero IVT entry. That is what OSDev says, at least.
@ It depends on the CPU and platform. I was slightly trolling, recalling the Commodore C-64.
I may not have been specific: in the case of the C-64, address 0 drives physical digital pins (like GPIO on an Arduino), which in turn change how the address decoder works, physically mapping ROM/RAM/IO into the CPU's address space.
Some CPUs start executing at 0, so at least at boot there must be some code (ROM) there.
But maybe this whole discussion is obsolete, because most of our code now runs in user space with paged and virtual memory, so 0 can point to anywhere or to nowhere (the page may not even be allocated).
@@CripsyFries IVT = Interrupt Vector Table, the list of addresses the CPU jumps to when a hardware or software interrupt occurs.
In so-called Real Mode, at least.
@26:22 This big Θ (theta) notation is theoretically correct but these days very misleading: it misses the double memory access and the non-contiguity of elements in memory
(you showed it as if elements are adjacent like in an array, but that is often not true in real life), and nowadays access to non-adjacent addresses in memory is very costly.
is it theoretically correct? my mind jumps to Ω(1) and Θ(n) in these cases
@kekaci You have very efficient brain 🙂
hell yeah
Point of pedantry: you can have zero cost 1-indexed arrays by having the pointer to the array point to an imaginary element just before the array begins, rather than at where it begins.
I just looked through the source of Lua, everyone’s favourite 1-indexed language… it does have:
return &t->array[key - 1];
…but this subtraction is in the midst of a sea of bounds checking and other stuff to the point where one operation would not make much difference.
@VaughanMcAlley Lua is garbage collected, so it probably uses pointers to work out which objects are live within the heap. Point-before arrays would greatly complicate this!
I recall that this approach was taken when the Numerical Recipes book was "ported" from the original FORTRAN code into C (at least in the first edition). The authors took this approach to avoid re-writing all of the formulas in the book, because FORTRAN is 1-indexed.
How to get constant time complexity for array creation: read and write all of RAM (a constant)
Very nice video!
But I'm not so sure that SOTA real-world dynamic array implementations use a grow factor > 2.
E.g. folly uses one < 2 to work better with jemalloc behaviour.
Interesting
Everybody knows that at address 0 there is an I/O port in the C-64, and that's a fact!
GO KAY
Forth please!
5:46 this simple model of memory allocation happens to be how the HERE pointer in Forth works.
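Roughly this, sketched in Python (hypothetical names; in Forth, ALLOT just moves HERE forward):

memory = [0] * 64
here = 0            # next free cell, like Forth's HERE

def allot(n):
    global here
    start = here
    here += n       # bump the pointer; nothing is ever freed or reused
    return start

a = allot(3)        # "array" of 3 cells starting at a
b = allot(5)        # next allocation starts right after it
print(a, b)         # 0 3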
Engaging!
12:20 pls don't >-<
I have enough trauma from theoretical compsci in uni
DOU
huge plot twist
ruclips.net/video/OKc2hAmMOY4/видео.htmlsi=vjLB7TqvO4Rscdfy
Why is it that people are very keen to mention Haskell, F# and OCaml, but next to never talk about SML 😅 or ML (Meta Language)? It is the ancestor of all of them 😅 Maybe not that widely used, but it still has a place in history when it comes to type theory 🤔
[] is an array and only an array, in contiguous memory, with no idea of how long it is, how many elements it has, or that it's even anything other than a pointer... and you can't convince me otherwise :D
Except in all the languages where it isn't that.
It's not even an array, it's just syntactic sugar over pointer arithmetic ;)
@@zionklinger2264 wait there are languages other than C? heresy
In C, arrays are not pointers. C knows how many elements an array has (for example, sizeof(int[10]) is 40, not 8, on an LP64 system). It is simply that arrays decay to pointers very easily (whenever passed to a function, or when accessed via the dereference operator), and that function declarations do a decay on their argument types for Reasons.
Examples:
void array_arg(int[]);
void func_arg(int());
int main() {
    void (*p)(int*) = array_arg; /* array_arg's first argument has type int*, because that's what happens when you decay int[] */
    void (*q)(int (*)()) = func_arg; /* func_arg's first argument has type int (*)(), because that's what happens when you decay int() */
}
@@zionklinger2264 except in all the languages that are wrong /hj
You are a great teacher, but I wanted to ask: are you a girl or a boy?
Dude! Are you 12?
@@mrpocock They might just be curious, not malicious