Comments •

  • @abdia-lb2yx · 4 months ago +102

    I ain't gonna lie, having influencer-style videos in the coding world (just entertainment, not educational) helps me stay on track when I don't feel motivated to study. Keep it up, bro.

    • @Lewini · 4 months ago +4

      Hard agree. Keeps my algorithms in check

    • @cyrilemeka6987 · 4 months ago +2

      Agreed.

  • @DjoumyDjoums · 4 months ago +116

    My guess on why 3 is slower than 2: index bounds checks. Compilers can very often eliminate them when you loop on the actual length of the array, not a pre-cached numerical value.

    • @grifon97 · 4 months ago +9

      sounds like a very good guess to me! :D

    • @Nenad_bZmaj · 4 months ago +19

      @@chigozie123 Not true.

    • @Pipe0481 · 4 months ago +21

      @@chigozie123 That's just completely false lol

    • @chigozie123 · 4 months ago

      @@Pipe0481 I should clarify that I'm comparing it to what some C/C++ compilers do. If you look up the optimizations done by the Go compiler, you will find that the only one even worth mentioning is the conversion of "zeroing out a slice" into a simple memclr call. In C, we get that for free using `memset`.
      Maybe also function inlining, but that's nothing given that in C the `inline` keyword is free to use. That's just the tip of the iceberg. Compare that list to what GCC does, and you will see why it's difficult to claim that the Go compiler is an optimizing compiler.
      The main reason I said that Go doesn't do optimizations is that in C there is no reason why the second and third code snippets should not run at the same speed. The compiler should have optimized the len call out of the second, making it pretty much the same as the third.

    • @Nenad_bZmaj · 4 months ago +4

      @@chigozie123 The for loop with the range clause is clearly better optimized by the compiler than the standard for loop with a condition clause, as he has shown. I agree about the len call: it should have been optimized out, but he measured with the time package rather than benchmarking, and did not show us standard deviations. In Go you can opt out of inlining. I simply disagree with the statement that the Go compiler doesn't do exhaustive optimizations, as long as they are worth doing and would not bloat compile times.
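    The guess above can be made concrete. A minimal sketch of the three loop shapes under discussion (the video's actual code is not shown here, so the function names are made up); the middle form, whose condition reads `len(a)` directly, is the one the compiler can most easily prove in-bounds:

    ```go
    package main

    import "fmt"

    // form 1: range loop; the compiler knows the index is always in bounds
    func sumRange(a []int) int {
    	t := 0
    	for _, v := range a {
    		t += v
    	}
    	return t
    }

    // form 2: the condition re-reads len(a), so bounds checks are easy to eliminate
    func sumLen(a []int) int {
    	t := 0
    	for i := 0; i < len(a); i++ {
    		t += a[i]
    	}
    	return t
    }

    // form 3: pre-cached length; the compiler must prove n == len(a) to drop checks
    func sumCached(a []int) int {
    	t := 0
    	n := len(a)
    	for i := 0; i < n; i++ {
    		t += a[i]
    	}
    	return t
    }

    func main() {
    	a := []int{1, 2, 3, 4}
    	fmt.Println(sumRange(a), sumLen(a), sumCached(a)) // 10 10 10
    }
    ```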

  • @ColinFox · 4 months ago +16

    Go uses symbol lifetimes to determine whether to allocate on the stack or on the heap. If the object survives after the function returns, it's on the heap. Pretty straightforward.
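    This can be observed directly: building with `go build -gcflags=-m` prints the compiler's escape-analysis decisions. A small sketch (both functions are hypothetical) of a value that can stay on the stack versus one that escapes:

    ```go
    package main

    import "fmt"

    // sumLocal: x never outlives the function, so it can stay on the stack.
    func sumLocal() int {
    	x := [4]int{1, 2, 3, 4}
    	s := 0
    	for _, v := range x {
    		s += v
    	}
    	return s
    }

    // escape: the returned pointer outlives the call, so x must move to the heap
    // (escape analysis reports this when built with -gcflags=-m).
    func escape() *[4]int {
    	x := [4]int{1, 2, 3, 4}
    	return &x
    }

    func main() {
    	fmt.Println(sumLocal(), escape()[3]) // 10 4
    }
    ```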

  • @susiebaka3388 · 4 months ago +16

    "The more you write standard code, the more likely you're generally doing well." - ThePrimeagen

  • @samsqu4red · 4 months ago +11

    Found the channel just a few weeks ago.
    Coming from a background in Webflow, with only a grasp of HTML, CSS, and a bit of JavaScript, I see myself as a puppy. Recently, I decided to take coding more seriously because it's interesting, and I also have a web app idea that I'm genuinely determined to bring to life.
    Though 90% of what the man is saying still flies right over my head, I feel like my understanding is expanding. I've never attended school, worked in-house with a team, or studied with guidance. It seems like I found a semblance of a mentor who's bombarding me with jargon I can't possibly comprehend, but who is ultimately helping me see through the fog in a way.
    Feels like little by little I'm starting to grasp how everything is connected... weird.

  • @HairyPixels · 4 months ago +37

    The compiler writes better code than you, but what if you're writing a compiler? BRAIN MELTING

    • @ninocraft1 · 4 months ago +11

      if you write your own compiler, it's probably gonna be even slower xD

    • @Patterner · 4 months ago +10

      just compile the compiler with the compiler. repeat. ... profit

    • @OkAtheistExplainVim · 4 months ago +8

      if the compiler compiles your compiler, is it really your compiler?

    • @HairyPixels · 4 months ago +3

      @@Patterner In Soviet Russia, compiler compiles you.

    • @joranmulderij · 4 months ago +1

      @@OkAtheistExplainVim If you wrote the compiler that wrote the compiler, then yes

  • @andygarfield6529 · 4 months ago +22

    There are all sorts of things he could have done here. Changing the order of the loops can give you wild improvements because it changes the proportion of cache hits. Also loop unrolling. Also using arrays in Go (which are a real thing) instead of slices.

    • @CjqNslXUcM · 4 months ago

      the loop order is correct, it goes through the innermost slice first.

    • @SaHaRaSquad · 4 months ago +7

      If you do something like manual loop unrolling, always benchmark it, because most likely it will actually make your code slower by blocking much better automatic optimizations.

    • @Nenad_bZmaj · 4 months ago +1

      In my experience, an array gives me worse performance than a slice. After I checked that, instead of declaring a := [5]something, I always do: a := make([]something, 5, 5)

    • @Zullfix · 4 months ago +1

      Agreed. His single-array test could have been much faster if nearby coordinates were stored contiguously.

    • @Yawhatnever · 4 months ago +2

      Loop unrolling might be one of those things where you end up fighting the compiler (you'd need to measure), but loop tiling might have a real chance of increasing cache hits

  • @eni4ever · 4 months ago +136

    blue hair no?

    • @dejangegic · 4 months ago +19

      Old video

    • @sparrow243 · 4 months ago +27

      He's getting bald

    • @greendsnow · 4 months ago

      @@sparrow243 hahahahahah

    • @yobson · 4 months ago

      no

    • @XDarkGreyX · 4 months ago

      Brain no?

  • @jagc2206 · 4 months ago +6

    There is a big distinction between arrays and slices in Go. This video uses slices, which are pointers to arrays. A static array has a compile-time length and allows the compiler to do a lot more optimization

    • @Nenad_bZmaj · 4 months ago

      My experience with performance is the opposite, so I stick with slices.

  • @philipp04 · 4 months ago +1

    A year ago my prof had us write Gaussian elimination, but the algorithm we had to implement was slightly different for each of us. I had to find the inverse of a matrix using Cholesky decomposition, which is designed to be fast for large symmetric matrices. I'll cut to the chase: the code I wrote was slow as hell and I couldn't figure out what was going on, though I could tell I had tons and tons of cache misses. The professor, though, could tell what I did wrong immediately: the matrix was stored as a 1d array row by row, but I was traversing it column-wise. I transposed the matrix and swapped all the indices in my code, and the total runtime decreased by a factor of 20 (on 8000x8000 matrices). Moral of the story: go through your data sequentially; it makes a huge difference.
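    The moral of that story can be sketched in Go (the original assignment was presumably not in Go, so this is just an illustration of the two access patterns over a matrix stored row by row in a 1-D slice):

    ```go
    package main

    import "fmt"

    // m is an n×n matrix stored row by row: element (r, c) lives at m[r*n+c].

    // sumRowMajor walks memory sequentially (stride 1): cache-friendly.
    func sumRowMajor(m []int, n int) int {
    	s := 0
    	for r := 0; r < n; r++ {
    		for c := 0; c < n; c++ {
    			s += m[r*n+c]
    		}
    	}
    	return s
    }

    // sumColMajor jumps n elements on every step: same result, poor cache use.
    func sumColMajor(m []int, n int) int {
    	s := 0
    	for c := 0; c < n; c++ {
    		for r := 0; r < n; r++ {
    			s += m[r*n+c]
    		}
    	}
    	return s
    }

    func main() {
    	n := 512
    	m := make([]int, n*n)
    	for i := range m {
    		m[i] = i % 7
    	}
    	// Identical answers; on large matrices the row-major walk is much faster.
    	fmt.Println(sumRowMajor(m, n) == sumColMajor(m, n)) // true
    }
    ```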

  • @rt1517 · 4 months ago +3

    The video author should have disassembled the resulting binary to see if he could figure out what the hell is going on.

  • @Satook · 4 months ago

    The 2nd form in part 1 is pretty easy for a compiler to optimise with constant propagation. The x and y don't change, so the sub-expressions can be hoisted out of the innermost loop.

  • @debemdeboas · 4 months ago +1

    reading contiguous memory is generally faster because the CPU may not have to deal with paging and segmentation for every index

  • @swedishpsychopath8795 · 4 months ago +4

    I'm planning to throw a SIMULA-67 party in 3 years to celebrate its 60 years as the first OOP language. Hate it or love it, the Norwegian SIMULA-67 invented Object Oriented Programming (OOP) with classes, inheritance, encapsulation, overloading, overriding and all the other good stuff. It is crazy to think that this paradigm shift in programming is already 60 (SIXTY) years old!! Press LIKE if you would like an early invitation!

  • @MattBolt35 · 4 months ago

    There are ways to pass flags to Go's compiler to show which values escape. Slices are typically heap allocated. You can stack allocate an array if you use a fixed length (as he mentions).

    • @Nenad_bZmaj · 4 months ago +2

      But only small arrays.

  • @0dsteel · 4 months ago +2

    Slices are just references to an array with a set length and capacity. They are reallocated if the values wouldn't fit said capacity.
    On the other hand, arrays *may* be stack allocated, if the compiler is certain they don't escape the given scope and, well, fit in the stack frame.
    (this is not legal advice, please ask your friendly neighbourhood debugger)

  • @samhughes1747 · 4 months ago +6

    EDIT: this is wrong. Only the comprehension-inlining bit made the boat in 3.12.
    Hey, Python doesn't have a loop problem anymore. There's still the "it's Python" problem, but Python 3.12 introduced loop unrolling.

    • @ruroruro · 4 months ago

      source?

    • @samhughes1747 · 4 months ago

      @@ruroruro, oof. Yeah, that didn't get approved for 3.12; I mixed it up with the comprehension-function inlining that did get released in 3.12.

  • @Gusto20000 · 4 months ago

    I did the same thing a week ago after 1.22 released, benchmarking with testing.B, and measured exactly the same result for each case… So I decided a simple range is good enough. Bill Kennedy talks about "mechanical sympathy" in "Mastering Go" regarding this exact problem: it's all about low-level caching and prefetching by the CPU and compiler
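    For reference, a minimal sketch of measuring with the testing package instead of time.Now. testing.Benchmark lets you run a benchmark from an ordinary program; the function being measured here is a made-up stand-in:

    ```go
    package main

    import (
    	"fmt"
    	"testing"
    )

    var sink int // assigning to a package-level var keeps the loop from being optimized away

    // sum is a hypothetical workload to measure.
    func sum(a []int) int {
    	t := 0
    	for _, v := range a {
    		t += v
    	}
    	return t
    }

    func main() {
    	a := make([]int, 1<<16)
    	// testing.Benchmark picks b.N automatically and reports ns/op,
    	// which is far more stable than a single time.Now delta.
    	res := testing.Benchmark(func(b *testing.B) {
    		for i := 0; i < b.N; i++ {
    			sink = sum(a)
    		}
    	})
    	fmt.Println(res.String())
    }
    ```

    In a real project this would live in a `_test.go` file as `func BenchmarkSum(b *testing.B)` and be run with `go test -bench=.`.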

  • @Nenad_bZmaj · 4 months ago

    Small arrays can be on the stack, but slices are always allocated on the heap (in Go).

  • @AlexnderMisuno · 4 months ago

    Once I implemented a Minesweeper game in Go using the "normal" approach to store data about the field, cell state, etc. Afterwards I did the same using bit-wise operations (just for fun). To my surprise, not only was the performance the same, but also the memory consumption

  • @miroslavhoudek7085 · 4 months ago +1

    A guy watches a guy who doesn't know what's going on, while also not knowing what's going on, while people in chat suggest what might be going on, but they can't all be right. Good fun overall, would recommend, but don't know why.

  • @g.c955 · 4 months ago +1

    he could do a follow-up by profiling the code to see where CPU and memory are spent, then analyzing the assembly output of the code to understand why.

  • @haraldfielker4635 · 4 months ago +1

    I think the range works better with CPU caches (in that scenario).

  • @justbrad_v3906 · 4 months ago

    yeah, that sounds like buffer overflow hell lol

  • @Nenad_bZmaj · 4 months ago +1

    The difference of about 250 ms for about one billion elements is to be expected if additional GC sweeps are triggered. He should have isolated the three scenarios and benchmarked them separately, with standard deviations, not run them consecutively in one test. It seems that an additional GC sweep kicked in sometime after the second test. Also, the "time" package, in my experience, may interfere somehow with either the GC or the execution. There are better tools for benchmarking, although I do use "time" to get a rough sense.

  • @shapelessed · 4 months ago +2

    Omg Go...
    I only had a week with the language when I tried it, but I love the errors-as-values concept. To the point that I started using it all over my JS.

    • @0dsteel · 4 months ago

      haters gonna hate, but mention their uncaught runtime exceptions and they'll be in shambles :)

    • @max3446 · 4 months ago

      I like errors as values, but Go would be so much nicer if it had monadic operations on errors (or at the very least, some sort of shorthand for `if err != nil { return err }`).
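    For readers unfamiliar with the pattern being discussed, a minimal sketch of Go's errors-as-values style, including the `if err != nil` ceremony the comment refers to (parsePositive is a hypothetical example, not from the video):

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    	"strconv"
    )

    // parsePositive returns an error as an ordinary value instead of throwing.
    func parsePositive(s string) (int, error) {
    	n, err := strconv.Atoi(s)
    	if err != nil {
    		// wrap and propagate: the caller decides what to do with it
    		return 0, fmt.Errorf("parse %q: %w", s, err)
    	}
    	if n <= 0 {
    		return 0, errors.New("value must be positive")
    	}
    	return n, nil
    }

    func main() {
    	if n, err := parsePositive("42"); err == nil {
    		fmt.Println("ok:", n) // ok: 42
    	}
    	if _, err := parsePositive("nope"); err != nil {
    		fmt.Println("error:", err)
    	}
    }
    ```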

  • @lukekurlandski7653 · 4 months ago

    These are dynamic arrays, right? I feel like you should be able to call a .contiguous() function or something that moves all of the elements of the tensor into a contiguous arrangement. Then again, I'm guessing there is some other data structure in Go that is closer to a traditional array or something…?

    • @Nenad_bZmaj · 4 months ago

      Yes, array: a := [k]something, as opposed to slice: a := make([]something, k). But in my experience, slices perform better.

  • @OnFireByte · 4 months ago +1

    really interesting that the flattened slice is a bit slower than the actual 3d slice

    • @CjqNslXUcM · 4 months ago +2

      a slice is a pointer with a type and a len, so every time you move from one innermost slice to the next you get a cache miss. the flat slice is one fat blob of memory: no cache misses.

    • @OnFireByte · 4 months ago

      @@CjqNslXUcM that's why I'm surprised the flattened one is slower
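    One possible explanation for the surprise: a flat slice only pays off if it is indexed with explicit arithmetic rather than rebuilt per axis. A small sketch showing that the nested and flattened layouts agree, assuming the conventional index formula (x*Y+y)*Z+z:

    ```go
    package main

    import "fmt"

    func main() {
    	const X, Y, Z = 4, 5, 6

    	// nested: each inner slice is a separate allocation (pointer chasing)
    	nested := make([][][]int, X)
    	for x := range nested {
    		nested[x] = make([][]int, Y)
    		for y := range nested[x] {
    			nested[x][y] = make([]int, Z)
    			for z := range nested[x][y] {
    				nested[x][y][z] = x + y + z
    			}
    		}
    	}

    	// flat: one contiguous block; element (x,y,z) at (x*Y+y)*Z+z
    	flat := make([]int, X*Y*Z)
    	for x := 0; x < X; x++ {
    		for y := 0; y < Y; y++ {
    			for z := 0; z < Z; z++ {
    				flat[(x*Y+y)*Z+z] = x + y + z
    			}
    		}
    	}

    	// Both layouts hold identical data; only the memory shape differs.
    	ok := true
    	for x := 0; x < X; x++ {
    		for y := 0; y < Y; y++ {
    			for z := 0; z < Z; z++ {
    				if nested[x][y][z] != flat[(x*Y+y)*Z+z] {
    					ok = false
    				}
    			}
    		}
    	}
    	fmt.Println(ok) // true
    }
    ```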

  • @miracleinnocent2649 · 4 months ago +1

    Go's allocations all depend on its escape analysis; sometimes even interfaces can be stack allocated, and I think the upper limit for a stack allocation is 32 KB

  • @sotiriosmantziaris6097 · 4 months ago

    The compiler decides where to put things based on escape analysis.

  • @reactjs1900 · 4 months ago

    I think slices in Go are like vectors and stored on the heap

  • @yugioh8810 · 4 months ago +1

    using only the index with range is usually faster, especially if the array is long or has heavy structs, since it eliminates the copy into the loop variable.
    for i := range x
    is faster than
    for _, val := range x
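    A sketch of the copying cost being described, with a deliberately bulky element type (the struct is hypothetical):

    ```go
    package main

    import "fmt"

    type heavy struct {
    	payload [1024]byte // makes each element ~1 KB
    	n       int
    }

    // sumByIndex reads fields through the index: no per-iteration struct copy.
    func sumByIndex(xs []heavy) int {
    	t := 0
    	for i := range xs {
    		t += xs[i].n
    	}
    	return t
    }

    // sumByValue copies each ~1 KB element into v on every iteration.
    func sumByValue(xs []heavy) int {
    	t := 0
    	for _, v := range xs {
    		t += v.n
    	}
    	return t
    }

    func main() {
    	xs := make([]heavy, 3)
    	for i := range xs {
    		xs[i].n = i + 1
    	}
    	// Same answer either way; the difference is the copying cost per iteration.
    	fmt.Println(sumByIndex(xs), sumByValue(xs)) // 6 6
    }
    ```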

  • @georgehelyar · 4 months ago +2

    I'm not a Go expert, but it sounds like the reason 3 is slower than 2, and reverse is faster than forwards, is bounds checking.

    • @defeqel6537 · 4 months ago

      which I think is something Go should probably be able to optimize away in cases like this

    • @georgehelyar · 4 months ago

      @@defeqel6537 Is it a jagged array though? If len(a[0]) can be different from len(a[1]) then it could potentially go out of bounds, but again I'm not a Go expert.
      C#, for example, has multi-dimensional arrays of contiguous memory, indexed with a[x,y,z], where the lengths of each dimension are guaranteed; or you could make a jagged array by making an array of arrays where they aren't, and then you'd have to use a[x][y][z].
      Reverse being faster in several languages always seemed like something that could be optimised though. Reverse is faster because it always knows that the lower bound is 0 regardless of the array, so if it starts inside the bounds and then never goes below 0 it doesn't have to check bounds on each iteration.
      And of course dynamic arrays can change size at runtime.
      Rust probably does bounds checking at compile time

    • @Nenad_bZmaj · 4 months ago

      @@defeqel6537 I believe that too. The very typical difference of about 200 ms for arrays of this size tells me that probably an additional GC sweep was triggered.

  • @joyboricua3721 · 4 months ago

    @6:22 somewhere a hacker says "Oh yes we could"

  • @awesomedavid2012 · 4 months ago +1

    My assumption is that if you create a slice with a fixed compile-time size and it doesn't outlive the function, Go will simply implement it as an array on the stack. The compiler knows its size and it knows how long it lives.
    Notice that even if you call append, it constructs a new slice. So the original could still be an array on the stack in the implementation, as long as you don't pass it outside of the function.

    • @Nenad_bZmaj · 4 months ago

      Most arrays and ALL slices are allocated on the heap in Go.

    • @Nenad_bZmaj · 4 months ago +1

      Also, when you call append, the Go runtime doesn't necessarily create a copied array of twice the capacity. If there is enough capacity in the original array, appending to a slice that points to it doesn't create a new array. (As an aside, note the reassignment: a = append(a, el).)

    • @what1heh3ck · 4 months ago

      @@Nenad_bZmaj I thought append always creates a new slice header (still pointing to the original array if it has enough capacity); am I missing something here?

    • @awesomedavid2012 · 4 months ago +1

      @@Nenad_bZmaj I agree about the append thing. Though I still think the compiler could theoretically optimize a slice to be on the stack if you never call append, give it a constant size, and don't return it.
      When you say it's always on the heap in Go, do you mean that is how the main Go compiler implements it, or is that part of the Go standard?

    • @Nenad_bZmaj · 4 months ago +1

      @@awesomedavid2012 I remember that this was the standard behaviour, although not in the language specification (it doesn't belong there), and it may change in the future without changing the specs. Somebody just told me that he looked at this more closely and that the current compiler does keep smaller slices with the properties you describe on the stack.
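    The append behaviour discussed in this thread can be checked directly by comparing backing-array addresses; a minimal sketch:

    ```go
    package main

    import "fmt"

    func main() {
    	a := make([]int, 2, 4) // len 2, cap 4

    	// Fits in the existing capacity: append reuses the same backing array.
    	b := append(a, 9)
    	sameArray := &b[0] == &a[0]

    	// Exceeds cap 4: the runtime allocates a bigger backing array and copies.
    	c := append(a, 1, 2, 3)
    	grewArray := &c[0] != &a[0]

    	fmt.Println(sameArray, grewArray) // true true
    }
    ```

    Either way, append returns a fresh slice header, which is why the `a = append(a, el)` reassignment idiom is required.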

  • @freedomextremist7921 · 4 months ago +3

    It's a slice, not a list. And Go does also have arrays, which are contiguous regions of memory.

    • @Leeway4434 · 4 months ago +2

      yes, in Go arrays are stack allocated and created with a fixed length. [5]int is an array of length 5 on the stack, and []int{1,2,3,4,5} is a slice of length 5 on the heap, with a pointer to it on the stack. I believe there is a size limit for stack-allocated arrays; at some point the compiler puts them on the heap to avoid stack overflows

    • @Nenad_bZmaj · 4 months ago

      @@Leeway4434 All true.

    • @Leeway4434 · 4 months ago +1

      @@Nenad_bZmaj I actually looked into it some more, and it is not guaranteed that slices are put on the heap. The compiler has heuristics for whether something should end up on the heap or the stack. Large values usually end up on the heap. Small values (including slices of sufficiently small constant length) can often be allocated on the stack.

    • @Nenad_bZmaj · 4 months ago

      @@Leeway4434 Thanks. I didn't know that.

  • @michaelrobb9542 · 4 months ago +1

    You can't put syscalls in for loops. Calls like time.Now are calls to the OS and will produce inconsistent results on each run.

    • @Nenad_bZmaj · 4 months ago

      True. There is a bigger for loop in this particular test, encompassing the three tests.

  • @martinpata2899 · 4 months ago

    I like the idea, but he should be using Go benchmarks for these tests

  • @rumplstiltztinkerstein · 4 months ago

    The notion of "code optimization" is mostly popular because of JavaScript. We invented compilers so we don't have to worry about watching everything the CPU does in every clock cycle.

  • @bbrodriges · 4 months ago +1

    Slices can be stack allocated. In fact they are by default

    • @metaltyphoon · 4 months ago +3

      No they're not. The slice header, which is a pointer + length, will be stack allocated, but the data won't be on the stack.

    • @antoniusupov · 4 months ago +1

      All pointer values in Go will go to the heap. Go check out the runtime package and see what is inside. Only objects of fixed length live on the stack; in Go even strings contain pointers and go to the heap.

    • @Nenad_bZmaj · 4 months ago

      The so-called slice header will be on the stack (basically 3 numbers). It always points to the memory allocated on the heap.

  • @RandomGeometryDashStuff · 4 months ago

    08:55 "a[a.length]=1" is slower than "a.push(1)" in v8

  • @Nenad_bZmaj · 4 months ago

    Regarding some comments: I did try arrays instead of slices with length known at compile time, but performance was always slightly worse. In my experience, Go programmers can forget about arrays, since slices seem to be very well optimized. And I like using slices when I have to do something different in the last iteration or in distinct subranges. So, instead of
    for i := 0; i < k; i++ {...}; for i := k; i < len(a); i++ {...},
    I can write:
    b := a[:k]; c := a[k:]; for _, el := range b {...}; for _, el := range c {...}.

    • @flga_ · 4 months ago

      I'm not sure what you did, but when switching to arrays you need to be careful to avoid copies.
      Other than that, there is no inherent reason for arrays to perform worse.

    • @Nenad_bZmaj · 4 months ago

      @@flga_ I passed them by reference. Very small arrays, for instance small hash tables with short byte slices as elements, or arrays of ints. But passing pointers to them apparently always triggers a GC sweep. When I turn them into slices, the execution is faster overall by roughly the duration of one GC sweep, which on my computer, for several programs I wrote, apparently (inferred) takes about 200 ms on average.

    • @flga_ · 4 months ago

      @@Nenad_bZmaj I don't think you were measuring things correctly then; I highly doubt the GC is a factor.
      Passing a pointer to an array will require a nil check; that's the only difference, and it can be done by you up front instead of by the compiler per iteration.
      Put in other terms, using arrays is asymptotically equivalent to using a 1d slice, though for small sizes the distinction is unlikely to be meaningful, since the chance of it all being cached in either scenario is high.

  • @defeqel6537 · 4 months ago +1

    For optimization videos like this, I'd always want to know the size of the data, the theoretical memory bandwidth, etc., to have some actual context. No point trying to optimize things if you are already limited by the hardware (or the optimization would need to be radically different, at least).

  • @rosehogenson1398 · 4 months ago

    Go has escape analysis, which can decide to place slices on the stack if it can guarantee they don't escape the current scope

    • @Nenad_bZmaj · 4 months ago

      True for some types of variables; not sure if it's true for slices. I think they are always on the heap.

  • @Jarikraider · 4 months ago +1

    Personally, I thought a 100 ms improvement on 1 billion elements was pretty good.

  • @RandomGeometryDashStuff · 4 months ago

    09:19 "while" is usually the worst enemy

  • @mannycalavera121 · 4 months ago

    We Gorupies now

  • @adambright5416 · 3 months ago

    The fact that he used time in one function instead of the builtin benchmarks is just... Wow

  • @lancemarchetti8673 · 4 months ago

    Nice

  • @MethodOverRide · 4 months ago

    Write smooth easy code? Sounds like something that genius Tom would do with JDSL.

  • @someman7 · 4 months ago

    Whoa, his parentheses are different colors. It only took us, what, 30 years to see "such" progress in syntax highlighting?

  • @blenderpanzi · 4 months ago

    I'm at 1:53. I think the 2nd one could be the fastest, because an intelligent compiler could inline len() and eliminate bounds checks in that case. It can't eliminate bounds checks in the last version. Let's see if my intuition is right.

  • @huge_letters · 4 months ago

    wouldn't it be faster to access the underlying list at each loop layer?
    instead of

    loop x
      loop y
        loop z
          t += arr[x][y][z]

    you do

    loop x
      xa = arr[x]
      loop y
        ya = xa[y]
        loop z
          t += ya[z]

    that way you do the access on x and y only when needed and not on every iteration? If you're saying it's likely memory access that's the issue

    • @Nenad_bZmaj · 4 months ago

      No difference. You still have to access the element of the array via xa; xa and arr[x] are both just pointers, and the array is the same one. The compiler will transform your code into the one in the video anyway, because xa and ya are just transient variables of slice (pointer) type.

    • @huge_letters · 4 months ago

      @@Nenad_bZmaj I mean that in option 1 you do the access by x x*y*z times, while in option 2 you do it only x times.
      Same for y: y*z times vs y times

    • @Nenad_bZmaj · 4 months ago

      @@huge_letters Oh, I wish. But it's simply not possible. No matter how you write it, the array contains x*y*z elements, and you need to access them all.

  • @metropolis10 · 4 months ago

    I liked that he said the 100 ms improvement wasn't worth it. How often do people admit the technically faster thing isn't worth the headaches?

  • @mohamedaityoussef9965 · 4 months ago +1

    Blue hair gone nooooooo

  • @josefaguilar2955 · 4 months ago

    These complications with loops are why I believe C/C++ will be around FOR a WHILE longer.

  • @RenThraysk · 4 months ago +1

    His problem is that the Go compiler doesn't perform loop unrolling; this is a deliberate decision by the Go devs. If he manually unrolled the loop he'd get an improvement.

  • @RandomGeometryDashStuff · 4 months ago

    08:39 "a.length = 0" doesn't create a new object, so less stuff to GC

  • @Nenad_bZmaj · 4 months ago

    Something I noticed: you'll get slightly worse performance if you write involved expressions instead of simple indexes. Hint: assign such an expression to a variable first, use that as the index, and let the compiler optimize it. Simpler expressions are better put as an index directly, e.g. a[i+3], but write: ind := b[i]; do something with a[ind], rather than: do something with a[b[i]]. Not a big deal, though, since the difference is marginal.

  • @dejangegic · 4 months ago +2

    I guess Go really is K.I.S.S. from the bottom up

  • @Arwahanoth · 4 months ago

    I guess the range statement avoids bounds checking.

  • @davidzwitser · 4 months ago

    Meanwhile in APL languages you just write 'matrix + number' and it's fully optimised

  • @k98killer · 4 months ago

    Go will store slices in stack memory if they are small enough (less than 2^16 bytes). Allegedly.

    • @masterchief1520 · 1 day ago

      Thought it all depended on escape analysis?

    • @k98killer · 1 day ago

      @@masterchief1520 I think it is a mix, but it has been a while since I last looked into this, so idk for sure.

  • @FunkyELF · 4 months ago

    The Python range() is not like ranges in other languages. Python's range just gives back numbers; that's it. I think Go's range is more equivalent to general iteration in Python with the "in" keyword: `for thing in things: ...`

  • @arkie87 · 4 months ago

    numba lets you write fast Python loops by compiling with LLVM.
    Also, am I the only one who pronounced numpy in their head as num-pee for the longest time?

  • @GaborGyebnar · 4 months ago

    Case 5 is fastest because the code is buggy: y is never used. It's just reading the same row multiple times; cache hits make it fast, but the result is incorrect.

  • @Zullfix · 4 months ago

    Rather than trying to optimize iterating, it sounds like his array needs to be split into partitions (or chunks, if you play Minecraft) to allow for multithreading his calculations.

    • @ErazerPT · 4 months ago

      Not everything can be parallelized. It all comes down to whether or not your next step depends on the results of the previous step. If not, you can; if yes, you can't. Simple example: a loop that steps through a vector adding the previous element to the current one in place. You can't parallelize it because you need the previous step's result before you execute the next.

    • @Zullfix · 4 months ago

      @@ErazerPT You don't parallelize by computing steps out of order; you parallelize by partitioning your space into chunks and computing each chunk on a separate thread, then combining the results. Obviously nothing is that simple, but at a surface level this is *the* technique used by professionals to create multi-threaded physics simulations, and it very likely can be applied to whatever the guy in the video is trying to compute.

    • @ErazerPT · 4 months ago +1

      @@Zullfix Read the example I gave you. It CAN'T be parallelized. You CAN'T partition things if ANY of the subsequent operations depend on the results of their predecessors. As for physics simulations, be it rigid-body dynamics or fluids, search around and you'll find a simple "some parts can, some parts can't" answer. Think of it like this: in a perfectly inelastic collision of billiard balls in a perfect line, only the cue ball and the last ball will move. But the last ball can't know it will move before the one before last knows it should try to move. And the one before last won't know... etc. Things like "grass simulation" can be done IF you take some "artistic license", i.e., for example, assuming wind interacts with grass but grass blades don't interact with each other. THAT is when you can partition things, because wind is a common entity and each blade of grass does its own thing.

    • @atijohn8135 · 4 months ago

      @@ErazerPT for your example, you could still optimize it by adding up parts of the array in parallel, then for each part adding the final element of the previous one. This would be faster for large enough arrays, since you can further parallelize the subsequent block adds no problem.
      Many smaller dependencies like that can be eliminated by experienced parallel programmers; it's really more common than you seem to think. You can't just declare an algorithm unparallelizable because "not everything can be parallelized"

    • @ErazerPT · 4 months ago

      @@atijohn8135 Jesus Mary, you people really don't give up... YOU CAN'T parallelize that example (and be more efficient). THINK. If one thread is adding a[0] to a[1], another CAN'T add a[1] to a[2] because it DOESN'T know what a[1]'s final value will be before the first thread finishes.
      But let's say we ignore that, and have a CPU with unlimited (real) threads. So one does a[0]+a[1], while another does a[1]+a[2], etc., until a[n-1]+a[n]. When they have all finished, you now have to run another batch, look at differentials from the previous ops, and do it all over again for a[1] to a[n]. But... it has a "carry effect", so you have to do it again for a[2] to a[n], then for each element up to a[n-1]+a[n].
      Congratulations, you just parallelized something and ended up less efficient than a single thread, because you still had to run n-1 steps, do more work per step, and waste more memory keeping track of both old and new values to calculate differentials. Great job.

  • @alexpyattaev
    @alexpyattaev 4 months ago +1

    1 billion elements... The problem is likely memory bound. Use a smaller data type and it will run faster.

    • @Nenad_bZmaj
      @Nenad_bZmaj 4 months ago

      Yeah. Also, the ratio of used to total memory influences how frequently the GC runs (roughly; it is more complicated). A GC sweep for this size typically takes 200 ms (on my old computer with an old single-core Athlon CPU).

    • @alexpyattaev
      @alexpyattaev 4 months ago +1

      @@Nenad_bZmaj The GC does not traverse an array of integers. As far as the GC is concerned, it is one "leaf" object in the object graph, so it does not get traversed, no matter how big it is. Now, if it contained other objects, like strings, the GC would have to walk it.

    • @Nenad_bZmaj
      @Nenad_bZmaj 4 months ago

      @@alexpyattaev I am not talking about the GC traversing an array, but about the GC mark-and-sweep cycle occurring at certain points during execution, at which times "the world is frozen". The way he structured this test, you don't know when the GC was active, so benchmarking three test cases consecutively in one outer loop is not going to be informative for comparisons.
      I mentioned the size dependency because the program runs longer on larger arrays and the GC runs more times. I see that in my programs. Although these programs involve quite complex data processing, the mere size of the input array dictates how many times the GC will run, or for how long in total it freezes the world. When my input slices point to an array of 50 MB, one GC sweep takes about 50 ms on average, and when the array is 250 MB, about 250 ms (on my very old computer).
      I assume that when each innermost slice is done, its array can be marked for sweeping, since it is not going to be used anymore, and so on upwards.

    • @alexpyattaev
      @alexpyattaev 4 months ago

      @@Nenad_bZmaj Oh, I see: with more memory allocated in general, it just runs the GC more often. That is crazy, so the more RAM you use, the slower your code runs (as it does more pauses)? Brrr....

    • @Nenad_bZmaj
      @Nenad_bZmaj 4 months ago

      @@alexpyattaev It is highly optimized, though, so it uses the fewest cycles possible for the amount of garbage it needs to collect. Plus, Go gives you options for the GC, so you can tune it for your specific program.
      If your program approaches the memory limit, the Go GC will take that into account when calculating how often it must sweep, and adjust its cycles so you don't run out of memory. You can opt out of this behavior. When you have enough memory, the GC doesn't take a large fraction of total time even if you produce a lot of garbage. On a multithreaded CPU, you will barely notice the GC.

  • @GermanClaus
    @GermanClaus 4 months ago +2

    Try out some Kotlin ;)

  • @bloomingbridges
    @bloomingbridges 4 months ago

    🐼

  • @gavipk
    @gavipk 3 months ago

    solution: c++

  • @jony1710
    @jony1710 4 months ago

    ITT nobody knows what the stack is

  • @AlexanderNecheff
    @AlexanderNecheff 4 months ago +1

    I used to believe the compiler knew best, and especially that any bug in the code was my fault, never the compiler's fault: then I started having to use the Intel compilers...
    And oh boy howdy, some of the worst compilers ever. Poor standards compliance, lots of weird bugs in both the front end and the back end, and horrible diagnostics to boot.
    I still generally think the compiler usually knows best, but now I have a disclaimer attached to that rule of thumb.

  • @dickheadrecs
    @dickheadrecs 4 months ago

    uh excuse me i prefer my memory malloc carte

  • @yoloswaggins2161
    @yoloswaggins2161 4 months ago +2

    Can't you just disassemble the binary and look at what they're actually doing?

  • @nonaligned293
    @nonaligned293 4 months ago

    react equals pain

  • @ofekshochat9920
    @ofekshochat9920 4 months ago

    arr[a][b][c] = arr[a * (second_dim * third_dim) + b * third_dim + c]. this shouldn't have hops...

    • @leeroyjenkins0
      @leeroyjenkins0 20 days ago

      They're slices, so you can append to them, and thus they can move. They're not contiguous memory. If you want contiguous memory, you use fixed-size arrays.

  • @nevokrien95
    @nevokrien95 4 months ago

    Pandas is slow, use numpy

  • @dixztube
    @dixztube 4 months ago

    Man, the new React/Next stuff sucks! I just popped out an htmx and Go project. Silky smooth, fast, and sexy. I've been bumming around this React nonsense for two months now, after getting decent at the last version of React.

  • @deathzombee
    @deathzombee 4 months ago +1

    First

  • @ryankimeu9013
    @ryankimeu9013 4 months ago +1

    i am an arch user

  • @BlitzDjingga
    @BlitzDjingga 4 months ago

    This is a really shocking result, but I do believe the second one is the better approach most of the time, while the third one is so heavily optimized that it is tiring to code and maintain.
    Too bad the video does not show memory performance for this. I believe what is happening is that the second and third ones operating on the HEAP makes them slower, while the first one is lucky enough not to run into memory pressure. The thing is, when you do `range` it makes a copy of the object, although even with a 3D array the host machine may have enough RAM to account for all of that iterating.
    It would be more educational to show the garbage collection and memory/space performance for each method.

  • @pyaehtetaung
    @pyaehtetaung 4 months ago +1

    Black?

  • @spartan_j117
    @spartan_j117 4 months ago

    Hahahaha Prime! So you DO reupload your old videos! Now that's lame...

  • @LengCPP
    @LengCPP 4 months ago

    admit it you watch your own videos

  • @svetlinzarev3453
    @svetlinzarev3453 4 months ago

    Golang is such a crappy language

  • @adjbutler
    @adjbutler 4 months ago

    But Python is strongly typed and JS is weakly typed... why do you hate on Python so much? It is not as slow as it used to be....