Crazy engineering effort went in this challenge.
We should soon have the "inline" keyword. A huge effort is underway in the JVM so that, for example, an array of inline objects will use contiguous memory. When iterating through it, you get huge speedups because you avoid all those cache misses (30x speedups have been seen).
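Until inline (value) objects land, the gap this comment describes can be felt by comparing a primitive array (values stored contiguously) with a boxed array (pointers to scattered heap objects). A minimal sketch, timings are illustrative only, not a rigorous benchmark:

```java
public class ContiguousVsBoxed {
    // Sum over a primitive array: sequential, cache-friendly memory access.
    static long sum(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    // Sum over a boxed array: one pointer dereference per element.
    static long sum(Integer[] a) {
        long s = 0;
        for (Integer v : a) s += v;
        return s;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        int[] primitive = new int[n];
        Integer[] boxed = new Integer[n];
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed[i] = i;
        }

        long t0 = System.nanoTime();
        long s1 = sum(primitive);
        long t1 = System.nanoTime();
        long s2 = sum(boxed);
        long t2 = System.nanoTime();

        System.out.println("primitive: " + (t1 - t0) / 1_000_000 + " ms, sum=" + s1);
        System.out.println("boxed:     " + (t2 - t1) / 1_000_000 + " ms, sum=" + s2);
    }
}
```

The boxed loop typically runs several times slower because each `Integer` lives somewhere on the heap, so iteration chases pointers instead of streaming through one contiguous block. Inline/value objects aim to give the boxed-looking code the primitive-array layout.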
It's disheartening to think that in this day and age performant code still has to look this awful.
But the talk itself is great 😃
Roy's talk has been featured in the last issue of Tech Talks Weekly newsletter 🎉 Congrats!
This reminds me why I fell in love with CS and CPU architecture. Though it was probably easier to write this in ANSI C
I think branchless programming is fast too because of how the CPU loads memory in bulk (from RAM into the L1/L2/L3 caches): it loads the whole 64-byte cache line around the requested address, so not only the requested data but also the data right after it gets loaded, and that adds further optimization.
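Strictly speaking, "branchless" is about avoiding branch mispredictions rather than cache behavior, though both matter for this challenge. A small sketch of the idea in Java, a max computed with bit tricks instead of an if (with the caveat that `a - b` must not overflow the int range):

```java
public class BranchlessMax {
    // (diff >> 31) is all ones (-1) when a < b, and 0 otherwise,
    // so (diff & (diff >> 31)) is diff when a < b, else 0.
    // Caveat: a - b must not overflow int.
    static int max(int a, int b) {
        int diff = a - b;
        return a - (diff & (diff >> 31));
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7));   // 7
        System.out.println(max(7, 3));   // 7
        System.out.println(max(-5, 2));  // 2
    }
}
```

With no branch, the CPU never has to guess which way the comparison goes, so there is no misprediction penalty in a hot loop. (In practice the JIT often compiles a plain `Math.max` to a conditional move anyway, so measure before hand-tuning.)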
As far as I remember, constant math can be performed at compile time (e.g. static final int foo = 1 + 1 would be written to the class file as static final int foo = 2). Could you trick the compiler into doing all of this heavy lifting and produce a binary that contains only the final results?
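That folding does happen: javac evaluates constant expressions over primitives and Strings at compile time, stores the result in the class file's constant pool, and inlines uses of the constant at each call site. A minimal sketch:

```java
public class ConstantFolding {
    // javac evaluates this at compile time; the class file stores 2, not "1 + 1".
    static final int FOO = 1 + 1;

    // Also folded, since FOO is itself a compile-time constant.
    static final int BAR = FOO * 21;

    public static void main(String[] args) {
        System.out.println(FOO); // 2
        System.out.println(BAR); // 42
    }
}
```

The limit on the "trick the compiler" idea is that only constant expressions qualify: anything involving a method call (even `Math.max(1, 2)`) or a non-final variable is left for run time, although the JIT will usually constant-fold such values again once the code gets hot.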
Volkswagen would love to hire you for their diesel division.
Great Talk!
It was fun, but it felt like they were inventing an assembler
It was fun and interesting until I saw Unsafe. After that it felt meaningless, empty satisfaction imho
Same feeling, like using inline asm in C/C++ 😂
Why though?
You don’t *need* Unsafe to do all of this; for me it was actually a fun challenge to learn and use it.
Unsafe is deprecated now
So to process 1 billion rows in Java in 2 seconds you need to use C/C++. Great job anyway!
That's what I was thinking of. Every level of abstraction that Java gives is intentionally thrown away to go as close to the low level details of the hardware as possible.🤣
@@gergonagy2733 I agree. If someone is making a language like Java, they can't avoid providing some way to access low-level memory management. But could someone show me a Java dev who committed code with direct memory manipulation to a real-life commercial product? These 2-sec projects are still Core Java, but it's not "vanilla" Java.
imagine using java in 2024...
Why not? Real persons do use it 😉😂 Pun intended.
Curious to know what you use?
Imagine saying something that brainwormed in any year
You don't need to imagine. It's the 2nd most popular programming language in the world after python.