I have been trying to understand uniform buffer objects for so long, your video made it click for me. Thanks for explaining std140.
Thank you for this video
Best gl series!
Awesome explanation. I wish this video had existed sooner. I spent too much time figuring out this weird mat3 alignment on my own through trial and error. Now that I know about std140, my code will become a lot cleaner. Thank you for this great content 👍☺️
Thank you
That was an amazing explanation, thank you very much.
Thank you for this video, I'm glad I found your channel
this is gold
The one thing that still confuses me: does a mat3 have to start at an offset that is a multiple of 48 bytes, or is 16 bytes enough? Likewise for mat4: 64 bytes or 16 bytes?
Let's say I have a block with a vec4 and a mat4. The vec4 starts at 0, so should the mat4 start at 16, or should I add 48 bytes of padding so that it starts at a 64-byte boundary?
Under std140, mat3 and mat4 are treated like arrays of column vectors, so they must start on their own row, that is, at a multiple of 16 bytes. But that's it. Which row? That depends *completely* on what came *before* it. So if you had a vec4 + mat4 + mat3:
1. the vec4 would start at byte 0, taking up one row of 16 bytes
2. the mat4 would start at byte 16 and take up the very next four rows or 64 bytes
3. and the mat3 would start at byte 80 (16 + 64) and take up the next three rows or 48 bytes
If there were anything after that, it would have to start on the 9th row (again, that's after 1 row + 4 rows + 3 rows), or byte 128 (16 bytes + 64 bytes + 48 bytes).
If the first row had just a single float or a vec2 or a vec3, this structure would be the same. The only thing that would force the mat4 and mat3 down would be something that took up more than a single row. I hope that makes sense.
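The arithmetic above can be checked with a small offset calculator. This is only a sketch: the `STD140` table and the `std140Offsets` helper are names made up for illustration, and it covers just the handful of types mentioned in this thread (no arrays or structs).

```javascript
// Base alignment and size (in bytes) under std140 for a few common types.
// A matrix is laid out like an array of columns, each padded to 16 bytes.
const STD140 = {
  float: { align: 4, size: 4 },
  vec2: { align: 8, size: 8 },
  vec3: { align: 16, size: 12 },
  vec4: { align: 16, size: 16 },
  mat3: { align: 16, size: 48 }, // 3 columns x 16 bytes
  mat4: { align: 16, size: 64 }, // 4 columns x 16 bytes
};

// Round `value` up to the next multiple of `multiple`.
function roundUp(value, multiple) {
  return Math.ceil(value / multiple) * multiple;
}

// Hypothetical helper: byte offset of each member, in declaration order,
// plus the offset of the next free 16-byte row after the last member.
function std140Offsets(types) {
  let cursor = 0;
  const offsets = types.map((type) => {
    const { align, size } = STD140[type];
    const offset = roundUp(cursor, align);
    cursor = offset + size;
    return offset;
  });
  return { offsets, nextRow: roundUp(cursor, 16) };
}

console.log(std140Offsets(["vec4", "mat4", "mat3"]));
// offsets [0, 16, 80], nextRow 128
console.log(std140Offsets(["float", "mat4", "mat3"]).offsets);
// same offsets: a lone float still occupies the whole first row
```

Swapping the leading vec4 for a float, vec2, or vec3 leaves the matrix offsets unchanged, which matches the explanation above.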
@@osakaandrew Hey, thank you so much for your time, everything makes sense now.
A small problem I encountered while trying this out: the bufferData size must be at least 16 bytes (enough to hold at least one vec4).
If it's not working, make sure the buffer is allocated with a size of at least 16.
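Rather than hard-coding a minimum, one option is to ask WebGL2 how many bytes the uniform block needs and allocate at least that much. A rough sketch, assuming a linked `program` and a block name that you supply; `uniformBlockByteSize` and `createUboForBlock` are hypothetical helpers, not part of WebGL.

```javascript
// Hypothetical helper: returns the size in bytes that the driver reports
// for a named uniform block in a linked WebGL2 program.
function uniformBlockByteSize(gl, program, blockName) {
  const blockIndex = gl.getUniformBlockIndex(program, blockName);
  if (blockIndex === gl.INVALID_INDEX) {
    throw new Error(`Uniform block "${blockName}" not found`);
  }
  return gl.getActiveUniformBlockParameter(
    program,
    blockIndex,
    gl.UNIFORM_BLOCK_DATA_SIZE
  );
}

// Hypothetical helper: creates a uniform buffer big enough for the block.
function createUboForBlock(gl, program, blockName) {
  const byteSize = uniformBlockByteSize(gl, program, blockName);
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.UNIFORM_BUFFER, buffer);
  gl.bufferData(gl.UNIFORM_BUFFER, byteSize, gl.DYNAMIC_DRAW);
  return { buffer, byteSize };
}
```

Allocating from the reported size means the buffer can never be smaller than what the shader expects, whatever the block contains.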
I have a question and I hope you can answer it. Which approach is better in your opinion, and why?
First approach:
- create 1 uniform buffer object (UBO) per scene
- always update the current game object with 3 WebGL calls:
  - bindBufferBase
  - bindBuffer
  - bufferSubData
Second approach:
- create a UBO for every game object
- keep a dirty flag in JS land for each game object
- if the dirty flag is true for the current game object, it requires 3 WebGL calls:
  - bindBufferBase
  - bindBuffer
  - bufferSubData
- if the dirty flag is false for the current game object, it requires only 1 WebGL call:
  - bindBufferBase
The first approach requires a lot less GPU memory, at the cost of always making 3 WebGL calls for every game object.
The second approach requires more GPU memory and makes the code more complex, but probably saves me 2 WebGL calls per game object (when the game object does not need an update).
It usually makes a lot more sense to use a single UBO for your application. You'll end up with a single buffer as a result, so the real question then becomes whether to call bufferSubData() multiple times or just once -- which is a pretty similar question to what you are asking. And the answer is "it depends".
If everything has changed since the last frame, then of course you just upload the entire thing again with one call. If only a single, small part has changed, then upload only that small part. These examples are pretty obvious. But if you have a HUGE buffer (as in megabytes of data) and you are only making a few, tiny changes spread across it, it's difficult to predict without profiling. Uploading huge amounts of data can hurt performance, so maybe doing a single bufferSubData() is not a good idea. But calling bufferSubData() several times with small amounts of data can also hurt performance. Sometimes you can structure your data so that the most mutable parts are together (making for a smaller, single update). Personally, I'd try to keep my update as small as possible and call bufferSubData() only once. But honestly, this is usually only a difficult question if you have megabytes of UBO data to upload every frame.
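One way to get the "smallest possible update, one call" behaviour is to keep a CPU-side copy of the data and track the lowest and highest element touched since the last upload. A sketch only: `DirtyRangeUploader` is a made-up name, and whether this beats several small uploads still needs profiling. It relies on the WebGL2 form of `bufferSubData()` that takes a source offset and length in elements.

```javascript
// Hypothetical helper: mirrors a UBO in a Float32Array and uploads only the
// span between the first and last element changed since the last flush.
class DirtyRangeUploader {
  constructor(floatCount) {
    this.data = new Float32Array(floatCount);
    this.clear();
  }

  // Mark the mirror as having no pending changes.
  clear() {
    this.first = Infinity;
    this.last = -1;
  }

  // Write `values` starting at element `index` and widen the dirty span.
  set(index, values) {
    this.data.set(values, index);
    this.first = Math.min(this.first, index);
    this.last = Math.max(this.last, index + values.length - 1);
  }

  // Upload the dirty span with a single bufferSubData() call.
  // Returns false (and makes no GL calls) when nothing changed.
  flush(gl, buffer) {
    if (this.last < 0) return false;
    const count = this.last - this.first + 1;
    const dstByteOffset = this.first * Float32Array.BYTES_PER_ELEMENT;
    gl.bindBuffer(gl.UNIFORM_BUFFER, buffer);
    gl.bufferSubData(gl.UNIFORM_BUFFER, dstByteOffset, this.data, this.first, count);
    this.clear();
    return true;
  }
}
```

Note that unchanged elements lying between two edits are uploaded as well, which is why grouping the most mutable values together keeps the span small.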
It's worth also mentioning that bindBufferRange() is usually considered an optimization for most applications. I didn't explain it in this video, but it allows you to associate multiple uniform blocks with a single buffer. So it's normal to consider "one buffer" to be more performant than "many buffers".
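As a rough illustration of the bindBufferRange() idea, the sketch below packs several uniform blocks into one buffer. The helper names are hypothetical, and the use of binding points 0, 1, 2, ... is just an assumption for the example. The one hard requirement it encodes is that each range's offset must be a multiple of `UNIFORM_BUFFER_OFFSET_ALIGNMENT`, which varies by device.

```javascript
// Hypothetical helper: packs several uniform blocks into one buffer,
// starting each block at a multiple of the required offset alignment.
function planBlockRanges(blockSizes, offsetAlignment) {
  let cursor = 0;
  const ranges = blockSizes.map((size) => {
    const offset = Math.ceil(cursor / offsetAlignment) * offsetAlignment;
    cursor = offset + size;
    return { offset, size };
  });
  return { ranges, totalBytes: cursor };
}

// Hypothetical helper: allocates one buffer and attaches a slice of it to
// binding points 0, 1, 2, ... in the order the sizes were given.
function createSharedUbo(gl, blockSizes) {
  const alignment = gl.getParameter(gl.UNIFORM_BUFFER_OFFSET_ALIGNMENT);
  const { ranges, totalBytes } = planBlockRanges(blockSizes, alignment);
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.UNIFORM_BUFFER, buffer);
  gl.bufferData(gl.UNIFORM_BUFFER, totalBytes, gl.DYNAMIC_DRAW);
  ranges.forEach(({ offset, size }, bindingPoint) => {
    gl.bindBufferRange(gl.UNIFORM_BUFFER, bindingPoint, buffer, offset, size);
  });
  return { buffer, ranges };
}
```

Each block in the shader would still need to be pointed at its binding point with uniformBlockBinding(); that step is left out here to keep the example short.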
Also, remember that with a single UBO, bindBufferBase() only needs to be called at initialization time. Its job is to attach the buffer to a uniform binding point; it isn't part of the update itself. As long as that buffer stays attached to that binding point, you never need to call bindBufferBase() again. Instead, just use bindBuffer() before calling bufferSubData().
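The split between setup and per-frame work might look something like this. It is a minimal sketch with hypothetical function names, assuming one buffer that stays attached to one binding point for the life of the application.

```javascript
// Initialization: runs once per buffer.
function initUbo(gl, program, blockName, bindingPoint, byteSize) {
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.UNIFORM_BUFFER, buffer);
  gl.bufferData(gl.UNIFORM_BUFFER, byteSize, gl.DYNAMIC_DRAW);
  // Point the shader's block at the binding point, then attach the buffer.
  const blockIndex = gl.getUniformBlockIndex(program, blockName);
  gl.uniformBlockBinding(program, blockIndex, bindingPoint);
  gl.bindBufferBase(gl.UNIFORM_BUFFER, bindingPoint, buffer);
  return buffer;
}

// Per-frame update: bindBuffer() + bufferSubData(), no bindBufferBase().
function updateUbo(gl, buffer, data, dstByteOffset = 0) {
  gl.bindBuffer(gl.UNIFORM_BUFFER, buffer);
  gl.bufferSubData(gl.UNIFORM_BUFFER, dstByteOffset, data);
}
```

With this shape, each update costs two WebGL calls instead of three.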
Another amazing video! Many thanks! Forgive me for asking, but how did you learn all this stuff?
Thank you for the kind words. Honestly? Everything is iterative. Circular. Start with documentation, then go through books and blogs, then write some buggy code and figure out how to fix it. The first time through, none of it makes any real sense. Then do it again and again. Every time around the loop, understanding grows, but more questions pop up, so you start over with a different level of familiarity and a new set of questions. Hopefully, by the time you run out of questions, you have no more misunderstandings. Most people don't need that level of detail; they just need to get things to work. In my own case, I just genuinely hate having questions I can't answer.
@@osakaandrew Thank you for the nice explanation and thank you again for producing such lovely videos. We are lucky to have them!
That's interesting about things being iterative. I have also found myself carefully adding stuff and hoping not to break everything. On a couple of occasions, it took me days to narrow down what the heck was going wrong! Luckily, I am really enjoying the challenge. For better or worse, I am also translating all your examples into twgl, which definitely adds to the fun.