Here's a fun fact: say you have some HOST_COHERENT memory that you're reading from a shader. Say you're storing there the vertex buffer of your big point cloud (50M+ points). You can take the total size of your vertex buffer, divide it by the time of the GPU pass using that vertex buffer, and you'll get your PCIe speed almost exactly (give or take 2%). Because your shader literally reads your RAM by streaming it over PCIe at its full bandwidth, introducing practically no extra latency. I don't know about you, but I find it practically magical how they (the hardware guys) achieve that.
Thank you for uploading this video. It's very educational. I mean, many people face problems like that, and once it's over they breathe a sigh of relief and move on. However, you took your time to tell us about it, and that's really important.
I'm not a programmer. But do you have a common Z-axis denominator for all the stuff that's placed on the ground? That could save a lot of clock cycles, just by having all that stuff optimized to be more logical. Like I said, I'm not a programmer.
Okay, that's the best kind of video I want to see you making: having a bug that's extremely hard to track down, analyzing it and actually solving it. That is the most valuable thing for me, especially when using a modern graphics system such as Vulkan. As a matter of fact, I started getting into Vulkan, and even clearing the screen to a color does not work in all cases. On Win32 it works perfectly, including re-creating the swap chains, but on Linux X11 it renders but crashes on XDestroyDisplay(). Such bugs are so annoying, because they prevent you from continuing other things :-( Also I don't understand why validation on Linux simply does not work (No instance extension detected), but on Win32 it works just fine; both have the SDK installed and it's the same system (multiboot) O_o
Hi The Cherno, thanks for this graphics programming content. A little inspired by you, I've started to learn Vulkan and D3D12, and I found VMA's brother, D3D12MA, and have now submitted pull requests for 2 features, all in CMake. I've followed the CMake path instead of the Premake that you prefer, but I really thank you for introducing me to this content.
When I was doing my first Vulkan triangle, for some reason the triangle would not show on screen. I spent a month investigating, refactoring and changing everything to find what my problem was, until I found it: there was a VK_TRUE where there should have been a VK_FALSE, and that was it. Something like that flag you missed.
For anybody reading this in the future: not only is it possible to store things like vertex buffers in system RAM, the data HAS to be in RAM before it is copied to the GPU; it's just how computers work. What you are really controlling in a graphics API is hinting when and how you would like to pass around data like vertex buffers, and Cherno accidentally set it up to basically re-copy the vertex buffer on every draw command. I took one look at that profile, saw the high Async Copy Engine usage and immediately realized it was excess, repeated data copying. I had my suspicions before that, but only because CPUs and GPUs are so fast that literally 99% of performance problems are data copying from backing memory. Calculations are often basically free in comparison to IO. I call it a hint because the graphics API is just that, an API, and the underlying OpenGL/Vulkan/D3D implementation has an active program running in the background on the CPU that governs the actual behaviour of the graphics context; whilst you might request it to do some copying, it'll just do it when it makes sense, so long as it conforms to the spec. Technically on some systems you can use DMA (Direct Memory Access) to write directly into the GPU buffer, but this is objectively worse than having a well-orchestrated graphics context manage the copy when it is sensible. Anyway.
great video, welcome to Vulkan memory flags xD complexity explosion xDD and then there are the depth-buffer flags... changing a single flag can take you from 100x faster to 100x slower performance
New movie, Vulkan detective story 😎, but the truth is that we've all been there. I experienced something similar with RegEx: after I dropped RegEx from the most critical parts, I gained 50% performance in a few hours. This inspired me, and next week I'm going to do some perf tuning. 🤓🚀
Once, I spent about two days trying to fix my texture coordinates - turns out, I accidentally set the 2D texture coordinates as a 3D GLSL layout... Not as annoying as his mistake, but still hilariously dumb
I don't understand why you didn't just implement staging buffers on a transfer queue in the first place. They're in the Vulkan Tutorial. 30 minutes of my life I'll never get back
A bit late to the party, but I got hung up on the "#type" preprocessor directive in the shader, which doesn't appear in any documentation. Is this just an addition of your own, to let the engine split one file into respective shader types before compiling them?
I think people could easily jump to the conclusion that Vulkan is worse. I spent the last week figuring out why my OpenGL went from working to a black screen. It was because a buffer became unbound during rendering; I had to spend so much time in RenderDoc trying to figure this out. The moral is, new technologies are hard to figure out and it always takes time lol
I've had multiple of those small things in my Vulkan renderer, fortunately never this one, I followed a tutorial while implementing mine that used staging buffers. (: Anyways - What differences did you notice when switching to the Vulkan Allocator? Just lifting the number of allocations limit? Or was there any performance differences? Implementing that allocator has been on my to do list for some time now. (:
1:22 This is something that bothers me deeply and I _hate it._ If you're going to watch a coding video on your silly 4in phone *don't complain that you can't see crap!* I am watching it like a normal person on my PC and I can't see crap because the presenter has to blow up the text to idiotic proportions so you can see something on your puny Apple watch or whatever. *STOP IT!* Get off the damn toilet and watch this at your computer.
I feel you! I knew what was coming when you showed us that PCI throughput graph :) I love Vulkan and the control you get, but with great power comes great responsibility. Looking forward to your next video.
U do know that U could just lower the resolution in Windows rather than developing a feature? xD I know many who screen-record tutorials do that, like they lower the res but use the scaling so it's still sharp.
Yeah.. really painful problem to solve, you just can't re-read your entire engine to spot the bug; sometimes only a person with a fresh view of the problem can spot the issue. This happened to me tons of times.
Love this video. I always tell young developers "no matter how long you've been programming, you will make silly mistakes." Unit and regression tests are your friend. If you don't have good test coverage, simple typos will bite your butt.
I'm your biggest fan!!! You are pretty amazing. I'm a pro now, I'm done with your C++ series, and I'm starting your game engine series. My plan is to learn C++, Java, Linux, and HTML. My brother-in-law mentioned C++ is the first language to learn; it's simple. He's right. I'm a pro!!! Thank you for all you do for me. I wish you good luck in life and programming.
MUAHAHAHAHAHA, you just really showed what I went through watching slow Metal API rendering. We rewrote OpenGL -> Metal API for the iOS part, and for some reason the app was somehow managing 60fps on iPhone X and higher (with no multisampling) yet ~20 fps with 4 samples. In the end the problem was that we were doing a STORE at the end of each render pass + a LOAD at the start of the next, and the program could have about 6-10 of those. And yeah, simply removing that reduced the memory bandwidth a lot and made it nearly 40-50fps. So it really was saving the framebuffer data to the CPU and then loading it back to the GPU kek
Hope you all enjoyed the _journey_ - I for one am definitely glad it's over. What's the worst dumb mistake you've ever made which cost you too much time?
Not learning to code at a young age..
forgetting a semicolon
Abandon the old project and create a new project which is the same as the old one
;)
Sometimes I change my functions to be way shorter and simpler just to debug the entire code more easily. Exactly like you did when you modified your fragment shader. I probably do between 5 and 10 such simplifications. And when it's time to come back to normal, I just forget one of these simplifications and it generates a bug 400 new lines of code later. Then it takes me several days to understand that the problem comes from a wrong simplification I made on purpose in a previous debugging session...
"I don't know. Maybe I'm just doing errors only. Who knows. Maybe i'm a clown. I am a clown." This is the most relatable programming video ever lmaoooo
Also "maybe the comment is the slow part" at 13:36 lmao. That's the kinda stuff I start thinking in frustration after debugging for hours
lolololololol
This just goes to show you.
Whether you have 1 year or 10+ years experience programming..
It happens to all of us.. one little oversight, typo, you name it.. that will haunt us forever.
Debugging is like 80% to 90% of a programmer's job. You type a line of code and you forget the semicolon at the end, that's already a bug to fix.
You can use "design by contract" to prevent messing up subtle invariants....
Experience actually will never prevent you from making mistakes. There are other methods that prevent you from making them.
Man, let's appreciate for a second the legend that knew what was up with just a screenshot from the profiler. Game engine programmers are of a different breed!
tbf, as he mentioned himself, seeing PCIe usage at near max should have been hint enough
@@The101Superman Also realizing that someone could accidentally place all their vertex data in system ram. Usually people coding with vulkan really emphasize gpu ram.
Try Verilog, designing the layout of a CPU's internals via its logic gates, data and address paths...
@@skilz8098 That's college level stuff. This is basically working with a black box running tens of millions of lines of code, with a billion gotchas.
@@gileee I know what you are saying... but you wouldn't be saying the same thing if you were engineering a CPU and had to account for every single wire, connection, transistor, etc... Then when all of that is properly connected and you think you are done, it's a matter of designing the ISA and how both instructions and data are represented... and from there it's a matter of writing your own assembler. Granted, there are many highly sophisticated tools today to help streamline that process, but imagine having to do that by hand without the aid of any modern computer, device or software! They may teach the basics of this in some colleges and universities, but there is just as much that they leave out! Also, I'm 100% self taught, 0 college education! I took the initiative to follow my own ambitions, desires and goals. I've always been intrigued by electronics.
Cherno: "Zooms in on CPU only"
Me: "Starts laughing both hilarious and empathetic, feeling the pain and relief of not finding that one bug for a week, that someone else points out in a second"
One day I accidentally used an exit-current-thread call instead of killing a specific thread. Took me about half a day to figure it out
Haha I thought it was so amazing how his friend pointed it out immediately only from looking at the NSight screenshot
Man. Computer programming is like blindly assembling a machine. You never actually know how all of it works. In fact, you might not even remember how you assembled each piece of code together.
@@gittawat6986 no... Just no... Programming is about knowing exactly how everything works. The problem is that schedules are often made assuming that you don't need to know everything, and those schedules are broken, because you do need to know how everything works; otherwise you're not engineering, you're gambling (that something will work as you want it to).
When you fly on a modern fly-by-wire airliner you better hope that the programmers understand how everything works because that airliner can't stay in the air unless that software works perfectly.
This is why every programmer needs to know assembly and why interpreters are a fundamentally broken concept (they are obfuscation that makes it inherently more difficult to understand how everything works).
Yeah, this one is hilarious since it is such an obvious thing to run into when writing a Vulkan renderer as well
The emotional rollercoaster of this video is a great all-around explanation of how programming is :D
**spends 3 days on fixing a bug**
"ALL I DID WRONG WAS TYPE "C" INSTEAD OF "G"!!!"
Lol
Something eerily similar happened to me, only with my DirectX12 implementation. It was waaaay slower than the OpenGL. Like you I fired up the nsight profiler and saw PCI throughput being the bottleneck. Turned out I was still using upload heaps for my vertex buffers (instead of default heaps). I even had a TODO comment there saying I need to fix that. Oh well, learning happened.
Oh that was a fun journey... Here's to all the friends that solve our issues with a quick glance and a fresh mindset!
You have mastered the art of thumbnails. This is peak.
Faxxx
oh hey, I know you!
@@nahu4870 you know who?
Hello, Nolram 👋🏾
@@_lapys Uhm hello ? Where does everyone know me from lol ?
Open GL holds your hand and makes many assumptions, "Give me your data and I'll try to draw it accordingly".
Vulkan on the other hand is Explicit: "Tell me everything you want to do and how you want it done and I'll do it exactly that way!"
Skia: j-just tell me what to draw, but it has to be 2D
The LunarG Vulkan debug layers do in fact give a bit of performance validation if you ask, quite handy, if fairly minimal right now.
My first guess was that it was just hitting V-Sync 😄. Pretty obvious that it was buffer residency as soon as I saw that PCI bandwidth cheese block, but I wondered if you had somehow managed to allocate your render target images as CPU only!
I love how niche of a video this is and the community that has rallied behind this guy to succeed. Never leave a nerd behind.
One of Vulkan's validation layers is for best practices. If this isn't on there, you might want to request its addition.
*stares into the screen for a week.
*changes one letter
*speed go brr
that's scary, man!!
I would LOVE to see a Vulkan Tutorial from You
So happy to see some Vulkan content on this channel! Keep up the good work!
I have so much respect for graphics programmers. I had to write some OpenGL stuff in uni and was so dramatically overwhelmed by the fundamentals and terminology. You can spend so much time and energy on this topic; I can't even begin to imagine how brutal Vulkan must be. Also really cool content, thank you!
One time I set the RenderPass attachment store op to DONT_CARE and didn't even realize it. I was banging my head for like an hour wondering why my window was displaying garbage. Vulkan basically did my rendering, saw the enum and just dumped the result into undefined territory, because, after all, the programmer said to discard the output.
This was before I had access to NSight. I wonder how fast I would've solved the mistake if I did have it.
Wasn't the GPU memory almost empty then? No one noticed?
It's pretty scary that such a small difference can have such a big effect... I've heard recently that someone got an order-of-magnitude performance increase by adding some noop instructions to his functions so they would have a better layout in cache (hot code in 1 cache line) xD programming is brutal sometimes
Well it's almost empty one way or another. Sponza geometry is like what, 8MB worth of buffers? Textures and render targets were in VRAM anyway, they're a good chunk bigger all together.
Honestly, this was pretty comforting, because I was expecting some far stranger Vulkan behavior. This is certainly unfortunate and hard to debug, but the reason for it being slow is very obvious
This is really good content. Love the Vulkan debugging stuff. Don't see much of this on YouTube.
As soon as I saw that PCI throughput, I realized it was system RAM, but I didn't realize you could manually do that.
Vulkan is like C. It gives you a lot of ammunition to shoot yourself in the foot. But you can also shoot things that OpenGL can't :D
24:42 This is implementation defined. The GL_STREAM/STATIC/DYNAMIC_DRAW hints are exactly that - hints. From what I understand, these used to mean something, but driver vendors have more or less stopped caring about them, since misusing them was so common that it was better to just have the driver decide.
I think the newer ARB_buffer_storage API (core in OpenGL 4.4) has a more detailed impl of that with glBufferStorage(): instead of hints, it uses flags really similar to Vulkan's. GL_CLIENT_STORAGE_BIT will make the buffer CPU-side if possible, GL_MAP_PERSISTENT_BIT will make the buffer persistently mapped but likely CPU-side too, GL_DYNAMIC_STORAGE_BIT will make the buffer able to copy from CPU data but still be GPU-side, and 0 will make the buffer fully invisible CPU-side, therefore fully on the GPU. Persistent mapping is often used for staging buffers btw, and you can use glCopyBufferSubData to copy the buffer's contents.
Am I watching a man slowly descending into madness while trying to learn Vulkan?
What GUI library do you use for Hazel?
Dear imgui
dear imgui
ImGui i think
Why? Whyyy? Whyyyyyyyyyyy?? Whyyyyyyyyyyyyyyyyy??! Whyyyyyyyyyyyyyyyyyyyyyyyyyyyyy??!!! WHYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY?????!!!!! A yes this lil thingy over here.
Insane story ahah, just a testament to how useful these debugging and profiling tools are. Hope this is the kick for us to stop debugging with printfs
haha 😎👍
Fuck that's me
I wish school/college would teach you to use debuggers effectively. Instead we get tested in IDEs with no debugger, and the only help you get is just printfs
@@prateekkarn9277 sometimes even stupid things like writing code on paper in an exam
You're so lucky to have someone you can ask for help. Imagine looking for this error all on your own. :(
I've been trying to make the basic "Hello triangle" app in Vulkan and _"sounds like your vertex buffers are stored in RAM."..._
I felt that.
Watching you trying to profile your Vulkan renderer by looking at disgusting CPU timings was really painful; I had to stop somewhere midway. I have an ImGui CPU/GPU profiler renderer (look up LegitProfiler on github). Having a good profiler with CPU data alongside GPU data and with history not only helps you clearly see your bottlenecks (obviously), you also see synchronization hiccups, frame delays (when a slow GPU frame stalls your CPU), "bad" frames when something stalls your app, and you can make sure that waits happen where you expect them to happen. For example, a lot of older nvidia drivers used to wait on Presents (instead of waiting on fences, where they're supposed to).
The thumbnails! =)
omg , that is the life of a damn dev 😭😭 , one small things make life sucks
I love how dramatic these thumbnails have gotten.
Many moons ago I was using DirectX, and trying to get my game to recover from alt-tab or minimizing and restoring the game window. When the window is reactivated, I had to reload all the textures into the gpu from system memory, which meant I had to keep a copy in system memory. And I didn't have enough gpu ram to store all the textures, so I killed two birds with one stone by implementing a caching system with an lru list. Whenever I went to draw something, if my gpu buffer for a given texture was either not loaded or invalid, I'd reload it from its system memory buffer, and keep track of how much total gpu ram was in use (for textures), and when it passed a threshold, I would repeatedly unload the gpu buffer for the least recently used texture (that was still loaded) until I (theoretically) had enough gpu ram to load the texture I was trying to load. Before I created that system I did suffer from a lot of artifacts and slowness and headaches. After I implemented it, things were much smoother and faster. Many, many moons ago.
Why did this video come on my dash today, if it was posted 2 years ago? I keep doing that. I keep necro-posting without realizing it. Argh. Oh well. (I guess it must be because I'm just now finally looking at Vulkan.)
As soon as I saw the PCIE graph, I screamed NOOOO and laughed. I knew exactly what was coming.
You had good intuition about what you might be doing wrong, but were guessing the wrong target.
I’m building a 3D game engine inspired partially by you!!!
bro me too, I barely know what I am doing though. I took a programming plus lab course in college, so I know enough about C++ to make classes, files, loops, conditional expressions, arrays... I bought a book on console game programming and learned about enums and vectors, and this shit is becoming so complicated, but using an online SDL tutorial I was able to make a window lol. how far did you get?
@@zyrxom creating an SDL window is literally 2 lines.
@ I'm not that good at this right now. You might be able to do it in two lines, but it took me a whole game class header (20 lines) and cpp (35), plus 18 lines in my main.cpp. But I'll get better at it.
@@zyrxom that's the spirit! don't give up, and remember: even Cherno gets into the state where he can't do anything but sit and watch the screen. you'll get better, keep going!
honestly, this is some quality tv for me :D glad you did it!! keep it up
Great video. Vulkan is pretty interesting to me. And I think most programming issues are tiny stupid bugs, even outside of game engine dev.
Thank you for sharing this and, most importantly, the investigation process/method — it's really helpful! Now I need to get a Windows PC; there are no tools like that for OSX.
I think it would be pretty funny if he went through all of this debugging and discovered that he was somehow using his CPU's integrated graphics
What kind of keyboard do you use? I like the sound of it
This was actually so useful to see. Thanks for sharing!
In Vulkan validation layers you can enable "PERF" level, which gives hints about various sub-optimal uses of Vulkan API. I am not sure if it would catch this one, but it is worth a try.
There is VK_LAYER_LUNARG_assistant_layer, which is essentially designed for these purposes and should detect this issue.
Also, there is nothing wrong with Vulkan allowing you to do that or other stupid things. Using CPU-only memory for some stuff sometimes makes perfect sense, actually.
I am glad you found the issue and fixed it, and learned something new.
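For anyone who wants to try the checks mentioned above, here is a minimal sketch of turning them on at instance creation. The names come from the Vulkan SDK's VK_EXT_validation_features extension; as far as I know, recent SDKs folded the assistant layer's checks into VK_LAYER_KHRONOS_validation's best-practices mode, which is what this enables:

```cpp
// Sketch: enable the validation layer's best-practices (perf-hint) checks.
// Requires the Vulkan SDK; only the relevant fields are filled in here.
VkValidationFeatureEnableEXT enables[] = {
    VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT,
};

VkValidationFeaturesEXT features{};
features.sType = VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT;
features.enabledValidationFeatureCount = 1;
features.pEnabledValidationFeatures = enables;

const char* layers[] = { "VK_LAYER_KHRONOS_validation" };

VkInstanceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
createInfo.pNext = &features;  // chain the validation features in
createInfo.enabledLayerCount = 1;
createInfo.ppEnabledLayerNames = layers;
// ... fill pApplicationInfo etc., then call vkCreateInstance(&createInfo, ...)
```

Whether best practices actually flags this particular memory-type choice may depend on the driver and SDK version, so it's worth testing rather than assuming.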
Vulkan is hell and heaven at the same time
Hell to make, but heaven to the user
CEMU Open GL vs CEMU Vulkan is a huge difference
Here's a fun fact: say you have some HOST_COHERENT memory that you're reading from a shader. Say you're storing there the vertex buffer of your big point cloud (50M+ points). You can take the total size of your vertex buffer, divide it by the time of the GPU pass using that vertex buffer, and you'll get your PCIe speed almost exactly (give or take 2%). Because your shader literally reads your RAM by streaming it over PCIe at its full bandwidth, introducing practically no extra latency. I don't know about you, but I find it practically magical how they (the hardware guys) achieve that.
Thank you for uploading this video. It's very educational. I mean, many people face problems like that, and once it's over they breathe a sigh of relief and move on. However, you took the time to tell us about it, and that's really important.
I'm not a programmer, but do you have a common Z-axis denominator for all the stuff that's placed on the ground? That could save a lot of clock cycles, just by having all that stuff optimized to be more logical. Like I said, I'm not a programmer.
Would love more storytime
Okay, that's the best kind of video I want to see you making: having a bug that's extremely hard to track down, analyzing it, and actually solving it. That is the most valuable thing for me, especially when using a modern graphics system such as Vulkan.
As a matter of fact, I started getting into Vulkan, and even clearing the screen to a color does not work in all cases. On Win32 it works perfectly, including re-creating the swap chain, but on Linux X11 it renders and then crashes on XDestroyDisplay(). Such bugs are so annoying because they prevent you from continuing with other things :-(
Also, I don't understand why validation on Linux simply does not work (no instance extension detected), but on Win32 it works just fine — both have the SDK installed and it's the same system (multiboot) O_o
Hi The Cherno, thanks for this graphics programming content. Partially inspired by you, I've started learning Vulkan and D3D12, and I found VMA's brother, D3D12MA, and have now submitted pull requests for 2 features, all in CMake. I've followed the CMake path instead of the Premake you prefer, but I really thank you for introducing me to this content.
It could be a programming joke, but it's a real anecdote that your program was slow because of only one letter.
When I was doing my first Vulkan triangle, for some reason the triangle would not show on screen. I spent a month investigating, refactoring and changing everything to find what my problem was, until I found it: there was a VK_TRUE where there should have been a VK_FALSE, and that was it. Something like that flag you missed.
Was it the rasterizerDiscardEnable in the pipeline rasterization state structure by any chance? I spent a couple days on that stupid thing.
@@zoltankurti probably
For anybody reading this in the future:
Not only is it possible to store things like vertex buffers in system RAM, the data HAS to sit in RAM before it is copied to the GPU; it's just how computers work.
What you are really controlling in a graphics API is hinting when and how you would like to pass around data like vertex buffers, and Cherno accidentally set it up to basically recopy the vertex buffer on every draw command. I took one look at that profile, saw the high Async Copy Engine usage, and immediately realized it was excess, repeated data copying. I had my suspicions before that, but that's only because CPUs and GPUs are so fast that literally 99% of performance problems are data copying from backing memory. Calculations are often basically free in comparison to IO.
I call it a hint because the graphics API is just that, an API, and the underlying OpenGL/Vulkan/D3D implementation has an active program running in the background on the CPU that governs the actual behaviour of the graphics context. And whilst you might request it to do some copying, it'll just do it when it makes sense, so long as it conforms to the spec. Technically, on some systems you can use DMA (Direct Memory Access) to write directly into a GPU buffer, but this is objectively worse than having a well-orchestrated graphics context manage the copy when it is sensible. Anyway.
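In raw Vulkan terms, the hint being described comes down to a couple of flags. This is a sketch of the relevant constants only (no allocation calls), showing the usual static-vertex-buffer pattern: fill a host-visible staging buffer, then copy once into device-local memory:

```cpp
// CPU-side staging buffer: host visible, filled via vkMapMemory + memcpy.
VkMemoryPropertyFlags stagingFlags =
    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
VkBufferUsageFlags stagingUsage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;

// GPU-side vertex buffer: device local, written once via vkCmdCopyBuffer.
VkMemoryPropertyFlags vertexFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
VkBufferUsageFlags vertexUsage =
    VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;

// Picking HOST_VISIBLE instead of DEVICE_LOCAL for the vertex buffer itself
// is the "one flag" mistake: it still renders correctly, but every draw
// streams the data back across the bus.
```

With a library like VMA the same intent is expressed through its usage enums rather than raw property flags, but the underlying memory-type choice is the same.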
great video
welcome to Vulkan memory flags xD
complexity explosion xDD
then there goes depth-buffer flags... changing single flag you can have 100x faster performance to 100x slower
Vulkan is clearly toying with you at this point. My condolences to you, good sir.
This is totally the best video I have seen on youtube so far! I loved every part of it :D Absolutely amazing story!!!
that's why console development is actually easier: shared memory, baby. no pushing stuff between CPU and GPU memory... making games is NOT easy
To answer the question at the end, yes, videos like this are helpful :")
New movie: Vulkan detective story 😎. But the truth is we've all been there. I experienced something similar with RegEx: after I dropped RegEx from the most critical parts, I gained 50% performance in a few hours. This inspired me, and next week I'm going to do some perf tuning. 🤓🚀
You are right, that should have come up as a warning message.......
Like "Are you sure you really, really, really wanna do this?" Haha
Once, I spent about two days trying to fix my texture coordinates - turns out, I accidentally set the 2D texture coordinates as a 3D GLSL layout...
Not as annoying as his mistake, but still hilariously dumb
I don't understand why you didn't just implement staging buffers on a transfer queue in the first place. They're in the Vulkan Tutorial. 30 minutes of my life I'll never get back.
A bit late to the party, but I got hung up in the "#type" preprocessor directive in the shader, which doesn't appear in any documentation. Is this just an addition of your own, to let the engine split one file into respective shader types before compiling them?
I think people could easily jump to the conclusion that Vulkan is worse.
I spent the last week figuring out why my OpenGL went from working to a black screen.
It was because a buffer became unbound during rendering; I had to spend so much time in RenderDoc trying to figure that out.
the moral is, new technologies are hard to figure out and always take time lol
I just spent 15 mins (at 2x speed) for Cherno to change a flag
…
It was still nice to learn about Nsight though. Thanks for the video
I've had multiple of those small things in my Vulkan renderer, fortunately never this one, I followed a tutorial while implementing mine that used staging buffers. (: Anyways - What differences did you notice when switching to the Vulkan Allocator? Just lifting the number of allocations limit? Or was there any performance differences? Implementing that allocator has been on my to do list for some time now. (:
1:22 This is something that bothers me deeply and I _hate it._ If you're going to watch a coding video on your silly 4in phone *don't complain that you can't see crap!* I am watching it like a normal person on my PC and I can't see crap because the presenter has to blow up the text to idiotic proportions so you can see something on your puny Apple watch or whatever. *STOP IT!* Get off the damn toilet and watch this at your computer.
Why do we use copy paste?
You know why? Because if we didn't we'd never learn how to use profiling tools.
LoL!
Let me ask you something!
Why is it even rendering when you are not moving the camera???
It's just recreating the same picture over and over again.
I feel you! I knew what was coming when you showed us that PCI throughput graph :) I love Vulkan and the control you get, but with great power comes great responsibility. Looking forward to your next video.
You do know that you could just lower the resolution in Windows rather than developing a feature? xD I know many who screen-record tutorials do that — they lower the res but use scaling so it's still sharp.
Yeah.. a really painful problem to solve. You just can't re-read your entire engine to spot the bug; sometimes only a person with a fresh view of the problem can spot the issue. This happened to me tons of times.
As soon as I saw your GPU trace with PCI at 95% I was like, his vertex buffers are stored on the CPU :D
how do I draw a cube in OpenGL and control its faces, like how cube rendering works in Minecraft?
please help
I wish it would’ve helped me with my problem but sadly I am on OpenGL only :(
Don't worry, you can accidentally do scary stupid slow things in OpenGL too.
Wow, a cool friend who right off the bat knows what is wrong with your homemade engine. I need a friend like that.
Just today I also spent a lot of time finding out that my problem was likewise in some enum, in the CreateWindowExW function
When the video doesn't immediately start with "hey what's up guys, my name is the Cherno", you know something is up
My code doesn't work; I'm a worthless clown staring at a text editor all day.
Lesson learned, having a friend who works in EA is very helpful XDD
Love this video. I always tell young developers, "no matter how long you've been programming, you will make silly mistakes." Unit and regression tests are your friends. If you don't have good test coverage, simple typos will bite your butt.
I'm also a clown, and they keep paying me. No worries.
Looks like C++. Can anyone confirm? (I'm a newbie)
Can you look at the new screenshots of Halo Infinite and analyze them pls
Is lighting an entity, and do you have a rendering component for each model?
meanwhile me, getting 50ms frame times with 2k triangles in OpenGL
What a legend
Please make a video for internal hacks on assault cube
Hey Cherno. Which advanced C++ book do you recommend?
Ok, now vulkan looks scarier than before...
I laugh, but I too shall experience this pain.
Did you say "dev log monday" or "deadlock monday" at the end?
thanks for the large font
24:43 it is actually possible to allocate an OpenGL buffer in CPU memory with glBufferStorage and GL_CLIENT_STORAGE_BIT
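To illustrate that comment, here's a minimal sketch of the OpenGL 4.4 call. `sizeBytes` and `vertexData` are placeholders, and this assumes a current GL context:

```cpp
// GL_CLIENT_STORAGE_BIT hints that the buffer's backing store should live in
// client (system) memory — deliberately reproducing the same slow path.
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferStorage(GL_ARRAY_BUFFER, sizeBytes, vertexData,
                GL_MAP_WRITE_BIT | GL_CLIENT_STORAGE_BIT);
```

Note that unlike Vulkan's explicit memory types, this bit is only a hint: the driver is free to ignore it and keep the buffer in VRAM anyway.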
Well, thanks to you, now these Vulkan buddies will improve, I guess xDDDD idk...
Why "A" cherno ?
Please explain
I'm your biggest fan!!!!!!!!!!!!! You are pretty amazing; I'm a pro now, I'm done with your C++ series. I'm starting your game engine series. My plan is to try to learn C++, Java, Linux, and HTML. My brother-in-law mentioned C++ is the first language to learn, that it's simple. He's right. I'm a pro!!!!!!!!!!!!!!!!!!!!!!!!!!!! Thank you for all you do for me. I wish you good luck in life and programming.
MUAHAHAHAHAHA, you just showed exactly what I went through with slow Metal API rendering. We rewrote the iOS part from OpenGL to Metal, and for some reason the app was somehow managing 60fps on iPhone X and higher (with no multisampling) yet ~20fps with 4 samples. In the end, the problem was that we were doing a STORE at the end of each render pass + a LOAD at the start of the next one, and the program could have about 6-10 of those. Simply removing that reduced the memory bandwidth a lot and got it to nearly 40-50fps. So it really was saving the framebuffer data to the CPU and then loading it back to the GPU kek