the 30 second parallel bitonic merge segment took me 15 hours to edit
anyways what was your favorite cat clip
the cat one definitely
I actually wanted to listen to your optimization explanation cuz optimizing is something I struggle with immensely but your cat clips were too distracting :(
the one with the cat
gotta be cat clip #2
ill watch it again if it makes you feel better
Graphics Programmers: My program renders an image for 3 days, but I was able to shave off 4.3 hours, I'm so good at it
Real-Time Graphics Programmers: I just wasted a quarter of a millisecond that was reserved for pathfinding calculations, my manager is going to kill me
“My optimized bogosort only takes one billion years to finish sorting now, down from 1.1 billion, it’s so much faster”
@@Numbabu to be fair that is 100 million years less than before
@@Numbabu the optimization? remembering not to choose the same random numbers as last time
@@Numbabu bogo sort lowkey underrated, literally the fastest sorting algorithm if you're lucky enough
@@senzmaki that's like saying "casting fireball on dnd is underrated, it creates a portal to the hell realm if you're lucky"
In art class, I opened my photos in Notepad++ and wrote some stories. My teacher was utterly confused as to how I'd intentionally altered a pic to make it "glitched." They're just numbers in an order, and I changed it.
niice
did you just straight up type into it?
Like Counter: 40 (in case yt hides it)
I am wondering what that looks like. I'll have to try that out at some point.
I did that too, way to go! You can really get a variety of effects depending on the type of file: bmp, png, jpeg, and probably a bunch more interesting lossy and lossless compression algorithms. jpg is really good if you want lossy compression artifacts, and for lossless compression you can introduce major glitches into a png file. You can also mess with a pure uncompressed bitmap if you want more precise glitch manipulation. Try them all out, and remember that there are different types of .bmp to choose from
@@Cyberfishofant Yep, you can open an image in a text editor and just start typing. Or even manually write an image, but most formats are compressed.
I would like to point out that the effect would be great for gameplay if it were only applied to certain objects in the game, via a two-color buffer that shows which areas are allowed to be affected by the effect before the contrast map is applied to those areas, allowing the effect to be limited to certain portions of the image.
the uses would be awesome.
you could have a character's skin glitching out, but their clothes stay stable, you could have a sword that leaves behind a glitchy trail in the air, and so on and so forth.
A game mechanic that allows you to interact with "glitched" objects in a specific way would be pretty damn cool, it'd definitely be an eye-catching indicator.
considering it has to sort fewer pixels too, if the screen isn't covered in areas to pixel sort I'm sure it'd get to 2 ms or less
I would love to see a game made with this filter. something like psychological or surreal horror. I think this filter paired with good sound design and a decent art style would make for a really mind bending and fun game
I think he needs to blur the final stage a bit, but interesting..
cyberpunk isn't having it?
cyberpunk kind of implemented it when the relic effects affected the player.
Not a horror game but Splatter uses effects like this very often
Cyberpunk is your game if you like existential horror
I'm not gonna retain any of this information, but I always feel like acerola got to learn so much in making each video! This was really interesting to watch!
just yesterday Andreas Kling in his 1000th video talked about consumption patterns and how he found it hard to retain information from watching tech videos himself. he mentioned trying taking notes about the videos and how that led him to being more selective in what to watch. not sure I'd ever have the discipline but found it interesting anyway.
as a 3D graphics student, this comment is every day of my life lmao
Me neither, especially because I was 100% focused on the cats
the cat cams were the best parts
i have never and likely will never need to create graphics shaders, yet these videos are so endlessly entertaining and informative that i can't stop watching
i love learning about other programming specialties through vids like this
But because of this vid, you were there to witness how someone else was! And it was *so* cool
hello fellow void pfp
I mean, cat videos, right?
i am no longer void pfp
I read the title of the video as “I tried snorting pixels” - so you’re welcome for your next video idea.
They make me feel like... So digital... Like fingers, yo!
I almost read the channel name as areola
Came for the programming knowledge, stayed for the reminders that the monogatari series live forever in my heart
The editing is very specifically cultured and I now want more.
Actually one of the most beginner friendly descriptions of Compute Shaders i have ever heard.
You should do more with them. I had a hard time when learning Compute Shader Concepts the first time.
In animation, pixel sorters are a great way to create scene transitions with this glitch aesthetic! It's awesome to see how they work
What is "black scene"'s importance
@@kindauncool I don't think it is anything of importance.
@@kindauncool not sure if you still care, but it's a reference to the monogatari anime series
Babe, wake up. New Acerola video just dropped
Ok honey
@@lolcat69 😘😘😘
Just 5 more minutes ok
Babe, wake up, this comment trend started years ago
I'm so fucking sick of seeing this comment trend
I thought the title said snorting pixels
No Comments?
@@RivertedYT-1 comment? (start a chain dont break)
@@zoranradakovic2199 chain dont break
new coke flavor (i broke the chain, watchu gonna do about it?)
THATS WHY I CLICKED ON THE VIDEO, LIKE WTF IS PIXELS
If the parallel bitonic merge sort was fast enough you can also use it in the span sorting case: Do a first pass to assign to each pixel the index of its corresponding span, and then sort the entire column lexicographically first by span index. Lexicographical sorting can be accomplished by putting span index into some bits higher than the highest data bits of the sort keys.
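A minimal sketch of that key packing, with plain Python standing in for the shader and an assumed 8-bit sort value (the span indices would come from the first pass the comment describes):

```python
# Sketch of lexicographic sorting by (span index, value), assuming 8-bit
# sort keys. Packing the span index into the bits above the key makes one
# ordinary sort of the whole column keep every pixel inside its own span.

def pack_key(span_index: int, value: int) -> int:
    return (span_index << 8) | (value & 0xFF)   # value in the low bits

column_values = [200, 10, 180, 90, 30, 250]
span_indices  = [0,   0,   0,   1,  1,   1]     # from a hypothetical first pass

order = sorted(range(len(column_values)),
               key=lambda i: pack_key(span_indices[i], column_values[i]))
print([column_values[i] for i in order])        # [10, 180, 200, 30, 90, 250]
```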
Oh boy time for another sorting rabbit hole.
Also having Nekopara as the associated visual for a "video game" just killed me every time.
I think there IS an improvement on the shader code to be made: If your sortvalue buffer is of a known data type (e.g. uint8), I think you can use a radix sort - which should be a lot faster than your current alg.
it would still be single threaded radix sort which would be a yikes but I should try it yeah
@@Acerola_t radix sort is like O(3n) space and O(7n) time, and we won't even need to transfer the buffer to group memory since each value in the buffer is accessed only once.
Its only problem is that it's slower than the current sorting alg for small spans.
@@Sloimay why not both, then? The control mask already has the span lengths!
Each group can decide its preferred algo (AFAIK doing this means that you should not even try to do more than one thread per group, GPUs are SIMT machines, right?)
@@IgnacioLosiggio imo probably not worth the effort. As long as the short spans don't take longer than the long spans, it doesn't help to optimize them.
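For 8-bit sort keys, radix sort collapses to a single counting-sort pass: one histogram, one 256-entry prefix sum, one scatter. A CPU-side sketch of the idea discussed in this thread (illustrative Python, not shader code):

```python
# Counting sort over 8-bit keys: O(n) time, a fixed 256-bucket histogram,
# and stable, which matters if the pixels carry more data than the key.

def counting_sort_u8(values):
    counts = [0] * 256
    for v in values:
        counts[v] += 1
    starts, total = [0] * 256, 0
    for i in range(256):        # exclusive prefix sum of the histogram
        starts[i], total = total, total + counts[i]
    out = [0] * len(values)
    for v in values:            # stable scatter into the output
        out[starts[v]] = v
        starts[v] += 1
    return out

print(counting_sort_u8([250, 3, 77, 3, 200, 0]))  # [0, 3, 3, 77, 200, 250]
```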
There exist efficient `stable_sort_by_key` algorithms for the GPU.
The solution here would be to sort the original pixels using a key.
We can calculate it by first running a prefix sum over the mask, giving us 'spans' each filled with a unique index.
Multiply the prefix_sum by the maximum value that will be used in the sort (ensures a pixel can't 'escape' its span).
Then, calculate the sort key using (original value * mask + prefix_sum).
All the masked-out values will have the exact same key within a span. All the masked-in values will retain the same local delta within a span. Each span's maximum value is guaranteed to be less than the following span's minimum value.
Using a stable sort ensures that although masked out values have the same key value, their order doesn't change.
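One concrete reading of that recipe, sketched in Python (the row, mask, and scale constant are made up for illustration; `sorted` plays the role of the GPU `stable_sort_by_key`):

```python
# mask is 1 for pixels to sort, 0 for pixels that must stay put. A prefix
# sum over the masked-out pixels gives every region a non-decreasing index;
# scaling it past the key range keeps spans disjoint, and a stable sort
# leaves equal-keyed frozen pixels exactly where they were.

row   = [7, 50, 10, 99, 8, 60, 5]
mask  = [0,  1,  1,  0, 0,  1, 1]
SCALE = 256                        # larger than any sort value

prefix, acc = [], 0
for m in mask:                     # inclusive prefix sum of (1 - mask)
    acc += 1 - m
    prefix.append(acc)

keys  = [row[i] * mask[i] + prefix[i] * SCALE for i in range(len(row))]
order = sorted(range(len(row)), key=lambda i: keys[i])  # sorted() is stable
print([row[i] for i in order])     # [7, 10, 50, 99, 8, 5, 60]
```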
As an engineer with over a decade of experience in a completely different part of the field (distributed systems), graphics and shader programming has always felt like magic to me. Very cool to see it broken down like this.
I'm a video editor who loves glitchy effects and learning about how pixel sorting actually works has been very entertaining. I love how the post-processing workflow is so similar to doing VFX and colour correction too lol
how would you implement this as a sort of Adobe or AE plugin? is it possible? Please get back to me
There’s another one called PixSort as well
13:42
Mr. Rola, as in, short for acerola.
I legit fucking died here, omg.
You would get an effect that looks very similar if you simply set a hard number of samples per span. Because they're sorted they tend to look like gradients, so even a simple min and max of a span would look very similar.
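A toy version of that approximation, assuming a span of 8-bit values (the helper name is made up):

```python
# Instead of sorting a span, sample only its min and max and repaint the
# span as a linear ramp; sorted spans read as gradients anyway.

def ramp_span(span):
    lo, hi = min(span), max(span)
    n = len(span)
    return [lo + (hi - lo) * i // max(n - 1, 1) for i in range(n)]

print(ramp_span([90, 12, 240, 55]))  # [12, 88, 164, 240]
```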
I'm pretty new to shader programming and have no idea of 90% of the shader optimizations you were talking about, but just so I didn't feel dumb, I covered the cat videos with my hands in order to avoid getting distracted!
Also, instead of having groups that have already finished a thread wait for the others to finish, can't you divide the spans into 2s (two spans a thread) so you can roughly get 2 spans done by one group at the same time as the others? It may sound weird because I'm probably mixing up some of these terms
That would make the worst case twice the amount of time that it is now (bc you have no way to prevent a group of 2 spans from being 2 worst-case ones). Right now we have the same number of active groups as there are spans in the image. Reducing the number of groups is not going to speed the whole thing up, bc GPUs can easily do thousands of things in parallel. On a CPU implementation this would be a better approach, bc CPUs can't run that many threads in parallel, so there'd be a queue of spans still to be processed.
@@sephdebusser So it essentially makes the best case twice as good and the worst case twice as bad?
@@dotdotmod no, bc best case scenario of the original is still the time of one group processing a single small span. In your case, the best case is one group processing two small spans. Still double the time
The trouble is knowing you need to do that, because it's the CPU that starts threads, but the GPU that knows how many threads are needed.
It is doable, I'm pretty sure, but I think a better approach would be to try to figure out how to apply quicksort, O(n log n) is a lot better than O(n²). I think you could maybe pull it off by coloring the entire span in the thread mask with a span id instead of the start index having the length? Damn, now I want to try and figure this out.
@@SimonBuchanNz "Damn, now I want to try and figure this out." This video is such a nerd snipe, yeah.
Thank you for providing a full, in-depth explanation of how everything works. Graphics programming is a niche field and basic theory is hard to come by when searching the internet.
I read the title as “I tried snorting pixels” and thought this was going to be a trip report on some new compound
I didn't understand 95% of the video.. but i feel smarter somehow.
I think they used an effect like this in cyberpunk. It would actually be useful for small segments of play in a sci-fi game where the matrix is glitching.
FINALLY THERE'S A VIDEO ON THIS!!!!!! LET'S GOOOOOOOO!!!!!!!!!
this effect is such a banger. they should implement it into every competitive game ever made.
you're so professional for displaying the other videos for each technique you applied toward the end of the video
you may be able to speed up the algorithm by using the parallel prefix sums algorithm to calculate the spans of each row. If you have enough compute cores, it can drop the time for creating this mask from O(n) to O(log n). Also, once you build the span mask using prefix sums, there are some handy parallel sorting algorithms that should let you divide up the sorting of each subsection without having to ask the CPU for guidance. If you want, I can see if I can find my pseudocode for this (I'm pretty sure we solved basically this problem in my parallel algorithms class), but I haven't worked much with shaders, so you're on your own translating it
heya, this is crazy interesting.
do you have the source?
@ian I think cs.wmich.edu/gupta/teaching/cs5260/5260Sp15web/lectureNotes/thm14%20-%20parallel%20prefix%20from%20Ottman.pdf covers prefix sums well and maybe www.dcc.fc.up.pt/~ricroc/aulas/1516/cp/apontamentos/slides_sorting.pdf for sorting?
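A serial Python simulation of the kind of scan those notes cover (the Hillis-Steele variant): log2(n) rounds, where every element's update within a round is independent and could run as one GPU pass:

```python
# Inclusive prefix sum in O(log n) parallel rounds. Feeding it span-start
# flags (1 where a span begins) yields a per-pixel span index.

def inclusive_scan(xs):
    xs, step = list(xs), 1
    while step < len(xs):
        # on the GPU, every lane performs this add simultaneously
        xs = [xs[i] + (xs[i - step] if i >= step else 0)
              for i in range(len(xs))]
        step *= 2
    return xs

print(inclusive_scan([1, 0, 0, 1, 1, 0, 1, 0]))  # [1, 1, 1, 2, 3, 3, 4, 4]
```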
Hey! I like the fact that you spend time explaining the history behind it all to us!!!
I agree. I also thought it was interesting that a woman was an early pioneer of sorting algorithms. The world was a lot more sexist in the 50s, so it's pretty impressive that she made the contributions that she did.
@@strongmungus Except that it really isn't that surprising when you know women ended up working in the early programming profession a lot, as back then it was kind of an extension of a data entry job or similar. Then it transitioned into an engineering job, which in turn was more popular among men.
13:10 where tf did my cats go >:(
17:09 In the end, the real time pixel sorter is the friends we made along the way.
I absolutely love the video and the idea. The one thing I have to ask for the next one is a short explanation at the start with example images. I wasn't sure what you were going for with the pixel sorting, since I thought you would do so over the entire image, thus rendering it entirely unrecognizable.
It wasn't until 4:42 that I finally understood what the goal was.
That tony the tiger picture is 100% certified cursed.
Came here to say that pixel sorting is something academic research has sort of looked at, it's just generalized into 2D grid sorting for visualizations (and def not real time lol). Pixel sorting is actually the use case I'm currently testing for my research into these methods 😅
Great video. All it needs is a quick historical recap of all sorting algorithms
I think they may have used pixel sorting shaders to do some of the effects in Cyberpunk 2077. The effect at 16:15 looks really similar to the effect when you're on the Net or while viewing the edges of braindances
Looks like the tarot too
Idea for making it possible to parallelize the pixel sort algorithm (keep in mind I have no idea what I'm talking about):
Instead of generating a texture with the start of each span encoded by the position of pixels, generate a texture where the value of each pixel in a column represents which segment of the column that pixel is in. Say there are two spans in a particular column; then the values of pixels in the texture from top to bottom would be a block of pixels with value 1, then for the extent of the first span, pixels would have a value of 2, then a value of 3 between the spans, 4 within the bounds of the second span, and 5 to the end. Also, if the first pixel is part of a span, then the first block of pixels should be even, so it can start at 0, to maintain the relationship that even numbers represent pixels within a span, and odd numbers represent pixels outside a span.
Then during the sort phase, instead of using a thread for every pixel, use a group of n/2 threads for every column of pixels, where n is the height of the columns. Then sort the entire column using the parallel bitonic merge sort algorithm, except make sure to first add 256 times the value from the corresponding texture location to the pixel's sort value, or use a second comparison between the two texture values. Either way, the sorting algorithm will sort the entire column, and the increasing index will prevent mixing of spans and inter-span regions by making each region's values all greater than the previous region's values, and all smaller than the next region's values, or else achieve the same by some other logic. Then, either within the sort logic, or on a separate pass, take the original (unsorted, or just don't swap in the sort algorithm if both tex values are odd) value for pixels with an odd-numbered texture value.
So, in the end, the CPU can dispatch the same number of threads to every column of the image regardless of the number of spans in each column, and you can do the parallel sort algorithm instead of single-threading them.
Would love to see Acerola try this. I saw someone else also suggest this same idea after reading all 300-something comments as of now.
For someone who has no idea what they're talking about, this is pretty spot-on to what I was going to suggest as a graphics programmer.
You would probably keep the ranges that don't want sorting by simply modifying the swap logic, so that either even or odd spans (depending on code design and/or artistic choice) evaluate as their pixel position instead of value (easiest way to preserve position). This would be preferable to a second access since you're already loading the texture and writing anyway, so copying the original values in would be less efficient by adding more accesses.
is parallel bitonic an in-place sort algorithm? If so, i'd not worry about spans per-se but just play around with functions you AND with the comparison logic to see what happens. Maybe only swap two values if their luminance has the same first 3 bits or if their original positions are within n pixels of each other.
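A Python sketch of the whole-column idea in this thread, assuming a power-of-two column length (the nested loops stand in for the network's parallel compare-and-swap passes; the region ids, values, and even-means-frozen convention are all illustrative):

```python
# Bitonic network over the full column. The compare key keeps regions
# separate, and frozen (even-id) regions compare by original index, so
# their pixels land back exactly where they started. Yes, it's in-place:
# every step is just a compare-and-swap between two fixed slots.

def bitonic_sort(items, key):
    a, n = list(items), len(items)      # n must be a power of two
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):          # each i is one thread's compare
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    ki, kp = key(a[i]), key(a[partner])
                    swap = (ki > kp) if ascending else (ki < kp)
                    if swap:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

values = [30, 200, 50, 9, 180, 20, 90, 60]
region = [0, 1, 1, 1, 2, 2, 3, 3]       # even = frozen, odd = sortable

def sort_key(pair):
    i, v = pair
    return (region[i], i if region[i] % 2 == 0 else v)

result = bitonic_sort(list(enumerate(values)), sort_key)
print([v for _, v in result])           # [30, 9, 50, 200, 180, 20, 60, 90]
```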
Your videos are amazing. You're good enough at coding to talk about it in an easy-to-understand way, you structure your videos in an easy-to-digest and entertaining way, and your editing complements it all perfectly.
If you think the techniques used in this video make you uncomfortable, check out per-pixel linked lists! The shaders for those look like they were written by somebody who doesn't quite understand shaders yet. Including 'for' loops over every "fragment"
Never did graphics programming, but what about the following solution?
Calculate for each pixel whether it is inside a span and if so, what span number.
Then use the bitonic sort with the following comparator:
If pixels are both in a span and have the same span number then compare them by value. Otherwise, compare them by index.
That was my thought as well! We could even combine the process of computing the simplified sorting value into span marking to avoid excess data writing.
@@donaldhobson8873 right
I haven't seen the video yet but can't you just get the RGB average of every pixel and then just sort that
Would love to see a follow up with this approach. Someone else suggested this in another comment too after reading through all 300-something comments
7:12 you can't keep teasing me with that tiger
14:43 waittt i was literally thinking "hey this looks like a serial experiments lain edit" this ENTIRE time
Instructions unclear: I ended up sorting kitties by cuteness, but they were all so cute that my GPU melted.
cuteness overload
"tasteful chromatic aberration" was up there in hype quotes for this video, just below "lets switch over to FFXIV"
I really dig this aesthetic that you created from this. It's really an awesome vibe. The amount of creativity that could be utilized with this is huge, especially with analog horror and other avant-garde art styles.
I'm sorry, but this kind of algorithmic manipulation of discrete image pixels is exactly the opposite of analog horror. It literally could not be more digital. Like I know this is a nitpick, but not everything glitchy is analog horror!
0:06 man of culture I see
10:23 no way💀 bro really pulled the subway surfers and family guy combo on us
that cat trick to retain my attention and not abandon the video worked very well
genius, i used to work a lot with pixel sorting processing scripts back in the day but this is kind of a game changer. Running it on the GPU in realtime is quite a challenge to overcome, love the content and thanks for sharing the code!
As much as I love your cat, it's getting in the way of all the interesting graphics programming optimization visuals.
I crack up completely from these videos. Love how you manage to keep it entertaining while still being so informative.
Considering you set a cap on the span length anyway, would it make sense to make an indirect dispatch with one threadgroup per span to bitonic sort them instead of just running one thread per span?
In the absolute worst case yes, since you know the whole image is being sorted, but like, you could just do a full bitonic merge sort if you want to sort the whole image, whereas I wanted to only sort the spans in the mask, which the cpu can't optimally dispatch groups for.
I may have missed the point, but my idea wasn’t that you would dispatch the groups from the cpu. You would do an indirect dispatch based on what you get from the pass that generates the span-mask you had. Instead of writing that mask, you could count the number of spans (using some atomic counter) and store that in an indirect-buffer. For each unique span you also store the start pixel and the number of pixels that span covers. Then you can just dispatch the number of groups (one per span) indirectly using the indirect buffer. So the cpu does not need to know anything about spans, and that data is kept on the GPU :)
Are there reasons that would not work?
I was looking through the comments to see if anyone had mentioned this. IndirectDispatch would work; you can even bucket a few fixed span lengths and dispatch shader variations, 1 threadgroup per span, so you could do the bitonic sort in parallel. The problem is not dissimilar to tiled classification in deferred shading.
First video I've seen from this channel, but lots of fun :)
I was so damn happy watching this, I never thought I'd see an in depth exploration of my niche interest like this! I do Glitch Photography and I'm so passionate; I almost cried when you mentioned Kim Asendorf!
my brain stopped working after you put 4 cat videos on the screen, congratulations you overloaded my brain with a cat
The thumbnail made me think that this was gonna be a deep dive into the human psyche and the limitations of the human brain, but this is cool too
I've seen similar looking effects in some glitch art communities, both in image and video, but never in a live rendering shader! Very cool video!
Incorporating this into a game itself instead of as a filter would be interesting. Imagine if you're playing as a character who's had a memory-altering implant for something like schizophrenia, and slowly start to see phantoms appear from a malfunction of the memory device.
The character could slowly begin to realize the device was hiding the true reality all along, and fight to discern what the world is really like beyond the deep-fried digital veil. There could be quests exploring previously invisible areas, and a neverending struggle to balance the mental illness and the glitchy device.
a scanner darkly?
13:36 I was already enjoying the video, and then the MGMT reference made it even better
every time i see that thumbs up tony the tiger image my mind numbs a little
it's like a painkiller for the math parts of the videos
Great video! I think I know of an actual efficient solution:
0. You create your mask as you do;
1. You allocate a buffer of integers with elements for each pixel and assign 1 where there's a transition between neighboring segments; (1 operation per pixel; single dispatch)
2. For each row, you create a segment tree of sums from the buffer above; (Roughly log(row/column) size dispatches; can be reduced with shared memory)
3. For each pixel, you evaluate the segment tree from row/column start to the pixel index in that axis and store it in a buffer; this will give you "segment index" for each pixel in your desired direction (O(log(n)) operations per pixel; single dispatch)
4. Do the parallel bitonic merge sort per row/column as usual, but replace 'less' operator with (a.groupId < b.groupId || (a.groupId == b.groupId && a.value < b.value)); (Exact same O(N * log^2(N)) you had);
Don't think I'll be doing that, but have done enough stuff like this to believe this'll likely give you decent enough performance. Not sure about 2ms
Update: I think someone already mentioned something very similar, I have not really looked into the comments before commenting. Not entirely sure if using segment trees is new or not...
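Stripped of the tree machinery, steps 1-3 reduce to a prefix sum over transition flags; the segment tree only changes how that sum is evaluated on the GPU, not what it computes. A sketch with a made-up mask:

```python
# Mark a 1 at every mask transition; the running sum of those marks is
# the per-pixel segment index that the step-4 comparator needs.

mask = [0, 0, 1, 1, 1, 0, 1, 1]
transition = [0] + [int(mask[i] != mask[i - 1]) for i in range(1, len(mask))]

segment, acc = [], 0
for t in transition:     # on the GPU, this loop is the scan dispatches
    acc += t
    segment.append(acc)

print(segment)           # [0, 0, 1, 1, 1, 2, 3, 3]
```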
1:40 I wanted history tho ;-;
Same
I think it would be really interesting to separate the chroma and luminance channels, and sort only one of them before recombining.
1:39 - video skips past Betty Holberton
*pauses video to google Betty Holberton*
She was one of the ENIAC developers!
betty holberton is one of the greats
didn’t understand literally anything in this video, i loved it
the amount of buff cereal tiger is crazy
Acerola, this channel is so damn cool. I feel like you're pioneering the "white paper" of the modern age.
The cat videos did not help my ADHD
Acerola is farming watch time from how often I've had to rewatch parts after getting distracted by the cats
Your videos really helped my five-year-old learn about real-time image rendering, it's way better than the cocomelon version. She always asks for "the long hair man" (: thank you so much
my true target audience
Never thought Ted could be used to depict the raw emotion in regards to the CPU and GPU having communication issues with each other
You won me over at the moon album cover. Great video and amazing explanation. I really like the way you explain things in an academic manner while still keeping it funny and entertaining. Great video and great skills!
they did sorting on a pixel
indeed
multiple pixels in fact
I think a sorting algorithm for image data could be an interesting way to compress its size, even if it's not as efficient as other methods. Go through the image, list the colors as 24-bit values, and store the numbers they correspond to along with how many of each appear along each line.
I'm pretty sure PNG already does something similar with its lossless compression.
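A toy reading of that scheme, run-length encoding a row of 24-bit values (purely illustrative; as the reply above notes, PNG's filtering plus DEFLATE already exploits this kind of redundancy far more effectively):

```python
# Collapse each run of identical colors into a (color, count) pair.
from itertools import groupby

row = [0xFF0000, 0xFF0000, 0xFF0000, 0x00FF00, 0x00FF00, 0xFF0000]
rle = [(hex(color), len(list(run))) for color, run in groupby(row)]
print(rle)  # [('0xff0000', 3), ('0xff00', 2), ('0xff0000', 1)]
```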
9:27 what if i wanna watch both the explanation and the cat!! they're BOTH cool!!
10:23 OH MY GOD NOOOO
@@vintage08too many cats!
11:35 T O O M A N Y
i love the monogatari style editing you have for some scenes
I like how simple yet in depth the explanations were. I'm at university wanting to do this, and thankfully this was easy enough for me to understand but not so simple that I'd have known it without any context. Thanks!
That effect looks amazing! I can already think of usecases for it!
“Snorting Pixels”
I think you could use simple quicksort or mergesort (no thread division) to make the sorting of the spans significantly faster, maybe not 2ms fast but way faster
The sort "keys" are small one- or two-byte integers, so you could also try a radix sort. That's O(N), whereas the quicksort or mergesort algorithms are O(N log N).
Honestly blows my mind seeing how effects like these work behind the scenes. Loved playing around in After Effects using Datamosh 2 and AE Pixel Sorter 2, but to see a pixel sorter running in real time in a game engine while also being open source is mind blowing.
I've recently been trying to create a little suite of shaders in Unity covering the majority of effects that I use in After Effects, the main ones being datamoshing, pixel sorting and AE's Colorama effect (like a custom heat map effect, annoying me atm 💀). This pixel sorting video and the rest of your series of creating these custom shaders for FF14 has given me a hell of a lot of motivation and I can't thank you enough for that. 🙏
I do QA for rendering software and this is genuinely so much more interesting than I imagined. It's fascinating to take a look under the hood, even if it's not something the devs I work with would ever do.
11:37 he he he cat-egorised
12:06 sus
What if you use the mask's X (or Y) coordinate as a high significance value added to the value you're sorting. You'd use the start of the span as the number, so on black areas it would keep counting up but on white areas it would stall and create a span that all has the same high sig value added to it. Then you can use the Parallel Bitonic Merge Sort again.
Thank god he put another video of his cat on the other side of the screen! That way my eyes were drawn to the middle of the screen, so I could actually follow what was happening, truly a life saver!
finally someone got the vision
I haven't thought of sorting in that way.. great video!
I watched this video and I will never be the same! Wow!
0:29 ... bro what's up with that tony the tiger cropped yiff?
howd you know what it was? 🤨🤨🤨🤨🤨🤨🤨
oh hey a video that is not a short
lol I made the shorts cause those topics won the patron poll
This looks a lot like the Cyberpunk 2077 Braindance effect and Relic malfunction, but realtime.
This was beginning to lose hold of its position as my #1 favourite YouTube subscription, but then you rolled out the cat clips
Well done!
Did... Did you make a Monogatari reference throughout the video with the different colored scenes and chapters?
next you'll wonder where my name comes from!
@@Acerola_t ohhh. Didn't even think of that. Lol
10:55 help, I'm trying to pay attention but the cat cam's are distracting me
10:27 omg I just wanted to watch the video 💀💀 bro went full-on gen Z
I actually used this shader before finding this video. I had no idea what it was supposed to be for.
Now, after watching this video, I have even less of an idea about what it's supposed to be for.
Great job!
eyyy, Paradise Killer song, my favorite!
I appreciate the cats too. they helped with the whole paying attention thing.
Out of curiosity, have you looked at odd-even transposition as an alternative to the Bitonic merge sort? Their parallel runtime complexity should be about the same, but the implementation is simpler (more structured) and might therefore be more cache friendly.
They both have their uses
developer.nvidia.com/gpugems/gpugems2/part-vi-simulation-and-numerical-algorithms/chapter-46-improved-gpu-sorting
here's a cool article on it
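For reference, a sketch of odd-even transposition sort (note that its parallel depth is n rounds, versus the bitonic network's O(log^2 n), though each round is a single fully parallel, very regular pass):

```python
# n rounds alternating between (even, even+1) and (odd, odd+1) neighbor
# pairs; every pair within a round is independent, so each round maps to
# one GPU dispatch with a perfectly predictable access pattern.

def odd_even_sort(a):
    a = list(a)
    for r in range(len(a)):
        for i in range(r % 2, len(a) - 1, 2):  # all pairs swap in parallel
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_sort([5, 1, 4, 2, 8, 0]))  # [0, 1, 2, 4, 5, 8]
```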
10:44 What toy is that? (in the bottom left. the rod or whatever sticking out of the blue ball rolly toy) i think my cats would like it 🥺🙌
lol I just stuck one of the rod toys into the hole that was in the blue ball toy, so it's not an official thing you can buy.
16:15 legit looks like an effect in cyberpunk
That's what I thought! It looks like when they are *SPOILERS* inside Mikoshi and talking with Alt
You don't need to sort each span of "glitched" pixels separately. If you modify the sort key to impose a mostly-monotonic order on pixels, you can globally sort the row (with massive parallelism) and preserve the relative locations of each span. Consider:
A masked pixel (not sorted) at index i will have sort key 256*i. An un-masked pixel (to be sorted) at index i will have a sort key of (val) + 256*(largest j < i such that pixel[j] is masked). This value can be found via binary search in log(rows) steps, using conditional moves that should still play nicely with a shader.
Separate runs of to-be-sorted pixels within a row will occupy a disjoint interval of sort keys, such that sorting the whole row will preserve pixel placements within their respective runs. The massively parallel sorting network can be used without further modification.
The overall runtime for an image is then bound by the O(log(rows)^2) time required for the bitonic sort.
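A sketch of that key assignment (a running scan stands in for the per-pixel binary search; the example row, mask, and 8-bit value range are made up):

```python
# Masked pixels get key 256*i; unmasked pixels get value + 256*(index of
# the nearest masked pixel before them). Each run then occupies its own
# disjoint key interval, so one global sort preserves run placement.

row    = [12, 200, 40, 7, 99, 30, 250]
masked = [ 1,   0,  0, 0,  1,  0,   0]    # 1 = stays put

keys, last_masked = [], -1
for i, (v, m) in enumerate(zip(row, masked)):
    if m:
        last_masked = i
        keys.append(256 * i)
    else:
        keys.append(v + 256 * last_masked)

order = sorted(range(len(row)), key=lambda i: keys[i])
print([row[i] for i in order])  # [12, 7, 40, 200, 99, 30, 250]
```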
i don't understand a single thing but i love this video