2024 update: there have been breakthroughs in AI handling mesh data directly, so this probably won't revolutionize anything, but still pretty cool i guess
Care to share the paper's you are referring to? Great job on this video. Am looking to catch up in the 3D space. Been working in ML for a long time but am just getting around to looking at nerf/splat
I first heard about gausian splatting from this video like 3 months ago. And just recently started using gausian splatting in real world application. Shoocker, I know! I work at a fab shop that designs and builds custom automatic lumber processing lines for sawmills. When we finnish a project I have to take hundreds of photos and videos for future reference, and customer support. Gausian splatting has made it possible to scan and view assambled units, that couldn’t be done using photogrametry or laser scanning, because of the size and complexity. Thank you!
Only problem is he seems to ignore the fact that this method requires exhaustive amounts of image/photo input, limiting its application especially for stylized scenes/games, and uh... Doom didn't have shadows so I have no idea what he's smoking.
@@acasccseea4434It's implicit. He started with the fact that you "take a bunch of photos" of a scene. The brevity relies on maximum extrapolation by the viewer. The only reason I understood this (~90+%) was because I'm familiar with graphics rendering pipelines.
Maybe we could get a hybrid approach: all static elements in a scene as gaussians, and then dynamic elements like the player, NPCs and interactable objects as traditional polygon objects. Collisions between the two could be calculated using an invisible relatively low-poly version of what's in the gaussian scene.
That "unlimited graphics" company Euclidean was working with tech like this at least a decade ago. I think the biggest pitfall with this tech right now is that none of it is really dynamic, we don't have player models or entities or animations. It's a dollhouse without the dolls. That's why this tech is usually used in architecture and surveying, not video games. I'm excited to see where this technique can go if we have people working to fix its shortcomings.
I wonder how easily it can be combined with a traditional pipeline. It would be kind of like those old games with pre-rendered backgrounds except that the backgrounds can be rendered in real time from any angle.
@@shloop. I wondered whether you could convert a splattered scene into traditional 3D but I realised mapping any of this to actual, individual textures and meshes would probably be a nightmare. Maybe you could convert animated models to gaussians realtime for rendering, and manually create scene geometry for physics? For lighting, I imagine RT would be impossible as each ray intersection would involve several gaussian probabilities. As a dilettante I think it's too radical for game engines and traditional path-tracing is too close to photorealism for it to make an impact here
Basically "real life graphics" using real life object as the base of the image. Cons : we need like 256 GB of Vram to render this smoothly because all gaussian need to be at vram to render as smoothly as possible.
The VRAM is only needed in training. Things like Polycam and Luma use gaussian splatting, and they can render in realtime on a phone or browser. This is basically just a lighter version of a NeRF. From my experience, Nvidia's Instant NGP software is much higher resolution than the latest Gaussian Splatting software (and is a couple years older), but it requires a powerful GPU to render.
@tbuk8350 so would each individual end user need to train it or would it be trained by the devs before hand and then it's no longer a concern for the end user?
Not needing to wait 5 minutes for the video to play the channel intro, thanking the sponsor, trying to sell t shirts and mugs with the channel logo, begging the viewer to subscribe, and giving an unwanted history on graphics, before getting to the damn point is such a breath of fresh air
well that’s because this video is for people who already understand a lot, most people need that intro minus the sponsors to understand things and most big channels need sponsors to fund their operations
Yeah that's the problem with zoomer attention span. Content/information needs to be shortened nowadays to miniscule clips. There are other more lengthy videos about this topic, but sadly they get less views because of the said problem.
If you want to actually understand this stuff you have to sit down with several papers full of university-level math for a few hours, and that's if you already have an education in this level of math you can draw on. Generally if you feel like a popsci video explains a lot without really explaining anything the reason is that they skipped the math. TL;DR: Learn advanced maths if you want to understand sciency shit.
@@imatreebelieveme6094There’s a spectrum. On one end, there are 2 minute videos like this, and on the other end is what you are talking about. I think there can be a happy medium, with medium to long form videos that explain enough about a topic to understand it at a basic level.
this requires scene-specific training with precomputed ground truths, if this can be used independently for realtime rasterization that could be a big breakthrough in the history of computer graphics and light transport.
@@astronemir Miniature scenes ... we improved virtual rendering of scenes to escape the need for physical representations of settings, now we go all the way back around and build our miniature sceneries to virtualize them. Wild.
@@xormak3935 it wont always be like that. We are still developing these technologies. This kind of stuff didnt exist 2 years ago. And many thought AI was just a dream until less than 8 years ago. By 2030 im sure we will be playing video games that look like real recordings from real life
This technique is an evolution, one could say it's an evolution from Point Clouds. The thing that most analysis I've seen/read is missing is that the main reason this exists now is because we finally have GPUs that are fast enough to do this. It's not like they're the first people who looked at point clouds and thought "hey, why can't we fill the spaces *between* the points?" EDIT: I thought I watched to the end of the video, but I didn't, the author addressed this in the end :) It's not just VRAM though! It's rasterization + alpha blending performance EDIT2: you know what I realised after reading all the new things coming out about gaussian splatting. I think most likely this technique will first be used as background/skybox in a hybrid approach
Just because we have gpus that are powerful enough now, doesn’t mean devs should go and cram this feature in mindlessly like they are doing for all other features recently, completely forgetting about optimization and completely crippling our frame rate.
How could one fit this kind of rasterization into a game, if possible? This whole gaussian thing is going completely over my head, but would it even be possible for an engine to use this kind of rasterization while having objects in a scene, that can be interactive and dynamic? Or where an object itself can change and evolve? So far everything is static with the requirement of still photos...
@@MenkoDany I have not worked for any AAA companies and was investigating the technique for indie stylized painterly renders (similar to 11-11 Memories Retold) If you build the point clouds procedurally instead of training on real images, it is possible to get blotchy brush-stroke like effects from each point. This method is also fantastic for LODs with certain environments because the cloud can be sampled less densely at far away locations, resulting in fewer, larger brush strokes. In the experiment I was working on I got a grass patch and a few flowery bushes looking pretty good before I gave up because of exponential overdraw issues. Culling interior points of the bushes and adding traditional meshes under the grass where everything below was culled helped a bit but then it increased the feasible range to like a 15m sphere around the camera that could be rendered well.
While being way too technical for any normal person to understand. He immediately alienates the majority of people by failing to explain 90% of what is actually happening.
We have done this for 12 years already. It’s not applicable to games. Everyone trashed Euclideon when this was initially announced, and now because someone wrote a paper everyone thinks it’s a new invention…
Would be great for something like Myst/Riven with premade areas and light levels. It doesn't sound like this method offers much for dynamic interactivity (which is my favourite thing in gaming). It would be great for VR movies/games with a fixed scene that you can look around in.
You don't really need to render different lighting, you can merely shift an area of the gaussian to become brighter or darker or of a different hue, based on a light source. So flashlights would behave more like a photo filter. Since the gaussians are sorted by depth, you already have a simple way to simulate shadows from light sources.
@@peoplez129 hmmm, but that's not how light works. It doesn't just make an area go towards white, there's lots of complex interactions to take into account.
When people begin to do this on a larger scale, and with animated elements, perhaps video, ill pay attention. If they can train the trees to move and the grass to sway, that will be extremely impressive, the next step is reactivity which will blow my mind the most. I dont see it happening for a long time.
exactly. these techniques are great for static scenes/cgi, but these scenes will be 100% static with not even a leaf moving, unless each item is captured individually or some new fancy AI can separate them, but the "AI will just solve it" trope can be said about pretty much anything, so for now it's a cool demo
@@yuriythebestIs there any major obstacle to like, doing this process on two objects separately, and then like, taking the unions of the point clouds from the two objects, and varying the displacements?
@@yuriythebestYeah, and the wishful thinking exhibited in that cliche is likely really bad for ML. Overshilling always holds AI back at some point, think of the 80s.
@@drdca8263 if i'm thinking about this in my head, one thing i can think of is that the program has no idea which two points are supposed to correspond, so stuff would squeeze and shift while moving.
Yup - as soon as any object or source of light moves you'll need ray tracing (or similar) to correctly light the scene and that is when a lot of the realism starts to break down.
I literally don't care about the subject, but the way you did this... possibly the greatest YT video of all time. 2 minutes flat. No nonsense. Somehow hilarious. Kudos!
In the current implementation it reminds me of prerendered backgrounds. They looked great but their use was often limited to scenes that didn't require interaction like in Zelda: OOT or Final Fantasy VII.
My thoughts exactly. Ok, well now do the devs have to build a 3D model to lay "under" this "image" so we can interact with stuff? And what happens when you pick up something or walk through a bush? How well can you actually build a game with this tech?
The funniest thing about Ocarina of time was: There was no background. At least, not really. What they did is they created a small bubble around the player that shows this background. There is a way to go outside of the bubble and see the world without it. I bet there's some videos on that, it's very fun and interesting.
@@Nerex7 Are you talking about the 3D part of the OOT world or do you refer specifically to the market place in the castle with its fixed camera angles?
I'm talking about the background of the world, outside of the map (as well as the sky). It's all around the player only. @@gumbaholic It's refered to as a sky box, iirc.
@@Nerex7 I see. And sure OOT has a skybox. But that's something different than pre-rendered backgrounds. It's like the backgrounds from the first Resident Evil games. It's the same for the castle court and the front of the Temple of Time. Those are different from the skybox :)
I absolutely love this style of informative video. Usually I have to have videos set to 1.5x because they just drag everything out. But not here! Love it.
Honestly, this video is *perfect* for people like me with ADHD and irritability. No stupid filler, no "hey guys", no "like and subscribe". Just the facts, stated as quickly and concisely as possible. 10/10 video.
I'm 40, and I hate watching videos instead of reading a text. Even if it's a 2 minute video on how to open the battery compartment (which is, frankly, a good use case for video). I really don't want to wait until someone gets to the point, talks through a segue, etc. This is closer to reading, very structured and fast. Wouldn't equate it with short attention span.
Ahh yes, another case of the older generation hating on the younger generation. I remember the exact same said about Millennials and Gen Y. A missed opportunity for unity and imparting useful lessons. Please see past your hubris and use the experience to create.
I'm not sure I agree. This video to me is like an abstract high level of the concept. It gets to the point. This is in stark contrast to a bunch of tech videos that stretch to the 10 minute mark just for ads, barely ever making a point. It's a good summarization. Saving time when trying to sift through information does not necessarily equate to short attention span.
At the moment, photogrammetry seems a lot more applicable as the resulting output is a mesh that any engine can use (though optimization/retopology is always a concern) whereas using this in games seems like it requires a lot of fundamental rethinking but has the potential to achieve a higher level of realism.
Like I saw in another video (I believe it was from Corridor?), this technique is better applied when re-rendering a camera recorded 2D path, using this technique, and then have a new footage but without all the shakiness of your real 2D recording. Kinda sucked to explain it, but I hope you got it.
This looks a lot like en.wikipedia.org/wiki/Volume_rendering#Splatting from 1991, I wonder if there is any big difference apart the training part, also I know everybody said the same, but your editing is so cool. It's so dynamic, yet it manages to not be exhausting at all
It is close but the new technique optimizes the Gaussians (both the number of gaussians and the parameters) to fit volumetric data while the other one doesn’t, leading to a loss of fidelity. Please correct me if I’m wrong, I haven’t actually read the old paper.
This could be used for film reshoots if a set was destroyed, but in a video game the player and NPCs would still need to have some sort of lighting/shading.
I watched a video about AI that confirmed this is already a thing. Not at this fidelity but tools already exist to reconstruct scenes from existing footage, reconstruct voices, generate dialog. Nothing in the future will be real.
It's more niche than photogrammetry because there's no 3D model to put into something else. But with a bit more work I'd love to see this be a feature on a smartphone.
Perhaps the process could be repeated with a 360 degree FOV to create an environment map for the inserted 3D model. Casting new shadows seems impossible though.
@@EvanBoldt "Casting new shadows seems impossible though." => It really depends. If your footage already has shadows, it'll be difficult. However, if your footage DOESN'T contain shadows, just add some skybox/HDRi, tweak some objects (holes, etc) and voilá.
@@fusseldieb If you think realistic graphics require only "some skybox/HDRi and tweaking some objects" You have a lot to learn, especially when it comes to required textures.
Please be the 2MinutesPaper we deserve (without fillers, making a weird voice on purpose and exaggerating the papers). Good stuff, really liked the video.
This is the first time I've seen an editing style similiar to Bill Wurtz that not only DIDN'T make me wanna gouge my eyes out, but also worked incredibly well and complimented the contents of the video. Nice!
Your video is spot on! HTC Vive/The Lab were the reasons why I got into VR. I loved the photogrammetry environments so much that it's my hobby to capture scenes. Google Light Fields demos were a glimpse of the future, but blurry. These high quality NeRF breakthroughs are coming much earlier than I thought it would be. We will be able to capture and share memories, places... it's going to be awesome! I don't know if Apple Vision Pro can only do stereoscopic souvenirs capture or if it can do 6dof, but I hope it's the latter :,D
The best thing about this technique is that it is not a NeRF! It is fully hand-crafted and that's why it beats the best of NeRFs tenfold when it comes to speed.
@@orangehatmusic225 There is no AI it that new tech, just good ol' maths and human ingenuity. TwoMinutePapers has a great explanation, but if you don't have time to watch it now, I hope a quote from the paper can convince you: "The un- structured, explicit GPU-friendly 3D Gaussians we use achieve faster rendering speed and better quality without neural components."
@@orangehatmusic225Ah yes, the only and primary reason people throw hundreds of dollars into VR equipment is to jerk off, so viciously in fact that it would splatter the headset itself. For sure, man, for sure.
I wonder how we'll get dynamic content into the 'scene'. Will it be like Alone in the Dark where the env is one tech (for that game, it was 2D pre rendered) and characters are another (for them, 3d poly). Or if we will create some pipeline for injecting / merging these fields so you have pre-computed characters (like mortal combat's video of people). Could look janky. Also I don't see this working for environments that need to be modified or even react to dynamic light, but this is early days.
well, it could already work in the state it's in right now for VFX and 3D work, even though you can't as of now inject models or lighting into the scene (to my knowledge), you could still technically take thousands of screenshots and convert the nerf into a photoscanned environment, and then use that photoscanned environment as a shadow and light catcher in a traditional 3D software, and then use a depth map from the photoscan as a way to take advantage of the nerf for an incredibly realistic 3D render, that way you can put things behind other things and control the camera and lighting, while still taking advantage of the reflections and realistic lighting nerfs provide
Yes the issue here is that these scenes are not interactable because the things in them are not 3D objects, they are mere representations from your particular perspective. Dunno how they would solve those problems (which is probably why we won't see it in games anytime soon if ever)
I don’t know much about rendering but this sounds so smart. You take a video, separate the frames, do the other steps, and now you have this video as a 3d environment
This would be an amazing thing for AR as you could use that detailed 3d model of an arbitrary room or place you are in and augment it with virtual elements. Or to have photorealistic environments in VR without needing a AAA production team. Imagine making a house tour abroad in VR where the visuals are rendered photorealistic in real time
@@redcrafterlppa303 "Or to have photorealistic environments in VR without needing a AAA production team" -> This already exists. Some people use plain old photogrammetry for that
It still is photogrammetry, just rendered with unlit shaders, with all its flaws too. Getting the required image material is tedious and we still won't receive specular information.
For now it’s niche. I imagine it could be used in games blended with traditional rendering pipelines. E.g. use this new method for certain areas that need a light level of detail.
Interesting!!! I always think back to when I was doing my degree in the 90s, and Ray Tracing was this high end thing PhD students did with super expensive Silicon Graphics servers, and it was always a still image of a very reflective metal sphere on a chess board with some kind of cones or polyhedra thrown in for kicks. It took days and weeks of render time. About 25 years passed between when I first heard of ray tracing and when I played a game with ray tracing in it. I might not be 70 when the first game using a Gaussian engine is released, but I wouldn’t imagine it happens before I’m 60. Still very interesting, though!!!
I don’t think it’ll be used at all personally but I’m stupid so we’ll see, I don’t see how this is better than digital scanning if you have to do everything yourself to add lighting and colission, someone said it could be used for skyboxes and I could see that.
Ray tracing was also popular on the Amiga, these chess boards, metal spheres etc would crop up regularly in Amiga Format, (I've no idea how long it took to render one on the humble Amiga) some of them were a bit more imaginative though and I remember thinking how cool they looked and wondering if games would ever look like that, tbh I'm a bit surprised it actually happened...I'm not that convinced I'm seeing the same thing here though, how does any of this get animated? It's already kinda sad that it's 2023 and interactivity and physics have hardly moved an inch since Half-Life 2, the last thing we need is more "pre-rendered" backgrounds.
You didn't need an SGI. I bought a 287 co-processor for an IBM PC to do raytracing in late 80s. Started with Vivid and then POV raytracers. By mid 90s we were using SGI's for VR.
Sick informative video, just a nit at 1:35, Doom doesn't have shadows per se since all light levels are manually "designed" into the map data by the level author. Quake on the other hand did introduce static light maps that were computed offline from level geometry and placed light sources, generating realistic (if lo-fi for today's standards) shadows.
Beyond the Gaussian Splatting, great Audio Splatting with the QnA style format to answer all the "ya but (key points)". Such an exceptional but concise rendering of edited snippets.
the issue is it does not have the 'logic' - so no dynamic light, no possibility to identify a gameobject directly like : 'hey do a transform of vector3 to the bush there' => no 'information' of such a bush. Let sees where it will bring us of course, but it looks more like a fancy way to represent a static scene. Probably it can be used in an FX for movie, where you can do something on a static scene, and you do something with 3D stuff inside this 'scene' I do not know ...
That was an incredible 2 minute video, i learned everything i needed to know and NOTHING ELSE, i don't even know your name! This was just... perfect, more of this please youtube.
I was in college back when deferred shading was just being talked about as a viable technique for lighting complex scenes in real-time. I even did my dissertation on the technique. Back then GPU's didn't have even close to the memory needed to do it at a playable resolution, but now pretty much every game uses it. I can see the same thing happening with 3D Gaussian Splatting.
@@womp47 You think the problem with 20 minute long videos where maybe a quarter is about the topic is people's attention span? Funniest shit I've read in a while
this could be done using images from hyperrealistic renders instead of irl photos too, right? to move around in a disney cgi level environment that wouldn't be possible in realtime normally
So long as the renders basically functioned as they are required (as in, enough of them, from enough different points) and can be converted into whatever format is used to create this...i dont see why not.
You could use something that uses Ray Tracing to create a scene, and once the dots fill in to 100%, you have your screenshot/picture, so then you move the camera to the next position, and allow the dots to fill in. Rinse repeat, and then you'll have RTX fidelity.
@@Lazyguy22 Yes, but you could add in bits redrawn as polygons with their own textures that could be interactable I'm thinking like the composite scenes from the late 90s fmv games
It'll be used for specific applications only. If you want to use that with video games you'll still have to give geometry to all the objects and environments, meaning it will still need to render all of that with polygons.
It probably can be used with most fps games (for things that won't have to move, interact or have colisions) and far or non-interactible, scenarios from games.
@@GabrielLima-pi4kw It's just not worth it. For a city block a la Dust 2 you'd need a physical set, any changes to the scene would require a reshoot which would require the same weather or you'd end up with different lighting, it would clash stylistically with trad. 3D assets, there'd be much less vfx or lighting available, you'd need _two_ rendering pipelines ballooning dev time and render time, you'd need a trad. 3D substitute anyway if you want users with slower systems, shooting at gaussians would give no audiovisual feedback...
Actually, you can use the point cloud rendered objects for the visuals entirely and then rough low-poly objects for collision mapping and skeletal work exclusively. The bigger problem is that this technology needs to know where the camera is and then render the data for that camera. You would have to render extra frames in every direction that the camera could move, otherwise the rendered viewport would always be playing "catchup" (You'd get an unclean, not-yet-decided face until the render catches up - think texture pop-in in games that use texture streaming.)
You can see this prominently in this video - look at the lack of depth for the curvature of the vase or the edges of the table. You can see that as the camera pans the "sides" of those objects blit into existence after the camera has moved. It's very obvious on the bicycle tires as well.
I'd like to see more research done into this to see how to superimpose dynamic objects into a scene like this before it has any sort of practical use in video games but for VR and other sorts of things, this could have lots of potential if you want to have cinema-quality raytraced renders of a scene displayed in realtime. Doesn't have to be limited to real photos.
I mean it'd be pretty simple to do, use a simplified mesh of the scene to mask out the dynamic objects then overlay them when they should be visible. The challenge is more making the dynamic objects look like they belong.
I've always wondered about this. All my life since i started getting into videogames graphics from when I was a kid. I knew it had to be possible and watching this is like the biggest closure of my life.
But this tech is not to be used in games/movies/3D printing, because these are not mesh 3d models. These are 100% static, non interactive models. Games need mesh models to be interacted with, lit, animated etc. It's also very demanding on the GPU. It's something that fe. Google Maps could use in the future.
Surely there'd ways to make them non-static? Maybe a lot more computationally demanding and beyond current consumer hardware, but for example you could develop some sort of weighted bone system that acts on the points of a cloud model in similar way that is done on 3d meshes? And I imagine ray/path tracing could be used to simulate lighting? etc. I'll admit I don't have a strong grasp on this technology so I could be completely off base, but I feel that these are only static because the means to animate, light and add interactability is just yet to be realised (and suitable hardware to support this), just like how it would not have been originally with 3d meshes@@Sc0pee
@@Sc0pee if this technique generates 3d objects, that is usually what I would call a mesh. Static meshes and images both are used in games often, and we have editing capabilities for more complex requirements like animation. You're definitely right about the GPU thing tho
So I’m a professional 3D artist by trade and I can tell you this won’t can’t on. Least not in its current form. It’s not dynamic for lighting, not technically physical, it’s effectively a cloud point rendering out all its photos all at once which… well that’s effectively a model with only vertices with no faces or lines. The size of each cloud point map (whatever the file is called) is also massive, like… not in vram but in shear file size. Currently photogrammetry is still a better way to bring models into a game since photogrammetry IS basically this, Photogrammetry is this baked into mesh form, aka a usable form.
5 second clip of something that happens later in the video "HEY WHAT'S GOING ON GUYS today we are doing a thing but first I want to thank this channel's sponsor NordVPN...."
You could walk through it. It's not actually lit and it's not geo so there's nothing to rig. It's basically 3D footage. I would think this could be very useful for compositors when they need to fill in background data when vfx builds a brand new camera angle that wasn't filmed. This is almost useless as cg data because there's just nothing you can do with it except move through it. This isn't like normal maps, which are a way to utilize lighting and simulate new additional things on stuff that has been created. You can't create these fields, you can only capture them. I take that back, you might be able to create them but you'd need to have modeled, textured, and lit your scene so that you can capture the field from your scene. Again, could be useful to compositors but more of a nuisance for cg. It's cool though
I saw something similar years ago with similar impressive results. (also just a point cloud being brute forced into a picture.) However, the downside of this type of technique is its memory requirement, making it fairly niche and hard to use in practice. For smaller scens it works fine, for anything large and it starts to fall apart. Beyond this we also have to consider that these renders are of a static world. And given the fairly huge amount of "particles" making up even a relatively small object, then the challenge of "just moving" a few as part of an animation becomes a bit insane in practice. Far from impossible, just going to eat into that frame rate by a lot. Most 3d content is far more than just taking world data and turning it into a picture. And a lot of the "non graphics" related work (that graphics has to wait for, else we don't know what to actually render) is not an inconsequential amount to work with as is. Moving a few tens of thousands of polygons around as a character model walks by isn't trivial work. Change those tens of thousands of polygons into millions of points (to get similar visual fidelity) and that animation step is suddenly a lot more compute intensive. So in the end, that is my opinion. Works nice as a 3d picture, but dynamic content is a challenge. Same for memory utilization, something that makes it infeasible for 3d pictures too.
I'll start believing that this is the future as soon as we have actual scenes with proper movement instead of some stilted pans. Also this is extremely niche because it can only be used for capturing real objects and then rendering a video of them. This allows for a lot of freedom to decide camera moves but still for a lot of things just recording a video is a lot more practical. But if you want to render something that doesn't exist (which is mostly what hyper realistic rendering is useful for) then this is no better than traditional rendering.
What makes it infeasible to, for example, pre-render high fidelity scenes then convert them to this "gaussian space"? I don't see why this can only be used for capturing real objects. Static objects only, sure. Real objects only? I don't follow. Not to say I see an improvement to traditional graphics here if we're pre-rendering stuff anyways, but "it can only be used for capturing real objects" is not a technical limitation.
@@memermemer6103 well since this is a rendering method if you already have a high fidelity render there's no point in rerendering it with this. and if you mean scenes that can't be rendered in real time, and then converting to gaussian space to run it in real time, read the first part of my comment. Plus since this can only be used for static scenes video games are out of the question, which are basically the only application for high fidelity real time redering. by "only being able to capture real scenes" I meant it can't render computer graphics by itself.
@@-CrySa- If rendering using this technique would be faster then the reason would be the same reason we have LOD for models. The only thing is, would this actually be faster, or be usable in the applications that games need? Otherwise it's just better photogrammetry.
@@davidburke4101 Well to be able to render with this you first need a high quality traditional render, it doesn't work on it's own. For video games it's useless because it can only do static scenes. And photogrammetry is used for getting 3d models of objects which this can't do.
This doesn't even compare to real time pathtracing--it's merely a faster way of getting baked af textures that can't do anything dynamic whatsoever lol
For gaming, you could potentially "pre-render" scenes at extremely high fidelity and then use this technique to produce real-time graphics in games based on those. Almost like baking lighting into your scene. You could even potentially leverage extremely low fidelity models and graphics to supplement or "approximate" these. Maybe even do it on a per-object basis for dynamic objects. And if you do so with flat-lighting, you could leverage global illumination for real-time lighting on top of that. This could get very interesting...
This seems like a client side revolution more than a development side one. I’m struggling to think of a method to build scenes based on this that isn’t just building the scene ‘traditionally’ and then converting it into a lot of gaussians. I also am kind of unsure how we’d animate in this. How do we vary light sources, have moving characters, change the environment itself? For all of those animations it seems like we’d have to essentially convert every frame with new gaussians, or automate the animation of generated gaussians from the prior frame or so. The applications I see benefitting from this would be architectural visualisation, and maybe some vr implementations, at this current stage it just doesn’t have the capabilities to take over other technologies though I am extremely excited to see how this develops. Beautiful work on the video!
You cant, which is why all these "infinite detail" solutions are mostly always limited to static scenes. Works great for applications that dont need dynamic elements, but not really at all for games.
That was the best 2(ish) minutes of listening to someone describing a new thing I've ever spent in my life. Also, this would be amazing if it could be implimented into games/VR/AR.
Reminds me of Euclideon ruclips.net/video/iVsyB938ovY/видео.html in the sense that animations seem difficult to do. Would you need to train the Gaussian splats for every bone / skeleton configuration? Do they interpolate well?
This is probably useful for certain types of games, scenes, and cinematics, even right now, today. I could see, for instance, a walking simulator using this technique. Alternatively, this could also be used to produce interactive cinematics that let you move around the pre-rendered cutscenes, though I shudder to imagine the file size. But for me the major thing that this technique would need to be useful in a variety of situations is some creativity in mixing it with the existing rednering pipelines. I could see an Unreal Engine 5 character not looking too out of place in a scene like this, for instance, so I could see it not being that bad to render character traditionally and the environment using this method. Beyond that, I don't think you'd specifically have to use photographs for it, either; I think you could do some advanced ray tracing (like in, say, Cyberpunk), and baked it into the scene itself to reduce the load on the GPU. It wouldn't be as dynamic, because you wouldn't have moving objects, or at least not as many...But I'll take that tradeoff in some games for 120FPS+ with super high quality ray tracing in the environment.
To me it almost feels like a different way of storing the scene, rather than a new rendering pipeline. Instead if vertices, edges and faces, the scene can be optimised as a gaussian point cloud.
Several gigs of vram? Nvidia is releasing 8gb consumer cards now, instead of 12gb. I really hope this gets raytracing levels of hype and nvidia gets forced to shit out 16+ gb mid tier cards anyways, this is a really good video, thanks for showing some chool tech in an approachable way! I don't know much about the rendering pipeline but still managed to understand at least the base concept of the technology. loved the editing style!
I really hope this gets picked up and adopted quickly by companies that are training 3-D generation on nerfs. The biggest issue I’m seeing is resolution. I imagine this is what they were talking about coming in the next update with imagine 3D. Fingers crossed that would be insane.😮
2024 update: there have been breakthroughs in AI handling mesh data directly, so this probably won't revolutionize anything, but still pretty cool i guess
Care to share the paper's you are referring to? Great job on this video. Am looking to catch up in the 3D space. Been working in ML for a long time but am just getting around to looking at nerf/splat
I first heard about gausian splatting from this video like 3 months ago. And just recently started using gausian splatting in real world application. Shoocker, I know!
I work at a fab shop that designs and builds custom automatic lumber processing lines for sawmills. When we finnish a project I have to take hundreds of photos and videos for future reference, and customer support. Gausian splatting has made it possible to scan and view assambled units, that couldn’t be done using photogrametry or laser scanning, because of the size and complexity.
Thank you!
Kewl
No nonsense. No filler. No BS. Just pure information. Cheers dude.
Editing probably took hours
Yes yes yes
Not enough explanation for me but pretty short, cut down and barebones although I’d like to understand the math a bit better
Only problem is he seems to ignore the fact that this method requires exhaustive amounts of image/photo input, limiting its application especially for stylized scenes/games, and uh... Doom didn't have shadows so I have no idea what he's smoking.
probably best format youtube video I've ever seen.
this man can condense a 2 hour lecture into 2 minutes. subbed
Same
Yeah and i didn’t understand a thing
it's in the description, where it says "original paper"
he missed out the most important part, you need to train the model for every scene
@@acasccseea4434It's implicit. He started with the fact that you "take a bunch of photos" of a scene. The brevity relies on maximum extrapolation by the viewer. The only reason I understood this (~90+%) was because I'm familiar with graphics rendering pipelines.
a photorealistic static scene is still a static scene
Yes and gaussian its very dificult to animate
yesss, until someone figures out yet another neat algebraic trick to add motion to it
damn, that's too funny to be true: ruclips.net/video/BHe-BYXzoM8/видео.html (it's exactly what I was talking about in the previous comment).
Maybe we could get a hybrid approach: all static elements in a scene as gaussians, and then dynamic elements like the player, NPCs and interactable objects as traditional polygon objects. Collisions between the two could be calculated using an invisible relatively low-poly version of what's in the gaussian scene.
@@tetramaximum They definitely will eventually, if they haven't already.
That "unlimited graphics" company Euclidean was working with tech like this at least a decade ago. I think the biggest pitfall with this tech right now is that none of it is really dynamic, we don't have player models or entities or animations. It's a dollhouse without the dolls. That's why this tech is usually used in architecture and surveying, not video games. I'm excited to see where this technique can go if we have people working to fix its shortcomings.
Yeah I was gonna say this magic tech sounded familiar
I wonder how easily it can be combined with a traditional pipeline. It would be kind of like those old games with pre-rendered backgrounds except that the backgrounds can be rendered in real time from any angle.
@@shloop. I wondered whether you could convert a splattered scene into traditional 3D but I realised mapping any of this to actual, individual textures and meshes would probably be a nightmare.
Maybe you could convert animated models to gaussians realtime for rendering, and manually create scene geometry for physics?
For lighting, I imagine RT would be impossible as each ray intersection would involve several gaussian probabilities.
As a dilettante I think it's too radical for game engines and traditional path-tracing is too close to photorealism for it to make an impact here
Not everything needs to be a game. There are people dying of Cholera RIGHT NOW!
@@der.SchtefanThis won't help with that though
Basically "real life graphics" using real life object as the base of the image.
Cons : we need like 256 GB of Vram to render this smoothly because all gaussian need to be at vram to render as smoothly as possible.
that's a big con.
fun part, 200gb vram will be needed only in training, stuff's renderable in browser on mobile
The VRAM is only needed in training. Things like Polycam and Luma use gaussian splatting, and they can render in realtime on a phone or browser.
This is basically just a lighter version of a NeRF. From my experience, Nvidia's Instant NGP software is much higher resolution than the latest Gaussian Splatting software (and is a couple years older), but it requires a powerful GPU to render.
@tbuk8350 so would each individual end user need to train it or would it be trained by the devs before hand and then it's no longer a concern for the end user?
@@ScrimmyBungletsdevs
Not needing to wait 5 minutes for the video to play the channel intro, thanking the sponsor, trying to sell t shirts and mugs with the channel logo, begging the viewer to subscribe, and giving an unwanted history on graphics, before getting to the damn point is such a breath of fresh air
well that’s because this video is for people who already understand a lot, most people need that intro minus the sponsors to understand things and most big channels need sponsors to fund their operations
@@shaansingh6048 > 2 million views
I've learned so much, but so little at the same time 😂
My feelings exactly!
"The more you know, the more you realize how much you don't know." -Einstein
Yeah that's the problem with zoomer attention span. Content/information needs to be shortened nowadays to miniscule clips. There are other more lengthy videos about this topic, but sadly they get less views because of the said problem.
If you want to actually understand this stuff you have to sit down with several papers full of university-level math for a few hours, and that's if you already have an education in this level of math you can draw on. Generally if you feel like a popsci video explains a lot without really explaining anything the reason is that they skipped the math.
TL;DR: Learn advanced maths if you want to understand sciency shit.
@@imatreebelieveme6094There’s a spectrum. On one end, there are 2 minute videos like this, and on the other end is what you are talking about. I think there can be a happy medium, with medium to long form videos that explain enough about a topic to understand it at a basic level.
Please continue to make videos like this that are engaging but also technical. From a software engineer and math enthusiast.
I recognize that default apple profile icon!
Concept is cool, but your editing is really great! Reminds me of Bill Wurtz :D
Exactly.
Damn beat me to it! Imagine the collaboration of these two
First thing I thought!
YUP.
You mean it originally came from Bill Wurtz
this requires scene-specific training with precomputed ground truths, if this can be used independently for realtime rasterization that could be a big breakthrough in the history of computer graphics and light transport.
Yeah but imagine photorealistic video games, they could be made from real scenes created in a studio, or from miniature scenes..
@@astronemir Miniature scenes ... we improved virtual rendering of scenes to escape the need for physical representations of settings, now we go all the way back around and build our miniature sceneries to virtualize them. Wild.
@@xormak3935 Or train on virtual scenes.
@@xormak3935 it wont always be like that. We are still developing these technologies. This kind of stuff didnt exist 2 years ago. And many thought AI was just a dream until less than 8 years ago.
By 2030 im sure we will be playing video games that look like real recordings from real life
Train it on a digital scene
video explanation is packed, couldn't even finish my 3d gaussian splatting on the toilet
This technique is an evolution, one could say it's an evolution from Point Clouds. The thing that most analysis I've seen/read is missing is that the main reason this exists now is because we finally have GPUs that are fast enough to do this. It's not like they're the first people who looked at point clouds and thought "hey, why can't we fill the spaces *between* the points?" EDIT: I thought I watched to the end of the video, but I didn't, the author addressed this in the end :) It's not just VRAM though! It's rasterization + alpha blending performance
EDIT2: you know what I realised after reading all the new things coming out about gaussian splatting. I think most likely this technique will first be used as background/skybox in a hybrid approach
I was one of those people and ran into overdraw issues. I can't imagine how vram is the limiting factor rather than the sort and alpha blend.
@@zane49er51 How long do you think before we have sensible techniques for animation, physics, lighting w/ gaussian splatting/NeRF-alikes?
Just because we have gpus that are powerful enough now, doesn’t mean devs should go and cram this feature in mindlessly like they are doing for all other features recently, completely forgetting about optimization and completely crippling our frame rate.
How could one fit this kind of rasterization into a game, if possible? This whole gaussian thing is going completely over my head, but would it even be possible for an engine to use this kind of rasterization while having objects in a scene, that can be interactive and dynamic? Or where an object itself can change and evolve? So far everything is static with the requirement of still photos...
@@MenkoDany I have not worked for any AAA companies and was investigating the technique for indie stylized painterly renders (similar to 11-11 Memories Retold)
If you build the point clouds procedurally instead of training on real images, it is possible to get blotchy brush-stroke like effects from each point. This method is also fantastic for LODs with certain environments because the cloud can be sampled less densely at far away locations, resulting in fewer, larger brush strokes. In the experiment I was working on I got a grass patch and a few flowery bushes looking pretty good before I gave up because of exponential overdraw issues. Culling interior points of the bushes and adding traditional meshes under the grass where everything below was culled helped a bit but then it increased the feasible range to like a 15m sphere around the camera that could be rendered well.
the highly technical rapid fire is just what i need.
the editing and the graphic design is just great, keep it coming i love it!
While being way too technical for any normal person to understand. He immediately alienates the majority of people by failing to explain 90% of what is actually happening.
Has to be the cleanest and most efficient way I got tech news on youtube. Keep it up my dude
Wow. Never seen someone fit so much information in such a short timeframe while keeping it accurate and especially easy to take in. Way to go!
Two minute papers is close
fireship is close
hes the bill wurtz of graphics LOL
We have done this for 12 years already. It’s not applicable to games. Everyone trashed Euclideon when this was initially announced, and now because someone wrote a paper everyone thinks it’s a new invention…
Clearly you haven't seen "history of the entire world, i guess" by Bill Wurtz. This video feels heavily inspired by it.
I like the calm and paced approach to explaining the technique.
Man this was so good, please do more stuff. This content scratches an itch in my brain I didn't know I had. So so good.
Would be great for something like Myst/Riven with premade areas and light levels. It doesn't sound like this method offers much for dynamic interactivity (which is my favourite thing in gaming). It would be great for VR movies/games with a fixed scene that you can look around in.
I just realized it's also probably not great for rendering unreal scenes like a particular one I have trapped in my head.
@@iamlordstarbuilder5595Yeah no dynamic lighting :(
You don't really need to render different lighting, you can merely shift an area of the gaussian to become brighter or darker or of a different hue, based on a light source. So flashlights would behave more like a photo filter. Since the gaussians are sorted by depth, you already have a simple way to simulate shadows from light sources.
@@peoplez129 hmmm, but that's not how light works. It doesn't just make an area go towards white, there's lots of complex interactions to take into account.
I was just thinking this. Wouldn’t it be nice if computer gaming went full circle with adventure games like this becoming a living genre again?
When people begin to do this on a larger scale, and with animated elements, perhaps video, ill pay attention. If they can train the trees to move and the grass to sway, that will be extremely impressive, the next step is reactivity which will blow my mind the most. I dont see it happening for a long time.
exactly. these techniques are great for static scenes/cgi, but these scenes will be 100% static with not even a leaf moving, unless each item is captured individually or some new fancy AI can separate them, but the "AI will just solve it" trope can be said about pretty much anything, so for now it's a cool demo
@@yuriythebestIs there any major obstacle to like, doing this process on two objects separately, and then like, taking the unions of the point clouds from the two objects, and varying the displacements?
@@yuriythebestYeah, and the wishful thinking exhibited in that cliche is likely really bad for ML. Overshilling always holds AI back at some point, think of the 80s.
@@drdca8263 if i'm thinking about this in my head, one thing i can think of is that the program has no idea which two points are supposed to correspond, so stuff would squeeze and shift while moving.
Yup - as soon as any object or source of light moves you'll need ray tracing (or similar) to correctly light the scene and that is when a lot of the realism starts to break down.
I literally don't care about the subject, but the way you did this... possibly the greatest YT video of all time. 2 minutes flat. No nonsense. Somehow hilarious. Kudos!
In the current implementation it reminds me of prerendered backgrounds. They looked great but their use was often limited to scenes that didn't require interaction like in Zelda: OOT or Final Fantasy VII.
My thoughts exactly. Ok, well now do the devs have to build a 3D model to lay "under" this "image" so we can interact with stuff? And what happens when you pick up something or walk through a bush? How well can you actually build a game with this tech?
The funniest thing about Ocarina of time was: There was no background. At least, not really.
What they did is they created a small bubble around the player that shows this background. There is a way to go outside of the bubble and see the world without it. I bet there's some videos on that, it's very fun and interesting.
@@Nerex7 Are you talking about the 3D part of the OOT world or do you refer specifically to the market place in the castle with its fixed camera angles?
I'm talking about the background of the world, outside of the map (as well as the sky). It's all around the player only. @@gumbaholic It's refered to as a sky box, iirc.
@@Nerex7 I see. And sure OOT has a skybox. But that's something different than pre-rendered backgrounds. It's like the backgrounds from the first Resident Evil games. It's the same for the castle court and the front of the Temple of Time. Those are different from the skybox :)
I absolutely love this style of informative video. Usually I have to have videos set to 1.5x because they just drag everything out. But not here! Love it.
Honestly, this video is *perfect* for people like me with ADHD and irritability. No stupid filler, no "hey guys", no "like and subscribe". Just the facts, stated as quickly and concisely as possible. 10/10 video.
So fun to watch. I want all research papers explained this way, even the ones in less visual fields. Subscribed!
the man who made a 7 min video worth of something to a 2 minute barrage of info - I like it
Love your editing and humor!
Quick and informative and I love your editing style
This dude is the pinnacle and culmination of gen z losing their attention span
I've watched it on double speed
I'm 40, and I hate watching videos instead of reading a text. Even if it's a 2 minute video on how to open the battery compartment (which is, frankly, a good use case for video). I really don't want to wait until someone gets to the point, talks through a segue, etc. This is closer to reading, very structured and fast. Wouldn't equate it with short attention span.
Ahh yes, another case of the older generation hating on the younger generation. I remember the exact same said about Millennials and Gen Y.
A missed opportunity for unity and imparting useful lessons. Please see past your hubris and use the experience to create.
@wcjerky bro I am the younger generation hating on the younger generation 💀
I'm not sure I agree. This video to me is like an abstract high level of the concept. It gets to the point.
This is in stark contrast to a bunch of tech videos that stretch to the 10 minute mark just for ads, barely ever making a point.
It's a good summarization. Saving time when trying to sift through information does not necessarily equate to short attention span.
If this takes off and improves I could see it being used to create VR movies, where you can walk around the scene as it happens
So kind of like eavesdropping but the main characters won’t notice and beat you for it
I mean... braindance?
@@theragerghost9733brooo
So porn is going to be revolutionized..
@@ianallen738what a time to be alive
At the moment, photogrammetry seems a lot more applicable as the resulting output is a mesh that any engine can use (though optimization/retopology is always a concern) whereas using this in games seems like it requires a lot of fundamental rethinking but has the potential to achieve a higher level of realism.
Like I saw in another video (I believe it was from Corridor?), this technique is better applied when re-rendering a camera recorded 2D path, using this technique, and then have a new footage but without all the shakiness of your real 2D recording. Kinda sucked to explain it, but I hope you got it.
This looks a lot like en.wikipedia.org/wiki/Volume_rendering#Splatting from 1991, I wonder if there is any big difference apart the training part,
also I know everybody said the same, but your editing is so cool. It's so dynamic, yet it manages to not be exhausting at all
It is close but the new technique optimizes the Gaussians (both the number of gaussians and the parameters) to fit volumetric data while the other one doesn’t, leading to a loss of fidelity.
Please correct me if I’m wrong, I haven’t actually read the old paper.
@@francoislecomte4340 You are totally correct :)
im impressed youtube let you put a link in the comments
Wow, I love your style! I learned a lot and was entertained at the same time.
This could be used for film reshoots if a set was destroyed, but in a video game the player and NPCs would still need to have some sort of lighting/shading.
This is a great application for Google Street View as opposed to the 3D they have now...
HDRI Is a lighting map that can be applied into any virtual space object, you just take a 360 photo of the ambient.
lighting/shading can be figured out by the environment. Movie cgi is lit by using a 360 image sphere of the set
Plants also won't be moving so no wind
I watched a video about AI that confirmed this is already a thing. Not at this fidelity but tools already exist to reconstruct scenes from existing footage, reconstruct voices, generate dialog. Nothing in the future will be real.
This is the most concise explanation of gaussian splatting I have stumbled across so far. Subscription achieved.
this is the content i need in my life, no filler. thanks
It's more niche than photogrammetry because there's no 3D model to put into something else.
But with a bit more work I'd love to see this be a feature on a smartphone.
Perhaps the process could be repeated with a 360 degree FOV to create an environment map for the inserted 3D model. Casting new shadows seems impossible though.
@@EvanBoldt "Casting new shadows seems impossible though." => It really depends. If your footage already has shadows, it'll be difficult. However, if your footage DOESN'T contain shadows, just add some skybox/HDRi, tweak some objects (holes, etc) and voilá.
photogrammetry really isn’t that niche considering it’s used pretty heavily in both AAA video games and film
You can create depth fields from images which can be used to create 3D objects. So it should be possible to integrate it into a pipeline.
@@fusseldieb If you think realistic graphics require only "some skybox/HDRi and tweaking some objects" You have a lot to learn, especially when it comes to required textures.
10/10 intro, literally perfect in every way.
I immediatly got what I clicked for and found myself intrested from 0:02 onwards.
Please be the 2MinutesPaper we deserve (without fillers, making a weird voice on purpose and exaggerating the papers). Good stuff, really liked the video.
This is the first time I've seen an editing style similiar to Bill Wurtz that not only DIDN'T make me wanna gouge my eyes out, but also worked incredibly well and complimented the contents of the video. Nice!
so true bestie
i did want to kill myself a little bit though just a little
bill wurtz without the thing that makes bill wurtz bill wurtz
bill wurtz explaining something complicated, kinda goes in one ear and out the other
@@thefakepie1126 yeah, without the cringe
Your video is spot on!
HTC Vive/The Lab were the reasons why I got into VR.
I loved the photogrammetry environments so much that it's my hobby to capture scenes.
Google Light Fields demos were a glimpse of the future, but blurry.
These high quality NeRF breakthroughs are coming much earlier than I thought it would be.
We will be able to capture and share memories, places... it's going to be awesome!
I don't know if Apple Vision Pro can only do stereoscopic souvenirs capture or if it can do 6dof, but I hope it's the latter
:,D
We all know what you use VR for... you might want to not use a black light around your VR goggles huh.
The best thing about this technique is that it is not a NeRF! It is fully hand-crafted and that's why it beats the best of NeRFs tenfold when it comes to speed.
@@mixer0014 Using AI doesn't make it "hand crafted"... so you are confused.
@@orangehatmusic225
There is no AI it that new tech, just good ol' maths and human ingenuity. TwoMinutePapers has a great explanation, but if you don't have time to watch it now, I hope a quote from the paper can convince you:
"The un-
structured, explicit GPU-friendly 3D Gaussians we use achieve faster
rendering speed and better quality without neural components."
@@orangehatmusic225Ah yes, the only and primary reason people throw hundreds of dollars into VR equipment is to jerk off, so viciously in fact that it would splatter the headset itself. For sure, man, for sure.
This is exactly what I wanted. Keep these videos up, subbing now!
I wonder how we'll get dynamic content into the 'scene'. Will it be like Alone in the Dark where the env is one tech (for that game, it was 2D pre rendered) and characters are another (for them, 3d poly). Or if we will create some pipeline for injecting / merging these fields so you have pre-computed characters (like mortal combat's video of people). Could look janky. Also I don't see this working for environments that need to be modified or even react to dynamic light, but this is early days.
well, it could already work in the state it's in right now for VFX and 3D work, even though you can't as of now inject models or lighting into the scene (to my knowledge), you could still technically take thousands of screenshots and convert the nerf into a photoscanned environment, and then use that photoscanned environment as a shadow and light catcher in a traditional 3D software, and then use a depth map from the photoscan as a way to take advantage of the nerf for an incredibly realistic 3D render, that way you can put things behind other things and control the camera and lighting, while still taking advantage of the reflections and realistic lighting nerfs provide
Yes the issue here is that these scenes are not interactable because the things in them are not 3D objects, they are mere representations from your particular perspective. Dunno how they would solve those problems (which is probably why we won't see it in games anytime soon if ever)
I don’t know much about rendering but this sounds so smart. You take a video, separate the frames, do the other steps, and now you have this video as a 3d environment
This would be an amazing thing for AR as you could use that detailed 3d model of an arbitrary room or place you are in and augment it with virtual elements. Or to have photorealistic environments in VR without needing a AAA production team. Imagine making a house tour abroad in VR where the visuals are rendered photorealistic in real time
@@redcrafterlppa303 "Or to have photorealistic environments in VR without needing a AAA production team" -> This already exists. Some people use plain old photogrammetry for that
most compelling way to present, my recall is much higher from the way you make it so entertaining. Kudos!
Perfect video on an absolutely insane topic. I’d love more bite-sized summaries like this on graphics tech news!
It still is photogrammetry, just rendered with unlit shaders, with all its flaws too. Getting the required image material is tedious and we still won't receive specular information.
Thank you for not turning this into a 58 min long documentary.
For now it’s niche. I imagine it could be used in games blended with traditional rendering pipelines. E.g. use this new method for certain areas that need a light level of detail.
more than just that niche - video production could utilize this in rendering effects
Interesting!!!
I always think back to when I was doing my degree in the 90s, and Ray Tracing was this high end thing PhD students did with super expensive Silicon Graphics servers, and it was always a still image of a very reflective metal sphere on a chess board with some kind of cones or polyhedra thrown in for kicks. It took days and weeks of render time.
About 25 years passed between when I first heard of ray tracing and when I played a game with ray tracing in it.
I might not be 70 when the first game using a Gaussian engine is released, but I wouldn’t imagine it happens before I’m 60.
Still very interesting, though!!!
I don’t think it’ll be used at all personally but I’m stupid so we’ll see, I don’t see how this is better than digital scanning if you have to do everything yourself to add lighting and colission, someone said it could be used for skyboxes and I could see that.
I remember exactly the ray-tracing program you're talking about, in 1993/4 I think a 640x480 image render took about 20 hours.
Ray tracing was also popular on the Amiga, these chess boards, metal spheres etc would crop up regularly in Amiga Format, (I've no idea how long it took to render one on the humble Amiga) some of them were a bit more imaginative though and I remember thinking how cool they looked and wondering if games would ever look like that, tbh I'm a bit surprised it actually happened...I'm not that convinced I'm seeing the same thing here though, how does any of this get animated? It's already kinda sad that it's 2023 and interactivity and physics have hardly moved an inch since Half-Life 2, the last thing we need is more "pre-rendered" backgrounds.
You didn't need an SGI. I bought a 287 co-processor for an IBM PC to do raytracing in late 80s. Started with Vivid and then POV raytracers. By mid 90s we were using SGI's for VR.
I really like your presentation style, keep it up!!!
Sick informative video, just a nit at 1:35, Doom doesn't have shadows per se since all light levels are manually "designed" into the map data by the level author. Quake on the other hand did introduce static light maps that were computed offline from level geometry and placed light sources, generating realistic (if lo-fi for today's standards) shadows.
"all light levels are manually "designed" into the map data by the level author" => Also called Baked textures.
@@fusseldiebno it's not. in doom maps are split up into sectors which can have their light levels manually adjusted by the map author.
well, how much vram do you need? and could nVidea's new compression algo's for regular texture streaming (a 4x savings on VRAM) help at all?
10 GB for a single room... anything outdoor requires 10 server farm buildings
Beyond the Gaussian Splatting, great Audio Splatting with the QnA style format to answer all the "ya but (key points)". Such an exceptional but concise rendering of edited snippets.
i love the term “audio splatting”, thank you
There are 2 ways to get someone with adhd to pay attention
1. subway surfers
1. this
yes there is 1. two times
This is the power thirst of super complex graphics algorithms, and I'm totally here for it.
i love how condensed and straight to the point this is
1:00 disliking and reporting because you didn't play the sorting doooooOOOOT sound effect
2:00 Who doesn't have several gigs of vRAM? My RX 480 is 7 years old and it has 8 gigabytes.
Yeah several doesn't sound like a lot
Why did this lock my attention in so well? Keep up the great work!
I love how short and concise this was. Well done mate! No BS, cut to the chase
the issue is it does not have the 'logic' - so no dynamic light, no possibility to identify a gameobject directly like : 'hey do a transform of vector3 to the bush there' => no 'information' of such a bush.
Let sees where it will bring us of course, but it looks more like a fancy way to represent a static scene.
Probably it can be used in an FX for movie, where you can do something on a static scene, and you do something with 3D stuff inside this 'scene' I do not know ...
Isolate the object, tweak it and export it. Should be doable...
@@fusseldieb the lighting wouldn't change when you move it tho
That was an incredible 2 minute video, i learned everything i needed to know and NOTHING ELSE, i don't even know your name! This was just... perfect, more of this please youtube.
I was in college back when deferred shading was just being talked about as a viable technique for lighting complex scenes in real-time. I even did my dissertation on the technique. Back then GPU's didn't have even close to the memory needed to do it at a playable resolution, but now pretty much every game uses it. I can see the same thing happening with 3D Gaussian Splatting.
Why can't ALL videos be as informative like this one in such a short span on time. Well done!
Because they want ad money
get a longer attention span and youll be able to watch longer videos like this and actually learn stuff
@@womp47 You think the problem with 20 minute long videos where maybe a quarter is about the topic is people's attention span?
Funniest shit I've read in a while
because this video isn't informative.
@@LoLaSn when did i say videos "where a quarter is about the topic"
Subscribed: because you can condense quality information in 2 minutes, no bs, just to the point.
Great video, do you plan more like this? Terse, technical, about 3D graphics or AI. Basically, 2 minute papers, but with less fluff.
this could be done using images from hyperrealistic renders instead of irl photos too, right? to move around in a disney cgi level environment that wouldn't be possible in realtime normally
cool idea.
You'd need a large number of those renders, and wouldn't be able to interact with anything.
So long as the renders basically functioned as they are required (as in, enough of them, from enough different points) and can be converted into whatever format is used to create this...i dont see why not.
You could use something that uses Ray Tracing to create a scene, and once the dots fill in to 100%, you have your screenshot/picture, so then you move the camera to the next position, and allow the dots to fill in. Rinse repeat, and then you'll have RTX fidelity.
@@Lazyguy22 Yes, but you could add in bits redrawn as polygons with their own textures that could be interactable
I'm thinking like the composite scenes from the late 90s fmv games
Jeesus it's been so long since Iv'e seen something so direct and un-bloated that i almost got whiplash. Cheers very much
It'll be used for specific applications only. If you want to use that with video games you'll still have to give geometry to all the objects and environments, meaning it will still need to render all of that with polygons.
It probably can be used with most fps games (for things that won't have to move, interact or have colisions) and far or non-interactible, scenarios from games.
yes then its the same thing as that scam all those years ago, it only renders a scene no objects
@@GabrielLima-pi4kw It's just not worth it. For a city block a la Dust 2 you'd need a physical set, any changes to the scene would require a reshoot which would require the same weather or you'd end up with different lighting, it would clash stylistically with trad. 3D assets, there'd be much less vfx or lighting available, you'd need _two_ rendering pipelines ballooning dev time and render time, you'd need a trad. 3D substitute anyway if you want users with slower systems, shooting at gaussians would give no audiovisual feedback...
Actually, you can use the point cloud rendered objects for the visuals entirely and then rough low-poly objects for collision mapping and skeletal work exclusively. The bigger problem is that this technology needs to know where the camera is and then render the data for that camera. You would have to render extra frames in every direction that the camera could move, otherwise the rendered viewport would always be playing "catchup" (You'd get an unclean, not-yet-decided face until the render catches up - think texture pop-in in games that use texture streaming.)
You can see this prominently in this video - look at the lack of depth for the curvature of the vase or the edges of the table. You can see that as the camera pans the "sides" of those objects blit into existence after the camera has moved. It's very obvious on the bicycle tires as well.
I'd like to see more research done into this to see how to superimpose dynamic objects into a scene like this before it has any sort of practical use in video games but for VR and other sorts of things, this could have lots of potential if you want to have cinema-quality raytraced renders of a scene displayed in realtime. Doesn't have to be limited to real photos.
I mean it'd be pretty simple to do, use a simplified mesh of the scene to mask out the dynamic objects then overlay them when they should be visible. The challenge is more making the dynamic objects look like they belong.
This was amazing. Thanks for the humorous take. Keep going!
I've always wondered about this. All my life since i started getting into videogames graphics from when I was a kid. I knew it had to be possible and watching this is like the biggest closure of my life.
But this tech is not to be used in games/movies/3D printing, because these are not mesh 3d models. These are 100% static, non interactive models. Games need mesh models to be interacted with, lit, animated etc. It's also very demanding on the GPU. It's something that fe. Google Maps could use in the future.
Surely there'd ways to make them non-static? Maybe a lot more computationally demanding and beyond current consumer hardware, but for example you could develop some sort of weighted bone system that acts on the points of a cloud model in similar way that is done on 3d meshes? And I imagine ray/path tracing could be used to simulate lighting? etc. I'll admit I don't have a strong grasp on this technology so I could be completely off base, but I feel that these are only static because the means to animate, light and add interactability is just yet to be realised (and suitable hardware to support this), just like how it would not have been originally with 3d meshes@@Sc0pee
@@Sc0pee if this technique generates 3d objects, that is usually what I would call a mesh. Static meshes and images both are used in games often, and we have editing capabilities for more complex requirements like animation. You're definitely right about the GPU thing tho
Somehow, based on this comment, I doubt you know literally anything about computer graphics.
So I’m a professional 3D artist by trade and I can tell you this won’t can’t on. Least not in its current form. It’s not dynamic for lighting, not technically physical, it’s effectively a cloud point rendering out all its photos all at once which… well that’s effectively a model with only vertices with no faces or lines. The size of each cloud point map (whatever the file is called) is also massive, like… not in vram but in shear file size. Currently photogrammetry is still a better way to bring models into a game since photogrammetry IS basically this, Photogrammetry is this baked into mesh form, aka a usable form.
I didn't understand a single thing but I think you're passionate about it so I'm going to do the dad thing and fully support you.
WHY EVERYBODY DOESNT MAKE INFORMATIVE VIDEOS IN THIS FORMAT? Fast, clear to the point, no filler stuff, just pure info in shortest amount of time.
You should watch "history of the entire world, i guess" if you haven't already. It's 20 minutes of this kind of rapid-fire semi-educational delivery.
5 second clip of something that happens later in the video "HEY WHAT'S GOING ON GUYS today we are doing a thing but first I want to thank this channel's sponsor NordVPN...."
I wonder if something like this could eventually be combined with some level of animation and interactivity - in games that is. That would be wicked.
You could walk through it. It's not actually lit and it's not geo so there's nothing to rig. It's basically 3D footage. I would think this could be very useful for compositors when they need to fill in background data when vfx builds a brand new camera angle that wasn't filmed. This is almost useless as cg data because there's just nothing you can do with it except move through it. This isn't like normal maps, which are a way to utilize lighting and simulate new additional things on stuff that has been created. You can't create these fields, you can only capture them. I take that back, you might be able to create them but you'd need to have modeled, textured, and lit your scene so that you can capture the field from your scene. Again, could be useful to compositors but more of a nuisance for cg. It's cool though
PROBABLY NOT, that's why I kept thinking this is virtually useless (videogame wise), there is no interaction besides looking at a static enviroment.
You can maybe do great skyboxes or non-interactable background objects, but is it worth doing for those?
This guy is too good for this level of subs...
Subbed right away
I saw something similar years ago with similar impressive results. (also just a point cloud being brute forced into a picture.)
However, the downside of this type of technique is its memory requirement, making it fairly niche and hard to use in practice.
For smaller scens it works fine, for anything large and it starts to fall apart.
Beyond this we also have to consider that these renders are of a static world. And given the fairly huge amount of "particles" making up even a relatively small object, then the challenge of "just moving" a few as part of an animation becomes a bit insane in practice. Far from impossible, just going to eat into that frame rate by a lot. Most 3d content is far more than just taking world data and turning it into a picture. And a lot of the "non graphics" related work (that graphics has to wait for, else we don't know what to actually render) is not an inconsequential amount to work with as is. Moving a few tens of thousands of polygons around as a character model walks by isn't trivial work. Change those tens of thousands of polygons into millions of points (to get similar visual fidelity) and that animation step is suddenly a lot more compute intensive.
So in the end, that is my opinion.
Works nice as a 3d picture, but dynamic content is a challenge. Same for memory utilization, something that makes it infeasible for 3d pictures too.
I'll start believing that this is the future as soon as we have actual scenes with proper movement instead of some stilted pans.
Also this is extremely niche because it can only be used for capturing real objects and then rendering a video of them. This allows for a lot of freedom to decide camera moves but still for a lot of things just recording a video is a lot more practical. But if you want to render something that doesn't exist (which is mostly what hyper realistic rendering is useful for) then this is no better than traditional rendering.
What makes it infeasible to, for example, pre-render high fidelity scenes then convert them to this "gaussian space"? I don't see why this can only be used for capturing real objects. Static objects only, sure. Real objects only? I don't follow. Not to say I see an improvement to traditional graphics here if we're pre-rendering stuff anyways, but "it can only be used for capturing real objects" is not a technical limitation.
@@memermemer6103 well since this is a rendering method if you already have a high fidelity render there's no point in rerendering it with this.
and if you mean scenes that can't be rendered in real time, and then converting to gaussian space to run it in real time, read the first part of my comment. Plus since this can only be used for static scenes video games are out of the question, which are basically the only application for high fidelity real time redering.
by "only being able to capture real scenes" I meant it can't render computer graphics by itself.
@@-CrySa- If rendering using this technique would be faster then the reason would be the same reason we have LOD for models. The only thing is, would this actually be faster, or be usable in the applications that games need? Otherwise it's just better photogrammetry.
@@davidburke4101 Well to be able to render with this you first need a high quality traditional render, it doesn't work on it's own. For video games it's useless because it can only do static scenes. And photogrammetry is used for getting 3d models of objects which this can't do.
love the narration, this video should be put in the hall of fame of youtube (Y)
You're like Bill Wurtz without all the fun jazz
An interesting concept but I believe that we should make every effort to head towards wider adoption of real time pathtracing
This doesn't even compare to real time pathtracing--it's merely a faster way of getting baked af textures that can't do anything dynamic whatsoever lol
For video games, absolutely. This would be a huge step backwards.
I wish all information was formatted like this. My new favorite channel
For gaming, you could potentially "pre-render" scenes at extremely high fidelity and then use this technique to produce real-time graphics in games based on those. Almost like baking lighting into your scene. You could even potentially leverage extremely low fidelity models and graphics to supplement or "approximate" these. Maybe even do it on a per-object basis for dynamic objects. And if you do so with flat-lighting, you could leverage global illumination for real-time lighting on top of that.
This could get very interesting...
This seems like a client side revolution more than a development side one. I’m struggling to think of a method to build scenes based on this that isn’t just building the scene ‘traditionally’ and then converting it into a lot of gaussians.
I also am kind of unsure how we’d animate in this. How do we vary light sources, have moving characters, change the environment itself? For all of those animations it seems like we’d have to essentially convert every frame with new gaussians, or automate the animation of generated gaussians from the prior frame or so.
The applications I see benefitting from this would be architectural visualisation, and maybe some vr implementations, at this current stage it just doesn’t have the capabilities to take over other technologies though I am extremely excited to see how this develops. Beautiful work on the video!
You cant, which is why all these "infinite detail" solutions are mostly always limited to static scenes. Works great for applications that dont need dynamic elements, but not really at all for games.
That was the best 2(ish) minutes of listening to someone describing a new thing I've ever spent in my life. Also, this would be amazing if it could be implimented into games/VR/AR.
Reminds me of Euclideon ruclips.net/video/iVsyB938ovY/видео.html in the sense that animations seem difficult to do. Would you need to train the Gaussian splats for every bone / skeleton configuration? Do they interpolate well?
Only compared to Euclideon this is something that isn't proprietary and everyone can benefit from.
Unlike this method, Euclideon was done with Voxels
This video gave all the information neccesary in the shortest time possible while still containing entertaining editing. Bravo sir!😂
If it provided all the information necessary you could sit down and write the code without looking at any other resource.
This was first video I clicked about 3D Gaussian Splatting. I am glad I did.
This is probably useful for certain types of games, scenes, and cinematics, even right now, today.
I could see, for instance, a walking simulator using this technique.
Alternatively, this could also be used to produce interactive cinematics that let you move around the pre-rendered cutscenes, though I shudder to imagine the file size.
But for me the major thing that this technique would need to be useful in a variety of situations is some creativity in mixing it with the existing rednering pipelines. I could see an Unreal Engine 5 character not looking too out of place in a scene like this, for instance, so I could see it not being that bad to render character traditionally and the environment using this method. Beyond that, I don't think you'd specifically have to use photographs for it, either; I think you could do some advanced ray tracing (like in, say, Cyberpunk), and baked it into the scene itself to reduce the load on the GPU. It wouldn't be as dynamic, because you wouldn't have moving objects, or at least not as many...But I'll take that tradeoff in some games for 120FPS+ with super high quality ray tracing in the environment.
To me it almost feels like a different way of storing the scene, rather than a new rendering pipeline. Instead if vertices, edges and faces, the scene can be optimised as a gaussian point cloud.
This is perfect for when you want to buy a house, retail companies can let you download a walk around gaussian splat sim for you to explore.
Several gigs of vram? Nvidia is releasing 8gb consumer cards now, instead of 12gb. I really hope this gets raytracing levels of hype and nvidia gets forced to shit out 16+ gb mid tier cards
anyways, this is a really good video, thanks for showing some chool tech in an approachable way! I don't know much about the rendering pipeline but still managed to understand at least the base concept of the technology. loved the editing style!
I love how invida has to "shit out" new cards, really summarises their last GPUs releases
c h o o l
The suggested limit is 24 gigs of v ram. you can do it lower but you wont get as much of the alphas destroyed
My next card will have 24GB VRAM. I won't purchase less, and I also won't purchase it if it's expensive af. Let NVIDIA cook for a few years.
It's stuff like this that makes me believe that in the distant future, we'll be able to do pretty much anything in a video game.
You deserve every view and sub you got from this. Amazing editing, quick and to the point.
I really hope this gets picked up and adopted quickly by companies that are training 3-D generation on nerfs. The biggest issue I’m seeing is resolution. I imagine this is what they were talking about coming in the next update with imagine 3D. Fingers crossed that would be insane.😮
Run AMD's upscaling over it
It will have an impact only if it doesn't require more than like 8 gigs of ram
it requires 4 gigs
@@IndividualKex oh then it's great I guess, can't wait to see it implemented in a game or becoming a feature in a free engine like unity or unreal
What do you plan to run this on ? A phone ?
My computer has 12 GB of VRAM. And that's just the video RAM
@@xl000 why not run it on a phone?
Congrats on making like a solidly informationally dense video, we like this!
hehe... dense.
as in dense layers. get it? lol
very wurtzian
I feel like I just took a shot of information, this is the espresso of informative content on RUclips