- 30 videos
- 398,392 views
Jia-Bin Huang
Joined Jul 2, 2008
Sharing what I learned along the way!
3D Reconstruction by Shedding New Light, Literally
We present a method to reconstruct 3D objects from images captured under extreme illumination variations.
📝 arxiv.org/abs/2412.15211
🌐 relight-to-reconstruct.github.io/
00:00 Introduction
00:40 Single illumination case
01:19 Unstructured image collection
02:07 Multiview relighting
02:30 Shading embedding vs. appearance embedding
03:15 Comparisons
Reference:
Hadi Alzayer, Philipp Henzler, Jon Barron, Jia-Bin Huang, Pratul P. Srinivasan, and Dor Verbin
Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation
arXiv 2024
1,597 views
Videos
This AI Makes 3D Illusions
1.3K views · 1 month ago
Creating 3D multiview illusion art requires exceptional artistic skills and time. In this work, we show we can democratize 3D illusion generation with the power of AI (specifically, image priors from pretrained text-to-image models). Chapters: 00:00 Optical illusions 00:30 Examples of 3D illusion 01:21 Generating 3D illusion with AI 02:13 Single-view anamorphic art 03:00 Multiview anamorphic ar...
This AI Learned to Turn a Video Into Layers
8K views · 1 month ago
Layered composition has been an indispensable aspect of video editing. This work presents a method that extracts semantically meaningful layers for each object of interest. This allows applications like creative video compositions, moment retiming, action shots, and object removal. Chapters: 00:00 Layer composition 01:00 Video editing applications 02:33 Method overview 03:29 Comparison with the...
How FlashAttention Accelerates Generative AI Revolution
4K views · 2 months ago
FlashAttention is an IO-aware algorithm for computing attention used in Transformers. It's fast, memory-efficient, and exact. It has become a standard tool for speeding up LLM training and inference. Join me and learn how FlashAttention works! References: - [OnlineSoftmax] arxiv.org/abs/1805.02867 - [From Online Softmax to FlashAttention] courses.cs.washington.edu/courses/cse599m/23sp/notes/fla...
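The description above names online softmax, the recurrence FlashAttention tiles over blocks of keys. As a minimal NumPy sketch of the one-pass trick (my own illustration, not the FlashAttention kernel itself):

```python
import numpy as np

def online_softmax(scores):
    """One-pass softmax: keep a running max m and running normalizer d,
    rescaling d whenever a new max appears, so the scores are read once.
    FlashAttention applies this same rescaling per block of keys."""
    m = -np.inf   # running max
    d = 0.0       # running sum of exponentials
    for x in scores:
        m_new = max(m, x)
        d = d * np.exp(m - m_new) + np.exp(x - m_new)
        m = m_new
    return np.exp(np.asarray(scores) - m) / d
```

The same max/normalizer bookkeeping is what lets FlashAttention merge per-block partial results without ever materializing the full attention matrix in slow memory.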
How Rotary Position Embedding Supercharges Modern LLMs
4.7K views · 2 months ago
Positional information is critical to transformers' understanding of sequences and to their ability to generalize beyond the training context length. In this video, we discuss: 1) why the attention mechanism in transformers is not sufficient on its own, 2) earlier attempts at injecting positional information (e.g., sinusoidal positional encoding), 3) rotary position embedding, and 4) techniques for long-context...
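As a rough companion to the description above, here is a NumPy sketch of rotary position embedding using the split-halves pairing convention (a choice I am assuming for illustration; implementations also use interleaved pairing):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each feature pair (x[i], x[i + d/2]) by the angle
    pos * base**(-2i/d). Dot products between rotated queries and keys
    then depend only on the relative offset between their positions."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)  # per-pair rotation frequency
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])
```

The relative-position property can be checked directly: shifting both the query position and the key position by the same amount leaves their dot product unchanged.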
The Algorithm that Helps Machines Learn
1.4K views · 3 months ago
How do machines learn? In this video, we review the basic ideas of optimizers, algorithms that efficiently update the parameters of deep neural networks and minimize the loss function. We will cover gradient descent, momentum, RMSProp, Adam, and AdamW. References: [RMSProp] www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf [Adam] Adam: A Method for Stochastic Optimization arxiv.o...
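The update rules listed above are compact enough to write out directly; this NumPy sketch of a single Adam step is my own illustration (hyperparameter defaults follow the Adam paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (first moment) plus RMSProp-style
    scaling (second moment), with bias correction for the zero init."""
    m = b1 * m + (1 - b1) * grad          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (RMSProp) estimate
    m_hat = m / (1 - b1 ** t)             # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 (gradient 2x) starting from x = 5
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.1)
```

AdamW differs only in where weight decay enters: it is applied directly to the parameters rather than folded into the gradient.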
But What Are Transformers?
5K views · 5 months ago
The transformer is arguably the most influential neural network architecture of the last decade, powering the current boom of generative AI. In this video, we review the basic ideas of the original encoder-decoder transformer architecture and understand how its various design decisions were made. Enjoy!
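As a companion to the description above, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer (a bare single-head version of my own, without masking or learned projections):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: each query mixes the value rows,
    weighted by its scaled similarity to every key."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)                 # rows sum to 1
    return w @ V
```

A handy sanity check: an all-zero query gives uniform weights, so its output is exactly the mean of the value rows.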
How I Understand Flow Matching
21K views · 7 months ago
Flow matching is a new generative modeling method that combines the advantages of Continuous Normalising Flows (CNFs) and Diffusion Models (DMs). In this tutorial, I share my understanding of the basics of flow matching and provide an overview of how these ideas evolved over time. Check out the resources below to learn more about this topic. Slides Introduction: www.dropbox.com/scl/fi/tv449mdq0k...
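To make the idea concrete, here is a sketch of the conditional flow matching training pair under the linear (optimal-transport) probability path, one common choice (my own minimal illustration, not code from the video):

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Linear path x_t = (1 - t) x0 + t x1 from noise x0 to data x1.
    Its velocity d x_t / d t = x1 - x0 is the regression target: a
    network v(x_t, t) is trained with an L2 loss toward it."""
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return xt, target
```

At t = 0 the sample is pure noise and at t = 1 pure data; generation then integrates the learned velocity field from t = 0 to t = 1.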
3D Texture Made Easy
1.2K views · 7 months ago
Introducing TextureDreamer! TextureDreamer transfers textures from a few images to arbitrary 3D shapes. Excited about democratizing 3D content creation! TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion Yu-Ying Yeh, Jia-Bin Huang, Changil Kim, Lei Xiao, Thu Nguyen-Phuoc, Numair Khan, Cheng Zhang, Manmohan Chandraker, Carl S Marshall, Zhao Dong, and Zhengqin Li IEEE...
How We Can Convert Any Videos to 3D
2.5K views · 8 months ago
Videos are windows to another world. But the videos today are *flat*, confined to the original viewpoints. We showcase a method for converting any 2D videos into 3D videos that allow free-view synthesis. Fast View Synthesis of Casual Videos Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, and Feng Liu European Conference on Computer Vision, 202...
How Do Computers See Motion? Lucas-Kanade Method Explained
2.3K views · 9 months ago
How can machines perceive the dynamic world around us? In this video, we discuss an influential Lucas-Kanade tracking method. The core algorithm and its variants are used in a wide variety of computer vision applications. Stay until the end to learn about the inspiring story behind this seminal paper! Reference: - Bruce D Lucas and Takeo Kanade, An iterative image registration technique with an...
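For readers who want the algebra, the heart of Lucas-Kanade is a tiny least-squares solve over a local window; a NumPy sketch of the classic formulation (my own illustration) is:

```python
import numpy as np

def lucas_kanade_flow(Ix, Iy, It):
    """Solve the brightness-constancy system Ix*u + Iy*v + It = 0 in the
    least-squares sense over one window, assuming every pixel in the
    window shares a single translational motion (u, v)."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # spatial gradients
    b = It.ravel()                                  # temporal gradient
    flow, *_ = np.linalg.lstsq(A, -b, rcond=None)
    return flow  # (u, v)
```

The solve only works when A^T A is well conditioned, i.e. the window has gradients in two directions, which is exactly what the Shi-Tomasi detector below selects for.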
What are Good Features to Track? Shi-Tomasi Corner Detector Explained
1.3K views · 9 months ago
Identifying reliable features for tracking is an important step for many computer vision systems, including video stabilization, object tracking, and simultaneous localization and mapping (SLAM). This video covers the basics of corner detection algorithms. References: Jianbo Shi and Carlo Tomasi, Good Features to Track, CVPR 1994 C Harris, M Stephens, A combined corner and edge detector, Alvey ...
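The corner score itself fits in a few lines; this NumPy sketch of the Shi-Tomasi min-eigenvalue response for a single window is my own illustration:

```python
import numpy as np

def shi_tomasi_response(Ix, Iy):
    """Smaller eigenvalue of the 2x2 structure matrix M built from the
    window's image gradients. A large minimum eigenvalue means strong
    gradients in two directions, i.e. a corner that is reliable to track."""
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(M)[0]  # eigenvalues returned in ascending order
```

A window with gradients in only one direction (an edge) scores zero; the Harris detector instead scores det(M) - k * trace(M)^2 to avoid the eigen-decomposition.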
How does OpenAI's Sora work?
52K views · 10 months ago
OpenAI presents Sora, a text-to-video model for generating high-quality video from text prompts. In this video, we give a high-level overview of how Sora works.
Compositional Text-to-Image Generation Made Easy
1.4K views · 11 months ago
Fast View Synthesis of Casual Videos Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, and Feng Liu arXiv 2023 📝 Paper: arxiv.org/abs/2312.02135 🌐 Website: casual-fvs.github.io/ Abstract: Novel view synthesis from an in-the-wild video is difficult due to challenges like scene dynamics and lack of parallax. While existing methods have shown promi...
How I Understand Diffusion Models
41K views · 1 year ago
Diffusion models are powerful generative models that enable many successful applications like image, video, and 3D generation from texts. In this tutorial, I share my understanding of the diffusion model basics, including training, guidance, resolution, and speed. Below are some other great resources to learn more about diffusion models. Slides Here are the slides used in this video Training: b...
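The training step mentioned above can be sketched in a few lines; this NumPy version of the standard DDPM noise-prediction setup is my own minimal illustration:

```python
import numpy as np

def ddpm_training_pair(x0, alpha_bar_t, rng):
    """Forward-process sample x_t = sqrt(a) x0 + sqrt(1 - a) eps, with
    noise level a = alpha_bar_t. The denoiser eps_hat(x_t, t) is trained
    with an L2 loss to recover eps from the noisy x_t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return xt, eps
```

At alpha_bar_t = 1 no noise is added; as alpha_bar_t approaches 0, x_t becomes pure Gaussian noise, which is where sampling starts.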
3D Human Digitization from a Single Image!
35K views · 1 year ago
3D Human Digitization from a Single Image!
Expressive Text-to-Image with Rich Text
6K views · 1 year ago
Expressive Text-to-Image with Rich Text
Immersive 3D Rendering from Casual Videos
18K views · 1 year ago
Immersive 3D Rendering from Casual Videos
Step into the World from a Single Image
2.4K views · 1 year ago
Step into the World from a Single Image
Miss Korea 2013 Contestants Face Morphing
170K views · 11 years ago
Miss Korea 2013 Contestants Face Morphing
When can we expect to test this awesome concept?
If this method matures, couldn't we easily scan the entire world? Everyone could just snap a few photos with their phone to recreate the real world, and people could even collaborate, sharing photos to reconstruct scenes together.
is this planning to be opensource?
Why everyone uses RoPE instead of AliBi?
Graduate student descent 😂😂 I will now never forget the concepts in this video ❤
Haha! That’s great!
Great video 👏
Thank you!
Amazing video, thank you
Thank you for your kind words!
I really, really love this video! Thanks for making those complex papers interesting and understandable for me. Now, I have the courage and enthusiasm to read the original papers. Haha!
That's great to hear! Glad you found it helpful.
Fantastic work and a very good explainer video. I hope this makes it into some products soon.
Thanks a lot!
Uncle Roger??? haha sorry, good explanation, cheers
Thank you! Cheers!
Wow! What a beautiful explanation!!!!
Thank you so much!
I don't know if there's anybody like me: the video is easy to understand, but I need to watch it more. I've watched it 10 times so far and still don't fully get the formulas. Thank you so much!
Yup, it’s not easy to understand these math equations. But I hope the video provides some intuition on why and how it works.
amazing!
Thank you! Cheers!
Really amazing video! May I ask what tools you use to create this video?
Thanks! The animation comes from PowerPoints. I edit the video with Adobe premiere pro.
This is the best video on diffusion models, I can't even imagine how you were able to distill this much info into 17 minutes
Glad it was helpful! Thanks a lot!
Very entertaining and valuable. Excellent editing and explainer! subbed
Glad you enjoyed it! Thanks for subscribing!
Would it be better to use LiDAR to accurately measure the distance of various objects in the environment?
Probably not. LiDAR estimates distance by detecting reflected light, so it has difficulty with shiny objects (specular surfaces reflect light at a specific angle). Unless the sensor is perfectly aligned with the reflected path, it won't "see" the object. LiDAR would therefore help in estimating the geometry of diffuse surfaces, but not shiny ones.
@jbhuang0604 Good point, mate. I've worked in this field: LiDAR can use polarization to identify the surface type, dual-wavelength emitters can help with transparent or translucent objects, and multi-echo processing can distinguish a surface from its reflectivity. Combined with RGB data, one can build a proper 3D reconstruction of an object and its surrounding environment, much like Apple does, though not quite as accurately. Back in game development I used these techniques to create 3D versions of objects, but at the time it was limited to shape, size, and albedo; it's far more developed now.
Wow! That’s cool! Yup, LiDAR usually has difficulty in working with reflective and transparent surfaces. It’s interesting that you could handle these challenging cases.
Among us
That’s right!
Are you Nvidia engineer?
Nope!
@jbhuang0604 just that you are presenting the algorithm as "we".
Ah! Got it. This is a collaborative work between Google and University of Maryland College Park.
It's a very good explanation and video in general as well!
Glad it was helpful!
@@jbhuang0604 Love all of the content on this channel! Thank you so much for doing this. I am spreading info about this great channel everywhere :))
nice
Glad you liked it!
great video!🎉🎉❤❤
Glad you liked the video!
that is a smart way to fix some issues and i need it!
Very cool!
Really well-made video! Love how you put all these concepts in the same framework and explain all the math intuitively!
Thanks for the kind words, Xuan!
Thank you for such a great videos with all the steps and equations explained so clearly! I was looking for the referenced papers to dive deeper and found those in the video description! I've learned so much through the video! Your students are so lucky to have such a dedicated instructor!
Thanks so much for your kind words!
Do you work at a 7/11 in Kilsyth
What’s that?
@ ice coffee
why you want to remove the black kids only Jia smh this is one of the assumptions that you train an ai on , that inherits bias (racist tendencies) without explicitly being drafted to grasp the causality of its motive.
I am a bit confused. The method is for general objects, not specific to people (or race).
@@jbhuang0604 yes as far as you know , hard to estimate what associations the network carries forward as you extrapolate capabilities over time , that may bake in bias in ways you had not considered
compounding interest of color schema as best practice could result in causal tendencies across varied modality as capacities increase toward more general purpose networks , for example -- were the masking purely done in black and white for autonomous vehicle , with black being labeled as a phantom person , an edge case scenario (1 in 1million) might result in a zoox or waymo determining its ok to smack me on the road
I am a huge fan of this work in any case , that's why I am here
Got it! Thanks for the reminder! Yes, we definitely need to work on reducing these unintentional biases.
Only 30 seconds in, I could already tell that this video was going to be quality and immediately subscribed. Awesome work!!!!
Really appreciate your comment! Glad that you like the video!
Great content, just a small tip, avoid extraneous load, some sound seems unnecessary.
Thanks for the feedback!
Great video
Thank you!
Such a high-quality video on such a new technique. A cyber bodhisattva saving clueless undergrads 😭
Thanks a lot!
Every video on that channel is a banger
Glad you like it! Thanks!
Oh my god, I saw this paper before! It was really cool! Good luck :D The video presentation quality here is excellent. (I'm the first author of Diffusion Illusions)
Thanks a lot for your comment. Your work is a big inspiration for us!
Wonderful tutorial! Thank you!
Thanks for your kind words!
This video explains the maths of Flow Matching very well!! Esp. you mentioned that Flow Matching is a generalised version of diffusion model, it suddenly makes all sense. Looking forward to your next video!!
Thank you! You made my day!
This was a very good video. also enjoyed the kanade clip at the end. thank you.
Thanks, I'm glad you enjoyed it!
Thank you for this video! Clearly explained!
Thank you!!
Thank you for this video! Amazing explanation!
You’re welcome! Happy that you liked it.
@11:18 not HMB but HBM
Good catch! Clearly I was just trying to make sure people are paying attention. :-p
Is there any plan to open-source this? I was hoping to use inpainting to fix some videos. This thing is incredible.
We are working on it! Stay tuned!
This is the tool I've been waiting for. In the future we'll be able to generate layers and add them in, we'll be able to generate 3D objects and add them to scenes, and also change the position of the camera. All these technologies exist today in isolation.
Exciting times ahead!!
Hello, when will we be able to use this great technology?
We are still working on a version that we can release.
Is this open-source?
We are working on it. Stay tuned!
I need this NOW!
We are working on it!
Combining this video with Umar Jamil implementation is useful
That’s COOL!
Can't wait till this goes live!
We are excited as well!
Considering we saw the demo of the toy airplane in this video... Practical flightcrafts/starships/battleships for practical effects is way easier to do without CGI... with of course some AI involved. :) I honestly would like to try this new tool. Any chance for a code release?
Thanks for your comment. We cannot release the version building upon Google internal pretrained model so we have to rebuild it using open source model. We are still working on it and hopefully can release that version.
@@jbhuang0604 Thats awesome, looking forward to try it! Take your time though; no need to rush. :) I can definitely wait.
Great! We are working on it!
This is super promising and exciting, congrats on making this possible!
Thank you so much!
this kind of result is pretty amazing
Yes, we are very excited about what new opportunities this tool can unlock!
pretty cool!
Thanks for the kind words!