I can't believe how much great content you've put out there on the internet. From one netizen to another - thank you!
Strikes me that a guest sharing very engineering- and problem-solving-oriented statements can be both extremely useful from a practitioner's standpoint and also touch on very profound, edge-of-understanding areas. You don't always get this dual experience, so I really appreciated hearing from a guest like that. :D
So many practical insights. So fun to look under the hood of these popular papers.
Really enjoyed the short/condensed version. Looking forward to watching the long version to improve understanding. I really like this podcast format! Also great material!
Brilliant content! On the topic of a systematic way of finding data augmentations, it feels like DA helps in overcoming the "staticness" of the image domain (for natural images). As Dr. Simon noted, videos provide a lot of semantic information that's missing in the static image world. Looking at objects, we naturally find them "cropped" (obscured, behind other objects) or in different colour schemes throughout the day. Knowing that the main bottleneck is compute, it makes the whole thing even more exciting (assuming the exponential compute growth keeps up). Big breakthroughs ahead of us!
Thank you once again for this show, you're doing a great job!
Great insights. I really appreciate the relatable and grounded discussions during the whole session. Thank you!
Wow, with every episode even better content :-O mind-blowing
🔥🔥✌️✌️😊😊
Such a good discussion! Thanks!
As a speech researcher, it was insightful to hear Dr. Simon's thoughts on SimCLR. Following the same recipe on speech via SimCLR and similar frameworks seems to be a sample-efficient way to learn across various modalities.
Thanks for this really nice podcast. Please continue :)
Which tool did you use to draw the graph at around 2:40 to 5:00?
Whimsical
Great episode! Plenty of practical ideas to play with.
Tim, your Whimsical notes are great! Can you release them publicly?
Thank you for this great segment. So many insights in those papers. I guess contrastive loss works best when we can use our own intuitions about the right inductive bias to augment the dataset, as Kilcher notes. I struggled with this part in an NLP project concerned with learning sentence embeddings: how do you augment sentences? There aren't many simple surface-level tweaks you can do, analogous to changing hue and brightness or cropping. You could swap synonyms and delete the odd adjective, but beyond that, what can you do without changing the meaning?
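Not anything from the episode, just a rough sketch of the surface-level tweaks mentioned above (synonym swaps and dropping the odd word), with a toy hand-written synonym table standing in for something like WordNet or a paraphrase model:

```python
import random

# Toy synonym table; in practice you'd pull candidates from WordNet,
# back-translation, or a paraphrase model instead of a hand-written dict.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}

def augment_sentence(sentence, p_swap=0.5, p_drop=0.1, seed=None):
    """Label-preserving-ish tweaks: swap words for synonyms and
    occasionally drop a word. Both can subtly change the meaning,
    which is exactly the difficulty raised above."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word in SYNONYMS and rng.random() < p_swap:
            word = rng.choice(SYNONYMS[word])
        if rng.random() < p_drop:
            continue  # word dropout
        out.append(word)
    return " ".join(out)

print(augment_sentence("the quick cat seemed happy", seed=0))
```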
Another awesome and informative video. 😍
Thank you Abby! 😎
I'm with Yann LeCun on this one -- I think regularized latent variable models are ultimately the way to go. They reduce the Kolmogorov complexity of the learned model, constraining the size of the solution space, whereas contrastive learning pushes on individual points. To shape the space you're going to need a lot of points...
Fascinating, I’m interested in hearing more about text to video generation
Another fantastic one. I've got more papers to read about contrastive loss now
Could you create a series / episode on active learning? It's hugely relevant to work in industry, particularly segmentation tasks that require expert labellers, but there's weirdly little research on it. Recently discovered the channel and love the work you do.
1:23:04 Random sampling is not trickery. If you try to continue the sequence "A cat is a", the suggested continuations with their probabilities might be ("is a", 0.4), ("cute animal", 0.3), ("awesome climber", 0.3).
The first is a useless repetition and the latter two make sense. That means most of the probability mass is concentrated on continuations that make sense, but the single most probable one doesn't. Always picking the most probable continuation ignores that most of the probability rests on sensible continuations. With more possible outputs this effect can get even more pronounced.
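To make the point concrete, here is a minimal sketch using the toy numbers above (the phrases and probabilities are only this comment's illustrative example, not the model discussed at 1:23:04):

```python
import random

# Toy next-phrase distribution for the prefix "A cat is a ..."
continuations = {"is a": 0.4, "cute animal": 0.3, "awesome climber": 0.3}

# Greedy decoding: always take the single most probable continuation,
# even though 60% of the mass sits on the sensible alternatives.
greedy = max(continuations, key=continuations.get)
print("greedy :", greedy)  # -> "is a" (degenerate repetition)

# Random sampling: draw proportionally to the probabilities, so a
# sensible continuation comes out about 60% of the time.
sampled = random.choices(list(continuations), weights=list(continuations.values()), k=1)[0]
print("sampled:", sampled)
```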
The way I see it, data augmentation is a way of making up for the lack of a suitable prior in our model (the CNN). Data augmentation doesn't generate new data, but it does implicitly strengthen the weak prior of the CNN, which can lead to better generalization, faster training, etc.
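As a concrete (hypothetical) illustration of that view: each augmentation hand-codes an invariance we believe the task has, a prior the plain CNN doesn't get any other way. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Hand-coded invariance priors: the label shouldn't change under a
    horizontal flip or a mild brightness change. The network never sees
    these rules directly; it only sees the extra label-preserving views."""
    out = img.copy()
    if rng.random() < 0.5:                                  # prior: mirror symmetry
        out = out[:, ::-1, :]
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)    # prior: brightness invariance
    return out

# One original image (toy random data here) yields many "new" training views.
img = rng.random((32, 32, 3))
views = [augment(img) for _ in range(4)]
```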
Stupid question maybe - if self-similarity seems to be a pretty straightforward way to detect unnecessary layers, and excluding them would help, why isn't a "self-similarity check" with corresponding automatic deletion of layers built into high-level frameworks like Keras?
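Nothing like this ships in Keras as far as I know, but here is a rough sketch of what such a check could look like, using linear CKA between consecutive layers' activations (the threshold and the pruning rule are pure assumptions):

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape
    (n_examples, n_features); values near 1 mean the two layers
    encode nearly the same representation."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)

# Hypothetical activations collected from six consecutive layers on one batch.
acts = [np.random.randn(256, 128) for _ in range(6)]

THRESHOLD = 0.98  # assumption: would need tuning per model and task
redundant = [i for i in range(1, len(acts))
             if linear_cka(acts[i - 1], acts[i]) > THRESHOLD]
print("candidate layers to prune:", redundant)
```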
Thanks, you made my day! One use case of the embeddings learned by contrastive learning is neural search (finding nearest neighbours).
However, you can use many different attributes for contrastive learning. Would you just train a head on a pretrained network for each attribute? Or is the gain from fine-tuning the whole model worth it?
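A minimal PyTorch sketch of the first option (frozen contrastively pretrained backbone, one small head per attribute); the stand-in encoder and attribute names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for a contrastively pretrained encoder (hypothetical).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False  # option 1: freeze the backbone, train only the heads

# One lightweight projection head per attribute you want to search on.
heads = nn.ModuleDict({
    "colour": nn.Linear(512, 64),
    "shape": nn.Linear(512, 64),
})

x = torch.randn(8, 3, 32, 32)
with torch.no_grad():
    feats = backbone(x)

# Each head defines its own embedding space for nearest-neighbour search.
colour_emb = F.normalize(heads["colour"](feats), dim=-1)
sims = colour_emb @ colour_emb.T  # cosine similarities for neural search

# Option 2 (fine-tuning the whole model per attribute) usually needs more
# data and compute; whether the gain is worth it is task-dependent.
```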
Today is my lucky day. I found it!!
Loved it
amazing
Not sure why you wouldn't use a mean/variance brightness-contrast normalization on the colour channel histograms.
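In case it helps, one way to read that suggestion as code: standardize each colour channel so its mean (brightness) is 0 and its std (contrast) is 1. A small sketch:

```python
import numpy as np

def per_channel_standardize(img):
    """Mean/variance normalization per colour channel for an H x W x 3
    float image: brightness (mean) goes to 0, contrast (std) to 1."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8  # avoid division by zero
    return (img - mean) / std

img = np.random.rand(64, 64, 3)
normed = per_channel_standardize(img)
print(normed.mean(axis=(0, 1)), normed.std(axis=(0, 1)))  # ~[0, 0, 0], ~[1, 1, 1]
```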
When there is no direct correlation in a layer, adding other layers may provide a combined feature correlation. Once the necessary combined correlations are created, there is nothing left to find. Additional layers just create unnecessary combinatoric burden.
Agreed. Any volunteer to code that for next week?
Please tell Simon that he can find type dispatching (function overloading) in fastcore from fastai. He is absolutely right that Python is not the best language for the DL job. Anyway, I believe Python will be the default language in the long run, in the same way JavaScript is for web dev.
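For anyone curious, the standard library already has the single-argument version of this idea; fastcore's `typedispatch` (the one mentioned above) extends it to dispatch on multiple annotated arguments. A small sketch with `functools.singledispatch`:

```python
from functools import singledispatch

# Single-argument dispatch from the standard library; fastcore's
# `typedispatch` generalizes the same idea to several argument types.
@singledispatch
def encode(x):
    raise NotImplementedError(f"no encoder registered for {type(x)}")

@encode.register
def _(x: int):
    return f"int:{x}"

@encode.register
def _(x: str):
    return f"str:{x}"

print(encode(3))      # -> int:3
print(encode("cat"))  # -> str:cat
```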
After representation learning there will be data augmentation learning ;)
I had a similar thought - maybe the self-similarity that Mandelbrot discovered has some implications for NNs.
First!! 🙌🙌
Second
As always 🙌
Frequent cut-in video snippets are distracting from the talk.
Thanks for the feedback! We are trying to achieve a certain format and are always experimenting - might have gone too far on this one. It's just a show teaser though, the full interview is always shown afterwards with no music or distractions, so just skip ahead.
If it's the static picture inserts, I disagree. I don't always know what they are talking about; it helps to understand.
I disagree as well. I think they stimulate the imagination / provide a canvas while you are thinking about the content that is being talked about.
Please drop the background noise
... it's distracting and annoying.