C4W4L03 Siamese Network
- Published: Oct 7, 2024
- Take the Deep Learning Specialization: bit.ly/32Rqs4S
Check out all our courses: www.deeplearni...
Subscribe to The Batch, our weekly newsletter: www.deeplearni...
Follow us:
Twitter: / deeplearningai_
Facebook: / deeplearninghq
Linkedin: / deeplearningai
summary:
the input is an image that passes through multiple layers to produce an output encoding, a mathematical representation of the image. The same procedure is applied to a second image. To determine whether both images represent the same thing, you take the two encodings and compute the distance between them: if the distance is large, the images represent different things, and if it is small, they represent the same thing.
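The comparison described in the summary can be sketched in a few lines of Python. This is a toy illustration only: the `encode` function below is a single linear layer plus tanh, standing in for the convolutional encoder in the lecture, and all names and values are made up for the example:

```python
import math

def encode(image, weights):
    # Toy "encoder": one linear layer followed by tanh, standing in
    # for the convolutional layers described in the lecture.
    # `image` and each row of `weights` are plain lists of floats.
    return [math.tanh(sum(w * x for w, x in zip(row, image))) for row in weights]

def distance(a, b):
    # Euclidean (L2) distance between two encodings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The *same* weights are used for both images; that is what makes it "Siamese".
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
img1 = [1.0, 0.0, 1.0]
img2 = [1.0, 0.1, 0.9]   # slightly perturbed copy of img1
img3 = [-1.0, 1.0, 0.0]  # a very different image

d_same = distance(encode(img1, weights), encode(img2, weights))
d_diff = distance(encode(img1, weights), encode(img3, weights))
print(d_same < d_diff)  # similar images should end up closer in embedding space
```

Because both images pass through identical weights, the two branches are really one network applied twice.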
Thank you so much for providing this nice video.
But why do we need a new architecture for that? Won't the embeddings be far apart for two dissimilar images with any typical neural network?
There is a high pitch sound in the background
Glad you confirmed it, wasn’t sure if it’s me or my device.
How do we determine the threshold value for every data input ?
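One common answer (not given in the video, so treat it as an assumption): the threshold is not chosen per input but selected once on a held-out set of labeled pairs, for example by picking the value that maximizes verification accuracy. A minimal sketch, with a hypothetical helper `best_threshold`:

```python
def best_threshold(distances, labels):
    # labels: 1 if the pair shows the same identity, 0 otherwise.
    # Try each observed distance as a candidate threshold and keep the one
    # that classifies the most validation pairs correctly
    # (predict "same" when distance <= threshold).
    best_t, best_acc = 0.0, -1.0
    for t in sorted(distances):
        acc = sum((d <= t) == bool(y) for d, y in zip(distances, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

dists = [0.1, 0.2, 0.9, 1.1]   # made-up validation distances
labels = [1, 1, 0, 0]
print(best_threshold(dists, labels))  # 0.2 separates the two groups perfectly
```

The chosen threshold is then held fixed at inference time for every new pair.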
nice explanation
What's the benefit of having the same model twice? All I see is loss in memory.
Also, do the feature vectors, or these encodings, get saved so they won't have to be recomputed each time, which is more time efficient?
So, you only have the same weights stored once. You don't have twice as many parameters, so there is no "memory loss". It is functionally the same as running two different images through one neural network and then taking the feature vectors at the output and comparing them
@@JoeJimson There is a difference. The backpropagation is happening differently.
@@sadenb I think there is no training... it is a pre-trained neural network model; we just calculate the encodings of the input images by running them through it.
@@deekshantmalvi4612 This is not true. Embeddings from pretrained models don't have these metric properties, you have to train the embeddings for your task.
@@sadenb Once the 2 images are fed in, the derivative of the loss function is computed for both images, and the same network is updated accordingly. It's like one training pair contains 2 images instead of an image and a label. That's all.
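To make this concrete, here is a toy sketch of a pairwise (contrastive) loss sometimes used to train Siamese networks; the loss form and the margin value are illustrative assumptions (the lecture series itself uses the triplet loss). Gradients from both branches flow back into the same shared weights, which is the point made above:

```python
import math

def contrastive_loss(e1, e2, same, margin=1.0):
    # e1, e2: encodings of the two images from the *same* shared network.
    # same: True if the pair shows the same identity.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
    if same:
        return d ** 2                    # pull matching pairs together
    return max(0.0, margin - d) ** 2     # push non-matching pairs apart

# A matching pair that is already close contributes little loss;
# a non-matching pair inside the margin is penalized.
print(contrastive_loss([0.0, 0.0], [0.1, 0.0], same=True))   # small loss
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False))  # 0.25
```

Each gradient step therefore uses one pair (image, image, label) rather than the usual (image, label).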
am I the only one hearing the very high-pitched sound throughout the video?
Do we need a lot of data to measure the similarity of two different signals? For example, I have two signals and I want to determine whether they are similar or not. In this case, with a Siamese network, are these two signals enough as input with only one label (x1, x2, 0), or do I need to define more labeled pairs to train the network and compute the similarity? How many data samples are required?
what is meant by the subscript 2 in the norm term?
The subscript two represents the l-2 (or Euclidean) norm. A norm is nothing but a mapping from a vector space to the nonnegative reals. There are many norms, so a subscript lets us distinguish between them (at least for the p-norms).
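A quick illustration of the p-norms mentioned here, in plain Python (`p_norm` is a hypothetical helper name, not from the lecture):

```python
def p_norm(v, p):
    # ||v||_p = (sum_i |v_i|^p)^(1/p)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = [3.0, -4.0]
print(p_norm(v, 1))  # L1 norm: |3| + |-4| = 7.0
print(p_norm(v, 2))  # L2 (Euclidean) norm: sqrt(9 + 16) = 5.0
```

The subscript 2 in the video's distance term means exactly the p = 2 case.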
How are we going to train the network with just one input image?
You just compute encodings of the images using convnets and then train the network with the triplet loss function.
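For reference, the triplet loss mentioned above is L = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0); a minimal sketch, assuming the encodings are already computed and using alpha = 0.2 as an example margin:

```python
def triplet_loss(anchor, positive, negative, alpha=0.2):
    # L = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0)
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))  # anchor-positive
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))  # anchor-negative
    return max(d_ap - d_an + alpha, 0.0)

# Anchor close to positive and far from negative: the margin is satisfied,
# so the loss is zero and no gradient is generated for this triplet.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0]))
```

So each training example is a triplet of images (anchor, positive, negative) rather than a single labeled image.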
So does the model need to be trained in any way? Would a pretrained ResNet model work better than a randomly initialized one?
@@Throwingness yes and yes
Why does his voice make me sleepy?
wake up lady, you've missed one great lecture
Probably you're stupid
Can we use this Siamese network to identify whether two sample audio clips belong to the same person or not?
Or for audio recognition??
Yes, theoretically Siamese networks should be able to find similarity between any two samples from the same data distribution.
So u better have a really good CNN structure loool