I believe that using regularization alone doesn't guarantee the constraint will be satisfied exactly; it depends on the problem you are optimizing for.
Very good question and a very nice explanation. One follow-up question: when would we want to put such constraints on the W matrix? Are there any practical applications of doing this?
I just read your comments for other questions. Seems like this question has already been answered. Thanks.
Wow... adding this constraint as a regularisation term is a nice idea.
Even L2 and L1 regularisation can be interpreted as adding a constraint on the weights followed by using Lagrange multipliers.
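For instance, here is a minimal sketch of that equivalence for the L2 case (the L1 case is analogous); the norm budget c and the multiplier lambda are illustrative symbols, not from the original comment:

```latex
% Constrained form: minimise the data loss subject to a budget on the weight norm
\min_{w} \; L(w) \quad \text{s.t.} \quad \|w\|_2^{2} \le c

% Lagrangian (penalised) form: for a suitable \lambda \ge 0 this has the same solution
\min_{w} \; L(w) + \lambda \, \|w\|_2^{2}
```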
Doesn't W^T W = I imply that W W^T = I? If so, can we drop the second constraint, W W^T - I, from the loss?
Here W is a matrix and not a vector, and it need not be square. The rows being orthonormal unit vectors need not mean that the columns are too. Hence, we need both the constraints.
Can we do it like this: first get the optimum W, and then apply the Gram-Schmidt method to the rows and columns of W?
The Gram-Schmidt process needs the original set of vectors to be linearly independent, which is not guaranteed for the rows or columns of a W obtained without additional constraints.
Loved the explanation
Does the final loss mean the existing loss with the regularisation term plus the new constraint? Please let me know.
This is correct.
And in future, if we make some assumptions about the weights, can we add a regularisation term for that as well?
Yes, that’s true. We can add any constraints that we want to add into the loss function itself.
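For concreteness, one way the combined loss for this question could look (here \mathcal{L}_{\text{data}}, \lambda_1 and \lambda_2 are illustrative symbols, not taken from the video):

```latex
% Original objective plus the two orthogonality constraints added as soft penalties
\min_{W} \; \mathcal{L}_{\text{data}}(W)
  + \lambda_1 \, \| W^{T} W - I \|_F^{2}
  + \lambda_2 \, \| W W^{T} - I \|_F^{2}
```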
Thank you Sir
Instead of the Frobenius norm can we use another matrix norm?
Yes, you can. The Frobenius norm is one of the simplest norms on matrices.
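For reference, the Frobenius norm of a matrix A is just the square root of the sum of squares of its entries:

```latex
\| A \|_F \;=\; \sqrt{\sum_{i}\sum_{j} a_{ij}^{2}} \;=\; \sqrt{\operatorname{tr}\!\left(A^{T} A\right)}
```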
Do we get these weights when we define the architecture? I mean, while defining the architecture we have to name the weights too, right? Otherwise, how do we get these weights so that we can use them in the constraint?
After we define the architecture, we initialise the weights to random values and, using gradient descent based approaches, we fine-tune the weights to minimise a desired objective function. The constraint is added to the objective function.
When you said that r_i should be perpendicular to r_j, and c_i should be perpendicular to c_j, did you mean that the matrices of dot products of the rows and of the columns should be identity matrices? i.e., the elements containing the dot products of row_i and row_j (and of col_i and col_j) should be 1 on the diagonal and 0 off the diagonal, correct?
Yes, that’s correct
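Spelled out in matrix form, with r_i denoting the i-th row of W (the column case is analogous):

```latex
\left( W W^{T} \right)_{ij} \;=\; r_i \cdot r_j \;=\;
\begin{cases}
1, & i = j \quad \text{(each row is a unit vector)} \\
0, & i \neq j \quad \text{(distinct rows are perpendicular)}
\end{cases}
\;\;\Longleftrightarrow\;\; W W^{T} = I
```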
Is this just an interview question, or can such a scenario occur in the real world?
Orthogonal matrix constraints are commonly encountered in Matrix Factorization problems in the real world.
Where can we actually apply this concept in a real-world scenario? And how do we code it?
Orthogonal constraints are popular in Matrix Factorisation in the real world. This interview question tests your depth of understanding of adding new constraints to DL/ML optimisation problems. This can be implemented in TF 2.0 by creating a custom loss and using the GradientTape functionality.
How do we add this constraint to the code?
That’s a good follow-up question. We can define custom loss functions and use GradientTape in TF 2.0 to achieve this.
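Here is a minimal TF 2.x sketch of that idea, not taken from the course; the model, the toy data and the penalty weight lam are illustrative placeholders, and the point is only the pattern of adding the orthogonality penalty inside a GradientTape training step:

```python
import tensorflow as tf

# Toy model; the layer sizes are arbitrary placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
mse = tf.keras.losses.MeanSquaredError()
lam = 0.01  # strength of the orthogonality penalty (hyperparameter, assumed value)

def orthogonality_penalty(W):
    """||W^T W - I||_F^2 + ||W W^T - I||_F^2 for a 2-D weight matrix W."""
    eye_cols = tf.eye(tf.shape(W)[1])
    eye_rows = tf.eye(tf.shape(W)[0])
    return (tf.reduce_sum(tf.square(tf.matmul(W, W, transpose_a=True) - eye_cols))
            + tf.reduce_sum(tf.square(tf.matmul(W, W, transpose_b=True) - eye_rows)))

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = mse(y, y_pred)
        # Add the orthogonality constraint on the first Dense layer's kernel.
        loss += lam * orthogonality_penalty(model.layers[0].kernel)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Usage with random toy data.
x = tf.random.normal((32, 8))
y = tf.random.normal((32, 1))
print(train_step(x, y).numpy())
```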
Lagrange multipliers
How much will the fee be for the PGP for AppliedAICourse students? When will it start?
We will launch the application portal next week. The fee will be around 78K INR for the 1-year PGD program.
@AppliedAICourse Any discount for students already pursuing the course who just bought it in September?
@AppliedAICourse Is the syllabus for the 1-year PGD course available on the site?
Yes, please contact us on +91 8106-920-029.
This is the tentative syllabus. More details will be provided at the launch of the program.
Semester-I
1. Essentials of AI (6 credits): Python, SQL, Linear Algebra, Basics of Probability
2. Data Analysis and Visualisation (6 credits): Plotting, Statistics for Data Analysis, Dimensionality reduction, Visualising high-dimensional data, Real-world end-to-end case studies
3. Machine Learning (6 credits): Calculus and Numerical Optimisation, Classification, Regression and Clustering algorithms, Real-world end-to-end case studies
Semester-II
1. Advanced ML (with Deep Learning) (6 credits): Recommender Systems, Matrix Factorization, Neural Networks, MLPs, Advanced Optimisation methods, Real-world end-to-end case studies
2. Deep Learning-II (6 credits): CNNs, RNNs, Transformers, TensorFlow and PyTorch, Real-world end-to-end case studies
3. Thesis (8 credits): Industry or Research focussed Thesis.