Support Vector Machine (SVM) from Scratch - Machine Learning Math & Python [Code Fix in Description]

  • Published: 14 Nov 2024

Comments • 9

  • @ruima3847
    @ruima3847 6 months ago +1

    Hi Kai, thanks for posting this video. The concepts are clearly explained and it helped me a lot! :) However, I have one doubt after watching the video and coding along with your notebook. I am a bit confused about how the gradient is calculated in your code:
    def calc_gradient(X, y, w, b, C=1):
        n = X.shape[1]
        m = X.shape[0]
        dj_dw = np.zeros(n)
        dj_db = 0.
        for i in range(m):
            hinge_condition = (1 - y[i] * (np.dot(w, X[i]) + b)) > 0
            if hinge_condition:
                dj_dw += w - (C * y[i] * X[i])
                dj_db += -C * y[i]
            else:
                dj_dw += w
        return dj_dw, dj_db
    Aren't we repeatedly adding w to the gradient this way, once for every data point in the loop, while the derivation dJ/dw = w - C * sum[y_i * x_i if hinge condition > 0 else 0] has w outside the summation sign and thus added only once? So I wonder if I could code it this way:
    def calc_gradient(X, y, w, b, C=1):
        n = X.shape[1]
        m = X.shape[0]
        dj_dw = np.zeros(n)
        dj_db = 0.
        for i in range(m):
            hinge_condition = (1 - y[i] * (np.dot(w, X[i]) + b)) > 0
            if hinge_condition:
                dj_dw -= (C * y[i] * X[i])
                dj_db += -C * y[i]
        dj_dw += w
        return dj_dw, dj_db

    • @dylankailau6672
      @dylankailau6672  6 months ago +1

      Wow yes you are absolutely right! Thanks for the correction, I'll be sure to update the repository!
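      For anyone who wants to convince themselves, a finite-difference check is an easy way to confirm the corrected gradient. Here is a minimal sketch (assuming the standard soft-margin objective 0.5*||w||^2 + C * sum of hinge losses; calc_loss and the vectorized calc_gradient below are illustrative, not copied from the notebook):

      import numpy as np

      def calc_loss(X, y, w, b, C=1):
          # soft-margin objective: 0.5*||w||^2 + C * sum of hinge losses
          hinge = np.maximum(0, 1 - y * (X @ w + b))
          return 0.5 * np.dot(w, w) + C * np.sum(hinge)

      def calc_gradient(X, y, w, b, C=1):
          # corrected gradient: w appears once, outside the sum over points
          violated = (1 - y * (X @ w + b)) > 0
          dj_dw = w - C * np.sum(y[violated][:, None] * X[violated], axis=0)
          dj_db = -C * np.sum(y[violated])
          return dj_dw, dj_db

      rng = np.random.default_rng(0)
      X = rng.normal(size=(20, 2))
      y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
      w, b, eps = rng.normal(size=2), 0.5, 1e-6

      dj_dw, _ = calc_gradient(X, y, w, b)
      for j in range(2):  # compare the analytic gradient to central differences
          w_hi, w_lo = w.copy(), w.copy()
          w_hi[j] += eps
          w_lo[j] -= eps
          approx = (calc_loss(X, y, w_hi, b) - calc_loss(X, y, w_lo, b)) / (2 * eps)
          print(dj_dw[j], approx)  # the two numbers should match closely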

  • @faaz12356
    @faaz12356 4 months ago

    The best one to explain SVM so far (big... big thanks man)

  • @gunasekhar8440
    @gunasekhar8440 6 months ago +1

    Hi @kaki, at 18:26 you mentioned the max term in the formula, right? Can we assume that we always need to try to maximize the wrong point's distance from the decision boundary in order to get a good classification model? I mean, we are calculating its distance from the separating line, right? Indirectly we need to maximize that distance from the wrong point to the separating line, right? Correct me if I'm wrong!!

    • @dylankailau6672
      @dylankailau6672  6 months ago

      Thanks for the question! Yes we use the max function to calculate how wrong a point is relative to the decision boundary.
      If the point is on the correct side of the decision boundary, the max function gives us 0, but if the point is on the incorrect side of the boundary, the max function gives us a value greater than 0.
      This max function results in our "slack" variable for each point. The worse our decision boundary is, the more slack we need in order to satisfy the SVM constraints.
      Our goal is to move the decision boundary to a location where our total slack is minimized and our margin is maximized. Therefore, the next part of the video highlights how to accomplish this using gradient descent.
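      If it helps, here is a tiny sketch of that slack computation (the points and weights below are made up purely for illustration):

      import numpy as np

      w, b = np.array([1.0, 1.0]), 0.0            # an example decision boundary
      X = np.array([[2.0, 2.0],                   # well onto the correct side
                    [0.4, 0.2],                   # inside the margin
                    [-1.0, -1.0]])                # on the wrong side entirely
      y = np.array([1.0, 1.0, 1.0])               # all labeled +1

      # slack_i = max(0, 1 - y_i * (w . x_i + b))
      slack = np.maximum(0, 1 - y * (X @ w + b))
      print(slack)                                # [0.  0.4 3. ] -> zero means no slack needed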

  • @gunasekhar8440
    @gunasekhar8440 6 months ago

    Hi, nice video. For the kernel trick, first we need to apply the kernel trick to make the data linearly separable, and then apply SVM to find the decision boundary, right?

    • @dylankailau6672
      @dylankailau6672  6 months ago +1

      Thanks I'm glad you enjoyed! That sounds about right. If I'm understanding you correctly, the kernel trick is a way of projecting the datapoints into a higher dimension and then a hyperplane can more easily split the datapoints into two classes. I should also note that this video doesn't touch on kernels, but it's a simple introduction to SVM classification.
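      As a rough illustration of that projection idea, here is a sketch that uses an explicit feature map rather than the kernel trick proper (the kernel trick avoids ever computing the mapping directly; the data here is made up just for the example):

      import numpy as np

      rng = np.random.default_rng(0)
      # inner disk (class -1) and outer ring (class +1): not linearly separable in 2D
      r = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
      theta = rng.uniform(0, 2 * np.pi, 100)
      X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
      y = np.concatenate([-np.ones(50), np.ones(50)])

      # lift each point into 3D: (x1, x2) -> (x1, x2, x1^2 + x2^2)
      Z = np.column_stack([X, (X ** 2).sum(axis=1)])

      # in the lifted space the flat plane z3 = 2 separates the classes perfectly
      print(np.all((Z[:, 2] > 2) == (y == 1)))    # True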

  • @pranjalpokhrel5944
    @pranjalpokhrel5944 9 months ago

    9:44 In 2/||w||, why 2? Is this because the number of features is 2 in this example?

    • @dylankailau6672
      @dylankailau6672  9 months ago

      Thanks for the question!
      We have 2 in the numerator to represent the distance from the hyperplane (yellow line) to the upper 1 unit boundary PLUS the distance to the lower -1 unit boundary. These two 1 unit distances add to 2.
      Also, if you would like to double-check that the margin should be written as 2/||w||, you can compare its numerical value (calculated with the weight values we created) to the distance between the blue and red points on the graph circled in orange. The margin equation should equal the distance between the two support vectors. (I calculated this to be ~4.472 for both)
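      To make that check concrete, here is a small sketch (the weight values below are an assumption chosen so that 2/||w|| comes out to ~4.472 like in the video; the notebook's actual values may differ):

      import numpy as np

      w, b = np.array([0.2, 0.4]), 0.0

      # margin width from the formula
      margin = 2 / np.linalg.norm(w)

      # two support vectors: the closest points on the +1 and -1 boundaries,
      # lying along the direction of w (so w . p = +1 and -1 with b = 0)
      p_plus = w / np.dot(w, w)
      p_minus = -w / np.dot(w, w)
      distance = np.linalg.norm(p_plus - p_minus)

      print(margin, distance)                     # both ~4.4721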