Support Vector Machine Basics
Published November 09, 2023
Assembled from SVM notes and CQF Exam 3 Question 2
Hyperplane interpretation
Support Vector Machines are best understood through analysis of a hyperplane in standard form:

$$\mathbf{w}^\top \mathbf{x} + b = 0$$
- Varying $\mathbf{w}$ has the effect of rotating the hyperplane about the intercept.
- Varying $b$ has the effect of translating the hyperplane in relation to the origin.
- Points to one side of the hyperplane (in 2D, "above", w.l.o.g.) are those which satisfy $\mathbf{w}^\top \mathbf{x} + b > 0$ (illustrated in the sketch below).
- Equidistant parallel hyperplanes which form a margin can be represented as $\mathbf{w}^\top \mathbf{x} + b = \pm 1$.
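As a quick illustration of these geometric roles, the sign of $\mathbf{w}^\top \mathbf{x} + b$ tells us which side of the hyperplane a point falls on. A minimal NumPy sketch, with illustrative values for $\mathbf{w}$ and $b$ (not taken from the original notes):

```python
import numpy as np

# Illustrative 2D hyperplane parameters
w = np.array([2.0, 1.0])  # normal vector: varying w rotates the hyperplane
b = -1.0                  # intercept: varying b translates it

points = np.array([
    [1.0, 2.0],   # lies on the positive side
    [0.0, 0.0],   # lies on the negative side
])

# The sign of w.x + b indicates which side of the hyperplane each point is on
scores = points @ w + b
print(np.sign(scores))  # [ 1. -1.]
```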
Critical to understanding: scaling $\mathbf{w}$ and $b$ (the left-hand side of the equation) has the effect of expanding or contracting this margin. This means that the constraint $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$ can always be satisfied for linearly separable data!
The width of the margin of the SVM is given by $\frac{2}{\lVert \mathbf{w} \rVert}$, since this is the distance between the two hyperplanes $\mathbf{w}^\top \mathbf{x} + b = \pm 1$, so SVM maximization of the margin is given by

$$\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \;\; \forall i$$
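As a quick numeric check of the margin width, assuming an illustrative weight vector:

```python
import numpy as np

w = np.array([2.0, 1.0])         # illustrative weight vector
width = 2.0 / np.linalg.norm(w)  # distance between w.x + b = +1 and w.x + b = -1
print(width)                     # ~0.894
```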
For soft margins, the cost function is

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,$$

where the regularization parameter $C$ determines the relative weight between the regularization term $\frac{1}{2}\lVert \mathbf{w} \rVert^2$ (which maximizes the margin) and the misclassification penalty $\sum_i \xi_i$.
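To make the objective concrete, here is a small NumPy sketch (with made-up toy data) that evaluates the soft-margin cost, using the fact that at the optimum each slack $\xi_i$ equals the hinge loss $\max(0,\, 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b))$:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Soft-margin SVM cost: 0.5 * ||w||^2 + C * sum of slacks."""
    margins = y * (X @ w + b)                # y_i * (w . x_i + b)
    slacks = np.maximum(0.0, 1.0 - margins)  # optimal slack = hinge loss
    return 0.5 * np.dot(w, w) + C * slacks.sum()

# Toy data and parameters (illustrative only)
X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.5], [0.5, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

print(soft_margin_objective(w, b, X, y, C=1.0))  # 2.0
```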
Regularization parameter
The regularization parameter $C$ in a Support Vector Machine is a hyperparameter that determines the relative weight of the squared-norm regularization term $\frac{1}{2}\lVert \mathbf{w} \rVert^2$ in the loss function.
In the popular library `sklearn`, $C$ is the weight applied to misclassification, so $C$ is strictly positive and is inversely proportional to the strength of regularization.
The objective function of a standard SVM with a linear kernel, for instance, can be written as:

$$\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \max\left(0,\; 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)\right)$$
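Fitting this objective with `sklearn` looks like the following sketch; the synthetic dataset and the choice of $C$ are illustrative only:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only)
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# C weights misclassification: larger C means weaker regularization
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)  # the learned w and b
```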
Regularization induces a penalty for model complexity. In the context of SVMs, the regularization term promotes model simplicity, i.e. large margins. In the context of soft-margin SVMs, larger margins reduce variance and increase bias. Therefore, varying $C$ has the following effect on the model’s bias/variance trade-off:
- Larger $C$: Large values of $C$ reduce regularization and penalize misclassification more heavily. This leads to smaller margins in a more complex model with increased variance and reduced bias. Values of $C$ that are too large can lead to overfitting.
- Smaller $C$: Small values of $C$ increase regularization by relaxing the penalty for misclassification. This leads to larger margins in a simpler model with reduced variance but increased bias. Small values of $C$ are prone to underfitting.
Optimal values for $C$ can be found using cross-validation, as in the sketch below.
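A common recipe is a grid search over a logarithmic range of $C$ values; a sketch using `sklearn`'s `GridSearchCV` (the data and the grid are illustrative):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic, noisy two-class data (illustrative only)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

# Search C over several orders of magnitude with 5-fold cross-validation
param_grid = {"C": np.logspace(-3, 3, 7)}
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the C value with the best cross-validated score
```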