Coursera 7 - Support Vector Machines
From Logistic Regression to Support Vector Machines
1. Large Margin Classification
Alternative view of logistic regression
$ h_\theta(x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}} \;, \quad h_\theta(x) \in [0, 1] $
<img src="/images/ml/coursera/ml-ng-w3-02.png" width=“820” height=“500” align=“middle” /img>
Predict $ y = 1 $ when $ h_\theta(x) = g(\theta^T x) \geq 0.5 $, i.e. when $ \theta^T x \geq 0 $.
Predict $ y = 0 $ when $ h_\theta(x) = g(\theta^T x) < 0.5 $, i.e. when $ \theta^T x < 0 $.
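As a quick sanity check, the hypothesis and decision rule translate directly into NumPy (a minimal sketch; the function names are mine, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x), evaluated for each row of X."""
    return sigmoid(X @ theta)

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```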
We can compress our cost function’s two conditional cases into one case:
$ \mathrm{Cost}(h_\theta(x),y) = - y \cdot \log(h_\theta(x)) - (1 - y) \cdot \log(1 - h_\theta(x))$
We can fully write out our entire cost function as follows:
$ J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] $
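For concreteness, here is a direct NumPy translation of this cost (a sketch; the names `X`, `y`, `theta` are my conventions, with `X` the $m \times n$ design matrix and `y` labels in $\{0, 1\}$):

```python
import numpy as np

def logistic_cost(theta, X, y):
    """Unregularized logistic regression cost J(theta)."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x) for every example
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```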
Adding regularization and writing the cost as a minimization objective:

$ \min_\theta \dfrac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \left( -\log h_\theta(x^{(i)}) \right) + (1 - y^{(i)}) \left( -\log(1 - h_\theta(x^{(i)})) \right) \right] + \dfrac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 $

Denote the two per-example costs by

$ \mathrm{cost}_1(\theta^T x^{(i)}) = -\log h_\theta(x^{(i)}) $

$ \mathrm{cost}_0(\theta^T x^{(i)}) = -\log(1 - h_\theta(x^{(i)})) $

so the objective becomes

$ \min_\theta \dfrac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \dfrac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 $
1.1 Optimization Objective
Start from the regularized logistic regression objective above:

$ \min_\theta \dfrac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \dfrac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 $

Let $ C = \dfrac{1}{\lambda} $. Multiplying the objective by the positive constant $ \dfrac{m}{\lambda} $ (which does not change the minimizer), and replacing $ \mathrm{cost}_1, \mathrm{cost}_0 $ with the piecewise-linear approximations of the log-cost curves shown in the figures below, gives the SVM optimization objective:

$ \min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \dfrac{1}{2} \sum_{j=1}^{n} \theta_j^2 $
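The objective is easy to compute once the piecewise-linear costs are fixed. A minimal sketch, assuming the standard hinge forms $ \mathrm{cost}_1(z) = \max(0, 1 - z) $ and $ \mathrm{cost}_0(z) = \max(0, 1 + z) $ (the lecture draws these curves but does not pin down exact slopes):

```python
import numpy as np

def cost1(z):
    """Cost when y = 1: zero once z >= 1, linear penalty otherwise."""
    return np.maximum(0, 1 - z)

def cost0(z):
    """Cost when y = 0: zero once z <= -1, linear penalty otherwise."""
    return np.maximum(0, 1 + z)

def svm_objective(theta, X, y, C):
    """C * (sum of per-example hinge costs) + (1/2) * sum_{j>=1} theta_j^2.
    The bias term theta[0] is conventionally left unregularized."""
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```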
1.2 Large Margin Intuition
<img src="/images/ml/coursera/ml-ng-w7-svm-1.png" width=“620” height=“400” align=“middle” /img>
<img src="/images/ml/coursera/ml-ng-w7-svm-2.png" width=“620” height=“400” align=“middle” /img>
<img src="/images/ml/coursera/ml-ng-w7-svm-3.png" width=“620” height=“400” align=“middle” /img>
1.3 Mathematics Behind Large Margin Classification
<img src="/images/ml/coursera/ml-ng-w7-svm-4.png" width=“620” height=“400” align=“middle” /img>
<img src="/images/ml/coursera/ml-ng-w7-svm-5.png" width=“620” height=“400” align=“middle” /img>
<img src="/images/ml/coursera/ml-ng-w7-svm-6.png" width=“620” height=“400” align=“middle” /img>
2. Kernels
2.1 Kernels I
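This lecture introduces the Gaussian kernel as a similarity function between an example $ x $ and a landmark $ l^{(i)} $:

$ f_i = \mathrm{similarity}(x, l^{(i)}) = \exp\left( -\dfrac{\lVert x - l^{(i)} \rVert^2}{2\sigma^2} \right) $

A minimal NumPy sketch of that similarity (the function name is mine):

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma):
    """Gaussian similarity: close to 1 when x is near the landmark,
    close to 0 when x is far away; sigma controls how fast it decays."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))
```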
2.2 Kernels II
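Kernels II places one landmark at every training example, so each input $ x $ is mapped to a feature vector $ f \in \mathbb{R}^m $ whose $i$-th entry is the similarity of $ x $ to the $i$-th training example. A sketch of that feature mapping (names are mine):

```python
import numpy as np

def kernel_features(x, X_train, sigma):
    """Feature vector f with f_i = gaussian similarity(x, X_train[i]).
    Landmarks sit at every training example, as in the lecture."""
    diffs = X_train - x  # broadcasts x against each training row
    return np.exp(-np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2))
```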
3. SVMs in Practice
3.1 Using An SVM
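The practical advice in this lecture is to use a well-tested SVM package (liblinear, libsvm) rather than writing your own optimizer, and to scale features before using the Gaussian kernel. A hedged scikit-learn sketch in that spirit (the dataset and parameter values are illustrative only; scikit-learn's RBF `gamma` corresponds to $ 1 / (2\sigma^2) $):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn's SVC wraps libsvm; the pipeline scales features first,
# which matters for the Gaussian (RBF) kernel.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=1.0))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```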