Kernel SVM Primal -> Dual Derivation

Metronome · Jul 26, 2024

I am trying to derive the dual form of a support vector machine from the primal form (see the top of the section "Kernel SVM"). I proceeded assuming the [imath]ξ[/imath] variables should be included in what the primal minimizes over, as I believe [imath]C[/imath] is a hyperparameter but the [imath]ξ[/imath] variables are parameters to be tweaked by the classifier (further evidenced by formulations from other sources which also minimize over [imath]ξ[/imath]). I decided to first try the special case with [imath]2[/imath] features and [imath]2[/imath] training data points to see everything unpacked, and then maybe do it again using more general linear algebraic notation. So the primal I started with is...[math]\min\limits_{w_1, w_2, b, ξ_1, ξ_2}\ w_1 ^2 + w_2 ^2 + Cξ_1 + Cξ_2 \\ s.t.\ \ \ y_1 (w_1 {x_{feature 1}}_{_{point 1}} + w_2 {x_{feature 2}}_{_{point 1}} + b) \geq 1 - ξ_1 \\ y_2 (w_1 {x_{feature 1}}_{_{point 2}} + w_2 {x_{feature 2}}_{_{point 2}} + b) \geq 1 - ξ_2 \\ ξ_1 \geq 0 \\ ξ_2 \geq 0[/math]...where everything is a scalar (no vectors). The Lagrangian should be...[math]\mathcal L = w_1 ^2 + w_2 ^2 + Cξ_1 + Cξ_2 - \alpha_1 (y_1 (w_1 {x_{feature 1}}_{_{point 1}} + w_2 {x_{feature 2}}_{_{point 1}} + b) - 1 + ξ_1)\ -\ \alpha_2 (y_2 (w_1 {x_{feature 1}}_{_{point 2}} + w_2 {x_{feature 2}}_{_{point 2}} + b) - 1 + ξ_2)[/math]Copying this step, the original problem gets replaced by [imath]\max\limits_{\alpha_1 \geq 0, \alpha_2 \geq 0}\ [\min\limits_{w_1, w_2, b, ξ_1, ξ_2} \mathcal L][/imath]. To solve the inner minimization, I set the gradient of the Lagrangian to the zero vector and get the system...[math]w_1 = \frac{\alpha_1 y_1 {x_{feature 1}}_{_{point 1}} + \alpha_2 y_2 {x_{feature 1}}_{_{point 2}}}{2} \\ w_2 = \frac{\alpha_1 y_1 {x_{feature 2}}_{_{point 1}} + \alpha_2 y_2 {x_{feature 2}}_{_{point 2}}}{2} \\ \alpha_1 y_1 = -\alpha_2 y_2 \\ \alpha_1 = C \\ \alpha_2 = C[/math]...but there is not enough information in this system to substitute [imath]b[/imath], [imath]ξ_1[/imath], or [imath]ξ_2[/imath] out of the Lagrangian. How can I proceed? Was it incorrect to put [imath]ξ_1[/imath] and [imath]ξ_2[/imath] in the minimization in the primal? I don't see how treating them as constants would fix this.

Secondly, did I write the Lagrangian correctly? The main thing I'm unsure about is whether I need to subtract another [imath]ξ_1[/imath] and [imath]ξ_2[/imath] from the Lagrangian to account for the nonnegativity constraints of the primal, but I went with my hunch that these constraints are simply not included in the Lagrangian representation.

Kernel SVM Primal -> Dual Derivation

Metronome

Junior Member