Understanding the effect of $C$ in soft margin SVMs





I'm learning about soft margin support vector machines from this book. It says that in soft margin SVMs we allow minor classification errors, so that noisy or non-linearly-separable datasets, or datasets with outliers, can still be classified. To do this, the following constraint is introduced:



$$y_i(\mathbf{w}\cdot \mathbf{x}_i + b) \geq 1 - \zeta_i$$



Since each $\zeta_i$ could be set to an arbitrarily large number, we also need to add a penalty to the objective function to restrict the values of $\zeta_i$. Doing this leads to the largest possible margin with the smallest possible error (misclassifications). After adding the penalty, the original SVM objective becomes:



$$\min_{\mathbf{w},\, b,\, \boldsymbol{\zeta}} \left(\frac{1}{2} \|\mathbf{w}\|^2 + C\sum_{i=1}^{m} \zeta_i \right)$$



Here $C$ is added to control the "softness" of the SVM. What I don't understand is how different values of $C$ control this so-called "softness". In the book mentioned above and in this question, it's written that higher values of $C$ make the SVM act nearly the same as a hard margin SVM, while lower values of $C$ make the SVM "softer" (allow more errors).



How can this conclusion be seen intuitively from the above equation? To me, choosing $C$ near $0$ seems to make the above function more like the hard margin SVM. So why does the soft margin SVM become a hard margin SVM when $C$ is $+\infty$?
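
As a toy illustration of how the two terms trade off (the numbers below are arbitrary and this is only a sketch, not part of the book's derivation), one can evaluate the objective for a fixed candidate $(\mathbf{w}, b)$ at several values of $C$; the slack term is simply scaled by $C$, so a larger $C$ makes any given amount of slack more expensive:

```python
import numpy as np

# Arbitrary toy data: 4 points in 2-D with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [0.5, 0.2], [-1.0, -1.5], [0.3, 0.1]])
y = np.array([1, 1, -1, -1])

w = np.array([0.8, 0.6])   # a fixed candidate weight vector
b = 0.0                    # a fixed candidate bias

margins = y * (X @ w + b)              # y_i (w . x_i + b)
zeta = np.maximum(0.0, 1.0 - margins)  # smallest slack with y_i (w . x_i + b) >= 1 - zeta_i

for C in (0.01, 1.0, 100.0):
    objective = 0.5 * w @ w + C * zeta.sum()
    print(f"C={C:>6}: margin term={0.5 * w @ w:.3f}, "
          f"slack term={C * zeta.sum():.3f}, objective={objective:.3f}")
```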



EDIT



Here is the same question but I don't understand the answer.










optimization convex-optimization machine-learning

asked Mar 31 at 14:08 by Kaushal28, edited Apr 1 at 5:54

1 Answer



















With perfect separation, you require that
$$
y_i(\mathbf{w}\cdot \mathbf{x}_i + b) \geq 1.
$$

So your $\xi_i$ are the deviations you allow from the above inequality. When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i$ means that the $\xi_i$ will be small, since their sum has a large weight. When $C$ is small, their sum has a small weight, and at the minimum the $\xi_i$ may be larger, allowing more deviation from the above inequality.



When $C$ is extremely large, the only way to minimize the objective is to make the deviations extremely small, bringing the result close to the hard margin SVM.
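
A small numerical sketch can make this concrete (it assumes scikit-learn's `SVC` and an overlapping synthetic dataset, neither of which appears in the original discussion): as $C$ grows, the total slack $\sum_i \xi_i$ typically shrinks and the fit behaves more and more like a hard margin SVM.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two deliberately overlapping blobs, so perfect separation is impossible.
X, y = make_blobs(n_samples=200, centers=[[0, 0], [2, 2]],
                  cluster_std=1.5, random_state=0)
y_signed = np.where(y == 0, -1, 1)  # labels in {-1, +1}

for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y_signed)
    w, b = clf.coef_[0], clf.intercept_[0]
    margins = y_signed * (X @ w + b)          # y_i (w . x_i + b)
    slack = np.maximum(0.0, 1.0 - margins)    # xi_i at the fitted (w, b)
    print(f"C={C:>7}: ||w||={np.linalg.norm(w):.2f}, "
          f"total slack={slack.sum():.2f}, margin violations={(slack > 1e-6).sum()}")
```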



Elaboration



I see that there is some confusion between the optimal value and the optimal solution. The optimal value is the minimal value of the objective function; the optimal solution is the set of actual variables that attain it (in your case $\mathbf{w}$ and $\boldsymbol{\xi}$). The optimal value may become large when $C$ goes to infinity, but you did not ask about the optimal value at all!



Now, let us go a bit abstract. Assume you are solving an optimization problem of the form
$$
\min_{\mathbf{x},\, \mathbf{y}} ~ \alpha f(\mathbf{x}) + \beta g(\mathbf{y}) \quad \text{s.t.} \quad (\mathbf{x}, \mathbf{y}) \in D,
$$

where $\alpha, \beta > 0$ are some constants. To make the objective as small as possible, we need to somehow balance $f$ and $g$: choosing $\mathbf{x}$ such that $f$ is small might constrain us to choose $\mathbf{y}$ such that $g$ becomes larger, and vice versa.



If $\alpha$ is much larger than $\beta$, then it is 'more beneficial' to make $f$ small, at the expense of making $g$ a bit larger. The same holds the other way around.
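
As a toy illustration of this balance (with the coupling made explicit by using a single shared variable), take $f(x) = x^2$ and $g(x) = (x - 1)^2$:
$$
\min_x \; \alpha x^2 + \beta (x - 1)^2
\quad\Longrightarrow\quad
x^\star = \frac{\beta}{\alpha + \beta},
$$
so when $\alpha \gg \beta$ the minimizer sits near $0$ (making $f$ small while $g \approx 1$), and when $\beta \gg \alpha$ it sits near $1$ (making $g$ small while $f \approx 1$).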



In your case you have the two functions $\|\mathbf{w}\|^2$ and $\sum_{i=1}^n \xi_i$, with $\alpha = 1$ and $\beta = C$. If $C$ is much smaller than $1$, then it is 'beneficial' to make the norm of $\mathbf{w}$ small. If $C$ is much larger than $1$, then it is the other way around.



It turns out that $\sum_{i=1}^n \xi_i$, since $\xi_i \geq 0$, is exactly $\|\boldsymbol{\xi}\|_1$, so a large weight on this term means that the entries $\xi_i$ become small. Moreover, it is well known that minimizing the $\ell_1$ norm promotes sparsity (just Google it), meaning that as $C$ increases, more and more entries of $\boldsymbol{\xi}$ become zero.
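
A compact way to see which slacks end up at zero (a standard observation, added here for completeness): for any fixed $(\mathbf{w}, b)$, the optimal slack variables are
$$
\xi_i = \max\bigl(0,\; 1 - y_i(\mathbf{w}\cdot\mathbf{x}_i + b)\bigr),
$$
so $\xi_i = 0$ exactly for the points that already satisfy the margin requirement, and a larger $C$ drives the solution towards a $(\mathbf{w}, b)$ for which more points satisfy it.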






answered Apr 1 at 7:51 by Alex Shtof, edited Apr 1 at 13:37

• "When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i$ means that $\xi_i$ will be small." Why? What about the case when we only change $C$ while keeping $\zeta$ fixed? Doesn't that drive the value of the objective function to infinity?
  – Kaushal28
  Apr 1 at 10:03










• @Kaushal28 Look at the extreme case when the sample is linearly separable: you can choose all $\xi_i$ to be zero. The constant $C$ balances the importance of $\boldsymbol{\xi}$ having a small $\ell_1$ norm against the importance of $\mathbf{w}$ having a small Euclidean norm. The weight of the squared norm of $\mathbf{w}$ is $1$, and the weight of the norm of $\boldsymbol{\xi}$ is $C$.
  – Alex Shtof
  Apr 1 at 10:10










• Sorry, but when all $\zeta_i$ are zero, there is no need to choose $C$, as it becomes the hard margin problem. Still confused.
  – Kaushal28
  Apr 1 at 10:13











• Why is $\xi_i$ likely to be set to $0$ if $C$ is very large?
  – Kaushal28
  Apr 1 at 12:53











• @Kaushal28, I added some elaboration on the subject.
  – Alex Shtof
  Apr 1 at 13:37










