Understanding the effect of $C$ in soft margin SVMs
I'm learning about soft margin support vector machines from this book. It says that in soft margin SVMs we allow minor classification errors so that noisy or non-linearly-separable datasets (or datasets containing outliers) can still be classified. To do this, the following constraint is introduced:
$$y_i(\mathbf{w}\cdot \mathbf{x}_i + b) \geq 1 - \zeta_i$$
Since each $\zeta_i$ could be made arbitrarily large, we also need to add a penalty to the optimization objective to keep the values of $\zeta_i$ in check. Doing this yields the largest possible margin with the smallest possible error (fewest margin violations). After adding this penalty, the original SVM objective becomes:
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\zeta}} \left(\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{m} \zeta_i \right)$$
Here $C$ is added to control the "softness" of the SVM. What I don't understand is how different values of $C$ control this so-called softness. In the book mentioned above and in this question, it's stated that higher values of $C$ make the SVM behave almost like a hard margin SVM, while lower values of $C$ make it "softer" (allow more errors).
How can this conclusion be seen intuitively from the equation above? My intuition was that choosing $C$ close to $0$ would make the objective look more like the hard margin SVM, so why does the soft margin SVM instead become a hard margin SVM as $C \to +\infty$?
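To make the question concrete, here is a minimal sketch of my own (not from the book) that transcribes the primal problem above using the cvxpy and numpy libraries, assuming both are installed; the toy dataset and the helper solve_soft_margin are made up for illustration. It compares the total slack and $\|\mathbf{w}\|$ for a small and a large $C$:

    import cvxpy as cp
    import numpy as np

    # Toy 2-D data: two overlapping Gaussian blobs with labels in {-1, +1},
    # so that some slack is unavoidable.
    rng = np.random.default_rng(0)
    n = 100
    X = np.vstack([rng.normal(-1.0, 1.5, size=(n // 2, 2)),
                   rng.normal(+1.0, 1.5, size=(n // 2, 2))])
    y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

    def solve_soft_margin(C):
        w = cp.Variable(2)
        b = cp.Variable()
        zeta = cp.Variable(n, nonneg=True)  # slack variables, zeta_i >= 0
        constraints = [cp.multiply(y, X @ w + b) >= 1 - zeta]
        objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(zeta))
        cp.Problem(objective, constraints).solve()
        return w.value, zeta.value

    for C in (0.01, 100.0):
        w, zeta = solve_soft_margin(C)
        print(f"C={C}: total slack = {zeta.sum():.3f}, ||w|| = {np.linalg.norm(w):.3f}")

For the small $C$ the solver typically accepts a large total slack in exchange for a small $\|\mathbf{w}\|$ (a wide, soft margin), while for the large $C$ it drives the slacks toward zero, behaving almost like the hard margin problem.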
EDIT: Here is the same question, but I don't understand the answer.
Tags: optimization, convex-optimization, machine-learning
asked Mar 31 at 14:08 by Kaushal28, edited Apr 1 at 5:54
1 Answer
With perfect separation, you require that
$$
y_i(\mathbf{w}\cdot \mathbf{x}_i + b) \geq 1
$$
So the $\xi_i$ are the deviations you allow from the above inequality. When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i$ forces the $\xi_i$ to be small, since their sum carries a large weight. When $C$ is small, their sum carries a small weight, and at the minimum the $\xi_i$ may be larger, allowing more deviation from the above inequality.
When $C$ is extremely large, the only way to minimize the objective is to make the deviations extremely small, bringing the result close to a hard margin SVM.
Elaboration
I see that there is some confusion between the optimal value and the optimal solution. The optimal value is the minimal value of the objective function. The optimal solution is the set of actual variable values (in your case $\mathbf{w}$ and $\boldsymbol{\xi}$). The optimal value may indeed become large as $C$ goes to infinity, but you did not ask about the optimal value at all!
Now, let us go a bit abstract. Assume you are solving an optimization problem of the form
$$
\min_{\mathbf{x},\, \mathbf{y}} ~ \alpha f(\mathbf{x}) + \beta g(\mathbf{y}) \quad \text{s.t.} \quad (\mathbf{x}, \mathbf{y}) \in D,
$$
where $\alpha, \beta > 0$ are constants. To make the objective as small as possible, we need to somehow balance $f$ and $g$: choosing $\mathbf{x}$ such that $f$ is small might constrain us to choose $\mathbf{y}$ such that $g$ becomes larger, and vice versa.
If $\alpha$ is much larger than $\beta$, it is 'more beneficial' to make $f$ small, at the expense of making $g$ a bit larger. The same holds the other way around.
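As a concrete toy instance of this balance (my own example, not part of the original answer), take $f(x) = x^2$ and $g(y) = y^2$, coupled through the constraint $x + y = 1$. Substituting $y = 1 - x$ and setting the derivative to zero gives
$$
\min_{x + y = 1} \; \alpha x^2 + \beta y^2 \quad\Longrightarrow\quad x^\ast = \frac{\beta}{\alpha + \beta}, \qquad y^\ast = \frac{\alpha}{\alpha + \beta},
$$
so as $\beta \to \infty$ the heavily weighted variable is driven to $y^\ast \to 0$, exactly as the slacks are driven to zero when $C \to \infty$.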
In your case you have the two functions $\|\mathbf{w}\|^2$ and $\sum_{i=1}^{n} \xi_i$, with $\alpha = 1$ and $\beta = C$. If $C$ is much smaller than $1$, it is 'beneficial' to make the norm of $\mathbf{w}$ small; if $C$ is much larger than $1$, it is the other way around.
Moreover, since $\xi_i \geq 0$, the sum $\sum_{i=1}^{n} \xi_i$ is exactly $\|\boldsymbol{\xi}\|_1$, so a large $C$ drives the entries $\xi_i$ to be small. It is also well known that minimizing the $\ell_1$ norm promotes sparsity (just Google it), meaning that as $C$ increases, more and more entries of $\boldsymbol{\xi}$ become exactly zero.
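A quick way to see this sparsity empirically (a sketch of mine rather than part of the original answer, assuming scikit-learn and NumPy are available; SVC with a linear kernel solves this same $C$-penalized problem) is to recover the slacks $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}\cdot\mathbf{x}_i + b))$ from a fitted model and count how many remain nonzero as $C$ grows:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Two overlapping clusters; relabel the classes to {-1, +1}.
    X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)
    y = 2 * y - 1

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # decision_function(X) computes w.x + b, so these are the slack values.
        xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
        print(f"C={C:>6}: nonzero slacks = {(xi > 1e-6).sum():3d}, "
              f"sum of slacks = {xi.sum():7.2f}, ||w|| = {np.linalg.norm(clf.coef_):.2f}")

As $C$ increases you should typically see the number of nonzero slacks shrink (more constraints satisfied exactly) while $\|\mathbf{w}\|$ grows, i.e. the margin narrows toward the hard margin solution.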
answered Apr 1 at 7:51 by Alex Shtof, edited Apr 1 at 13:37
"When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i$ means that $\xi_i$ will be small" Why? What about the case where we only change $C$ while keeping $\zeta$ fixed? Doesn't that send the value of the objective function to infinity?
– Kaushal28, Apr 1 at 10:03

@Kaushal28 Look at the extreme case where the sample is linearly separable: you can choose all $\xi_i$ to be zero. The constant $C$ balances the importance of $\boldsymbol{\xi}$ having a small $\ell_1$ norm against the importance of $\mathbf{w}$ having a small Euclidean norm. The weight of the squared norm of $\mathbf{w}$ is $1$, and the weight of the norm of $\boldsymbol{\xi}$ is $C$.
– Alex Shtof, Apr 1 at 10:10

Sorry, but when all $\zeta_i$ are zero there is no need to choose $C$, since it is then a hard margin problem. Still confused.
– Kaushal28, Apr 1 at 10:13

Why is $\xi_i$ likely to be set to $0$ if $C$ is very large?
– Kaushal28, Apr 1 at 12:53

@Kaushal28, I added some elaboration on the subject.
– Alex Shtof, Apr 1 at 13:37