Understanding the effect of $C$ in soft margin SVMs





I'm learning about soft margin support vector machines from this book. It says that in soft margin SVMs we allow minor classification errors, so that noisy or non-linearly-separable datasets, or datasets with outliers, can still be classified. To do this, the following constraint is introduced:



$$y_i(\mathbf{w}\cdot \mathbf{x}_i + b) \geq 1 - \zeta_i$$



Since each $\zeta_i$ could be set to an arbitrarily large number, we also need to add a penalty to the objective function to restrict the values of $\zeta_i$. Doing this leads to the largest possible margin with the smallest possible error (misclassifications). After adding the penalty, the original SVM objective becomes:



$$\min_{\mathbf{w},\, b,\, \boldsymbol{\zeta}} \left(\frac{1}{2} \|\mathbf{w}\|^2 + C\sum_{i=1}^{m} \zeta_i \right)$$



Here $C$ is added to control the "softness" of the SVM. What I don't understand is how different values of $C$ control this so-called "softness". In the book mentioned above and in this question, it's written that higher values of $C$ make the SVM act nearly the same as a hard margin SVM, while lower values of $C$ make the SVM "softer" (allow more errors).



How can this conclusion be seen intuitively from the above equation? To me, choosing $C$ near $0$ seems to make the above function more like the hard margin SVM. So why does the soft margin SVM become a hard margin SVM when $C$ is $+\infty$?
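
As a toy illustration of how the two terms trade off (the numbers below are arbitrary and this is only a sketch, not part of the book's derivation), one can evaluate the objective for a fixed candidate $(\mathbf{w}, b)$ at several values of $C$; the slack term is simply scaled by $C$, so a larger $C$ makes any given amount of slack more expensive:

```python
import numpy as np

# Arbitrary toy data: 4 points in 2-D with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [0.5, 0.2], [-1.0, -1.5], [0.3, 0.1]])
y = np.array([1, 1, -1, -1])

w = np.array([0.8, 0.6])   # a fixed candidate weight vector
b = 0.0                    # a fixed candidate bias

margins = y * (X @ w + b)              # y_i (w . x_i + b)
zeta = np.maximum(0.0, 1.0 - margins)  # smallest slack with y_i (w . x_i + b) >= 1 - zeta_i

for C in (0.01, 1.0, 100.0):
    objective = 0.5 * w @ w + C * zeta.sum()
    print(f"C={C:>6}: margin term={0.5 * w @ w:.3f}, "
          f"slack term={C * zeta.sum():.3f}, objective={objective:.3f}")
```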



EDIT



Here is the same question but I don't understand the answer.










optimization convex-optimization machine-learning

asked Mar 31 at 14:08 by Kaushal28, edited Apr 1 at 5:54

1 Answer



















With perfect separation, you require that
$$
y_i(\mathbf{w}\cdot \mathbf{x}_i + b) \geq 1.
$$

So your $\xi_i$ are the deviations you allow from the above inequality. When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i$ means that the $\xi_i$ will be small, since their sum has a large weight. When $C$ is small, their sum has a small weight, and at the minimum the $\xi_i$ may be larger, allowing more deviation from the above inequality.



When $C$ is extremely large, the only way to minimize the objective is to make the deviations extremely small, bringing the result close to the hard margin SVM.
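
A small numerical sketch can make this concrete (it assumes scikit-learn's `SVC` and an overlapping synthetic dataset, neither of which appears in the original discussion): as $C$ grows, the total slack $\sum_i \xi_i$ typically shrinks and the fit behaves more and more like a hard margin SVM.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two deliberately overlapping blobs, so perfect separation is impossible.
X, y = make_blobs(n_samples=200, centers=[[0, 0], [2, 2]],
                  cluster_std=1.5, random_state=0)
y_signed = np.where(y == 0, -1, 1)  # labels in {-1, +1}

for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y_signed)
    w, b = clf.coef_[0], clf.intercept_[0]
    margins = y_signed * (X @ w + b)          # y_i (w . x_i + b)
    slack = np.maximum(0.0, 1.0 - margins)    # xi_i at the fitted (w, b)
    print(f"C={C:>7}: ||w||={np.linalg.norm(w):.2f}, "
          f"total slack={slack.sum():.2f}, margin violations={(slack > 1e-6).sum()}")
```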



Elaboration



I see that there is some confusion between the optimal value and the optimal solution. The optimal value is the minimal value of the objective function; the optimal solution is the set of actual variables that attain it (in your case $\mathbf{w}$ and $\boldsymbol{\xi}$). The optimal value may become large when $C$ goes to infinity, but you did not ask about the optimal value at all!



Now, let us go a bit abstract. Assume you are solving an optimization problem of the form
$$
\min_{\mathbf{x},\, \mathbf{y}} ~ \alpha f(\mathbf{x}) + \beta g(\mathbf{y}) \quad \text{s.t.} \quad (\mathbf{x}, \mathbf{y}) \in D,
$$

where $\alpha, \beta > 0$ are some constants. To make the objective as small as possible, we need to somehow balance $f$ and $g$: choosing $\mathbf{x}$ such that $f$ is small might constrain us to choose $\mathbf{y}$ such that $g$ becomes larger, and vice versa.



If $\alpha$ is much larger than $\beta$, then it is 'more beneficial' to make $f$ small, at the expense of making $g$ a bit larger. The same holds the other way around.
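
As a toy illustration of this balance (with the coupling made explicit by using a single shared variable), take $f(x) = x^2$ and $g(x) = (x - 1)^2$:
$$
\min_x \; \alpha x^2 + \beta (x - 1)^2
\quad\Longrightarrow\quad
x^\star = \frac{\beta}{\alpha + \beta},
$$
so when $\alpha \gg \beta$ the minimizer sits near $0$ (making $f$ small while $g \approx 1$), and when $\beta \gg \alpha$ it sits near $1$ (making $g$ small while $f \approx 1$).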



In your case you have the two functions $\|\mathbf{w}\|^2$ and $\sum_{i=1}^n \xi_i$, with $\alpha = 1$ and $\beta = C$. If $C$ is much smaller than $1$, then it is 'beneficial' to make the norm of $\mathbf{w}$ small. If $C$ is much larger than $1$, then it is the other way around.



It turns out that $\sum_{i=1}^n \xi_i$, since $\xi_i \geq 0$, is exactly $\|\boldsymbol{\xi}\|_1$, so a large weight on this term means that the entries $\xi_i$ become small. Moreover, it is well known that minimizing the $\ell_1$ norm promotes sparsity (just Google it), meaning that as $C$ increases, more and more entries of $\boldsymbol{\xi}$ become zero.
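
A compact way to see which slacks end up at zero (a standard observation, added here for completeness): for any fixed $(\mathbf{w}, b)$, the optimal slack variables are
$$
\xi_i = \max\bigl(0,\; 1 - y_i(\mathbf{w}\cdot\mathbf{x}_i + b)\bigr),
$$
so $\xi_i = 0$ exactly for the points that already satisfy the margin requirement, and a larger $C$ drives the solution towards a $(\mathbf{w}, b)$ for which more points satisfy it.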






answered Apr 1 at 7:51 by Alex Shtof, edited Apr 1 at 13:37

• "When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i$ means that $\xi_i$ will be small." Why? What about the case when we only change $C$ while keeping $\zeta$ fixed? Doesn't that drive the value of the objective function to infinity?
  – Kaushal28
  Apr 1 at 10:03










• @Kaushal28 Look at the extreme case when the sample is linearly separable: you can choose all $\xi_i$ to be zero. The constant $C$ balances the importance of $\boldsymbol{\xi}$ having a small $\ell_1$ norm against the importance of $\mathbf{w}$ having a small Euclidean norm. The weight of the squared norm of $\mathbf{w}$ is $1$, and the weight of the norm of $\boldsymbol{\xi}$ is $C$.
  – Alex Shtof
  Apr 1 at 10:10










• Sorry, but when all $\zeta_i$ are zero, there is no need to choose $C$, as it becomes the hard margin problem. Still confused.
  – Kaushal28
  Apr 1 at 10:13











• Why is $\xi_i$ likely to be set to $0$ if $C$ is very large?
  – Kaushal28
  Apr 1 at 12:53











• @Kaushal28, I added some elaboration on the subject.
  – Alex Shtof
  Apr 1 at 13:37










