
Understanding the effect of $C$ in soft margin SVMs



I'm learning about soft margin support vector machines from this book. It says that in soft margin SVMs we allow minor classification errors, so that noisy or non-linearly-separable datasets, or datasets with outliers, can still be classified. To do this, the following constraint is introduced:



$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \zeta_i$$



Since $\zeta_i$ could otherwise be set to an arbitrarily large number, we also need to add a penalty to the optimization function to restrict the values of $\zeta_i$. Doing this leads to the largest possible margin with the minimum possible error (misclassifications). After adding the penalty, the original SVM optimization function becomes:



$$\min_{\mathbf{w},\, b,\, \zeta} \left(\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=0}^{m} \zeta_i \right)$$



Here $C$ is added to control the "softness" of the SVM. What I don't understand is how different values of $C$ control this so-called "softness". In the book mentioned above and in this question, it's written that higher values of $C$ make the SVM act nearly the same as a hard margin SVM, while lower values of $C$ make the SVM "softer" (allow more errors).



How can this conclusion be seen intuitively from the above equation? To me, choosing $C$ near $0$ makes the above function look more like the hard margin SVM. So why does the soft margin SVM become a hard margin one when $C$ is $+\infty$?



EDIT



Here is the same question but I don't understand the answer.
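(Aside, not from the original post: the behavior the book describes is easy to check empirically. The sketch below fits scikit-learn's SVC with a linear kernel on a made-up noisy two-class sample and counts, for a few values of $C$, how many training points violate the margin, i.e. have $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) < 1$. The dataset and solver settings are illustrative assumptions.)

```python
# Hypothetical demo: margin violations vs. C for a linear soft margin SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: a made-up noisy dataset.
X = np.vstack([rng.normal(-1.5, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins = y * clf.decision_function(X)    # y_i (w . x_i + b)
    n_viol = int(np.sum(margins < 1 - 1e-6))  # points with slack > 0
    print(f"C={C:>6}: margin violations={n_viol}, "
          f"||w||={np.linalg.norm(clf.coef_):.2f}")
```

With small $C$ many points are tolerated inside the margin while $\|\mathbf{w}\|$ stays small; with large $C$ violations are squeezed out and the fit behaves nearly like a hard margin SVM.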










      optimization convex-optimization machine-learning






asked Mar 31 at 14:08 by Kaushal28 (edited Apr 1 at 5:54)




















1 Answer

With perfect separation, you require that
$$
y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1.
$$

So your $\xi_i$ are the deviations you allow from the above inequality. When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i$ means that the $\xi_i$ will be small, since their sum has a large weight. When $C$ is small, their sum has a small weight, and at the minimum the $\xi_i$ may be larger, allowing more deviation from the above inequality.



          When $C$ is extremely large, the only way to minimize the objective is to make the deviations extremely small, bringing the result close to hard margin SVM.
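(A minimal numerical sketch, mine rather than the answerer's: for fixed $(w, b)$ the optimal slacks are $\xi_i = \max(0,\, 1 - y_i(w x_i + b))$, so the constrained problem reduces to minimizing $\tfrac12 w^2 + C\sum_i \max(0,\, 1 - y_i(w x_i + b))$. On a tiny 1-D toy set one can watch the total slack shrink toward zero, and $|w|$ grow, as $C$ increases. The data and the use of scipy's Nelder-Mead solver are assumptions made for illustration.)

```python
# Hypothetical 1-D illustration: as C grows, total slack -> 0 (hard margin limit).
import numpy as np
from scipy.optimize import minimize

x = np.array([-3.0, -2.0, -1.5, 1.5, 2.0, -0.5])  # last positive point sits near the negatives
y = np.array([-1, -1, -1, 1, 1, 1])

def objective(params, C):
    w, b = params
    # For fixed (w, b) the optimal slacks are xi_i = max(0, 1 - y_i(w x_i + b)).
    slack = np.maximum(0.0, 1.0 - y * (w * x + b))
    return 0.5 * w**2 + C * slack.sum()

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    res = minimize(objective, x0=[1.0, 0.0], args=(C,), method="Nelder-Mead")
    w, b = res.x
    slack = np.maximum(0.0, 1.0 - y * (w * x + b))
    print(f"C={C:>6}: |w|={abs(w):.2f}, total slack={slack.sum():.3f}")
```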



          Elaboration



I see that there is some confusion between the optimal value and the optimal solution. The optimal value is the minimal value of the objective function. The optimal solution is the set of variables that attain it (in your case $\mathbf{w}$ and $\boldsymbol{\xi}$). The optimal value may become large when $C$ goes to infinity, but you did not ask about the optimal value at all!



Now, let us go a bit abstract. Assume you are solving an optimization problem of the form
$$
\min_{\mathbf{x},\, \mathbf{y}} \ \alpha f(\mathbf{x}) + \beta g(\mathbf{y}) \quad \text{s.t.} \quad (\mathbf{x}, \mathbf{y}) \in D,
$$
where $\alpha, \beta > 0$ are some constants. To make the objective as small as possible, we need to somehow balance $f$ and $g$: choosing $\mathbf{x}$ such that $f$ is small might constrain us to choose $\mathbf{y}$ such that $g$ becomes larger, and vice versa.



If $\alpha$ is much larger than $\beta$, then it is 'more beneficial' to make $f$ small, at the expense of making $g$ a bit larger. The same holds the other way around.



In your case you have two functions, $\|\mathbf{w}\|^2$ and $\sum_{i=1}^n \xi_i$, with $\alpha = 1$ and $\beta = C$. If $C$ is much smaller than $1$, then it is 'beneficial' to make the norm of $\mathbf{w}$ small. If $C$ is much larger than $1$, then it is the other way around.
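(A concrete toy instance of this balance, added for illustration: take $f(x) = x^2$, $g(y) = y^2$ and $D = \{(x, y) : x + y = 1\}$. Setting the derivative of $\alpha x^2 + \beta (1-x)^2$ to zero gives
$$
x^* = \frac{\beta}{\alpha + \beta}, \qquad y^* = \frac{\alpha}{\alpha + \beta},
$$
so as $\beta \to \infty$ we get $y^* \to 0$ and $g(y^*) \to 0$: the heavily weighted term is driven to zero, exactly as $\sum_i \xi_i$ is driven to zero when $\beta = C$ is large.)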



It turns out that, since $\xi_i \geq 0$, the sum $\sum_{i=1}^n \xi_i$ is exactly $\|\boldsymbol{\xi}\|_1$. Moreover, it is well known that minimizing the $\ell_1$ norm promotes sparsity (just Google it), meaning that as $C$ increases, more and more entries of $\boldsymbol{\xi}$ become exactly zero.
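(To see the sparsity effect in isolation, here is a small sketch of my own, not the answerer's: the minimizer of $\tfrac12 (x_i - a_i)^2 + \lambda |x_i|$ is the soft-thresholding of $a_i$, so entries snap to exactly zero once $\lambda$ exceeds their magnitude. In the SVM, raising $C$ raises the relative weight of $\|\boldsymbol{\xi}\|_1$ and plays the role of a larger $\lambda$.)

```python
# Hypothetical sketch: l1 penalties zero out entries (soft-thresholding).
import numpy as np

a = np.array([3.0, 1.2, 0.4, -0.1, -2.5])

def soft_threshold(a, lam):
    # Closed-form minimizer of 0.5*(x - a)**2 + lam*|x|, applied entrywise.
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

for lam in [0.0, 0.5, 1.5]:
    x = soft_threshold(a, lam)
    print(f"lam={lam}: x* = {x}, zero entries = {int(np.sum(x == 0))}")
```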






answered Apr 1 at 7:51 by Alex Shtof (edited Apr 1 at 13:37)

• "When $C$ is large, minimizing $\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i$ means that $\xi_i$ will be small." Why? What about the case where we only change $C$ while keeping $\zeta$ fixed? Doesn't that drive the value of the optimization function to infinity? – Kaushal28, Apr 1 at 10:03

• @Kaushal28 Look at the extreme case when the sample is linearly separable: you can choose all $\xi_i$ to be zero. The constant $C$ balances the importance of $\boldsymbol{\xi}$ having a small $\ell_1$ norm versus the importance of $\mathbf{w}$ having a small Euclidean norm. The weight of the norm-squared of $\mathbf{w}$ is $1$, and the weight of the norm of $\boldsymbol{\xi}$ is $C$. – Alex Shtof, Apr 1 at 10:10

• Sorry, but when all $\zeta_i$ are zero, there is no need to choose $C$, as it will be a hard margin problem. Still confused. – Kaushal28, Apr 1 at 10:13

• Why is $\xi_i$ likely to be set to $0$ if $C$ is very large? – Kaushal28, Apr 1 at 12:53

• @Kaushal28, I added some elaboration on the subject. – Alex Shtof, Apr 1 at 13:37










