Gradients of marginal likelihood of Gaussian Process with squared exponential covariance, for learning hyper-parameters


The derivation of the gradient of the marginal likelihood is given in http://www.gaussianprocess.org/gpml/chapters/RW5.pdf

But the gradient for the most commonly used covariance function, the squared exponential covariance, is not given explicitly there.

I am implementing the Rprop algorithm from http://ml.informatik.uni-freiburg.de/_media/publications/blumesann2013.pdf to learn the hyper-parameters $\sigma$ (signal variance) and $h$ (length scale). Alas, my implementation is not working well. I have derived the gradients myself, but I am not sure whether they are correct.

Can someone point me to a good tutorial or article that explicitly gives the expressions for the hyper-parameter gradients?

machine-learning










      asked Nov 20 '14 at 8:57









aaronqli





















4 Answers


















14 votes

We are looking to maximise the log probability $\ln P(y|x, \theta)$:

$$\ln P(y|x, \theta) = -\frac{1}{2}\ln|K| - \frac{1}{2}y^T K^{-1} y - \frac{N}{2}\ln 2\pi$$

The three components can be seen as balancing the complexity of the GP (to avoid overfitting) against the data fit, with a constant on the end. So the gradient is

$$\frac{\partial}{\partial\theta_i} \log P(y|x, \theta) = \frac{1}{2}y^T K^{-1}\frac{\partial K}{\partial\theta_i}K^{-1}y^T - \frac{1}{2}\mathrm{tr}\left(K^{-1}\frac{\partial K}{\partial\theta_i}\right)$$

So all we need to know is $\frac{\partial K}{\partial\theta_i}$ to be able to solve it. I think you got this far, but I wasn't sure, so I thought I would recap.

For the case of the RBF/exponentiated quadratic kernel (never call it squared exponential, as this is actually incorrect), under the following formulation:

$$K(x,x') = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$

the derivatives with respect to the hyperparameters are as follows:

$$\frac{\partial K}{\partial\sigma} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$

$$\frac{\partial K}{\partial l} = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\frac{(x-x')^T(x-x')}{l^3}$$
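To make these formulas concrete, here is a minimal NumPy sketch (my addition, not part of the original answer) that evaluates the log marginal likelihood and the two analytic gradients over a full kernel matrix, written in the equivalent form $\frac{1}{2}\alpha^T\frac{\partial K}{\partial\theta}\alpha - \frac{1}{2}\mathrm{tr}\big(K^{-1}\frac{\partial K}{\partial\theta}\big)$ with $\alpha = K^{-1}y$, and checks them against finite differences. The data, hyper-parameter values, and jitter constant are arbitrary illustrative choices:

```python
import numpy as np

def kernel(X, sigma, l):
    """RBF kernel matrix plus the pairwise squared distances it was built from."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # (n, n) squared distances
    return sigma**2 * np.exp(-d2 / (2 * l**2)), d2

def log_marginal_likelihood(X, y, sigma, l):
    K, _ = kernel(X, sigma, l)
    K = K + 1e-8 * np.eye(len(X))                         # jitter for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                    # = -0.5 * log|K|
            - 0.5 * len(X) * np.log(2 * np.pi))

def gradients(X, y, sigma, l):
    K, d2 = kernel(X, sigma, l)
    dK_dsigma = (2.0 / sigma) * K                         # 2*sigma*exp(...) elementwise
    dK_dl = K * d2 / l**3                                 # sigma^2*exp(...)*d2/l^3
    K = K + 1e-8 * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    def grad(dK):    # 0.5*alpha^T dK alpha - 0.5*tr(K^-1 dK)
        return 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(Kinv @ dK)
    return grad(dK_dsigma), grad(dK_dl)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)
g_sigma, g_l = gradients(X, y, sigma=1.3, l=0.7)
eps = 1e-6
fd = (log_marginal_likelihood(X, y, 1.3 + eps, 0.7)
      - log_marginal_likelihood(X, y, 1.3 - eps, 0.7)) / (2 * eps)
print(g_sigma, fd)   # the two numbers should agree to ~6 decimal places
```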



However, GP libraries often use the parameterisation

$$K(x,x') = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{l}\right)$$

where $\sigma$ and $l$ are confined to the positive real numbers. Let $l=\exp(\theta_1)$ and $\sigma=\exp(2\theta_2)$; then by optimising over $\theta_1, \theta_2$ we know our values will conform to this rule. In this case the derivatives are:

$$\frac{\partial K}{\partial\theta_1} = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\left(\frac{(x-x')^T(x-x')}{l^2}\right)$$

$$\frac{\partial K}{\partial \theta_2} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
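As a quick chain-rule sanity check on log-reparameterised gradients in general (my addition, not part of the original answer): if $l = e^{\theta_1}$, then

$$\frac{\partial K}{\partial \theta_1} = \frac{\partial K}{\partial l}\,\frac{\partial l}{\partial \theta_1} = \frac{\partial K}{\partial l}\,l,$$

so the gradient with respect to $\theta_1$ is simply $l$ times $\partial K/\partial l$, which cancels one power of $l$ in the denominator.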



There is interesting work carried out by the likes of Mike Osborne looking at marginalising out the hyper-parameters. However, as far as I am aware, it is only appropriate for small numbers of parameters and isn't incorporated in standard libraries yet. Worth a look all the same.

Another note: the optimisation space is multimodal the majority of the time, so if you are using convex optimisation, be sure to use a fair few initialisations.






answered Dec 17 '14 at 23:54 by j__, edited Jan 30 at 12:46 by Martin Ferianc

– aaronqli (Dec 20 '14 at 19:52): Many thanks! This is very helpful.
– George (Aug 25 '16 at 19:27): jpro is right, your answer for $\frac{dK}{dl}$ is incorrect.
– j__ (Aug 26 '16 at 13:21): Ah, sorry about that - I'll fix it when I get home so as not to confuse people in the future.

9 votes

I believe that the derivative $\frac{\partial K}{\partial l}$, as it was given by j__, is not correct. I think that the correct one is the following (I present the derivation step by step):

$$K(x,x') = \sigma^2\exp\Big(\frac{-(x-x')^T(x-x')}{2l^2}\Big)$$

I now call $g(l)=\frac{-(x-x')^T(x-x')}{2l^2}$, so $K=\sigma^2\exp\big(g(l)\big)$. Then

$$\frac{\partial K}{\partial l} = \frac{\partial K}{\partial g}\,\frac{\partial g}{\partial l} = \sigma^2\exp\Big(\frac{-(x-x')^T(x-x')}{2l^2}\Big)\,\frac{\partial g}{\partial l}.$$

Since $\frac{\partial g}{\partial l} = \frac{(x-x')^T(x-x')}{l^3}$, I finally get:

$$\frac{\partial K}{\partial l} = \sigma^2\exp\Big(\frac{-(x-x')^T(x-x')}{2l^2}\Big)\,\frac{(x-x')^T(x-x')}{l^3}.$$
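A quick numerical sanity check of this expression (my own sketch with arbitrary scalar test values, not part of the original answer):

```python
import numpy as np

def k(x, xp, sigma, l):
    # RBF kernel value for scalar inputs
    return sigma**2 * np.exp(-(x - xp)**2 / (2 * l**2))

def dk_dl(x, xp, sigma, l):
    # the derivative derived above: k(x, x') * (x - x')^2 / l^3
    return k(x, xp, sigma, l) * (x - xp)**2 / l**3

x, xp, sigma, l, eps = 0.3, 1.1, 1.5, 0.8, 1e-6
fd = (k(x, xp, sigma, l + eps) - k(x, xp, sigma, l - eps)) / (2 * eps)
print(dk_dl(x, xp, sigma, l), fd)   # both print (approximately) the same value
```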






answered Jan 31 '15 at 15:13 by jpro, edited Feb 5 '15 at 10:28
1 vote

Maybe there is also an error in the gradient formulation, because in Rasmussen & Williams, Gaussian Processes for Machine Learning, p. 114, eq. 5.9, it is expressed as:

$$\frac{\partial}{\partial\theta_j}\log p(y|X,\theta) = \frac{1}{2}y^T K^{-1}\frac{\partial K}{\partial\theta_j}K^{-1}y - \frac{1}{2}\mathrm{tr}\left(K^{-1}\frac{\partial K}{\partial\theta_j}\right) = \frac{1}{2}\mathrm{tr}\left(\left(\alpha\alpha^T - K^{-1}\right)\frac{\partial K}{\partial\theta_j}\right), \qquad \alpha = K^{-1}y,$$

where the two terms have different signs and the $y$ targets vector is transposed just the first time.






answered Mar 25 '17 at 12:09 by DavideM
0 votes

As DavideM mentions, the gradient of the marginal likelihood can be computed as follows:

$$\frac{\partial}{\partial\theta_i} \log P(y|x, \theta) = \frac{1}{2}\mathrm{trace}\left(\left(\alpha\alpha^T-K^{-1}\right)\frac{\partial K}{\partial\theta_i}\right)$$

where $\alpha=K^{-1}y$.

Since $K$ in the marginal likelihood is the covariance matrix of the inputs $x$, do we really care about the rest, i.e. $-(x-x')^T(x-x')$? All exponents go to 0 anyway when $x=x'$. Or am I missing something here?
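For what it's worth, a small NumPy check (my own sketch with arbitrary test matrices, not part of the original answer) that this trace form agrees with the two-term expression $\frac{1}{2}\alpha^T\frac{\partial K}{\partial\theta_i}\alpha - \frac{1}{2}\mathrm{tr}\big(K^{-1}\frac{\partial K}{\partial\theta_i}\big)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)          # symmetric positive definite "covariance"
dK = rng.normal(size=(n, n))
dK = dK + dK.T                       # a symmetric perturbation direction
y = rng.normal(size=n)

Kinv = np.linalg.inv(K)
alpha = Kinv @ y
two_term = 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(Kinv @ dK)
trace_form = 0.5 * np.trace((np.outer(alpha, alpha) - Kinv) @ dK)
print(np.isclose(two_term, trace_form))   # True
```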






answered Mar 28 at 15:41 by Vilius Ciuzelis (new contributor), edited Mar 28 at 16:35