


Gradients of marginal likelihood of Gaussian Process with squared exponential covariance, for learning hyper-parameters














The derivation of the gradient of the marginal likelihood is given in http://www.gaussianprocess.org/gpml/chapters/RW5.pdf

But the gradient for the most commonly used covariance function, the squared exponential covariance, is not explicitly given.

I am implementing the Rprop algorithm from http://ml.informatik.uni-freiburg.de/_media/publications/blumesann2013.pdf to learn the hyper-parameters sigma (signal variance) and h (length scale). Alas, my implementation is not working well. I have derived the gradients, but I am not sure they are correct.

Can someone point me to a good tutorial or article that explicitly gives the expressions for the hyper-parameter gradients?























      machine-learning
















asked Nov 20 '14 at 8:57 by aaronqli




















          4 Answers












We are looking to maximise the log probability $\ln P(y|x, \theta)$:

$$\ln P(y|x, \theta) = -\frac{1}{2}\ln|K| - \frac{1}{2}y^TK^{-1}y - \frac{N}{2}\ln 2\pi$$

The first term penalises the complexity of the GP (to avoid overfitting), the second measures the data fit, and the third is a constant. So the gradient is

$$\frac{\partial}{\partial\theta_i} \ln P(y|x, \theta) = \frac{1}{2}y^TK^{-1}\frac{\partial K}{\partial\theta_i}K^{-1}y - \frac{1}{2}\mathrm{tr}\left(K^{-1}\frac{\partial K}{\partial\theta_i}\right)$$

So all we need to know is $\frac{\partial K}{\partial\theta_i}$ to be able to solve it. I think you got this far, but I wasn't sure, so I thought I would recap.

For the case of the RBF/exponentiated quadratic kernel (never call it squared exponential, as that name is actually incorrect), under the following formulation:

$$K(x,x') = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$

The derivatives with respect to the hyperparameters are as follows:

$$\frac{\partial K}{\partial\sigma} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$

$$\frac{\partial K}{\partial l} = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\frac{(x-x')^T(x-x')}{l^3}$$

However, GP libraries often use the notation:

$$K(x,x') = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$

where $\sigma$ and $l$ are confined to positive real numbers. Let $l=\exp(\theta_1)$ and $\sigma=\exp(2\theta_2)$; then whatever real values of $\theta_1, \theta_2$ we pass in, we know our values will conform to this rule. In this case the derivatives are:

$$\frac{\partial K}{\partial\theta_1} = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\left(\frac{(x-x')^T(x-x')}{l^2}\right)$$

$$\frac{\partial K}{\partial\theta_2} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$

There is interesting work carried out by the likes of Mike Osborne looking at marginalising out hyperparameters. However, as far as I am aware, it is only appropriate for low numbers of parameters and isn't incorporated in standard libraries yet. Worth a look all the same.

Another note is that the optimisation space is multimodal the majority of the time, so if you are using convex optimisation, be sure to use a fair few initialisations.
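The recap above can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes the $\sigma^2\exp(\cdot/2l^2)$ formulation, and the function names (`se_kernel`, `log_marginal_likelihood`, `lml_gradient`) and the small `noise` jitter added to the diagonal of $K$ are illustrative assumptions that do not appear in the answer. It compares the analytic gradient with central finite differences.

```python
import numpy as np

def se_kernel(X, sigma, ell):
    """Return K_ij = sigma^2 exp(-||x_i - x_j||^2 / (2 ell^2)) and the squared distances."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return sigma**2 * np.exp(-0.5 * sq / ell**2), sq

def log_marginal_likelihood(X, y, sigma, ell, noise=1e-2):
    # `noise` is an assumed diagonal jitter that keeps K well conditioned.
    K, _ = se_kernel(X, sigma, ell)
    L = np.linalg.cholesky(K + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    # -1/2 ln|K| equals -sum(log(diag(L))) for the Cholesky factor L.
    return (-np.sum(np.log(np.diag(L))) - 0.5 * y @ alpha
            - 0.5 * len(X) * np.log(2.0 * np.pi))

def lml_gradient(X, y, sigma, ell, noise=1e-2):
    """Gradient of the log marginal likelihood w.r.t. (sigma, ell)."""
    K, sq = se_kernel(X, sigma, ell)
    Kinv = np.linalg.inv(K + noise * np.eye(len(X)))
    alpha = Kinv @ y
    dK_dsigma = 2.0 * K / sigma      # 2 sigma exp(...)
    dK_dell = K * sq / ell**3        # sigma^2 exp(...) r^2 / ell^3

    def directional(dK):
        return 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(Kinv @ dK)

    return np.array([directional(dK_dsigma), directional(dK_dell)])

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = rng.normal(size=8)
sigma, ell, eps = 1.3, 0.7, 1e-6

analytic = lml_gradient(X, y, sigma, ell)
numeric = np.array([
    (log_marginal_likelihood(X, y, sigma + eps, ell)
     - log_marginal_likelihood(X, y, sigma - eps, ell)) / (2 * eps),
    (log_marginal_likelihood(X, y, sigma, ell + eps)
     - log_marginal_likelihood(X, y, sigma, ell - eps)) / (2 * eps),
])
print(analytic, numeric)
```

If the two printed vectors disagree beyond finite-difference error, the kernel derivative is the first place to look. The explicit inverse is used here only for readability; a Cholesky solve is usually preferable for larger problems.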












• Many thanks! This is very helpful. – aaronqli, Dec 20 '14 at 19:52

• jpro is right, your answer for $\frac{dK}{dl}$ is incorrect. – George, Aug 25 '16 at 19:27

• Ah, sorry about that. I'll fix it when I get home so as not to confuse people in the future. – j__, Aug 26 '16 at 13:21


















I believe that the derivative of $\frac{\partial K}{\partial l}$, as it was given by j_f, is not correct. I think that the correct one is the following (I present the derivation step by step):

$K(x,x') = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)$

I now call $g(l)=\frac{-(x-x')^T(x-x')}{2l^2}$, so $K=\sigma^2\exp\big(g(l)\big)$.

$\frac{\partial K}{\partial l} = \frac{\partial K}{\partial g}\frac{\partial g}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)\frac{\partial g}{\partial l}$.

Since $\frac{\partial g}{\partial l} = \frac{(x-x')^T(x-x')}{l^3}$, I finally get:

$\frac{\partial K}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)\frac{(x-x')^T(x-x')}{l^3}$.
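The chain-rule result for $\frac{\partial K}{\partial l}$ can be sanity-checked on individual kernel entries. This is a small sketch under the same $\sigma^2\exp(\cdot/2l^2)$ formulation; the helper names `k_se` and `dk_dell` and the test values are hypothetical, chosen only for illustration.

```python
import numpy as np

def k_se(r2, sigma, ell):
    """Kernel value as a function of the squared distance r2 = (x-x')^T (x-x')."""
    return sigma**2 * np.exp(-r2 / (2.0 * ell**2))

def dk_dell(r2, sigma, ell):
    """Analytic derivative of the kernel w.r.t. ell: K * r2 / ell^3."""
    return k_se(r2, sigma, ell) * r2 / ell**3

r2 = np.array([0.0, 0.5, 2.0, 9.0])   # a few squared distances
sigma, ell, eps = 1.5, 0.8, 1e-6

# Central finite difference in ell, entry by entry.
numeric = (k_se(r2, sigma, ell + eps) - k_se(r2, sigma, ell - eps)) / (2 * eps)
analytic = dk_dell(r2, sigma, ell)
max_abs_err = np.max(np.abs(numeric - analytic))
print(max_abs_err)
```

Note that the derivative is exactly zero where the squared distance is zero, that is, on the diagonal of $K$, and positive elsewhere.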




















Maybe there is also an error in the gradient formulation, because in Rasmussen & Williams, Gaussian Processes for Machine Learning, p. 114, eq. 5.9, it is expressed as:

[Image: partial derivatives of the log marginal likelihood w.r.t. the hyperparameters]

where the two terms have different signs and the target vector y is transposed only the first time.




















As DavideM mentions, the gradient of the marginal likelihood can be computed as follows:

$$\frac{\partial}{\partial\theta_i} \log P(y|x, \theta) = \frac{1}{2}\operatorname{tr}\left(\left(\alpha\alpha^T-K^{-1}\right)\frac{\partial K}{\partial\theta_i}\right)$$

where $\alpha=K^{-1}y$.

Since $K$ in the marginal likelihood is the covariance matrix of the inputs $x$, do we really care about the rest, i.e. $-(x-x')^T(x-x')$? All exponents go to 0 anyway when $x=x'$. Or am I missing something here?

– Vilius Ciuzelis (new contributor)
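One way to probe both the trace form and the question about the exponents is numerically. The sketch below is illustrative only: the data, the jitter value `1e-2`, and the function names are assumptions. It checks that the trace form agrees with the two-term form, and that $\frac{\partial K}{\partial l}$ vanishes only on the diagonal, since $K_{ij}$ is evaluated at every pair $(x_i, x_j)$ of training inputs and $x = x'$ holds only for $i = j$.

```python
import numpy as np

def grad_two_term(Kinv, y, dK):
    """1/2 y^T K^{-1} dK K^{-1} y - 1/2 tr(K^{-1} dK)."""
    return 0.5 * y @ Kinv @ dK @ Kinv @ y - 0.5 * np.trace(Kinv @ dK)

def grad_trace_form(Kinv, y, dK):
    """1/2 tr((alpha alpha^T - K^{-1}) dK) with alpha = K^{-1} y."""
    alpha = Kinv @ y
    return 0.5 * np.trace((np.outer(alpha, alpha) - Kinv) @ dK)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 1))            # six one-dimensional inputs
y = rng.normal(size=6)
sq = (X - X.T) ** 2                    # pairwise squared distances, shape (6, 6)

sigma, ell = 1.0, 0.5
K = sigma**2 * np.exp(-0.5 * sq / ell**2)
Kinv = np.linalg.inv(K + 1e-2 * np.eye(6))   # assumed jitter for conditioning
dK_dell = K * sq / ell**3

a = grad_two_term(Kinv, y, dK_dell)
b = grad_trace_form(Kinv, y, dK_dell)
diag_is_zero = np.allclose(np.diag(dK_dell), 0.0)
offdiag_nonzero = bool(np.any(dK_dell > 0))
print(a, b, diag_is_zero, offdiag_nonzero)
```

The off-diagonal entries are what carry the dependence on the length scale, so the $-(x-x')^T(x-x')$ term cannot be dropped.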






































The answer beginning "We are looking to maximise the log probability" was posted by j__ (answered Dec 17 '14 at 23:54) and last edited by Martin Ferianc (edited Jan 30 at 12:46).











answered Jan 31 '15 at 15:13 by jpro (edited Feb 5 '15 at 10:28)





















                        1












                        $begingroup$

Maybe there is also an error in the gradient formulation, because in Rasmussen & Williams, Gaussian Processes for Machine Learning, p. 114, Eq. 5.9, it is expressed as:

[image: partial derivatives of the log marginal likelihood w.r.t. the hyperparameters]

where the two terms have different signs and the target vector $y$ is transposed only the first time it appears.
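Since the image itself is not reproduced here, the equation being referred to can be written out as follows. This is a transcription in this thread's notation, with $\theta_i$ a hyperparameter and $\alpha = K^{-1}y$, not text from the original answer:

$$\frac{\partial}{\partial\theta_i}\log P(y|x,\theta) = \frac12 y^T K^{-1}\frac{\partial K}{\partial\theta_i}K^{-1}y - \frac12\operatorname{trace}\left(K^{-1}\frac{\partial K}{\partial\theta_i}\right) = \frac12\operatorname{trace}\left(\left(\alpha\alpha^T - K^{-1}\right)\frac{\partial K}{\partial\theta_i}\right)$$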















                        $endgroup$



























answered Mar 25 '17 at 12:09 by DavideM





















                                0












                                $begingroup$

As DavideM mentions, the gradient of the log marginal likelihood can be computed as follows:

$$\frac{\partial}{\partial\theta_i} \log P(y|x, \theta) = \frac12\operatorname{trace}\left(\left(\alpha\alpha^T-K^{-1}\right)\frac{\partial K}{\partial\theta_i}\right)$$

where $\alpha=K^{-1}y$.

Since $K$ in the marginal likelihood is the covariance matrix of the inputs $x$, do we really care about the rest, i.e. $-(x-x')^T(x-x')$? All exponents go to 0 anyway when $x=x'$. Or am I missing something here?
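A minimal NumPy sketch of this trace formula for the squared exponential kernel, checked against a central finite difference of the log marginal likelihood. The function names, the random toy data and the `jitter` term added to the diagonal for numerical stability are assumptions made for illustration; none of them come from the posts above. Note that $K$ holds $k(x_i, x_j)$ for every pair of training inputs, so the exponent is zero only on the diagonal, and the off-diagonal entries still depend on $l$.

```python
import numpy as np

def se_gram(X, sigma, ell):
    """Return K_ij = sigma^2 exp(-|x_i - x_j|^2 / (2 ell^2)) and the squared distances."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return sigma**2 * np.exp(-sq / (2.0 * ell**2)), sq

def log_marginal_likelihood(X, y, sigma, ell, jitter):
    """log P(y | X, theta) for a zero-mean GP; jitter is an assumed diagonal term."""
    K, _ = se_gram(X, sigma, ell)
    K = K + jitter * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return (-0.5 * y @ np.linalg.solve(K, y)
            - 0.5 * logdet
            - 0.5 * len(y) * np.log(2.0 * np.pi))

def grad_wrt_ell(X, y, sigma, ell, jitter):
    """0.5 * trace((alpha alpha^T - K^{-1}) dK/d ell), with alpha = K^{-1} y."""
    K, sq = se_gram(X, sigma, ell)
    dK = K * sq / ell**3  # elementwise derivative; the jitter does not depend on ell
    K = K + jitter * np.eye(len(y))
    alpha = np.linalg.solve(K, y)
    K_inv = np.linalg.inv(K)
    return 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)

# Hypothetical toy data: 6 two-dimensional inputs with random targets.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(6, 2)), rng.normal(size=6)
sigma, ell, jitter, eps = 1.3, 0.8, 1e-2, 1e-5

_, sq = se_gram(X, sigma, ell)  # zero on the diagonal, positive elsewhere
analytic = grad_wrt_ell(X, y, sigma, ell, jitter)
numeric = (log_marginal_likelihood(X, y, sigma, ell + eps, jitter)
           - log_marginal_likelihood(X, y, sigma, ell - eps, jitter)) / (2.0 * eps)
print(analytic, numeric)
```

The explicit inverse keeps the sketch close to the formula; for larger problems a Cholesky factorization would be the more usual choice.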












                                $endgroup$

























answered Mar 28 at 15:41 by Vilius Ciuzelis (edited Mar 28 at 16:35)


























