Gradients of marginal likelihood of Gaussian Process with squared exponential covariance, for learning hyper-parameters
The derivation of the gradient of the marginal likelihood is given in http://www.gaussianprocess.org/gpml/chapters/RW5.pdf, but the gradient for the most commonly used covariance function, the squared exponential covariance, is not given explicitly.
I am implementing the Rprop algorithm in http://ml.informatik.uni-freiburg.de/_media/publications/blumesann2013.pdf for learning the hyper-parameters $\sigma$ (signal variance) and $h$ (length-scale). Alas, my implementation is not working well. I have derived the gradients, but I am not sure whether they are correct.
Can someone point me to a good tutorial or article that explicitly gives the expressions for the hyper-parameter gradients?
machine-learning
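For concreteness, here is a minimal Python sketch of the sign-based Rprop update referred to above (the iRprop− variant). It is illustrative only; `neg_lml_grad` is a hypothetical callable returning the gradient of the negative log marginal likelihood with respect to the hyper-parameters:

    import numpy as np

    def rprop(neg_lml_grad, theta0, n_iter=200, eta_plus=1.2, eta_minus=0.5,
              step0=0.1, step_min=1e-6, step_max=1.0):
        # Sign-based Rprop: adapt a per-parameter step size from the sign of
        # successive gradients; only the sign of the current gradient is used.
        theta = np.asarray(theta0, dtype=float).copy()
        step = np.full_like(theta, step0)
        prev_grad = np.zeros_like(theta)
        for _ in range(n_iter):
            grad = neg_lml_grad(theta)
            sign_change = prev_grad * grad
            # Same sign as last step: grow the step; sign flip: shrink it and skip this update.
            step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
            step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
            grad = np.where(sign_change < 0, 0.0, grad)
            theta -= np.sign(grad) * step
            prev_grad = grad
        return theta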
asked Nov 20 '14 at 8:57 by aaronqli
4 Answers
We are looking to maximise the log probability $\ln P(y \mid x, \theta)$:
$$\ln P(y \mid x, \theta) = -\frac{1}{2}\ln|K| - \frac{1}{2}y^T K^{-1} y - \frac{N}{2}\ln 2\pi$$
The three components can be seen as balancing the complexity of the GP (to avoid overfitting) against the data fit, with a constant on the end. So the gradient is
$$\frac{\partial}{\partial\theta_i}\log P(y \mid x, \theta) = \frac{1}{2}\,y^T K^{-1}\frac{\partial K}{\partial\theta_i}K^{-1}y - \frac{1}{2}\operatorname{tr}\!\left(K^{-1}\frac{\partial K}{\partial\theta_i}\right)$$
So all we need to know is $\frac{\partial K}{\partial\theta_i}$ to be able to solve it. I think you got this far, but I wasn't sure, so I thought I would recap.
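A minimal NumPy/SciPy sketch of this computation, assuming hypothetical callables `kernel(X)` (returning $K$) and `kernel_grads(X)` (returning the list of $\partial K/\partial\theta_i$), with a small jitter added for numerical stability:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def log_marginal_likelihood_and_grad(X, y, kernel, kernel_grads, jitter=1e-8):
        n = len(y)
        K = kernel(X) + jitter * np.eye(n)
        L = cho_factor(K, lower=True)
        alpha = cho_solve(L, y)                       # alpha = K^{-1} y
        log_det = 2.0 * np.sum(np.log(np.diag(L[0])))
        lml = -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
        grads = []
        for dK in kernel_grads(X):
            # 1/2 y^T K^{-1} dK K^{-1} y  -  1/2 tr(K^{-1} dK)
            data_fit = 0.5 * alpha @ dK @ alpha
            complexity = 0.5 * np.trace(cho_solve(L, dK))
            grads.append(data_fit - complexity)
        return lml, np.array(grads)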
For the case of the RBF/exponentiated quadratic kernel (never call it "squared exponential", as this name is actually incorrect), under the following formulation:
$$K(x,x') = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
the derivatives with respect to the hyperparameters are as follows:
$$\frac{\partial K}{\partial\sigma} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
$$\frac{\partial K}{\partial l} = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\frac{(x-x')^T(x-x')}{l^3}$$
However, GP libraries often use the parameterisation
$$K(x,x') = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
where $\sigma$ and $l$ are confined to positive real numbers. Let $l=\exp(\theta_1)$ and $\sigma=\exp(2\theta_2)$; then by optimising over the unconstrained $\theta_1,\theta_2$ we know the values will conform to this rule. In this case the derivatives are:
$$\frac{\partial K}{\partial\theta_1} = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\left(\frac{(x-x')^T(x-x')}{l^2}\right)$$
$$\frac{\partial K}{\partial\theta_2} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
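A small self-contained sketch of these kernel derivatives over a full covariance matrix, with a finite-difference check of $\partial K/\partial l$ (all names here are illustrative):

    import numpy as np

    def sq_dists(X):
        # Matrix of squared distances (x_i - x_j)^T (x_i - x_j).
        d = X[:, None, :] - X[None, :, :]
        return np.sum(d * d, axis=-1)

    def rbf_kernel(X, sigma, l):
        return sigma**2 * np.exp(-sq_dists(X) / (2.0 * l**2))

    def rbf_kernel_grads(X, sigma, l):
        D2 = sq_dists(X)
        E = np.exp(-D2 / (2.0 * l**2))
        dK_dsigma = 2.0 * sigma * E            # derivative w.r.t. the signal parameter sigma
        dK_dl = sigma**2 * E * D2 / l**3       # derivative w.r.t. the length-scale l
        return dK_dsigma, dK_dl

    # Finite-difference sanity check of dK/dl on random inputs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 2))
    sigma, l, eps = 1.3, 0.7, 1e-6
    _, dK_dl = rbf_kernel_grads(X, sigma, l)
    fd = (rbf_kernel(X, sigma, l + eps) - rbf_kernel(X, sigma, l - eps)) / (2 * eps)
    assert np.allclose(dK_dl, fd, atol=1e-5)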
There is interesting work carried out by the likes of Mike Osborne looking at marginalising out hyper-parameters. However, as far as I am aware it is only appropriate for low numbers of parameters and isn't incorporated in standard libraries yet. Worth a look all the same.
Another note is that the optimisation space is multimodal the majority of the time, so if you are using convex optimisation be sure to use a fair few initialisations.
answered Dec 17 '14 at 23:54 by j__ (edited Jan 30 at 12:46 by Martin Ferianc)
Many thanks! This is very helpful. – aaronqli, Dec 20 '14 at 19:52
jpro is right, your answer for $\frac{dK}{dl}$ is incorrect. – George, Aug 25 '16 at 19:27
Ah sorry about that - I'll fix it when I get home not to confuse people in the future. – j__, Aug 26 '16 at 13:21
I believe that the derivative $\frac{\partial K}{\partial l}$, as it was given by j__, is not correct. I think that the correct one is the following (I present the derivation step by step):
$K(x,x') = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)$
I now call $g(l)=\frac{-(x-x')^T(x-x')}{2l^2}$, so $K=\sigma^2\exp\big(g(l)\big)$.
$\frac{\partial K}{\partial l} = \frac{\partial K}{\partial g}\,\frac{\partial g}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)\,\frac{\partial g}{\partial l}$.
With simple calculations, I finally get:
$\frac{\partial K}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)\,\frac{(x-x')^T(x-x')}{l^3}$.
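A quick symbolic check of this chain-rule result, sketched with SymPy (the symbol `d` stands in for $(x-x')^T(x-x')$):

    import sympy as sp

    sigma, l, d = sp.symbols('sigma l d', positive=True)   # d = (x - x')^T (x - x')
    K = sigma**2 * sp.exp(-d / (2 * l**2))
    dK_dl = sp.diff(K, l)
    expected = sigma**2 * sp.exp(-d / (2 * l**2)) * d / l**3
    assert sp.simplify(dK_dl - expected) == 0
    print(dK_dl)    # sigma**2 * d * exp(-d/(2*l**2)) / l**3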
answered Jan 31 '15 at 15:13 by jpro (edited Feb 5 '15 at 10:28)
Maybe there is also an error in the gradient formulation, because in Rasmussen & Williams, Gaussian Processes for Machine Learning, p. 114, eq. 5.9, it is expressed as
$$\frac{\partial}{\partial\theta_j}\log p(y \mid X,\theta) = \frac{1}{2}\,y^T K^{-1}\frac{\partial K}{\partial\theta_j}K^{-1}y - \frac{1}{2}\operatorname{tr}\!\left(K^{-1}\frac{\partial K}{\partial\theta_j}\right),$$
where the two terms have different signs and the $y$ targets vector is transposed just the first time.
answered Mar 25 '17 at 12:09 by DavideM
As DavideM mentions, the gradient of the marginal likelihood can be computed as follows:
$$\frac{\partial}{\partial\theta_i}\log P(y \mid x, \theta) = \frac{1}{2}\operatorname{trace}\!\left(\left(\alpha\alpha^T - K^{-1}\right)\frac{\partial K}{\partial\theta_i}\right)$$
where $\alpha = K^{-1}y$.
Since $K$ in the marginal likelihood is the covariance matrix of the inputs $x$, do we really care about the rest, i.e. $-(x-x')^T(x-x')$? All the exponents go to $0$ anyway when $x = x' = x$. Or am I missing something here?
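A short sketch of this trace form (a hypothetical helper; $K$ and each $\partial K/\partial\theta_i$ are assumed to be precomputed $n \times n$ matrices over all pairs of inputs, so the off-diagonal $(x_i - x_j)$ terms enter through them):

    import numpy as np

    def lml_grad_trace_form(K, dK_list, y, jitter=1e-8):
        # 1/2 * tr((alpha alpha^T - K^{-1}) dK/dtheta_i), with alpha = K^{-1} y.
        n = len(y)
        K_inv = np.linalg.inv(K + jitter * np.eye(n))   # fine for a sketch; prefer Cholesky in practice
        alpha = K_inv @ y
        M = np.outer(alpha, alpha) - K_inv
        return np.array([0.5 * np.trace(M @ dK) for dK in dK_list])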
answered Mar 28 at 15:41 by Vilius Ciuzelis, a new contributor (edited Mar 28 at 16:35)