Gradients of marginal likelihood of Gaussian Process with squared exponential covariance, for learning hyper-parameters
The derivation of the gradient of the marginal likelihood is given in http://www.gaussianprocess.org/gpml/chapters/RW5.pdf, but the gradient for the most commonly used covariance function, the squared exponential covariance, is not given explicitly.
I am implementing the Rprop algorithm in http://ml.informatik.uni-freiburg.de/_media/publications/blumesann2013.pdf for learning the hyper-parameters $\sigma$ (signal variance) and $h$ (length-scale). Alas, my implementation is not working well. I have derived the gradients, but I am not sure whether they are correct.
Can someone point me to a good tutorial or article that explicitly gives the expressions for the hyper-parameter gradients?
machine-learning
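For concreteness, here is a minimal Python sketch of the sign-based Rprop update referred to above (the iRprop− variant). It is illustrative only; `neg_lml_grad` is a hypothetical callable returning the gradient of the negative log marginal likelihood with respect to the hyper-parameters:

    import numpy as np

    def rprop(neg_lml_grad, theta0, n_iter=200, eta_plus=1.2, eta_minus=0.5,
              step0=0.1, step_min=1e-6, step_max=1.0):
        # Sign-based Rprop: adapt a per-parameter step size from the sign of
        # successive gradients; only the sign of the current gradient is used.
        theta = np.asarray(theta0, dtype=float).copy()
        step = np.full_like(theta, step0)
        prev_grad = np.zeros_like(theta)
        for _ in range(n_iter):
            grad = neg_lml_grad(theta)
            sign_change = prev_grad * grad
            # Same sign as last step: grow the step; sign flip: shrink it and skip this update.
            step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
            step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
            grad = np.where(sign_change < 0, 0.0, grad)
            theta -= np.sign(grad) * step
            prev_grad = grad
        return theta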
asked Nov 20 '14 at 8:57 by aaronqli
4 Answers
We are looking to maximise the log probability $\ln P(y \mid x, \theta)$:
$$\ln P(y \mid x, \theta) = -\frac{1}{2}\ln|K| - \frac{1}{2}y^T K^{-1} y - \frac{N}{2}\ln 2\pi$$
The three components can be seen as balancing the complexity of the GP (to avoid overfitting) against the data fit, with a constant on the end. So the gradient is
$$\frac{\partial}{\partial\theta_i}\log P(y \mid x, \theta) = \frac{1}{2}\,y^T K^{-1}\frac{\partial K}{\partial\theta_i}K^{-1}y - \frac{1}{2}\operatorname{tr}\!\left(K^{-1}\frac{\partial K}{\partial\theta_i}\right)$$
So all we need to know is $\frac{\partial K}{\partial\theta_i}$ to be able to solve it. I think you got this far, but I wasn't sure, so I thought I would recap.
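A minimal NumPy/SciPy sketch of this computation, assuming hypothetical callables `kernel(X)` (returning $K$) and `kernel_grads(X)` (returning the list of $\partial K/\partial\theta_i$), with a small jitter added for numerical stability:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def log_marginal_likelihood_and_grad(X, y, kernel, kernel_grads, jitter=1e-8):
        n = len(y)
        K = kernel(X) + jitter * np.eye(n)
        L = cho_factor(K, lower=True)
        alpha = cho_solve(L, y)                       # alpha = K^{-1} y
        log_det = 2.0 * np.sum(np.log(np.diag(L[0])))
        lml = -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
        grads = []
        for dK in kernel_grads(X):
            # 1/2 y^T K^{-1} dK K^{-1} y  -  1/2 tr(K^{-1} dK)
            data_fit = 0.5 * alpha @ dK @ alpha
            complexity = 0.5 * np.trace(cho_solve(L, dK))
            grads.append(data_fit - complexity)
        return lml, np.array(grads)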
For the case of the RBF/exponentiated quadratic kernel (never call it "squared exponential", as this name is actually incorrect), under the following formulation:
$$K(x,x') = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
the derivatives with respect to the hyperparameters are as follows:
$$\frac{\partial K}{\partial\sigma} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
$$\frac{\partial K}{\partial l} = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\frac{(x-x')^T(x-x')}{l^3}$$
However, GP libraries often use the parameterisation
$$K(x,x') = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
where $\sigma$ and $l$ are confined to positive real numbers. Let $l=\exp(\theta_1)$ and $\sigma=\exp(2\theta_2)$; then by optimising over the unconstrained $\theta_1,\theta_2$ we know the values will conform to this rule. In this case the derivatives are:
$$\frac{\partial K}{\partial\theta_1} = \sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)\left(\frac{(x-x')^T(x-x')}{l^2}\right)$$
$$\frac{\partial K}{\partial\theta_2} = 2\sigma\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$
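A small self-contained sketch of these kernel derivatives over a full covariance matrix, with a finite-difference check of $\partial K/\partial l$ (all names here are illustrative):

    import numpy as np

    def sq_dists(X):
        # Matrix of squared distances (x_i - x_j)^T (x_i - x_j).
        d = X[:, None, :] - X[None, :, :]
        return np.sum(d * d, axis=-1)

    def rbf_kernel(X, sigma, l):
        return sigma**2 * np.exp(-sq_dists(X) / (2.0 * l**2))

    def rbf_kernel_grads(X, sigma, l):
        D2 = sq_dists(X)
        E = np.exp(-D2 / (2.0 * l**2))
        dK_dsigma = 2.0 * sigma * E            # derivative w.r.t. the signal parameter sigma
        dK_dl = sigma**2 * E * D2 / l**3       # derivative w.r.t. the length-scale l
        return dK_dsigma, dK_dl

    # Finite-difference sanity check of dK/dl on random inputs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 2))
    sigma, l, eps = 1.3, 0.7, 1e-6
    _, dK_dl = rbf_kernel_grads(X, sigma, l)
    fd = (rbf_kernel(X, sigma, l + eps) - rbf_kernel(X, sigma, l - eps)) / (2 * eps)
    assert np.allclose(dK_dl, fd, atol=1e-5)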
There is interesting work carried out by the likes of Mike Osborne looking at marginalising out hyper-parameters. However, as far as I am aware it is only appropriate for low numbers of parameters and isn't incorporated in standard libraries yet. Worth a look all the same.
Another note is that the optimisation space is multimodal the majority of the time, so if you are using convex optimisation be sure to use a fair few initialisations.
answered Dec 17 '14 at 23:54 by j__ (edited Jan 30 at 12:46 by Martin Ferianc)
Many thanks! This is very helpful. – aaronqli, Dec 20 '14 at 19:52
jpro is right, your answer for $\frac{dK}{dl}$ is incorrect. – George, Aug 25 '16 at 19:27
Ah sorry about that - I'll fix it when I get home not to confuse people in the future. – j__, Aug 26 '16 at 13:21
I believe that the derivative $\frac{\partial K}{\partial l}$, as it was given by j__, is not correct. I think that the correct one is the following (I present the derivation step by step):
$K(x,x') = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)$
I now call $g(l)=\frac{-(x-x')^T(x-x')}{2l^2}$, so $K=\sigma^2\exp\big(g(l)\big)$.
$\frac{\partial K}{\partial l} = \frac{\partial K}{\partial g}\,\frac{\partial g}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)\,\frac{\partial g}{\partial l}$.
With simple calculations, I finally get:
$\frac{\partial K}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)\,\frac{(x-x')^T(x-x')}{l^3}$.
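A quick symbolic check of this chain-rule result, sketched with SymPy (the symbol `d` stands in for $(x-x')^T(x-x')$):

    import sympy as sp

    sigma, l, d = sp.symbols('sigma l d', positive=True)   # d = (x - x')^T (x - x')
    K = sigma**2 * sp.exp(-d / (2 * l**2))
    dK_dl = sp.diff(K, l)
    expected = sigma**2 * sp.exp(-d / (2 * l**2)) * d / l**3
    assert sp.simplify(dK_dl - expected) == 0
    print(dK_dl)    # sigma**2 * d * exp(-d/(2*l**2)) / l**3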
answered Jan 31 '15 at 15:13 by jpro (edited Feb 5 '15 at 10:28)
Maybe there is also an error in the gradient formulation, because in Rasmussen & Williams, Gaussian Processes for Machine Learning, p. 114, eq. 5.9, it is expressed as
$$\frac{\partial}{\partial\theta_j}\log p(y \mid X,\theta) = \frac{1}{2}\,y^T K^{-1}\frac{\partial K}{\partial\theta_j}K^{-1}y - \frac{1}{2}\operatorname{tr}\!\left(K^{-1}\frac{\partial K}{\partial\theta_j}\right),$$
where the two terms have different signs and the $y$ targets vector is transposed just the first time.
answered Mar 25 '17 at 12:09 by DavideM
As DavideM mentions, the gradient of the marginal likelihood can be computed as follows:
$$\frac{\partial}{\partial\theta_i}\log P(y \mid x, \theta) = \frac{1}{2}\operatorname{trace}\!\left(\left(\alpha\alpha^T - K^{-1}\right)\frac{\partial K}{\partial\theta_i}\right)$$
where $\alpha = K^{-1}y$.
Since $K$ in the marginal likelihood is the covariance matrix of the inputs $x$, do we really care about the rest, i.e. $-(x-x')^T(x-x')$? All the exponents go to $0$ anyway when $x = x' = x$. Or am I missing something here?
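A short sketch of this trace form (a hypothetical helper; $K$ and each $\partial K/\partial\theta_i$ are assumed to be precomputed $n \times n$ matrices over all pairs of inputs, so the off-diagonal $(x_i - x_j)$ terms enter through them):

    import numpy as np

    def lml_grad_trace_form(K, dK_list, y, jitter=1e-8):
        # 1/2 * tr((alpha alpha^T - K^{-1}) dK/dtheta_i), with alpha = K^{-1} y.
        n = len(y)
        K_inv = np.linalg.inv(K + jitter * np.eye(n))   # fine for a sketch; prefer Cholesky in practice
        alpha = K_inv @ y
        M = np.outer(alpha, alpha) - K_inv
        return np.array([0.5 * np.trace(M @ dK) for dK in dK_list])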
answered Mar 28 at 15:41 by Vilius Ciuzelis, a new contributor (edited Mar 28 at 16:35)