How does the formula of the correlation coefficient measure a “linear” relationship?
We know that Pearson's correlation coefficient measures the strength of the relationship (how correlated they are) between two random variables. But what about $\textbf{linearity}$? How does this formula
$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}$$
measure specifically a $\textbf{linear}$ relationship? Is there an intuitive way to look at it that explains why it quantifies a linear relationship?
probability statistics
asked Feb 6 at 19:55
Hilbert
1 Answer
To see how Pearson's correlation coefficient (simply $r$ from now on) measures the strength of the linear relationship between two variables, it is useful to show that if one variable is a linear function of the other with positive slope, then $r = 1$.
That is:
$$\forall a > 0,\ b \in \mathbb{R}: \quad Y = aX + b \Rightarrow \operatorname{Cov}(X, Y) = \sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}$$
where the latter implies $r = 1$, because $r$ is the ratio of the left-hand side to the right-hand side (assuming $\operatorname{Var}(X) > 0$).
Proof:
\begin{align}
\operatorname{Cov}(X, Y) &= E(XY) - E(X)E(Y) \\
&= E[X(aX + b)] - E(X)E(aX + b) \\
&= E(aX^2 + bX) - a[E(X)]^2 - bE(X) \\
&= a\left[E(X^2) - [E(X)]^2\right] + bE(X) - bE(X) \\
&= a\operatorname{Var}(X)
\end{align}
where I have used $E(aX) = aE(X)$ and $E(b) = b$ for constants $a$ and $b$.
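The identity $\operatorname{Cov}(X, aX + b) = a\operatorname{Var}(X)$ can also be checked numerically. The following is only a minimal sketch: the sample, the seed, and the constants `a` and `b` are arbitrary choices for illustration, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.normal(size=1000)       # hypothetical sample standing in for X
a, b = 2.5, -1.0                # arbitrary constants for the illustration
y = a * x + b

# Moments with division by n, mirroring E(XY) - E(X)E(Y) and E(X^2) - [E(X)]^2
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
var_x = np.mean(x**2) - np.mean(x) ** 2

# The two quantities should agree up to floating-point rounding
print(cov_xy, a * var_x)
```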
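We also have:
$$\operatorname{Var}(Y) = \operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$$
using $\operatorname{Var}(aX) = a^2\operatorname{Var}(X)$ and the fact that adding a constant $b$ does not change the variance. This implies:
$$\sqrt{\operatorname{Var}(Y)} = |a|\sqrt{\operatorname{Var}(X)} = a\sqrt{\operatorname{Var}(X)}$$
where the last step uses that $a$ is positive. From this we finally obtain:
$$\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)} = a\operatorname{Var}(X)$$
which equals $\operatorname{Cov}(X, Y)$, proving the claim. Similarly, one can prove that $r = -1$ if $Y = -aX + b$ with $a$ positive: the covariance becomes $-a\operatorname{Var}(X)$, while $\operatorname{Var}(-X) = \operatorname{Var}(X)$ leaves the product of standard deviations unchanged.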
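As a quick sanity check of both cases, the sample formula from the question can be evaluated on exactly linear data. This is a sketch under stated assumptions: `pearson_r` is a hypothetical helper written term by term from that formula, and the slopes, intercept, and grid of `x` values are arbitrary.

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation coefficient, written term by term from the formula."""
    dx = x - x.mean()  # deviations x_i - x_bar
    dy = y - y.mean()  # deviations y_i - y_bar
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.linspace(0.0, 10.0, 50)        # arbitrary illustrative inputs
r_pos = pearson_r(x, 3.0 * x + 2.0)   # positive slope: r should be close to 1
r_neg = pearson_r(x, -3.0 * x + 2.0)  # negative slope: r should be close to -1
print(r_pos, r_neg)
```

The intercept never matters here, because it cancels in the deviations $y_i - \bar{y}$; only the sign of the slope survives the normalisation.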
More generally, when $r$ lies strictly between $-1$ and $1$ (excluding the case $r = 0$, which indicates no linear relationship), the data exhibit a partially linear relationship. That is, in a scatter plot of the two variables, most of the data points (excluding outliers) gather in a cloud around a line: a negatively sloped line when $r < 0$ and a positively sloped one when $r > 0$. The farther $r$ is from $-1$ or $1$, respectively, the more dispersed the cloud is around that line.
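The behaviour described above can be illustrated with synthetic data. The sketch below is not part of the original argument: the noise levels, the seed, and the parabola example are assumptions chosen for illustration. It uses NumPy's `np.corrcoef` to compare an exact line, the same line with small and large noise, and an exact but nonlinear dependence.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, chosen arbitrarily
x = np.linspace(-1.0, 1.0, 201)  # grid symmetric around 0

def r(u, v):
    """Pearson correlation of two 1-D arrays via np.corrcoef."""
    return np.corrcoef(u, v)[0, 1]

r_line = r(x, 2.0 * x)                                               # exact line
r_small_noise = r(x, 2.0 * x + rng.normal(scale=0.2, size=x.size))   # tight cloud
r_large_noise = r(x, 2.0 * x + rng.normal(scale=2.0, size=x.size))   # loose cloud
r_parabola = r(x, x**2)                                              # exact, but not linear

print(r_line, r_small_noise, r_large_noise, r_parabola)
```

In this setup the wider cloud gives the smaller $r$, and the parabola gives an $r$ of essentially zero even though $y$ is completely determined by $x$. That is the sense in which $r$ responds to linear dependence specifically, and not to dependence in general.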
answered Mar 28 at 10:14
Nicg
edited Mar 28 at 11:29
Thank you, this demonstration is neat and straightforward.
– Hilbert
Mar 28 at 10:21