The “correct” standard deviation

This may end up being a question more about scientific best practice than anything else, but I think this is the right community to ask it in to get the insight I'm looking for.

Say I have two little square widgets made out of a material that shrinks when it gets wet. I want to know by how much. I measure the length of the widgets along two lines each (because they're not shaped perfectly and my measurement technique isn't perfect), before and after soaking them with water. I come back with data that looks like this:

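| Widget | Measurement | Before | After | Shrinkage |
|--------|-------------|--------|-------|-----------|
| 1      | 1           | 1.898  | 1.722 | 0.176     |
| 1      | 2           | 1.904  | 1.737 | 0.167     |
| 2      | 1           | 2.003  | 1.763 | 0.240     |
| 2      | 2           | 2.029  | 1.843 | 0.186     |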

Now, I can calculate the overall mean without worrying too much in this case, since the mean of two means is the same as the mean of all the points that went in as long as each mean has the same number of samples, which in this case they do. So:

avg(0.176,0.167,0.240,0.186) = 0.192 = avg(avg(0.176,0.167),avg(0.240,0.186))

However, this type of relation is not true for the standard deviation. There are several approaches that immediately present themselves to me as options for finding an overall standard deviation for this dataset:

1. Use all of the data at once: sd(0.176,0.167,0.240,0.186) = 0.033

2. Get a standard deviation for each widget, and average them: avg(sd(0.176,0.167),sd(0.240,0.186)) = 0.022

3. Get the average for each widget, and take the standard deviation of the two: sd(avg(0.176,0.167),avg(0.240,0.186)) = 0.029
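For anyone who wants to check the three figures, here is a minimal sketch using Python's standard `statistics` module. It assumes that `sd` means the sample standard deviation (the $n-1$ version); the variable names are made up for illustration.

```python
from statistics import mean, stdev

# Shrinkage values from the table, grouped by widget.
widget_1 = [0.176, 0.167]
widget_2 = [0.240, 0.186]
all_points = widget_1 + widget_2

# Option 1: sample standard deviation of all four points at once.
sd_all = stdev(all_points)

# Option 2: average of the two per-widget sample standard deviations.
mean_of_sds = mean([stdev(widget_1), stdev(widget_2)])

# Option 3: sample standard deviation of the two per-widget means.
sd_of_means = stdev([mean(widget_1), mean(widget_2)])

print(f"all data:     {sd_all:.3f}")
print(f"mean of sds:  {mean_of_sds:.3f}")
print(f"sd of means:  {sd_of_means:.3f}")
```

Rounded to three decimals, these should match the values quoted in the three options.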

Now, maybe it's just confusion on my part as to the meaning of a standard deviation, but I don't know which approach would be correct to use here (for the purpose of, for example, putting error bars on a graph). Intuitively I'm drawn to the first method, because it seems to incorporate the most information about the data in the actual standard deviation calculation. I'm wary, though, that doing this could be implicitly making some assumption about the structure of the data, such as homogeneity, which may not actually hold.

What approach is generally regarded as correct, and what assumptions about the structure of the data does it imply? Is there another, more correct method (or another method that makes fewer assumptions) which I failed to list?

asked Mar 29 at 15:24 by realityChemist (edited Mar 29 at 15:48)

The assertion that "the mean of two means is the same as the mean of all the points that went in" is simply false in the general case. I believe this only holds true when each "sub mean" includes an equal number of values.
– Brian, Mar 29 at 15:39

Thanks for the heads up, I'll edit the question.
– realityChemist, Mar 29 at 15:44

statistics philosophy descriptive-statistics

2 Answers
Before I answer your question: in general it is not true that the mean of two means is the mean of all points. Consider the example $avg(avg(0,0,0),avg(1,1)) = 0.5 \neq 0.4 = avg(0,0,0,1,1)$.
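A quick sketch of that counterexample in Python, along with the size-weighted version that does recover the overall mean. The variable names are illustrative only, and the weighting step is a standard identity rather than something taken from the question.

```python
from math import isclose
from statistics import mean

group_a = [0, 0, 0]
group_b = [1, 1]

# Mean of all five points taken together.
overall = mean(group_a + group_b)

# Unweighted mean of the two group means: ignores the group sizes.
naive = mean([mean(group_a), mean(group_b)])

# Weighting each group mean by its group size recovers the overall mean.
sizes = [len(group_a), len(group_b)]
weighted = (sizes[0] * mean(group_a) + sizes[1] * mean(group_b)) / sum(sizes)

print(overall, naive, weighted)
print(isclose(overall, weighted), isclose(overall, naive))
```

When the groups have equal sizes, the weights cancel and the plain mean of means agrees with the overall mean, which is the case in the widget data.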

Regarding the standard deviation: only your first method actually makes sense, because the other methods do not, in general, coincide with the definition of the standard deviation.
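One way to see what the first method folds together is the usual split of the total sum of squared deviations into a within-group part and a between-group part. The sketch below checks that identity numerically on the widget data; the names `ss_total`, `ss_within` and `ss_between` are just labels chosen here, and the grouping by widget is an assumption about how the data should be read.

```python
from math import isclose
from statistics import mean

# Shrinkage values grouped by widget.
groups = [[0.176, 0.167], [0.240, 0.186]]
all_points = [x for g in groups for x in g]
grand_mean = mean(all_points)

# Squared deviations of every point from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_points)

# Squared deviations of every point from its own widget's mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Squared deviations of each widget mean from the grand mean,
# counted once per point in that widget.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)
print(isclose(ss_total, ss_within + ss_between))
```

Dividing `ss_total` by $n-1 = 3$ and taking the square root gives the first method's figure, so that number mixes measurement scatter within a widget with the difference between widgets.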

answered Mar 29 at 15:43 by flawr

If you consider your shrinkage estimates as samples from distributions with a common variance then the pooled estimate of the common variance is
$$
s^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}
$$

In this expression you have a sample of size $n_1$ with sample variance $s_1^2$ and a sample of size $n_2$ with sample variance $s_2^2$.

If I understand your data, you have $n_1=2$ for widget 1 and $n_2=2$ for widget 2 giving
$$
s^2=\frac{s_1^2+s_2^2}{2}
$$
so actually the variance is the average of the individual variances, in this case. The standard deviation is the square root of the variance.
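As a rough illustration, the pooled estimate can be computed for the widget data like this. The helper `pooled_variance` is a hypothetical name, and letting it accept more than two samples is a generalisation assumed here, not something stated above.

```python
from math import isclose, sqrt
from statistics import variance


def pooled_variance(*samples):
    """Pooled estimate of a common variance from several samples."""
    # Weight each sample variance by its degrees of freedom (n_i - 1).
    numerator = sum((len(s) - 1) * variance(s) for s in samples)
    degrees_of_freedom = sum(len(s) - 1 for s in samples)
    return numerator / degrees_of_freedom


widget_1 = [0.176, 0.167]
widget_2 = [0.240, 0.186]

s2 = pooled_variance(widget_1, widget_2)
pooled_sd = sqrt(s2)

# With n_1 = n_2 = 2 the pooled variance is the plain average of the two.
simple_average = (variance(widget_1) + variance(widget_2)) / 2

print(f"pooled variance: {s2:.6f}")
print(f"pooled sd:       {pooled_sd:.4f}")
print(isclose(s2, simple_average))
```

The pooled standard deviation comes out at roughly 0.027, which is not the same as averaging the two standard deviations directly, because the averaging happens on the variances before the square root is taken.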

This link may be helpful.

answered Mar 29 at 16:50 by PM.

I'm a bit confused by this (from the link): "You can only use the above formulas if the standard deviations for the two groups are the same (this is because it would otherwise be violating the assumption of homogeneity of variances." [sic] If the two standard deviations were the same, wouldn't these formulas simplify to tautologies, $s^2 = s^2$? Are they trying to say that the population standard deviations need to be the same in order to use this for sample standard deviations?
– realityChemist, Mar 29 at 17:06

@realityChemist This is what I meant when saying you have samples from a distribution with a common variance. All your samples are estimates of that common variance. I can't vouch for the exact content of the link, I'm afraid; it was only added to provide a starting reference, terminology, and a stepping stone to further searching around if need be.
– PM., Mar 29 at 17:15
