Re: Questions on the properties of inter-mingled samples




 
Topic: Science > Physics
User: "Harry"
Date: 13 Aug 2003 06:11:56 PM
Object: Re: Questions on the properties of inter-mingled samples
Rich Ulrich,
i had a feeling that the mean was going to be easy to calculate.
however, the standard deviation of this new, 'composite' sample wasn't
going to be easy given their inter-correlations.
i have tried to find the answer myself in the libraries, bookstores,
and internet, but to no avail. i'm sure that people have done
research on "variance pooling", because i see some tantalizing sites
on the net about this topic. finally, i know that the answer to this
problem is not going to be that apparent or "commonsensical" like you
would think.
again (for those who don't know my original question): given 3 sets:
S1, S2, and S3 with means and standard deviations (SD) of m1, m2, m3,
sd1, sd2, and sd3, what is the mean and sd of a new sample composed of
a% S1, b% S2, and c% S3, given that the 3 samples are correlated with
one another?

suppose that you have 2 different sets, S1 and S2. there are many,
many sample points in each one. S1 has a mean of x1 and a standard
deviation, sd, of sd1. moreover, S2 has a mean and sd of x2 and sd2.
moreover, S1 and S2 are slightly correlated with each other with R =
-0.4.

suppose that i created a new set called S* with 45% of its sample
points from S1 and 55% from S2, what would the new standard deviation
and mean be? is there an algorithm/calculation for this?


To be concrete, in order to show how silly the problem is,
I restate that:
You have a vector of (X1, X2) for weight and height; which
are correlated with some r.
You want to know, if you create an X3
where 45% of the numbers are weights, and 55% are heights,
then WHAT is the new mean and SD?

Obviously, from the re-statement, the correlation is
a red-herring -- unless there is some secret connection
not explained.

So, Yes. You can describe the two means and two SDs;
and have different samples; and have an overall mean
and overall SD. These 'details' are the statistics (say)
of an ANOVA; here are the within-group means and variances.

To get the totals: You can see that the mean is the simply-
weighted composite of the two. For the SD: I would
probably open a basic statistics book to get the ANOVA
formulas, to make sure that I did not mess up the
weighting for figuring the sum-of-squares of the group
means around the overall mean.
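
For reference, the weighting described above can be sketched in a few lines of Python. This is only an illustration, assuming that each point of the combined set comes from exactly one group and that the SDs are population SDs. The function name and the example numbers (weights in kg, heights in cm) are made up for the sketch and are not from the thread. The total variance is the weighted within-group variance plus the weighted squared distance of each group mean from the overall mean.

```python
import math

def mixture_mean_sd(weights, means, sds):
    """Mean and SD of a set built by pooling groups in the given proportions.

    weights: proportion of points taken from each group (rescaled to sum to 1)
    means, sds: per-group means and population standard deviations
    """
    total = sum(weights)
    w = [x / total for x in weights]
    # Overall mean: simple weighted average of the group means.
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Overall variance: within-group part plus between-group part.
    var = sum(
        wi * (si ** 2 + (mi - mean) ** 2)
        for wi, mi, si in zip(w, means, sds)
    )
    return mean, math.sqrt(var)

# Hypothetical numbers: 45% weights (mean 70, SD 10), 55% heights (mean 170, SD 8).
mean, sd = mixture_mean_sd([0.45, 0.55], [70.0, 170.0], [10.0, 8.0])
print(mean, sd)
```

With these made-up inputs the mean comes out at about 125 and the SD at about 50.5. The correlation between the two variables does not appear anywhere in the calculation. The same function takes three groups (a%, b%, c%) without change.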


[snip, extended version of Q, with nothing new that I noted.]


User: "Rich Ulrich"

Title: Re: Questions on the properties of inter-mingled samples
Date: 14 Aug 2003 09:42:42 AM
On 13 Aug 2003 16:11:56 -0700,
(Harry) wrote:

Rich Ulrich,

i had a feeling that the mean was going to be easy to calculate.
however, the standard deviation of this new, 'composite' sample wasn't
going to be easy given their inter-correlations.

- You have not shown any way in which the correlation
can have *any* effect. I suspect that you do not understand
what "correlation" denotes, because you persist in saying
that you have "sets" instead of saying that you have
pairs of numbers --


i have tried to find the answer myself in the libraries, bookstores,
and internet, but to no avail. i'm sure that people have done
research on "variance pooling", because i see some tantalizing sites
on the net about this topic. finally, i know that the answer to this
problem is not going to be that apparent or "commonsensical" like you
would think.

again (for those who don't know my original question): given 3 sets:
S1, S2, and S3 with means and standard deviations (SD) of m1, m2, m3,
sd1, sd2, and sd3, what is the mean and sd of a new sample composed of
a% S1, b% S2, and c% S3, given that the 3 samples are correlated with
one another?

[ snip, rest]
Trying my concrete example some more:
There are N pairs of (X,Y) where
X is weight, Y is height; they are correlated.
Means and SDs are known.
What is the mean and SD for Z,
if a vector Z of length N is constructed that consists
of random selections from (X,Y), so that some percentage
of the time Z is set to X, and the rest, Y?
- the mean and variance of these numbers, Z,
are readily determined as I described, if there isn't
anything more to the problem than this. (For instance,
autocorrelation? Non-random selection? I'm
imagining that the seemingly-silly model was
inspired by some reality....)
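
A quick simulation can check this claim. The sketch below is an assumption-laden illustration, not anything from the original posts: it draws normally distributed (X, Y) pairs with a chosen correlation r, builds Z by taking X with probability 0.45 and Y otherwise, and reports the mean and SD of Z. The means, SDs, sample size, and function name are all hypothetical.

```python
import math
import random

def simulate_z(r, n=50_000, p_x=0.45, seed=0):
    """Mean and SD of Z, where Z is X with probability p_x and Y otherwise."""
    rng = random.Random(seed)
    m_x, sd_x = 70.0, 10.0    # hypothetical "weight" mean and SD
    m_y, sd_y = 170.0, 8.0    # hypothetical "height" mean and SD
    zs = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        x = m_x + sd_x * a
        # Y shares the component a with X, which gives corr(X, Y) = r.
        y = m_y + sd_y * (r * a + math.sqrt(1.0 - r * r) * b)
        # Random selection, independent of the values themselves.
        zs.append(x if rng.random() < p_x else y)
    mean = sum(zs) / n
    sd = math.sqrt(sum((z - mean) ** 2 for z in zs) / n)
    return mean, sd

for r in (-0.4, 0.0, 0.9):
    mean, sd = simulate_z(r)
    print(f"r={r:+.1f}  mean={mean:.2f}  sd={sd:.2f}")
```

Under these assumptions every value of r should give roughly the same answer, a mean near 125 and an SD near 50.5, which is what the weighted formulas predict. Each Z value is either an X or a Y, never a combination of both, so r drops out of the result.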
Now -- perhaps this is where
Harry's intuition is being stimulated --
there could be narrower limits on the
*variance* of the variance, or the
variance of the *mean*. For instance, if
X and Y were identical in the first place,
-- r= 1.0, same means and SDs --
then Z would be identical to them, too,
and the mean and SD of Z would have
no "sampling variability" relative to
the "population" of the fixed X,Y.
If this is intended to represent something real and
interesting, it would possibly be helpful to say WHAT.
--
Rich Ulrich,

http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." Justice Holmes.

