QUALITY CONTROL II

Student’s “t”

The purpose of this and previous posts is to show that statistics is basically linear algebra and intrinsically simple. In previous posts I showed that gathered data can be simply expressed by two lines, the lengths of which represent the mean and the standard deviation of the data respectively. It was also established that the two lines are independent of one another. If we knew that these lengths absolutely represented the true values of each, we would be through. However, that is not the case. The mean is always somewhat “fuzzy” in that we cannot be sure that what we measured is the true value. Measuring uncertainty is where it gets complicated. There is usually uncertainty in the standard deviation as well, but not always. Facilities that routinely manufacture a product may have accumulated enough data on the standard deviation to be able to assume their value represents the true one.

True Value of the Standard Deviation is Known

In this case the normal distribution is all that is needed to evaluate the uncertainty of the mean.
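As a sketch of this case, suppose the true standard deviation of a test procedure is known from long production history. The example below uses made-up measurements and an assumed known sigma (neither is from the post) to form a 95% confidence interval for the mean using only the normal distribution (1.96 is the familiar two-sided 95% normal quantile):

```python
import math

# illustrative measurements; sigma is assumed KNOWN from production history
data = [98.2, 101.5, 99.8, 100.4, 97.9, 100.9]
sigma = 1.5
n = len(data)

mean = sum(data) / n
# 95% two-sided interval for the mean when sigma is known:
half_width = 1.96 * sigma / math.sqrt(n)
lo, hi = mean - half_width, mean + half_width
```

With the standard deviation known, no “t” correction is needed; the interval shrinks as the square root of the number of tests.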

 

True Value of the Standard Deviation is Not Known

Usually the standard deviation is also fuzzy; thus both the mean and the standard deviation can be considered random variables. While the mean is normally distributed, the square of the standard deviation (the variance) is distributed according to the chi-squared distribution. (The chi-squared distribution with one degree of freedom is the distribution of the square of a standard normal variable.) However, the distribution that we want is that of the mean divided by the standard deviation, both of which are random variables.
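The parenthetical claim is easy to check numerically. A quick simulation sketch (the seed and sample size are arbitrary): the squares of standard normal draws should show the chi-squared(1) mean of 1 and variance of 2:

```python
import random
import statistics

random.seed(0)

# square each of 200,000 standard normal draws
squares = [random.gauss(0, 1) ** 2 for _ in range(200_000)]

# chi-squared with one degree of freedom has mean 1 and variance 2
m = statistics.fmean(squares)
v = statistics.pvariance(squares)
```

Both sample statistics land close to the theoretical values, consistent with the squared normal being chi-squared with one degree of freedom.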

Derivation of the “t” Distribution

The complication is that what we now need is the distribution of a normally distributed variable divided by the square root of a chi-squared variable. For a ratio z = y/x, the density is:

f(z) = Integral |x| fX(x) fY(zx) dx

where fY(zx) is the normal density evaluated at zx, fX(x) is the density of x, the square root of the chi-squared variable, and z follows the “t” distribution.

The details of the derivation of the “t” distribution may be found in Statistical Inference, Vijay K. Rohatgi, John Wiley & Sons, 1984.
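As a sanity check on the ratio formula, it can be integrated numerically and compared against the closed-form “t” density. The sketch below assumes 5 degrees of freedom, and takes x as the square root of the chi-squared variable divided by its degrees of freedom (the scaling that appears in the usual definition of “t”); since x is positive, the integral runs over x > 0:

```python
import math

NU = 5  # degrees of freedom, chosen for illustration

def norm_pdf(u):
    # standard normal density
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def chi2_pdf(v, k):
    # chi-squared density with k degrees of freedom
    if v <= 0:
        return 0.0
    return v ** (k / 2 - 1) * math.exp(-v / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def fX(x):
    # density of X = sqrt(V / NU), where V is chi-squared with NU d.o.f.
    # (change of variables: V = NU * x^2, dV/dx = 2 * NU * x)
    return chi2_pdf(NU * x * x, NU) * 2 * NU * x

def t_pdf_via_ratio(z, upper=12.0, steps=24000):
    # f(z) = Integral over x > 0 of x * fX(x) * norm_pdf(z * x) dx,
    # evaluated with the trapezoid rule
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x * fX(x) * norm_pdf(z * x)
    return total * h

def t_pdf_exact(z, nu=NU):
    # textbook closed form of the "t" density
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + z * z / nu) ** (-(nu + 1) / 2)
```

The numerically integrated ratio density and the closed-form “t” density agree to within the quadrature error, which is the content of the derivation referenced above.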

So we see that while the basic concepts of statistics are simple, the problem of uncertainty is complex.


ASPHALT QUALITY CONTROL

Means and Standard Deviations as Lengths

When we talk about quality control we hear about distributions such as the Poisson, hypergeometric, binomial, normal, “t”, chi-squared and “F”. How complicated! And we are told to worry about things being independent, and are inundated with words like variance, mean, median, mode, standard deviation, whether the standard deviation is homoscedastic or heteroscedastic (that is, whether it is constant or not), confidence limits, and such things as Type I error, Type II error, the null hypothesis, etc. It cannot be denied that all of these have their place. However, to get to the basics, all we are really trying to do is measure lengths. Statistics is really just analytic geometry or linear algebra, depending on one’s outlook. Let’s look at the mean and standard deviation.

Mean (one type of average). We are told that it is the first moment around the origin.

Mathematically it is the integral of xf(x)dx between some limits, where f(x) is some distribution function. Yet it is still a length.

Consider a set of “n” data points, X = (x1, x2, …, xn). Then visualize an n-dimensional graph with a single point, X, representing those data. Also visualize a line in that n-dimensional space that is equidistant from each axis, i.e., it passes through (1, 1, …, 1). Drop a perpendicular from X to that equidistant line, and call the foot of that perpendicular M = (µ, µ, …, µ). Divide every coordinate by the square root of n, the number of data points, to bring the number of tests into our considerations.

The line (δ) from X to M would be the vector (x1 – µ, x2 – µ, …, xn – µ), while the line (µ) from the origin to M would be the vector (µ, µ, …, µ). Since the two lines are perpendicular, their scalar (or inner, or dot) product is zero:

(µ, µ, …, µ) · (x1 – µ, x2 – µ, …, xn – µ) = 0

µ(x1 + x2 + … + xn – nµ) = 0

x1 + x2 + … + xn – nµ = 0

µ = (x1 + x2 + … + xn)/n, which is identical to the formula for the mean.

That is, the length of the line µ from the origin to M is equal in value to the mean of the data points.
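The orthogonality argument above can be checked directly in a few lines of code. The data values here are made up for illustration; the residual vector X – M is perpendicular to the equidistant line, and the foot of the perpendicular recovers the arithmetic mean:

```python
data = [3.2, 4.1, 2.7, 3.9, 3.6]  # illustrative measurements, not from the post
n = len(data)

# each coordinate of M, the foot of the perpendicular, equals the mean
mu = sum(data) / n

# dot product of (mu, mu, ..., mu) with (x1 - mu, ..., xn - mu):
# zero (up to floating-point rounding), confirming perpendicularity
dot = sum(mu * (x - mu) for x in data)
```

That the dot product vanishes for any data set is exactly the derivation above: it forces µ to be the arithmetic mean.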

Standard Deviation. The length of the line, δ, from X to M is the square root of (1/n)(x1² + x2² + … + xn² – nµ²). Here (1/n)(x1² + x2² + … + xn²) is the square of the length of the line from the origin to the data point X, while (1/n)(nµ²) is the square of the length from the origin to the point M.

δ = ((1/n)(x1² + x2² + … + xn² – nµ²))^0.5

Thus the equation for the length of the line δ is identical to one of the equations used for calculating standard deviations (where the standard deviation is not a random variable; if the sample standard deviation, s, is a random variable, 1/n is replaced with 1/(n – 1)).
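Again the geometry can be checked numerically. A small sketch with made-up data, comparing the length formula above against the direct sum-of-squared-deviations form of the standard deviation:

```python
import math

data = [3.2, 4.1, 2.7, 3.9, 3.6]  # illustrative values, not from the post
n = len(data)
mu = sum(data) / n

# length formula from the geometry: ((1/n)(sum of squares - n*mu^2))^0.5
delta = math.sqrt((sum(x * x for x in data) - n * mu * mu) / n)

# the usual direct form: root mean squared deviation from the mean
direct = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
```

The two agree, since expanding (xi – µ)² and summing gives exactly the sum of squares minus nµ² used in the length formula.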

Rulers. To measure lengths we need a ruler. We use miles in the United States, Canada uses kilometers, while in Russia the verst may be used. In statistics the ruler is the length “δ” if the standard deviation is known, or “s” if the standard deviation is a random variable.

The many terms mentioned above and the sophistication of the mathematics are important in establishing the reliability of the data; still, basically we are only measuring lengths.