Many of the specifications now in use by Federal agencies and the states are based upon the laws of probability, not absolute limits. They are called *Statistical Specifications*, and they use the laws of probability to assess specification compliance. They are quite mathematical and not very well understood, even by many of those who use them. They must be discussed because they are no longer *the wave of the future* but are in use throughout the country. Small details can cause specification noncompliance even when the product furnished is of excellent quality. A lack of understanding of the laws of probability among those who write specifications has caused many of the problems that result in penalties.

**What does statistical mean?**

At first, one may say “well, he must be talking about gathering lots and lots of data.” That is not what is meant by statistical, although data is needed to develop a sound statistically based specification. The word “statistical” and the word “probability” are closely related. The introduction of statistics is the introduction of the theory of probability, and it concerns chance. Some of the top statisticians in the country are undoubtedly working in Las Vegas, Atlantic City and many Indian casinos to make sure that the games offered will assure that those who play them will, on the average, lose. The same laws that govern gambling govern statistical specifications, including, with some specifications, assurance that the contractor will not, on the average, get full pay.

The specifications generally are based upon the assurance that, for a single pay item with a maximum pay factor of 100%, the contractor who is actually within specification 100% of the time would, on the average, get paid 95%. Thus any specification that has a maximum pay factor of 100% has a penalty built into it, which is, by the way, calculable. To overcome this problem, many specifications offer a possible “bonus” of an extra 5%. However, this is really not a bonus but a way to make the specification fair.

**Precision and Accuracy**

Before I go any further, I need to clear up a misunderstanding. The words *Precision* and *Accuracy* are often used interchangeably; however, their meanings are quite different. Accuracy refers to how close the average of the test values is to the actual or true value. Precision, on the other hand, refers to how close repeated tests on the same sample are to each other. It is therefore possible to have accurate but imprecise test data, or to have very precise but inaccurate data. As an example, the data may be scattered and, when plotted out, would resemble a shotgun pattern, but the mean of the data would be close to the true value. The data is therefore accurate but imprecise (however, such imprecise data requires many tests to obtain accuracy). We can also have a narrow shotgun pattern centered on the true value, which would give us a group of data that is both accurate and precise.

On the other hand, a piece of equipment might reproduce its own results very closely but not give an accurate answer. It’s like shooting a very narrow pattern at a target, but with the pattern way up in one of the corners, such as would happen if the sight of the rifle is off. While the rifle is shooting with precision, it is not accurate. An example of how this can happen in a laboratory would be a thermometer with a split mercury column. The thermometer might be reading, say, 10° high. If that thermometer were used to control the viscosity bath, all of the viscosities would be reported very high. If we ran those viscosities over and over, we would get very close to the same answer, as the precision of the viscosity test is quite good, but the answer would be wrong in all cases. The method would be very precise but not accurate.
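The distinction can be put into numbers: accuracy shows up as the offset of the average from the true value, and precision as the scatter of repeated results. The sketch below is only an illustration. The true value and both data sets are hypothetical, made-up figures, and the function names `bias` and `spread` are my own labels, not standard test-method terms.

```python
from statistics import mean, stdev

TRUE_VALUE = 5.0  # hypothetical true % asphalt, for illustration only

# Hypothetical, made-up results chosen only to show the two cases.
scattered_but_centered = [4.6, 5.4, 4.8, 5.3, 4.9, 5.0]  # wide "shotgun pattern"
tight_but_offset = [5.48, 5.52, 5.50, 5.49, 5.51, 5.50]  # narrow pattern, sight is off


def bias(results, true_value):
    """Accuracy: how far the average of the results sits from the true value."""
    return mean(results) - true_value


def spread(results):
    """Precision: sample standard deviation of the repeated results."""
    return stdev(results)


for name, data in [("scattered", scattered_but_centered), ("offset", tight_but_offset)]:
    print(f"{name}: bias = {bias(data, TRUE_VALUE):+.2f}, spread = {spread(data):.3f}")
```

The first set has essentially no bias but a large spread (accurate, imprecise); the second has a very small spread but a bias of about half a percent (precise, inaccurate).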

What can the specifying engineer do if he wishes to be assured that the actual values lie within a narrower range than can be justified by the confidence limits for a single test? That is quite easy. Run duplicate or higher replicate tests and use the average of the replicates. The standard deviation of the average of duplicates on % asphalt is 0.20, rather than the 0.28 of a single test, while the standard deviation for quadruplicates is 0.14. One way replication has been achieved without increasing the testing load is to use running averages of, say, the last five tests. The other way is to use lot sizes of 3-7 samples and use the lot average and range of data.
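The figures quoted for replicates follow from the usual rule that the standard deviation of an average of n independent tests is the single-test standard deviation divided by the square root of n. Here is a minimal sketch, assuming that 0.28 is the single-test standard deviation for % asphalt and that the replicate tests are independent; the function name is illustrative only.

```python
import math

SINGLE_TEST_SD = 0.28  # assumed single-test standard deviation for % asphalt


def sd_of_average(single_test_sd: float, n_replicates: int) -> float:
    """Standard deviation of the average of n independent replicate tests."""
    return single_test_sd / math.sqrt(n_replicates)


for n in (1, 2, 4, 5):
    print(f"{n} test(s): standard deviation of the average = "
          f"{sd_of_average(SINGLE_TEST_SD, n):.2f}")
```

Duplicates give about 0.20 and quadruplicates 0.14, matching the values in the text. The same rule applies to a running average of the last five tests, as long as those tests can be treated as independent.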

In the early days, specifications were written with a number of different criteria, with the pay factor set by the lowest value. This gets expensive. If there are two criteria, each with a 5% chance of being found out of specification compliance when the material was actually in, there is about a 10% chance that one or the other will be found out of compliance even though the material was actually in specification on both criteria. The pay would be 90%. If there were 5 criteria, the pay would be only 77%. I was on one project in which I calculated that there was only one chance out of $10^{15}$ that the contractor could get full pay on all lots. That case was settled in favor of the contractor just prior to the trial.
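The 90% and 77% figures can be reproduced by multiplying the chances together. Here is a minimal sketch, assuming the criteria are independent and that each has the same 5% chance of falsely showing noncompliance; the function name is illustrative, and the result is the chance of passing every criterion, which the discussion above treats as the average pay.

```python
FALSE_FAIL_CHANCE = 0.05  # chance a criterion reads "out" when the material is "in"


def chance_all_criteria_pass(false_fail_chance: float, n_criteria: int) -> float:
    """Chance that every one of n independent criteria correctly reads 'in'."""
    return (1.0 - false_fail_chance) ** n_criteria


for n in (1, 2, 5):
    p = chance_all_criteria_pass(FALSE_FAIL_CHANCE, n)
    print(f"{n} criteria: {p:.1%} chance of passing all of them")
```

With two criteria the chance is about 90%, and with five it drops to about 77%. Each added criterion multiplies the chance by another 0.95, which is why stacking criteria gets expensive so quickly.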

**When limits are not really limits**

Before the introduction of statistical specifications, if the specification for, say, % asphalt was 4.5-5.5, an average of 4.4 would be fine. Now 4.5 and 5.5 would be the lower and upper limits for calculating pay factors. An average of 4.4 from a lot size of 4 would have a pay factor of 92% (assuming a standard deviation of 0.28).

These are just brief comments on a complicated but important area of mathematics.

Robert L. Dunning www.petroleumsciences.com, chemistdunning@gmail.com