This file is part of a program based on the Bio 4835 Biostatistics class taught at
Daniel, W. W. 1999. Biostatistics: a foundation for analysis in the health sciences.
The file follows this text very closely and readers are encouraged to consult the text for further information.
Confidence interval for the difference between two population means
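From each of two populations an independent random sample is drawn, and the sample means $\bar{x}_1$ and $\bar{x}_2$ are calculated. The difference $\bar{x}_1 - \bar{x}_2$ is an unbiased estimator of the difference between the two population means, $\mu_1 - \mu_2$. With sample sizes $n_1$ and $n_2$ and population variances $\sigma_1^2$ and $\sigma_2^2$, the variance of the estimator is

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$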
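As a quick numerical check of the two claims above (unbiasedness and the variance formula), here is a minimal Python simulation sketch. The population means, standard deviations, sample sizes, number of repetitions, and the helper name `simulate_mean_differences` are hypothetical choices for illustration only; just the standard library is used.

```python
import random
import statistics

def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Return `reps` simulated values of xbar1 - xbar2 from two normal populations."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

# Hypothetical population parameters, chosen for illustration only.
mu1, sigma1, n1 = 10.0, 2.0, 12
mu2, sigma2, n2 = 8.0, 3.0, 15

diffs = simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000)
theoretical_var = sigma1**2 / n1 + sigma2**2 / n2  # 4/12 + 9/15 ≈ 0.9333

print(f"mean of differences:     {statistics.fmean(diffs):.3f}  (mu1 - mu2 = {mu1 - mu2})")
print(f"variance of differences: {statistics.variance(diffs):.3f}  (formula: {theoretical_var:.3f})")
```

The simulated mean should sit close to $\mu_1 - \mu_2$ and the simulated variance close to $\sigma_1^2/n_1 + \sigma_2^2/n_2$; the agreement improves as `reps` grows.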
Conditions for use
Assuming the populations are normally distributed, there are three situations in which we would determine the $100(1 - \alpha)$ percent confidence interval for $\mu_1 - \mu_2$:
a) where the population variances are known (use z)
b) where the population variances are unknown but equal (use t)
c) where the population variances are unknown but unequal (use t'). The t' statistic has gained some acceptance for cases such as these.
The concept of t' is mentioned here so that readers are aware of its existence, but it will not be treated further in this narrative.
Situation a) Population variances are known (z is used)
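When the population variances are known, the $100(1 - \alpha)$ percent confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$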
A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex was found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95 percent confidence interval for $\mu_1 - \mu_2$.
$n_1 = 12$, $\bar{x}_1 = 4.5$, $\sigma_1^2 = 1$
$n_2 = 15$, $\bar{x}_2 = 3.4$, $\sigma_2^2 = 1.5$
$\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$
$1.1 \pm 1.96\sqrt{\frac{1}{12} + \frac{1.5}{15}}$
$1.1 \pm 1.96\,(0.4282)$
$1.1 \pm 0.8392$
$(0.26,\ 1.94)$
Discussion: For a 95 percent z-interval, $z_{1-\alpha/2} = z_{0.975} = 1.96$. We interpret this interval as follows: the difference between the two population means is estimated to be 1.1, and we are 95% confident that the true difference $\mu_1 - \mu_2$ lies between 0.26 and 1.94.
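The calculation in this example can be reproduced with a short Python sketch. The function name `z_interval_two_means` is hypothetical; it takes the known population variances and obtains the z value from the standard library's `statistics.NormalDist` instead of a table, so its bounds match the hand calculation after rounding.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_two_means(xbar1, var1, n1, xbar2, var2, n2, confidence=0.95):
    """CI for mu1 - mu2 when both population variances are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95 percent
    se = sqrt(var1 / n1 + var2 / n2)         # standard error of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se

# Down's syndrome example: sigma1^2 = 1 (n1 = 12), sigma2^2 = 1.5 (n2 = 15)
lower, upper = z_interval_two_means(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"({lower:.2f}, {upper:.2f})")  # (0.26, 1.94)
```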
Situation b) Population variances are unknown but can be assumed to be equal (t is used)
If it can be assumed that the population variances are equal, then each sample variance is a point estimate of the same quantity, the common variance $\sigma^2$. Therefore, we can combine the two sample variances to form a pooled estimate.
The pooled estimate of the common variance is a weighted average of the sample variances, in which each sample variance is weighted by its degrees of freedom, $n_i - 1$.
Pooled estimate of the variance
The pooled estimate of the variance comes from the formula:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
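A minimal Python sketch of the pooled-variance formula follows. The function name `pooled_variance` is hypothetical, and it assumes its inputs are sample standard deviations, which it squares before weighting. It is called with the summary values from the example later in this section (4.9 with $n_1 = 13$ and 5.6 with $n_2 = 17$), read as sample standard deviations.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of a common variance from two sample standard deviations."""
    # Each sample variance s_i**2 is weighted by its degrees of freedom, n_i - 1,
    # and the total is divided by the combined degrees of freedom, n1 + n2 - 2.
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

sp2 = pooled_variance(4.9, 13, 5.6, 17)
print(round(sp2, 2))  # 28.21
```

When the two samples have the same size, the weights are equal and $s_p^2$ reduces to the simple average of $s_1^2$ and $s_2^2$.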
Standard error of the estimate
The standard error of the estimate is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
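The $100(1 - \alpha)$ percent confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

where $t_{1-\alpha/2}$ has $n_1 + n_2 - 2$ degrees of freedom.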
$n_1 = 13$, $\bar{x}_1 = 21.0$, $s_1 = 4.9$
$n_2 = 17$, $\bar{x}_2 = 12.1$, $s_2 = 5.6$
$\bar{x}_1 - \bar{x}_2 = 21.0 - 12.1 = 8.9$
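$s_p^2 = \frac{12(4.9)^2 + 16(5.6)^2}{13 + 17 - 2} = \frac{789.88}{28} = 28.21$
$8.9 \pm 2.0484\sqrt{\frac{28.21}{13} + \frac{28.21}{17}}$
$8.9 \pm 2.0484\,(1.9569)$
$8.9 \pm 4.0085$
$(4.89,\ 12.91)$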
Discussion: With $n_1 + n_2 - 2 = 13 + 17 - 2 = 28$ degrees of freedom, the correct value of t for a 95% confidence interval is $t_{0.975} = 2.0484$. We interpret this interval as follows: the difference between the two population means is estimated to be 8.9, and we are 95% confident that the true difference $\mu_1 - \mu_2$ lies between 4.89 and 12.91.
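The whole procedure for Situation b) can be collected into one Python sketch. The function name `t_interval_two_means` and its return layout are hypothetical. Because the Python standard library has no t distribution, the critical value is passed in as `t_crit` (here 2.0484, the table value for 28 degrees of freedom); if SciPy is installed, `scipy.stats.t.ppf(0.975, 28)` gives the same value to four decimals.

```python
from math import sqrt

def t_interval_two_means(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal but unknown population variances.

    s1 and s2 are sample standard deviations; t_crit is the t value for the
    desired confidence level with n1 + n2 - 2 degrees of freedom.
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 / n1 + sp2 / n2)                               # standard error
    diff = xbar1 - xbar2
    margin = t_crit * se
    return diff, se, margin, (diff - margin, diff + margin)

# Example values: 13 + 17 - 2 = 28 degrees of freedom, t = 2.0484 for 95 percent
diff, se, margin, (lower, upper) = t_interval_two_means(
    21.0, 4.9, 13, 12.1, 5.6, 17, t_crit=2.0484
)
print(f"{diff:.1f} ± {margin:.4f} -> ({lower:.2f}, {upper:.2f})")  # 8.9 ± 4.0085 -> (4.89, 12.91)
```

Passing the critical value in, rather than computing it, keeps the sketch dependency-free and mirrors the hand method of reading t from a table.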