Confidence Interval for a Population Mean


This file is part of a program based on the Bio 4835 Biostatistics class taught at Kean University in Union, New Jersey.  The course uses the following text:
Daniel, W. W. 1999.  Biostatistics: a foundation for analysis in the health sciences.  New York: John Wiley and Sons.  
The file follows this text very closely and readers are encouraged to consult the text for further information.

A) Confidence interval for a population mean

Estimating the mean

Estimating the mean of a normally distributed population entails drawing a sample of size n and computing x-bar which is used as a point estimate of mu.

It is more meaningful to estimate mu by an interval that communicates information regarding the probable magnitude of mu.

Sampling distributions and estimation

Interval estimates are based on sampling distributions.  When the sample mean is used as an estimator of a population mean, and the population is normally distributed, the sample mean is normally distributed with mean, mu sub x-bar, equal to the population mean, mu, and variance, sigma^2 sub x-bar, equal to sigma^2 / n.

The 95% confidence interval

Approximately 95% of the values of x-bar making up the sampling distribution will lie within 2 standard deviations of the mean.  The interval is bounded by the two points mu - 2 sigma x-bar and mu + 2 sigma x-bar, so that approximately 95% of the values are in the interval mu +- 2 sigma x-bar.

Since mu and mu sub x-bar are unknown, the location of the distribution is uncertain.  We can use x-bar as a point estimate of mu.  If we construct intervals of the form x-bar +- 2 sigma x-bar from repeated samples, approximately 95% of these intervals will contain mu.

[Figure: 95% confidence interval]
Example

Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of x-bar = 22.  Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45.  We wish to estimate mu.

Solution

An approximate 95% confidence interval for mu is given by:

                x-bar +- 2 sigma x-bar = 22 +- 2 sqrt(45/10)
                                       = 22 +- 2 (2.1213)
                                       = (17.76, 26.24)
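
The arithmetic in this example can be checked with a few lines of Python.  This is only a sketch: the function name approx_95_interval is made up for illustration, and it uses the approximate reliability coefficient of 2 from the discussion above rather than an exact z value.

```python
import math

def approx_95_interval(sample_mean, variance, n):
    """Return (lower, upper) for x-bar +- 2 * sigma / sqrt(n)."""
    standard_error = math.sqrt(variance / n)   # sigma x-bar
    margin = 2 * standard_error                # approximate 95% margin
    return sample_mean - margin, sample_mean + margin

# Enzyme example: x-bar = 22, variance = 45, n = 10
lower, upper = approx_95_interval(22, 45, 10)
print(round(lower, 2), round(upper, 2))        # 17.76 26.24
```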

Components of an interval estimate

This is the general form for an interval estimate.

        estimator +- (reliability coefficient) x (standard error)

The general form for an interval estimate consists of three components.  These are known as the estimator, the reliability coefficient, and the standard error.

Estimator: The interval estimate of mu is centered on the point estimate of mu.  As noted in the table above, x-bar is an unbiased point estimator for mu.

Reliability coefficient: Approximately 95% of the values of the standard normal curve lie within 2 standard deviations of the mean.  The z score in this case is called the reliability coefficient.  We use a value of z that will give the correct interval size.  The proper z score depends on the value of alpha being used.  The three values of alpha most commonly used are .01, .05 and .10.  Their corresponding z scores are 2.575, 1.96 and 1.645, respectively, as shown in the table below.

Table of reliability coefficients

        Confidence level    alpha    z
        90%                 .10      1.645
        95%                 .05      1.96
        99%                 .01      2.575
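
The reliability coefficients can also be computed instead of looked up.  The sketch below uses NormalDist from Python's standard statistics module; the helper name reliability_coefficient is an assumption made for illustration.  It returns the z value that leaves alpha/2 in each tail of the standard normal curve.

```python
from statistics import NormalDist

def reliability_coefficient(alpha):
    """z value that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = reliability_coefficient(alpha)
    print(f"alpha = {alpha:.2f}   z = {z:.4f}")
```

The computed values (about 1.6449, 1.9600 and 2.5758) carry more decimal places than the rounded values 1.645, 1.96 and 2.575 used in tables.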

Standard error:  The standard error of the mean equals sigma x-bar = sigma / sqrt(n).

Interpretation of confidence intervals

The interval estimate for mu is expressed as:  x-bar +- z(1-alpha/2) sigma x-bar

Assuming that we are using a value of alpha=.05, we can say that, in repeated sampling, 95% of the intervals constructed this way will include mu.  This is based on the probability of occurrence of different values of x-bar.

The area under the curve of x-bar that lies outside the interval is called alpha, and the area inside the interval is called 1-alpha.

Interpretation of the interval

There are two ways in which interval estimates can be interpreted.  These are known as the probabilistic interpretation and the practical interpretation.  

The probabilistic interpretation results from repeated sampling.  With repeated sampling from a normally distributed population with a known standard deviation, 100(1-alpha) percent of all intervals of the form x-bar +- z(1-alpha/2) sigma x-bar will, in the long run, include the population mean, mu.  The quantity 1-alpha is called the confidence coefficient or confidence level, and the interval x-bar +- z(1-alpha/2) sigma x-bar is called the confidence interval for mu.

Note that the percentage of intervals involved depends on the value of alpha.  With modern electronic devices such as the TI-83 calculator and Microsoft Excel, it is possible to use any value of alpha.  When statistics was developing during the 20th century, such devices were not generally available, so one had to use tables.  These tables were very difficult to prepare, and so only a few values of alpha were supported.  The most commonly used values of alpha are .01, .05, and .10.  When these are used in the formula 100(1-alpha), they yield percentages of 99%, 95%, and 90%, respectively.  The most widely used value for a confidence level is 95%, which corresponds to alpha = .05.  Using this figure, the probabilistic interpretation says that out of 100 intervals constructed from repeated samples, about 95 should include mu.  For situations in which there is neither time nor ability to do 100 samplings, the practical interpretation is used.

The practical interpretation of the interval is used for a single sampling.  When sampling is from a normally distributed population with known standard deviation, we are 100(1-alpha) percent confident that the single computed interval, x-bar +- z(1-alpha/2) sigma x-bar, contains the population mean, mu.
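
The probabilistic interpretation can be illustrated by simulating repeated sampling.  The sketch below is an illustration only: the function name coverage, the number of trials, and the choice of mu = 22 as the "true" mean are assumptions made for the demonstration (sigma and n are borrowed from the enzyme example).  It draws many samples from a normal population, builds an interval from each, and reports the fraction of intervals that contain mu.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated intervals x-bar +- z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)                      # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)        # reliability coefficient
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 22, variance = 45, samples of size 10
print(coverage(mu=22, sigma=math.sqrt(45), n=10, alpha=0.05, trials=10_000))
```

With alpha = .05 the printed fraction should be close to .95, and it moves closer as the number of trials grows.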

Precision

The precision of the estimate indicates how far the limits of the interval lie from the point estimate.  It is found by multiplying the reliability coefficient by the standard error of the mean.  This quantity is also called the margin of error.

Exercise 6.2.2

We wish to estimate the mean serum indirect bilirubin level of 4-day-old infants.  The mean for a sample of 16 infants was found to be 5.98 mg/dl.  Assuming bilirubin levels in 4-day-old infants are approximately normally distributed with a standard deviation of 3.5 mg/dl, find:
    A) The 90% confidence interval for mu
    B) The 95% confidence interval for mu
    C) The 99% confidence interval for mu

(1) Given
        x-bar = 5.98
        sigma = 3.5
        n = 16
(2) Sketch
        [Figure: normal curve]
(3) Calculations

We start with the formula for an interval estimate, then substitute the values given in the problem.

        x-bar +- z sigma / sqrt(n) = 5.98 +- z (3.5 / sqrt(16))
                                   = 5.98 +- z (.875)

Then we need to determine the values of the reliability coefficient that will be used in solving the three parts of the problem.  We consult the Table of Reliability Coefficients above.  The correct value of the reliability coefficient is multiplied by the standard error (.875).  The resulting value is subtracted from, then added to, the value of x-bar to give the boundaries of the interval estimate.

    A)  90% interval (z = 1.645)

                        5.98 +- 1.645 (.875)

           5.98 - 1.439375, 5.98 + 1.439375
                      (4.5406, 7.4194)
            Interpretation:  We estimate the population mean to be 5.98.  We are 90% confident that the true value of the mean lies between 4.5406 and 7.4194.  (A calculator using the unrounded value of z gives 4.5408 and 7.4192.)

    B)  95% interval (z = 1.96)

                    5.98 +- 1.96 (.875)
                    (4.265, 7.695)
            Interpretation:  We estimate the population mean to be 5.98.  We are 95% confident that the true value of the mean lies between 4.265 and 7.695.

    C)  99% interval (z = 2.575)

                    5.98 +- 2.575 (.875)
                    (3.7269, 8.2331)
            Interpretation:  We estimate the population mean to be 5.98.  We are 99% confident that the true value of the mean lies between 3.7269 and 8.2331.  (A calculator using the unrounded value of z gives 3.7261 and 8.2339.)

(4) Results

            A higher confidence level gives a wider interval.  There is less chance that the interval misses mu, but the estimate is less precise.
            Calculator answers are slightly more accurate because the calculator computes z from the normal distribution itself rather than using the rounded table values.
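
The three parts of this exercise differ only in the reliability coefficient, so they can be computed in a loop.  The following Python sketch assumes the rounded table values of z; the function name z_interval is chosen for illustration.

```python
import math

def z_interval(sample_mean, sigma, n, z):
    """Return (lower, upper) for x-bar +- z * sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)     # 3.5 / sqrt(16) = .875 here
    margin = z * standard_error               # precision (margin of error)
    return sample_mean - margin, sample_mean + margin

# Bilirubin exercise: x-bar = 5.98, sigma = 3.5, n = 16
for level, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.575)):
    lower, upper = z_interval(5.98, 3.5, 16, z)
    print(f"{level}: ({lower:.4f}, {upper:.4f})")
```

Replacing the rounded coefficients with unrounded ones changes the limits only in the third or fourth decimal place.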


The t distribution

In most real-life situations the variance of the population is unknown.  We know that the z score, z = (x-bar - mu) / (sigma / sqrt(n)), is normally distributed if the population is normally distributed and is approximately normally distributed when the sample is large.  However, it cannot be used when sigma is unknown.

Estimation of the standard deviation

The sample standard deviation, s = sqrt( sum of (x - x-bar)^2 / (n - 1) ), can be used to replace sigma.  If n is greater than or equal to 30, then s is a good approximation of sigma.  An alternative procedure, based on Student's t distribution, is used when the samples are small.

Student's t distribution

Student's t distribution is used as an alternative for z with small samples.  It uses the following formula:

        t = (x-bar - mu) / (s / sqrt(n))

Properties of the t distribution

1.  Mean = 0
2.  It is symmetrical about the mean.
3.  Variance is greater than 1 but approaches 1 as the sample size becomes large.  For df > 2, the variance = df/(df-2) or, since df = n-1, (n-1)/(n-3).
4.  The range is -infinity to +infinity.
5.  t is really a family of distributions because there is a different distribution for each value of n-1, the divisor used in computing the sample variance.
6.  Compared with the normal distribution, t is less peaked and has higher tails.
7.  The t distribution approaches the normal distribution as n-1 approaches infinity.
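
Property 3 can be seen numerically.  This short Python sketch (the function name t_variance is an assumption for illustration) evaluates df/(df-2) for increasing degrees of freedom and shows the variance shrinking toward 1, the variance of the standard normal distribution.

```python
def t_variance(df):
    """Variance of Student's t distribution; the formula requires df > 2."""
    if df <= 2:
        raise ValueError("the variance formula df/(df-2) requires df > 2")
    return df / (df - 2)

for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_variance(df), 4))
```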


Confidence interval for a mean using t

When sampling is from a normal distribution whose standard deviation, sigma, is unknown, the
100(1-alpha) percent confidence interval for the population mean, mu, is given by:

        x-bar +- t(1-alpha/2) s / sqrt(n)


Deciding between z and t

When constructing a confidence interval for a population mean, we must decide whether to use z or t.  Which one to use depends on the size of the sample, whether the population is normally distributed or not, and whether or not the variance is known.  There are various flowcharts and decision keys that can be used to help decide.  Mine appears below.


Key for deciding between z and t in confidence interval construction
 
1.    Population is normally distributed............2
       Population is not normally distributed.......5
 
2.    Sample size is large (30 or higher)............3
       Sample size is small (less than 30)............4
 
3.    Population variance is known.............use z
       Population variance not known.... use t (or z)
 
4.    Population variance is known.............use z
       Population variance is not known.......use t
 
5.    Sample size is large..................................6
       Sample size is small..................................7
 
6.    Population variance is known.............use z
       Population variance not known
       (central limit theorem applies)............use z
 
7.    Must use a non-parametric method
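
The key can be written as a small function.  The following Python sketch is one possible encoding; the function name choose_statistic, its parameter names, and the returned strings are assumptions made for illustration, with the large-sample threshold of 30 taken from step 2 of the key.

```python
def choose_statistic(population_normal, n, variance_known, large_sample=30):
    """Follow the key: return which statistic to use for the interval."""
    large = n >= large_sample
    if population_normal:
        if variance_known:
            return "z"                          # steps 3 and 4, variance known
        return "t (or z)" if large else "t"     # steps 3 and 4, variance unknown
    if large:
        return "z"                              # step 6, central limit theorem applies
    return "non-parametric method"              # step 7

# Small sample, normal population, variance unknown
print(choose_statistic(population_normal=True, n=10, variance_known=False))  # t
```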

Example

In a study of preeclampsia, Kaminski and Rechberger found the mean systolic blood pressure of 10 healthy, nonpregnant women to be 119 with a standard deviation of 2.1.

(Preeclampsia:  Development of hypertension, albuminuria, or edema between the 20th week of pregnancy and the first week postpartum.
Eclampsia:  Coma and/or convulsive seizures in the same time period, without other etiology.)

a.  What is the estimated standard error of the mean?
b.  Construct the 99% confidence interval for the mean of the population from which the 10 subjects may be presumed to be a random sample.
c.  What is the precision of the estimate?
d.  What assumptions are necessary for the validity of the confidence interval you constructed?

(1) Given
        n = 10
        x-bar = 119
        s = 2.1

(2) Sketch of t distribution
        [Figure: t distribution]

(3) Calculations

 standard error = s / sqrt(n) = 2.1 / sqrt(10)
                         = .6640783086

99% confidence interval  

(The correct value of t for a 99% confidence interval with 9 degrees of freedom is 3.2498)

              x-bar +- t s / sqrt(n)

            119 +- 3.2498 (.66407...)

                (116.84, 121.16)

Precision = 3.2498 (.66407...)
               = 2.158121687

 Assumptions

  • The population is normally distributed
  • The 10 subjects represent a random sample from this population
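
The calculations in this example can be reproduced with the Python sketch below.  The function name t_interval is an assumption for illustration, and the t value of 3.2498 is entered by hand from the text above because Python's standard library does not provide a t distribution.

```python
import math

def t_interval(sample_mean, s, n, t_value):
    """Return the standard error, the precision, and the interval x-bar +- t * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    precision = t_value * standard_error
    return standard_error, precision, (sample_mean - precision, sample_mean + precision)

# Blood pressure example: x-bar = 119, s = 2.1, n = 10, t = 3.2498 (9 degrees of freedom)
se, precision, (lower, upper) = t_interval(119, 2.1, 10, 3.2498)
print(f"standard error = {se:.4f}")                    # 0.6641
print(f"precision      = {precision:.4f}")             # 2.1581
print(f"99% interval   = ({lower:.2f}, {upper:.2f})")  # (116.84, 121.16)
```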