This file is part of a program based on the Bio 4835 Biostatistics class taught
at

Daniel, W. W. 1999. Biostatistics: a foundation for analysis in the
health sciences.

The file follows this text very closely and readers are encouraged to consult
the text for further information.

**A) Confidence interval for a population mean**

Estimating the mean of a normally distributed population entails drawing a sample of size n and computing $\bar{x}$, which is used as a point estimate of $\mu$.

It is more meaningful to estimate $\mu$ by an interval that communicates information regarding the probable magnitude of $\mu$.

Interval estimates are based on sampling distributions. When the sample mean is being used as an estimator of a population mean, and the population is normally distributed, the sample mean will be normally distributed with mean, $\mu_{\bar{x}}$, equal to the population mean, $\mu$, and variance $\sigma^2_{\bar{x}} = \sigma^2/n$.

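Approximately 95% of the values of $\bar{x}$ making up the distribution will lie within 2 standard deviations of the mean. The interval is bounded by the two points $\mu - 2\sigma_{\bar{x}}$ and $\mu + 2\sigma_{\bar{x}}$, so that 95% of the values are in the interval $\mu \pm 2\sigma_{\bar{x}}$.

Since $\mu$ and $\mu_{\bar{x}}$ are unknown, the location of the distribution is uncertain. We can use $\bar{x}$ as a point estimate of $\mu$. If we construct intervals of the form $\bar{x} \pm 2\sigma_{\bar{x}}$ from repeated samples, approximately 95% of these intervals will contain $\mu$.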

Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of $\bar{x} = 22$. Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate $\mu$.

An approximate 95% confidence interval for $\mu$ is given by:

$$\bar{x} \pm 2\sigma_{\bar{x}}$$

This is the general form for an interval estimate.

estimator ± (reliability coefficient) (standard error)
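The enzyme example above can be worked through with this general form. The following is a minimal sketch in Python, assuming the approximate reliability coefficient of 2 from the "within 2 standard deviations" rule; the variable names are illustrative only.

```python
import math

sample_mean = 22.0   # estimator: sample mean from the enzyme example
variance = 45.0      # known population variance
n = 10               # sample size

standard_error = math.sqrt(variance / n)  # sigma / sqrt(n)
reliability = 2.0                          # approximate 95% coefficient
margin = reliability * standard_error

lower = sample_mean - margin
upper = sample_mean + margin
print(f"standard error = {standard_error:.4f}")
print(f"approximate 95% interval = ({lower:.2f}, {upper:.2f})")
```

With these inputs the standard error is about 2.12 and the interval runs from roughly 17.76 to 26.24.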

The general form for an interval estimate consists of three components. These are known as the estimator, the reliability coefficient, and the standard error.

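Table of reliability coefficients

| Confidence level | $\alpha$ | Reliability coefficient (z) |
|---|---|---|
| 90% | .10 | 1.645 |
| 95% | .05 | 1.96 |
| 99% | .01 | 2.575 |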

The interval estimate for $\mu$ is expressed as:

$$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$$

Assuming that we are using a value of $\alpha = .05$, we can say that, in repeated sampling, 95% of the intervals constructed this way will include $\mu$. This is based on the probability of occurrence of different values of $\bar{x}$.

The area under the curve of $\bar{x}$ that lies outside the interval is called $\alpha$, and the area inside the interval is called $1 - \alpha$.
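Because the reliability coefficient is the z value that leaves a central area of one minus alpha under the standard normal curve, it can be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `reliability_coefficient` is a hypothetical helper, not part of the course material.

```python
from statistics import NormalDist

def reliability_coefficient(alpha):
    """Return z such that the central area under the standard normal is 1 - alpha."""
    # Half of alpha lies in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = reliability_coefficient(alpha)
    print(f"alpha = {alpha:.2f}  confidence = {100 * (1 - alpha):.0f}%  z = {z:.4f}")
```

The unrounded 99% coefficient is about 2.576; the worked exercise in this file uses the rounded table value 2.575.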

There are two ways in which interval estimates can be interpreted. These are known as the probabilistic interpretation and the practical interpretation.

The probabilistic interpretation results from repeated sampling. With repeated sampling from a normally distributed population with a known standard deviation, $100(1 - \alpha)$ percent of all intervals of the form $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will, in the long run, include the population mean, $\mu$. The quantity $1 - \alpha$ is called the confidence coefficient (or confidence level).

Note that the percentage of intervals involved depends on the value of $\alpha$. With modern electronic devices such as the TI-83 calculator and Microsoft Excel, it is possible to use any value of $\alpha$. When statistics was developing during the 20th century, such devices were not generally available, so one had to use tables. These tables were very difficult to prepare, so only a few values of $\alpha$ were supported. The most commonly used values of $\alpha$ are .01, .05, and .10. When these are used in the formula $100(1 - \alpha)$, they yield percentages of 99%, 95%, and 90%, respectively. The most widely used confidence level is 95%, which corresponds to $\alpha = .05$. Using this figure, the probabilistic interpretation says that in 100 samplings, about 95 of the resulting intervals should include $\mu$. For situations in which there is neither time nor ability to do 100 samplings, the practical interpretation is used.
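The probabilistic interpretation can be illustrated by simulation. The sketch below repeatedly draws samples from a normal population and counts how often the interval covers the true mean. The population values (a true mean of 22 and a variance of 45) are assumptions borrowed from the enzyme example purely for illustration, and `coverage` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)                     # seeded for repeatability
    z = NormalDist().inv_cdf(1 - alpha / 2)       # reliability coefficient
    margin = z * sigma / math.sqrt(n)             # z times the standard error
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage(mu=22.0, sigma=math.sqrt(45.0), n=10, alpha=0.05, trials=2000))
```

The printed fraction should be close to .95, though it varies with the seed and the number of trials.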

The practical interpretation of the interval is used for a single sampling. When sampling is from a normally distributed population with known standard deviation, we are $100(1 - \alpha)$ percent confident that the single computed interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, contains the population mean, $\mu$.

Precision indicates how much the values deviate from their mean. Precision is found by multiplying the reliability factor by the standard error of the mean. This is also called the margin of error.

Exercise 6.2.2

We wish to estimate the mean serum indirect bilirubin level of 4-day-old infants. The mean for a sample of 16 infants was found to be 5.98 mg/dl. Assuming bilirubin levels in 4-day-old infants are approximately normally distributed with a standard deviation of 3.5 mg/dl, find:

A) The 90% confidence interval for $\mu$

B) The 95% confidence interval for $\mu$

C) The 99% confidence interval for $\mu$

(1) Given

$\bar{x}$ = 5.98

$\sigma$ = 3.5

n = 16

(2) Sketch

(3) Calculations

We start with the formula for an interval estimate, then substitute the values given in the problem.

$$\bar{x} \pm z_{(1-\alpha/2)}\,\frac{\sigma}{\sqrt{n}} = 5.98 \pm z_{(1-\alpha/2)}\,\frac{3.5}{\sqrt{16}} = 5.98 \pm z_{(1-\alpha/2)}\,(.875)$$

Then we need to determine the values of the reliability coefficient that will be used in solving the three parts of the problem. We consult the Table of Reliability Coefficients above. The appropriate reliability coefficient is multiplied by the standard error (.875). The resulting value is subtracted from and then added to $\bar{x}$ to give the boundaries of the interval estimate.

A) 90% interval (z = 1.645)

5.98 ± 1.645 (.875)

5.98 - 1.439375, 5.98 + 1.439375

(4.5406, 7.4194); a calculator using the unrounded z gives (4.5408, 7.4192)

Interpretation: We estimate the population mean to be 5.98. We are 90% confident that the true value of the mean lies between 4.5408 and 7.4192.

B) 95% interval (z = 1.96)

5.98 ± 1.96 (.875)

(4.265, 7.695)

Interpretation: We estimate the population mean to be 5.98. We are 95% confident that the true value of the mean lies between 4.265 and 7.695.

C) 99% interval (z = 2.575)

5.98 ± 2.575 (.875)

(3.7261, 8.2339) from a calculator using the unrounded z; the table value 2.575 gives (3.7269, 8.2331)

Interpretation: We estimate the population mean to be 5.98. We are 99% confident that the true value of the mean lies between 3.7261 and 8.2339.

(4) Results

A higher confidence level gives a wider interval. There is less chance that the interval misses the population mean, but the estimate is less precise.

Calculator answers can differ slightly from the hand calculations because the calculator uses unrounded values of z (for example, 1.64485 rather than the table value 1.645).
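The three intervals in Exercise 6.2.2 can be reproduced in a few lines. This is a sketch, not the course's own procedure: it computes each interval twice, once with the rounded table coefficients quoted in the exercise and once with the unrounded normal quantile from `statistics.NormalDist`. The helper name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 5.98, 3.5, 16
standard_error = sigma / math.sqrt(n)   # 3.5 / 4 = 0.875

def z_interval(confidence, z=None):
    """Return xbar -/+ z * standard_error; z defaults to the unrounded quantile."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error          # precision (margin of error)
    return xbar - margin, xbar + margin

table_z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.575}  # coefficients used in the exercise
for confidence, z in table_z.items():
    low_t, high_t = z_interval(confidence, z)
    low_e, high_e = z_interval(confidence)
    print(f"{confidence:.0%}: table ({low_t:.4f}, {high_t:.4f})  "
          f"unrounded ({low_e:.4f}, {high_e:.4f})")
```

Comparing the two columns shows that the differences between table and calculator answers appear only in the third or fourth decimal place, while the interval widens steadily as the confidence level rises.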

In most real-life situations the variance of the population is unknown. We know that the z score, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, is normally distributed if the population is normally distributed and is approximately normally distributed when the sample is large. But it cannot be used because $\sigma$ is unknown.

The sample standard deviation, $s = \sqrt{\sum (x_i - \bar{x})^2/(n-1)}$, can be used to replace $\sigma$. If $n \geq 30$, then s is a good approximation of $\sigma$. An alternate procedure is used when the samples are small. It is known as Student's t distribution.

Student's t distribution

Properties of the t distribution

1. Mean = 0

2. It is symmetrical about the mean.

3. Variance is greater than 1 but approaches 1 as the sample gets large. For df > 2, the variance = df/(df-2) or, since df = n-1, (n-1)/(n-3).

4. The range is $-\infty$ to $+\infty$.

5.

6. Compared with the normal distribution, the t distribution is less peaked in the center and has thicker tails.

7.
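Property 3 can be checked numerically. The short sketch below evaluates the variance formula df/(df-2) for increasing degrees of freedom to show how it approaches 1; `t_variance` is a hypothetical helper name.

```python
def t_variance(df):
    """Variance of the t distribution, df / (df - 2), valid only for df > 2."""
    if df <= 2:
        raise ValueError("the formula df / (df - 2) applies only for df > 2")
    return df / (df - 2)

for df in (3, 5, 10, 30, 100, 1000):
    print(f"df = {df:4d}  variance = {t_variance(df):.4f}")
```

The variance is 3 at df = 3 and falls toward 1 as df grows, which matches the t distribution approaching the standard normal for large samples.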

Confidence interval for a mean using t

When sampling is from a normal distribution whose standard deviation, $\sigma$, is unknown, the $100(1 - \alpha)$ percent confidence interval for the population mean, $\mu$, is given by:

$$\bar{x} \pm t_{(1-\alpha/2)}\,\frac{s}{\sqrt{n}}$$

When constructing a confidence interval for a population mean, we must decide whether to use z or t.

Key for deciding between z and t in confidence interval construction

1. Population normally distributed................2

Population not normally distributed.........5

2. Sample size is large (30 or higher)............3

Sample size is small (less than 30)............4

3. Population variance is known.............use z

Population variance not known.............use t

4. Population variance is known.............use z

Population variance is not known.......use t

5. Sample size is large..................................6

Sample size is small..................................7

6. Population variance is known.............use z

Population variance not known

(central limit theorem applies)............use z

7. Must use a non-parametric method
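The key above can be written as a small decision function. This is a sketch under one stated assumption: the entries for an unknown population variance in steps 3 and 4 are truncated in this file and are taken here to read "use t". The function name `choose_method` and its parameters are hypothetical.

```python
def choose_method(normal_population, n, variance_known, large=30):
    """Follow the key and return 'z', 't', or 'non-parametric'."""
    if normal_population:
        # Steps 3 and 4: for a normal population the outcome is the same for
        # large and small samples. ASSUMPTION: unknown variance means 'use t'.
        return "z" if variance_known else "t"
    if n >= large:
        # Step 6: large sample from a non-normal population; the central
        # limit theorem applies, so z is used whether or not variance is known.
        return "z"
    # Step 7: small sample from a non-normal population.
    return "non-parametric"

print(choose_method(normal_population=True, n=10, variance_known=False))
```

For the call shown, a normal population with a small sample and unknown variance, the function follows step 4 of the key.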

Example

In a study of preeclampsia, Kaminski and Rechberger found the mean systolic blood pressure of 10 healthy, nonpregnant women to be 119 with a standard deviation of 2.1.

(Preeclampsia: Development of hypertension, albuminuria, or edema between the 20th week of pregnancy and the first week postpartum.

Eclampsia: Coma and/or convulsive seizures in the same time period, without other etiology.)

a. What is the estimated standard error of the mean?

b. Construct the 99% confidence interval for the mean of the population from which the 10 subjects may be presumed to be a random sample.

c. What is the precision of the estimate?

d. What assumptions are necessary for the validity of the confidence interval you constructed?

(1) Given

n = 10

$\bar{x}$ = 119

s = 2.1

(2) Sketch of the t distribution

(3) Calculations

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{2.1}{\sqrt{10}}$ = .6640783086

99% confidence interval

(The correct value of t for a 99% confidence interval with 9 degrees of freedom is 3.2498)

119 ± 3.2498 (.66407...)

(116.84, 121.16)

Precision = 3.2498 (.66407...)

= 2.158121687

Assumptions

- The population is normally distributed
- The 10 subjects represent a random sample from this population
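The calculations for this example can be reproduced as follows. The sketch takes the t value of 3.2498 quoted above as given rather than computing it, because the Python standard library has no t quantile function; the variable names are illustrative.

```python
import math

n, xbar, s = 10, 119.0, 2.1
t_value = 3.2498   # t for 99% confidence with n - 1 = 9 df, as quoted in the text

standard_error = s / math.sqrt(n)        # a. estimated standard error of the mean
precision = t_value * standard_error     # c. precision (margin of error)
lower = xbar - precision                 # b. lower bound of the 99% interval
upper = xbar + precision                 #    upper bound of the 99% interval

print(f"standard error = {standard_error:.10f}")
print(f"99% interval   = ({lower:.2f}, {upper:.2f})")
print(f"precision      = {precision:.9f}")
```

The interval is narrow despite the small sample because the sample standard deviation of 2.1 is small relative to the mean of 119.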