This file is part of a program based on the Bio 4835 Biostatistics class taught
at

Daniel, W. W. 1999. Biostatistics: a foundation for analysis in the
health sciences.

The file follows this text very closely and readers are encouraged to consult
the text for further information.

**B) Hypothesis testing of the difference between two population means**

This is a two sample z test which is used to determine
if two population means are equal or unequal. There are three
possibilities for formulating hypotheses.

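1. $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \neq \mu_2$

2. $H_0: \mu_1 \geq \mu_2$, $H_A: \mu_1 < \mu_2$

3. $H_0: \mu_1 \leq \mu_2$, $H_A: \mu_1 > \mu_2$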

Procedure

The same procedure is used in three different situations:

- Sampling is from normally distributed populations with known variances

- Sampling is from normally distributed populations where population variances are unknown

  - population variances equal

    The test statistic is *t*, distributed as Student's *t* distribution with ($n_1 + n_2 - 2$) degrees of freedom, and a pooled variance is used.

  - population variances unequal

    When population variances are unequal, a distribution of *t'* is used in a manner similar to calculations of confidence intervals in similar circumstances.

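- Sampling is from populations that are not normally distributed

  If both sample sizes are 30 or larger, the central limit theorem is in effect. The test statistic is

  $$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

  where $(\mu_1 - \mu_2)_0$ is the hypothesized difference between the population means. If the population variances are unknown, the sample variances are used.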

**Sampling from normally distributed populations with population variances known**

Example 7.3.1

Serum uric acid levels

Is there a difference between the mean serum uric acid levels of individuals with Down's syndrome and normal individuals?

(1) Data

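Down's syndrome: $\bar{x}_1 = 4.5$, $n_1 = 12$, $\sigma_1^2 = 1$

Normal: $\bar{x}_2 = 3.4$, $n_2 = 15$, $\sigma_2^2 = 1.5$

$\alpha = .05$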

(2) Assumptions

- two independent random samples
- each drawn from a normally distributed population

(3) Hypotheses

$H_0: \mu_1 = \mu_2$

$H_A: \mu_1 \neq \mu_2$

(4) Test statistic

This is a two sample z test.

(a) Distribution of test statistic

If the assumptions are correct and $H_0$ is true, the
test statistic is distributed as the standard normal distribution.

(b) Decision rule

With $\alpha = .05$, the critical values of z are
-1.96 and +1.96. We reject $H_0$ if z < -1.96 or z > +1.96.

(5) Calculation of test statistic

$$z = \frac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1.5}{15}}} = \frac{1.1}{.4282} = 2.57$$

(6) Statistical decision

Reject $H_0$ because 2.57 > 1.96.

(7) Conclusion

From these data, it can be concluded that the population means are not
equal. A 95% confidence interval would give the same conclusion.

p = .0102.
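The steps of Example 7.3.1 can be checked with a short script. This is a minimal sketch that uses only the Python standard library; the function name `two_sample_z` and its parameter names are illustrative choices, not part of the text. It assumes the three values listed for each group are the sample mean, the sample size, and the known population variance, in that order.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, n1, var1, mean2, n2, var2, hypothesized_diff=0.0):
    """z statistic for the difference between two means, variances known."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Summary values from Example 7.3.1 (assumed order: mean, n, variance).
z = two_sample_z(4.5, 12, 1.0, 3.4, 15, 1.5)

# Two-sided decision rule and p value.
z_crit = standard_normal.inv_cdf(1 - alpha / 2)
p_two_sided = 2 * (1 - standard_normal.cdf(abs(z)))
reject_h0 = abs(z) > z_crit

# 95% confidence interval for the difference between the population means.
standard_error = sqrt(1.0 / 12 + 1.5 / 15)
ci_low = (4.5 - 3.4) - z_crit * standard_error
ci_high = (4.5 - 3.4) + z_crit * standard_error

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p = {p_two_sided:.4f}")
print(f"reject H0: {reject_h0}; 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

If the confidence interval excludes zero, it leads to the same decision as the two-sided test at the same level.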

**Sampling from normally distributed populations with unknown variances**

With equal population variances, we can obtain a pooled value from the sample variances.

Example 7.3.2

Lung destructive index

We wish to know if we may conclude, at the 95% confidence level, that smokers, in general, have greater lung damage than do non-smokers.

(1) Data

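Smokers: $\bar{x}_1 = 17.5$, $n_1 = 16$, $s_1 = 4.4752$

Non-Smokers: $\bar{x}_2 = 12.4$, $n_2 = 9$, $s_2 = 4.8492$

$\alpha = .05$

Calculation of Pooled Variance:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{15(4.4752)^2 + 8(4.8492)^2}{16 + 9 - 2} = 21.2404$$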

(2) Assumptions

- independent random samples
- normal distribution of the populations
- population variances are equal

(3) Hypotheses

$H_0: \mu_1 \leq \mu_2$

$H_A: \mu_1 > \mu_2$

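(4) Test statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$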

(a) Distribution of test statistic

If the assumptions are met and $H_0$ is true, the test statistic is distributed
as Student's *t* distribution with 23 degrees of freedom.

(b) Decision rule

With $\alpha = .05$ and df = 23, the critical value of *t* is 1.7139. We reject $H_0$ if t > 1.7139.

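(5) Calculation of test statistic

Using the pooled variance $s_p^2 = 21.2404$:

$$t = \frac{(17.5 - 12.4) - 0}{\sqrt{\dfrac{21.2404}{16} + \dfrac{21.2404}{9}}} = \frac{5.1}{1.92} = 2.6563$$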

(6) Statistical decision

Reject $H_0$ because 2.6563 > 1.7139.

(7) Conclusion

On the basis of the data, we conclude that $\mu_1 > \mu_2$; that is, smokers, in general, have greater lung damage than non-smokers.

Actual values

t = 2.6558

p = .007 (one-tailed, as these hypotheses require; the corresponding two-tailed value is .014)
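The pooled-variance calculation in Example 7.3.2 can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming the third value listed for each group is the sample standard deviation; the function names are illustrative. Because the Python standard library has no Student's *t* distribution, the critical value 1.7139 is taken from the text rather than computed.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances (weights are n - 1)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """t statistic for two independent samples with equal population variances."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    standard_error = sqrt(sp2 / n1 + sp2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Summary values from Example 7.3.2 (assumed order: mean, n, standard deviation).
sp2 = pooled_variance(4.4752, 16, 4.8492, 9)
t = pooled_t(17.5, 4.4752, 16, 12.4, 4.8492, 9)
degrees_of_freedom = 16 + 9 - 2

t_crit = 1.7139  # one-tailed critical value for alpha = .05, 23 df (from the text)
reject_h0 = t > t_crit

print(f"pooled variance = {sp2:.4f}, df = {degrees_of_freedom}")
print(f"t = {t:.4f}, reject H0: {reject_h0}")
```

Carrying full precision through the calculation gives the "actual" value of *t*; rounding the denominator to 1.92 by hand gives the slightly different value used in the statistical decision.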

**Sampling from populations that are not normally distributed**

Example 7.3.4

These data were obtained in a study comparing persons with disabilities with persons without disabilities. A scale known as the Barriers to Health Promotion Activities for Disabled Persons (BHADP) Scale gave the data. We wish to know if we may conclude, at the 99% confidence level, that persons with disabilities score higher than persons without disabilities.

(1) Data

Disabled: $\bar{x}_1 = 31.83$, $n_1 = 132$, $s_1 = 7.93$

Nondisabled: $\bar{x}_2 = 25.07$, $n_2 = 137$, $s_2 = 4.80$

$\alpha = .01$

(2) Assumptions

- independent random samples

(3) Hypotheses

$H_0: \mu_1 \leq \mu_2$

$H_A: \mu_1 > \mu_2$

(4) Test statistic

Because of the large samples, the central limit theorem permits calculation of
the z score as opposed to using *t*. The z
score is calculated using the given sample standard deviations.

(a) Distribution of test statistic

If the assumptions are correct and $H_0$ is true, the test
statistic is approximately normally distributed.

(b) Decision rule

With $\alpha = .01$ and a one-tailed test, the critical value of z is
2.33. We reject $H_0$ if z > 2.33.

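(5) Calculation of test statistic

$$z = \frac{(31.83 - 25.07) - 0}{\sqrt{\dfrac{(7.93)^2}{132} + \dfrac{(4.80)^2}{137}}} = \frac{6.76}{.8029} = 8.42$$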

(6) Statistical decision

Reject $H_0$ because 8.42 > 2.33.

(7) Conclusion

On the basis of these data, we conclude that, on average, persons with disabilities score higher
on the BHADP scale than persons without disabilities.

Actual values

z = 8.42

$p = 1.91 \times 10^{-17}$
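The large-sample calculation in Example 7.3.4 can be sketched the same way. This is an illustrative script, assuming the third value listed for each group is the sample standard deviation; the function names are not from the text. The upper-tail probability is computed with `math.erfc` because, for a z value this large, `1 - cdf(z)` rounds to zero in floating-point arithmetic.

```python
from math import erfc, sqrt
from statistics import NormalDist


def large_sample_z(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """z statistic using sample standard deviations (large samples, CLT)."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


def upper_tail_p(z):
    """P(Z > z) for a standard normal Z, computed with erfc to keep precision."""
    return 0.5 * erfc(z / sqrt(2))


alpha = 0.01

# Summary values from Example 7.3.4 (assumed order: mean, standard deviation, n).
z = large_sample_z(31.83, 7.93, 132, 25.07, 4.80, 137)

z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
p_one_sided = upper_tail_p(z)
reject_h0 = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print(f"one-tailed p = {p_one_sided:.2e}, reject H0: {reject_h0}")
```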

**Paired comparisons**

Sometimes data come from nonindependent samples. An example might be "before and after" testing of cosmetics or consumer products. We could use a single random sample and do "before and after" tests on each person. A hypothesis test based on these data would be called a paired comparisons test.

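With n pairs of measurements whose differences form a simple random sample from a normally distributed population of differences, the mean of the differences, $\mu_d$, is tested using the following implementation of *t*:

$$t = \frac{\bar{d} - \mu_{d_0}}{s_{\bar{d}}}$$

where $\bar{d}$ is the sample mean difference, $\mu_{d_0}$ is the hypothesized mean difference, and $s_{\bar{d}} = s_d / \sqrt{n}$ is the standard error of the mean difference.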

Example 7.4.1

Very-low-calorie diet (VLCD) Treatment

Table gives B (before) and A (after) treatment data for obese female patients in a weight-loss program.

We calculate $d_i = A - B$ for each pair of data, so a negative value means that the participant lost weight.

We wish to know if we may conclude, at the 95% confidence level, that the treatment is effective in causing weight reduction in these people.

(1) Data

Values of $d_i$ are calculated by subtracting each B from the corresponding A, which gives a negative number when weight was lost. On the TI-83 calculator, place the A data in L1 and the B data in L2. Then make L3 = L1 - L2 and the calculator does each calculation automatically.

In Microsoft Excel, put the A data in column A and the B data in column B, without using column headings, so that the first pair of data is in row 1. In cell C1, enter the following formula: =a1-b1. This calculates the difference, $d_i$, as A - B. Then copy the formula down column C until the rest of the differences are calculated.

n = 9

$\alpha = .05$

(2) Assumptions

- the observed differences are a simple random sample from a normally distributed population of differences

(3) Hypotheses

$H_0: \mu_d \geq 0$

$H_A: \mu_d < 0$ (meaning that the patients lost weight)

(4) Test statistic

The test statistic is *t*, which is calculated as

$$t = \frac{\bar{d} - \mu_{d_0}}{s_d / \sqrt{n}}$$

(a) Distribution of test statistic

The test statistic is distributed as Student's *t* with 8 degrees of freedom.

(b) Decision rule

With $\alpha = .05$ and 8 df, the critical
value of *t* is -1.8595. We reject $H_0$ if t < -1.8595.

(5) Calculation of test statistic

(6) Statistical decision

Reject $H_0$ because -12.7395 < -1.8595.

$p = 6.79 \times 10^{-7}$

(7) Conclusion

On the basis of these data, we conclude that the diet program is effective.
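The paired calculation can be sketched in Python as an alternative to the calculator or spreadsheet steps. The data table for this example is not reproduced here, so the weights below are hypothetical values made up purely for illustration, and the resulting *t* will not match the value in the example. The function name `paired_t` is also illustrative. The critical value -1.8595 is the one given in the text for 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, hypothesized_mean_diff=0.0):
    """Paired t statistic using d_i = after - before for each pair."""
    differences = [a - b for a, b in zip(after, before)]
    d_bar = mean(differences)    # sample mean difference
    s_d = stdev(differences)     # sample standard deviation (n - 1 divisor)
    standard_error = s_d / sqrt(len(differences))
    t = (d_bar - hypothesized_mean_diff) / standard_error
    return t, d_bar, s_d


# HYPOTHETICAL weights for nine participants (not the data from the example).
before = [100.0, 95.0, 110.0, 105.0, 98.0, 102.0, 90.0, 108.0, 97.0]
after = [90.0, 88.0, 99.0, 97.0, 91.0, 95.0, 85.0, 96.0, 90.0]

t, d_bar, s_d = paired_t(before, after)
degrees_of_freedom = len(before) - 1

t_crit = -1.8595  # one-tailed critical value for alpha = .05, 8 df (from the text)
reject_h0 = t < t_crit

print(f"mean difference = {d_bar:.4f}, s_d = {s_d:.4f}, df = {degrees_of_freedom}")
print(f"t = {t:.4f}, reject H0: {reject_h0}")
```

Computing the differences as after minus before keeps the sign convention of the example, so weight loss appears as a negative mean difference and the rejection region is in the lower tail.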

Other considerations

- a confidence interval for $\mu_d$ can be constructed
- z can be used if the variance is known or if the sample is large