This file is part of a program based on the Bio 4835 Biostatistics class taught
at

Daniel, W. W. 1999. Biostatistics: a foundation for analysis in the
health sciences.

The file follows this text very closely and readers are encouraged to consult
the text for further information.

**A) Hypothesis testing of a single population mean**

A hypothesis about a population mean can be tested when sampling is from any of
the following.

- A normally distributed population--variance known

- A normally distributed population--variance unknown

- A population that is not normally distributed (assuming n ≥ 30, so that the central limit theorem applies)
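The three situations above can be summarized as a small decision helper. This is a minimal sketch: the function `choose_test_statistic` and its arguments are hypothetical names, and the n ≥ 30 cutoff is the rule of thumb from the list rather than a sharp boundary.

```python
def choose_test_statistic(variance_known: bool, population_normal: bool, n: int) -> str:
    """Hypothetical helper: which statistic these notes use to test a single mean."""
    if population_normal and variance_known:
        # Normal population, known variance: z = (xbar - mu0) / (sigma / sqrt(n))
        return "z"
    if population_normal:
        # Normal population, unknown variance: t with n - 1 degrees of freedom
        return "t"
    if n >= 30:
        # Not normal, but a large sample: the central limit theorem justifies z
        return "z (central limit theorem)"
    # Not normal and a small sample: none of the three cases above applies
    return "none of these"
```

For the examples that follow, `choose_test_statistic(True, True, 10)` gives `"z"` (Example 7.2.1), `choose_test_statistic(False, True, 14)` gives `"t"` (Example 7.2.3), and `choose_test_statistic(False, False, 242)` gives the large-sample z (Example 7.2.4).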

**p values**

A p value is the probability of obtaining a result as extreme as, or more extreme than, the observed value if the null hypothesis is true. If the p value is less than or equal to α, we reject the null hypothesis; otherwise, we do not reject it.

In a one tail test, the entire rejection region lies in one tail of the distribution (left or right). In a two tail test, the rejection region is split between the two tails. Which one is used depends on how the hypotheses are stated, in particular on the direction, if any, of the alternative hypothesis.
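For a z statistic, the p value for each kind of test can be computed directly instead of being read from a table. A minimal sketch using the standard-library `statistics.NormalDist` (Python 3.8+); the function names and the `tail` labels are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str) -> float:
    """p value for an observed z statistic, assuming z is standard normal under H0.

    tail: "left"  for HA: mu < mu0
          "right" for HA: mu > mu0
          "two"   for HA: mu != mu0
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return std_normal.cdf(-z)  # P(Z >= z) = P(Z <= -z) by symmetry
    if tail == "two":
        # Tail area beyond |z|, doubled because the rejection region has two parts
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")


def reject_h0(p: float, alpha: float) -> bool:
    """Decision rule: reject H0 when p <= alpha."""
    return p <= alpha
```

With z = -2.12, the two tail p value is about .0340 and the left-tail p value about .0170, which are the values used in Example 7.2.1 below.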

A confidence interval can also be used to test hypotheses. For example, if the null hypothesis is H₀: μ = 30, a 95% confidence interval for μ can be constructed. If 30 falls within the interval, the null hypothesis is not rejected at the .05 level of significance (two tail test); if 30 falls outside it, H₀ is rejected.
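The equivalence between a 95% confidence interval and a two tail test at α = .05 can be checked numerically. This is a minimal sketch for the known-variance (z) case; the sample summary (x̄ = 50, σ = 10, n = 25) is hypothetical and chosen only for illustration, as are the function names.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Interval xbar +/- z * sigma / sqrt(n) for mu, sigma known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def two_tail_z_rejects(xbar: float, mu0: float, sigma: float, n: int,
                       alpha: float = 0.05) -> bool:
    """Two tail z test of H0: mu = mu0; True when p <= alpha."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * NormalDist().cdf(-abs(z))
    return p <= alpha


# Hypothetical sample summary, used only to illustrate the equivalence
xbar, sigma, n = 50.0, 10.0, 25
lower, upper = z_interval(xbar, sigma, n)  # about (46.08, 53.92)
for mu0 in (45.0, 47.0, 50.0, 53.0, 55.0):
    inside = lower <= mu0 <= upper
    # The test rejects exactly when mu0 lies outside the 95% interval
    print(mu0, "inside" if inside else "outside",
          "reject" if two_tail_z_rejects(xbar, mu0, sigma, n) else "do not reject")
```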

The following nine-step procedure is used for hypothesis testing. Several points are very important:

- avoid rounding numbers off until the very end of the problem
- pay strict attention to details
- be very careful with regard to the way things are worded
- write down all the steps and do not take shortcuts.

**Sampling from a normally distributed population--variance known**

Example 7.2.1

A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let α = .05.


(1) Data

n = 10, σ² = 20

x̄ = 27, α = .05

(2) Assumptions

- simple random sample
- normally distributed population

(3) Hypotheses

H₀: μ = 30

Hₐ: μ ≠ 30

(4) Test statistic

As the population variance is known, we use z as the test statistic.

(a) Distribution of test statistic

If the assumptions are correct and H₀ is true, the
test statistic follows the standard normal distribution. Therefore, we
calculate a z score and use it to test the hypothesis.

(b) Decision rule

Reject H₀ if the z value falls in the rejection region. Fail to
reject H₀ if it falls in the nonrejection region.

Because of the structure of Hₐ, this is a two tail test. Therefore,
reject H₀ if z ≤ -1.96 or z ≥ 1.96.
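The critical values ±1.96 come from putting α/2 = .025 in each tail of the standard normal distribution. A minimal sketch of that decision rule using `statistics.NormalDist` (Python 3.8+); `in_rejection_region` is a hypothetical name.

```python
from statistics import NormalDist

alpha = 0.05
# Two tail test: alpha / 2 = .025 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96


def in_rejection_region(z: float) -> bool:
    """Two tail decision rule: reject H0 if z <= -z_crit or z >= z_crit."""
    return z <= -z_crit or z >= z_crit
```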

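(5) Calculation of test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = \frac{-3}{1.4142} = -2.12$$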

(6) Statistical decision

We reject the null hypothesis because z = -2.12, which
is in the rejection region. The value is significant at the .05 level.

(7) Conclusion

We conclude that μ is not 30.

p = .0340

A z value of -2.12 corresponds to an area of .0170 in the left tail. Since
the rejection region of a two tail test has two parts, the p value is twice
this, which is .0340.
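Steps (5) through (7) of Example 7.2.1 can be reproduced without a table. A minimal sketch with the standard library (Python 3.8+); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 7.2.1
n, variance, xbar, mu0 = 10, 20, 27, 30

z = (xbar - mu0) / sqrt(variance / n)        # (27 - 30) / sqrt(2), about -2.1213
p_two_tail = 2 * NormalDist().cdf(-abs(z))   # both tails beyond |z|
```

Without rounding z first, the p value is about .0339; the .0340 above comes from looking up the rounded value -2.12 in the table. The difference is small here, but it illustrates the advice to avoid rounding until the very end.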

A problem like this can also be solved using a confidence interval. A
95% confidence interval for μ will show that the hypothesized value,
30, does not fall within the boundaries of the interval. It does not,
however, give a p value.

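Confidence interval

$$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 27 \pm 1.96\sqrt{20/10} = 27 \pm 2.77$$

This gives the interval (24.23, 29.77). Because 30 is not in the interval, H₀ is rejected at the .05 level.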
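The same interval can be computed in a few lines. A minimal sketch using `statistics.NormalDist` (Python 3.8+), with the data of Example 7.2.1.

```python
from math import sqrt
from statistics import NormalDist

n, variance, xbar, mu0, alpha = 10, 20, 27, 30, 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # reliability coefficient, about 1.96
half_width = z_crit * sqrt(variance / n)            # times the standard error sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width # about (24.23, 29.77)
reject = not (lower <= mu0 <= upper)                # True: 30 lies outside the interval
```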

**Same example as a one tail test**

Example 7.2.1 (reprise)

A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is less than 30? The variance is known to be 20. Let α = .05.

(1) Data

n = 10, σ² = 20

x̄ = 27, α = .05

(2) Assumptions

- simple random sample
- normally distributed population

(3) Hypotheses

H₀: μ = 30

Hₐ: μ < 30

(4) Test statistic

As the population variance is known, we use z as the test statistic.

(a) Distribution of test statistic

If the assumptions are correct and H₀ is true, the test
statistic follows the standard normal distribution. Therefore, we
calculate a z score and use it to test the hypothesis.

(b) Decision rule

Reject H₀ if the z value falls in the rejection region. Fail to
reject H₀ if it falls in the nonrejection region.

With α = .05 and the inequality in Hₐ (μ < 30), the entire rejection
region is in the left tail. The critical value is z = -1.645.
Reject H₀ if z < -1.645.
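For the left-tail test, all of α sits in the lower tail, so the critical value is the .05 quantile of the standard normal distribution. A minimal sketch using `statistics.NormalDist` (Python 3.8+); `reject_left_tail` is a hypothetical name.

```python
from statistics import NormalDist

alpha = 0.05
# Left-tail test: the entire alpha is in the lower tail
z_crit = NormalDist().inv_cdf(alpha)  # about -1.645


def reject_left_tail(z: float) -> bool:
    """Decision rule for HA: mu < mu0: reject H0 if z < z_crit."""
    return z < z_crit
```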

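(5) Calculation of test statistic

The calculation is the same as in the two tail version:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$$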

(6) Statistical decision

We reject the null hypothesis because -2.12 < -1.645.

(7) Conclusion

We conclude that μ < 30.

p = .0170 this time, because this is a one tail test rather than a two tail test: the p value is the area in the left tail only.
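The relation between the one tail and two tail p values can be seen directly. A minimal sketch using `statistics.NormalDist` (Python 3.8+) and the table-rounded z = -2.12 used in the notes.

```python
from statistics import NormalDist

z = -2.12                           # test statistic from Example 7.2.1, rounded as in the notes
p_one_tail = NormalDist().cdf(z)    # HA: mu < 30, area to the left of z, about .0170
p_two_tail = 2 * p_one_tail         # HA: mu != 30, both tails, about .0340 (valid since z < 0)
```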

**Sampling is from a normally distributed population--variance unknown**

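When the population variance is unknown, which is most of the time, a slightly different approach is necessary. The z score formula cannot be used because the population variance is unknown, so we replace σ with the sample standard deviation s and use

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

which, when H₀ is true, follows Student's t distribution with n - 1 degrees of freedom.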

Example 7.2.3 Body mass index

A simple random sample of 14 people from a certain population gives body mass indices as shown in Table 7.2.1. Can we conclude that the mean BMI of the population is not 35?

Let α = .05.

(1) Data

n = 14, s = 10.63918736

x̄ = 30.5

α = .05

(2) Assumptions

- simple random sample
- population of similar subjects
- normally distributed population

(3) Hypotheses

H₀: μ = 35

Hₐ: μ ≠ 35

(4) Test statistic

(a) Distribution of test statistic

If the assumptions are correct and H₀ is true, the test
statistic follows Student's *t* distribution with 13 degrees of freedom.

(b) Decision rule

This is a two tail test. With α = .05, each tail has an area of .025. The critical t values with 13 df are -2.1604 and 2.1604.

Reject H₀ if t ≤ -2.1604 or t ≥ 2.1604.
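The critical value 2.1604 can be recovered without a t table. The standard library has no t distribution, so this minimal sketch integrates the t density numerically with Simpson's rule and inverts the CDF by bisection; `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical helpers, not library functions (scipy users would call `scipy.stats.t.ppf` instead).

```python
from math import gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) = 0.5 + integral of the density from 0 to x (Simpson's rule, even steps)."""
    h = x / steps  # negative when x < 0, which makes the integral negative
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob: float, df: int) -> float:
    """Value t* with P(T <= t*) = prob, for 0.5 < prob < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha, df = 0.05, 13
upper = t_quantile(1 - alpha / 2, df)  # about 2.1604
lower = -upper                         # the t distribution is symmetric about 0
```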

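(5) Calculation of test statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{30.5 - 35}{10.63918736/\sqrt{14}} = \frac{-4.5}{2.8434} = -1.58$$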

(6) Statistical decision

Do not reject the null hypothesis because -1.58 is not in the rejection region.

(7) Conclusion

Based on the sample data, we cannot conclude that μ is not 35; it is possible that μ = 35. p
= .1375
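The t statistic and its two tail p value for Example 7.2.3 can be checked the same way. This minimal sketch repeats the hypothetical Simpson's-rule helpers so that it runs on its own; with scipy available, `scipy.stats.t.cdf` would replace `t_cdf`.

```python
from math import gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) = 0.5 + integral of the density from 0 to x (Simpson's rule, even steps)."""
    h = x / steps  # negative when x < 0, so the result falls below 0.5
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


# Summary values from Example 7.2.3
n, s, xbar, mu0 = 14, 10.63918736, 30.5, 35
df = n - 1

t = (xbar - mu0) / (s / sqrt(n))     # about -1.58
p_two_tail = 2 * t_cdf(-abs(t), df)  # about .1375
```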

**Sampling is from a population that is not normally distributed**

Example 7.2.4

Maximum oxygen uptake data

Can we conclude that the population mean maximum oxygen uptake, μ, is greater than 30?

Let α = .05.

(1) Data

n = 242, s = 12.14

x̄ = 33.3

α = .05

(2) Assumptions

- simple random sample
- population is similar to the subjects in the sample (normality cannot be assumed)

(3) Hypotheses

H₀: μ ≤ 30

Hₐ: μ > 30

(4) Test statistic

In this situation we do not know whether the population is normally
distributed. However, with a large sample, the Central Limit Theorem tells
us that the sampling distribution of the sample mean is approximately
normal. With a large sample, we can therefore use z as the test statistic,
calculated with s, the sample standard deviation, in place of σ.

(a) Distribution of test statistic

By virtue of the Central Limit Theorem, the test statistic is approximately
normally distributed with μ = 0 if H₀ is true.

(b) Decision rule

This is a one tail test with α = .05. The rejection region lies to the
right of z = 1.645: reject H₀ if z ≥ 1.645.

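(5) Calculation of test statistic

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{33.3 - 30}{12.14/\sqrt{242}} = \frac{3.3}{0.7804} = 4.23$$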

(6) Statistical decision

Reject H₀ because 4.23 > 1.645.
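The large-sample calculation and decision can be sketched as follows, assuming Python 3.8+ for `statistics.NormalDist`. The only change from the known-variance case is that s stands in for σ.

```python
from math import sqrt
from statistics import NormalDist

# Summary values from Example 7.2.4
n, s, xbar, mu0, alpha = 242, 12.14, 33.3, 30, 0.05

z = (xbar - mu0) / (s / sqrt(n))          # s replaces sigma; about 4.23
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tail critical value, about 1.645
reject = z >= z_crit                      # True: 4.23 is in the rejection region
```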

(7) Conclusion

We conclude that the mean maximum oxygen uptake of the sampled population is greater than 30.

The p value is < .001 because 4.23 lies beyond the end of the z table;
even for z = 3.89 the area in the right tail is less than .001.

Note: When this field was developed, in the first part of the 20th
century, probabilities were found using tables. After the advent of
hand-held calculators and computers, it became possible to calculate p
more accurately. In this case, the actual value is 1.17 × 10⁻⁵ (.0000117).

This value was found using a TI-83 calculator. Even now, in the first
decade of the 21st century, many publications report p < .001 rather than
an exact value. Such values indicate that a result this extreme would be
very unlikely if the null hypothesis were true, and this reporting
convention is widely understood in the scientific community.
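The exact p value quoted in the note can also be obtained in software. A minimal sketch using `statistics.NormalDist` (Python 3.8+); it computes the right-tail area both for the unrounded z and for z rounded to 4.23.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

z = (33.3 - 30) / (12.14 / sqrt(242))     # unrounded test statistic, about 4.2287
# Right-tail area P(Z >= z), computed as P(Z <= -z) by symmetry
p_exact = std_normal.cdf(-z)
p_from_rounded_z = std_normal.cdf(-4.23)  # same area using z rounded to 4.23
```

With z rounded to 4.23, the area is about 1.17 × 10⁻⁵, matching the calculator value above; the unrounded z gives a value that differs only in the third significant figure.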