Hypothesis Testing of a Single Population Mean


This file is part of a program based on the Bio 4835 Biostatistics class taught at Kean University in Union, New Jersey.  The course uses the following text:
Daniel, W. W. 1999.  Biostatistics: a foundation for analysis in the health sciences.  New York: John Wiley and Sons.  
The file follows this text very closely and readers are encouraged to consult the text for further information.

A) Hypothesis testing of a single population mean

A hypothesis about a population mean can be tested when sampling is from any of the following.

  • A normally distributed population--variance known

                z = (x-bar - mu0) / (sigma / sqrt(n))
            

  • A normally distributed population--variance unknown

                t = (x-bar - mu0) / (s / sqrt(n))
            

  • A population that is not normally distributed (assuming n greater than or equal to 30; the central limit theorem applies)

                z = (x-bar - mu0) / (s / sqrt(n))

In each formula, mu0 is the value of mu stated in the null hypothesis.
            
p values

A p value is the probability, computed assuming the null hypothesis is true, of obtaining a result as extreme as or more extreme than the observed value.  If the p value is less than or equal to alpha, we reject the null hypothesis; otherwise we do not reject it.
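As a minimal sketch of how a p value follows from a z statistic, the Python snippet below uses the standard library's `statistics.NormalDist`.  The function name `p_value_from_z` and its `tail` argument are illustrative assumptions, not part of the course text.  The value -2.12 is the z score that appears in Example 7.2.1.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two"):
    """Return the p value for a z statistic under the standard normal curve.

    tail: "two" (HA: mu not equal to mu0), "left" (HA: mu < mu0),
          or "right" (HA: mu > mu0).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return std_normal.cdf(-z)          # upper-tail area, by symmetry
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))  # both tails
    raise ValueError("tail must be 'two', 'left', or 'right'")

alpha = 0.05
p = p_value_from_z(-2.12, tail="two")
decision = "reject H0" if p <= alpha else "do not reject H0"
print(f"p = {p:.4f}: {decision}")
```

The decision rule in the last lines is the one stated above: reject H0 when p is less than or equal to alpha.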


One tail and two tail tests

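In a one tail test, the rejection region is at one end of the distribution or the other.  In a two tail test, the rejection region is split between the two tails.  Which one is used depends on the way the hypotheses are written: an alternative hypothesis stated as "not equal to" gives a two tail test, while one stated as < or > gives a one tail test.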

Confidence intervals

A confidence interval can be used to test hypotheses.  For example, if the null hypothesis is H0 : mu = 30, a 95% confidence interval can be constructed.  If 30 falls within the confidence interval, the null hypothesis is not rejected at the .05 level of significance; if 30 falls outside the interval, the null hypothesis is rejected.

Procedure

The procedure of nine steps is followed for hypothesis testing.  It is very important to observe several items.

  •  avoid rounding numbers off until the very end of the problem
  • pay strict attention to details
  • be very careful with regard to the way things are worded
  • write down all the steps and do not take short cuts.


Sampling from a normally distributed population--variance known

Example 7.2.1

A simple random sample of 10 people from a certain population has a mean age of 27.  Can we conclude that the mean age of the population is not 30?  The variance is known to be 20.  Let alpha = .05.

[Note:  "Yes we can, if..."  A way to help solve this type of problem is to answer "Yes we can, if..."  In this case the question is, "Can we conclude that the mean age of the population is not 30?"  Answer, "Yes we can, if we can reject the null hypothesis that it is 30."  Responding to problems the same way every time will lead to less confusion and fewer errors.]

(1) Data

        n = 10     sigma-squared = 20
       x-bar= 27    alpha = .05

(2) Assumptions

  • simple random sample 
  • normally distributed population

(3) Hypotheses

        H0 :  mu = 30
        HA :  mu not equal to 30

(4) Test statistic

As the population variance is known, we use z as the test statistic.

        z = (x-bar - mu0) / (sigma / sqrt(n)),  where mu0 is the value of mu under H0

    (a) Distribution of test statistic

If the assumptions are correct and H0 is true, the test statistic follows the standard normal distribution.  Therefore, we calculate a z score and use it to test the hypothesis.

    (b) Decision rule

Reject H0 if the z value falls in the rejection region.  Fail to reject H0 if it falls in the nonrejection region.

        [Figure: normal curve]

Because of the structure of H0 it is a two tail test.  Therefore, reject H0 if z less than or equal to -1.96 or z greater than or equal to 1.96.  
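The critical values quoted in these decision rules come from the standard normal table.  As a rough sketch, they can also be obtained in Python from the inverse of the normal cumulative distribution; the helper name `z_critical` is a hypothetical one chosen for illustration.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Positive critical z value for significance level alpha."""
    # A two tail test puts alpha/2 in each tail; a one tail test puts
    # all of alpha in a single tail.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

z_two = z_critical(0.05)                    # two tail test: about 1.96
z_one = z_critical(0.05, two_tailed=False)  # one tail test: about 1.645
print(f"two tail: +/-{z_two:.3f}   one tail: {z_one:.3f}")
```

For a rejection region in the left tail, the critical value is the negative of the one tail result.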


(5) Calculation of test statistic

        z = (27 - 30) / sqrt(20/10) = -3 / 1.4142 = -2.12
 
(6) Statistical decision

We reject the null hypothesis because z = -2.12 which is in the rejection region.  The value is significant at the .05 level.

(7) Conclusion

We conclude that mu is not 30.

p = .0340

A z value of -2.12 corresponds to an area of .0170.  Since there are two parts to the rejection region in a two tail test, the p value is twice this which is .0340.
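The two tail test of Example 7.2.1 can be sketched in a few lines of Python.  This is an illustration under the example's stated assumptions (known variance, normal population), and the variable names are chosen here rather than taken from the text.

```python
import math
from statistics import NormalDist

# Data from Example 7.2.1
n, x_bar, variance, mu0, alpha = 10, 27, 20, 30, 0.05

standard_error = math.sqrt(variance / n)       # sigma / sqrt(n)
z = (x_bar - mu0) / standard_error             # test statistic
p_two_tailed = 2 * NormalDist().cdf(-abs(z))   # area in both tails
reject_h0 = p_two_tailed <= alpha

print(f"z = {z:.2f}, p = {p_two_tailed:.4f}, reject H0: {reject_h0}")
```

Because the code carries the unrounded z through to the end, its p value comes out near .0339, slightly below the table-based .0340 obtained from z rounded to -2.12.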

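A problem like this can also be solved using a confidence interval.  A confidence interval will show that the hypothesized value of mu (30) does not fall within the boundaries of the interval.  It will not, however, give a probability.  

Confidence interval

        x-bar ± 1.96 (sigma / sqrt(n)) = 27 ± 1.96 (1.4142) = 27 ± 2.7719
        95% confidence interval:  (24.2281, 29.7719)

Since 30 is not within the interval, H0 is rejected at the .05 level.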
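The confidence interval approach to Example 7.2.1 can be sketched as follows.  The snippet assumes a 95% interval based on z, matching alpha = .05, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from Example 7.2.1
n, x_bar, variance, mu0 = 10, 27, 20, 30
confidence = 0.95

# z value that leaves (1 - confidence)/2 in each tail (about 1.96)
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_crit * math.sqrt(variance / n)
lower, upper = x_bar - margin, x_bar + margin

# H0: mu = mu0 is rejected when mu0 lies outside the interval
mu0_inside = lower <= mu0 <= upper
print(f"95% CI: ({lower:.4f}, {upper:.4f}); contains {mu0}: {mu0_inside}")
```

The interval tells us whether 30 is a plausible value for mu, but unlike the p value it does not say how unlikely the observed result is.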


Same example as a one tail test.

Example 7.2.1 (reprise)

A simple random sample of 10 people from a certain population has a mean age of 27.  Can we conclude that the mean age of the population is less than 30?  The variance is known to be 20.  Let alpha = .05.


(1) Data

        n = 10     sigma-squared = 20
       x-bar= 27    alpha = .05

(2) Assumptions

  • simple random sample 
  • normally distributed population

(3) Hypotheses

        H0 :  mu greater than or equal to 30
        HA :  mu < 30

(4) Test statistic

As the population variance is known, we use z as the test statistic.

        z = (x-bar - mu0) / (sigma / sqrt(n))

    (a) Distribution of test statistic

If the assumptions are correct and H0 is true, the test statistic follows the standard normal distribution.  Therefore, we calculate a z score and use it to test the hypothesis.

    (b) Decision rule

Reject H0 if the z value falls in the rejection region.  Fail to reject H0 if it falls in the nonrejection region.

        [Figure: normal curve]

With alpha = .05 and the inequality we have the entire rejection region at the left.  The critical value will be z = -1.645.  Reject H0 if z < -1.645.

(5) Calculation of test statistic

        z = (27 - 30) / sqrt(20/10) = -3 / 1.4142 = -2.12


(6) Statistical decision

We reject the null hypothesis because -2.12 < -1.645.

(7) Conclusion

We conclude that mu < 30.

p = .0170 this time because it is only a one tail test and not a two tail test.
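For comparison with the two tail version, here is a hedged sketch of the one tail test, in which the whole rejection region and the whole p value sit in the left tail.  Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from Example 7.2.1 (reprise); HA: mu < 30
n, x_bar, variance, mu0, alpha = 10, 27, 20, 30, 0.05

std_normal = NormalDist()
z = (x_bar - mu0) / math.sqrt(variance / n)
z_crit_left = std_normal.inv_cdf(alpha)   # all of alpha in the left tail
p_left = std_normal.cdf(z)                # area to the left of z only
reject_h0 = z < z_crit_left

print(f"z = {z:.2f}, critical z = {z_crit_left:.3f}, p = {p_left:.4f}")
```

The test statistic is the same as before; only the critical value and the tail area used for p change.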


Sampling is from a normally distributed population--variance unknown.

When the population variance is unknown, which is most of the time, a slightly different approach is necessary.  The z score formula cannot be used because the population variance is unknown, so we have to use t.  The formula for t relies on the value of s, the sample standard deviation, which can be calculated from the data of the sample.  

Example 7.2.3 Body mass index

A simple random sample of 14 people from a certain population gives body mass indices as shown in Table 7.2.1.  Can we conclude that the BMI is not 35?

        [Table 7.2.1: body mass index data for the 14 subjects]

Let alpha = .05.

(1) Data

         n = 14   s = 10.63918736
        x-bar = 30.5
        alpha = .05

(2) Assumptions

  • simple random sample
  • population of similar subjects
  • normally distributed

(3) Hypotheses

        H0 :  mu = 35
        HA :  mu not equal to 35

(4) Test statistic

        t = (x-bar - mu0) / (s / sqrt(n))

   
    (a) Distribution of test statistic

If the assumptions are correct and H0 is true, the test statistic follows Student's t distribution with 13 degrees of freedom.

    (b) Decision rule

We have a two tail test.  With alpha = .05 it means that each tail is 0.025.  The critical t values with 13 df are -2.1604 and 2.1604.

        [Figure: t curve]

We reject H0 if t is less than or equal to -2.1604 or t is greater than or equal to 2.1604.

(5) Calculation of test statistic

        t = (30.5 - 35) / (10.63918736 / sqrt(14)) = -4.5 / 2.8434 = -1.58
 
(6) Statistical decision

Do not reject the null hypothesis because -1.58 is not in the rejection region.

(7) Conclusion

Based on the data of the sample, it is possible that mu = 35.  p = .1375
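The t test for the body mass index example can be sketched in Python using only the summary statistics given in step (1).  The standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule to approximate the p value; the helper names `t_pdf` and `t_two_tailed_p` are hypothetical, and a statistics package (for example SciPy's `scipy.stats.t`) would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p value via Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3          # area between 0 and |t|
    return 2 * (0.5 - area_0_to_t)       # remaining area in both tails

# Summary data from Example 7.2.3
n, x_bar, s, mu0, alpha = 14, 30.5, 10.63918736, 35, 0.05
t_crit = 2.1604                          # critical value for 13 df, from the text

t = (x_bar - mu0) / (s / math.sqrt(n))
p = t_two_tailed_p(t, df=n - 1)
reject_h0 = abs(t) >= t_crit

print(f"t = {t:.2f}, p = {p:.4f}, reject H0: {reject_h0}")
```

Both routes agree: t does not reach the critical value, and the p value is larger than alpha, so H0 is not rejected.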


Sampling is from a population that is not normally distributed

Example 7.2.4

Maximum oxygen uptake data

Can we conclude that mu > 30?
Let alpha = .05.

(1) Data

        n = 242   s = 12.14
        x-bar  = 33.3
        alpha = .05

(2) Assumptions

  • simple random sample
  • population is similar to those subjects in the sample (cannot assume normal distribution)


(3) Hypotheses

        H0 :  mu  less than or equal to 30
        HA :  mu > 30

(4) Test statistic

In this situation we do not know if the population displays a normal distribution.  However, with a large sample size, we know from the Central Limit Theorem that the sampling distribution of the sample mean is approximately normal.  With a large sample, we can use z as the test statistic, calculated using s, the sample standard deviation.

        z = (x-bar - mu0) / (s / sqrt(n))
            
        (a) Distribution of test statistic

By virtue of the Central Limit Theorem, the test statistic is approximately normally distributed with a mean of 0 if H0 is true.

        (b) Decision rule

            [Figure: normal curve]

This is a one tail test with alpha = .05.  The rejection region is at the right of the value z = 1.645.

 
(5) Calculation of test statistic

        z = (33.3 - 30) / (12.14 / sqrt(242)) = 3.3 / 0.7804 = 4.23
 
(6) Statistical decision

Reject H0 because 4.23 > 1.645.


(7) Conclusion

The maximum oxygen uptake for the sampled population is greater than 30.
The p value is less than .001 because 4.23 is off the chart: it lies beyond 3.89, the largest z value listed, for which p < .001.

        Note:  The classical way of finding probabilities when this field was developed was by using tables.  This was in the first part of the 20th century.  After the advent of hand-held calculators and computers, it became possible to calculate a more accurate value for p.  In this case, the actual value is 1.17 x 10^-5 (.0000117).
This value was found using the TI-83 calculator.  Many publications even now, in the first decade of the 21st century, give values of p < .001 rather than calculating an accurate value.  These values indicate a very small probability of obtaining a result this extreme by random chance alone when the null hypothesis is true.  Such values of p are understood as such by the general scientific community.
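The more accurate p value mentioned in the note can also be approximated in Python instead of on a calculator.  The sketch below uses the summary data of Example 7.2.4 and the standard library's `statistics.NormalDist`; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary data from Example 7.2.4; HA: mu > 30
n, x_bar, s, mu0, alpha = 242, 33.3, 12.14, 30, 0.05

z = (x_bar - mu0) / (s / math.sqrt(n))   # large-sample z using s
p_right = NormalDist().cdf(-z)           # upper-tail area, by symmetry
reject_h0 = p_right <= alpha

print(f"z = {z:.2f}, p = {p_right:.3g}, reject H0: {reject_h0}")
```

The upper-tail area is computed as the area below -z, which is equal by symmetry and avoids subtracting a number very close to 1 from 1.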