# Overview 概述

Statistics (e.g. sample mean, sample variance, etc) are functions of random variables and therefore are also random variables themselves.

The distributions of these statistics are called sampling distributions.

Our inferred knowledge about these distributions is used to estimate parameters of the model which we postulate was used to generate the data.

So far we have learned about point estimators and interval estimators.

And now, we focus on hypothesis tests.

# Objectives 目标

1. Understand what hypothesis tests and $p$-values are
了解什么是假设检验和 $p$ 值
2. Know how to choose the appropriate test for a hypothesis test
知道如何为假设检验选择合适的检验
3. Know how to interpret the result of a hypothesis test
知道如何解释假设检验的结果

# Overview of hypothesis testing 假设检验概述

A statistical hypothesis is an assertion or conjecture concerning one or more populations.

For example, suppose that the hypothesis postulated by the engineer is that the fraction defective $p$ in a certain process is 0.10.

The experiment is to observe a random sample of the product in question.

Suppose that 100 items are tested and 12 items are found defective.

It is reasonable to conclude that this evidence does not refute the condition that the binomial parameter $p = 0.10,$ and thus it may lead one not to reject the hypothesis.

However, it also does not refute $p = 0.12$ or perhaps even $p = 0.15.$

Rejection of a hypothesis implies that the sample evidence refutes it.

Or, rejection means that there is a small probability of obtaining the sample information observed when, in fact, the hypothesis is true.

For example, for our proportion-defective hypothesis, a sample of 100 revealing 20 defective items is certainly evidence for rejection.

Why? If, indeed, $p = 0.10$, the probability of obtaining 20 or more defectives is approximately $0.002$.

[1] 0.001978561
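
The tail probability quoted above can be reproduced in Python. This is a minimal sketch, assuming `scipy` is installed; the variable names are illustrative and not part of the original notes.

```python
from scipy.stats import binom

n, p0 = 100, 0.10  # sample size and hypothesized fraction defective

# P(X >= 20) equals P(X > 19); sf(k) is the survival function P(X > k)
p_tail = binom.sf(19, n, p0)
print(p_tail)
```

Using `sf` rather than `1 - cdf` avoids losing precision when the tail probability is very small.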


With the resulting small risk of a wrong conclusion, it would seem safe to reject the hypothesis that $p = 0.10$.

In other words, rejection of a hypothesis tends to all but rule out the hypothesis. On the other hand, it is very important to emphasize that acceptance or, rather, failure to reject does not rule out other possibilities.

The foregoing implies that when the data analyst formalizes experimental evidence on the basis of hypothesis testing, the formal statement of the hypothesis is very important.

# Elements of a hypothesis test 假设检验的要素

# Hypotheses 假设

null hypothesis, denoted by $H_0$

alternative hypothesis, denoted by $H_1$

In our binomial example, we may state

$H_0: p = 0.10, \\ H_1: p > 0.10.$

Now 12 defective items out of 100 does not refute $p = 0.10$, so the conclusion is “fail to reject $H_0$”. However, if the data produce 20 out of 100 defective items, then the conclusion is “reject $H_0$” in favor of $H_1: p > 0.10$ .

# Test Statistic and Critical Region 检验统计量和临界区

# Discrete Case Study 离散案例研究

A certain type of cold vaccine is known to be only 25% effective after a period of 2 years.

To determine if a new and somewhat more expensive vaccine is superior in providing protection against the same virus for a longer period of time, suppose that 20 people are chosen at random and inoculated.

If more than 8 of those receiving the new vaccine surpass the 2-year period without contracting the virus, the new vaccine will be considered superior to the one presently in use.

We are essentially testing the null hypothesis that the new vaccine is equally effective after a period of 2 years as the one now commonly used.

The alternative hypothesis is that the new vaccine is in fact superior.

$H_0: p = 0.25, \\ H_1: p > 0.25.$

The test statistic on which we base our decision is $X$, the number of individuals in our test group who receive protection from the new vaccine for a period of at least 2 years.

The possible values of $X$, from 0 to 20, are divided into two groups: those numbers less than or equal to 8 and those greater than 8. All possible scores greater than 8 constitute the critical region.

The last number that we observe in passing into the critical region is called the critical value.

In our illustration, the critical value is the number 8.

Therefore, if $x > 8$, we reject $H_0$ in favor of the alternative hypothesis $H_1$.

If $x ≤ 8$, we fail to reject $H_0$.
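
To see how strict this decision rule is, one can compute the probability of landing in the critical region when $H_0$ is true, i.e. the size of the critical region. The sketch below assumes `scipy` is available; the variable names are illustrative.

```python
from scipy.stats import binom

n, p0 = 20, 0.25      # sample size and proportion under H0
critical_value = 8    # reject H0 when x > 8

# Size of the critical region: P(X > 8) when p = 0.25
size = binom.sf(critical_value, n, p0)
print(round(size, 4))
```

The result is roughly 0.04, so this rule wrongly rejects a true $H_0$ about 4% of the time.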

# Continuous Case Study 连续案例研究

Consider the null hypothesis that the average weight of male students in a certain college is 68 kilograms against the alternative hypothesis that it is unequal to 68.

That is, we wish to test

$H_0: \mu = 68, \\ H_1: \mu \ne 68.$

The alternative hypothesis allows for the possibility that $\mu < 68$ or $\mu > 68$.

A sample mean that falls close to the hypothesized value of 68 would be considered evidence in favor of $H_0$.

On the other hand, a sample mean that is considerably less than or more than 68 would be evidence inconsistent with $H_0$ and therefore favoring $H_1$.

The sample mean is the test statistic in this case.

A critical region for the test statistic might arbitrarily be chosen to be the two intervals $\bar{x} < 67$ and $\bar{x} > 69$.

The non-rejection region will then be the interval $67 ≤ \bar{x} ≤ 69$.

# One- and Two-Tailed Tests 单尾和双尾检验

A test of any statistical hypothesis where the alternative is one sided, such as

$H_0: \theta = \theta_0, \\ H_1: \theta > \theta_0$

or perhaps

$H_0: \theta = \theta_0, \\ H_1: \theta < \theta_0$

is called a one-tailed test.

Generally, the critical region for the alternative hypothesis $\theta > \theta_0$ lies in the right tail of the distribution of the test statistic, while the critical region for the alternative hypothesis $\theta < \theta_0$ lies entirely in the left tail.

(In a sense, the inequality symbol points in the direction of the critical region.)

A test of any statistical hypothesis where the alternative is two sided, such as

$H_0: \theta = \theta_0,\\ H_1: \theta \ne \theta_0$

is called a two-tailed test.

Question: In the above two case studies, which one is a one-tailed test? Which one is a two-tailed test?

# In-Class Exercise: Determine the hypotheses, test statistics, critical region

1. A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams per serving. State the null and alternative hypotheses to be used in testing this claim and determine where the critical region is located.
某品牌米糊制造商声称每份平均饱和脂肪含量不超过 1.5 克。陈述用于测试此声明的无效假设和替代假设，并确定关键区域位于何处。

2. A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. To test this claim, a large sample of new residences is inspected; the proportion of these homes with 3 bedrooms is recorded and used as the test statistic. State the null and alternative hypotheses to be used in this test and determine the location of the critical region.
一位房地产经纪人声称，当今建造的所有私人住宅中有 60% 是三居室住宅。为了检验这一说法，我们检查了大量新住宅样本；记录这些拥有 3 间卧室的房屋的比例并用作测试统计数据。陈述要在此测试中使用的原假设和替代假设，并确定关键区域的位置。

# Hypothesis Testing 假设检验

# Approach to Hypothesis Testing with Fixed Probability $\alpha$ 固定概率$\alpha$ 的假设检验方法

1. State the null and alternative hypotheses.
陈述零假设和替代假设。

2. Choose a fixed significance level $\alpha$.
选择一个固定的显著性水平$\alpha$

3. Choose an appropriate test statistic and establish the critical region based on $\alpha$.
选择适当的检验统计量并基于 $\alpha$ 建立临界区。

4. Reject $H_0$ if the computed test statistic is in the critical region. Otherwise, do not reject.
如果计算的测试统计量在临界区，则拒绝$H_0$。否则，不要拒绝。

5. Draw scientific or engineering conclusions.
得出科学或工程结论。

# Tests on a Single Mean (Variance Known) 单一均值检验（方差已知）

The model for the underlying situation centers around an experiment with $X_1, X_2, \ldots, X_n$ representing a random sample from a distribution with mean $\mu$ and variance $\sigma^2 > 0$.

Consider first the hypothesis

$H_0: \mu = \mu_0,\\ H_1: \mu \ne \mu_0.$

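The appropriate test statistic should be based on the random variable $\bar{X}$. Under $H_0$, the standardized sample mean follows a standard normal distribution (exactly for normal data, approximately for large $n$):

$Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim \mathcal{N}(0,1).$

This is a two-tailed test. Given $\alpha$, we reject $H_0$ if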

$z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} > z_{\alpha/2} \qquad \text{ or } \qquad z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} < - z_{\alpha/2} .$

# Example 1

A manufacturer of sports equipment has developed a new synthetic fishing line that the company claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram.

Test the hypothesis that $\mu = 8$ kilograms against the alternative that $\mu \ne 8$ kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms.

Use a 0.01 level of significance.

$H_0: \mu = 8,\\ H_1: \mu \ne 8.$

We find the critical region first.

[1] -2.575829


Critical region: $z < −2.575$ or $z > 2.575$

The observed value of the test statistic is

[1] -2.828427


Since $z = −2.83 < -2.575$, we reject $H_0$.

To get the critical region in Python, we need the scipy package.
Python 中想要获得临界区，需要 scipy 包。

We can get the same critical value using `norm.ppf`.

-2.575829303548901
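
The value above can be obtained along the following lines. This is a minimal sketch for Example 1; the variable names are illustrative, and only `norm.ppf` is taken from the notes.

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.01
mu0, sigma, n, xbar = 8.0, 0.5, 50, 7.8  # values given in Example 1

z_crit = norm.ppf(alpha / 2)             # lower critical value, about -2.5758
z = (xbar - mu0) / (sigma / sqrt(n))     # observed test statistic
reject = abs(z) > abs(z_crit)            # two-tailed decision rule
print(z_crit, z, reject)
```

Because the standard normal distribution is symmetric, the upper critical value is simply `-z_crit`.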


# Example 2

A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years.

Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years?

Use a 0.05 level of significance.

$H_0: \mu = 70,\\ H_1: \mu > 70.$

[1] 2.022472

[1] 1.644854


Since $z = 2.02 > z_{0.05} = 1.645$, we reject $H_0$ and conclude that the mean life span today is greater than 70 years.
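
For a one-tailed test the whole of $\alpha$ sits in one tail. The sketch below repeats the Example 2 calculation in Python, assuming `scipy` is available; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.05
mu0, sigma, n, xbar = 70.0, 8.9, 100, 71.8  # values given in Example 2

z = (xbar - mu0) / (sigma / sqrt(n))  # observed test statistic
z_crit = norm.ppf(1 - alpha)          # upper critical value z_alpha
reject = z > z_crit                   # right-tailed decision rule
print(z, z_crit, reject)
```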

# Use of P-Values

Definition
A P-value is the lowest level (of significance) at which the observed value of the test statistic is significant.
P 值是检验统计量的观察值达到显著意义的最低水平（显著性）。

# Revisit Example 1

A manufacturer of sports equipment has developed a new synthetic fishing line that the company claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram.

Test the hypothesis that $\mu = 8$ kilograms against the alternative that $\mu \ne 8$ kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms.

We can still construct the same hypotheses.

$H_0: \mu = 8,\\ H_1: \mu \ne 8.$

Since the test in this example is two tailed, the desired $P$-value is twice the area of the shaded region to the left of $z = −2.83$.

We have

$P = P(|Z| > 2.83) = 2P(Z < −2.83) = 0.0046.$

[1] 0.0046548


Based on the very small p-value, we should reject $H_0$.

We can also compute the p-value in Python.

0.0046548004134630006


The p-value is again very small, so we reject $H_0$.

# Revisit Example 2

A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years.

Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years?

Use a 0.05 level of significance.

We can still construct the same hypotheses.

$H_0: \mu = 70,\\ H_1: \mu > 70.$

The p-value corresponding to $z = 2.02$ is given by the area of the shaded region in the following figure.
$z = 2.02$ 对应的 p 值由下图中阴影区域的面积给出。

We have $P = P(Z > 2.02) = 0.0217.$

As a result, the evidence in favor of $H_1$ is even stronger than that suggested by a 0.05 level of significance.

[1] 0.02169169


We can also use Python to get the p-value.

0.02169169376764679


The p-value is less than 0.05, so we reject $H_0$.

# In-class Exercise

1. In a research report, Richard H. Weindruch of the UCLA Medical School claims that mice with an average life span of 32 months will live to be about 40 months old when 40% of the calories in their diet are replaced by vitamins and protein.
平均寿命为 32 个月的老鼠，当它们饮食中 40% 的卡路里被维生素和蛋白质所取代时，它们可以活到 40 个月大。
Is there any reason to believe that $\mu< 40$ if 64 mice that are placed on this diet have an average life of 38 months with a standard deviation of 5.8 months? Use a P-value in your conclusion.
如果 64 只老鼠接受这种饮食，平均寿命为 38 个月，标准差为 5.8 个月，那么有没有理由相信 $\mu< 40$ ？在结论中用 p 值。

2. An electrical firm manufactures light bulbs that have a lifetime that is approximately normally distributed with a mean of 800 hours and a standard deviation of 40 hours.
一家电气公司生产的灯泡寿命大约为正态分布，平均 800 小时，标准差为 40 小时。
Test the hypothesis that $\mu = 800$ hours against the alternative, $\mu \ne 800$ hours, if a random sample of 30 bulbs has an average life of 788 hours.
如果随机抽取 30 个灯泡，平均寿命为 788 小时，则检验假设 $\mu = 800$ 小时，备择假设为 $\mu \ne 800$ 小时。

$H_0: \mu = 800,\\ H_1: \mu \ne 800.$

[1] -1.643168


[1] 0.1003482


Do not reject $H_0$ (Exercise 2, light bulbs): the p-value of about 0.10 is not small.

$H_0: \mu = 40,\\ H_1: \mu < 40.$

[1] -2.758621


[1] 0.002902293


Reject $H_0$ (Exercise 1, mice): the p-value of about 0.003 is very small.
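
Both exercise solutions follow the same recipe, so it can be wrapped in a small helper. The function `z_test` below is a hypothetical name introduced here, not a SciPy function. For the mice exercise it treats the sample standard deviation 5.8 as $\sigma$, which is an approximation justified only by the fairly large sample ($n = 64$).

```python
from math import sqrt
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test on the mean."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "two-sided":
        p = 2 * norm.sf(abs(z))
    elif alternative == "less":
        p = norm.cdf(z)
    elif alternative == "greater":
        p = norm.sf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p

# Light bulbs: H1 is mu != 800
z_bulb, p_bulb = z_test(788, 800, 40, 30)
# Mice: H1 is mu < 40
z_mice, p_mice = z_test(38, 40, 5.8, 64, alternative="less")
print(z_bulb, p_bulb)
print(z_mice, p_mice)
```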

# Tests on a Single Mean (Variance Unknown) 单一均值检验（方差未知）

For the two-sided hypothesis

$H_0: \mu = \mu_0,\\ H_1: \mu \ne \mu_0.$

We reject $H_0$ at significance level $\alpha$ when the computed $t$-statistic

$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$

exceeds $t_{\alpha/2, n-1}$ or is less than $-t_{\alpha/2,n-1}$.

# Example 3

Data are collected on a neutral substance (pH = 7.0).

A sample of measurements was taken, with the data as follows:

$7.07, 7.00, 7.10, 6.97, 7.00, 7.03,7.01,7.01,6.98,7.08$

It is, then, of interest to test

$H_0: \mu = 7.0,\\ H_1: \mu \ne 7.0.$

[1] 1.79541


[1] 0.106159


We can also use t.test here.

    One Sample t-test

data:  phmeter
t = 1.7954, df = 9, p-value = 0.1062
alternative hypothesis: true mean is not equal to 7
95 percent confidence interval:
6.993501 7.056499
sample estimates:
mean of x
7.025

Ttest_1sampResult(statistic=1.7954096195592317, pvalue=0.10615895425089732)
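
The SciPy result shown above can be obtained with `scipy.stats.ttest_1samp`. In this sketch the list name `ph` is illustrative, and newer SciPy versions print the result object under a different name than the one shown above.

```python
from scipy import stats

# pH measurements from Example 3
ph = [7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08]

# Two-sided one-sample t-test of H0: mu = 7.0
res = stats.ttest_1samp(ph, popmean=7.0)
print(res.statistic, res.pvalue)
```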


Should we reject $H_0$ or not reject $H_0$?

If we consider $\alpha = 0.05$, we should not reject $H_0$.

Notice that the sample size of 10 is rather small.

An increase in sample size (perhaps another experiment) may sort things out.

Note: How to choose a good sample size is an advanced topic.

# In-Class Exercise

Test the hypothesis that the average content of containers of a particular lubricant is 10 liters if the contents of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 liters.

Use a 0.01 level of significance and assume that the distribution of contents is normal.

$H_0: \mu = 10,\\ H_1: \mu \ne 10.$


One Sample t-test

data:  lubricant
t = 0.77174, df = 9, p-value = 0.46
alternative hypothesis: true mean is not equal to 10
99 percent confidence interval:
9.807338 10.312662
sample estimates:
mean of x
10.06


The p-value (0.46) is far larger than 0.01, so we cannot reject $H_0$.
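
The same test can be worked through by hand from the formula for the $t$-statistic. The sketch below is one way to do it in Python, assuming `scipy` is available; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

x = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]
mu0, alpha = 10.0, 0.01
n = len(x)

# t = (xbar - mu0) / (s / sqrt(n)), with s the sample standard deviation
t_stat = (mean(x) - mu0) / (stdev(x) / sqrt(n))
t_crit = t.ppf(1 - alpha / 2, df=n - 1)      # upper critical value
p_value = 2 * t.sf(abs(t_stat), df=n - 1)    # two-tailed p-value
print(t_stat, t_crit, p_value)
```

Since `abs(t_stat)` is well below `t_crit`, the statistic falls in the non-rejection region.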

# Tests on Two Means (Variances Known) 两种均值的检验（方差已知）

For the two-sided hypothesis

$H_0: \mu_1-\mu_2 = d_0,\\ H_1: \mu_1 -\mu_2 \ne d_0.$

We reject $H_0$ at significance level $\alpha$ when the computed $z$-statistic

$z = \frac{\bar{x}_1-\bar{x}_2-d_0}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}$

exceeds $z_{\alpha/2}$ or is less than $-z_{\alpha/2}$.

# Example 4

A study was conducted in which two types of engines, $A$ and $B,$ were compared.

Gas mileage, in miles per gallon, was measured.

Fifty experiments were conducted using engine type $A$ and 75 experiments were done with engine type $B$.

The gasoline used and other conditions were held constant.

The average gas mileage was 36 miles per gallon for engine $A$ and 42 miles per gallon for engine $B$.

Assume that the population standard deviations are 6 and 8 for engines $A$ and $B,$ respectively.

Let $\alpha = 0.05$. Can we say these two engines have the same gas mileage?

$H_0: \mu_A-\mu_B = 0,\\ H_1: \mu_A -\mu_B \ne 0.$

[1] -4.783446

[1] 1.959964


As $z = -4.78 < -z_{\alpha/2} = -1.96$, we reject $H_0$ and conclude that the two engines do not have the same mean gas mileage.
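
The two numbers printed above can be reproduced as follows. This is a minimal sketch assuming `scipy`; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

alpha, d0 = 0.05, 0.0
xbar_a, sigma_a, n_a = 36.0, 6.0, 50   # engine A
xbar_b, sigma_b, n_b = 42.0, 8.0, 75   # engine B

# Two-sample z-statistic with known population variances
z = (xbar_a - xbar_b - d0) / sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
z_crit = norm.ppf(1 - alpha / 2)       # z_{alpha/2}
reject = abs(z) > z_crit
print(z, z_crit, reject)
```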

# Tests on Two Means (Unknown But Equal Variance) 两个均值的检验（方差未知但相等）

For the two-sided hypothesis

$H_0: \mu_1 -\mu_2 = d_0,\\ H_1: \mu_1 -\mu_2 \ne d_0.$

We reject $H_0$ at significance level $\alpha$ when the computed $t$-statistic

$t = \frac{\bar{x}_1-\bar{x}_2-d_0}{s_p\sqrt{1/n_1+1/n_2}}$

exceeds $t_{\alpha/2,n_1+n_2-2}$ or is less than $-t_{\alpha/2,n_1+n_2-2}$, where the pooled variance is

$s_p^2 = \frac{s_1^2(n_1-1)+s_2^2(n_2-1)}{n_1+n_2-2}.$

# Example 5

In a study conducted at Virginia Tech on the development of ectomycorrhizal, a symbiotic relationship between the roots of trees and a fungus, in which minerals are transferred from the fungus to the trees and sugars from the trees to the fungus, 20 northern red oak seedlings exposed to the fungus Pisolithus tinctorus were grown in a greenhouse.

All seedlings were planted in the same type of soil and received the same amount of sunshine and water.

Half received no nitrogen at planting time, to serve as a control, and the other half received 368 ppm of nitrogen in the form NaNO$_3$.

The stem weights, in grams, at the end of 140 days were recorded as follows:

No Nitrogen:
0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43

Nitrogen:
0.26 0.43 0.47 0.49 0.52 0.75 0.79 0.86 0.62 0.46


Hypothesis Test:

$H_0: \mu_{NIT} = \mu_{NON} ,\\ H_1: \mu_{NIT} \ne \mu_{NON}.$

where the population means indicate mean weights.

Assume the populations to be normally distributed with equal variances.

[1] -2.619094

[1] -2.100922


As $t = -2.62 < -t_{0.025,18} = -2.10$, we reject $H_0$.
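
The pooled statistic and the critical value can be computed directly from the formulas of this section. The sketch below assumes `scipy`; the list names `no_nitro` and `nitro` are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t

no_nitro = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
nitro = [0.26, 0.43, 0.47, 0.49, 0.52, 0.75, 0.79, 0.86, 0.62, 0.46]
n1, n2 = len(no_nitro), len(nitro)
df = n1 + n2 - 2

# Pooled variance s_p^2 from the two sample variances
sp2 = ((n1 - 1) * variance(no_nitro) + (n2 - 1) * variance(nitro)) / df

t_stat = (mean(no_nitro) - mean(nitro)) / sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = t.ppf(0.025, df=df)   # lower critical value, -t_{0.025,18}
print(t_stat, t_crit)
```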

We can also use the two-sample t-test (`t.test` in R).

Don't forget to add the argument `var.equal = TRUE`.

    Two Sample t-test

data:  noNitro and Nitro
t = -2.6191, df = 18, p-value = 0.01739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.29915788 -0.03284212
sample estimates:
mean of x mean of y
0.399     0.565


[1] 0.01738648


Here we make an assumption that the variances are equal. Does that make sense?

Can we do a mean test with different variances? Yes!

    Welch Two Sample t-test

data:  noNitro and Nitro
t = -2.6191, df = 11.673, p-value = 0.02286
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.30452438 -0.02747562
sample estimates:
mean of x mean of y
0.399     0.565


No matter which method we choose, we should reject $H_0$.

How do we run a two-sample test in Python?

Let's assume equal variances first.

Ttest_indResult(statistic=-2.6190944840455472, pvalue=0.017386483684799125)


We can also assume the variances are not equal.

Ttest_indResult(statistic=-2.6190944840455472, pvalue=0.022863946155002354)
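
Both results above come from `scipy.stats.ttest_ind`, which switches between the pooled and the Welch test through its `equal_var` argument. A minimal sketch, with illustrative list names:

```python
from scipy import stats

no_nitro = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
nitro = [0.26, 0.43, 0.47, 0.49, 0.52, 0.75, 0.79, 0.86, 0.62, 0.46]

# Pooled-variance t-test (assumes equal population variances)
pooled = stats.ttest_ind(no_nitro, nitro, equal_var=True)

# Welch t-test (does not assume equal variances)
welch = stats.ttest_ind(no_nitro, nitro, equal_var=False)

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

With equal sample sizes the two statistics coincide; only the degrees of freedom, and hence the p-values, differ.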


Therefore, we also should reject $H_0$.

# In-class Exercise: Tests on Two Means (Unknown But Not Equal Variance)

A study was conducted by the Department of Zoology at Virginia Tech to determine if there is a significant difference in the density of organisms at two different stations located on Cedar Run, a secondary stream in the Roanoke River drainage basin.

Sewage from a sewage treatment plant and overflow from the Federal Mogul Corporation settling pond enter the stream near its headwaters.

The following data give the density measurements, in number of organisms per square meter, at the two collecting stations:

Can we conclude, at the 0.05 level of significance, that the average densities at the two stations are equal?

Assume that the observations come from normal populations with different variances.

$H_0: \mu_1 = \mu_2 ,\\ H_1: \mu_1 \ne \mu_2.$


Welch Two Sample t-test

data:  stat1 and stat2
t = 2.7578, df = 18.781, p-value = 0.01261
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1389.003 10164.331
sample estimates:
mean of x mean of y
9897.500  4120.833


# Test on a Single Proportion 单一比例检验

For the two-sided hypothesis

$H_0: p = p_0,\\ H_1: p \ne p_0.$

The appropriate random variable on which we base our decision criterion is the binomial random variable $X$, although we could just as well use the statistic $\hat{p} = X/n$.

Values of $X$ that are far from the mean $\mu = np_0$ will lead to the rejection of the null hypothesis.
$X$ 的值远离平均值$\mu = np_0$ 将导致拒绝零假设。

Because $X$ is a discrete binomial variable, it is unlikely that a critical region can be established whose size is exactly equal to a pre-specified value of $\alpha$.

For this reason it is preferable, in dealing with small samples, to base our decisions on P-values.

At the $\alpha$-level of significance, we compute
$\alpha$- 显著性水平，计算

$P = 2P(X \le x \text{ when } p = p_0) \qquad \text{ if } x < np_0$

or

$P = 2P(X \ge x \text{ when } p = p_0) \qquad \text{ if } x > np_0$

and reject $H_0$ in favor of $H_1$ if the computed P-value is less than or equal to $\alpha$.

# Example 6

In a random sample of $n=500$ families owning television sets in the city of Hamilton, Canada, it is found that $x=340$ subscribe to HBO.
$n=500$ 加拿大汉密尔顿市拥有电视机的家庭随机样本中发现， 订阅了 HBO 的家庭是 $x=340$

Suppose we make the conjecture, the proportion of families with television sets in this city that subscribe to HBO is 0.7.

We have the following hypotheses.

$H_0: p = 0.7, \\ H_1: p \ne 0.7.$

    1-sample proportions test with continuity correction

data:  340 out of 500
X-squared = 0.85952, df = 1, p-value = 0.3539
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
0.6368473 0.7203411
sample estimates:
p
0.68


How to get p-value?

[1] 0.3533839
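
For Example 6, $x = 340 < np_0 = 350$, so the rule stated earlier doubles the lower binomial tail. The sketch below does this with `scipy.stats.binom`. The variable names are illustrative, and it is only an assumption that the value printed above was produced in exactly this way.

```python
from scipy.stats import binom

n, x, p0 = 500, 340, 0.7

# x lies below n * p0, so P = 2 * P(X <= x when p = p0), capped at 1
p_value = min(1.0, 2 * binom.cdf(x, n, p0))
print(p_value)
```

The exact binomial p-value is close to, but not identical with, the continuity-corrected value reported by `prop.test`.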


The p-value is much larger than $\alpha = 0.05$, so we do not reject $H_0$.

Now consider the one-sided alternative:

$H_0: p = 0.7, \\ H_1: p > 0.7.$

    1-sample proportions test with continuity correction

data:  340 out of 500
X-squared = 0.85952, df = 1, p-value = 0.8231
alternative hypothesis: true p is greater than 0.7
95 percent confidence interval:
0.6437733 1.0000000
sample estimates:
p
0.68


As this is a one-tailed test, the p-value is

[1] 0.8233081


Again the p-value is much larger than $\alpha = 0.05$, so we do not reject $H_0$.

# In-class Exercise: Test on a Single Proportion.

A commonly prescribed drug for relieving nervous tension is believed to be only 60% effective.

Experimental results with a new drug administered to a random sample of 100 adults who were suffering from nervous tension show that 70 received relief.

Is this sufficient evidence to conclude that the new drug is superior to the one commonly prescribed? (NO/YES: Use a 0.05 level of significance.)

When $n$ is large, we can use the normal approximation.
$n$ 很大时，我们可以使用正态近似。

The z-value for testing $p = p_0$ is given by

$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

which is a value of the standard normal variable $Z$.

Hence, for a two-tailed test at the $\alpha$-level of significance, the critical region is $z < −z_{\alpha/2}$ or $z > z_{\alpha/2}$.

For the one-sided alternative $p < p_0$, the critical region is $z < −z_{\alpha}$, and for the alternative $p > p_0$, the critical region is $z > z_{\alpha}$.

[1] 2.041241

[1] 1.644854

[1] 0.02061342


It is easy to see that $z = 2.04 > z_{0.05} = 1.645$, so we reject $H_0$.
The p-value is 0.0206.
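
The three numbers printed above follow from the normal approximation. A minimal sketch in Python, assuming `scipy`; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

n, x, p0, alpha = 100, 70, 0.6, 0.05
p_hat = x / n

# z-statistic for H0: p = p0, using the standard error under H0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = norm.ppf(1 - alpha)   # z_alpha for the alternative p > p0
p_value = norm.sf(z)           # right-tailed p-value
print(z, z_crit, p_value)
```

This version applies no continuity correction, which is why its p-value differs slightly from the `prop.test` output.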

$H_0: p = 0.6, \\ H_1: p > 0.6.$

    1-sample proportions test with continuity correction

data:  70 out of 100
X-squared = 3.7604, df = 1, p-value = 0.02624
alternative hypothesis: true p is greater than 0.6
95 percent confidence interval:
0.6149607 1.0000000
sample estimates:
p
0.7


We should reject $H_0$.

# Two Samples: Tests on Two Proportions 两个样本：两个比例的检验

In general, we wish to test the null hypothesis that two proportions, or binomial parameters, are equal.

That is, we are testing $p_1 = p_2$ against one of the alternatives $p_1 < p_2$, $p_1 > p_2$, or $p_1 \ne p_2$.

The z-value for testing $p_1= p_2$ is determined from the formula

$z = \frac{\hat{p}_1 -\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1+1/n_2)}}$

where $\hat{p} = (x_1+x_2)/(n_1+n_2)$ is the pooled estimate of the common proportion.

The critical regions for the appropriate alternative hypotheses are set up as before, using critical points of the standard normal curve.

Hence, for the alternative $p_1 \ne p_2$ at the $\alpha$-level of significance, the critical region is $z < −z_{\alpha/2}$ or $z > z_{\alpha/2}$.

For a test where the alternative is $p_1 < p_2$, the critical region is $z < −z_{\alpha}$, and when the alternative is $p_1 > p_2$, the critical region is $z > z_\alpha$.

# Example 7

A vote is to be taken among the residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed.

The construction site is within the town limits, and for this reason many voters in the county believe that the proposal will pass because of the large proportion of town voters who favor the construction.

To determine if there is a significant difference in the proportions of town voters and county voters favoring the proposal, a poll is taken.

If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it, would you agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters?

Use an $\alpha = 0.05$ level of significance.

$H_0: p_1 = p_2, \\ H_1: p_1 > p_2.$

[1] 2.86972

[1] 1.644854

[1] 0.002054176


Therefore, we reject $H_0$ and agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters.
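
The computation behind these numbers can be sketched in Python as follows, assuming `scipy`; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 120, 200   # town voters in favor
x2, n2 = 240, 500   # county voters in favor
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled estimate under H0: p1 = p2

z = (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_crit = norm.ppf(1 - alpha)     # z_alpha for the alternative p1 > p2
p_value = norm.sf(z)             # right-tailed p-value
print(z, z_crit, p_value)
```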

Can we use `prop.test`? Yes.

    2-sample test for equality of proportions with continuity correction

data:  c(120, 240) out of c(200, 500)
X-squared = 7.7619, df = 1, p-value = 0.002668
alternative hypothesis: greater
95 percent confidence interval:
0.04869691 1.00000000
sample estimates:
prop 1 prop 2
0.60   0.48


The p-value is again much smaller than $\alpha = 0.05$, so we reject $H_0$.

# Two-Sample Tests Concerning Variances 关于方差的两样本检验

In this section, we are concerned with testing hypotheses concerning comparison of population variances or standard deviations.

Attention is focused on comparative experiments between methods or processes, where inherent reproducibility or variability must formally be compared.

In addition, to determine if the equal variance assumption is violated, a test comparing two variances is often applied prior to conducting a t-test on two means.

We shall test the null hypothesis $H_0$ that $\sigma^2_1 = \sigma_2^2$ against one of the usual alternatives

$\sigma_1^2 < \sigma_2^2, \qquad \sigma_1^2 > \sigma_2^2, \qquad \text{ or } \qquad \sigma_1^2 \ne \sigma_2^2.$

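For independent random samples of sizes $n_1$ and $n_2$, respectively, from the two populations, the f-value for testing $\sigma^2_1 = \sigma_2^2$ is the ratio

$f =\frac{s_1^2}{s_2^2}$

where $s^2_1$ and $s^2_2$ are the variances computed from the two samples. Under $H_0$ and for normal populations, this ratio is a value of the $F$-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.
$s^2_1$ 和 $s^2_2$ 是从两个样本计算的方差。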

Therefore, the critical regions of size $\alpha$ corresponding to the one-sided alternatives $\sigma_1^2 < \sigma_2^2$ and $\sigma_1^2 > \sigma_2^2$ are, respectively, $f < f_{1-\alpha,v_1,v_2}$ and $f> f_{\alpha,v_1,v_2}$.

For the two-sided alternative $\sigma_1^2 \ne \sigma_2^2$ the critical region is $f < f_{1-\alpha/2,v_1,v_2}$ or $f> f_{\alpha/2,v_1,v_2}$.

# Example 8

Let's again consider the stem weights of the no-nitrogen and nitrogen samples.

No Nitrogen:
0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43
Nitrogen:
0.26 0.43 0.47 0.49 0.52 0.75 0.79 0.86 0.62 0.46


Do these two samples have the same variance? Consider $\alpha = 0.05$.

$H_0: \sigma_{NON}^2 = \sigma_{NIT}^2, \\ H_1: \sigma_{NON}^2 \ne \sigma_{NIT}^2$

[1] 0.1519516

[1] 0.2483859

[1] 4.025994


Since $f = 0.152 < f_{0.975,9,9} = 0.248$, we reject $H_0$.

    F test to compare two variances

data:  noNitro and Nitro
F = 0.15195, num df = 9, denom df = 9, p-value = 0.009787
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.03774262 0.61175613
sample estimates:
ratio of variances
0.1519516


To calculate the p-value:

[1] 0.009786692


Based on the p-value, we know we should reject $H_0$.

0.15195156922096545

0.24838585469445493

4.025994158282978

0.009786692293824268
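
The four Python values above are the F ratio, the two critical values, and the two-sided p-value. One way to compute them is sketched below, assuming `scipy`; the variable names are illustrative.

```python
from statistics import variance
from scipy.stats import f

no_nitro = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
nitro = [0.26, 0.43, 0.47, 0.49, 0.52, 0.75, 0.79, 0.86, 0.62, 0.46]
alpha = 0.05
v1, v2 = len(no_nitro) - 1, len(nitro) - 1

f_stat = variance(no_nitro) / variance(nitro)   # ratio of sample variances
f_low = f.ppf(alpha / 2, v1, v2)                # lower critical value
f_high = f.ppf(1 - alpha / 2, v1, v2)           # upper critical value

# Two-sided p-value: twice the smaller tail area
p_value = 2 * min(f.cdf(f_stat, v1, v2), f.sf(f_stat, v1, v2))
print(f_stat, f_low, f_high, p_value)
```

Note that `f.ppf` takes a lower-tail probability, so `f.ppf(0.025, 9, 9)` corresponds to $f_{0.975,9,9}$ in the upper-tail notation used in this section.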


Based on the results, the F value is in the rejection region and the p-value is much less than the significance level $\alpha$.

Therefore, we reject $H_0$.

Note: $F$ test is very sensitive to the distributions of the populations.
$F$ 检验对总体的分布非常敏感。

# One Comprehensive Example 一个综合例子

Let's play with the data set `ToothGrowth`.

We want to investigate whether the mean of `len` is the same for the two values of `supp`.

We assume `len` follows an approximately normal distribution.

However, we don't know whether these two samples have the same variance. Let's see what we can do.

   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

      len        supp         dose
Min.   : 4.20   OJ:30   Min.   :0.500
1st Qu.:13.07   VC:30   1st Qu.:0.500
Median :19.25           Median :1.000
Mean   :18.81           Mean   :1.167
3rd Qu.:25.27           3rd Qu.:2.000
Max.   :33.90           Max.   :2.000


Before conducting hypothesis tests, let's visualize the data.

Based on the plot, `len` seems to increase as `dose` increases.

We can't find a clear relationship between `len` and `supp`.

Let's try a boxplot.

Based on the graph, we may conjecture that the means and variances of `len` differ between the two `supp` groups.

Let's do an F test first. Consider $\alpha = 0.05$.

$H_0: \sigma_{OJ}^2 = \sigma_{VC}^2, \\ H_1: \sigma_{OJ}^2 \ne \sigma_{VC}^2$

    F test to compare two variances

data:  len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3039488 1.3416857
sample estimates:
ratio of variances
0.6385951


# In-class Exercise

1. Will you reject $H_0$ or not? Why?
你会拒绝 $H_0$ 吗？或不拒绝？为什么？

2. Based on the result of the F test, carry out a hypothesis test for the mean of `len` between the two `supp` groups. What is your conclusion?
根据 F 检验的结果，对两个 supp 组的 len 均值进行假设检验。你的结论是什么？

We should not reject $H_0$ as the p-value (0.2331) is larger than 0.05, so we use the equal-variance t-test.

$H_0: \mu_{OJ} = \mu_{VC}, \\ H_1: \mu_{OJ} \ne \mu_{VC}$

    Two Sample t-test

data:  len by supp
t = 1.9153, df = 58, p-value = 0.06039
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1670064  7.5670064
sample estimates:
mean in group OJ mean in group VC
20.66333         16.96333


Since the p-value (0.06039) is larger than 0.05, we should not reject $H_0$.

# Summary of hypothesis testing 假设检验总结

• Three elements of a test: hypotheses, test statistic, and critical region
检验的三个要素：假设、检验统计量和临界区
• In practice, check assumptions to know which test to use (i.e., which distribution to reference)
在实践中，检查假设以了解使用哪种测试（即参考哪个分布）
• We learned about: one- and two-population location and scale problems, in continuous setting, and proportion in discrete setting
了解了：连续环境下的单总体和双总体区位和规模问题，离散环境下的比例问题

# References

1. Probability & Statistics for Engineers & Scientists, 9th Edition, Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye, Prentice Hall