# Objectives

• Understand the Central Limit Theorem and when it can be used
• Know the sampling distributions of the sample mean, the difference between two means, the sample variance, and the ratio of two sample variances
• Understand the $t$, $\chi^2$, and $F$ distributions and the situations in which each should be used
• Know how to use the q- and p-functions to find the quantiles and probabilities of all these distributions

# Review of Statistics

Population
The entire group of individuals that we want information about.

Sample
The part of the population that we actually examine in order to gather information about the whole population.

Inferential statistics
Using a fact about a sample to estimate the truth about the whole population.

Sample mean
The arithmetic average, denoted by $\bar{x}$:

$\bar{x} = \frac{x_1+\cdots+x_n}{n} = \frac{1}{n}\sum\limits_{i=1}^n x_i=\frac{1}{n}\sum x_i$

Sample variance
A measure of variability (spread), denoted by $s^2$:

$s^2 = \frac{(x_1-\bar{x})^2+\cdots+(x_n-\bar{x})^2}{n-1} = \frac{\sum\limits_{i=1}^n (x_i-\bar{x})^2}{n-1}$
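The two formulas above can be checked numerically. The following is a minimal sketch in R; the data vector `x` is hypothetical and chosen only for illustration.

```r
# Hypothetical data, used only to illustrate the formulas
x <- c(2.1, 3.4, 1.9, 4.0, 2.6)
n <- length(x)

x_bar <- sum(x) / n                     # sample mean from the formula
s2    <- sum((x - x_bar)^2) / (n - 1)   # sample variance, n - 1 in the denominator

# The built-in functions should agree with the hand-written formulas
all.equal(x_bar, mean(x))
all.equal(s2, var(x))
```

Note that `var()` in R already divides by $n-1$, so it matches the definition of $s^2$ given here.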

# Distribution of Sample Mean

Central Limit Theorem (CLT)
Let $\bar{X}$ be the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$. Then the limiting form of the distribution of

$Z = \frac{\bar{X} -\mu}{\sigma/\sqrt{n}}$

as $n\rightarrow \infty$, is the standard normal distribution, i.e., $Z \sim \mathcal{N}(0,1)$.

Remark
• The CLT can be summarized in one sentence: when the sample size $n$ is large (as a rule of thumb, $n>30$), approximately,

$\bar{X}\sim \mathcal{N}(\mu, \sigma^2/n)$

• The mean and standard deviation of the sample mean from a population with mean $\mu$ and standard deviation $\sigma$ are always

$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

regardless of the population distribution and the sample size $n$.

• The Central Limit Theorem simply tells us the shape of the distribution of the sample mean when the sample size is large.
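The remarks can be illustrated with a small simulation. This is only a sketch under assumed settings: the exponential population, the seed, and the number of replications are arbitrary choices, not part of the original notes.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n    <- 50       # sample size (assumed)
reps <- 10000    # number of simulated samples (assumed)
rate <- 0.5      # exponential population, so mu = 2 and sigma = 2
mu    <- 1 / rate
sigma <- 1 / rate

# Draw many samples from a skewed population and keep each sample mean
x_bars <- replicate(reps, mean(rexp(n, rate = rate)))

# Empirical mean and sd of the sample means vs. the theoretical values
c(mean = mean(x_bars), sd = sd(x_bars))
c(mu = mu, se = sigma / sqrt(n))
# hist(x_bars) should look roughly bell-shaped even though the population is skewed
```

The simulated mean and standard deviation of $\bar{X}$ should be close to $\mu$ and $\sigma/\sqrt{n}$, and the histogram shows the approximate normal shape that the CLT describes.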

## Example 1: pnorm()

When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity $\bar{X}$ is between 3.5 and 3.8 g?

Since the sample size $n = 50 >30$, we can assume the distribution of sample mean $\bar{X}$ is approximately normal with mean and standard deviation

$\mu_{\bar{X}} = \mu = 4.0,\qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} = 0.2121$

Thus,

$P(3.5\leq \bar{X}\leq 3.8) = P\left(\frac{3.5-4}{0.2121}\leq Z\leq \frac{3.8-4}{0.2121}\right) = P(-2.36\leq Z\leq -0.94) = 0.1645$

We can use `pnorm()` to compute this probability directly. Without rounding the $z$-values, R gives 0.1636782.
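The R call behind this value is not shown in the notes. A minimal sketch that reproduces it, assuming the normal approximation above (variable names are illustrative), is:

```r
mu    <- 4.0
sigma <- 1.5
n     <- 50
se    <- sigma / sqrt(n)   # standard deviation of the sample mean

# P(3.5 <= X_bar <= 3.8) under the normal approximation
p <- pnorm(3.8, mean = mu, sd = se) - pnorm(3.5, mean = mu, sd = se)
p
# [1] 0.1636782
```

The small gap from the hand calculation (0.1645) comes from rounding the $z$-values to two decimals before using the table.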


## In-class Exercise: Sample Mean Distribution

The amount of time that a drive-through bank teller spends on a customer is a random variable with a mean $\mu = 3.2$ minutes and a standard deviation $\sigma = 1.6$ minutes. If a random sample of 64 customers is observed, find the probability that their mean time at the teller’s window is

1. at most 2.7 minutes;
   Answer: 0.006209665

2. more than 3.5 minutes;
   Answer: 0.0668072

3. at least 3.2 minutes but less than 3.4 minutes.
   Answer: 0.3413447
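One possible way to check these three answers in R, assuming the CLT approximation with $\sigma_{\bar{X}} = 1.6/\sqrt{64} = 0.2$ (variable names are illustrative):

```r
mu <- 3.2
sigma <- 1.6
n <- 64
se <- sigma / sqrt(n)   # 0.2

p1 <- pnorm(2.7, mu, se)                        # at most 2.7 minutes
p2 <- pnorm(3.5, mu, se, lower.tail = FALSE)    # more than 3.5 minutes
p3 <- pnorm(3.4, mu, se) - pnorm(3.2, mu, se)   # between 3.2 and 3.4 minutes
c(p1, p2, p3)
```

Because the normal distribution is continuous, "at least" versus "more than" makes no difference to the computed probabilities.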


# Sampling Distribution of the Difference between Two Means

## The Normal Distribution

A scientist or engineer may be interested in a comparative experiment in which two manufacturing methods, 1 and 2, are to be compared.

The basis for the comparison is the difference in the population means.

Suppose that we have two populations, the first with mean $\mu_1$ and variance $\sigma_1^2$, and the second with mean $\mu_2$ and variance $\sigma_2^2$.

Theorem
If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the differences of means, $\bar{X}_1-\bar{X}_2$, is approximately normally distributed with mean and variance given by

$\mu_{\bar{X}_1-\bar{X}_2} = \mu_1-\mu_2, \qquad \sigma_{\bar{X}_1-\bar{X}_2}^2 = \frac{\sigma^2_1}{n_1} + \frac{\sigma_2^2}{n_2}$

Hence,

$Z = \frac{\left(\bar{X}_1-\bar{X}_2\right)-(\mu_1-\mu_2)}{\sqrt{\sigma^2_1/n_1 + \sigma_2^2/n_2}} \sim \mathcal{N}(0,1)$

Remark
If both $n_1$ and $n_2$ are greater than or equal to 30, the normal approximation for the distribution of $\bar{X}_1-\bar{X}_2$ is very good when the distributions are not too far away from normal.

### Example 2

The television picture tubes of manufacturer $A$ have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer $B$ have a mean lifetime of 6.0 years and a standard deviation of 0.8 year.

What is the probability that a random sample of 36 tubes from manufacturer $A$ will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer $B$?

We are given the following information:

$\text{Population 1:}\quad \mu_1 = 6.5, \qquad \sigma_1 = 0.9, \qquad n_1 = 36,$

$\text{Population 2:}\quad \mu_2 = 6.0, \qquad \sigma_2 = 0.8, \qquad n_2 = 49 .$

Thus the distribution of $\bar{X}_1 - \bar{X}_2$ will be approximately normal and with mean and standard deviation

$\mu_{\bar{X}_1 - \bar{X}_2} = 6.5-6.0 = 0.5, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{0.81}{36}+\frac{0.64}{49}} = 0.189.$

Thus, by the theorem,

$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P\left(Z \ge \frac{1.0-0.5}{0.189}\right) = P(Z \ge 2.65) = 1- P(Z < 2.65) = 0.0040.$

Without intermediate rounding, R gives 0.004007479.
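The R code for this example is not preserved in the notes. A sketch that reproduces the value in two equivalent ways, assuming the normal approximation from the theorem (variable names are illustrative):

```r
mu_diff <- 6.5 - 6.0                        # mean of X1_bar - X2_bar
sd_diff <- sqrt(0.9^2 / 36 + 0.8^2 / 49)    # standard deviation of the difference

# Upper-tail probability P(X1_bar - X2_bar >= 1.0), computed two ways
p     <- pnorm(1.0, mean = mu_diff, sd = sd_diff, lower.tail = FALSE)
p_alt <- 1 - pnorm(1.0, mean = mu_diff, sd = sd_diff)
c(p, p_alt)
```

Using `lower.tail = FALSE` is usually preferable to `1 - pnorm(...)` for very small tail probabilities, since it avoids subtracting two nearly equal numbers.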


### In-class Exercise: Difference between Two Means

Two different box-filling machines are used to fill cereal boxes on an assembly line.

The critical measurement influenced by these machines is the weight of the product in the boxes.

Engineers are quite certain that the variance of the weight of product is $\sigma^2 = 1$ ounce.

Experiments are conducted using both machines with sample sizes of 36 each.

The sample averages for machines $A$ and $B$ are $\bar{x}_A = 4.5$ ounces and $\bar{x}_B = 4.7$ ounces.

Engineers are surprised that the two sample averages for the filling machines are so different.

(a) Use the Central Limit Theorem to determine

$P(\bar{X}_B-\bar{X}_A \ge 0.2)$

under the condition that $\mu_A = \mu_B$.

(b) Do the aforementioned experiments seem to, in any way, strongly support a conjecture that the population means for the two machines are different? Explain using your answer in (a).

Answer to (a): 0.198072
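A possible check of part (a) in R, assuming $\mu_A = \mu_B$ and $\sigma^2 = 1$ for both machines as stated in the exercise (variable names are illustrative):

```r
sigma2 <- 1     # common population variance
n_A <- 36
n_B <- 36
sd_diff <- sqrt(sigma2 / n_A + sigma2 / n_B)   # sd of X_B_bar - X_A_bar

# P(X_B_bar - X_A_bar >= 0.2) when the true difference in means is 0
p <- pnorm(0.2, mean = 0, sd = sd_diff, lower.tail = FALSE)
p
```

A probability of roughly 0.2 means that a difference this large is not unusual when the means are equal, which is the starting point for part (b).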


Question: If we have no idea about the population variance, what can we do?

## The $t$ Distribution

Let $X_1, X_2,\ldots,X_n$ be a random sample from a normal distribution $\mathcal{N}(\mu, \sigma^2)$. Let

$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2$

Then the random variable

$T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$

has the $t$ distribution with $v = n-1$ degrees of freedom, $t_{n-1}$.

Remark
The $t$-distribution can still be used even if the population distribution is not normal, as long as

• the sample size $n$ is large;
• the population distribution is not too skewed.

The following graph shows the pdf of the $t$ distribution for different degrees of freedom. As the degrees of freedom $v$ get large, the pdf gets closer to the standard normal curve.
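The convergence to the normal curve can also be seen numerically. The sketch below compares the density at 0 and the 0.975 quantile for a few degrees of freedom; the particular values in `dfs` are arbitrary choices for illustration.

```r
dfs <- c(1, 5, 30, 100)          # illustrative degrees of freedom

t_dens <- dt(0, df = dfs)        # height of the t pdf at 0
t_q    <- qt(0.975, df = dfs)    # 0.975 quantile of the t distribution

data.frame(df = dfs, density_at_0 = t_dens, q975 = t_q)

# Standard normal values for comparison
c(density_at_0 = dnorm(0), q975 = qnorm(0.975))
```

As `df` grows, the density at 0 rises toward `dnorm(0)` and the quantile shrinks toward `qnorm(0.975)`, reflecting the lighter tails.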

### Example 3: qt

A chemical engineer claims that the population mean yield of a certain batch process is 500 grams per milliliter of raw material.

To check this claim he samples 25 batches each month.

If the computed $t$-value falls between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with this claim.

What conclusion should he draw from a sample that has a mean $\bar{x} = 518$ grams per milliliter and a sample standard deviation $s = 40$ grams? Assume the distribution of yields to be approximately normal.

We know $\mu = 500, \bar{x}=518, s = 40, n = 25$.

Thus the degrees of freedom are $v = n-1 = 24$.

Then

$t = \frac{518-500}{40/\sqrt{25}} = 2.25$

To find $\pm t_{0.05}$, we use the quantile function `qt`. With $v = 24$ degrees of freedom, it gives $t_{0.05} = 1.710882$ and $-t_{0.05} = -1.710882$.


Since the computed value $t = 2.25$ lies above $t_{0.05} = 1.711$, the sample is not consistent with $\mu = 500$; it is more reasonable to believe that $\mu > 500$.

Hence, the engineer is likely to conclude that the process produces a better product than he thought.
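The `qt` calls themselves are not shown in the notes. A minimal sketch that reproduces the numbers in this example (variable names are illustrative):

```r
mu0   <- 500   # claimed population mean
x_bar <- 518   # sample mean
s     <- 40    # sample standard deviation
n     <- 25    # sample size

t_value <- (x_bar - mu0) / (s / sqrt(n))               # 2.25
t_crit  <- qt(0.05, df = n - 1, lower.tail = FALSE)    # t_{0.05} with 24 df

c(t_value = t_value, lower = -t_crit, upper = t_crit)
abs(t_value) < t_crit   # FALSE: t falls outside (-t_crit, t_crit)
```

Here $t_{0.05}$ denotes the value with an area of 0.05 to its right, which is why `lower.tail = FALSE` is used; `qt(0.95, df = 24)` gives the same number.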

### In-class Exercise: $t$ Distribution

A manufacturing firm claims that the batteries used in their electronic games will last an average of 30 hours.

To maintain this average, 16 batteries are tested each month.

If the computed $t$-value falls between $-t_{0.025}$ and $t_{0.025}$, the firm is satisfied with its claim.

What conclusion should the firm draw from a sample that has a mean of $\bar{x}= 27.5$ hours and a standard deviation of $s = 5$ hours? Assume the distribution of battery lives to be approximately normal.

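Answer: the computed value is $t = \frac{27.5-30}{5/\sqrt{16}} = -2$, and with $v = 15$ degrees of freedom $\pm t_{0.025} = \pm 2.13145$, so $t$ falls inside the interval.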
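One way to check this exercise in R, following the same steps as Example 3 (variable names are illustrative):

```r
mu0   <- 30     # claimed mean battery life (hours)
x_bar <- 27.5   # sample mean
s     <- 5      # sample standard deviation
n     <- 16     # sample size

t_value <- (x_bar - mu0) / (s / sqrt(n))               # -2
t_crit  <- qt(0.025, df = n - 1, lower.tail = FALSE)   # t_{0.025} with 15 df

c(t_value = t_value, lower = -t_crit, upper = t_crit)
abs(t_value) < t_crit   # TRUE: t falls inside the interval
```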


# Sampling Distribution of Sample Variance

## The Chi-Squared Distribution

The chi-squared distribution is typically used when we want to draw a conclusion about the variance of a population.

If $S^2$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^2$, then the statistic

$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \frac{(X_i-\bar{X})^2}{\sigma^2}$

has a chi-squared distribution with $v = n - 1$ degrees of freedom.

The probability that a random sample produces a $\chi^2$ value greater than some specified value is equal to the area under the curve to the right of this value.

It is customary to let $\chi^2_\alpha$ represent the $\chi^2$ value above which we find an area of $\alpha$. This is illustrated by the shaded region in the following figure.

### Example 4: qchisq

A manufacturer of car batteries guarantees that the batteries will last, on average, 3 years with a standard deviation of 1 year.

If five of these batteries have lifetimes of $1.9, 2.4, 3.0, 3.5,$ and $4.2$ years, should the manufacturer still be convinced that the batteries have a standard deviation of 1 year?

Assume that the battery lifetime follows a normal distribution. Let's consider $\alpha = 0.05$.

$S^2 = \frac{1}{n(n-1)}\left[ n\sum_{i=1}^n X_i^2 - \left(\sum_{i=1}^n X_i\right)^2\right] = \frac{(5) (48.26) -(15)^2}{(5) (4)} = 0.815$

Then

$\chi^2_4 = \frac{(n-1)S^2}{\sigma^2} = \frac{(4)(0.815)}{1} =3.26$

Next, we want to find $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ with $v$ degrees of freedom, where $\alpha = 0.05$ and $v = 4$. Using `qchisq`, these are 0.4844186 and 11.14329, respectively.


Since $95\%$ of the $\chi^2$ values with 4 degrees of freedom fall between $0.484$ and $11.143$, the computed value with $\sigma^2=1$ is reasonable.
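The `qchisq` calls are not shown in the notes. A minimal sketch that reproduces the numbers in this example, assuming $\sigma^2 = 1$ as claimed by the manufacturer (variable names are illustrative):

```r
x <- c(1.9, 2.4, 3.0, 3.5, 4.2)   # observed battery lifetimes (years)
n <- length(x)
sigma2 <- 1                       # claimed population variance

s2   <- var(x)                    # sample variance, 0.815
chi2 <- (n - 1) * s2 / sigma2     # test statistic, 3.26

# Values that bound the central 95% of the chi-squared distribution with 4 df
lower <- qchisq(0.025, df = n - 1)   # chi^2_{0.975}: area 0.975 to its right
upper <- qchisq(0.975, df = n - 1)   # chi^2_{0.025}: area 0.025 to its right

c(chi2 = chi2, lower = lower, upper = upper)
```

Note the convention: $\chi^2_\alpha$ has area $\alpha$ to its right, while `qchisq(p, df)` takes the area to the left by default, so $\chi^2_{0.025}$ corresponds to `qchisq(0.975, df)`.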

### In-class Exercise: $\chi^2$ Distribution

1. For a chi-squared distribution, find

• $\chi^2_{0.025}$ when $v = 15$ (answer: 27.48839);
• $\chi^2_{0.01}$ when $v = 7$ (answer: 18.47531);
• $\chi^2_{0.05}$ when $v = 24$ (answer: 36.41503).

2. The scores on a placement test given to college freshmen for the past five years are approximately normally distributed with a mean $\mu= 74$ and a variance $\sigma^2 =8$. Would you still consider $\sigma^2 = 8$ to be a valid value of the variance if a random sample of 20 students who take the placement test this year obtain a value of $s^2 = 20$?

Answer: $\chi^2 = \frac{(19)(20)}{8} = 47.5$, while the $\chi^2$ values that bound the central $95\%$ of the distribution with $v = 19$ are 8.906516 and 32.85233.
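A possible way to check both parts of this exercise in R. The sketch assumes the upper-tail convention for $\chi^2_\alpha$ used in this section; variable names are illustrative.

```r
# Part 1: upper-tail quantiles chi^2_alpha (area alpha to the right)
q1 <- qchisq(0.025, df = 15, lower.tail = FALSE)
q2 <- qchisq(0.01,  df = 7,  lower.tail = FALSE)
q3 <- qchisq(0.05,  df = 24, lower.tail = FALSE)
c(q1, q2, q3)

# Part 2: test statistic and the central 95% bounds with 19 df
n <- 20
s2 <- 20
sigma2 <- 8
chi2   <- (n - 1) * s2 / sigma2                 # 47.5
bounds <- qchisq(c(0.025, 0.975), df = n - 1)   # lower and upper bounds
chi2 > bounds[2]   # TRUE: the statistic lies above the upper bound
```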


## The $F$ Distribution

Suppose that we have a random sample of $m$ observations from the normal population $\mathcal{N}(\mu_1,\sigma_1^2)$ and an independent random sample of $n$ observations from a second normal population $\mathcal{N}(\mu_2,\sigma_2^2)$.

Then,

$F_{m-1,n-1} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2},$

follows an $F$ distribution with $v_1 = m-1$ and $v_2 = n-1$ degrees of freedom, where $S_1$, $S_2$ and $\sigma_1$, $\sigma_2$ are the sample and population standard deviations of sample 1 and sample 2, respectively.

This $F$ statistic can be used to compare the variances of two independent groups.

Here are some density curves of the $F$ distribution with different degrees of freedom.

Writing $f_\alpha(v_1,v_2)$ for the value with an area of $\alpha$ to its right, the lower- and upper-tail values are related by

$f_{1-\alpha}(v_1,v_2) = \frac{1}{f_\alpha(v_2,v_1)}$

The $F$-distribution is widely applied in comparing sample variances, in problems involving two or more samples.
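The reciprocal relation between lower- and upper-tail $F$ values can be verified numerically. In this sketch, `alpha`, `v1`, `v2` and the helper `f_upper` are hypothetical choices for illustration, not taken from the notes.

```r
alpha <- 0.05   # illustrative tail area
v1 <- 6         # illustrative numerator degrees of freedom
v2 <- 10        # illustrative denominator degrees of freedom

# Hypothetical helper: f_alpha(df1, df2), the value with area alpha to its right
f_upper <- function(a, df1, df2) qf(a, df1, df2, lower.tail = FALSE)

lhs <- f_upper(1 - alpha, v1, v2)    # f_{1-alpha}(v1, v2)
rhs <- 1 / f_upper(alpha, v2, v1)    # 1 / f_alpha(v2, v1)
all.equal(lhs, rhs)
```

Note that the degrees of freedom are swapped on the right-hand side; forgetting the swap is a common mistake when reading lower-tail values from an upper-tail table.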

### Example 5: qf

Pull-strength tests on 10 soldered leads for a semiconductor device yield the following results, in pounds of force required to rupture the bond: 19.8 12.7 13.2 16.9 10.6 18.8 11.1 14.3 17.0 12.5

Another set of 8 leads was tested after encapsulation to determine whether the pull strength had been increased by encapsulation of the device, with the following results: 24.9 22.8 23.6 22.1 20.4 21.6 21.8 22.5

Comment on the evidence available concerning equality of the two population variances.

With $m = 10$ and $n = 8$, the ratio of the two sample variances is $s_1^2/s_2^2 = 5.657436$.


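Next, we want to compare this value with $F_{0.995,m-1,n-1}$ and $F_{0.005,m-1,n-1}$.

If the $F$-value falls into the interval $(F_{0.995,m-1,n-1},F_{0.005,m-1,n-1})$, the data are consistent with equal variances at the $1\%$ significance level.

With $m-1 = 9$ and $n-1 = 7$, `qf` gives $F_{0.995,9,7} = 0.1452452$ and $F_{0.005,9,7} = 8.513823$.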


The computed value 5.657 falls into this interval, so the data do not provide strong evidence that the two population variances differ.
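The R code for this example is not preserved in the notes. A minimal sketch that reproduces the three numbers; the vector names `before` and `after` are illustrative, while `fvalue` follows the name used in the text.

```r
before <- c(19.8, 12.7, 13.2, 16.9, 10.6, 18.8, 11.1, 14.3, 17.0, 12.5)
after  <- c(24.9, 22.8, 23.6, 22.1, 20.4, 21.6, 21.8, 22.5)
m <- length(before)   # 10
n <- length(after)    # 8

# Under equal population variances the F statistic reduces to s1^2 / s2^2
fvalue <- var(before) / var(after)

# Bounds of the central 99% of the F distribution with (9, 7) df
lower <- qf(0.005, df1 = m - 1, df2 = n - 1)   # F_{0.995}: area 0.995 to its right
upper <- qf(0.995, df1 = m - 1, df2 = n - 1)   # F_{0.005}: area 0.005 to its right

c(fvalue = fvalue, lower = lower, upper = upper)
lower < fvalue && fvalue < upper   # TRUE
```

R also provides `var.test(before, after)`, which carries out the same comparison and reports a p-value.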

# Conclusion

1. With known variance and a large enough sample size, the sample mean and the difference between two sample means are approximately normally distributed.

2. With unknown variance and a sample from a normal population, the standardized sample mean $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ follows a $t$ distribution with $n-1$ degrees of freedom.

3. The $\chi^2$ distribution is used for the sample variance.

4. The $F$ distribution is used for the ratio of two sample variances.

Reference: Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye, Probability & Statistics for Engineers & Scientists, 9th Edition, Prentice Hall.