# Objectives

  • Understand the uniform distribution and the normal distribution
  • Know how to generate uniform and normal random numbers
  • Know how to use the q- functions (for example qnorm) to find the quantiles of the normal distribution

# Uniform Distribution

Uniform Random Variable
A uniform random variable is a continuous random variable for which every outcome in an interval is equally likely.
  • Example: $X$ is a random real number drawn from $[0,1]$.

  • We can use runif(n) to generate n random numbers drawn from $[0,1]$.

n <- 10 # sample size
x <- runif(n) # generate n random numbers from the interval [0,1]
hist(x, probability = TRUE, col = gray(.9), main = "uniform on [0,1]")
curve(dunif(x, 0, 1), add = TRUE, col = "red") # overlay the U[0,1] density

What happens as the sample size increases?

n <- 1e6 # sample size
x <- runif(n) # generate n random numbers from the interval [0,1]
hist(x, probability = TRUE, col = gray(.9), main = "uniform on [0,1]")
curve(dunif(x, 0, 1), add = TRUE, col = "red") # overlay the U[0,1] density
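One way to put a number on what the two histograms show is to measure how far the bar heights are from the true density value of 1. The following is a minimal sketch, not part of the original exercise: the seed, the ten equal-width bins, and the helper name max_deviation are all assumptions chosen for illustration.

```r
set.seed(514) # arbitrary seed, used only so the run is reproducible

# Largest gap between the histogram bar heights and the U[0,1] density (which is 1).
# max_deviation is a hypothetical helper written for this illustration.
max_deviation <- function(n) {
  x <- runif(n)
  h <- hist(x, breaks = seq(0, 1, length.out = 11), plot = FALSE)
  max(abs(h$density - 1))
}

sizes <- c(10, 1e3, 1e6)
deviations <- sapply(sizes, max_deviation)
names(deviations) <- sizes
print(deviations)
```

The deviation should shrink as n grows, which is the numerical counterpart of the histogram flattening toward the red density line.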

  • Notation: $X \sim U[a,b]$

  • Probability Density Function (pdf):
    $f(x) = \frac{1}{b-a}$ for $a \leq x \leq b$

  • Cumulative Distribution Function (cdf):
    For $a \leq x \leq b$,

    $$F(x) = P(a \leq X \leq x) = \int_a^x f(t)\,dt = \int_a^x \frac{1}{b-a}\,dt = \frac{x-a}{b-a}.$$

  • Expectation & Variance
\begin{align*} \mu = E(X) &= \int_a^b x\frac{1}{b-a}\,dx = \left.\frac{x^2}{2(b-a)}\right|_{a}^{b} = \frac{a+b}{2} \\ \sigma^2 = var(X) &= E(X^2)-[E(X)]^2 = \int_a^b x^2\frac{1}{b-a}\,dx - \left(\frac{a+b}{2}\right)^2 = \left.\frac{x^3}{3(b-a)}\right|_{a}^{b} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12} \end{align*}
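The cdf, mean, and variance formulas for the uniform distribution can be checked numerically in R. The sketch below assumes the arbitrary endpoints a = 1 and b = 4 and an arbitrary evaluation point x0 = 2; it uses the base R function integrate for the integrals.

```r
a <- 1   # assumed lower endpoint (illustration only)
b <- 4   # assumed upper endpoint (illustration only)
x0 <- 2  # assumed point at which to evaluate the cdf

# cdf: punif() should agree with (x - a) / (b - a) on [a, b]
cdf_formula <- (x0 - a) / (b - a)
cdf_r <- punif(x0, min = a, max = b)

# First and second moments by numerical integration against the density
m1 <- integrate(function(x) x * dunif(x, a, b), lower = a, upper = b)$value
m2 <- integrate(function(x) x^2 * dunif(x, a, b), lower = a, upper = b)$value

print(c(cdf_formula = cdf_formula, cdf_r = cdf_r))
print(c(mean_numeric = m1, mean_formula = (a + b) / 2))
print(c(var_numeric = m2 - m1^2, var_formula = (b - a)^2 / 12))
```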

# Important R functions

  • For each distribution, R provides functions to generate random numbers and to evaluate the pdf (or the pmf, for a discrete distribution), the cdf, and the quantiles.

  • The prefixes for these functions are:

| Prefix | Meaning |
|--------|---------|
| r | random number generation |
| d | probability density function or probability mass function |
| p | cumulative distribution function |
| q | quantiles |
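To see how the four prefixes relate to one another, here is a small sketch using the uniform distribution. The interval [0, 2], the point 0.5, the seed, and the sample size are assumptions picked only for illustration.

```r
a <- 0  # assumed lower endpoint
b <- 2  # assumed upper endpoint

# p and q undo each other: the quantile at probability p brings back 0.5
p <- punif(0.5, min = a, max = b)     # P(X <= 0.5)
x_back <- qunif(p, min = a, max = b)

# d integrates to p: the area under the density from a to 0.5
area <- integrate(function(x) dunif(x, min = a, max = b),
                  lower = a, upper = 0.5)$value

# r draws a sample whose empirical proportion approximates p
set.seed(1)
sample_prop <- mean(runif(1e5, min = a, max = b) <= 0.5)

print(c(p = p, x_back = x_back, area = area, sample_prop = sample_prop))
```

The same pattern applies to other distributions by swapping the suffix, for example rnorm, dnorm, pnorm, and qnorm.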

# Example 1

Suppose $X \sim U[-1,1]$.

n <- 1e6 # sample size
x <- runif(n, -1, 1) # generate n random numbers from the interval [-1,1]
hist(x, probability = TRUE, col = gray(.9), main = "uniform on [-1,1]") # probability = TRUE plots relative frequency (density)
curve(dunif(x, -1, 1), add = TRUE, col = "red") # overlay the probability density function of X

punif(0.5, min =-1, max = 1) # find the P(-1 <= X <= 0.5)
[1] 0.75
qunif(0.5, min = -1, max = 1) # find the median of X
[1] 0
mean(x) # find the sample mean
[1] 0.0002108174
var(x) # find the sample variance
[1] 0.3333544
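The sample statistics in Example 1 can be placed next to their theoretical counterparts. This is a sketch under assumptions: the seed is arbitrary, so the sample values will differ slightly from the ones printed in the example.

```r
set.seed(2024)  # arbitrary seed for reproducibility
n <- 1e6
x <- runif(n, -1, 1)

empirical <- mean(x <= 0.5)                   # share of draws at or below 0.5
theoretical <- punif(0.5, min = -1, max = 1)  # exact P(X <= 0.5)

theory_mean <- (-1 + 1) / 2      # (a + b) / 2
theory_var <- (1 - (-1))^2 / 12  # (b - a)^2 / 12

print(c(empirical = empirical, theoretical = theoretical))
print(c(sample_mean = mean(x), theory_mean = theory_mean))
print(c(sample_var = var(x), theory_var = theory_var))
```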

# In-class Exercise: Uniform Distribution

  1. Suppose $X \sim U[-2,3]$. Generate random numbers with sample size $n = 10^6$ (1e6 in R).

    n <- 1e6 # sample size
    x <- runif(n, -2, 3) # generate n random numbers from the interval [-2,3]
  2. Plot a histogram of the values you generated, with relative frequency (density) rather than frequency on the y-axis.

    hist(x, probability=TRUE, col=gray(.9), main="uniform on [-2,3]") # histogram of the relative frequency
  3. Find $P(X \le 0)$, $P(X \le 1)$, and $P(X \ge 1)$ (a little tricky: punif returns $P(X \le x)$, so use the complement).

    punif(0, min =-2, max = 3) # find the P( X <= 0 )
    [1] 0.4
    
    punif(1, min =-2, max = 3) # find the P( X <= 1 )
    [1] 0.6
    
    1 - punif(1, min =-2, max = 3) # find the P( X >= 1 ) = 1 - P( X <= 1 )
    [1] 0.4
    
  4. Find Q1, the median, and Q3 of this random variable $X$. What are the expectation and variance of $X$?

    qunif(0.25, min = -2, max = 3) # Q1
    [1] -0.75
    
    qunif(0.5, min = -2, max = 3) # median
    [1] 0.5
    
    qunif(0.75, min = -2, max = 3) # Q3
    [1] 1.75
    

    The expectation is $\frac{b+a}{2}=\frac{3+(-2)}{2}=0.5$. The variance is $\frac{(b-a)^{2}}{12}=\frac{(3-(-2))^{2}}{12}=\frac{25}{12}\approx 2.08333$.

  5. Find Q1, the median, Q3, the mean, and the variance of the sample you generated. Compare your results with the answers in part 4.

    quantile(x)
            0%        25%        50%        75%       100% 
    -1.9999994 -0.7476943  0.5001483  1.7515364  2.9999982
    
    mean(x)
    [1] 0.5008553
    
    var(x)
    [1] 2.081609
    

# Normal Distribution

  • The normal distribution (also called the Gaussian distribution) is a symmetric distribution that is centered at its mean and spreads out equally in both directions.

    Example: test scores of all ITM 514 students.

  • Notation: $X \sim \mathcal{N}(\mu, \sigma^2)$

    • $\mu$ is the mean of the distribution
    • $\sigma^2$ is the variance of the distribution
  • Probability Density Function (pdf):

    $$f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}},\,\,\,\text{for } -\infty<x<\infty.$$

  • CDF: The cdf of the normal distribution has no closed form, i.e., the integral below cannot be written in terms of elementary functions, so it is evaluated numerically (in R, with pnorm).

$$P(X<x) = P(X \le x) = F(x) = \int_{-\infty}^x f(t)\,dt = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^x e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$$

$$P(x_1<X<x_2) = \int_{x_1}^{x_2} f(t)\,dt = \frac{1}{\sqrt{2\pi}\sigma}\int_{x_1}^{x_2} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt = F(x_2) - F(x_1).$$
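The identity between the area under the density and the difference of two cdf values can be checked numerically. The sketch below assumes arbitrary values for the mean, standard deviation, and interval endpoints; integrate performs the numerical integration of dnorm.

```r
mu <- 2       # assumed mean (illustration only)
sigma <- 1.5  # assumed standard deviation (illustration only)
x1 <- 1       # assumed lower endpoint
x2 <- 4       # assumed upper endpoint

# F(x2) - F(x1) using the built-in cdf
by_cdf <- pnorm(x2, mean = mu, sd = sigma) - pnorm(x1, mean = mu, sd = sigma)

# The same probability as the area under the density between x1 and x2
by_integral <- integrate(function(t) dnorm(t, mean = mu, sd = sigma),
                         lower = x1, upper = x2)$value

print(c(by_cdf = by_cdf, by_integral = by_integral))
```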

# Example 2

Suppose $Z$ is the standard normal random variable, i.e., $Z \sim \mathcal{N}(0, 1)$.

n <- 1e6 # sample size
x <- rnorm(n, mean = 0, sd = 1) # generate the standard normal random numbers
hist(x, probability = TRUE, col = gray(.9), main = "standard normal random numbers") # histogram of the relative frequency
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "red") # overlay the standard normal density

pnorm(0.5, mean =  0, sd = 1) # find the P(X <= 0.5) or P(X < 0.5)
[1] 0.6914625
qnorm(0.5, mean = 0, sd = 1) # find the median of X
[1] 0
mean(x) # find the sample mean
[1] -0.002525208
var(x)
[1] 1.002635

What is the relationship between a general normal random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ and the standard normal random variable? Standardizing $X$ gives

$$Z= \frac{X-\mu}{\sigma} \sim \mathcal{N}(0,1).$$

To calculate a probability involving $X\sim \mathcal{N}(\mu,\sigma^2)$, rewrite it in terms of $Z$:

\begin{eqnarray*} P(a\leq X\leq b) &=& P\left(\frac{a-\mu}{\sigma}\leq \frac{X-\mu}{\sigma}\leq \frac{b-\mu}{\sigma}\right)\\ &=& P\left(\frac{a-\mu}{\sigma}\leq Z\leq \frac{b-\mu}{\sigma}\right). \end{eqnarray*}
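Standardization can be illustrated both by simulation and with the cdf. In this sketch the parameter values, the interval endpoints, the seed, and the sample size are assumptions chosen for illustration.

```r
set.seed(7)  # arbitrary seed for reproducibility
mu <- 50     # assumed mean
sigma <- 8   # assumed standard deviation
a <- 45      # assumed lower endpoint
b <- 60      # assumed upper endpoint

# Simulate X and standardize each draw
x <- rnorm(1e5, mean = mu, sd = sigma)
z <- (x - mu) / sigma
print(c(mean_z = mean(z), sd_z = sd(z)))  # should be close to 0 and 1

# P(a <= X <= b) computed directly and after standardizing the endpoints
direct <- pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)
standardized <- pnorm((b - mu) / sigma) - pnorm((a - mu) / sigma)
print(c(direct = direct, standardized = standardized))
```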

# Example 3

The achievement scores from a college entrance examination are normally distributed with mean 75 and standard deviation 10. What fraction of the scores lies between 80 and 90?

Let $X$ be a score, so $X\sim \mathcal{N}(75,10^2)$. Then

$$P(80\leq X\leq 90) = P\left(\frac{80-75}{10}\leq Z\leq \frac{90-75}{10}\right) = P(0.5\leq Z\leq 1.5).$$

p1<- pnorm(1.5, mean = 0, sd = 1) - pnorm(0.5, mean =  0, sd = 1) # find the P(0.5 <=Z <=1.5) 
p1 # display the answer
[1] 0.2417303
p2 <- pnorm(90, mean = 75, sd = 10) - pnorm(80, mean = 75, sd = 10) # find the P(80 <=X <=90)
p2 # display the answer
[1] 0.2417303

# Example 4

Find the value of $z_0$ such that $95\%$ of the standard normal $Z$ values lie between $-z_0$ and $z_0$; that is, $P(-z_0\leq Z\leq z_0) = 0.95$.

Because the standard normal density is symmetric about 0, $P(Z \ge z_0) = P(Z \le -z_0)$, so

$$P(-z_0\le Z \le z_0) = P(Z \le z_0) - P(Z \le -z_0) = (1- P(Z \ge z_0)) - P(Z \le -z_0) = 1- 2P(Z \le -z_0) = 0.95 \;\Rightarrow\; P(Z \le -z_0) = 0.025.$$

How do we find $z_0$ in R? The 0.025 quantile of $Z$ is $-z_0$, so $z_0$ is the negative of qnorm(0.025).

z0 <- -qnorm(0.025, mean = 0, sd = 1) # -z0 is the 0.025 quantile of Z, so negate it
z0 # display the answer
[1] 1.959964
pnorm(z0, mean = 0, sd = 1) - pnorm(-z0, mean = 0, sd = 1) # check: P(-z0 <= Z <= z0)
[1] 0.95
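The reasoning in this example works for any central probability, not only 95%. The sketch below wraps it in a small function; central_z is a hypothetical helper name introduced here, and the three levels are assumptions chosen for illustration.

```r
# z0 such that P(-z0 <= Z <= z0) = level for a standard normal Z.
# central_z is a hypothetical helper, not a base R function.
central_z <- function(level) {
  tail_prob <- (1 - level) / 2  # probability left in each tail
  -qnorm(tail_prob)             # equals qnorm(1 - tail_prob) by symmetry
}

lvls <- c(0.90, 0.95, 0.99)
z0 <- central_z(lvls)
coverage <- pnorm(z0) - pnorm(-z0)  # should reproduce the requested levels

print(data.frame(level = lvls, z0 = z0, coverage = coverage))
```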

# In-class Exercise

Suppose $Z \sim \mathcal{N}(0,1)$.
Can you find a $y>0$ such that $P(-y \le Z \le y) = 0.01$?

The critical value $z_{\alpha}$ of a standard normal distribution is the value on the measurement axis for which an area of $\alpha$ under the standard normal curve lies to the right of $z_{\alpha}$; that is, $P(Z \ge z_{\alpha}) = \alpha$.
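A critical value can be computed with qnorm in two equivalent ways. The following sketch assumes $\alpha = 0.05$ purely as an illustration; lower.tail is a standard argument of qnorm and pnorm.

```r
alpha <- 0.05  # assumed right-tail area (illustration only)

# Quantile with area 1 - alpha to its left
z_alpha <- qnorm(1 - alpha)

# Same value, asking for the upper tail directly
z_alpha_upper <- qnorm(alpha, lower.tail = FALSE)

# Area to the right of z_alpha, which should equal alpha
right_area <- pnorm(z_alpha, lower.tail = FALSE)

print(c(z_alpha = z_alpha, z_alpha_upper = z_alpha_upper, right_area = right_area))
```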

# In-class Exercise: Normal Distribution

  1. Suppose $X \sim \mathcal{N}(-1,3^2)$. Generate random numbers with sample size $n = 10^6$ (1e6 in R).

    #1 Create the sample
    n <- 1e6
    x <- rnorm(n, mean = -1, sd = 3)
  2. Plot a histogram of the values you generated, with relative frequency (density) rather than frequency on the y-axis.

    #2 A histogram of x
    hist(x, probability = TRUE)
    curve(dnorm(x, mean = -1, sd = 3), add = T, col = "red")

  3. Find $P(X \le 0)$, $P(X \le 1)$, $P(X \ge 1)$, and $P(0 \le X \le 1)$.

    #3 P(X <=0)
    pnorm(0, mean = -1, sd = 3)
    [1] 0.6305587
    
    # P(X <= 1)
    pnorm(1, mean = -1, sd = 3)
    [1] 0.7475075
    
    # P(X >= 1) = 1 - P(X <= 1)
    1 - pnorm(1, mean = -1, sd = 3)
    [1] 0.2524925
    
    # P(0 <= X <=1) = P(X <= 1) - P(X <= 0), about 0.117
    pnorm(1, mean = -1, sd = 3) - pnorm(0, mean = -1, sd = 3)
  4. Find Q1, the median, and Q3 of this random variable $X$. What are the expectation and variance of $X$?

    #4 Q1
    qnorm(0.25, mean = -1, sd = 3)
    [1] -3.023469
    
    #4 Median
    qnorm(0.5, mean = -1, sd = 3)
    [1] -1
    
    #4 Q3
    qnorm(0.75, mean = -1, sd = 3)
    [1] 1.023469
    
    #4 From the notation N(-1, 3^2), the mean is -1 and the variance is 9
  5. Find Q1, the median, Q3, the mean, and the variance of the sample you generated. Compare your results with the answers in part 4.

    #5 Q1, median, Q3 of the sample
    quantile(x)
             0%        25%        50%        75%       100% 
    -15.773765  -3.029712  -1.010223   1.014308  14.674138
    
    # Mean of the sample
    mean(x)
    [1] -1.008806
    
    #5 Variance of the sample
    var(x)
    [1] 8.978004
    

# Conclusion

The normal distribution is one of the most important distributions in statistics. We will come back to it when we discuss the Central Limit Theorem, confidence intervals, and hypothesis testing.