Data Analysis and Statistical Inference (1)

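I am clearly falling behind the pace of the course. My fundamentals are too weak, and I need to push hard to keep up with one deadline after another.

1. Baffling probability

I really could not wrap my head around this before. So this is how probability works: it is not simply a numerator divided by a denominator with a percent sign added.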

A 2005 survey found that 7% of teenagers (ages 13 to 17) suffer from an extreme fear of spiders (arachnophobia). At a summer camp there are 10 teenagers sleeping in each tent. Assume that these 10 teenagers are independent of each other. What is the probability that at least one of them suffers from arachnophobia?

The probability that any one teenager is not afraid is 0.93. Because the 10 are independent, the probability that none of them is afraid is 0.93^10, so the probability that at least one is afraid is: 1 - 0.93^10

[1] 0.516
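The complement calculation above can be double-checked in R. This is only a small sketch: the variable names are illustrative and not from the course materials, and the cross-check with dbinom simply restates "zero sufferers out of 10" as a binomial probability.

```r
# Probability that a single teenager has arachnophobia (from the survey)
p_fear <- 0.07
# Number of teenagers per tent
n <- 10

# Complement rule: P(at least one) = 1 - P(none)
p_at_least_one <- 1 - (1 - p_fear)^n

# Cross-check: P(none) is the binomial probability of 0 successes in n trials
p_check <- 1 - dbinom(0, size = n, prob = p_fear)

print(round(p_at_least_one, 3))
print(round(p_check, 3))
```

Both lines should agree, since dbinom(0, n, p) is exactly (1 - p)^n.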

2. Even more baffling probability

Last semester, out of 170 students taking a particular statistics class, 71 students were “majoring” in social sciences and 53 students were majoring in pre-medical studies. There were 6 students who were majoring in both pre-medical studies and social sciences. What is the probability that a randomly chosen student is majoring in social sciences, given that s/he is majoring in pre-medical studies?

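I had assumed the answer should be 6/71, that is, (pre-medical & social sciences) / (social sciences).

But the answer is: If M is the event a student is majoring in pre-medical studies and S is the event s/he is majoring in social sciences, then calculate P(S|M) = P(S&M)/P(M) = (6/170)/(53/170) = 6/53. The condition is "majoring in pre-medical studies", so the denominator is the 53 pre-medical students; 6/71 is P(M|S) instead.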
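To see the difference between the two conditional probabilities side by side, here is a minimal R sketch using the counts from the problem. The variable names are made up for illustration.

```r
# Counts from the problem statement
n_total <- 170  # students in the class
n_S     <- 71   # majoring in social sciences
n_M     <- 53   # majoring in pre-medical studies
n_both  <- 6    # majoring in both

# P(S | M) = P(S & M) / P(M): condition on the pre-medical students
p_S_given_M <- (n_both / n_total) / (n_M / n_total)

# P(M | S) = P(S & M) / P(S): the quantity 6/71 actually measures
p_M_given_S <- (n_both / n_total) / (n_S / n_total)

print(p_S_given_M)
print(p_M_given_S)
```

The class size of 170 cancels in both ratios, which is why the answers reduce to 6/53 and 6/71.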

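3. An interesting function

It was pulled out of the Rdata file provided by the course.

The custom function calc_streak, which was loaded in with the data, can be used to calculate the lengths of all shooting streaks. Roughly, it works like this: given a random sequence of the characters H and M, split the sequence at each M and count how many H's fall in each resulting segment.

I am backing the code up here first. It is a real sleight of hand...

Yesterday I misread the intent and thought we were supposed to write this function ourselves. I struggled for several hours without success, trying to split the string with regular expressions and then count the H's. I could not get it to work and was about to fall back on brute-force loops. Seeing this code, what can I do but marvel?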

===

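calc_streak <- function(x){
  # Define a vector y with the same length as x, filled with 0s: rep(0, n)
  y <- rep(0, length(x))
  # Wherever x is "H", set the corresponding entry of y to 1
  y[x == "H"] <- 1
  # Pad both ends with 0 so the first and last streaks are bounded, ready for diff
  y <- c(0, y, 0)
  # wz holds the indices (positions) where y is 0
  wz <- which(y == 0)
  # diff gives the gap between adjacent entries of wz; subtracting 1 gives
  # the number of H's in each segment delimited by the M's
  streak <- diff(wz) - 1
  return(streak)
}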
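To make the trick concrete, here is a short trace on a made-up input. The vector shots is hypothetical, not from the course data, and the function is repeated only so the example runs on its own.

```r
# Same function as above, repeated so this example is self-contained
calc_streak <- function(x){
  y <- rep(0, length(x))
  y[x == "H"] <- 1
  y <- c(0, y, 0)
  wz <- which(y == 0)
  streak <- diff(wz) - 1
  return(streak)
}

# Hypothetical sequence of hits (H) and misses (M)
shots <- c("H", "M", "M", "H", "H", "M", "H", "H", "H")

# Intermediate values, to follow the logic by hand:
#   y (padded)  : 0 1 0 0 1 1 0 1 1 1 0
#   which(y==0) : 1 3 4 7 11
#   diff        : 2 1 3 4
#   diff - 1    : 1 0 2 3
streaks <- calc_streak(shots)
print(streaks)  # 1 0 2 3
```

Note the 0 in the result: two consecutive misses produce a streak of length zero between them.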

 

4. A simple simulation

In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. As a simple example, you can simulate flipping a fair coin with the following.

outcomes <- c("heads", "tails")
sample(outcomes, size = 1, replace = TRUE)

The vector outcomes can be thought of as a hat with two slips of paper in it: one slip says “heads” and the other says “tails”. The function sample draws one slip from the hat and tells us if it was a head or a tail. Run the second command listed above several times. Just like when flipping a coin, sometimes you’ll get a heads, sometimes you’ll get a tails, but in the long run, you’d expect to get roughly equal numbers of each.

# Unfair coin: heads with probability 0.2, tails with probability 0.8, flipped 100 times
sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))

# Draw 3 values from 0, 1, 2, 3 with replacement
outcomes <- c(0, 1, 2, 3)
sample(outcomes, size = 3, replace = TRUE)
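The "roughly equal numbers in the long run" claim can be checked with a slightly longer run. This is only a sketch: the seed, the number of flips (1000), and the variable names are arbitrary choices for illustration, not part of the course lab.

```r
# Fix the seed only so the run is repeatable (an assumption, not required)
set.seed(42)

coin <- c("heads", "tails")

# Flip a fair coin 1000 times
flips <- sample(coin, size = 1000, replace = TRUE)

# Proportion of each outcome over all flips
prop <- table(flips) / length(flips)
print(prop)

# Running proportion of heads after each flip
running_heads <- cumsum(flips == "heads") / seq_along(flips)

# Compare the proportion early on and at the end
print(running_heads[c(10, 100, 1000)])
```

With few flips the proportion of heads can be far from 0.5; it should settle closer to 0.5 as the number of flips grows.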
