Introduction to Statistics: 02-223 How to Analyze Your Own Genome

SLIDE 1

Introduction to Statistics

02-223 How to Analyze Your Own Genome, Fall 2013

SLIDE 2

Why Use Statistics?

Anecdotal evidence is unreliable

– Why does the phone always ring when you're in the shower?
– Or, why do you have an increased risk for breast cancer when you have a mutation in the BRCA gene?

SLIDE 3

Overview

Statistics

– Mean
– Variance
– Covariance
– Correlation

Probability

– Probability mass function for discrete random variables
– Probability density function for continuous random variables

SLIDE 4

Mean of Green Pea Height

Heights: 3 inches, 5 inches, 2 inches, 6 inches

Mean = (3 + 5 + 2 + 6)/4 = 4 inches

SLIDE 5

Describing the Center of Data Points

Let y denote a quantitative variable, with observations y_1, y_2, y_3, …, y_n

Then, the mean of these observations is given as:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
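As a minimal sketch, the mean can be computed in Python as follows. The function name `mean` and the variable names are illustrative choices, not part of the slides; the data are the four pea heights from the previous slide.

```python
def mean(observations):
    """Return the arithmetic mean: the sum of the observations divided by n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(observations) / n


# Green pea heights in inches, taken from the earlier slide.
pea_heights = [3, 5, 2, 6]
pea_mean = mean(pea_heights)
print(pea_mean)  # 4.0
```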

SLIDE 6

Variance

First sample: 3 inches, 5 inches, 2 inches, 6 inches
Mean = (3 + 5 + 2 + 6)/4 = 4 inches

Second sample: 4.5 inches, 3.5 inches, 3.9 inches, 4.1 inches
Mean = (4.5 + 3.5 + 3.9 + 4.1)/4 = 4 inches

SLIDE 7

Variance

First sample: 3 inches, 5 inches, 2 inches, 6 inches
Mean = (3 + 5 + 2 + 6)/4 = 4 inches
(height - mean): -1 inches, 1 inches, -2 inches, 2 inches
Variance = ((-1)^2 + 1^2 + (-2)^2 + 2^2)/(4 - 1) = 3.33

Second sample: 4.5 inches, 3.5 inches, 3.9 inches, 4.1 inches
Mean = (4.5 + 3.5 + 3.9 + 4.1)/4 = 4 inches
(height - mean): 0.5 inches, -0.5 inches, -0.1 inches, 0.1 inches
Variance = (0.5^2 + (-0.5)^2 + (-0.1)^2 + (0.1)^2)/(4 - 1) = (0.25 + 0.25 + 0.02)/3 = 0.173
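The arithmetic on this slide can be checked with Python's standard `statistics` module, whose `variance` function divides by n - 1 as the slide does. This is only a sketch; the variable names are illustrative.

```python
import statistics

# The two pea-height samples from the slide (inches).
first_sample = [3, 5, 2, 6]
second_sample = [4.5, 3.5, 3.9, 4.1]

# Both samples share the same mean ...
first_mean = statistics.mean(first_sample)
second_mean = statistics.mean(second_sample)

# ... but their sample variances (divisor n - 1) differ widely.
first_variance = statistics.variance(first_sample)
second_variance = statistics.variance(second_sample)

print(f"first sample:  mean={first_mean:.2f}, variance={first_variance:.3f}")
print(f"second sample: mean={second_mean:.2f}, variance={second_variance:.3f}")
```

The two samples have the same center, but the first is far more spread out, which is exactly what the variance captures.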

SLIDE 8

Describing the Variability of Data Points

How to compute variance and standard deviation (a "typical" distance from the mean)

– The deviation of observation i from the mean $\bar{y}$ is $y_i - \bar{y}$
– The variance of the n observations is $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
– The standard deviation s is the square root of the variance: $s = \sqrt{s^2}$
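The three steps above (deviations, variance, standard deviation) can be sketched from scratch as follows. The helper names `deviations`, `sample_variance`, and `sample_std` are hypothetical and chosen only for this illustration; the variance uses the n - 1 divisor.

```python
import math


def deviations(ys):
    """Deviation of each observation from the sample mean."""
    y_bar = sum(ys) / len(ys)
    return [y - y_bar for y in ys]


def sample_variance(ys):
    """Sum of squared deviations divided by n - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum(d * d for d in deviations(ys)) / (n - 1)


def sample_std(ys):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(ys))


heights = [3, 5, 2, 6]  # inches, from the pea example
height_deviations = deviations(heights)
height_variance = sample_variance(heights)
height_std = sample_std(heights)
print(height_deviations)  # [-1.0, 1.0, -2.0, 2.0]
print(round(height_variance, 2), round(height_std, 2))
```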

SLIDE 9

Properties of Standard Deviation

s ≥ 0, and only equals 0 if all observations are equal
s increases with the amount of variation around the mean
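Division by n - 1 (not n) makes the sample variance an unbiased estimate of the population variance, because deviations are measured from the sample mean rather than the true mean
s depends on the units of the data (e.g. measuring in cm vs. inches)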
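These properties can be checked numerically. The sketch below uses the standard `statistics` module (`stdev` divides by n - 1, `pstdev` divides by n) and reuses the pea heights from the earlier slides; the conversion of 2.54 cm per inch and the variable names are the only additions.

```python
import statistics

heights_in = [3, 5, 2, 6]                    # inches
heights_cm = [h * 2.54 for h in heights_in]  # same data in centimeters

s_in = statistics.stdev(heights_in)
s_cm = statistics.stdev(heights_cm)

# s equals 0 when all observations are equal.
s_equal = statistics.stdev([4, 4, 4, 4])

# s grows with the amount of variation around the mean.
s_tight = statistics.stdev([4.5, 3.5, 3.9, 4.1])

# Dividing by n instead of n - 1 gives a smaller value.
s_divide_by_n = statistics.pstdev(heights_in)

print(f"s (inches) = {s_in:.3f}, s (cm) = {s_cm:.3f}")
print(f"s for equal observations = {s_equal}")
print(f"s for the tighter sample = {s_tight:.3f}")
print(f"s with divisor n = {s_divide_by_n:.3f}")
```

Changing units rescales s by the same factor as the data, which is why standard deviations in different units cannot be compared directly.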

SLIDE 10

Correlation

"GPA" and "TV in hours per week" are negatively correlated

How can we quantify the level of correlation?

Mean: GPA = 3.02, TV = 13.8 hours per week

SLIDE 11

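Covariance and Correlation

Degree of association between two variables x and y
Given observations x_1, …, x_n and y_1, …, y_n

– Covariance: $\mathrm{cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
– Correlation: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

In the denominator, $\sum (x_i - \bar{x})^2$ = (variance of the x_i's) × (n - 1) and $\sum (y_i - \bar{y})^2$ = (variance of the y_i's) × (n - 1)

The correlation falls between -1 and +1, with the sign indicating the direction of association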
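A minimal sketch of both quantities in Python follows. The function names are illustrative, the covariance is assumed to use the n - 1 divisor to match the variance above, and the GPA and TV values are made up for this example (the slide's data table is not available).

```python
import math


def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Sum of products of deviations, scaled by the root of the product of
    the two sums of squared deviations; the result lies in [-1, +1]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
gpa = [3.8, 3.5, 3.1, 2.9, 2.4]
tv_hours = [4, 8, 12, 15, 25]

gpa_tv_cov = covariance(gpa, tv_hours)
gpa_tv_corr = correlation(gpa, tv_hours)
print(f"covariance = {gpa_tv_cov:.3f}, correlation = {gpa_tv_corr:.3f}")
```

Unlike the covariance, the correlation has no units, so its magnitude can be compared across pairs of variables.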

SLIDE 12

Correlation between X1 and X2

[Figure: plots of X2 against X1]

SLIDE 13

Correlation and Causation

Correlation does not imply causation!

SLIDE 14

Probability Mass Functions (Discrete)

A probability function maps the possible values of x against their respective probabilities of occurrence, P(x)

P(x) is a number from 0 to 1.0.
The area under a probability function is always 1; for a discrete variable, this means the probabilities sum to 1.

Example: Coin flip experiment

[Figure: P(x) plotted against x = 0 and x = 1, with P(x) >= 0]

SLIDE 15

Discrete Example: SNPs at Genome Locus Chr3:11,112

You genotyped the genome locus at Chr3:11,112 for 600 people. You found that 200 people had genotype AA, 300 people had genotype AT, and 100 people had genotype TT.

Probability Mass Function

x       p(x)
AA      p(x=AA) = 1/3
AT      p(x=AT) = 1/2
TT      p(x=TT) = 1/6
Total   1.0

$$\sum_{\text{all } x} P(x) = 1$$

[Figure: p(x) plotted against x for genotypes AA, AT, and TT]
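The genotype probabilities on this slide are relative frequencies: each count divided by the 600 people genotyped. A small sketch, using exact fractions so the values match the slide; the variable names are illustrative.

```python
from fractions import Fraction

# Genotype counts at Chr3:11,112 from the slide (600 people in total).
genotype_counts = {"AA": 200, "AT": 300, "TT": 100}
total_people = sum(genotype_counts.values())

# Empirical probability mass function: count / total for each genotype.
genotype_pmf = {
    genotype: Fraction(count, total_people)
    for genotype, count in genotype_counts.items()
}

for genotype, probability in genotype_pmf.items():
    print(f"p(x={genotype}) = {probability}")

# The probabilities over all genotypes sum to 1.
pmf_total = sum(genotype_pmf.values())
print(f"sum over all x = {pmf_total}")
```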

SLIDE 16

Discrete Example: Roll of a Die

Probability Mass Function

x       p(x)
1       p(x=1) = 1/6
2       p(x=2) = 1/6
3       p(x=3) = 1/6
4       p(x=4) = 1/6
5       p(x=5) = 1/6
6       p(x=6) = 1/6
Total   1.0

$$\sum_{\text{all } x} P(x) = 1$$

[Figure: p(x) plotted against x = 1, …, 6, each with height 1/6]
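As a rough illustration, the die's probability mass function can be written down exactly and compared with the relative frequencies of simulated rolls. The number of rolls and the random seed are arbitrary choices made for this sketch.

```python
import random
from collections import Counter
from fractions import Fraction

# Exact PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
die_pmf_total = sum(die_pmf.values())

# Simulate rolls with a fixed seed so the run is reproducible.
rng = random.Random(0)
n_rolls = 60_000
roll_counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))

# Relative frequencies approximate the PMF as the number of rolls grows.
empirical_pmf = {face: roll_counts[face] / n_rolls for face in range(1, 7)}

for face in range(1, 7):
    print(f"x={face}: exact={float(die_pmf[face]):.4f}, "
          f"simulated={empirical_pmf[face]:.4f}")
```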

SLIDE 17

Probability Density Function (Continuous)

Unlike the discrete case, a density function does not represent probability itself but its rate of change, called the "likelihood"

f(x) >= 0 and integrates to 1.0

[Figure: density f(x) plotted against x]
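One way to see that a density is not a probability: a density value can exceed 1, as long as the total area is 1. The sketch below assumes a uniform density on the interval [0, 0.5] (an example chosen here, not taken from the slides) and integrates it with a simple midpoint rule.

```python
def uniform_density(x, low=0.0, high=0.5):
    """Density of a uniform distribution on [low, high]; zero elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


density_at_quarter = uniform_density(0.25)           # a density value above 1
total_area = integrate(uniform_density, 0.0, 0.5)    # area under the whole curve
prob_interval = integrate(uniform_density, 0.1, 0.2) # P(0.1 <= x <= 0.2)

print(f"f(0.25) = {density_at_quarter}")
print(f"total area = {total_area:.4f}")
print(f"P(0.1 <= x <= 0.2) = {prob_interval:.4f}")
```

Probabilities for a continuous variable come from areas under the density over an interval, not from the height of the curve at a point.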

SLIDE 18

The Gaussian Density

The shape of the Gaussian density function is determined by the mean μ and variance σ^2

[Figure: Gaussian density, labeled with its mean and standard deviation]
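The slide's formula did not survive extraction, so the sketch below uses the standard Gaussian density, f(x) = exp(-(x - μ)^2 / (2σ^2)) / (σ √(2π)). It evaluates a few parameter settings picked for illustration, compares against `statistics.NormalDist` from the standard library, and checks numerically that each density integrates to about 1.

```python
import math
from statistics import NormalDist


def gaussian_density(x, mu, sigma):
    """Standard Gaussian (normal) density with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area_under_density(mu, sigma, steps=10_000):
    """Midpoint-rule integral of the density over mu +/- 8 sigma."""
    a, b = mu - 8.0 * sigma, mu + 8.0 * sigma
    width = (b - a) / steps
    return sum(
        gaussian_density(a + (i + 0.5) * width, mu, sigma) for i in range(steps)
    ) * width


# Illustrative (mu, sigma) pairs; the mean shifts the curve, sigma widens it.
settings = [(0.0, 1.0), (0.0, 0.5), (2.0, 2.0)]
peaks = {}
areas = {}
for mu, sigma in settings:
    peaks[(mu, sigma)] = gaussian_density(mu, mu, sigma)
    areas[(mu, sigma)] = area_under_density(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: peak={peaks[(mu, sigma)]:.4f}, "
          f"area={areas[(mu, sigma)]:.6f}")
```

A smaller σ gives a taller, narrower curve and a larger σ a flatter, wider one, while the area under each curve stays at 1.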

SLIDE 19

Different Gaussian Density Functions

SLIDE 20

Summary

Mean: describes the center of the data cloud
Variance: describes the variability of the data cloud
Covariance: describes the level of association between two variables

Probability mass function for discrete random variables

– Probabilities sum to 1

Probability density function for continuous random variables

– Probabilities integrate to 1
