This document is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The slide set referred to in this document is “GWAS 3”.

We saw previously how a strict significance threshold (like 5e-8) is needed in GWAS in order to avoid reporting false positives (i.e., null variants that reach the threshold). On the other hand, a strict threshold makes it relatively difficult to get even the true positives to reach the threshold. (In other words, we have many false negatives.)

Next we will study the properties of the variants and study design that determine how likely we are to catch the true positives: We study the statistical power in GWAS. (See slides 1-7.) A review on the topic by Sham and Purcell 2014.

**Statistical power** of a statistical significance test is the probability that the test will reject the null hypothesis \(H_0\) (in GWAS, \(H_0\) says that \(\beta=0\)) at the given significance threshold when the data follow a specific *alternative hypothesis* \(H_1\). In the GWAS setting, \(H_1\) is specified by fixing the study design (total sample size, case and controls counts) and parameters describing the variant (MAF and effect size).

To compute the P-value, we only needed to consider the null hypothesis \(H_0\). For power analysis, we also need to define explicitly how the true effects look like, i.e., we need a quantitative alternative hypothesis \(H_1\). Of course, not all true effects are the same, and therefore power analysis is often represented as a power curve over a plausible range of parameter values.

By assuming that the sampling distribution of effect size estimate \(\widehat{\beta}\) is Normal (which assumption is based on the Central Limit Theorem and works well for large sample sizes and common variants), we have that \(\widehat{\beta} \sim \mathcal{N}(\beta,\textrm{SE}^2)\), where \(\beta\) is the true effect size and SE the standard error of the estimator. As we saw last week, it follows that under the null, i.e. when \(\beta=0\), the Wald’s test statistic \(z = \widehat{\beta}/\textrm{SE}\) is distributed approximately as \(z\sim \mathcal{N}(0,1)\), which is used to derive a P-value. The other way to get a P-value is from the chi-square distribution as \(z^2 \sim \chi^2_1\). We will often use the chi-square distribution because then we need to consider only the upper tail of the distribution to compute (the two-sided) P-value, and this makes visualization of the distibutions simpler than with the Normal distribution.

What about if \(\beta \neq 0\), that is, if the variant has a non-zero effect? Then \(z\sim \mathcal{N}(\beta/\textrm{SE}, 1)\) and \(z^2 \sim \chi^2_1((\beta/\textrm{SE})^2)\), which is called a chi-square distribution with 1 degree of freedom and non-centrality parameter NCP=\((\beta/\textrm{SE})^2\). When \(\beta=0\) this reduces to the standard *central* chi-square distribution \(\chi^2_1 = \chi^2_1(0)\).

**Example 3.1.** Let’s illustrate these distributions by a simulation of GWAS results under the alternative and under the null. To save time, we don’t do regressions with genotype-phenotype data but simulate the effect estimates directly from their known distributions. First, however, we need to find the standard error, and that we do by fitting one regression model. We will visualize the distributions both for the Wald statistic (having a Normal distribution) and for its square (having a chi-square distribution).

```
n = 500 #individuals
p = 5000 #SNPs for both null and alternative
f = 0.5 #MAF
b.alt = 0.2 #effect size under the alternative hypothesis
x = rbinom(n, 2, f) #genotypes at 1 SNP for n ind
y = scale( rnorm(n) ) #random phenotype normalized to have sample sd=1
se = summary( lm( y ~ x ) )$coeff[2,2] #pick se, and assume it stays constant and independent of beta
b.hat.null = rnorm(p, 0, se) #estimates under null
b.hat.alt = rnorm(p, b.alt, se) #estimates under alternative
par(mfrow=c(1,2))
# Draw observed densities of z-scores
plot(NULL, xlim = c(-3,6), ylim = c(0,0.5), xlab = "z",
ylab = "density", col = "white") #empty panel for plotting
lines(density( (b.hat.null/se) ), col = "black", lwd = 2) #Wald stat for null variants
lines(density( (b.hat.alt/se) ), col = "red", lwd = 2) #Wald stat for alternative variants
# add theoretical densities for z-scores
x.seq = seq(-3, 6, 0.01)
lines(x.seq, dnorm(x.seq, 0, 1), col = "blue", lty = 2) #for null
lines(x.seq, dnorm(x.seq, b.alt/se, 1), col = "orange", lty = 2) #for alternative
# Draw observed densities of z^2
plot(NULL, xlim = c(0,35), ylim = c(0,1), xlab = expression(z^2),
ylab = "density", col = "white") #empty panel for plotting
lines(density( (b.hat.null/se)^2 ), col = "black", lwd = 2) #chi-square stat for null variants
lines(density( (b.hat.alt/se)^2 ), col = "red", lwd = 2) #chi-square stat for alternative variants
# Let's add theoretical densities of the chi-square distributions
x.seq = seq(0, 35, 0.01)
lines(x.seq, dchisq(x.seq, df = 1, ncp = 0), col = "blue", lty = 2) #ncp=0 for null
lines(x.seq, dchisq(x.seq, df = 1, ncp = (b.alt/se)^2), col = "orange", lty = 2) #ncp = (beta/se)^2 for alternative
legend("topright", leg = c("NULL obs'd","ALT obs'd","NULL theor","ALT theor"),
col = c("black","red","blue","orange"),
lty = c(1,1,2,2), lwd = c(2,2,1,1) )
#Let's add significance thresholds corresponding to 0.05 and 5e-8
#By definition, the thresholds are always computed under the null.
q.thresh = qchisq( c(0.05, 5e-8), df = 1, ncp = 0, lower = FALSE)
abline(v = q.thresh, col = c("darkgreen", "springgreen"), lty = 3)
text( q.thresh+2, c(0.4,0.4), c("P<0.05","P<5e-8") )
```

Theoretical distributions match well with the observed ones so we trust that we now understand the relevant parameters also of the theoretical chi-square distribution. We also see that almost all of the alternative distribution is to the right of the significance threhsold of 0.05 but to the left of threshold 5e-8. Thus, we would discover almost all variants with these parameters at level 0.05 but almost none at the genome-wide significance level of 5e-8.

How to compute the exact proportion of the distribution to the right of a given threshold value?

```
q.thresh = qchisq(c(0.05,5e-8), df = 1, ncp = 0, lower = FALSE) #repeating thresholds in chi-square units
pchisq(q.thresh, df = 1, ncp = (b.alt/se)^2, lower = FALSE) #correspond to right tail probabilities
```

`## [1] 0.89821524 0.01321279`

So we have a probability of 90% to detect a variant at significance threshold 0.05, when effect size is 0.2 sd units of a quantitative phenotype, MAF is 50% and sample size is 500. This probability is also called statistical **power** corresponding to the given parameters (significance threshold, \(\beta\), MAF and \(n\)). On the other hand, with these parameters, we only have a power of 1.3% at the genome-wide significance level 5e-8. A frequentist interpretation is that we are likely to discover 90 out of 100 variants having the parameters considered at the significance level 0.05 but only about 1 in 100 at the level 5e-8.

**Question.** What is “power” under the null? This may be a confusing question since we typically use power to describe the ability to detect *non-zero effects*, but, technically, we can interpret this question as asking for the probability of getting a significant result when the null hypothesis holds. And, by definition, this probability is the significance threshold \(\alpha\), and does not depend on \(n\) or MAF. Consequently, the power to detect any non-zero effect can never be less than \(\alpha\) and will get close to \(\alpha\) for tiny effects (\(\beta \approx 0\)) which are almost indistinguishable from 0.

Typically, we would like to design studies to have a good power (say at least 80%) to detect the types of variants that we are interested in. How can we do that?

The parameters affecting power are

Sample size \(n\); increasing sample size increases power because it increases accuracy of effect size estimate.

Effect size \(\beta\); increasing the absolute value of effect size increases power because it increases difference from the null model.

Minor allele frequency \(f\); increasing MAF increases power because it increases accuracy of effect size estimate.

Significance threshold \(\alpha\); increasing the threshold increases power because larger significance thresholds are easier to achieve.

In case-control studies, the proportion of cases \(\phi\); moving \(\phi\) closer to 0.5 increases power because it increases accuracy of effect size estimate.

We’ll soon discuss intuition why each of these parameters affects power. But let’s first write down how these parameters and power are tied together.

For a given significance level \(\alpha\), power is determined by the non-centrality parameter NCP\(=(\beta/\textrm{SE})^2\) of the chi-square distribution. The mean of the distribution is 1+NCP; hence the larger the NCP, the larger the power, because the distribution moves to right with increasing NCP. We see that the NCP increases with \(\beta^2\) and this explains why increasing \(|\beta|\) increases power. We also see that the NCP increases as SE decreases and therefore we need to know how SE depends on \(n\), \(f\) and \(\phi\).

For the linear model \[y = \mu + x\beta + \varepsilon,\] SE of \(\widehat{\beta}\) is \[\textrm{SE}_\textrm{lin} = \frac{\sigma}{\sqrt{\textrm{Var}(x) n}} \approx \frac{\sigma}{\sqrt{2 f (1-f) n}},\] where the variance of genotype \(x\) is, under Hardy-Weinberg equilibrium, approximately \(2f(1-f)\), and \(\sigma\) is the standard deviation of the error term \(\varepsilon\): \(\sigma^2 = \textrm{Var}(y) - \beta^2 \textrm{Var}(x).\) This form for SE is a direct consequence of the variance estimate of the mean-centered linear model: \(\textrm{Var}(\widehat{\beta})=\sigma^2/ \sum_{i=1}^n x_i^2 .\)

In a typical QT-GWAS, the effects of variants on the total phenotypic variance are small (< 1%) and then we can assume that the error variance \(\sigma^2 \approx \textrm{Var}(y),\) which is 1 if the phenotype is processed by quantile normalisation or scaling of the residuals after regressing out the covariate effects.

For *binary* case-control data analyzed by logistic regression, \[\textrm{SE}_\textrm{bin} \approx \frac{1}{\sqrt{\textrm{Var}(x) n \phi (1-\phi)}} \approx \frac{1}{\sqrt{2 f (1-f) n \phi (1-\phi)}}.\] Thus, the difference to the linear model formula is that \(\sigma\) is replaced by 1 and \(n\) is replaced by the **effective sample size** \(n \phi (1-\phi).\) For derivation, see Appendix A of Vukcevic et al. 2012.

Smaller SE means higher precision of the effect size estimate. Both of the formulas show how SE decreases with increasing sample size \(n\), with increasing MAF \(f\) and, for binary data, SE decreases as \(\phi\) gets closer to 0.5. These formulas work well for typical GWAS settings, but may not hold when some parameter (\(n\) or \(f\) or \(\phi(1-\phi)\)) gets close to zero. To know exactly when the formulas start to break down, it is best to do simulations. In particular, the formula may not be good for the rare variant testing (\(f<0.001\)).

Now we can write down the NCPs of additive GWAS models as \[\textrm{NCP}_\textrm{lin} = (\beta/\textrm{SE}_\textrm{lin})^2 \approx 2 f (1-f) n \beta^2/\sigma^2 \qquad \textrm{ and } \qquad \textrm{NCP}_\textrm{bin} = (\beta/\textrm{SE}_\textrm{bin})^2 \approx 2 f (1-f) n \phi(1-\phi) \beta^2.\]

Let’s next discuss how and why each parameter affects the NCPs, and hence power, the way it does.

Out of the parameters affecting power, the sample size is most clearly under the control of study design. Therefore, it is the primary parameter by which we can design studies of sufficient power.

Increasing \(n\) decreases SE in the regression models in proportion to \(1/\sqrt{n}\). In the GWAS context, we can think that larger \(n\) leads to more accurate estimate of the phenotypic means in each of the genotype classes. Therefore, as \(n\) grows, we also have more accurate estimate of the phenotypic difference between the genotype classes, which means that we are better able to distinguish a true phenotypic difference between genotype groups from zero. In other words, we have a larger power to detect a genotype’s effect on the phenotype.

**Example 3.2.** Above we saw that with \(n=500\) (and MAF = 0.5, \(\beta=0.2\)) we had only 1% power at significance level \(\alpha=\) 5e-8. Let’s determine how large \(n\) should be to achieve 90% power.

```
f = 0.5
b.alt = 0.2
sigma = sqrt(1 - 2*f*(1-f)*b.alt^2) #error sd after SNP effect is accounted for (see next part for explanation)
ns = seq(500, 4000, 10) #candidate values for n
ses = sigma/sqrt(ns*2*f*(1-f)) #SEs corresponding to each candidate n
q.thresh = qchisq(5e-8, df = 1, ncp = 0, lower = F) #chi-sqr threshold corresp alpha=5e-8
pwr = pchisq(q.thresh, df = 1, ncp=(b.alt/ses)^2, lower=F) #power at alpha=5e-8 for VECTOR of SE values
plot(ns, pwr, col = "darkgreen", xlab = "n", ylab = "power",
main = paste0("QT sd=1; MAF=",f,"; beta=",b.alt), t = "l", lwd = 1.5)
abline(h = 0.9, lty = 2)
```

```
#Let's output the first n that gives power >= 90%
ns[min(which(pwr >= 0.9))]
```

`## [1] 2230`

So, we need \(n=2230\) in order to have power of 90%.

When effect size is \(\beta\) and MAF is \(f\), then variance explained by the additive effect on the genotype is \(\textrm{Var}(x\beta) = \textrm{Var}(x) \beta^2 \approx 2f(1-f)\beta^2\). When total phenotypic variance of a quantitative trait is 1, then \(2f(1-f)\beta^2\) is also the proportion of the variance explained by the variant. For example, in our ongoing example setting, the variance explained by the variant is

`2*f*(1-f)*b.alt^2.`

`## [1] 0.02`

That is, the variant explains 2% of the variation of the phenotype. This is a very large variance explained compared to a typical common variant association with complex traits, such as BMI or height, but is more realistic for some molecular traits, such as metabolite levels, that are less complex genetically and may have larger effects from individual variants.

**Example 3.3.** What if we wanted to find a suitable \(n\) that gives 90% power for MAF=50% when the variant explained only 0.5% of the phenotype?

```
f = 0.5
y.explained = 0.005
b.alt = sqrt(y.explained / (2*f*(1-f)) ) #this is beta that explains 0.5%
sigma = sqrt(1 - y.explained) #error sd after SNP effect is accounted for
ns = seq(1000,12000,10) #candidate n
ses = sigma / sqrt( ns*2*f*(1-f) ) #SE corresponding to each n
q.thresh = qchisq(5e-8, df = 1, ncp = 0, lower = F) #threshold corresp alpha=5e-8
pwr = pchisq(q.thresh, df = 1, ncp = (b.alt/ses)^2, lower = F) #power at alpha=5e-8
plot(ns,pwr, col = "darkgreen", xlab = "n", ylab = "power",
main = paste0("QT sd=1; MAF=",f,"; beta=",b.alt), t = "l", lwd = 1.5)
abline( h = 0.9, lty = 2 )
```