This vignette is intended to demonstrate non-simulation based
features of the bayesassurance
package. The primary focus
of this tutorial involves discussing the underlying closed-form
solutions embedded in the pwr_freq()
and
assurance_nd_na()
functions followed by examples on how
these functions are implemented in R. This document will also cover
add-on tools and features that many of the simulation-based functions
are contingent upon. In particular, the matrix generating functions,
gen_Xn()
and gen_Xn_longitudinal()
, will be
relevant in the next set of vignettes.
pwr_freq()
FunctionThe pwr_freq()
function takes in a set of fixed inputs
pertaining to a one-sample hypothesis test and returns the corresponding
statistical power of the test.
To elaborate, consider the following one-sided hypothesis test for population mean θ: $$ H_0: \theta = \theta_0 \\ H_a: \theta = \theta_1 > \theta_0, $$ where θ0 is the reference value that we will test against. Assuming the known population variance is denoted as σ2 and the sample mean’s distribution is approximately Gaussian, H0 is rejected if $\bar{y} > \theta_0 + \frac{\sigma}{\sqrt{n}}Z_{1-\alpha}$, where ȳ is the sample mean, α is the specified Type I error, Z1 − α is the corresponding quantile of the Gaussian distribution, and n is the sample size. We can use this rejection criteria to derive the statistical power of the test, defined by 1 − β = P(reject H0|Ha is true). Straightforward standardization procedure leads to where Δ = θ1 − θ0 denotes the critical difference and Φ denotes the cumulative distribution function of the standard normal. Similar steps can be taken to determine the power expressions for alternative comparisons of Ha. When Ha : θ1 < θ0, the power is determined as $$ 1-\beta = P\left(\bar{y} < \theta_0 - \frac{\sigma}{\sqrt{n}}Z_{1-\alpha} \Bigg| \theta = \theta_1 \right) = 1 - \Phi\Big(\sqrt{n}\frac{\Delta}{\sigma} + Z_{1-\alpha}\Big). $$ Finally in the two-sided case where Ha : θ1 ≠ θ0, the power is determined as
Load the bayesassurance package.
Specify the following inputs:
n
: Sample size (either vector or scalar)theta_0
: Initial value the parameter is set equal to in
the null hypothesistheta_1
: Alternative value to be compared to
theta_0.sigsq
: Known variance σ2alt
: Specifies comparison between theta_1
and theta_0
in the alternative hypothesis, where
alt = "greater"
tests if θ1 > θ0,
alt = "less"
tests if θ1 < θ0,
and alt = "two.sided"
performs a two-sided test for θ1 ≠ θ0.
alt
is set to "greater"
by default.alpha
: Significance levelAs an example, we assign the following set of arbitrary inputs to
pass into pwr_freq()
and save the output as
pwr_vals
.
n <- seq(10, 140, 5)
pwr_vals <- bayesassurance::pwr_freq(n = n, theta_0 = 0.15, theta_1 = 0.35, sigsq = 0.3,
alt = "greater", alpha = 0.05)
The saved output pwr_vals
contains two objects:
pwr_table
: table of sample sizes and corresponding
power values.pwr_plot
: power curve that is only returned if
n
is a vector.To view the power curve, simply type pwr_vals$pwr_plot
in the R console. To ensure a smooth continuous curve, the function
determines additional power values for a wider range of sample sizes
that surround the inputted values specified for n
. The
resulting power curve thus includes values above, below, and between the
inputted values specified for n
, with specific values of
interest marked in red.
The first six entries of the power table can be shown by calling
pwr_vals$pwr_table
.
n | Power |
---|---|
10 | 0.3120128 |
15 | 0.4087972 |
20 | 0.4952685 |
25 | 0.5717723 |
30 | 0.6387600 |
35 | 0.6968609 |
The power plot is produced using ggplot2
, displaying the
inputted sample sizes on the x-axis and the resulting power values on
the y-axis. The points highlighted in red denote values specified by the
user. In this example, we see red points marked along the values of
n=10
through n=140
in increments of 5.
If a scalar value is inputted into n
, a single power
value is returned.
assurance_nd_na()
FunctionThe assurance_nd_na()
function determines the assurance
of a specified outcome based on a closed-form solution that is derived
under the Bayesian setting.
Suppose we seek to evaluate the tenability of θ > θ0 given data from a Gaussian population with mean θ and known variance σ2. We assign two sets of priors for θ, one at the design stage and the other at the analysis stage. The analysis objective specifies the condition that needs to be satisfied, which in this case, involves observing that P(θ > θ0|ȳ) > 1 − α. The design objective seeks a sample size that is needed to ensure that the analysis objective is met 100δ% of the time, where δ denotes the assurance.
Let $\theta \sim N\big(\theta_1, \frac{\sigma^2}{n_a}\big)$ be our analysis stage prior and $\theta \sim N\big(\theta_1, \frac{\sigma^2}{n_d}\big)$ be our design stage prior, where na and nd are precision parameters that respectively quantify the degree of belief carried towards parameter θ and the degree of belief carried towards the population from which we are drawing samples from to evaluate θ. Then, given the likelihood $\bar{y} \sim N\big(\theta, \frac{\sigma^2}{n}\big)$, we can obtain the posterior distribution of θ by multiplying the analysis prior and likelihood:
This posterior distribution gives us P(θ > θ0|ȳ) and the assurance is then defined as The assurance expression can be expanded out further by using the marginal distribution of ȳ, which is obtained by $$ \int{N\left(\theta {\left|\theta_1, \frac{\sigma^2}{n_d}\right.}\right) \times N\left(\bar{y} {\left | \theta, \frac{\sigma^2}{n} \right.}\right) d\theta} = N\left(\bar{y} {\left |\theta_1, \Big(\frac{1}{n} + \frac{1}{n_d}\Big) \sigma^2 \right.}\right) . $$ Since the assurance definition is conditioned on ȳ, we use this to standardize the assurance expression to obtain the following closed-form solution: $$ \delta(\Delta, n) = \Phi\Bigg(\sqrt{\frac{nn_d}{n+n_d}} \Bigg[\frac{n+n_a}{n}\frac{\Delta}{\sigma} + Z_{\alpha}\frac{\sqrt{n+n_a}}{n}\Bigg]\Bigg), $$ where Δ = θ1 − θ0. Similar steps can be taken to derive the closed-form expressions of assurance for θ < θ0 and θ ≠ θ0. If we wish to seek the tenability of θ < θ0, the assurance is given by $$ \delta(\Delta, n) = 1-\Phi\Bigg(\sqrt{\frac{nn_d}{n+n_d}} \Bigg[\frac{n+n_a}{n}\frac{\Delta}{\sigma} + Z_{1-\alpha}\frac{\sqrt{n+n_a}}{n}\Bigg]\Bigg). $$ Finally, for evaluating the tenability of θ ≠ θ0, the assurance is given by $$ \delta(\Delta, n) = 1-\Phi\Bigg(\sqrt{\frac{nn_d}{n+n_d}} \Bigg[\frac{n+n_a}{n}\frac{\Delta}{\sigma} + Z_{1-\alpha/2}\frac{\sqrt{n+n_a}}{n}\Bigg]\Bigg) + \Bigg[\frac{n+n_a}{n}\frac{\Delta}{\sigma} + Z_{\alpha/2}\frac{\sqrt{n+n_a}}{n}\Bigg]\Bigg). $$
After loading in the bayesassurance
package, specify the
following inputs:
n
: sample size (scalar or vector)n_a
: sample size at analysis stage that quantifies the
amount of prior information we have for parameter θn_d
: sample size at design stage that quantifies the
amount of prior information we have for where the data is being
generated from.theta_0
: parameter value that is known a priori
(typically provided by the client)theta_1
: alternative parameter value that will be
tested in comparison to theta_0. See alt for specification options.alt
: specifies alternative test case, where
alt = "greater"
evaluates the tenability of θ > θ0,alt = "less"
evaluates the tenability of θ < θ0, and
alt = "two.sided"
evaluates the tenability of θ ≠ θ0.
alt
is set to "greater"
by default.sigsq
: known variance σ2alpha
: significance levelSuppose we assign the following fixed parameters to determine the Bayesian assurance for the given vector of sample sizes.
n <- seq(10, 500, 5)
n_a <- 1e-8
n_d <- 1e+8
theta_0 <- 0.15
theta_1 <- 0.25
sigsq <- 0.104
assur_vals <- assurance_nd_na(n = n, n_a = n_a, n_d = n_d, theta_0 = theta_0,
theta_1 = theta_1, sigsq = sigsq, alt = "greater",
alpha = 0.05)
Just as with the pwr_freq()
function, the
assurance_nd_na()
function returns a list of two outputs
provided that the inputted sample size n is a vector of values.
assurance_table
: table of sample sizes and
corresponding power values.assurance_plot
: assurance curve that is only returned
if n is a vector.The first six entries of the resulting assurance table is shown by
calling assur_vals$assurance_table
.
n | Assurance |
---|---|
10 | 0.2532578 |
15 | 0.3285602 |
20 | 0.3981637 |
25 | 0.4623880 |
30 | 0.5213579 |
35 | 0.5752063 |
The assurance plot is produced using the ggplot2
package. It displays the inputted sample sizes on the x-axis and the
resulting assurance values on the y-axis. Similar to
pwr_freq()
, the assurance_nd_na()
function
determines additional assurance values for a wider range of sample sizes
that surround the inputted values specified for n
. The
resulting assurance curve thus includes values above, below, and between
the inputted values specified for n
, with specific values
of interest marked in red.
pwr_curves()
FunctionThe pwr_curves()
function constructs a single plot based
on the combined set of inputs used to compute the frequentist power and
Bayesian assurance. The plot includes three curves that include the
power curve, the assurance curve obtained under a closed-form solution,
and optionally, the assurance curve obtained under a simulation setting.
The optional simulated assurance points are obtained using
bayes_sim
, which is discussed in a later vignette.
Load in the bayesassurance
package and specify the
following parameters:
n
: sample size (either scalar or vector)n_a
: Precision parameter that quantifies the degree of
belief carried towards parameter θn_d
: Precision parameter that quantifies the degree of
belief carried towards the population from which we are drawing samples
from to evaluate θ.theta_0
: parameter value that is known a priori
(typically provided by the client)theta_1
: alternative parameter value that will be
tested in comparison to #’ theta_0. See alt for specification
options.sigsq
: known variance σ2alt
: specifies alternative test case, where
alt = "greater"
tests if θ > θ0,
alt = "less"
tests if θ < θ0, and
alt = "two.sided"
performs a two-sided test. By default,
alt = "greater"
.alpha
: significance levelbayes_sim
: when set to TRUE
, this
indicates that the user wants to include simulated assurance results in
the outputted plot. Default setting is FALSE
.mc_iter
: If bayes_sim = TRUE
, specifies
the number of MC samples to evaluate. Default set to 5000 when not
specified.Example 1
The following code chunk has bayes_sim
set to FALSE,
outputting just the power and assurance curves obtained from
pwr_freq()
and assurance_nd_na()
.
n <- seq(100, 300, 10)
n_a <- 1e-8
n_d <- 1e+8
theta_0 <- 0.15
theta_1 <- 0.25
sigsq <- 0.104
alpha <- 0.05
out1 <- bayesassurance::pwr_curve(n, n_a, n_d, theta_0, theta_1, sigsq,
alt = "greater", alpha, bayes_sim = FALSE)
Three objects are returned as a list in out1
. They
are
power_table
: table of sample sizes and corresponding
power values.assurance_table
: table of sample sizes and
corresponding assurance valuesplot
: figure displaying power curve and assurance
valuesSimilar to previous examples, the first six entries of the resulting
power table is shown by calling out1$power_table
.
n | Power |
---|---|
100 | 0.9273057 |
110 | 0.9460128 |
120 | 0.9601112 |
130 | 0.9706665 |
140 | 0.9785223 |
150 | 0.9843375 |
Likewise, the first six entries of the resulting assurance table is
shown by calling out1$assurance_table
, whose results are
equal to those of the power table.
n | Assurance |
---|---|
100 | 0.9273056 |
110 | 0.9460127 |
120 | 0.9601111 |
130 | 0.9706664 |
140 | 0.9785222 |
150 | 0.9843374 |
The resulting plot is displayed by calling
out1$plot
.
Example 2
The next code chunk has bayes_sim
set to
TRUE
, outputting the assurance values obtained by sampling
from the posterior distribution using bayes_sim()
in
addition to the power and assurance curves obtained from
pwr_freq()
and assurance_nd_na()
n <- seq(100, 300, 10)
n_a <- 1e-8
n_d <- 1e+8
theta_0 <- 0.15
theta_1 <- 0.25
sigsq <- 0.104
alpha <- 0.05
set.seed(10)
out2 <- bayesassurance::pwr_curve(n, n_a, n_d, theta_0, theta_1, sigsq, alt = "greater",
alpha, bayes_sim = TRUE)
Just as we have done in the previous example, the first six entries
of the resulting assurance table from the bayes_sim()
function is displayed by calling out2$bayes_sim_table
.
n | Assurance |
---|---|
100 | 0.9292 |
110 | 0.9492 |
120 | 0.9562 |
130 | 0.9724 |
140 | 0.9782 |
150 | 0.9846 |
The resulting plot is displayed by calling
out2$plot
.