ShinyStats

Data input
One sample test
Two sample test

Interactive web application for performing one- and two-sample statistical tests, developed by the Bioinformatics Core at the Cancer Research UK Cambridge Institute for the Introduction to Statistics training course.

Upload CSV or TSV file

Select a sample data set

Upload a tabular data file or select one of the sample data sets from the drop-down list, then click the 'One sample test' or 'Two sample test' tab above to select variables of interest, explore and visualize the selected data, and carry out statistical tests.

Variable

Hypothesized mean

Transformation

None, Natural log, Square root, Cube root

Summary statistics
Plots
Assumptions
Statistical tests

Show hypothesized mean

Box plot

Show points on box plot

Outlier points, i.e. observations that lie more than 1.5 times the interquartile range (IQR) beyond the edges of the box (the first and third quartiles), are always displayed.
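
As a rough illustration of the 1.5 IQR rule described above, the sketch below flags the observations that would be drawn as outlier points. The function name boxplot_outliers and the 'inclusive' quartile method are assumptions made for this example; plotting tools differ in how they compute quartiles, so the exact points flagged by the application may differ.

```python
import statistics


def boxplot_outliers(values):
    """Return observations more than 1.5 * IQR beyond the first or third quartile."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.1, 12.4]
print(boxplot_outliers(data))  # prints [12.4]
```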

Overlay density on box plot

Histogram

Overlay normal distribution

The normal distribution shown is based on the computed mean and standard deviation of the data.
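
A minimal sketch of how such an overlay curve could be computed, assuming the sample standard deviation (n - 1 denominator) is used. The function name normal_overlay and the grid of evaluation points are illustrative choices, not taken from the application.

```python
from statistics import NormalDist, mean, stdev


def normal_overlay(values, grid_points=5):
    """Return (x, density) pairs for a normal curve fitted to the data."""
    # Assumption: the curve uses the sample mean and sample standard deviation.
    fitted = NormalDist(mean(values), stdev(values))
    lowest, highest = min(values), max(values)
    step = (highest - lowest) / (grid_points - 1)
    xs = [lowest + i * step for i in range(grid_points)]
    return [(x, fitted.pdf(x)) for x in xs]


for x, density in normal_overlay([1, 2, 3, 4, 5]):
    print(x, round(density, 3))
```

To draw the curve over a histogram of counts rather than densities, the density values would need to be scaled by the number of observations times the bin width.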

Choose number of bins

Number of bins

The actual number of bins may differ for aesthetic reasons.

This page provides graphical and statistical tools that can help with assessing the assumptions of a parametric test, e.g. t-test.

The following assumptions are made in a parametric, one sample t-test:

the data are independent - values are not related to one another
the data are on a continuous scale
the data are a random sample from a population that is normally distributed

Preliminary statistical tests of assumptions such as normality or equal variance between groups are controversial and often criticised within the statistics community. Applying a transformation, e.g. log, square root or cube root, can help to make skewed data conform more closely to normality.
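
The effect of the transformations on skewed data can be illustrated with a small example. The sketch below applies each transformation to a made-up, right-skewed set of values and reports the moment-based skewness, which is closer to 0 for more symmetric data. The data, the TRANSFORMS table and the skewness helper are all assumptions for illustration. Note that the natural log and square root require positive and non-negative values respectively.

```python
import math
from statistics import mean

# Hypothetical lookup mirroring the transformation choices listed above.
TRANSFORMS = {
    "None": lambda x: x,
    "Natural log": math.log,          # requires x > 0
    "Square root": math.sqrt,         # requires x >= 0
    "Cube root": lambda x: math.copysign(abs(x) ** (1 / 3), x),
}


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


skewed = [1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 9.7, 22.0]  # made-up example data
for name, transform in TRANSFORMS.items():
    transformed = [transform(v) for v in skewed]
    print(name, round(skewness(transformed), 2))
```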

Q-Q plot
Shapiro-Wilk test

The Q-Q (quantile-quantile) plot compares the data with a normal distribution by plotting their quantiles against each other.

The theoretical quantiles are for a standard normal distribution with mean 0 and standard deviation 1. The points will lie approximately along the diagonal line (fitted through the points for the first and third quartiles) if the data are normally distributed. Significant deviations from the line may suggest the use of a non-parametric test.
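
The construction of a Q-Q plot can be sketched as follows. The plotting positions (i - 0.5) / n and the 'inclusive' quartile method are assumptions; statistical packages use slightly different conventions, so the coordinates shown by the application may differ. The function names qq_points and qq_line are hypothetical.

```python
from statistics import NormalDist, quantiles


def qq_points(values):
    """Pair each ordered observation with a standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist(0, 1)
    # Assumption: plotting positions (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_line(values):
    """Slope and intercept of the line through the first and third quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    z1 = NormalDist().inv_cdf(0.25)
    z3 = NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept


sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.1, 12.4]
print(qq_points(sample))
print(qq_line(sample))
```

Points that fall far from the line, such as the largest value in this made-up sample, indicate a departure from normality.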

Shapiro-Wilk test of normality

The Shapiro-Wilk test tests the null hypothesis that the data come from a normally distributed population.

The null hypothesis can be rejected if the p-value is less than 0.05, suggesting that the data come from a population that is not normally distributed. If the null hypothesis cannot be rejected, there is insufficient evidence that the data are not normal. This is not the same as accepting that the data come from a normal distribution, i.e. it does not prove that the null hypothesis is true.

Caution is advised when using a preliminary test for normality to decide whether a parametric or non-parametric test should subsequently be used, particularly when the sample size is small. It is often better to make your own assessment by looking at box plots, density plots, histograms and Q-Q plots.

Test

Parametric

Non-parametric

A parametric, one sample t-test assumes that the data values are independent, continuous and a random sample from a population that is normally distributed.

Alternative hypothesis

Two-sided

Greater

Less

The true mean is not equal to the specified value.

The true mean is greater than the specified value.

The true mean is less than the specified value.
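
For reference, the statistic behind a one sample t-test can be sketched as below. The function name one_sample_t is hypothetical, and the sketch stops at the t statistic and degrees of freedom; turning these into a p-value requires the t distribution, which is not in the Python standard library. A positive t supports the 'Greater' alternative, a negative t supports 'Less', and the two-sided alternative considers both tails.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, hypothesized_mean):
    """Return the t statistic and degrees of freedom for a one sample t-test."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    t = (mean(values) - hypothesized_mean) / standard_error
    return t, n - 1


t, df = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0, 5.4], hypothesized_mean=5.0)
print(round(t, 3), df)
```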

One sample t-test

Wilcoxon signed rank test

Tests whether the data come from a symmetric population centred around a specified median value.
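
A minimal sketch of the signed rank statistic, assuming the common convention that observations equal to the hypothesized value are dropped and tied absolute differences share their average rank. The function name signed_rank_statistic is hypothetical, and the p-value calculation is omitted.

```python
def signed_rank_statistic(values, hypothesized_median):
    """Sum of the ranks of the positive differences from the hypothesized value."""
    # Assumption: zero differences are discarded before ranking.
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]
    ordered = sorted(abs(d) for d in diffs)

    def average_rank(x):
        # Tied absolute differences receive the mean of the ranks they span.
        first = ordered.index(x) + 1
        count = ordered.count(x)
        return first + (count - 1) / 2

    return sum(average_rank(abs(d)) for d in diffs if d > 0)


print(signed_rank_statistic([1.5, 2.5, 4.0, 7.0], hypothesized_median=3.0))
```

A statistic close to half of n(n + 1) / 2, where n is the number of non-zero differences, is what would be expected if the data were symmetric about the hypothesized value.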

Paired observations

Variable 1

Variable 2 (reference)

Transformation

None, Natural log, Square root, Cube root

Categorical variable

Group 1

Group 2 (reference)

Variable

Transformation

None, Natural log, Square root, Cube root

Summary statistics
Plots
Assumptions
Statistical tests

Box plot

Show points on box plots

Outlier points, i.e. observations that lie more than 1.5 times the interquartile range (IQR) beyond the edges of the box (the first and third quartiles), are always displayed.

Overlay density on box plots

Histogram

Overlay normal distribution

The normal distribution shown is based on the computed mean and standard deviation of each group.

Choose number of bins

Number of bins

The actual number of bins may differ for aesthetic reasons.

This page provides graphical and statistical tools that can help with assessing the assumptions of a parametric test, e.g. t-test.

The following assumptions are made in a parametric, two sample t-test:

the data are independent - observations in one sample (or group) are independent of observations in the other sample
the data are on a continuous scale
the data for each group are a random sample from a normally distributed population

The following assumptions are made in a parametric, paired t-test of the differences between paired measurements:

the data are independent - measurements for one subject do not affect measurements for any other subject
each of the paired measurements must be obtained from the same subject
the measured differences are normally distributed

Q-Q plot
Shapiro-Wilk test
F-test

The Q-Q (quantile-quantile) plot compares the data with a normal distribution by plotting their quantiles against each other.

Shapiro-Wilk test of normality

The Shapiro-Wilk test tests the null hypothesis that the data come from a normally distributed population. The test is run for each of the two groups separately.

F-test to compare two variances

The F-test of equality of variances is a test for the null hypothesis that two normal populations have the same variance. The Welch t-test, an adaptation of Student's t-test, may be more reliable when the samples have unequal variances or sample sizes.

The null hypothesis can be rejected if the p-value is less than 0.05, suggesting that the data come from populations with different variances. Treat the result of this test with caution; it is often better to assess differences in variance between the two groups by inspecting the box and density plots.
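
The F statistic itself is simply the ratio of the two sample variances, as the sketch below shows. The function name f_statistic is hypothetical, and the p-value (which requires the F distribution) is not computed here.

```python
from statistics import variance


def f_statistic(group1, group2):
    """Return the variance ratio and its numerator and denominator degrees of freedom."""
    f = variance(group1) / variance(group2)
    return f, len(group1) - 1, len(group2) - 1


f, df1, df2 = f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f, df1, df2)
```

A ratio far from 1 in either direction points towards unequal variances, subject to the caution given above.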

Q-Q plot
Shapiro-Wilk test

The Q-Q (quantile-quantile) plot compares the data with a normal distribution by plotting their quantiles against each other. In the paired two sample case, quantiles for the differences between pairs of measurements are used.

Shapiro-Wilk test of normality

The Shapiro-Wilk test tests the null hypothesis that the data come from a normally distributed population. In the paired two sample case, the test is run on the differences between pairs of measurements.

Test

Parametric

Non-parametric

A parametric, two sample t-test assumes that the observations of one sample are independent of the observations of the other sample and that the observations in each group are a random sample from a normally distributed population.

A parametric, paired t-test assumes that there are paired measurements for each subject, that these observations are independent of one another, and that the measured differences are normally distributed.

Equal variance in each sample

The Welch t-test, an adaptation of Student's t-test, may be more reliable when the samples have unequal variances or sample sizes.
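
To make the difference between the two tests concrete, the sketch below computes the t statistic and degrees of freedom both ways. Student's version pools the variances, whereas Welch's version keeps them separate and uses the Welch-Satterthwaite approximation for the degrees of freedom. The function names are hypothetical and p-values are omitted.

```python
import math
from statistics import mean, variance


def student_t(group1, group2):
    """Two sample t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(group1, group2):
    """Two sample t statistic without assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1) / n1, variance(group2) / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


a = [1, 2, 3, 4, 5]
b = [0, 5, 10, 15, 20]  # made-up group with a much larger variance
print(student_t(a, b))
print(welch_t(a, b))
```

When the two groups have the same size and the same sample variance the two versions agree; as the variances diverge, the Welch degrees of freedom shrink below n1 + n2 - 2.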

Alternative hypothesis

Two-sided

Greater

Less

The means for the two groups are different.

The mean for group 1 is greater than the mean for group 2.

The mean for group 1 is less than the mean for group 2.

The mean difference between variable 1 and variable 2 is not equal to 0.

The mean difference between variable 1 and variable 2 is greater than 0.

The mean difference between variable 1 and variable 2 is less than 0.
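
A paired t-test is equivalent to a one sample t-test on the differences between the paired measurements, with a hypothesized mean difference of 0. A minimal sketch, with the hypothetical function name paired_t and p-values omitted:

```python
import math
from statistics import mean, stdev


def paired_t(variable1, variable2):
    """Return the t statistic and degrees of freedom for paired measurements."""
    if len(variable1) != len(variable2):
        raise ValueError("paired measurements must have the same length")
    diffs = [a - b for a, b in zip(variable1, variable2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([10, 12, 14, 16, 18], [9, 10, 11, 12, 13])
print(round(t, 3), df)
```

The differences are taken as variable 1 minus variable 2, so a positive t corresponds to the 'Greater' alternative.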

Two sample t-test

Welch two sample t-test

Wilcoxon rank sum test

Also known as the Mann-Whitney U test.
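
The U statistic can be sketched by counting, over all pairs, how often an observation in group 1 exceeds one in group 2, with ties counted as one half. The function name mann_whitney_u is hypothetical; the pairwise loop is chosen for clarity rather than speed, and the p-value is not computed.

```python
def mann_whitney_u(group1, group2):
    """Count pairs where the group 1 value exceeds the group 2 value (ties count 0.5)."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u


g1 = [1.1, 2.3, 2.9, 4.0]
g2 = [2.0, 3.5, 4.2, 5.1, 6.3]
print(mann_whitney_u(g1, g2), mann_whitney_u(g2, g1))
```

The two counts always sum to the product of the group sizes, so a value near 0 or near that product indicates that one group tends to have larger values than the other.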

Paired t-test

Wilcoxon signed rank test
