Hypothesis Testing

Overview
In comparative studies, confidence intervals are often the best way to analyse and report results. However, statistical hypothesis tests are still widely used in scientific work, so we need to understand how they work. Most statistical analyses involve comparisons, usually between treatments or procedures, or between groups of subjects.

Types of Hypotheses
The medical hypothesis is the basis of the statistical hypotheses (i.e. null hypothesis H0 and alternative hypothesis H1).

Medical Hypothesis
The medical hypothesis is the starting point. It is a statement that proposes an association between variables. For example, progressive polyarthritis (PAP) is associated with the HLA-DR4 antigen.

Null Hypothesis H0
A null hypothesis states that there is no association between the variables of interest. For example, there is no association of PAP with HLA-DR4.

Alternative Hypothesis H1
An alternative hypothesis states that there is a real association between the variables of interest. For example, PAP is associated with HLA-DR4.

Fourfold & Contingency Tables
These tables are used to set out the observed frequencies and to calculate the expected frequencies, the chi-squared statistic $$\chi^2$$ and the degrees of freedom (df).
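
For orientation, the two quantities named here are conventionally defined as follows. This is the standard textbook form, not something spelled out in this section. The symbols $$O$$ (observed frequency of a cell), $$E$$ (expected frequency of the same cell), $$r$$ (number of rows) and $$c$$ (number of columns) are introduced only for this illustration, and the sum runs over all cells of the table:
 * $$\chi^2=\sum\frac{(O-E)^2}{E}$$
 * $$df=(r-1)\times(c-1)$$
For a table with two rows and two columns, this gives $$df=(2-1)\times(2-1)=1$$.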

A chi-squared test is used to determine whether there is an association between two variables, which may be:
 * qualitative
 * discrete quantitative
 * continuous quantitative (where values have been grouped)

Data from two such variables may be arranged in a contingency table. The categories of one variable define the rows, and the categories of the other variable define the columns. Each individual is assigned to the appropriate cell. When the table has exactly two rows and two columns, it is called a fourfold table.
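
The assignment of individuals to cells can be sketched in a few lines of Python. This is only an illustration: the records, the category labels ("exposed", "ill" and so on) and the variable names are made up and do not come from any study.

```python
from collections import Counter

# Hypothetical records: one (row category, column category) pair per individual.
records = [
    ("exposed", "ill"),
    ("exposed", "healthy"),
    ("unexposed", "healthy"),
    ("exposed", "ill"),
    ("unexposed", "ill"),
    ("unexposed", "healthy"),
]

# The categories of one variable define the rows, those of the other the columns.
row_labels = ["exposed", "unexposed"]
col_labels = ["ill", "healthy"]

# Count how many individuals fall into each (row, column) combination.
counts = Counter(records)

# Build the table as a list of rows; a missing combination counts as 0.
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]

for label, row in zip(row_labels, table):
    print(label, row)
```

With two categories per variable, the result is a fourfold table; more categories simply add rows or columns.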

Observed and Expected Frequencies
Observed frequencies are simply the numbers of subjects observed in each cell. The expected frequency of a cell is the number we would expect to find there if the null hypothesis were true (i.e. no association). Expected frequencies are calculated from the observed frequencies: for a given cell, multiply the total of the cell's row by the total of the cell's column, then divide by the total number of subjects n. See the example below:

Observed frequencies in the sample of 308 patients divided according to presence of PAP and HLA-DR4

Expected frequencies in the sample of 308 patients divided according to presence of PAP and HLA-DR4

You can see that for the first cell, the expected value was calculated as follows:
 * $$(46+50)\times(46+28)\div308\approx23$$
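
The same rule (row total times column total, divided by n) can be applied to every cell at once. The sketch below is a minimal illustration under stated assumptions: the counts 46, 50 and 28 are taken from the formula above, the fourth count (184) is inferred from the total of 308, and the orientation of rows and columns is assumed. The function name `expected_frequencies` is hypothetical.

```python
# Observed counts of a fourfold table.
# 46, 50 and 28 appear in the worked formula; 184 = 308 - 46 - 50 - 28 is inferred.
# Which variable defines the rows and which the columns is an assumption.
observed = [
    [46, 50],
    [28, 184],
]


def expected_frequencies(table):
    """Return expected counts under the null hypothesis of no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of a cell = row total * column total / n.
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

The expected frequencies keep the same row and column totals as the observed table, which is a quick way to check the calculation.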