Hardy-Weinberg equilibrium

Hardy–Weinberg equilibrium (also Hardy–Weinberg law) is a theoretical equilibrium distribution of alleles and genotypes in a population. It was derived in 1908 by Godfrey Harold Hardy, a British mathematician and friend of the geneticist Reginald Punnett, and independently by the German physician Wilhelm Weinberg.

Equilibrium for two alleles
The law describes the frequency of genotypes in an idealized population. The model was formulated under several assumptions:

 * The population is large enough to be treated as infinitely large when building the model. In practice, it is sufficient for the population to be so large that genetic drift can be neglected.
 * There is no selection in the population.
 * Mutations do not occur in the population.
 * There is neither emigration nor immigration.
 * The area occupied by the population is such that any individual can mate with any other individual.
 * Each individual is of both sexes (the individuals are hermaphrodites).

Let us now assume that there are only two alleles of the observed gene in the population, labeled A (dominant) and a (recessive). The frequency (relative frequency) of allele A is denoted by p and the frequency of allele a by q. Since only these two alleles occur in the population, it must hold:


 * $$p+q=1$$

When a new offspring arises from a cross, it inherits one allele from each of two randomly selected parents. The probability that a randomly selected parent produces a gamete carrying a given allele equals that allele's frequency in the population. We can therefore ignore the parents altogether and switch to the gametic urn model. For the parental population, a gametic urn for the monitored gene is filled so that each individual contributes its two alleles: a homozygote puts in two A alleles or two a alleles, a heterozygote one A and one a. A new individual is then created by drawing two alleles from this urn. The assumption of an infinitely large population ensures that removing one allele from the urn does not change the allele frequencies in it, so allele A is still drawn with probability p and allele a with probability q.
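The urn construction can be sketched in Python as follows. This is a minimal illustration rather than part of the original model description; the function names `build_gametic_urn` and `draw_offspring` and the parental counts are hypothetical, and drawing *with replacement* stands in for the infinitely large urn.

```python
import random


def build_gametic_urn(n_AA: int, n_Aa: int, n_aa: int) -> list[str]:
    """Fill the urn from the parental population: each AA individual adds two
    A alleles, each Aa individual one A and one a, each aa individual two a."""
    return ["A"] * (2 * n_AA + n_Aa) + ["a"] * (2 * n_aa + n_Aa)


def draw_offspring(urn: list[str], rng: random.Random) -> str:
    """Draw two alleles with replacement, so that (as in an infinitely large
    urn) the first draw does not change the frequencies for the second."""
    alleles = sorted(rng.choice(urn) for _ in range(2))  # "A" sorts before "a"
    return "".join(alleles)


# Hypothetical parental population: 10 AA, 20 Aa and 70 aa individuals.
urn = build_gametic_urn(n_AA=10, n_Aa=20, n_aa=70)
p = urn.count("A") / len(urn)  # 40 A alleles out of 200
child = draw_offspring(urn, random.Random(1))
```

Sorting the two drawn alleles makes "aA" and "Aa" the same genotype string, since the order of the draws does not matter for the genotype.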

Now suppose that a new individual is created by drawing from the gametic urn. The question is how often each combination of alleles arises. The answer is simple: because the population is assumed to be infinite, the two draws (the allele "from the mother" and the allele "from the father") are independent random events. Thus:


 * An individual with genotype AA arises when allele A is drawn both times, more precisely, when the "mother's" allele is A and at the same time the "father's" allele is A. Each draw yields A with probability equal to its frequency p, so:


 * $$P(AA) = p\cdot p = p^2$$


 * The probability that an individual with genotype aa arises is derived by completely analogous reasoning:


 * $$P(aa) = q\cdot q = q^2$$


 * The probability of producing an individual with genotype Aa can be derived in two ways. As a check, we give both:

The purely formal method is based on the assumption that no other genotype exists in the population, i.e. that the probabilities of the individual genotypes sum to one:


 * $$P(AA) + P(aa) + P(Aa) = 1$$

Solving for P(Aa), we find that:


 * $$P(Aa) = 1 - P(AA) - P(aa) = 1 - p^2 - q^2$$

Next, we use the fact that q = 1 - p. Step by step we get:


 * $$P(Aa) = 1 - p^2 - (1-p)^2 = -2p^2 + 2p = 2p(1-p)$$

Finally, writing 1 - p as q again gives the result:


 * $$P(Aa) = 2pq$$
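The algebra above can be checked exactly with rational arithmetic. The sketch below (hypothetical helper names) compares the complement form 1 - p^2 - q^2 with the closed form 2pq on a grid of allele frequencies, using Python's `fractions.Fraction` so that no rounding can hide a discrepancy.

```python
from fractions import Fraction


def heterozygotes_by_complement(p: Fraction) -> Fraction:
    """P(Aa) = 1 - P(AA) - P(aa) with q = 1 - p (the formal method)."""
    q = 1 - p
    return 1 - p**2 - q**2


def heterozygotes_closed_form(p: Fraction) -> Fraction:
    """P(Aa) = 2pq."""
    return 2 * p * (1 - p)


# Exact comparison for p = 0, 1/20, 2/20, ..., 1.
for k in range(21):
    p = Fraction(k, 20)
    assert heterozygotes_by_complement(p) == heterozygotes_closed_form(p)
```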

The same result can also be reached by probabilistic reasoning. An Aa individual arises if it receives allele A from the "mother" and allele a from the "father", or allele a from the "mother" and allele A from the "father". The two cases are mutually exclusive, so their probabilities add:


 * $$P(Aa) = pq + qp = 2pq$$
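The independence argument can also be checked by simulation. The following sketch, assuming an arbitrary allele frequency p = 0.3 and a hypothetical function `simulate_genotypes`, draws the "maternal" and "paternal" alleles independently and compares the observed genotype frequencies with p^2, 2pq and q^2. With 100 000 offspring the differences should stay well below 0.01.

```python
import random
from collections import Counter


def simulate_genotypes(p: float, n_offspring: int, seed: int = 0) -> dict[str, float]:
    """Draw two alleles independently (A with probability p) for each
    offspring and return the observed genotype frequencies."""
    rng = random.Random(seed)
    counts: Counter[int] = Counter()
    for _ in range(n_offspring):
        mother_A = rng.random() < p
        father_A = rng.random() < p
        # Number of A alleles received: 2 -> AA, 1 -> Aa, 0 -> aa.
        counts[mother_A + father_A] += 1
    return {
        "AA": counts[2] / n_offspring,
        "Aa": counts[1] / n_offspring,
        "aa": counts[0] / n_offspring,
    }


p = 0.3
q = 1 - p
observed = simulate_genotypes(p, n_offspring=100_000)
expected = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
```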

The equation describing the Hardy–Weinberg equilibrium then simply expresses the fact that no other combinations occur in the population (it is the expansion of (p + q)^2 = 1):


 * $$p^2 + 2pq + q^2 = 1$$
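Under the model's assumptions, a single round of drawing from the gametic urn already produces genotype frequencies p^2, 2pq and q^2, whatever the parental genotype frequencies were, and a further round leaves them unchanged. A minimal sketch (the function `next_generation` and the starting population are hypothetical):

```python
def next_generation(P_AA: float, P_Aa: float, P_aa: float) -> tuple[float, float, float]:
    """Genotype frequencies after one round of random mating (gametic urn model)."""
    if abs(P_AA + P_Aa + P_aa - 1) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    p = P_AA + P_Aa / 2  # half of each heterozygote's alleles are A
    q = 1 - p
    return p * p, 2 * p * q, q * q


# Hypothetical starting population: only homozygotes, half AA and half aa.
gen1 = next_generation(0.5, 0.0, 0.5)  # (0.25, 0.5, 0.25)
gen2 = next_generation(*gen1)          # unchanged: already in equilibrium
```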

Complete dominance of the A allele
If allele A is completely dominant over allele a, only two phenotypes can be distinguished when observing the population. The frequency of the dominant phenotype equals the sum of the frequencies of dominant homozygotes and heterozygotes:


 * $$P(\mbox{phenotype A}) = P(AA) + P(Aa) = p^2 + 2pq$$

The frequency of the recessive phenotype equals that of recessive homozygotes:


 * $$P(\mbox{phenotype a}) = P(aa) = q^2$$

Thus, for the frequency of allele a:


 * $$q = \sqrt{P(\mbox{phenotype a})\;}$$

In a finite but sufficiently large population, the probabilities are replaced by the observed frequencies (relative frequencies) of the individual phenotypes. Thus, for a population of size N in which N(A) individuals have the dominant phenotype and N(a) individuals have the recessive phenotype, the allele frequency estimates are:


 * $$q = \sqrt{\frac{N(a)}{N}\;}$$

and, as expected:


 * $$p = 1 - q$$
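A short sketch of this estimate, using a hypothetical sample of 4 recessive individuals among 10 000 and a hypothetical function name. As a usage example it also computes the implied heterozygote frequency 2pq. Like the formula itself, it is only meaningful if the population is in equilibrium.

```python
import math


def allele_frequencies_complete_dominance(n_recessive: int, n_total: int) -> tuple[float, float]:
    """Estimate (p, q) when only the recessive phenotype (aa) can be told apart.
    Assumes the population is in Hardy-Weinberg equilibrium."""
    if n_total <= 0 or not (0 <= n_recessive <= n_total):
        raise ValueError("need n_total > 0 and 0 <= n_recessive <= n_total")
    q = math.sqrt(n_recessive / n_total)
    return 1 - q, q


# Hypothetical sample: 4 recessive phenotypes among 10 000 individuals.
p, q = allele_frequencies_complete_dominance(4, 10_000)
carriers = 2 * p * q  # expected frequency of heterozygotes (Aa)
```

With these numbers q = 0.02, p = 0.98 and 2pq = 0.0392, so heterozygous carriers are almost a hundred times more common than the recessive phenotype itself.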

Incomplete dominance of the A allele
If the A allele is incompletely dominant, or codominant with the a allele, the heterozygous phenotype can also be distinguished. The frequency of the dominant allele can then be calculated directly from the frequencies of the dominant homozygous and heterozygous phenotypes. Starting from the basic equation:


 * $$p^2 + 2pq + q^2 = 1$$

it can be rearranged by writing the frequency of recessive homozygotes as q^2 = q(1-p) = q - pq. Substituting and using 1 - q = p gives p^2 + pq = p; multiplying by 2 yields the form:


 * $$2p = 2 p^2 + 2pq$$

Since p^2 and 2pq are estimated by the relative frequencies N(AA)/N and N(Aa)/N of the two distinguishable phenotypes, we can pass from this form to the counts N(AA) and N(Aa):


 * $$p = \frac{2N(AA) + N(Aa)}{2N}$$
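The allele-counting formula is easy to express in code. This hypothetical helper simply counts A alleles among all 2N alleles, so, unlike the square-root shortcuts, it does not rely on the population being in equilibrium:

```python
def allele_frequency_by_counting(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """p = (2 N(AA) + N(Aa)) / (2N): the share of A among all 2N alleles."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("population is empty")
    return (2 * n_AA + n_Aa) / (2 * n)


# Hypothetical counts: 49 AA, 42 Aa and 9 aa individuals.
p = allele_frequency_by_counting(n_AA=49, n_Aa=42, n_aa=9)  # 140 / 200
```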

Note, however, that in this case too there is a quicker shortcut, valid only if the population is actually in equilibrium: the calculation from the frequency of the dominant homozygous phenotype alone:


 * $$p = \sqrt{\frac{N(AA)}{N}}$$
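The difference between the two estimates shows up as soon as the population is out of equilibrium. The sketch below (hypothetical helper names and counts) compares them on a population in equilibrium and on one with no heterozygotes at all:

```python
import math


def p_by_counting(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Allele counting, (2 N(AA) + N(Aa)) / (2N); valid in or out of equilibrium."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def p_by_homozygotes(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Shortcut sqrt(N(AA) / N); assumes N(AA)/N = p^2, i.e. equilibrium."""
    n = n_AA + n_Aa + n_aa
    return math.sqrt(n_AA / n)


in_equilibrium = (49, 42, 9)     # hypothetical counts matching p = 0.7
no_heterozygotes = (50, 0, 50)   # hypothetical counts far from equilibrium

estimates_eq = (p_by_counting(*in_equilibrium), p_by_homozygotes(*in_equilibrium))
estimates_non_eq = (p_by_counting(*no_heterozygotes), p_by_homozygotes(*no_heterozygotes))
```

In equilibrium both give 0.7; without heterozygotes, counting gives the correct 0.5 while the shortcut gives about 0.707.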

Equilibrium for multiple alleles
As long as the Hardy–Weinberg conditions are met, the multi-allele case is a straightforward generalization, and the gametic urn model can be used here as well. The urn generates gametes carrying n possible alleles A_i, each with frequency p_i (i = 1, 2, ..., n). There are n homozygous genotypes in the population, for which, by exactly the same reasoning as above:


 * $$P(A_iA_i) = p_i^2$$

Furthermore, there are n(n-1)/2 heterozygous genotypes in the population. An individual can receive any of the n alleles from the "mother"; to be heterozygous, it can receive only one of the remaining n-1 alleles from the "father", which gives n(n-1) ordered pairs. Since the two homologous chromosomes are indistinguishable, genotype A1A3 is identical to genotype A3A1, so each heterozygote is counted twice. By reasoning completely analogous to the two-allele case, the frequency of heterozygotes is:


 * $$P(A_iA_j) = 2 p_i p_j, \quad i \neq j$$
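The genotype enumeration can be sketched with `itertools.combinations_with_replacement`, which yields each unordered allele pair exactly once, matching the argument that A1A3 and A3A1 are the same genotype. The function name and allele frequencies below are hypothetical:

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """Expected genotype frequencies for n alleles: p_i^2 for homozygotes
    A_iA_i and 2 p_i p_j for heterozygotes A_iA_j (each unordered pair once)."""
    result = {}
    for a, b in combinations_with_replacement(allele_freqs, 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        result[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return result


# Hypothetical three-allele locus.
freqs = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
genotypes = genotype_frequencies(freqs)
n_heterozygous = sum(1 for a, b in genotypes if a != b)  # n(n-1)/2 = 3
```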

The Hardy–Weinberg equation can then be written in several forms. The most compact form, convenient for simulations, is:


 * $$ \sum_{i=1}^{n}\sum_{j=1}^{n} p_i p_j= 1$$

Sometimes it is convenient to divide the equation into a part corresponding to homozygotes and a part corresponding to heterozygotes:


 * $$ \underbrace{\sum_{i=1}^{n} p_ip_i}_{\mbox{homozygotes}} + \underbrace{\sum_{i=1}^{n}\sum_{j=i+1}^{n} 2p_ip_j}_{\mbox{heterozygotes}} = 1$$
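Both forms of the equation can be evaluated numerically. In the sketch below (hypothetical helper names and frequencies), the double sum over all ordered pairs and the homozygote plus heterozygote split give the same value, 1, because both equal (p_1 + ... + p_n)^2:

```python
def double_sum(p: list[float]) -> float:
    """Compact form: sum over all ordered pairs (i, j) of p_i * p_j."""
    return sum(pi * pj for pi in p for pj in p)


def split_sum(p: list[float]) -> tuple[float, float]:
    """Return the (homozygote, heterozygote) parts of the split form."""
    n = len(p)
    homozygotes = sum(pi * pi for pi in p)
    heterozygotes = sum(2 * p[i] * p[j] for i in range(n) for j in range(i + 1, n))
    return homozygotes, heterozygotes


p = [0.5, 0.3, 0.2]  # hypothetical allele frequencies summing to 1
homo, hetero = split_sum(p)
total = double_sum(p)
```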

Application to real populations
When applying the model to real populations, the following points must be taken into account:


 * Although the model was built on the assumption that each individual is of both sexes, this assumption is not necessary. Separate sexes would complicate the gametic urn model, but it is not difficult to show that if the inheritance of the studied gene does not depend on the sex of its carrier, the same allele frequencies are maintained in the subpopulations of both sexes.
 * The population must be panmictic, i.e. mating must be random so that genes combine freely. Panmixia can be disturbed, for example, by geographical conditions, limited migration of the species, or a markedly uneven initial distribution of alleles. For example, a cat population on one island will be panmictic, but a cat population spread over several nearby islands may no longer be. Panmixia is also disrupted when the presence of a certain allele affects the choice of a mate who does or does not carry a similar allele. For example, the tendency of people to choose partners of similar IQ violates the panmixia condition when studying the population distribution of alleles that influence intelligence.
 * The population must be large enough. The fewer individuals it has, the stronger the random statistical fluctuations (genetic drift). In relatively small populations, drift can lead to the loss of some alleles from the population.
 * All other forces that change allele frequencies (selection, emigration, immigration, mutation) must be negligible.
 * Asexual reproduction violates the assumptions of the model: an individual produces copies of itself regardless of how common its genotype is in the population.
 * Overlapping generations also violate the assumptions in real populations. If the population is close to equilibrium, however, they need not cause a noticeable departure from it.
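The effect of broken panmixia, such as the cats on several islands, can be illustrated numerically. If each island is in equilibrium but the islands are pooled, heterozygotes are less frequent than the pooled allele frequency predicts, a phenomenon known as the Wahlund effect. The sketch assumes two equally large islands with hypothetical allele frequencies 0.9 and 0.1; the function name is also hypothetical:

```python
def pooled_heterozygosity(p_islands: list[float]) -> tuple[float, float]:
    """Return (observed, expected) heterozygote frequency when equally sized
    island subpopulations, each in Hardy-Weinberg equilibrium, are pooled."""
    k = len(p_islands)
    observed = sum(2 * p * (1 - p) for p in p_islands) / k
    p_pooled = sum(p_islands) / k
    expected = 2 * p_pooled * (1 - p_pooled)
    return observed, expected


observed, expected = pooled_heterozygosity([0.9, 0.1])
```

Here the pooled population shows 18 % heterozygotes, although its allele frequency of 0.5 would predict 50 % under Hardy–Weinberg equilibrium.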

Derived equilibria
If one or more of the conditions are not met, a different type of equilibrium, or a combination of them, may be established:


 * Mutational equilibrium
 * Selection equilibrium, e.g. under selection against homozygotes or against heterozygotes
 * Populations strongly affected by genetic drift. In these, the final state is usually the fixation of one allele and the loss of the other.
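The loss of alleles by drift can be illustrated with a Wright–Fisher-style simulation, a standard population-genetics model that is not described in this article: each generation, 2N alleles are drawn with replacement from the previous generation's allele frequency. The function name, population size and seed below are hypothetical choices for illustration:

```python
import random


def drift_until_fixation(n_individuals: int, p0: float, seed: int = 0,
                         max_generations: int = 100_000) -> tuple[float, int]:
    """Wright-Fisher-style sketch: each generation draws 2N alleles with
    replacement at the current frequency of A. Returns (final p, generations)."""
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    p = p0
    for generation in range(1, max_generations + 1):
        count_A = sum(rng.random() < p for _ in range(two_n))
        p = count_A / two_n
        if p in (0.0, 1.0):  # one allele fixed, the other lost
            return p, generation
    return p, max_generations


# Hypothetical small population of 10 individuals starting at p = 0.5.
final_p, generations = drift_until_fixation(n_individuals=10, p0=0.5, seed=42)
```

Which allele is fixed depends on the random seed; increasing `n_individuals` makes fixation take longer, in line with drift being weaker in larger populations.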

Related articles

 * Genetic drift
 * Population genetics