Genetic methods of association analysis
Genetic association studies investigate the relationship between one or more genetic markers and the onset and course of a disease, including diseases caused by external pathogens. In genetics, they are mainly used to reveal genetic predispositions to multifactorial diseases, i.e. genes (genotypes of a single gene, but also combinations of genotypes of different genes) that protect against the disease or contribute to it.
Such genotypes are relatively easy to find: they occur significantly more often or, conversely, significantly less often in a group of patients (cases) than in a group of control individuals (controls, i.e. people who do not suffer from the given disease).
This article covers three study designs: case-control studies, cohort studies and SNP association studies.
CASE-CONTROL STUDIES (RETROSPECTIVE)
In the classic variant, we monitor the occurrence of different genotypes of a pre-selected candidate gene in groups of cases and controls. If no candidate gene is known, or none can even be hypothesized, we can perform a genome-wide association study (GWAS). In this case, we determine the genotypes at a large number of polymorphic loci on all chromosomes simultaneously in both groups of people, most often using DNA microarrays, which can determine up to approximately 1.8 million genotypes in each individual.
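The comparison of genotypes between cases and controls can be sketched as an allelic chi-square test on a 2x2 table. This is only an illustrative sketch, not a method prescribed by the text: the genotype counts, the allele labels `A`/`a` and the function names are all invented, and a biallelic marker is assumed.

```python
import math

def allele_counts(genotypes):
    """Turn genotype counts {'AA': n, 'Aa': n, 'aa': n} into (A, a) allele counts."""
    big_a = 2 * genotypes["AA"] + genotypes["Aa"]
    small_a = 2 * genotypes["aa"] + genotypes["Aa"]
    return big_a, small_a

def allelic_chi_square(cases, controls):
    """Pearson chi-square (1 degree of freedom) on the 2x2 allele-by-group table."""
    a, b = allele_counts(cases)      # A and a alleles among cases
    c, d = allele_counts(controls)   # A and a alleles among controls
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts, invented for illustration only.
cases = {"AA": 30, "Aa": 50, "aa": 20}
controls = {"AA": 50, "Aa": 40, "aa": 10}

chi2, p_value = allelic_chi_square(cases, controls)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In a genome-wide study the same test would be repeated for every locus, so the significance threshold would have to be corrected for the number of tests performed.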
In a case-control study, the prevalence of a risk factor (exposure) is compared between the two groups. We proceed from the effect to the cause, asking whether the disease under study was caused by a suspected factor. An example is the relationship between smoking and lung cancer: from a clearly defined population (e.g. the patients of one healthcare facility) we select people with lung cancer (the monitored group) and people without lung cancer (the control group).
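Because a case-control study compares exposure between groups, its result is commonly summarized as an odds ratio. The sketch below assumes a simple 2x2 table and uses Woolf's logit approximation for the confidence interval; the counts and the function name `odds_ratio` are hypothetical and not taken from any real study.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio with an approximate 95% confidence interval (Woolf's logit method)."""
    or_value = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of ln(OR) is the square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper

# Hypothetical counts: smokers and non-smokers among cases and controls.
or_value, ci_low, ci_high = odds_ratio(
    exposed_cases=80, unexposed_cases=20,
    exposed_controls=50, unexposed_controls=50,
)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that lies entirely above 1 would suggest that the exposure is more common among cases than among controls; the approximation assumes that no cell of the table is zero.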
Advantages
- relatively fast, cheap, possibility of quick repetition
- suitable for studying rare diseases
- suitable for chronic diseases and diseases with long latency
- possibility of monitoring multiple risk factors for one disease
Disadvantages
- reliance on human memory, i.e. problematic retrospective assessment of exposure to a suspected factor (recall bias)
- high risk of selection bias (systematic selection errors): a clear definition of the source population and of both the observed and the control group is necessary
COHORT STUDY
In a cohort study, the incidence of a disease (effect) is compared between groups. We proceed from cause to effect, asking whether exposure to a suspected factor (cause) leads to the disease. An example is again the relationship between smoking and lung cancer, where the study group consists of smokers (exposed group) and the control group consists of non-smokers (unexposed group).
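Since a cohort study compares incidence in the exposed and unexposed groups, its usual summary measure is the relative risk. The following is a minimal sketch under that assumption; the cohort sizes, the case counts and the function name `relative_risk` are invented for illustration.

```python
def relative_risk(exposed_ill, exposed_total, unexposed_ill, unexposed_total):
    """Return incidence in both groups and their ratio (relative risk)."""
    incidence_exposed = exposed_ill / exposed_total
    incidence_unexposed = unexposed_ill / unexposed_total
    return incidence_exposed, incidence_unexposed, incidence_exposed / incidence_unexposed

# Hypothetical cohort: 1000 smokers and 1000 non-smokers followed over the same period.
inc_smokers, inc_nonsmokers, rr = relative_risk(
    exposed_ill=30, exposed_total=1000,
    unexposed_ill=3, unexposed_total=1000,
)
print(f"incidence: {inc_smokers:.3f} vs {inc_nonsmokers:.3f}, relative risk = {rr:.1f}")
```

Unlike the odds ratio of a case-control study, this ratio can be computed directly, because the cohort design observes how many people in each group actually develop the disease.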
Advantages
- accuracy, reliability
- objectivity – can assess multiple consequences of a single exposure
Disadvantages
- costly and time-consuming
- not suitable for studying rare diseases
SNP STUDIES (single nucleotide polymorphisms)
The Human Genome Project aimed to describe the complete sequence of human DNA. One of the great benefits of this project was the discovery of millions of DNA sequence variants, most of which are SNPs. The SNP Consortium was formed by several pharmaceutical companies, technology companies and academic centers; it focused on compiling detailed SNP maps of the human genome and made its results publicly available. Thanks to this step, independent laboratories can use individual SNPs for further research without having to deal with patent issues. Typical studies genotype thousands of DNA samples from different patients on rapid sequencing devices; this genotyping must be well reproducible, and its cost should be as low as possible.
The genotypes (SNPs) of individual patients are then compared with their phenotypes (clinical manifestations) using sophisticated statistical software. Association studies based on SNPs are divided into two types:
- direct testing of the functional effects of SNPs in a given disease
- using SNPs as markers for linkage disequilibrium (LD)
LD is the non-random association of alleles at different loci. It is generally quantified by measuring the association between two genetic markers, and this measure can then be used to identify regions associated with the disease.
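The association between two markers is commonly expressed with the coefficients D, |D'| and r². The sketch below computes them for two biallelic loci from haplotype frequencies; the frequencies, the allele labels and the function name `ld_measures` are hypothetical, and both loci are assumed to be polymorphic (otherwise r² is undefined).

```python
def ld_measures(hap):
    """Compute D, |D'| and r^2 from haplotype frequencies.

    hap maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to frequencies summing to 1.
    """
    p_a = hap["AB"] + hap["Ab"]   # frequency of allele A at the first locus
    p_b = hap["AB"] + hap["aB"]   # frequency of allele B at the second locus
    d = hap["AB"] - p_a * p_b     # deviation from independence of the two loci
    # D is bounded by the allele frequencies; the bound depends on its sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical haplotype frequencies, invented for illustration only.
haplotypes = {"AB": 0.5, "Ab": 0.1, "aB": 0.1, "ab": 0.3}
d, d_prime, r2 = ld_measures(haplotypes)
print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r2:.3f}")
```

A value of D equal to zero corresponds to independent loci (linkage equilibrium), while values of |D'| and r² close to 1 indicate strong association between the two markers.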
The principle of this method is the same as in linkage analysis, where the genes responsible for a disease in a family are located by tracing the disease and the markers inherited with it through the ancestors in a pedigree. Unfortunately, too few generations (or rather their genomes) are available in family studies. A set of approximately 300 highly polymorphic repetitive markers spread across the human genome is used to point to the gene responsible for a monogenic disease. LD in a population, by contrast, reflects recombination accumulated over many generations: it persists only between markers whose association has not been broken up by recombination.
LD also provides information about the occurrence of new mutations or genetic drift. How many markers are needed to describe LD across the genome in order to identify genes associated with a given disease remains an open question. If we had a sufficiently detailed description of the structure of the human genome (from sequencing), we could reduce the number of markers needed to describe LD across the genome and focus only on gene-rich regions.
Complete knowledge of the extent and pattern of variation in recombination across the genome would allow us to distribute markers sensibly and probably to reduce the number of markers needed to determine LD. By comparing different DNA segments with the phenotype, we can reveal a large number of relationships between a genetic predisposition and its physical manifestation.
Sources
Genetické metody asociační analýzy. Online. Wikiskripta. 2025. Available from: https://www.wikiskripta.eu/w/Genetick%C3%A9_metody_asocia%C4%8Dn%C3%AD_anal%C3%BDzy. [cited 2025-05-30].
