Gaussian Distribution

Normal Distribution
The normal (Gaussian) distribution:
 * is a bell-shaped curve.
 * is valid for data that are symmetrically distributed around the mean.
 * has two parameters: μ is the mean (the location of the peak) and σ² is the variance (a measure of the width of the distribution).
 * has mean = median = mode. The distribution with μ = 0 and σ² = 1 is called the standard normal distribution.

When a random variable X is normally distributed with mean μ and variance σ², we write X ~ N(μ, σ²). For such a variable:
 * approximately 68.3% of all values lie within 1 standard deviation of the mean
 * approximately 95.4% lie within 2 standard deviations
 * approximately 99.7% lie within 3 standard deviations
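
The percentages for 1, 2 and 3 standard deviations can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `prob_within` is made up for this illustration. It relies on the identity P(|X − μ| ≤ kσ) = erf(k/√2), which holds for any normal variable.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The result does not depend on μ or σ, which is why the same three percentages apply to every normal distribution.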

Several biological variables are approximately normally distributed (e.g., blood pressure, serum cholesterol, height, and weight). The normal curve can be used to estimate probabilities (frequencies of occurrence) associated with these variables.
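
As a sketch of how the normal curve yields such probabilities, the example below uses `statistics.NormalDist` from the Python standard library. The mean of 120 mmHg and standard deviation of 10 mmHg are assumed values chosen only for illustration, not measured population parameters.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration
systolic = NormalDist(mu=120, sigma=10)

# Probability of a reading above 140 mmHg (more than 2 SD above the mean)
p_above_140 = 1 - systolic.cdf(140)

# Probability of a reading between 110 and 130 mmHg (within 1 SD of the mean)
p_110_to_130 = systolic.cdf(130) - systolic.cdf(110)

print(f"P(X > 140)       = {p_above_140:.4f}")
print(f"P(110 < X < 130) = {p_110_to_130:.4f}")
```

The cumulative distribution function `cdf(x)` gives the area under the curve to the left of x, so any interval probability is a difference of two `cdf` values.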

In real life, distributions are by far non-standard (μ ≠ 0, σ² ≠ 1), and the data often tend to be skewed in the positive or negative direction:
 * Negatively skewed: Example of values: 1, 1000, 1001, 1002, 1003. The tail is on the left, and there are relatively few low values and many high values. Mean < Median < Mode
 * Positively skewed: Example of values: 1, 2, 3, 4, 100. The tail is on the right, and there are relatively few high values and many low values. Mean > Median > Mode
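
The two example data sets can be checked directly. This is a minimal sketch with the standard library; the variable names are made up for the illustration. It compares only the mean and the median, because every value in these tiny samples occurs once and the mode is therefore not informative here.

```python
from statistics import mean, median

negatively_skewed = [1, 1000, 1001, 1002, 1003]
positively_skewed = [1, 2, 3, 4, 100]

for name, values in [("negative skew", negatively_skewed),
                     ("positive skew", positively_skewed)]:
    print(f"{name}: mean = {mean(values)}, median = {median(values)}")
```

In both cases the single extreme value pulls the mean toward the tail, while the median stays with the bulk of the data: the mean falls below the median for the left-tailed sample and above it for the right-tailed sample.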



The mode is the measure least affected by outliers in the sample. Some distributions show a "disruption" that renders them bimodal (they have 2 humps). This suggests that the sampled population contains two distinct "sub-populations", each with its own normal distribution.

Confidence Interval (CI)
Confidence interval (CI): Used when, instead of simply reporting the mean value of a sample, we want a range that is likely to contain the true population value. The confidence interval states, with a certain level of confidence (95%, 97%, etc.), that the true value of the population mean lies within that interval. For a given sample, a narrower CI gives a more precise estimate of the true (population) mean, but with lower confidence; a wider CI gives higher confidence but a less precise estimate. A larger sample narrows the CI at the same confidence level, so the best combination is as large a sample as possible with a confidence level that does not make the interval too wide (<99%).
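
The trade-off between confidence level, interval width and sample size can be sketched as follows. This is an illustration under stated assumptions, not a definitive method: it uses the normal-approximation interval mean ± z·s/√n (for small samples a t-based interval would be more appropriate), the helper names `mean_ci` and `width` are made up here, and the data are simulated from a hypothetical population with mean 120 and standard deviation 15.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Normal-approximation CI for the mean: mean ± z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    m = mean(sample)
    return m - half_width, m + half_width

def width(ci):
    return ci[1] - ci[0]

random.seed(0)  # fixed seed so the sketch is repeatable
# Simulated readings from a hypothetical population (mean 120, SD 15)
small = [random.gauss(120, 15) for _ in range(50)]
large = small + [random.gauss(120, 15) for _ in range(450)]

print("n=50,  95% CI width:", round(width(mean_ci(small, 0.95)), 2))
print("n=50,  99% CI width:", round(width(mean_ci(small, 0.99)), 2))
print("n=500, 95% CI width:", round(width(mean_ci(large, 0.95)), 2))
```

For the same sample, raising the confidence level from 95% to 99% always widens the interval, because z grows. Increasing the sample size shrinks the interval roughly in proportion to 1/√n.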

When sampling a normally distributed variable such as blood pressure, one sample of x people and another sample of x other people might have different means, so neither sample mean alone tells us the true value of the population mean. If the samples are representative, unbiased and not small, a CI gives a range of values in which such sample means are likely to lie, and which is also likely to contain the true population mean (which could be calculated exactly only by measuring the blood pressure of every human being).
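
A small simulation can illustrate this. The sketch below builds a hypothetical, fully known "population" of 100,000 simulated readings (mean 120, SD 15 are assumed values), repeatedly draws random samples of 200, and counts how often the 95% interval mean ± 1.96·s/√n contains the true population mean. The names `population`, `ci95` and `coverage` are made up for the illustration.

```python
import math
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population in which every individual has been "measured"
population = [random.gauss(120, 15) for _ in range(100_000)]
true_mean = mean(population)

def ci95(sample):
    """Normal-approximation 95% CI for the mean of a sample."""
    m = mean(sample)
    half_width = 1.96 * stdev(sample) / math.sqrt(len(sample))
    return m - half_width, m + half_width

# Two different samples generally have different means
sample_a = random.sample(population, 200)
sample_b = random.sample(population, 200)
print("sample means:", round(mean(sample_a), 2), round(mean(sample_b), 2))

# Repeat the sampling many times and count how often the CI covers the true mean
trials = 1000
hits = 0
for _ in range(trials):
    lo, hi = ci95(random.sample(population, 200))
    if lo <= true_mean <= hi:
        hits += 1
coverage = hits / trials
print("fraction of CIs containing the true mean:", coverage)
```

The fraction should come out close to 0.95. Any single interval may miss the true mean, which is why the confidence level describes the long-run behaviour of the procedure rather than a guarantee for one sample.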