Last year, Dige took a statistics course and had to solve many problems and calculate many results. However, he thought it was pointless to practice that much: after solving several problems, the rest were just repetition, which was too simple and sometimes even naive. Therefore, he wanted someone to help him write a program that would immediately output all the results he needed, instead of him calculating them by hand.
The statistics problems might include z-scores, one-sample t-tests, correlation, and regression.
For a population of real numbers, the standard deviation (SD) of the numbers is defined as

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - M_x)^2}{n}},$$

where $n$ is the count of the numbers and $M_x$ is their mean. The z-score of each number $x_i$ is defined as

$$z_i = \frac{x_i - M_x}{SD}.$$
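A minimal Python sketch of the two definitions above, assuming the population SD divides by n and each z-score is (x - Mx) / SD. The function names `population_sd` and `z_scores` are illustrative, not part of the problem. It also assumes a non-empty input whose values are not all identical (otherwise SD is 0 and the z-score is undefined).

```python
import math


def population_sd(xs: list[float]) -> float:
    """Population SD: square root of the mean squared deviation (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def z_scores(xs: list[float]) -> list[float]:
    """z-score of each value: its distance from the mean in population-SD units."""
    mean = sum(xs) / len(xs)
    sd = population_sd(xs)
    return [(x - mean) / sd for x in xs]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_sd(data))  # 2.0
print(z_scores(data))       # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Here the mean is 5 and the squared deviations sum to 32, so SD = sqrt(32 / 8) = 2.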
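You know, sometimes (almost always) we cannot get data for a whole population, and what we have is just a sample from it. For a sample from a population, the SD is

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - M_x)^2}{n - 1}},$$

while the standard error (SE) is

$$SE = \frac{SD}{\sqrt{n}}.$$

In a one-sample t-test, the t-value is defined as

$$t = \frac{M_x - \mu}{SE},$$

where $\mu$ is the hypothesized population mean that the sample is tested against.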
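The sample-based quantities can be sketched the same way. This assumes the sample SD divides by n - 1, SE = SD / sqrt(n), and the standard one-sample statistic t = (Mx - mu) / SE, where `mu` is the hypothesized population mean. All function and parameter names are illustrative, and the input needs at least two values.

```python
import math


def sample_sd(xs: list[float]) -> float:
    """Sample SD: like the population SD, but divides by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def standard_error(xs: list[float]) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return sample_sd(xs) / math.sqrt(len(xs))


def one_sample_t(xs: list[float], mu: float) -> float:
    """t-value: how many standard errors the sample mean lies from mu."""
    mean = sum(xs) / len(xs)
    return (mean - mu) / standard_error(xs)


data = [2.0, 6.0, 6.0, 6.0]
print(sample_sd(data))           # 2.0
print(standard_error(data))      # 1.0
print(one_sample_t(data, 3.0))   # 2.0
```

With mean 5 and squared deviations summing to 12, the sample SD is sqrt(12 / 3) = 2, the SE is 2 / sqrt(4) = 1, and testing against mu = 3 gives t = 2.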
There are many ways to calculate a correlation coefficient between two variables X and Y, and one of the most popular is Pearson’s r:

$$r = \frac{\sum_{i=1}^{n} (x_i - M_x)(y_i - M_y)}{n \cdot SD_x \cdot SD_y}.$$

Note that the SDs here are population SDs.
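A short sketch of Pearson's r as described, assuming r = sum((x - Mx)(y - My)) / (n * SDx * SDy) with population SDs (dividing by n). The name `pearson_r` is illustrative; the inputs are assumed to have equal length and non-zero spread.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's r computed with population SDs (divide by n)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    # Sum of cross-products of deviations from the two means.
    cov_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov_sum / (n * sdx * sdy)


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 6))  # 1.0
print(round(pearson_r([1, 2, 3], [1, 3, 2]), 6))               # 0.5
```

Because the denominator uses n together with population SDs, the factors of n cancel the same way they would with n - 1 and sample SDs; mixing the two conventions would give a wrong r.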
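Regression is a little more complicated. It aims to find the “best” a and b for the estimation

$$\hat{Y} = a + bX.$$

Here

$$b = r \cdot \frac{SD_y}{SD_x},$$

and

$$a = M_y - b \cdot M_x.$$

If you use these equations to solve a regression problem, it does not matter which SD you use here, sample or population. However, you must use the same type of SD for both $SD_x$ and $SD_y$: the two types differ only by the factor $\sqrt{n/(n-1)}$, which cancels in the ratio $SD_y / SD_x$ only when both SDs are of the same type.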
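A minimal sketch of the regression step, assuming the standard least-squares forms b = r * SDy / SDx and a = My - b * Mx, with r computed from population SDs. The `sample` flag (a name chosen here for illustration) switches both SDs in the slope to sample SDs at once, which lets you check that the choice does not change a or b as long as both SDs use the same type.

```python
import math


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def sd(values: list[float], sample: bool = False) -> float:
    """Population SD (divide by n), or sample SD (divide by n - 1) if sample=True."""
    m = mean(values)
    ss = sum((v - m) ** 2 for v in values)
    n = len(values)
    return math.sqrt(ss / (n - 1 if sample else n))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's r, using population SDs."""
    mx, my = mean(xs), mean(ys)
    cov_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov_sum / (len(xs) * sd(xs) * sd(ys))


def regression(xs: list[float], ys: list[float], sample: bool = False) -> tuple[float, float]:
    """Return (a, b) for the estimate Y_hat = a + b * X."""
    r = pearson_r(xs, ys)
    # Both SDs share the same type, so the n vs. n - 1 factor cancels in the ratio.
    b = r * sd(ys, sample) / sd(xs, sample)
    a = mean(ys) - b * mean(xs)
    return a, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 1 + 2x
print(regression(xs, ys))               # approximately (1.0, 2.0)
print(regression(xs, ys, sample=True))  # approximately (1.0, 2.0)
```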