- Is this insight just observed by chance, or is it a real insight?

Statistical significance can be assessed using hypothesis testing:

– First, we state a null hypothesis, which is usually the opposite of what we wish to show (classifiers A and B perform equivalently, treatment A is equivalent to treatment B)

– Then, we choose a suitable statistical test and the test statistic used to reject the null hypothesis

– Next, we choose a significance level α, which defines the critical region: the range of test-statistic values extreme enough for the null hypothesis to be rejected

– Finally, we calculate the observed test statistic from the data and check whether it lies in the critical region (equivalently, whether the p-value is below α)
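
The four steps above can be sketched with a two-sided one-sample Z test, using only the Python standard library. This is a minimal illustration, not a reference implementation: the function name `one_sample_z_test`, the sample values, `mu0`, `sigma`, and `alpha = 0.05` are all assumptions made up for the example, and the test presumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample Z test, assuming a known population std dev."""
    n = len(sample)
    # Step 1: null hypothesis H0 is "the population mean equals mu0".
    # Step 2: test statistic is the standardized sample mean.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Step 3: alpha defines the critical region |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: check whether the observed statistic lies in that region;
    # the two-sided p-value gives the same decision when compared to alpha.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit


# Made-up measurements, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.2]
z, z_crit, p_value, reject = one_sample_z_test(sample, mu0=5.0, sigma=0.4)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Rejecting when `abs(z) > z_crit` and rejecting when `p_value < alpha` are the same decision; the p-value simply reports how extreme the observed statistic is under the null hypothesis.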

Common tests:

– One-sample Z test

– Two-sample Z test

– One-sample t-test

– Paired t-test

– Two-sample pooled t-test (equal variances)

– Two-sample unpooled t-test for unequal variances and possibly unequal sample sizes (Welch’s t-test)

– Chi-squared test for variances

– Chi-squared test for goodness of fit

– ANOVA (for instance: are two regression models equivalent? F-test)

– Regression F-test (i.e., is at least one of the predictors useful in predicting the response?)
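
To make the difference between the pooled and Welch two-sample t-tests in the list above concrete, here is a rough sketch of both test statistics and their degrees of freedom, again using only the standard library. The helper names `pooled_t` and `welch_t` and the two samples are hypothetical, chosen for illustration. The sketch stops at the statistic, since turning it into a p-value needs the t-distribution CDF, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic: variances are not assumed equal."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up samples with different sizes and visibly different spreads.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10, 12]
t_pooled, df_pooled = pooled_t(a, b)
t_welch, df_welch = welch_t(a, b)
print(f"pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.2f}")
```

In practice a statistics library would normally be used instead; for example, SciPy's `scipy.stats.ttest_ind` runs the pooled test by default and Welch's test when called with `equal_var=False`.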
