What is Non-parametric Statistics?
Definition 1
In statistics, nonparametric statistics refers to statistical methodologies that generally do not assume a specific distribution for the population. They are particularly known for having minimal conditions for hypothesis testing.
Description
As an example, let’s see how hypothesis testing is conducted in analysis of variance.
One-way ANOVA: Consider that there are treatments in an experimental design, and samples are drawn from each treatment, with a total of samples. Suppose that the samples from the -th treatment are independent and randomly follow a normal distribution and that the population variances of the normal distributions are equal, . In the one-way ANOVA that compares population means among groups, the hypothesis testing is as follows:
- :
- : At least one of the differs from the other population means.
Another example is a hypothesis test in regression analysis.
-test for Regression Coefficient:
Given independent variables and data points, the linear multiple regression model can be represented by a design matrix as shown above, simplified to . Assume that in the model diagnosis, the residuals satisfy linearity, homoscedasticity, independence, and normality. In multiple regression analysis, the hypothesis testing for each regression coefficient is as follows:
- : i.e., the -th independent variable has no correlation with the dependent variable.
- : i.e., the regression coefficient for the -th independent variable is significant.
In the broader scope of statistics, analysis of variance and regression analysis are accessible to students by sophomore year in college and can be sufficiently understood and utilized. However, as observed, there are numerous and complex conditions to employ these analyses effectively. Simply applying the techniques without meeting the assumptions might yield results, but they cannot be trusted due to a lack of theoretical basis.
Examples of data not meeting assumptions include lack of normality or homoscedasticity, as well as situations where applying parametric methods is challenging due to the nature of the data itself, such as:
- When only the order or rank of data is known: World university rankings, new product test preference results
- Truncated data: Data beyond a certain range cannot be measured or is intentionally missing
Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p630. ↩︎