
What is Non-parametric Statistics?

Definition 1

In statistics, nonparametric statistics refers to statistical methodologies that do not assume a specific distribution for the population. They are characterized by the minimal assumptions they require for hypothesis testing.

Description

As an example, let’s see how hypothesis testing is conducted in analysis of variance.

One-way ANOVA: Consider an experimental design with $k$ treatments, where $n_{j}$ samples are drawn from the $j$-th treatment, for a total of $n = n_{1} + \cdots + n_{k}$ samples. Suppose that the samples from the $j = 1, \cdots, k$-th treatment are independent random samples following a normal distribution $N \left( \mu_{j}, \sigma_{j}^{2} \right)$, and that the population variances of these normal distributions are equal: $\sigma^{2} = \sigma_{1}^{2} = \cdots = \sigma_{k}^{2}$. In the one-way ANOVA comparing population means among groups, the hypothesis test is as follows:

  • $H_{0}$: $\mu_{1} = \cdots = \mu_{k}$
  • $H_{1}$: At least one of the $\mu_{j}$ differs from the other population means.
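The one-way ANOVA test above can be run in a few lines; this is a minimal sketch using `scipy.stats.f_oneway`, with hypothetical data drawn so that $H_{0}$ holds (equal means, common variance).

```python
# Minimal one-way ANOVA sketch; the k = 3 groups below are
# hypothetical illustration data generated under H0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three treatments with the same mean and a common variance
groups = [rng.normal(loc=5.0, scale=2.0, size=20) for _ in range(3)]

f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# A large p-value means we fail to reject H0: mu_1 = ... = mu_k
```

Note that the validity of this $F$-test rests on exactly the normality and equal-variance assumptions stated above.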

Another example is a hypothesis test in regression analysis.

$t$-test for Regression Coefficient:
$$
\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}
$$
Given $p$ independent variables and $n$ data points, the multiple linear regression model can be written with a design matrix as above, abbreviated $Y = X \beta + \varepsilon$. Assume that in model diagnostics the residuals satisfy linearity, homoscedasticity, independence, and normality. In multiple regression analysis, the hypothesis test for each regression coefficient is as follows:

  • $H_{0}$: $\beta_{j} = 0$, i.e., the $j$-th independent variable has no linear relationship with the dependent variable.
  • $H_{1}$: $\beta_{j} \ne 0$, i.e., the regression coefficient of the $j$-th independent variable is significant.
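The coefficient $t$-test can be computed directly from the matrix form $Y = X \beta + \varepsilon$; this is a minimal sketch with numpy, using hypothetical data in which the second predictor truly has a zero coefficient.

```python
# t-test for regression coefficients computed from the design matrix;
# the data are hypothetical illustration values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 50, 2
# Design matrix X: intercept column plus p predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, 0.0])  # beta_2 = 0: irrelevant predictor
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Least-squares estimate and residual variance s^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p - 1)

# Standard errors from the diagonal of s^2 (X'X)^{-1}
cov = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

# t statistic and two-sided p-value for each H0: beta_j = 0
t_stats = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p - 1)
```

The p-values here are only trustworthy if the residuals actually satisfy the linearity, homoscedasticity, independence, and normality assumptions listed above.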

Within the broader scope of statistics, analysis of variance and regression analysis are accessible enough that a college sophomore can understand and use them. However, as seen above, they come with numerous and demanding assumptions. Applying the techniques without meeting those assumptions will still produce numbers, but the results cannot be trusted because they lack a theoretical basis.

Data may fail these assumptions through lack of normality or homoscedasticity, but parametric methods can also be hard to apply because of the nature of the data itself, for example:

  • When only the order or rank of data is known: World university rankings, new product test preference results
  • Truncated data: Data beyond a certain range cannot be measured or is intentionally missing
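For ordinal data like the ranking examples above, a rank-based nonparametric test can replace one-way ANOVA; this is a minimal sketch using the Kruskal-Wallis test from scipy, with hypothetical product-preference scores.

```python
# Kruskal-Wallis H-test: a rank-based, distribution-free counterpart
# to one-way ANOVA. The panel scores below are hypothetical
# ordinal preference ratings for a new product.
from scipy import stats

panel_a = [1, 2, 2, 3, 3, 4]
panel_b = [2, 3, 3, 4, 4, 5]
panel_c = [3, 4, 4, 5, 5, 5]

h_stat, p_value = stats.kruskal(panel_a, panel_b, panel_c)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
# No normality or equal-variance assumption is needed here:
# the test uses only the ranks of the observations.
```

Because only ranks enter the statistic, the test remains valid when the raw scale of the data is arbitrary or unknown.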

  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p. 630. ↩︎