Parametric vs non-parametric statistical tests in Python

Zach Zazueta
5 min read · Sep 6, 2020

Once you have a good understanding of the data you have to work with, you next need to decide what you aim to answer with that information. Understanding the problem at hand is part of the Business Understanding step in the Data Science Process.

The Data Science Process

A business question with a data solution can often be posed as a hypothesis, for example: “Is there a difference in the customer conversion rate between our old website design and a proposed new layout?” Having a hypothesis to test is a prerequisite for any statistical testing.

Two types of hypotheses are exploratory and confirmatory; as the names suggest, exploratory analysis seeks to uncover the “why” and dig into the data, while confirmatory hypotheses are more applicable when you have a pretty good idea of what is going on with the data and need evidence to support that thinking. It is important to decide a priori which of your hypotheses belong to each category. It has been argued that limiting exploratory hypothesis testing can help to increase certainty in results.

Once the hypothesis has been determined, the next question to answer is “am I comparing the means or the medians of two groups?” Parametric tests compare group means, while non-parametric tests compare group medians. A common misconception is that the decision rests solely on whether the data is normally distributed; the distribution does matter, especially with smaller sample sizes, but other factors should also be considered.

Parametric tests are widely regarded as handling normally distributed (Gaussian) data well. However, given a sufficiently large sample, parametric tests also:

  • Work well with skewed and non-normal distributions.
  • Perform well when the groups have different spreads (unequal variances).
  • Typically have more statistical power than non-parametric tests.

If sample size is sufficiently large and group mean is the preferred measure of central tendency, parametric tests are the way to go.

If group median is the preferred measure of central tendency for the data, go with non-parametric tests regardless of sample size. Non-parametric tests are great for comparing data that is prone to outliers, like salary. They are also useful for small samples and non-normal distributions, and they are the right choice for ordinal or ranked data.

Some of the most commonly used parametric tests and their non-parametric counterparts are as follows:

  • Paired t test ↔ Wilcoxon Signed Rank test
  • Unpaired (independent) t test ↔ Mann-Whitney U / Wilcoxon Rank-Sum test
  • One-way ANOVA ↔ Kruskal-Wallis test

There are also tests which look for associations between variables, i.e. correlation and association tests (e.g. Pearson and Spearman for continuous or ranked data, Chi-Squared for categorical data), and regression tests, which assess whether a change in one or more independent variables predicts a change in a dependent variable (e.g. simple and multiple regression).
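
As a quick, hedged illustration (the data and variable names below are made up for demonstration), these association tests are available in scipy.stats:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Correlation between two continuous variables
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.8, size=100)
r, p_pearson = stats.pearsonr(x, y)        # linear association
rho, p_spearman = stats.spearmanr(x, y)    # rank-based (monotonic) association

# Chi-squared test of independence for two categorical variables,
# given a contingency table of observed counts
table = np.array([[30, 10],
                  [20, 40]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)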

A quick overview of when you might use each of the above tests:

The Paired t test is used when you are looking at one population sample with a before and after score or result. This could be comparing a classroom of students’ beginning-of-year reading proficiency to their end-of-year proficiency to determine whether there was growth or decline in understanding. The non-parametric counterpart is the Wilcoxon Signed Rank test, which can be used to determine whether two dependent samples were selected from populations having the same distribution; it takes into account both the magnitude and the direction of the differences.

The Unpaired t test, also widely known as the 2-sample or independent t test, is used to compare two samples from different, unrelated groups to determine if there is a difference in the group means. Its non-parametric counterpart is the Mann-Whitney U test, also known as the Wilcoxon Rank-Sum test, which is similar to the Wilcoxon Signed Rank test but compares independent samples rather than paired ones.

Finally, the One-way ANalysis Of VAriance (ANOVA) is used to determine differences in group means for two or more groups where there is one independent variable with at least two distinct levels. An example of this would be predicting the weight of a dog based on breed, given a set of dogs of different breeds. The Kruskal-Wallis test, an extension of the Mann-Whitney U test beyond two groups, can be used to compare the medians of multiple groups when the residuals are not assumed to be normally distributed.

There are certain assumptions that are made for data that is to be analyzed using parametric tests. The four assumptions are that 1) the data is normally distributed (or that difference between the samples is normally distributed for paired test), 2) there is similarity in variance in the data, 3) sample values are numeric and continuous, and 4) that sample observations are independent of each other. The below functions from the statsmodels.api module allow us to explore these assumptions during data exploration.

statsmodels.api.graphics.plot_regress_exog()
statsmodels.api.graphics.qqplot()
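
A minimal sketch of how these diagnostics can be called, assuming matplotlib is installed and using made-up data and a hypothetical fitted model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Q-Q plot: points near the reference line suggest approximate normality
sample = rng.normal(loc=50, scale=10, size=200)
sm.graphics.qqplot(sample, line="45", fit=True)

# plot_regress_exog expects a fitted regression results object
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)
results = sm.OLS(y, sm.add_constant(x)).fit()
sm.graphics.plot_regress_exog(results, "x1")  # residual/fit diagnostics for the regressor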

Let’s examine how to call up these tests in Python 3. First, the parametric tests:

The scipy.stats module is a great resource for statistical tests.

Paired t test is

scipy.stats.ttest_rel

Unpaired t test is

scipy.stats.ttest_ind
  • For ttest_rel and ttest_ind, the P-value in the output measures an alternative hypothesis that 𝜇0 != 𝜇1; for one-sided hypothesis, e.g. 𝜇0 > 𝜇1, divide p by 2 and if p/2 < alpha (usually 0.05).
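
A minimal sketch with simulated data (the group names and numbers below are hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Paired t test: the same students' scores before and after the school year
before = rng.normal(60, 8, size=30)
after = before + rng.normal(3, 5, size=30)
t_paired, p_paired = stats.ttest_rel(before, after)

# Unpaired (independent) t test: two unrelated groups
group_a = rng.normal(100, 15, size=40)
group_b = rng.normal(108, 15, size=40)
t_ind, p_ind = stats.ttest_ind(group_a, group_b)  # equal_var=False would give Welch's test

print(f"paired: t={t_paired:.2f}, p={p_paired:.4f}")
print(f"unpaired: t={t_ind:.2f}, p={p_ind:.4f}")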

One-way ANOVA is

scipy.stats.f_oneway
  • A significant P-value signals that there is a difference between some of the groups, but additional testing is needed to determine where the difference lies.
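
For example, a hedged sketch with made-up dog-weight data, continuing the breed example above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# One-way ANOVA: weights (kg) for three hypothetical breeds
beagles = rng.normal(10, 1.5, size=25)
labs = rng.normal(30, 3.0, size=25)
huskies = rng.normal(23, 2.5, size=25)

f_stat, p_val = stats.f_oneway(beagles, labs, huskies)
print(f"F={f_stat:.2f}, p={p_val:.4f}")

# A significant p-value only says that some group means differ;
# a post hoc test (e.g. Tukey's HSD) is needed to locate the difference.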

For the non-parametric data:

Wilcoxon Signed Rank is

scipy.stats.wilcoxon

Wilcoxon Rank-Sum is

scipy.stats.ranksums
  • Signed rank and rank-sum tests should be used for continuous distributions.
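
A minimal sketch using skewed, made-up data (salary-like values; all names are hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Wilcoxon Signed Rank: paired samples with skewed differences
before = rng.exponential(scale=50, size=30)
after = before * rng.normal(1.05, 0.1, size=30)
w_stat, p_signed = stats.wilcoxon(before, after)

# Wilcoxon Rank-Sum: two independent samples
sample_a = rng.exponential(scale=50, size=40)
sample_b = rng.exponential(scale=65, size=40)
z_stat, p_ranksum = stats.ranksums(sample_a, sample_b)

print(f"signed rank p={p_signed:.4f} | rank-sum p={p_ranksum:.4f}")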

Kruskal Wallis is:

scipy.stats.kruskal(group1, group2, group3)
  • Similar to ANOVA, rejection of the null hypothesis does not tell us which of the groups is different, so additional post hoc group comparison is necessary.
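
A quick sketch with three made-up skewed groups:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Kruskal-Wallis H test across three independent, skewed groups
group1 = rng.exponential(scale=40, size=30)
group2 = rng.exponential(scale=55, size=30)
group3 = rng.exponential(scale=45, size=30)

h_stat, p_val = stats.kruskal(group1, group2, group3)
print(f"H={h_stat:.2f}, p={p_val:.4f}")
# As noted above, a significant result still requires post hoc pairwise comparisons.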

In terms of takeaways, it is never good practice to draw final conclusions from a single test; significant findings should lead to additional investigation. Bonferroni corrections, a topic for another time, can be used to reduce spurious positives when running multiple comparisons.
