Data transformation in ML — Standardization vs Normalization

credit to Analytics Vidhya
  • Removes the mean and scales the data to unit variance
  • Cannot guarantee balanced feature scales in the presence of outliers
  • Rescales data so all values are in a 0–1 range
  • Also sensitive to outliers, as inliers are often squeezed into a small range
  • Centering and scaling are based on percentiles: the median is removed and the data are scaled according to the interquartile range (IQR)
  • Median and IQR are robust to outliers, as opposed to other measures like min, max, mean, standard deviation
  • Implements the Yeo-Johnson and Box-Cox transforms to make the data more Gaussian-like; the optimal transformation parameter for stabilizing variance and minimizing skewness is estimated by maximum likelihood (MLE)
  • Rescales each sample's feature vector to unit norm, independently of the distribution of the samples
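
The arithmetic behind most of these bullets is short enough to sketch in plain Python. The block below is only an illustration, not a reference implementation: the function names and the toy `feature` list are hypothetical, and the mapping to the bullets (standardization, min-max rescaling, median/IQR scaling, per-sample unit norm) is an assumption, since the source does not name the techniques. The power transforms are left out because their parameter has to be fitted by maximum likelihood. In practice a library such as scikit-learn (`StandardScaler`, `MinMaxScaler`, `RobustScaler`, `PowerTransformer`, `Normalizer`) would normally be used instead.

```python
import math
import statistics


def standardize(values):
    """Remove the mean and scale to unit variance (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Remove the median and scale by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def unit_norm(sample):
    """Rescale one sample (a row of features) to Euclidean norm 1."""
    norm = math.sqrt(sum(x * x for x in sample))
    return [x / norm for x in sample]


# Hypothetical toy feature: four inliers and one large outlier.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]

print("standardized:", standardize(feature))
print("min-max:     ", min_max_scale(feature))
print("robust:      ", robust_scale(feature))
print("unit norm:   ", unit_norm([3.0, 4.0]))
```

With this toy input, min-max rescaling pushes the four inliers below 0.05 while the outlier sits at 1, which is the squeezing effect the list mentions. The median/IQR version keeps the inliers spread between -1 and 0.5 and leaves the outlier far from them.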
