Machine Learning A-Z: Hands-On Python and Java

Feature scaling is a crucial preprocessing step in machine learning that adjusts the ranges of features so that no single feature dominates the model simply because of its units or magnitude. Two popular methods for feature scaling are standardization and normalization.

Standardization

Standardization transforms features to have a mean of 0 and a standard deviation of 1. This technique is useful when features have different units or magnitudes, as it centers the data and scales it based on standard deviation.

Formula: Z = (X − μ) / σ, where X is the original value, μ is the feature's mean, and σ is its standard deviation.

Example in Python:

from sklearn.preprocessing import StandardScaler

# data: 2D array-like of shape (n_samples, n_features)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
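
A quick sanity check on what this produces (a minimal sketch; the small NumPy array below is made up for illustration): each scaled column should end up with a mean of roughly 0 and a standard deviation of roughly 1.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0]])

scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print(scaled_data.mean(axis=0))  # approximately [0. 0.]
print(scaled_data.std(axis=0))   # approximately [1. 1.]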

Benefits:

  • Suitable for algorithms like SVM and logistic regression.
  • Preserves the shape of the original distribution, since it only shifts and rescales the data.

Normalization

Normalization scales features to a fixed range, typically [0, 1], by adjusting the data relative to the minimum and maximum values. This method is ideal when the distribution of features is not Gaussian or when the model relies on the magnitude of inputs.

Formula: X′ = (X − Xmin) / (Xmax − Xmin), where Xmin and Xmax are the minimum and maximum values of the feature.

Example in Python:

from sklearn.preprocessing import MinMaxScaler

# data: 2D array-like of shape (n_samples, n_features)
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
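
Here the effect is easy to verify (again a minimal sketch with made-up numbers): after transformation, each column's minimum is 0 and its maximum is 1.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with differing ranges
data = np.array([[10.0, 0.5],
                 [20.0, 0.7],
                 [40.0, 0.9]])

scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)

print(normalized_data.min(axis=0))  # [0. 0.]
print(normalized_data.max(axis=0))  # [1. 1.]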

Benefits:

  • Ideal for algorithms like K-nearest neighbors and neural networks.
  • Ensures features are within the same range.
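
With either scaler, one practical point applies: fit the scaler on the training data only, then reuse the fitted parameters to transform the test data, so that test-set statistics do not leak into preprocessing. A minimal sketch (the train/test arrays here are assumptions for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test split (made-up numbers)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [5.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # apply the same parameters to the test data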