Machine Learning A-Z: Hands-On Python and Java
About Lesson

Principal Component Analysis (PCA) is an unsupervised learning technique used for dimensionality reduction. It transforms high-dimensional data into a lower-dimensional form by identifying the orthogonal directions (principal components) along which the data varies most, then projecting the data onto the first few of them. This reduces the complexity of the data while retaining as much of its variance as possible.
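To make the idea of "directions that maximize variance" concrete, here is a minimal NumPy sketch that computes principal components from the eigenvectors of the covariance matrix. The function name `pca_eig` and the synthetic data are illustrative assumptions, not part of the lesson; library implementations typically use an SVD instead, but the result is equivalent up to sign.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    # Center each feature so the covariance is measured around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Indices of the largest eigenvalues, i.e. the directions of greatest variance
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T          # one component per row
    projected = X_centered @ components.T     # coordinates in the new basis
    return projected, components, eigvals[order]

# Synthetic 3-feature data whose variance lies mostly along one direction
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([3.0 * latent, 1.5 * latent, np.zeros((200, 1))])
X = X + rng.normal(scale=0.1, size=(200, 3))

Z, components, variances = pca_eig(X, n_components=2)
print(Z.shape)       # reduced data: 200 samples, 2 features
print(variances)     # variance captured by each component, largest first
```

The variance of each column of `Z` equals the corresponding eigenvalue, which is why sorting by eigenvalue is the same as sorting directions by how much variance they capture.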

Implementation in Python (Scikit-learn)

  • Step 1: Import Libraries

    from sklearn.decomposition import PCA

  • Step 2: Prepare and Standardize Data

    from sklearn.preprocessing import StandardScaler

    # X is the feature matrix (n_samples x n_features), assumed to be loaded already
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)  # each feature now has mean 0 and unit variance
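Standardization matters because PCA ranks directions by variance, so a feature measured in large units can dominate the result. The sketch below illustrates this on made-up data; the `income` and `age` features and their distributions are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: two independent features on very different scales
income = rng.normal(50_000, 15_000, size=500)   # large numeric range
age = rng.normal(40, 10, size=500)              # small numeric range
X_demo = np.column_stack([income, age])

# Without scaling, the first component points almost entirely along income
pc1_raw = PCA(n_components=1).fit(X_demo).components_[0]

# After standardization both features are on the same scale
X_demo_scaled = StandardScaler().fit_transform(X_demo)
pc1_scaled = PCA(n_components=1).fit(X_demo_scaled).components_[0]

print("PC1 loadings, raw data:   ", pc1_raw)
print("PC1 loadings, scaled data:", pc1_scaled)
```

On the raw data the loading on `age` is close to zero; after scaling, the first component is no longer determined by the choice of units.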

  • Step 3: Apply PCA

    # Keep the two directions with the largest variance
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)
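Fixing `n_components=2` is convenient for plotting, but the number of components can also be chosen by how much variance to retain. A possible sketch, using synthetic data as a stand-in for a real dataset: scikit-learn's `PCA` accepts a float between 0 and 1 for `n_components` (with the full SVD solver) and keeps just enough components to exceed that fraction of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in: 10 observed features driven by 3 latent factors plus noise
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X_demo = latent @ mixing + rng.normal(scale=0.1, size=(300, 10))
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Keep the smallest number of components explaining more than 95% of the variance
pca_95 = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca_95.fit_transform(X_demo_scaled)

print("components kept:", pca_95.n_components_)
print("cumulative explained variance:", pca_95.explained_variance_ratio_.cumsum())
```

The `explained_variance_ratio_` attribute is also available after fitting with a fixed `n_components`, which makes it a quick check of how much information a 2-D projection actually preserves.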

  • Step 4: Visualize or Use Transformed Data

    import matplotlib.pyplot as plt

    # Plot the data in the plane of the first two principal components
    plt.scatter(X_pca[:, 0], X_pca[:, 1])
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()

Implementation in Java (Weka)

  • Step 1: Load the Dataset

    // Requires java.io.BufferedReader, java.io.FileReader and weka.core.Instances
    Instances data = new Instances(new BufferedReader(new FileReader("data.arff")));

  • Step 2: Apply PCA

    // PrincipalComponents here is weka.attributeSelection.PrincipalComponents
    PrincipalComponents pca = new PrincipalComponents();
    pca.setVarianceCovered(0.95); // Keep 95% of variance
    pca.buildEvaluator(data);

  • Step 3: Transform Data

    Instances transformedData = pca.transformedData(data);