Principal component analysis (PCA) is an unsupervised learning algorithm used for dimensionality reduction. It projects high-dimensional data onto a lower-dimensional space along the directions (the principal components) that capture the most variance. This reduces the complexity of the data while retaining most of its informative structure.
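To make "directions that maximize variance" concrete, here is a minimal NumPy sketch of what PCA computes under the hood: the principal components are the eigenvectors of the data's covariance matrix, sorted by eigenvalue (variance explained). The random data and the choice of two components are assumptions purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy data: 100 samples, 3 features

Xc = X - X.mean(axis=0)                  # center each feature at zero
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix (3 x 3)

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort descending by variance
components = eigvecs[:, order]           # columns = principal components

X_proj = Xc @ components[:, :2]          # project onto the top-2 components
print(X_proj.shape)                      # (100, 2)
```

The first projected coordinate has the largest variance of any linear projection of the data, the second the largest variance among directions orthogonal to the first, and so on. Library implementations such as scikit-learn's `PCA` compute the same quantities (typically via SVD for numerical stability).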
Implementation in Python (Scikit-learn)
- Step 1: Import Libraries
```python
from sklearn.decomposition import PCA
```
- Step 2: Prepare and Standardize Data
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
- Step 3: Apply PCA
```python
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
```
- Step 4: Visualize or Use Transformed Data
```python
import matplotlib.pyplot as plt

plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.show()
```
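The four steps above can be combined into one runnable sketch. The Iris dataset is used here only as a stand-in for your own `X`; the call to `explained_variance_ratio_` shows how much variance each retained component captures, which is a common way to decide how many components to keep.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Iris (150 samples, 4 features) stands in for your own data here
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                      # (150, 2)
print(pca.explained_variance_ratio_)    # fraction of variance per component
```

Standardizing first matters: without it, features measured on larger scales dominate the covariance structure and hence the components.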
Implementation in Java (Weka)
- Step 1: Load the Dataset
```java
Instances data = new Instances(new BufferedReader(new FileReader("data.arff")));
```
- Step 2: Apply PCA
```java
PrincipalComponents pca = new PrincipalComponents();
pca.setVarianceCovered(0.95); // keep 95% of the variance
pca.buildEvaluator(data);
```
- Step 3: Transform Data
```java
Instances transformedData = pca.transformedData(data);
```