Principal Component Analysis in Python | How to Apply PCA | Scree Plot, Biplot, Elbow & Kaiser's Rule
Statistics Globe

Published on Jul 27, 2023

This video explains how to apply a Principal Component Analysis (PCA) in Python. More details: https://statisticsglobe.com/principal...

The video is presented by Cansu Kebabci, a data scientist and statistician at Statistics Globe. Find more information about Cansu here: https://statisticsglobe.com/cansu-keb...

In the video, Cansu explains the steps and application of a Principal Component Analysis in Python. Watch the video to learn more on this topic!

Here you can find the previous videos of this series:

Introduction to Principal Component Analysis (Pt. 1 - Theory):    • Introduction to Principal Component A...  

Principal Component Analysis in R Programming (Pt. 2 - PCA in R):    • Principal Component Analysis in R Pro...  

Links to the tutorials mentioned in the video:

PCA Using Correlation & Covariance Matrix (Examples): https://statisticsglobe.com/pca-corre...

Biplot for PCA Explained: https://statisticsglobe.com/biplot-pc...

Python code of this video:

# Install libraries
!pip install scikit-learn
!pip install pandas
!pip install matplotlib
!pip install numpy

# Load Libraries & Modules
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load Breast Cancer Dataset
breast_cancer = load_breast_cancer()

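# Data Elements of breast_cancer
print(breast_cancer.keys())                 # Names of the stored elements
print(breast_cancer.data.shape)             # Number of rows and columns
print(breast_cancer.feature_names)          # Names of the 30 features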

# Print Data in DataFrame Format
DF = pd.DataFrame(data = breast_cancer.data[:, :10],         # Create DataFrame DF
                  columns = breast_cancer.feature_names[:10])
DF.head(6)                                                   # Print first 6 rows of DF

# Standardize Data
scaler = StandardScaler()                   # Create scaler
data_scaled = scaler.fit_transform(DF)      # Fit scaler and standardize DF
print(data_scaled)                          # Print standardized data
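To see what `StandardScaler` actually does here, the sketch below reproduces its output by hand as z = (x - mean) / standard deviation. It is a minimal check, not part of the video's code; it reuses the first 10 features of the breast cancer data as a NumPy array rather than the DataFrame `DF`, and relies on the fact that `StandardScaler` uses the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Same data as above: the first 10 features of the breast cancer dataset
X = load_breast_cancer().data[:, :10]

scaler = StandardScaler()
data_scaled = scaler.fit_transform(X)

# Manual standardization: subtract column means, divide by population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(data_scaled, manual))      # same values as the manual formula
print(data_scaled.mean(axis=0).round(10))    # column means are (numerically) 0
print(data_scaled.std(axis=0))               # column standard deviations are 1
```

Standardizing first matters because the raw features (e.g. area vs. smoothness) live on very different scales, and PCA would otherwise be dominated by the largest-variance columns.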


# Print Standardized Data in DataFrame Format
DF_scaled = pd.DataFrame(data = data_scaled,                 # Create DataFrame DF_scaled
                         columns = breast_cancer.feature_names[:10])
DF_scaled.head(6)                                            # Print first 6 rows of DF_scaled

# Ideal Number of Components
pca = PCA(n_components = 10)                # Create PCA object forming 10 PCs
pca_trans = pca.fit_transform(DF_scaled)    # Fit PCA and transform data
print(pca_trans)                            # Print transformed data
print(pca_trans.shape)                      # Print dimensions of transformed data
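The transformed data are the principal component scores. As a rough illustration (not taken from the video), the sketch below shows that they equal the centered data multiplied by the transposed loading matrix `pca.components_`, whose rows are orthonormal principal axes. It assumes the same 10 standardized features as above, as a NumPy array.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data[:, :10])

pca = PCA(n_components=10)
pca_trans = pca.fit_transform(X)

# Each row of pca.components_ is one principal axis (a unit-length loading vector);
# the scores are the centered data projected onto these axes
manual_scores = (X - pca.mean_) @ pca.components_.T

print(np.allclose(pca_trans, manual_scores))
# The axes are orthonormal: components_ @ components_.T is the identity matrix
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(10)))
```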


prop_var = pca.explained_variance_ratio_ # Extract proportion of explained variance
print(prop_var) # Print proportion of explained variance
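The proportions of explained variance come from the eigenvalues of the covariance matrix of the standardized data (see the linked tutorial on PCA using the correlation and covariance matrix). The sketch below is a hedged check of that link, computed with NumPy rather than the video's code: scikit-learn's `explained_variance_` uses the n - 1 denominator, so it matches `np.cov` on the standardized data, and it equals the correlation-matrix eigenvalues of the raw data times n / (n - 1), because `StandardScaler` divides by the population standard deviation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_raw = load_breast_cancer().data[:, :10]
X = StandardScaler().fit_transform(X_raw)
n = X.shape[0]

pca = PCA(n_components=10).fit(X)

# Eigenvalues of the sample covariance matrix (np.cov divides by n - 1)
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigh(cov)[0][::-1]   # eigh sorts ascending; reverse to largest first

print(np.allclose(eigvals, pca.explained_variance_))
print(np.allclose(eigvals / eigvals.sum(), pca.explained_variance_ratio_))

# Correlation matrix of the raw data: same eigenvalues up to the factor n / (n - 1)
corr_eigvals = np.linalg.eigvalsh(np.corrcoef(X_raw, rowvar=False))[::-1]
print(np.allclose(corr_eigvals * n / (n - 1), pca.explained_variance_))
```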


PC_number = np.arange(pca.n_components_) + 1 # Enumerate component numbers
print(PC_number) # Print component numbers

# Scree Plot
plt.figure(figsize=(10, 6))                 # Set figure and size
plt.plot(PC_number,                         # Plot proportion of explained variance
         prop_var,
         'ro-')
plt.title('Scree Plot (Elbow Method)',      # Plot annotations
          fontsize = 15)
plt.xlabel('Component Number',
           fontsize = 15)
plt.ylabel('Proportion of Variance',
           fontsize = 15)
plt.grid()                                  # Add grid lines
plt.show()                                  # Display graph
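Reading the elbow off a scree plot is a visual judgment. A common numeric complement (not shown in this part of the description) is the cumulative proportion of explained variance. The sketch below picks the smallest number of PCs that reaches a cut-off; the 90% `threshold` is a hypothetical choice, not a value from the video. scikit-learn can also apply this rule itself when `n_components` is a float between 0 and 1.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data[:, :10])
prop_var = PCA(n_components=10).fit(X).explained_variance_ratio_

cum_var = np.cumsum(prop_var)                       # cumulative proportion of variance
threshold = 0.90                                    # hypothetical cut-off
n_keep = int(np.argmax(cum_var >= threshold)) + 1   # first PC count reaching the cut-off

print(cum_var.round(3))
print(f"{n_keep} components explain at least {threshold:.0%} of the variance")

# Same rule applied by scikit-learn: a float n_components selects the number of PCs
pca_auto = PCA(n_components=threshold, svd_solver="full").fit(X)
print(pca_auto.n_components_)
```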

# Alternative Scree Plot Data
var = pca.explained_variance_               # Extract explained variance
print(var)                                  # Print explained variance
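The explained variances are the eigenvalues used by Kaiser's rule, which the title mentions but whose code is not included in this description. The sketch below is one common way to apply it, not necessarily the video's approach: on standardized data, keep the PCs whose eigenvalue exceeds 1, and draw that cut-off on a scree plot of the eigenvalues. Because scikit-learn's `explained_variance_` uses the n - 1 denominator, its values are slightly larger (by n / (n - 1)) than the correlation-matrix eigenvalues; this only matters for eigenvalues very close to 1.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data[:, :10])
var = PCA(n_components=10).fit(X).explained_variance_   # eigenvalues, largest first

# Kaiser's rule: keep PCs that explain more variance than one standardized variable
kaiser_keep = np.flatnonzero(var > 1) + 1               # 1-based component numbers
print(var.round(3))
print("Components retained by Kaiser's rule:", kaiser_keep)

# Alternative scree plot: eigenvalues with the Kaiser cut-off drawn at 1
PC_number = np.arange(1, len(var) + 1)
plt.figure(figsize=(10, 6))
plt.plot(PC_number, var, 'ro-')
plt.axhline(y=1, color='grey', linestyle='--')          # Kaiser cut-off
plt.title("Scree Plot (Kaiser's Rule)", fontsize=15)
plt.xlabel('Component Number', fontsize=15)
plt.ylabel('Eigenvalue', fontsize=15)
plt.grid()
plt.show()
```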

The remaining code is unfortunately too long for a YouTube description.
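Since the biplot code itself is not part of this description, here is only a rough sketch of one common way to draw a PCA biplot with matplotlib; it is not the video's implementation. It plots the PC1 and PC2 scores (rescaled to [-1, 1] so they share an axis range with the loadings) and draws one arrow per feature from its loadings. Coloring the points by `breast_cancer.target` (0 = malignant, 1 = benign) is an added assumption for readability.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

breast_cancer = load_breast_cancer()
features = breast_cancer.feature_names[:10]
X = StandardScaler().fit_transform(breast_cancer.data[:, :10])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)          # PC1 and PC2 scores
loadings = pca.components_.T           # one row per feature: (PC1, PC2) loadings

# Rescale scores so points and loading arrows share a comparable axis range
scores_scaled = scores / np.abs(scores).max(axis=0)

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(scores_scaled[:, 0], scores_scaled[:, 1],
           c=breast_cancer.target, cmap='coolwarm', s=10, alpha=0.6)
for (x, y), name in zip(loadings, features):
    ax.arrow(0, 0, x, y, color='black', head_width=0.02)   # loading vector
    ax.text(x * 1.1, y * 1.1, name, fontsize=9)            # feature label
ax.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%})')
ax.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%})')
ax.set_title('Biplot')
ax.grid()
plt.show()
```

Arrows pointing in similar directions indicate features that are positively correlated in the PC1/PC2 plane; the linked "Biplot for PCA Explained" tutorial covers interpretation in more detail.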

Follow me on Social Media:
Facebook – Statistics Globe Page:   / statisticsglobecom  
Facebook – R Programming Group for Discussions & Questions:   / statisticsglobe  
Facebook – Python Programming Group for Discussions & Questions:   / statisticsglobepython  
LinkedIn – Statistics Globe Page:   / statisticsglobe  
LinkedIn – R Programming Group for Discussions & Questions:   / 12555223  
LinkedIn – Python Programming Group for Discussions & Questions:   / 12673534  
Twitter:   / joachimschork  
Instagram:   / statisticsglobecom  
TikTok:   / statisticsglobe  
