Principal component analysis (PCA) is a statistical method that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components.

Reduce the data from n to k dimensions: find the k vectors onto which to project the data so as to minimize the projection error.

Algorithm:

1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables

2) Compute the covariance matrix Σ

3) Compute the eigenvectors of Σ

4) Choose k principal components so as to retain x% of the variance (typically x = 99)
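The four steps above can be sketched with NumPy; the data here is hypothetical, and the 99% variance threshold follows the note above.

```python
import numpy as np

# Hypothetical data: 100 samples, 5 features that are mostly
# linear combinations of 2 latent factors (so they are correlated).
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

# 1) Standardize: zero mean, unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Covariance matrix Σ (features x features).
Sigma = np.cov(X_std, rowvar=False)

# 3) Eigenvectors of Σ (symmetric, so eigh), sorted by eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4) Smallest k retaining at least 99% of the variance.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.99) + 1)

# Project the data onto the first k principal components.
Z = X_std @ eigvecs[:, :k]
print(k, Z.shape)
```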

Applications:

1) Compression

– Reduce disk/memory needed to store data

– Speed up a learning algorithm. Warning: the mapping should be fitted on the training set only and then applied to the test set

2) Visualization: project onto 2 or 3 principal components, so as to summarize the data
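The warning about the train/test mapping can be illustrated as follows; the split sizes and k = 2 are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(80, 4))
X_test = rng.normal(size=(20, 4))

# Fit the standardization and the projection on the TRAINING set only.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
Z = (X_train - mu) / sd
_, _, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt = PCs
k = 2
W = Vt[:k].T  # 4 x 2 projection matrix

# Apply the SAME mapping to the test set; never refit on test data.
Z_train = Z @ W
Z_test = ((X_test - mu) / sd) @ W
print(Z_train.shape, Z_test.shape)
```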

Limitations:

– PCA is not scale invariant

– The directions with largest variance are assumed to be of most interest

– Only considers orthogonal transformations (rotations) of the original variables

– PCA is only based on the mean vector and covariance matrix. Some distributions (e.g. the multivariate normal) are fully characterized by these, but many are not

– If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
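The lack of scale invariance can be demonstrated directly: rescaling one feature (e.g. a change of units) changes which direction carries the largest variance. The data and scale factors below are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
X[:, 1] *= 0.1  # feature 2 starts on a much smaller scale

def first_pc(X):
    """Leading principal component of centered (not standardized) data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

pc_raw = first_pc(X)  # dominated by feature 1

# Multiply feature 2 by 1000: the leading direction flips
# toward the rescaled feature, even though the data is "the same".
X_rescaled = X.copy()
X_rescaled[:, 1] *= 1000
pc_rescaled = first_pc(X_rescaled)  # dominated by feature 2
print(np.abs(pc_raw), np.abs(pc_rescaled))
```

This is exactly why step 1 of the algorithm (standardization) matters when features are measured in different units.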
