# What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method


PCA is a statistical method that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
It reduces the data from $n$ to $k$ dimensions: find the $k$ vectors onto which to project the data so as to minimize the projection error.
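One common way to write the projection-error objective is given here as a sketch. The symbols are assumptions not defined in the original answer: $m$ centered observations $x^{(i)} \in \mathbb{R}^n$, and an $n \times k$ matrix $U_k$ whose columns are the $k$ projection vectors.

$$
\min_{U_k \,:\, U_k^\top U_k = I_k} \; \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - U_k U_k^\top x^{(i)} \right\rVert^2
$$

Under these assumptions the minimizer is the set of $k$ eigenvectors of the covariance matrix with the largest eigenvalues, which is why the algorithm reduces to an eigen-decomposition.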
Algorithm:
1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables
2) Compute the covariance matrix $\Sigma$
3) Compute the eigenvectors of $\Sigma$
4) Choose $k$ principal components (the eigenvectors with the largest eigenvalues) so as to retain $x$% of the variance (typically $x = 99$)
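The four steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not a reference implementation: the function name `pca_fit`, the parameter `variance_to_retain`, and the toy data are all hypothetical, and the data is assumed to be a dense array of shape `(m, n)` with at least two columns.

```python
import numpy as np

def pca_fit(X, variance_to_retain=0.99):
    """Return (mean, std, components, k) for data X of shape (m, n)."""
    # 1) Standardize each column (zero mean, unit variance).
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - mean) / std

    # 2) Covariance matrix of the standardized data, shape (n, n).
    sigma = np.cov(Z, rowvar=False)

    # 3) Eigen-decomposition. eigh is meant for symmetric matrices and
    #    returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 4) Smallest k whose cumulative share of variance reaches the target.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, variance_to_retain)) + 1, X.shape[1])
    return mean, std, eigvecs[:, :k], k

# Toy data (assumed): the third column nearly duplicates the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 1e-3 * rng.normal(size=200)])

mean, std, components, k = pca_fit(X)
Z_reduced = ((X - mean) / std) @ components  # projected data, shape (200, k)
```

Because one column is almost redundant here, two components should be enough to pass the 99% threshold.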
Applications:
1) Compression
– Reduce disk/memory needed to store data
– Speed up a learning algorithm. Warning: the mapping should be defined on the training set only and then applied to the test set

2) Visualization: keep 2 or 3 principal components so as to summarize the data
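The warning about defining the mapping on the training set can be illustrated with a short sketch. The data, the 80/20 split, the choice of two components, and the helper name `transform` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # make two columns correlated
X_train, X_test = X[:80], X[80:]

# Fit: every statistic comes from the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov((X_train - mean) / std, rowvar=False))
W = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # top-2 directions, shape (5, 2)

def transform(X_new):
    # Reuse the training mean, std and W; nothing is re-estimated here.
    return ((X_new - mean) / std) @ W

Z_train = transform(X_train)
Z_test = transform(X_test)
```

Re-estimating the mean, scale, or eigenvectors on the test set would leak test information into the model and make the two projections incomparable.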

Limitations:
– PCA is not scale invariant
– The directions with the largest variance are assumed to be of most interest
– Only considers orthogonal transformations (rotations) of the original variables
– PCA is based only on the mean vector and covariance matrix. Some distributions (e.g. multivariate normal) are fully characterized by these, but others are not
– If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
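The first limitation (PCA is not scale invariant) is easy to see numerically. In the sketch below the data and the unit change are made up for illustration, and `first_pc` is a hypothetical helper that runs PCA on the raw covariance matrix without standardization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated, hypothetical measurements with similar spread.
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)
X = np.column_stack([x, y])

def first_pc(data):
    # Leading eigenvector of the (unstandardized) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

pc_original = first_pc(X)

# Express the first column in different units (e.g. metres -> millimetres).
X_rescaled = X * np.array([1000.0, 1.0])
pc_rescaled = first_pc(X_rescaled)

print("first PC, original units:", pc_original)
print("first PC, rescaled units:", pc_rescaled)
```

After the rescaling the first component points almost entirely along the first variable, even though the underlying relationship between the two variables is unchanged. This is the reason for the standardization step in the algorithm.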

High-level explanation: remember the curse of dimensionality? If you are trying to address it and want to reduce the dimensionality of your data, one way to do that is through PCA.