PCA data dimensionality reduction in machine learning

1 Basic Concepts

  • PCA stands for principal component analysis. Principal component analysis aims to use the idea of dimensionality reduction to transform many indicators into a small number of comprehensive indicators.
  • In statistics, PCA is a technique for simplifying data sets. It is a linear transformation that maps the data into a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate (the second principal component), and so on (a minimal sketch of this idea follows the list).
  • Principal component analysis (PCA) is often used to reduce the dimensionality of a data set while retaining the features that contribute most to its variance. This is achieved by keeping the low-order principal components and discarding the high-order ones; the low-order components usually retain the most important aspects of the data. However, this is not guaranteed and depends on the specific application.
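
The following is a minimal sketch (not from the original article) of the "new coordinate system" idea, using the closed-form route via the eigenvectors of the covariance matrix rather than the gradient ascent used later in this article; the toy data and all variable names here are illustrative assumptions.

import numpy as np

# Toy 2-D data with an anisotropic covariance (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3., 0.], [1., 0.5]])

X_demeaned = X - X.mean(axis=0)
cov = np.cov(X_demeaned, rowvar=False)        # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]             # sort by explained variance, descending
components = eigvecs[:, order].T              # rows are the principal components

X_new = X_demeaned @ components.T             # data in the new coordinate system
print(X_new.var(axis=0))                      # variance is largest on the first axis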

2 Principle and Mathematical Derivation

1. Principal component analysis using the gradient ascent method.

Principle: after demeaning the data, PCA looks for a unit vector w such that the variance of the samples projected onto w is as large as possible. The direction of maximum variance is the first principal component; each further component maximizes the remaining variance after the previous components have been removed from the data.

Mathematical derivation: for the demeaned data matrix X with m samples, the objective is to maximize

f(w) = (1/m) * Σ_i (X^(i) · w)²,   subject to ||w|| = 1,

and the gradient with respect to w is

∇f(w) = (2/m) * Xᵀ (X w),

which is exactly what the functions f and df_math in the code below compute. Gradient ascent repeatedly steps in the direction of this gradient and renormalizes w to unit length.


3 Implementing the PCA Algorithm Yourself

3.1 Simulating PCA with the gradient ascent method

1. Prepare simulated data

import numpy as np
import matplotlib.pyplot as plt
X = np.empty((100, 2))
X[:, 0] = np.random.uniform(0., 100., size=100)                      # feature 1: uniform
X[:, 1] = 0.75 * X[:, 0] + 3. + np.random.normal(0, 10., size=100)   # feature 2: roughly linear in feature 1, plus noise
plt.scatter(X[:,0], X[:,1])
plt.show()


2. Demean the data (for each feature, i.e. each column of X, subtract the mean so that the mean becomes 0)

def demean(X):
    return X - np.mean(X, axis=0)

X_demean = demean(X)


3. Apply the gradient ascent method

def f(w, X):
    return np.sum((X.dot(w)**2)) / len(X)

def df_math(w, X):
    return X.T.dot(X.dot(w)) * 2. / len(X)

def df_debug(w, X, epsilon=0.0001):
    res = np.empty(len(w))
    for i in range(len(w)):
        w_1 = w.copy()
        w_1[i] += epsilon
        w_2 = w.copy()
        w_2[i] -= epsilon
        res[i] = (f(w_1, X) - f(w_2, X)) / (2 * epsilon)
    return res

def direction(w):
    return w / np.linalg.norm(w)  # normalize w to a unit vector

def gradient_ascent(df, X, initial_w, eta, n_iters = 1e4, epsilon=1e-8):
    
    w = direction(initial_w) 
    cur_iter = 0

    while cur_iter < n_iters:
        gradient = df(w, X)
        last_w = w
        w = w + eta * gradient
        w = direction(w) # Note 1: re-normalize w to a unit vector after every step
        if(abs(f(w, X) - f(last_w, X)) < epsilon):
            break
            
        cur_iter += 1

    return w


initial_w = np.random.random(X.shape[1])   # Note 2: do not start from the zero vector
eta = 0.01                                 # learning rate (same default as used later)
w = gradient_ascent(df_math, X_demean, initial_w, eta)
plt.scatter(X_demean[:,0], X_demean[:,1])
plt.plot([0, w[0]*30], [0, w[1]*30], color='r')
plt.show()
Note 3: do not use StandardScaler to standardize the data here; only demean it (see the summary for why)
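
The function df_debug above approximates the gradient numerically, so it can be used as a sanity check on the derivation. This check is not in the original post; it reuses the w, X_demean, initial_w and eta defined above.

# Sanity check: the numerical gradient should recover (almost) the same
# principal direction as the analytic gradient df_math.
w_debug = gradient_ascent(df_debug, X_demean, initial_w, eta)
print(w, w_debug)   # the two unit vectors should be nearly identical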


4. The code above finds the first principal component. To find further principal components, subtract the component of the data that lies along the first principal component and run the same procedure on the remainder.

# Other principal components: remove the first one from the data
X2 = np.empty(X.shape)
for i in range(len(X)):
    X2[i] = X[i] - X[i].dot(w) * w    # subtract the projection onto the first principal component

# Or, vectorized:
X2 = X - X.dot(w).reshape(-1, 1) * w

5. Find the first n principal components

def first_n_components(n, X, eta=0.01, n_iters=1e4, epsilon=1e-8):
    X_pca = X.copy()
    X_pca = demean(X_pca)
    res = []   # list that stores the principal components
    for i in range(n):
        initial_w = np.random.random(X_pca.shape[1])
        w = gradient_ascent(df_math, X_pca, initial_w, eta, n_iters, epsilon)
        res.append(w)

        X_pca = X_pca - X_pca.dot(w).reshape(-1, 1) * w   # remove the component just found
    return res

first_n_components(2, X)   # example call
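
As a quick check (not from the original post; the name ws is illustrative), the components returned by first_n_components should be unit vectors that are orthogonal to each other:

ws = first_n_components(2, X)
print(np.linalg.norm(ws[0]), np.linalg.norm(ws[1]))   # both close to 1
print(ws[0].dot(ws[1]))                               # close to 0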

6. Implement PCA and encapsulate it in a class

import numpy as np


class PCA:

    def __init__(self, n_components):
        """Initialization PCA"""
        assert n_components >= 1, "n_components must be valid"
        self.n_components = n_components
        self.components_ = None

    def fit(self, X, eta=0.01, n_iters=1e4):
        """Get data set X Before n Principal components"""
        assert self.n_components <= X.shape[1], \
            "n_components must not be greater than the feature number of X"

        def demean(X):
            return X - np.mean(X, axis=0)

        def f(w, X):
            return np.sum((X.dot(w) ** 2)) / len(X)

        def df(w, X):
            return X.T.dot(X.dot(w)) * 2. / len(X)

        def direction(w):
            return w / np.linalg.norm(w)

        def first_component(X, initial_w, eta=0.01, n_iters=1e4, epsilon=1e-8):

            w = direction(initial_w)
            cur_iter = 0

            while cur_iter < n_iters:
                gradient = df(w, X)
                last_w = w
                w = w + eta * gradient
                w = direction(w)
                if (abs(f(w, X) - f(last_w, X)) < epsilon):
                    break

                cur_iter += 1

            return w

        X_pca = demean(X)
        self.components_ = np.empty(shape=(self.n_components, X.shape[1]))
        for i in range(self.n_components):
            initial_w = np.random.random(X_pca.shape[1])
            w = first_component(X_pca, initial_w, eta, n_iters)
            self.components_[i,:] = w

            X_pca = X_pca - X_pca.dot(w).reshape(-1, 1) * w

        return self

    def transform(self, X):
        """Given X,Mapping to principal components"""
        assert X.shape[1] == self.components_.shape[1]

        return X.dot(self.components_.T)

    def inverse_transform(self, X):
        """Given X,Reverse mapping back to the original feature space"""
        assert X.shape[1] == self.components_.shape[0]

        return X.dot(self.components_)

    def __repr__(self):
        return "PCA(n_components=%d)" % self.n_components

  1. Using the PCA class to reduce the dimensionality of data

import numpy as np
import matplotlib.pyplot as plt
X = np.empty((100, 2))
X[:,0] = np.random.uniform(0., 100., size=100)
X[:,1] = 0.75 * X[:,0] + 3. + np.random.normal(0, 10., size=100)
from playML.PCA import PCA

pca = PCA(n_components=2)
pca.fit(X)
pca.components_
pca = PCA(n_components=1)
pca.fit(X)
X_reduction = pca.transform(X)
X_reduction.shape
X_restore = pca.inverse_transform(X_reduction)
X_restore.shape
plt.scatter(X[:,0], X[:,1], color='b', alpha=0.5)
plt.scatter(X_restore[:,0], X_restore[:,1], color='r', alpha=0.5)
plt.show()

n_components is the number of principal components to keep, X_reduction is the matrix after dimensionality reduction, and X_restore is the matrix recovered from the reduced matrix by mapping back to the original feature space.

4 The PCA Algorithm in sklearn

1. Dimensionality reduction with PCA

from sklearn.decomposition import PCA
pca = PCA(n_components=1)
pca.fit(X)
pca.components_
X_reduction = pca.transform(X)
X_restore = pca.inverse_transform(X_reduction)

2. A concrete example, handwritten digit recognition: reduce the dimensionality with PCA, then classify with KNN

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data
y = digits.target
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(X_train)
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction, y_train)
knn_clf.score(X_test_reduction, y_test)

3. Choosing the number of components from the fraction of variance they explain (pca.explained_variance_ratio_)

from sklearn.decomposition import PCA

pca = PCA(n_components=X_train.shape[1])
pca.fit(X_train)

plt.plot([i for i in range(X_train.shape[1])], 
         [np.sum(pca.explained_variance_ratio_[:i+1]) for i in range(X_train.shape[1])])
plt.show()


Or use the following:

pca = PCA(0.95)
pca.fit(X_train)
pca.n_components_
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction, y_train)
knn_clf.score(X_test_reduction, y_test)

pca.n_components_ is the number of principal components required for the cumulative explained variance ratio to reach 0.95.
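
What PCA(0.95) does can be reproduced by hand from the explained variance ratios of a full fit. This snippet is not from the original post; the names pca_all and cumulative are illustrative.

# Manual equivalent: count how many components are needed before the
# cumulative explained variance ratio reaches 95%.
pca_all = PCA(n_components=X_train.shape[1]).fit(X_train)
cumulative = np.cumsum(pca_all.explained_variance_ratio_)
print(np.argmax(cumulative >= 0.95) + 1)   # should equal pca.n_components_
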
4. Data visualization after dimensionality reduction

%%time 
pca = PCA(n_components=2)
pca.fit(X)
X_reduction = pca.transform(X)
for i in range(10):
    plt.scatter(X_reduction[y==i,0], X_reduction[y==i,1], alpha=0.8)
plt.show()

5 Other Applications of PCA

1. Noise reduction by PCA

from sklearn import datasets

digits = datasets.load_digits()
X = digits.data
y = digits.target
noisy_digits = X + np.random.normal(0, 4, size=X.shape)
example_digits = noisy_digits[y==0,:][:10]
for num in range(1,10):
    example_digits = np.vstack([example_digits, noisy_digits[y==num,:][:10]])
example_digits.shape
def plot_digits(data):
    fig, axes = plt.subplots(10, 10, figsize=(10, 10),
                             subplot_kw={'xticks':[], 'yticks':[]},
    gridspec_kw=dict(hspace=0.1, wspace=0.1)) 
    for i, ax in enumerate(axes.flat):
        ax.imshow(data[i].reshape(8, 8),
                  cmap='binary', interpolation='nearest',
                  clim=(0, 16))

    plt.show()
    
plot_digits(example_digits)
pca = PCA(0.5).fit(noisy_digits)
pca.n_components_
components = pca.transform(example_digits)
filtered_digits = pca.inverse_transform(components)
plot_digits(filtered_digits)

The noise is removed by applying PCA's forward transform and then its inverse transform: projecting onto the main components discards the components that mostly contain noise.
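
The transform / inverse_transform pair above can be wrapped in a small helper. This is a hypothetical convenience function (denoise is not part of the original post or of sklearn); it assumes pca is the fitted PCA(0.5) object from the code above.

def denoise(pca, X_noisy):
    # Project onto the kept principal components, then map back; the
    # discarded components carry most of the noise.
    return pca.inverse_transform(pca.transform(X_noisy))

plot_digits(denoise(pca, example_digits))   # same result as the two-step code above
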
2. Eigenfaces

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people()
faces.keys()
faces.data.shape
faces.target_names
faces.images.shape
random_indexes = np.random.permutation(len(faces.data))
X = faces.data[random_indexes]
example_faces = X[:36,:]
example_faces.shape
def plot_faces(faces):
    
    fig, axes = plt.subplots(6, 6, figsize=(10, 10),
                         subplot_kw={'xticks':[], 'yticks':[]},
    gridspec_kw=dict(hspace=0.1, wspace=0.1)) 
    for i, ax in enumerate(axes.flat):
        ax.imshow(faces[i].reshape(62, 47), cmap='bone')
    plt.show()
    
plot_faces(example_faces)

#  Eigenfaces
from sklearn.decomposition import PCA 
pca = PCA(svd_solver='randomized')
pca.fit(X)
pca.components_.shape
plot_faces(pca.components_[:36,:])


faces2 = fetch_lfw_people(min_faces_per_person=60)
faces2.data.shape
faces2.target_names
len(faces2.target_names)

6 Summary

  1. PCA dimensionality reduction is suited to data with many features. Note, however, that with this formulation the data must only be demeaned, not standardized: the gradient ascent method maximizes the variance of the projected data, and standardization rescales every feature to unit variance, changing the variances on which the method relies (a short illustration follows this list).
  2. PCA can also be used for noise reduction: keeping only the main components during dimensionality reduction indirectly filters out the noise, which can improve a model's accuracy.
  3. PCA can also be used to compute eigenfaces.
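
A small illustration of point 1 (not from the original article; X_demo is an illustrative name): after StandardScaler every feature has unit variance, so the per-feature variance information that the demean-only approach preserves is changed.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Regenerate data like the simulated set from section 3.1.
X_demo = np.empty((100, 2))
X_demo[:, 0] = np.random.uniform(0., 100., size=100)
X_demo[:, 1] = 0.75 * X_demo[:, 0] + 3. + np.random.normal(0, 10., size=100)

print(np.var(X_demo, axis=0))                                  # very different variances
print(np.var(StandardScaler().fit_transform(X_demo), axis=0))  # both approximately 1
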
Posted by ItsWesYo on Wed, 19 Feb 2020 04:29:52 -0800