----A question posted by Jinbo Bi
DATA:
|
|
y1 y2 ... yl |
x1 x2
x3 x4 x5
xn |
|
Lead1 Lead2 Lead3 * * * LeadN |
1.2 1.3
1.4 2.1 2.2
2.4 3.1 3.2
3.4 * * * 3.1 3.2 3.4 |
0.23 0.14
0.4 0.31 0.2
0.176 0.33 0.12
0.4 0.32 0.5
0.177 0.23 0.14
0.4 0.31 0.2
0.176 * * * 0.33 0.12
0.4 0.32 0.5
0.177 |
|
|
Y |
X |
![]()
![]()
*
*
![]()
![]()
![]()
![]()

![]()
![]()
Principal Component Analysis
(PCA)
![]()
![]()

Each p (loading) vector is actually a
projection operator, and the t (score) vector is the coordinate of each
point along these projection directions p's.
What are those p vectors? What are those q vectors? And what are in the diagonal of the diagonal matrix S?


So ps are the eigenvectors of matrix
, and qs are eigenvectors of matrix
, diagonal elements of S are sqrt of eigenvalues.

Principal Component Regression
Cut off some eigenvectors corresponding to smaller eigenvalues, i.e., use fewer rank-one matrices to approximate original data X.

![]()
![]()

Partial Least Squares
Algorithm:
1. do PCA for matrix X, find score Tx,
2. do PCA for matrix Y, find score Ty,
3. fit regression model between Tx and Ty, like

A real-life algorithm: (iteratively finding ts and ps for X and Y)
For each h=1,
,c,
where A0=X'Y, M0=X'X, C0=I,
and given,
1.
compute
qh, the dominant eigenvector of AhTAh
2.
wh=ChAhqh, wh=wh/||wh||,
and store wh into W as a column
3.
ph=Mhwh, ch=whTMhwh,
ph=ph/ch, and store ph
into P as a column
4.
qh=AhTwh/ch, and store qh
into Q as a column
5.
Ah+1=Ah - chphqhT and Bh+1=Mh
- chphphT
6.
Ch+1=Ch - whphT
A theorem is:
Consider one dependent y,

where ![]()
Krylov Subspace Methods
It is an equation solver, i.e., an iterative algorithm to solve an equation system like
![]()
A kth Krylov subspace generated by a vector c is like
![]()
The method is to find solutions of the following problem within the kth Krylov subspace generated by y at the kth iteration

Gram-Schmidt scheme to construct orthonormal basis for ![]()


Krylov method is shown can converge in super-linear rate under certain assumptions.
For a real-life algorithm, please check the paper
The idea behind Krylov methods by Ilse C.F. Ipsen and Carl D. Meyer
The real-life algorithms for PLS and Krylov method are pretty much similar to each other, so the questions are
1) if PLS is really a kind of Krylov method if you consider fitting regression is somehow kind of solving equation system?
2) if PLS is not Krylov method, can it be tailored to be able to apply any of Krylov methods, which be able to make solving PLS faster?