qr_factorization 和 householder transformation解决最小二乘法

Least-Squares Problems by QR Factorization

令 A 为训练集 $A=\left[a_{1}, a_{2}, \ldots, a_{n}\right] \in R^{m \times n}$

$a_{1}, a_{2}, \ldots, a_{n} \in R^{m}$ y 为target。

我们的目标是 找到一个c
$argmin_{c \in R^{n}}|A c-y|, \quad c=\left[c_{1}, c_{2}, \ldots, c_{n}\right]^{T}$

QR Factorization

$\hat{R}=$\left[\begin{array}{l}R \ O\end{array}\right]$\in \mathbb{R}^{m \times n}, \quad m>n$
R 是一个对角线以下都为0的矩阵 ,例如

$$\left[\begin{array}{rrr}1 & -5.00 & -15.00 \\ 0 & 2.24 & 11.18 \\ 0 & 0 & -1.93 \\ 0 & 0 & 0\end{array}\right]$$

Q 是一个正交矩阵 (正交矩阵有很多性质,其中之一就是$QQ^T$ = I , 我们也在这里需要这个性质)
$$A = Q\hat{R}$$
我们的argmin目标变成了$\min _{\mathbf{c} \in \mathbb{R}^{n}}||Q \hat{R} \mathbf{c}-\mathbf{y}||$
$||Q \hat{R} \mathbf{c}-\mathbf{y}||=\left||Q^{T}(Q \hat{R} \mathbf{c}-\mathbf{y})\right||=\left||\hat{R} \mathbf{c}-Q^{T} \mathbf{y}\right||$ 因为一个向量乘上一个正交的矩阵之后norm并不会变。证明:
Q是正交矩阵 ,$||Q \mathbf{u}|^{2}=(Q \mathbf{u})^{T}(Q \mathbf{u})=\mathbf{u}^{T} Q^{T}(Q \mathbf{u})=\mathbf{u}^{T}\left(Q^{T} Q\right) \mathbf{u}=\mathbf{u}^{T} \mathbf{u}=||\mathbf{u}||^{2}$
最后,目标变成了
$argmin _{\mathbf{c} \in \mathbb{R}^{n}}||A \mathbf{c}-\mathbf{y}||=argmin _{\mathbf{c} \in \mathbb{R}^{n}}||Q \hat{R} \mathbf{c}-\mathbf{y}||=argmin _{\mathbf{c} \in \mathbb{R}^{n}}\left||\hat{R} \mathbf{c}-Q^{T} \mathbf{y}\right||$

QR Factorization by Householder Matrices

QR分解有很多实现方法。
例如 Givens 旋转、Householder 变换,以及 Gram-Schmidt 正交化等等
这里不提及Givens旋转和 Gram-Schmidt 正交化实现方式,只讨论优缺点。
(1)Givens旋转 会比 Householder变换多出来大概50%的额外工作量和内存需求。
(2) Givens旋转在应对sparse matirx的时候会更好。
(3) 传统Gram-Schmidt 正交化的精确度不足。会有改进后的Gram-Schmidt 方法提供了更好的精度。
Big(O) for givens 旋转大概是 $2mn^2+2mn+c$ http://www.cs.nthu.edu.tw/~cherung/teaching/2011anm/note07.pdf
Big(O) for householder大概是 $2mn^2-2n^3+c$
Big(O) for gram-schimidt 大概是$2mn^2$, 比 householder快一点, 但是精度不足https://www.researchgate.net/publication/220767145_Comparison_of_Different_Parallel_Modified_Gram-Schmidt_Algorithms 第一页
(似乎这些算法都有parallel版本,但以后再看吧)
QR分解的例子:
A= $\left[\begin{array}{rrr}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 1\end{array}\right]$ $\boldsymbol{b} = \left[\begin{array}{r}1237 \\ 1941 \\ 2417 \\ 711 \\ 1177 \\ 475\end{array}\right]$

1
2
A = np.array([[1,0,0],[0,1,0],[0,0,1],[-1,1,0],[-1,0,1],[0,-1,1]])
y = np.array([1237,1941,2417,711,1177,475])

$v_1$ = $a_1$ - $\alpha e_1$ = $\left[\begin{array}{l}1 \\ 0 \\ 0 \\ -1\\ -1 \\ 0\end{array}\right]$ - $\sqrt{3} * \left[\begin{array}{l}1 \ 0 \ 0 \ 0\ 0 \ 0\end{array}\right]$ = $\left[\begin{array}{l}2.7321 \\ 0 \\ 0 \\ -1\\ -1 \\ 0\end{array}\right]$

$\alpha$ = - $sign(a1)||a_2||_2$

$$\boldsymbol{H_1}=\boldsymbol{I}-2 \frac{\boldsymbol{v_1} \boldsymbol{v_1}^{T}}{\boldsymbol{v_1}^{T} \boldsymbol{v_1}}$$
$H1$ = $\left[\begin{array}{ccc}-1.7321 & 0.5774 & 0.5774 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1.80 & 0.7887 & -0.2113 \\ -1.80 & -0.2113 & 0.7887 \\ 0 & -1 & 1\end{array}\right]$ 注意第一列的-1.80实际上要省略掉的
$\boldsymbol{H}_{1} \boldsymbol{b}=\left[\begin{array}{r}376 \\ 1941 \\ 2417 \\ 1026 \\ 1492 \\ 475\end{array}\right]$

1
H = np.array(np.eye(6) - 2*(v.dot(v.transpose())/(v.transpose().dot(v))))

$v_{2}=\left[\begin{array}{c}0 \\ 1 \\ 0 \\ 0.7887 \\ -0.2113 \\ -1\end{array}\right]-\left[\begin{array}{c}0 \\ -1.6330 \\ 0 \\ 0 \\ 0 \\ 0\end{array}\right]=\left[\begin{array}{c}0 \\ 2.6330 \\ 0 \\ 0.7887 \\ -0.2113 \\ -1\end{array}\right]$
注意$\alpha_2$ 计算norm的时候是不算0.5774的

$H_{2} H_{1} A=\left[\begin{array}{ccc}-1.7321 & 0.5774 & 0.5774 \\ 0 & -1.6330 & 0.8165 \\ 0 & 0 & 1 \\ 0 & 0 & 0.0332 \\ 0 & 0 & 0.7231 \\ 0 & 0 & 0.6899\end{array}\right], \quad H_{2} H_{1} b=\left[\begin{array}{r}376 \\ -1200 \\ 2417 \\ 85 \\ 1744 \\ 1668\end{array}\right]$
一直做下去就会得到

$H_{3} H_{2} H_{1} A=\left[\begin{array}{ccc}-1.7321 & 0.5774 & 0.5774 \\ 0 & -1.6330 & 0.8165 \\ 0 & 0 & -1.4142 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right]=\left[\begin{array}{l}R \\ O\end{array}\right]$

$H_{3} H_{2} H_{1} b=\left[\begin{array}{r}376 \\ -1200 \\ -3417 \\ 5 \\ 3 \\ 1\end{array}\right]=Q^{T} b=\left[\begin{array}{l}c_{1} \\ c_{2}\end{array}\right]$

然后对$Rx=c_1$用back-subsitution就会得到
$x^{T}=[1236,1943,2416]$