矩阵求导
矩阵求导
矩阵求导的定义
自变量↓\因变量→ | 标量$y$ | 向量$\mathbf{y}$ | 矩阵$\mathbf{Y}$ |
---|---|---|---|
标量$x$ | $\frac{\partial y}{\partial x}$ | $\frac{\partial \mathbf{y}}{\partial x}$ | $\frac{\partial \mathbf{Y}}{\partial x}$ |
向量$\mathbf{x}$ | $\frac{\partial y}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ |
矩阵$\mathbf{X}$ | $\frac{\partial y}{\partial \mathbf{X}}$ | $\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ | $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ |
矩阵求导的两种布局:
分子布局($numerator\ layout$)和分母布局($denominator\ layout$ )。
对于分子布局和分母布局的结果来说,两者相差一个转置。
自变量↓\因变量→ | 标量$y$ | 向量$\mathbf{y}$ | 矩阵$\mathbf{Y}(m \times n)$ |
---|---|---|---|
标量$x$ | / | $\frac{\partial \mathbf{y}}{\partial x}$ 分子布局:m维列向量(默认布局) 分母布局:m维行向量 |
$\frac{\partial \mathbf{Y}}{\partial x}$ 分子布局:$𝑝 \times 𝑞$ 矩阵(默认布局) 分母布局:$𝑞 \times 𝑝$矩阵 |
向量$\mathbf{x}$ | $\frac{\partial y}{\partial \mathbf{x}}$ 分子布局:m维列向量 分母布局:m维行向量(默认布局) |
$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ 分子布局:$m \times n$雅克比矩阵(默认布局) 分母布局:$𝑛 \times 𝑚$梯度矩阵 |
/ |
矩阵$\mathbf{X}$ | $\frac{\partial y}{\partial \mathbf{X}}$ 分子布局:$n \times m$矩阵 分母布局:$𝑚 \times 𝑛$矩阵(默认布局) |
/ | / |
矩阵求导的计算
自变量↓\因变量→ | 标量$y$ | 向量$\mathbf{y}$ | 矩阵$\mathbf{Y}$ |
---|---|---|---|
标量$x$ | $\frac{\partial y}{\partial x}$ | $\frac{\partial \mathbf{y}}{\partial x} = \left[ \begin{matrix} \frac{\partial{y_1}}{\partial{x}} \frac{\partial{y_2}}{\partial{x}} \end{matrix} \dots \frac{\partial{y_m}}{\partial{x}} \right] ^T$ (分子布局) $\frac{\partial \mathbf{y}}{\partial x} = \left[ \begin{matrix} \frac{\partial{y_1}}{\partial{x}} \frac{\partial{y_2}}{\partial{x}} \end{matrix} \dots \frac{\partial{y_m}}{\partial{x}} \right]$ (分母布局) |
$R_{m \times n}$ $R_{i,j} = {\frac{\partial{y_{i,j}}}{\partial x}}$ (分子布局), $R_{i,j} = {\frac{\partial{y_{j,i}}}{\partial x}}$(分母布局) |
向量$\mathbf{x}$ | $\frac{\partial y}{\partial \mathbf{x}} = \left[ \begin{matrix} \frac{\partial{y}}{\partial{x_1}} \frac{\partial{y}}{\partial{x_2}} \end{matrix} \dots \frac{\partial{y}}{\partial{x_m}} \right]$ (分子布局) $\frac{\partial y}{\partial \mathbf{x}} = \left[ \begin{matrix} \frac{\partial{y}}{\partial{x_1}} \frac{\partial{y}}{\partial{x_2}} \end{matrix} \dots \frac{\partial{y}}{\partial{x_m}} \right]^T$(分母布局) |
$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left( \begin{array}{ccc} \frac{\partial y_1}{\partial x_1}& \frac{\partial y_1}{\partial x_2}& \ldots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1}& \frac{\partial y_2}{\partial x_2} & \ldots & \frac{\partial y_2}{\partial x_n}\\ \vdots& \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1}& \frac{\partial y_m}{\partial x_2} & \ldots & \frac{\partial y_m}{\partial x_n} \end{array} \right)$ (分子布局) $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left( \begin{array}{ccc} \frac{\partial y_1}{\partial x_1}& \frac{\partial y_2}{\partial x_1}& \ldots & \frac{\partial y_m}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2}& \frac{\partial y_2}{\partial x_2} & \ldots & \frac{\partial y_m}{\partial x_2}\\ \vdots& \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n}& \frac{\partial y_2}{\partial x_n} & \ldots & \frac{\partial y_m}{\partial x_n} \end{array} \right)$ (分母布局) |
$\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ |
矩阵$\mathbf{X}$ | $R_{m \times n}$ $R_{i,j} = {\frac{\partial{y}}{\partial{x_{i,j}}}}$ (分子布局), $R_{i,j} = {\frac{\partial{y}}{\partial{x_{j,i}}}}$(分母布局) |
$\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ | $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ |
矩阵求导规则
$$
\frac{d x^T}{dx} = I
\tag{1.1}
$$
$$
\frac{dx}{dx^T} = I
\tag{1.2}
$$
$$
\frac{dx^T A}{dx} = A
\tag{2.1}
$$
$$
\frac{dAx}{dx^T} = A
\tag{2.2}
$$
$$
\frac{dAx}{dx} = A^T
\tag{2.3}
$$
$$
\frac{dxA}{dx} = A^T
\tag{2.4}
$$
$$
\frac{\partial{u}}{\partial{x^T}} = \left ( \frac{\partial{u^T}}{\partial x} \right)^T
\tag{3}
$$
$$
\frac{\partial{u^Tv}}{\partial{x}} = \frac{\partial{u^T}}{\partial{x}}v + \frac{\partial{v^T}}{\partial{x}}u^T
\tag{4.1}
$$
$$
\frac{\partial{uv^T}}{\partial{x}} = \frac{\partial{u}}{\partial{x}}v^T + u\frac{\partial{v^T}}{\partial{x}}
\tag{4.2}
$$
$$
\frac{d x ^Tx}{dx} = 2x
\tag{4.3}
$$
$$
\frac{dx^TAx}{dx} = (A+A^T)x
\tag{4.4}
$$
$$
\frac{\partial{AB}}{\partial{x}} = \frac{\partial{A}}{x}B + A\frac{\partial{B}}{\partial{x}}
\tag{5}
$$
$$
\frac{\partial{u^T}Xv}{\partial{X}} = uv^T
\tag{6.1}
$$
$$
\frac{\partial{u^T}X^TXu}{\partial{X}} = 2Xuu^T
\tag{6.2}
$$
$$
\frac{\partial{\left [ \left( Xu - v \right)^T \left( Xu - v \right)\right]}}{\partial X} = 2 \left( Xu - v \right) u^T
\tag{6.3}
$$