
Two Problems on Matrix Derivatives

For a $D$-dimensional data set $X$ whose samples $\boldsymbol{x}$ follow $\mathcal{N} (\boldsymbol{x} | \boldsymbol{\mu}, \boldsymbol{A})$, estimating the covariance matrix $\boldsymbol{A}$ by maximum likelihood inevitably leads to the two derivatives \begin{align} \label{eq: ln} \frac{\partial \ln |\boldsymbol{A}|}{\partial \boldsymbol{A}} \end{align} and \begin{align} \label{eq: qp} \frac{\partial \boldsymbol{x}^\top \boldsymbol{A}^{-1} \boldsymbol{x}}{\partial \boldsymbol{A}} \end{align}
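To see where these two derivatives come from, it may help to write out the log-density of a single sample. The expression below is the standard form of the multivariate Gaussian log-density; it is not spelled out in the original text and is included only as a reminder: \begin{align*} \ln \mathcal{N} (\boldsymbol{x} | \boldsymbol{\mu}, \boldsymbol{A}) = - \frac{D}{2} \ln (2 \pi) - \frac{1}{2} \ln |\boldsymbol{A}| - \frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{A}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \end{align*} Differentiating the second term with respect to $\boldsymbol{A}$ requires (\ref{eq: ln}), and the third term has the form of (\ref{eq: qp}) with $\boldsymbol{x} - \boldsymbol{\mu}$ in place of $\boldsymbol{x}$.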

We start with (\ref{eq: ln}). Write \begin{align*} \boldsymbol{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1D} \\ a_{21} & a_{22} & \cdots & a_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ a_{D1} & a_{D2} & \cdots & a_{DD} \end{bmatrix} \end{align*} and let $\boldsymbol{A}(a_{ij} + \epsilon)$ denote the matrix obtained by adding a small increment $\epsilon$ to the entry $a_{ij}$. In the Laplace expansion of the determinant along row $i$ (or column $j$), only the term containing $a_{ij}$ changes, so \begin{align*} |\boldsymbol{A}(a_{ij} + \epsilon)| - |\boldsymbol{A}| = \epsilon A_{ij} \end{align*} where $A_{ij}$ is the cofactor of $a_{ij}$. Hence \begin{align*} \frac{\partial |\boldsymbol{A}|}{\partial a_{ij}} = A_{ij} \end{align*} Recall that the inverse and the adjugate matrix are related by \begin{align*} \boldsymbol{A}^{-1} = \frac{1}{|\boldsymbol{A}|} \boldsymbol{A}^* = \frac{1}{|\boldsymbol{A}|} \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{D1} \\ A_{12} & A_{22} & \cdots & A_{D2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1D} & A_{2D} & \cdots & A_{DD} \end{bmatrix} \end{align*} so the matrix of cofactors is the transpose of the adjugate, and therefore \begin{align*} \frac{\partial \ln |\boldsymbol{A}|}{\partial \boldsymbol{A}} = \frac{1}{|\boldsymbol{A}|} \frac{\partial |\boldsymbol{A}|}{\partial \boldsymbol{A}} = \frac{1}{|\boldsymbol{A}|} \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1D} \\ A_{21} & A_{22} & \cdots & A_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ A_{D1} & A_{D2} & \cdots & A_{DD} \end{bmatrix} = (\boldsymbol{A}^{-1})^\top \end{align*} This settles (\ref{eq: ln}).
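The identity $\partial \ln |\boldsymbol{A}| / \partial \boldsymbol{A} = (\boldsymbol{A}^{-1})^\top$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes a fixed $3 \times 3$ example matrix with positive determinant (chosen arbitrarily, and deliberately non-symmetric so that the transpose matters), and the helper names `det3`, `cofactor`, `grad_logdet_analytic` and `grad_logdet_numeric` are hypothetical. It compares the cofactor formula against central finite differences of $\ln |\boldsymbol{A}|$.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix via Laplace expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cofactor(m, i, j):
    """Cofactor A_ij: signed determinant of m with row i and column j removed."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def grad_logdet_analytic(m):
    """(A^{-1})^T, whose (i, j) entry is A_ij / |A| as derived in the text."""
    d = det3(m)
    return [[cofactor(m, i, j) / d for j in range(3)] for i in range(3)]

def grad_logdet_numeric(m, eps=1e-6):
    """Central finite differences of ln|A| with respect to each entry a_ij."""
    g = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            plus = [row[:] for row in m]
            minus = [row[:] for row in m]
            plus[i][j] += eps
            minus[i][j] -= eps
            g[i][j] = (math.log(det3(plus)) - math.log(det3(minus))) / (2 * eps)
    return g

# Arbitrary example matrix (an assumption for illustration); its determinant is positive.
A = [[4.0, 1.0, 0.5],
     [0.2, 3.0, 1.0],
     [0.3, 0.7, 2.0]]

analytic = grad_logdet_analytic(A)
numeric = grad_logdet_numeric(A)
max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(3) for j in range(3))
print("largest entrywise discrepancy:", max_err)
```

The printed discrepancy should be tiny compared with the entries of the gradient. If the transpose were omitted, the two results would disagree for a non-symmetric matrix such as this one.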

Next we turn to (\ref{eq: qp}). For matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ whose entries depend on a scalar variable $x$ (not to be confused with the sample $\boldsymbol{x}$), applying the scalar product rule entry by entry gives \begin{align*} \frac{\partial (\boldsymbol{A} \boldsymbol{B})}{\partial x} & = \begin{bmatrix} \frac{\partial (\sum_{i=1}^D a_{1i} b_{i1})}{\partial x} & \frac{\partial (\sum_{i=1}^D a_{1i} b_{i2})}{\partial x} & \cdots & \frac{\partial (\sum_{i=1}^D a_{1i} b_{iD})}{\partial x} \\ \frac{\partial (\sum_{i=1}^D a_{2i} b_{i1})}{\partial x} & \frac{\partial (\sum_{i=1}^D a_{2i} b_{i2})}{\partial x} & \cdots & \frac{\partial (\sum_{i=1}^D a_{2i} b_{iD})}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial (\sum_{i=1}^D a_{Di} b_{i1})}{\partial x} & \frac{\partial (\sum_{i=1}^D a_{Di} b_{i2})}{\partial x} & \cdots & \frac{\partial (\sum_{i=1}^D a_{Di} b_{iD})}{\partial x} \end{bmatrix} \\ & = \begin{bmatrix} \sum_{i=1}^D \left( \frac{\partial a_{1i}}{\partial x} b_{i1} + a_{1i} \frac{\partial b_{i1}}{\partial x} \right) & \sum_{i=1}^D \left( \frac{\partial a_{1i}}{\partial x} b_{i2} + a_{1i} \frac{\partial b_{i2}}{\partial x} \right) & \cdots & \sum_{i=1}^D \left( \frac{\partial a_{1i}}{\partial x} b_{iD} + a_{1i} \frac{\partial b_{iD}}{\partial x} \right) \\ \sum_{i=1}^D \left( \frac{\partial a_{2i}}{\partial x} b_{i1} + a_{2i} \frac{\partial b_{i1}}{\partial x} \right) & \sum_{i=1}^D \left( \frac{\partial a_{2i}}{\partial x} b_{i2} + a_{2i} \frac{\partial b_{i2}}{\partial x} \right) & \cdots & \sum_{i=1}^D \left( \frac{\partial a_{2i}}{\partial x} b_{iD} + a_{2i} \frac{\partial b_{iD}}{\partial x} \right) \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^D \left( \frac{\partial a_{Di}}{\partial x} b_{i1} + a_{Di} \frac{\partial b_{i1}}{\partial x} \right) & \sum_{i=1}^D \left( \frac{\partial a_{Di}}{\partial x} b_{i2} + a_{Di} \frac{\partial b_{i2}}{\partial x} \right) & \cdots & \sum_{i=1}^D \left( \frac{\partial a_{Di}}{\partial x} b_{iD} + a_{Di} \frac{\partial b_{iD}}{\partial x} \right) \end{bmatrix} \\ & = \frac{\partial \boldsymbol{A}}{\partial x} \boldsymbol{B} + \boldsymbol{A} \frac{\partial \boldsymbol{B}}{\partial x}
\end{align*} In particular, taking $\boldsymbol{B} = \boldsymbol{A}^{-1}$ and $x = a_{ij}$ gives \begin{align*} \boldsymbol{0} = \frac{\partial \boldsymbol{I}}{\partial a_{ij}} = \frac{\partial (\boldsymbol{A} \boldsymbol{A}^{-1})}{\partial a_{ij}} = \frac{\partial \boldsymbol{A}}{\partial a_{ij}} \boldsymbol{A}^{-1} + \boldsymbol{A} \frac{\partial \boldsymbol{A}^{-1}}{\partial a_{ij}} \end{align*} that is, \begin{align*} \frac{\partial \boldsymbol{A}^{-1}}{\partial a_{ij}} = - \boldsymbol{A}^{-1} \frac{\partial \boldsymbol{A}}{\partial a_{ij}} \boldsymbol{A}^{-1} \end{align*} Since a scalar equals its own trace and the trace is invariant under cyclic permutations of a product, it follows that \begin{align*} \frac{\partial \boldsymbol{x}^\top \boldsymbol{A}^{-1} \boldsymbol{x}}{\partial a_{ij}} = \frac{\partial \operatorname{tr}(\boldsymbol{x}^\top \boldsymbol{A}^{-1} \boldsymbol{x})}{\partial a_{ij}} = \frac{\partial \operatorname{tr}(\boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top)}{\partial a_{ij}} = \operatorname{tr} \left( \frac{\partial \boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top}{\partial a_{ij}} \right) = \operatorname{tr} \left( - \boldsymbol{A}^{-1} \frac{\partial \boldsymbol{A}}{\partial a_{ij}} \boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top \right) = - \operatorname{tr} \left( \frac{\partial \boldsymbol{A}}{\partial a_{ij}} \boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top \boldsymbol{A}^{-1} \right) \end{align*} Note that $\frac{\partial \boldsymbol{A}}{\partial a_{ij}}$ is the matrix with $1$ at position $(i,j)$ and $0$ everywhere else, so its product with any matrix $\boldsymbol{M}$ has trace $[\boldsymbol{M}]_{ji}$. Hence \begin{align*} \frac{\partial \boldsymbol{x}^\top \boldsymbol{A}^{-1} \boldsymbol{x}}{\partial a_{ij}} = - [\boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top \boldsymbol{A}^{-1}]_{ji} \end{align*} and therefore \begin{align*} \frac{\partial \boldsymbol{x}^\top \boldsymbol{A}^{-1} \boldsymbol{x}}{\partial \boldsymbol{A}} = - (\boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top \boldsymbol{A}^{-1})^\top \end{align*} In particular, if $\boldsymbol{A}$ is symmetric, then $\boldsymbol{A}^{-1}$ is symmetric as well, and \begin{align*} \frac{\partial \boldsymbol{x}^\top \boldsymbol{A}^{-1} \boldsymbol{x}}{\partial \boldsymbol{A}} = - \boldsymbol{A}^{-1} \boldsymbol{x} \boldsymbol{x}^\top \boldsymbol{A}^{-1}
\end{align*}
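The general result for the quadratic form can be checked in the same finite-difference style. This is a rough sketch under stated assumptions rather than a definitive implementation: the $2 \times 2$ matrix `A` and the vector `x` are arbitrary illustrative values, `A` is deliberately non-symmetric so that the transpose in the formula is exercised, and the helper names `inv2`, `quad_form`, `grad_quad_analytic` and `grad_quad_numeric` are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula A^{-1} = A^* / |A|."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(m, x):
    """Compute the scalar x^T m^{-1} x."""
    inv = inv2(m)
    return sum(x[i] * inv[i][j] * x[j] for i in range(2) for j in range(2))

def grad_quad_analytic(m, x):
    """-(A^{-1} x x^T A^{-1})^T, i.e. entry (i, j) is -(x^T A^{-1})_i (A^{-1} x)_j."""
    inv = inv2(m)
    u = [sum(inv[j][k] * x[k] for k in range(2)) for j in range(2)]  # A^{-1} x
    v = [sum(x[k] * inv[k][i] for k in range(2)) for i in range(2)]  # x^T A^{-1}
    return [[-v[i] * u[j] for j in range(2)] for i in range(2)]

def grad_quad_numeric(m, x, eps=1e-6):
    """Central finite differences of x^T A^{-1} x with respect to each a_ij."""
    g = [[0.0] * 2 for _ in range(2)]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in m]
            minus = [row[:] for row in m]
            plus[i][j] += eps
            minus[i][j] -= eps
            g[i][j] = (quad_form(plus, x) - quad_form(minus, x)) / (2 * eps)
    return g

# Arbitrary illustrative values (assumptions, not taken from the text).
A = [[2.0, 0.5],
     [0.3, 1.5]]
x = [1.0, -2.0]

analytic = grad_quad_analytic(A, x)
numeric = grad_quad_numeric(A, x)
max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise discrepancy:", max_err)
```

For a symmetric `A` the vectors `u` and `v` coincide, so the analytic gradient is itself symmetric, matching the special case stated at the end of the derivation.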
