MatrixCalculus


Denote by $F$ a matrix of size A x C, $G$ a matrix of size A x B, and $H$ a matrix of size B x C, with $F = G H$. For a real variable $x$, it is known that $\frac{\partial F}{\partial x} = \frac{\partial G}{\partial x} H + G \frac{\partial H}{\partial x}$. Lay out a collection of variables $x_{i,j}$ as the entries of a matrix $X$ of size M x N. Then lay out block-wise, with the block in row $i$ and column $j$ corresponding to the variable $x_{i,j}$, the definition of $\frac{\partial F}{\partial X}$, whose entry blocks are all A x C matrices:

$$\frac{\partial F}{\partial X}\equiv\begin{bmatrix}\frac{\partial F}{\partial x_{1,1}} & \frac{\partial F}{\partial x_{1,2}} & \cdots & \frac{\partial F}{\partial x_{1,N}}\\ \frac{\partial F}{\partial x_{2,1}} & \frac{\partial F}{\partial x_{2,2}} & \cdots & \frac{\partial F}{\partial x_{2,N}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F}{\partial x_{M,1}} & \frac{\partial F}{\partial x_{M,2}} & \cdots & \frac{\partial F}{\partial x_{M,N}}\end{bmatrix}$$

Obviously, $\left(\frac{\partial F}{\partial X}\right)^T = \frac{\partial F^T}{\partial X^T}$. Also, applying the single-variable identity above to each entry block leads to:

$$\begin{bmatrix}\frac{\partial F}{\partial x_{1,1}} & \frac{\partial F}{\partial x_{1,2}} & \cdots & \frac{\partial F}{\partial x_{1,N}} \\ \frac{\partial F}{\partial x_{2,1}} & \frac{\partial F}{\partial x_{2,2}} & \cdots & \frac{\partial F}{\partial x_{2,N}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F}{\partial x_{M,1}} & \frac{\partial F}{\partial x_{M,2}} & \cdots & \frac{\partial F}{\partial x_{M,N}}\end{bmatrix} = \begin{bmatrix}\frac{\partial G}{\partial x_{1,1}} H + G \frac{\partial H}{\partial x_{1,1}} & \frac{\partial G}{\partial x_{1,2}} H + G \frac{\partial H}{\partial x_{1,2}} & \cdots & \frac{\partial G}{\partial x_{1,N}} H + G \frac{\partial H}{\partial x_{1,N}} \\ \frac{\partial G}{\partial x_{2,1}} H + G \frac{\partial H}{\partial x_{2,1}} & \frac{\partial G}{\partial x_{2,2}} H + G \frac{\partial H}{\partial x_{2,2}} & \cdots & \frac{\partial G}{\partial x_{2,N}} H + G \frac{\partial H}{\partial x_{2,N}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial G}{\partial x_{M,1}} H + G \frac{\partial H}{\partial x_{M,1}} & \frac{\partial G}{\partial x_{M,2}} H + G \frac{\partial H}{\partial x_{M,2}} & \cdots & \frac{\partial G}{\partial x_{M,N}} H + G \frac{\partial H}{\partial x_{M,N}}\end{bmatrix}$$

$$=\begin{bmatrix}\frac{\partial G}{\partial x_{1,1}} H & \frac{\partial G}{\partial x_{1,2}} H & \cdots & \frac{\partial G}{\partial x_{1,N}} H \\ \frac{\partial G}{\partial x_{2,1}} H & \frac{\partial G}{\partial x_{2,2}} H & \cdots & \frac{\partial G}{\partial x_{2,N}} H \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial G}{\partial x_{M,1}} H & \frac{\partial G}{\partial x_{M,2}} H & \cdots & \frac{\partial G}{\partial x_{M,N}} H\end{bmatrix} + \begin{bmatrix}G \frac{\partial H}{\partial x_{1,1}} & G \frac{\partial H}{\partial x_{1,2}} & \cdots & G \frac{\partial H}{\partial x_{1,N}} \\ G \frac{\partial H}{\partial x_{2,1}} & G \frac{\partial H}{\partial x_{2,2}} & \cdots & G \frac{\partial H}{\partial x_{2,N}} \\ \vdots & \vdots & \ddots & \vdots \\ G \frac{\partial H}{\partial x_{M,1}} & G \frac{\partial H}{\partial x_{M,2}} & \cdots & G \frac{\partial H}{\partial x_{M,N}}\end{bmatrix}$$

$$= \begin{bmatrix}\frac{\partial G}{\partial x_{1,1}} & \frac{\partial G}{\partial x_{1,2}} & \cdots & \frac{\partial G}{\partial x_{1,N}} \\ \frac{\partial G}{\partial x_{2,1}} & \frac{\partial G}{\partial x_{2,2}} & \cdots & \frac{\partial G}{\partial x_{2,N}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial G}{\partial x_{M,1}} & \frac{\partial G}{\partial x_{M,2}} & \cdots & \frac{\partial G}{\partial x_{M,N}}\end{bmatrix} \begin{bmatrix}H&&&\\&H&&\\&&\ddots&\\&&&H\end{bmatrix} + \begin{bmatrix}G&&&\\&G&&\\&&\ddots&\\&&&G\end{bmatrix} \begin{bmatrix}\frac{\partial H}{\partial x_{1,1}} & \frac{\partial H}{\partial x_{1,2}} & \cdots & \frac{\partial H}{\partial x_{1,N}} \\ \frac{\partial H}{\partial x_{2,1}} & \frac{\partial H}{\partial x_{2,2}} & \cdots & \frac{\partial H}{\partial x_{2,N}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial H}{\partial x_{M,1}} & \frac{\partial H}{\partial x_{M,2}} & \cdots & \frac{\partial H}{\partial x_{M,N}}\end{bmatrix}$$

Given an integer $a$ and a matrix $S$, define $$I_{a,S} \equiv \begin{bmatrix}S&&&\\&S&&\\&&\ddots&\\&&&S\end{bmatrix}$$ which has $a$ copies of $S$ along the diagonal blocks, with all other entry blocks zero.

Then $$\frac{\partial F}{\partial X} = \frac{\partial G}{\partial X} I_{N,H} + I_{M,G} \frac{\partial H}{\partial X}$$
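This block product rule can be checked numerically. The sketch below is not part of the derivation: it picks arbitrary sizes, makes $G$ and $H$ linear in the entries of $X$, builds each block derivative by central differences, and compares both sides (the names `P`, `Q`, `block_derivative` are my own).

```python
import numpy as np

# Sizes: G is A_ x B_, H is B_ x C_, X is M x N; each block dF/dx_ij is A_ x C_.
rng = np.random.default_rng(0)
A_, B_, C_, M, N = 2, 3, 2, 2, 2
P = rng.standard_normal((A_, B_, M, N))   # G(X) = sum_ij P[:,:,i,j] * x_ij
Q = rng.standard_normal((B_, C_, M, N))   # H(X) = sum_ij Q[:,:,i,j] * x_ij

def G(X): return np.einsum('abij,ij->ab', P, X)
def H(X): return np.einsum('bcij,ij->bc', Q, X)
def F(X): return G(X) @ H(X)

def block_derivative(func, X, eps=1e-6):
    """Block matrix whose (i,j) block is d(func)/dx_ij, by central differences."""
    rows = []
    for i in range(X.shape[0]):
        blocks = []
        for j in range(X.shape[1]):
            E = np.zeros_like(X); E[i, j] = eps
            blocks.append((func(X + E) - func(X - E)) / (2 * eps))
        rows.append(np.hstack(blocks))
    return np.vstack(rows)

X = rng.standard_normal((M, N))
dF = block_derivative(F, X)
dG = block_derivative(G, X)
dH = block_derivative(H, X)
I_NH = np.kron(np.eye(N), H(X))   # I_{N,H}: N copies of H on the diagonal blocks
I_MG = np.kron(np.eye(M), G(X))   # I_{M,G}: M copies of G on the diagonal blocks
assert np.allclose(dF, dG @ I_NH + I_MG @ dH)
```

Since $F$ is quadratic in the entries of $X$, central differences are exact up to floating-point error, so the assertion holds tightly.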

$I_{a,S}$ extends the concept of scalar multiplication: for a matrix of size M x N, right-multiplying by a scalar $s$ is the same as right-multiplying by the matrix $I_{N,s}$, and left-multiplying by a scalar $s$ is the same as left-multiplying by the matrix $I_{M,s}$. Assuming all matrices below have compatible sizes, some facts about $I_{a,S}$:

$$\begin{aligned}I_{1,S} &= S\\ (I_{a,S})^T &= I_{a,S^T}\\ I_{a,S_1 S_2} &= I_{a,S_1} I_{a,S_2}\\ I_{a,S_1 + S_2} &= I_{a,S_1} + I_{a,S_2}\\ I_{M,1} &= I_M\\ I_{M,I_N} &= I_{M N}\end{aligned}$$
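Since $I_{a,S}$ is exactly the Kronecker product of an $a \times a$ identity with $S$, these facts can be verified numerically. A small sketch (sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
a = 4
S1 = rng.standard_normal((3, 5))
S2 = rng.standard_normal((5, 2))
S3 = rng.standard_normal((3, 5))

def I_aS(a, S):
    # a copies of S along the diagonal blocks, zeros elsewhere
    return np.kron(np.eye(a), S)

assert np.allclose(I_aS(1, S1), S1)                              # I_{1,S} = S
assert np.allclose(I_aS(a, S1).T, I_aS(a, S1.T))                 # (I_{a,S})^T = I_{a,S^T}
assert np.allclose(I_aS(a, S1 @ S2), I_aS(a, S1) @ I_aS(a, S2))  # products
assert np.allclose(I_aS(a, S1 + S3), I_aS(a, S1) + I_aS(a, S3))  # sums
assert np.allclose(I_aS(4, np.eye(3)), np.eye(12))               # I_{M,I_N} = I_{MN}
```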

Let $X \equiv \begin{bmatrix}x_1\\ \vdots \\ x_M\end{bmatrix}$, $Y \equiv \begin{bmatrix}y_1 & \cdots & y_M\end{bmatrix}$, let $e_i$ be the M x 1 matrix whose $i$-th row is 1 and whose other entries are 0, and let $w_i$ be the 1 x M matrix whose $i$-th column is 1 and whose other entries are 0. Then:

$$\begin{aligned}\frac{\partial X^T}{\partial X} &= I_M\\ \frac{\partial Y^T}{\partial Y} &= I_M\\ \frac{\partial X}{\partial X} &= \begin{bmatrix}e_1\\e_2\\\vdots\\e_M\end{bmatrix}\\ \frac{\partial Y}{\partial Y} &= \begin{bmatrix}w_1&\cdots&w_M\end{bmatrix}\\ I_{M,X^T} \begin{bmatrix}e_1\\e_2\\\vdots\\e_M\end{bmatrix} &= X\\ \begin{bmatrix}w_1&\cdots&w_M\end{bmatrix} I_{M,X} &= X^T\end{aligned}$$
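The last two identities can be checked numerically by stacking the basis vectors explicitly. A sketch (the stacked matrices `E` and `W` are my own names):

```python
import numpy as np

M = 3
X = np.arange(1.0, M + 1).reshape(M, 1)   # a concrete M x 1 column vector
E = np.eye(M).reshape(M * M, 1)           # [e_1; e_2; ...; e_M] stacked vertically
W = np.eye(M).reshape(1, M * M)           # [w_1, w_2, ..., w_M] side by side

# I_{M,X^T} [e_1; ...; e_M] = X
assert np.allclose(np.kron(np.eye(M), X.T) @ E, X)
# [w_1, ..., w_M] I_{M,X} = X^T
assert np.allclose(W @ np.kron(np.eye(M), X), X.T)
```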

Example

Let $G = A X$ where $A$ is a constant 4 x 3 matrix and $X$ is a 3 x 1 matrix of variables; then $G^T G$ is a real-valued function of $X$, denoted $L(X)$. The gradient of $L(X)$ is defined as the 3 x 1 matrix $\nabla L(X) \equiv \frac{\partial L}{\partial X}$ (elsewhere it is sometimes defined as the 1 x 3 matrix $\frac{\partial L}{\partial X^T}$).

$$\begin{aligned}\nabla L(X) &= \frac{\partial G^T G}{\partial X} = \frac{\partial X^T A^T A X}{\partial X} = \frac{\partial X^T A^T A}{\partial X} I_{1,X} + I_{3,X^T A^T A} \frac{\partial X}{\partial X} \\ &= \left(\frac{\partial X^T}{\partial X} I_{1,A^T A} + 0 \right) X + I_{3,X^T A^T A} \frac{\partial X}{\partial X} = A^T A X + I_{3,X^T A^T A} \begin{bmatrix}e_1\\e_2\\e_3\end{bmatrix}\end{aligned}$$

Let $A$'s 3 column vectors be $C_1$, $C_2$, $C_3$, so that $A = \begin{bmatrix}C_1&C_2&C_3\end{bmatrix}$. Then $$\begin{aligned}A^T A X + I_{3,X^T A^T A} \begin{bmatrix}e_1\\e_2\\e_3\end{bmatrix} &= \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X + I_{3,X^T A^T} I_{3,A} \begin{bmatrix}e_1\\e_2\\e_3\end{bmatrix} \\ &= \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X + \begin{bmatrix}X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_1\\X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_2\\ X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_3\end{bmatrix}\end{aligned}$$

Because $X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_i$ is a number, it is the same as its transpose $C_i^T \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X$,

therefore

$$\begin{bmatrix}X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_1\\X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_2\\ X^T \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} C_3\end{bmatrix} = \begin{bmatrix}C_1^T \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X\\C_2^T \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X\\C_3^T \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X\end{bmatrix} = \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X$$

Therefore the answer is $2 \begin{bmatrix}C_1^T\\C_2^T\\C_3^T\end{bmatrix} \begin{bmatrix}C_1&C_2&C_3\end{bmatrix} X = 2 A^T A X$.

When the column vectors of $A$ are orthonormal, $A^T A = I_3$, so $A^T A X + I_{3,X^T A^T A} \begin{bmatrix}e_1\\e_2\\e_3\end{bmatrix}$ becomes $X + I_{3,X^T} \begin{bmatrix}e_1\\e_2\\e_3\end{bmatrix} = X + X = 2X$. Indeed, in that case $L$ is $x_1^2 + x_2^2 + x_3^2$ and $\frac{\partial L}{\partial X} = \begin{bmatrix}2 x_1\\2 x_2\\2 x_3\end{bmatrix}$ by direct calculation from the definition.
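The result $\nabla L(X) = 2 A^T A X$ can be confirmed against a finite-difference gradient. A sketch with a random constant 4 x 3 matrix $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))   # constant 4 x 3 matrix
X = rng.standard_normal((3, 1))   # 3 x 1 matrix of variables

def L(X):
    # L(X) = (A X)^T (A X), a real number
    return float((A @ X).T @ (A @ X))

# Central-difference gradient, one entry of X at a time
eps = 1e-6
grad = np.array([[(L(X + eps * np.eye(3)[:, [i]]) -
                   L(X - eps * np.eye(3)[:, [i]])) / (2 * eps)]
                 for i in range(3)])
assert np.allclose(grad, 2 * A.T @ A @ X)
```

Because $L$ is quadratic, the central difference is exact up to rounding, so the agreement is tight.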

Multiple-variable integration

Let $A(Y)$ be a function from $\mathbb{R}^M$ to $\mathbb{R}$, and let $Y = F(X)$ be a change of variables from $\mathbb{R}^M$ to $\mathbb{R}^M$. Lay out as

$$\begin{bmatrix}y_1\\\vdots\\y_M\end{bmatrix} \equiv Y = F(X) \equiv \begin{bmatrix}F_1(X)\\\vdots\\F_M(X)\end{bmatrix}$$ and $X \equiv \begin{bmatrix}x_1&\cdots&x_M\end{bmatrix}$.

Then $$\int A(y_1,\cdots,y_M)\, d y_1 \cdots d y_M = \int A(F_1(X),\cdots,F_M(X))\, \left|\det\left(\frac{\partial F}{\partial X}\right)\right| d x_1 \cdots d x_M$$

which reduces to the usual single-variable change of variables when $M$ is 1.

Example. Calculate $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{y_1^2 + y_2^2}{2}} d y_1 d y_2$.

Let $$\begin{bmatrix}y_1\\y_2\end{bmatrix} \equiv Y = F(X) \equiv \begin{bmatrix}x_1 \cos(x_2)\\x_1 \sin(x_2)\end{bmatrix}$$

and $X \equiv \begin{bmatrix}x_1&x_2\end{bmatrix}$. Then $\frac{\partial F}{\partial X} = \begin{bmatrix}\cos(x_2)&-x_1 \sin(x_2)\\ \sin(x_2)&x_1 \cos(x_2)\end{bmatrix}$ and therefore $\det\left(\frac{\partial F}{\partial X}\right) = x_1$, so

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{y_1^2 + y_2^2}{2}} d y_1 d y_2 = \int_0^{2\pi} \int_0^{\infty} e^{-\frac{x_1^2}{2}} x_1\, d x_1\, d x_2 = 2\pi$$

As a consequence, $\int_{-\infty}^{\infty} e^{-\frac{Z^2}{2}} d Z = \sqrt{2\pi}$; that is, the density of a standard normal random variable is $\frac{1}{\sqrt{2\pi}} e^{-\frac{Z^2}{2}}$.
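Both values can be sanity-checked numerically. A sketch using a plain Riemann sum on a grid truncated at $\pm 8$, where the integrand is about $e^{-32}$ and contributes nothing at this precision:

```python
import numpy as np

# Riemann sum of e^{-z^2/2} over [-8, 8]; the Gaussian decays so fast
# that both the truncation and the discretization errors are tiny.
z = np.linspace(-8.0, 8.0, 4001)
dz = z[1] - z[0]
one_d = np.sum(np.exp(-z**2 / 2)) * dz

assert abs(one_d - np.sqrt(2 * np.pi)) < 1e-8   # 1-D integral = sqrt(2 pi)
assert abs(one_d**2 - 2 * np.pi) < 1e-7         # 2-D integral = 2 pi
```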

Let $f$ be the joint probability density of $N$ random variables with zero mean. Lay out $X = \begin{bmatrix}X_1\\\vdots\\X_N\end{bmatrix}$. Then its covariance matrix is $\int_{-\infty}^\infty X X^T f(X)\, d X_1 d X_2 \cdots d X_N$, also written $E(X X^T)$ or $COV_X$. Let $Y = R X$ where $R$ is an N x N matrix. Then $COV_Y = E(Y Y^T) = E(R X X^T R^T) = R\, E(X X^T)\, R^T = R\, COV_X\, R^T$. So if a series of $k$ row operations $R = R_k \cdots R_1$, together with the corresponding column operations $R^T$, reduces $COV_X$ to $COV_Y = I$, then with $A \equiv R^{-1} = R_1^{-1} \cdots R_k^{-1}$ the change of variables $X = A Y$ has covariance matrix $COV_X = A A^T$, because $I = R\, COV_X\, R^T$. Any symmetric matrix can be processed by these row-column operations; if at some step a diagonal entry is not positive, the symmetric matrix fails to be a legitimate covariance matrix. Also $\det(COV_X) = \det(A) \det(A^T) = \det(A)^2$, so $\det(A) = \sqrt{\det(COV_X)}$.

Demonstrate the row-column operations on $COV = \begin{bmatrix}2&4&-2\\4&10&2\\-2&2&40\end{bmatrix}$, tracking $COV$ vs $A$:

$$\begin{aligned}\begin{bmatrix}2&4&-2\\4&10&2\\-2&2&40\end{bmatrix} &\ vs\ \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}\\ \begin{bmatrix}2&0&0\\0&2&6\\0&6&38\end{bmatrix} &\ vs\ \begin{bmatrix}1&0&0\\2&1&0\\-1&0&1\end{bmatrix}\\ \begin{bmatrix}2&0&0\\0&2&0\\0&0&20\end{bmatrix} &\ vs\ \begin{bmatrix}1&0&0\\2&1&0\\-1&3&1\end{bmatrix}\\ \begin{bmatrix}1&0&0\\0&2&0\\0&0&20\end{bmatrix} &\ vs\ \begin{bmatrix}\sqrt{2}&0&0\\2\sqrt{2}&1&0\\-\sqrt{2}&3&1\end{bmatrix}\\ \begin{bmatrix}1&0&0\\0&1&0\\0&0&20\end{bmatrix} &\ vs\ \begin{bmatrix}\sqrt{2}&0&0\\2\sqrt{2}&\sqrt{2}&0\\-\sqrt{2}&3\sqrt{2}&1\end{bmatrix}\\ \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} &\ vs\ \begin{bmatrix}\sqrt{2}&0&0\\2\sqrt{2}&\sqrt{2}&0\\-\sqrt{2}&3\sqrt{2}&\sqrt{20}\end{bmatrix}\end{aligned}$$

So $A = \begin{bmatrix}\sqrt{2}&0&0\\2\sqrt{2}&\sqrt{2}&0\\-\sqrt{2}&3\sqrt{2}&\sqrt{20}\end{bmatrix}$ and $A A^T = COV$:

$$\begin{bmatrix}\sqrt{2}&0&0\\2\sqrt{2}&\sqrt{2}&0\\-\sqrt{2}&3\sqrt{2}&\sqrt{20}\end{bmatrix} \begin{bmatrix}\sqrt{2}&2\sqrt{2}&-\sqrt{2}\\0&\sqrt{2}&3\sqrt{2}\\0&0&\sqrt{20}\end{bmatrix} = \begin{bmatrix}2&4&-2\\4&10&2\\-2&2&40\end{bmatrix}$$
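The lower-triangular $A$ produced by this row-column reduction is a Cholesky factor, so it can be cross-checked against numpy's Cholesky routine. A sketch:

```python
import numpy as np

COV = np.array([[2.0, 4.0, -2.0],
                [4.0, 10.0, 2.0],
                [-2.0, 2.0, 40.0]])
A = np.array([[np.sqrt(2), 0, 0],
              [2 * np.sqrt(2), np.sqrt(2), 0],
              [-np.sqrt(2), 3 * np.sqrt(2), np.sqrt(20)]])

assert np.allclose(A @ A.T, COV)                   # A A^T = COV
assert np.allclose(np.linalg.cholesky(COV), A)     # same lower-triangular factor
assert np.isclose(np.linalg.det(A),
                  np.sqrt(np.linalg.det(COV)))     # det(A) = sqrt(det(COV))
```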

Example. Calculate $\int_{-\infty}^\infty e^{-\frac{1}{2} X^T (COV_X)^{-1} X}\, d X_1 d X_2 \cdots d X_N$.

Let $A$ be a matrix such that $COV_X = A A^T$, found by the above procedure. Then the change of variables $X = A Y$ leads to

$$\begin{aligned}\int_{-\infty}^\infty e^{-\frac{1}{2} X^T (COV_X)^{-1} X}\, d X_1 d X_2 \cdots d X_N &= \int_{-\infty}^\infty e^{-\frac{1}{2} Y^T A^T (A A^T)^{-1} A Y}\, \det(A)\, d Y_1 d Y_2 \cdots d Y_N \\ &= \int_{-\infty}^\infty e^{-\frac{1}{2} Y^T Y}\, \det(A)\, d Y_1 d Y_2 \cdots d Y_N = \sqrt{(2\pi)^N \det(COV_X)}\end{aligned}$$

Meaning: let $Y_1, \cdots, Y_N$ be iid standard normal random variables; then the random variables $X = A Y$ have $COV_X = A A^T$ and density $\frac{1}{\sqrt{(2\pi)^N \det(COV_X)}} e^{-\frac{1}{2} X^T (COV_X)^{-1} X}$, where $A$ can be found by the procedure mentioned above.
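This sampling recipe can be demonstrated empirically: draw many iid standard normal columns $Y$, form $X = A Y$, and check that the sample covariance approaches $COV$. A sketch reusing the 3 x 3 example above (sample size and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
COV = np.array([[2.0, 4.0, -2.0],
                [4.0, 10.0, 2.0],
                [-2.0, 2.0, 40.0]])
A = np.linalg.cholesky(COV)            # one valid choice of A with A A^T = COV

n = 500_000
Y = rng.standard_normal((3, n))        # iid standard normal samples, one per column
X = A @ Y                              # correlated samples with covariance A A^T
empirical = X @ X.T / n                # sample estimate of E(X X^T)

assert np.allclose(empirical, COV, atol=0.5)
```

The sampling error of the largest entry (40) shrinks like $1/\sqrt{n}$, so the tolerance of 0.5 is comfortably wide at this sample size.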