Matrix Calculus#
Revised 17 Jun 2023
. | scalar | vector | matrix |
---|---|---|---|
scalar | \(\begin{aligned}\frac{\partial y}{\partial x}\end{aligned}\) | \(\begin{aligned}\frac{\partial\mathbf{y}}{\partial x}\end{aligned}\) | \(\begin{aligned}\frac{\partial\mathbf{Y}}{\partial x}\end{aligned}\) |
vector | \(\begin{aligned}\frac{\partial y}{\partial\mathbf{x}}\end{aligned}\) | \(\begin{aligned}\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\end{aligned}\) | |
matrix | \(\begin{aligned}\frac{\partial y}{\partial\mathbf{X}}\end{aligned}\) | | |
scalar-by-scalar#
derivative of a scalar (function) wrt a scalar
[example]
\( \begin{aligned} f(x) &= kx \\ \frac{df}{dx} &= k \end{aligned} \)
scalar-by-vector#
derivative of a scalar (function) wrt a vector
\( \begin{aligned} \mathbf{x} &= \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix}^\top \\ \frac{\partial y}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial y}{\partial x_1} & \dots & \frac{\partial y}{\partial x_n} \end{bmatrix} \end{aligned} \)
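An example from vector calculus is the gradient, written here as a row vector to match the layout above; many texts define the gradient as the transpose of this, a column vector.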
\( \begin{aligned} \nabla f(\mathbf{x}) = \frac{\partial f}{\partial\mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \dots & \frac{\partial f}{\partial x_n} \end{bmatrix} \end{aligned} \)
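The partial derivatives that make up the gradient can be checked numerically. The following is a minimal sketch using central differences in plain Python; the helper name `numerical_gradient`, the step size `h`, and the test function are illustrative assumptions, not part of the original notes.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate [df/dx_1, ..., df/dx_n] at x with central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Hypothetical test function: f(x) = x_1^2 + 3 x_2
    return x[0] ** 2 + 3 * x[1]


# The analytic gradient is [2 x_1, 3], which is [4, 3] at the point (2, 5).
grad = numerical_gradient(f, [2.0, 5.0])
print(grad)
```

The printed values should agree with the analytic gradient up to floating-point error.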
vector-by-scalar#
derivative of a vector (function) wrt a scalar
\( \begin{aligned} \mathbf{y} &= \begin{bmatrix} y_1 & \dots & y_m \end{bmatrix}^\top \\ \frac{\partial\mathbf{y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \dots \\ \frac{\partial y_m}{\partial x} \\ \end{bmatrix} \end{aligned} \)
An example from vector calculus is the tangent vector of a curve.
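As a small worked case (the curve is chosen here for illustration and does not come from the source), take the unit circle parametrised by the scalar \(x\). Differentiating each component gives the tangent vector at that point:
\( \begin{aligned} \mathbf{y}(x) &= \begin{bmatrix} \cos x \\ \sin x \end{bmatrix} \\ \frac{\partial\mathbf{y}}{\partial x} &= \begin{bmatrix} -\sin x \\ \cos x \end{bmatrix} \end{aligned} \)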
vector-by-vector#
derivative of a vector (function) wrt a vector
a typical case is differentiating a linear transformation \(\mathbf{Ax}\) with respect to \(\mathbf{x}\) (rather than “taking the derivative of a matrix”)
\( \begin{aligned} \frac{\partial\mathbf{y}}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_n} \\ \end{bmatrix} \end{aligned} \)
Jacobian#
An example from vector calculus is the Jacobian matrix.
\(\begin{aligned}\frac{\partial}{\partial\mathbf{x}} \mathbf{Ax} = \mathbf{A}\end{aligned}\)#
[example]
\( \begin{aligned} \mathbf{f}(\mathbf{x}) = \mathbf{Ax} &= \begin{bmatrix}1&2\\3&4\\\end{bmatrix} \begin{bmatrix}x_1\\x_2\\\end{bmatrix} = \begin{bmatrix} x_1+2x_2 = f_1(\mathbf{x}) \\ 3x_1+4x_2 = f_2(\mathbf{x}) \\ \end{bmatrix} \\ \frac{\partial\mathbf{f}}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \\ \end{bmatrix} = \begin{bmatrix}1&2\\3&4\\\end{bmatrix} = \mathbf{A} \end{aligned} \)
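The same result can be confirmed numerically. This is a sketch under stated assumptions: `numerical_jacobian` and `linear_map` are hypothetical helper names, the evaluation point is arbitrary, and central differences stand in for the analytic derivative.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Approximate the m x n Jacobian [df_i/dx_j] at x with central differences."""
    m = len(f(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


A = [[1.0, 2.0], [3.0, 4.0]]


def linear_map(x):
    # f(x) = A x, computed one row of A at a time
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# For a linear map the Jacobian does not depend on x, so any point will do.
J = numerical_jacobian(linear_map, [0.5, -1.5])
print(J)
```

Rows of `J` index the outputs \(f_i\) and columns index the inputs \(x_j\), matching the layout of the Jacobian above, so `J` should reproduce `A` up to floating-point error.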
\(\begin{aligned}\frac{\partial}{\partial\mathbf{x}} \mathbf{x^\top Ax} = 2\mathbf{x^\top A^\top}\end{aligned}\) for symmetric \(\mathbf{A}\)#
[example]
\( \begin{aligned} f(\mathbf{x}) = \mathbf{x^\top Ax} &= \begin{bmatrix}x_1&x_2\\\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{bmatrix} \begin{bmatrix}x_1\\x_2\\\end{bmatrix} \\ &= \begin{bmatrix}x_1&x_2\\\end{bmatrix} \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \\ \end{bmatrix} \\ &= x_1(a_{11}x_1 + a_{12}x_2) + x_2(a_{21}x_1 + a_{22}x_2) \\ &= a_{11}x_1^2 + a_{12}x_1x_2 + a_{21}x_1x_2 + a_{22}x_2^2 \\ &= a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 && a_{12} = a_{21} \, \text{for a symmetric matrix} \\ \frac{\partial f}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \\ \end{bmatrix} \\ &= \begin{bmatrix} 2a_{11}x_1 + 2a_{12}x_2 & 2a_{12}x_1 + 2a_{22}x_2 \\ \end{bmatrix} \\ &= 2 \begin{bmatrix} a_{11}x_1 + a_{12}x_2 & a_{21}x_1 + a_{22}x_2 \\ \end{bmatrix} && \text{using} \, a_{12} = a_{21} \\ &= 2(\mathbf{Ax})^\top \\ &= 2\mathbf{x^\top A^\top} \end{aligned} \)
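The derivation relies on \(\mathbf{A}\) being symmetric. Without that assumption the derivative is \(\mathbf{x^\top}(\mathbf{A} + \mathbf{A^\top})\), which reduces to \(2\mathbf{x^\top A^\top}\) when \(\mathbf{A} = \mathbf{A^\top}\). The sketch below checks both cases with central differences; the matrices, the evaluation point, and the helper names are illustrative assumptions.

```python
def quadratic_form(M, x):
    # x^T M x written out as a double sum
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Approximate [df/dx_1, ..., df/dx_n] at x with central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x = [1.0, -2.0]

# Symmetric case: the derivative is 2 (A x)^T.
A = [[2.0, 1.0], [1.0, 3.0]]
numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)
analytic = [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

# Non-symmetric case: the derivative is x^T (B + B^T).
B = [[1.0, 2.0], [3.0, 4.0]]
numeric_general = numerical_gradient(lambda v: quadratic_form(B, v), x)
analytic_general = [
    sum(x[i] * (B[i][j] + B[j][i]) for i in range(2)) for j in range(2)
]

print(numeric, analytic)
print(numeric_general, analytic_general)
```

For the non-symmetric `B`, the expression \(2\mathbf{x^\top B^\top}\) would give a different vector from the numerical result, which is why the symmetry assumption matters.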
matrix-by-scalar#
derivative of a matrix (function) wrt a scalar
this is called the tangent matrix
\( \begin{aligned} \frac{\partial\mathbf{Y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \dots & \frac{\partial y_{1n}}{\partial x} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \dots & \frac{\partial y_{mn}}{\partial x} \\ \end{bmatrix} \end{aligned} \)
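As an illustration (this example is not from the source), a 2D rotation matrix depends on a single scalar angle \(x\), and its derivative is taken entry by entry:
\( \begin{aligned} \mathbf{Y}(x) &= \begin{bmatrix} \cos x & -\sin x \\ \sin x & \cos x \\ \end{bmatrix} \\ \frac{\partial\mathbf{Y}}{\partial x} &= \begin{bmatrix} -\sin x & -\cos x \\ \cos x & -\sin x \\ \end{bmatrix} \end{aligned} \)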
scalar-by-matrix#
derivative of a scalar (function) wrt a matrix
this is called the gradient matrix
important examples of scalar functions of matrices include the trace of a matrix and the determinant
\( \begin{aligned} \frac{\partial y}{\partial\mathbf{X}} &= \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \dots & \frac{\partial y}{\partial x_{p1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \dots & \frac{\partial y}{\partial x_{pq}} \\ \end{bmatrix} \end{aligned} \)
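The trace and the determinant can be worked out by hand for a \(2\times 2\) matrix. This is a sketch for illustration that assumes the layout above, where the entry in row \(i\), column \(j\) of the result is \(\frac{\partial y}{\partial x_{ji}}\):
\( \begin{aligned} \mathbf{X} &= \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \end{bmatrix} \\ \operatorname{tr}(\mathbf{X}) &= x_{11} + x_{22} \\ \frac{\partial \operatorname{tr}(\mathbf{X})}{\partial\mathbf{X}} &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} = \mathbf{I} \\ \det(\mathbf{X}) &= x_{11}x_{22} - x_{12}x_{21} \\ \frac{\partial \det(\mathbf{X})}{\partial\mathbf{X}} &= \begin{bmatrix} x_{22} & -x_{12} \\ -x_{21} & x_{11} \\ \end{bmatrix} = \det(\mathbf{X})\,\mathbf{X}^{-1} && \text{when } \mathbf{X} \text{ is invertible} \end{aligned} \)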
Resources#
[Y] ritvikmath. (09 Sep 2019). “Derivative of a Matrix : Data Science Basics”. YouTube.