Matrix Calculus#


Revised 17 Jun 2023


Columns give the type of the function being differentiated; rows give the type of the variable it is differentiated with respect to.

\( \begin{array}{l|ccc} & \text{scalar } y & \text{vector } \mathbf{y} & \text{matrix } \mathbf{Y} \\ \hline \text{scalar } x & \frac{\partial y}{\partial x} & \frac{\partial\mathbf{y}}{\partial x} & \frac{\partial\mathbf{Y}}{\partial x} \\ \text{vector } \mathbf{x} & \frac{\partial y}{\partial\mathbf{x}} & \frac{\partial\mathbf{y}}{\partial\mathbf{x}} & \\ \text{matrix } \mathbf{X} & \frac{\partial y}{\partial\mathbf{X}} & & \end{array} \)

scalar-by-scalar#

derivative of a scalar (function) wrt a scalar

[example]

\( \begin{aligned} f(x) &= kx \\ \frac{df}{dx} &= k \end{aligned} \)
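A quick numerical sanity check of this result, as a minimal sketch: the slope value `k = 3.0`, the evaluation point, and the helper name `central_difference` are all chosen here for illustration and are not part of the original notes.

```python
def central_difference(f, x, h=1e-6):
    """Approximate df/dx at x with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


k = 3.0  # hypothetical slope, chosen only for illustration


def f(x):
    # The linear function from the example above: f(x) = kx.
    return k * x


slope = central_difference(f, 2.0)
# For a linear function the estimate should match k up to rounding error.
print(abs(slope - k) < 1e-6)
```

Because `f` is linear, the central difference has no truncation error, so any mismatch comes from floating-point rounding alone.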

scalar-by-vector#

derivative of a scalar (function) wrt a vector

\( \begin{aligned} \mathbf{x} &= \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix}^\top \\ \frac{\partial y}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial y}{\partial x_1} & \dots & \frac{\partial y}{\partial x_n} \end{bmatrix} \end{aligned} \)

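An example from vector calculus is the gradient, written here as a row vector to match the layout above (many texts define the gradient as the transpose of this row, i.e. a column vector).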

\( \begin{aligned} \nabla f(\mathbf{x}) = \frac{\partial f}{\partial\mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \dots & \frac{\partial f}{\partial x_n} \end{bmatrix} \end{aligned} \)
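The component-wise definition can be checked numerically by perturbing one coordinate at a time. The sketch below uses a made-up function \(f(\mathbf{x}) = x_1^2 + 3x_1x_2\) and a hypothetical helper `numerical_gradient`; neither comes from the original notes.

```python
def numerical_gradient(f, x, h=1e-6):
    """Return [df/dx_1, ..., df/dx_n] at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # nudge only the i-th coordinate up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical scalar function f(x) = x_1^2 + 3 x_1 x_2.
    return x[0] ** 2 + 3 * x[0] * x[1]


point = [1.0, 2.0]
grad = numerical_gradient(f, point)

# Analytic partials: df/dx_1 = 2 x_1 + 3 x_2, df/dx_2 = 3 x_1.
expected = [2 * point[0] + 3 * point[1], 3 * point[0]]
print(all(abs(g - e) < 1e-5 for g, e in zip(grad, expected)))
```

The result is a flat list, so it says nothing about row versus column layout; that choice is purely a notational convention.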

vector-by-scalar#

derivative of a vector (function) wrt a scalar

\( \begin{aligned} \mathbf{y} &= \begin{bmatrix} y_1 & \dots & y_m \end{bmatrix}^\top \\ \frac{\partial\mathbf{y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix} \end{aligned} \)

An example from vector calculus is the tangent vector of a parametrized curve.
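
[example]

A small worked case, with the unit circle picked here as an illustrative curve (it is not from the original notes); the parameter is named \(t\) rather than \(x\):

\( \begin{aligned} \mathbf{r}(t) &= \begin{bmatrix} \cos t \\ \sin t \end{bmatrix} \\ \frac{\partial\mathbf{r}}{\partial t} &= \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix} \end{aligned} \)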

vector-by-vector#

derivative of a vector (function) wrt a vector

for a linear map \(\mathbf{y} = \mathbf{Ax}\), this means taking the derivative of a linear transformation (not “taking the derivative of a matrix”)

\( \begin{aligned} \frac{\partial\mathbf{y}}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_n} \\ \end{bmatrix} \end{aligned} \)

Jacobian#

An example from vector calculus is the Jacobian matrix, which is exactly the \(m \times n\) matrix of partial derivatives shown above.

\(\begin{aligned}\frac{\partial}{\partial\mathbf{x}} \mathbf{Ax} = \mathbf{A}\end{aligned}\)#

[example]

\( \begin{aligned} \mathbf{f}(\mathbf{x}) = \mathbf{Ax} &= \begin{bmatrix}1&2\\3&4\\\end{bmatrix} \begin{bmatrix}x_1\\x_2\\\end{bmatrix} = \begin{bmatrix} x_1+2x_2 = f_1(\mathbf{x}) \\ 3x_1+4x_2 = f_2(\mathbf{x}) \\ \end{bmatrix} \\ \frac{\partial\mathbf{f}}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \\ \end{bmatrix} = \begin{bmatrix}1&2\\3&4\\\end{bmatrix} = \mathbf{A} \end{aligned} \)
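The same result can be checked numerically. The sketch below builds a finite-difference Jacobian for the matrix from the example; the helper names `matvec` and `numerical_jacobian` and the evaluation point are assumptions made for illustration.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_jacobian(f, x, h=1e-6):
    """Return J with J[i][j] = d f_i / d x_j, using central differences."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        forward, backward = list(x), list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


A = [[1.0, 2.0], [3.0, 4.0]]  # the matrix from the example above

# The evaluation point is arbitrary: the Jacobian of a linear map is constant.
J = numerical_jacobian(lambda x: matvec(A, x), [0.5, -1.5])
print(all(abs(J[i][j] - A[i][j]) < 1e-6 for i in range(2) for j in range(2)))
```

Rows of `J` index the outputs \(f_i\) and columns index the inputs \(x_j\), matching the layout of the matrix of partial derivatives above.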

\(\begin{aligned}\frac{\partial}{\partial\mathbf{x}} \mathbf{x^\top Ax} = 2\mathbf{x^\top A^\top}\end{aligned}\) for symmetric \(\mathbf{A}\)#

[example]

\( \begin{aligned} f(\mathbf{x}) = \mathbf{x^\top Ax} &= \begin{bmatrix}x_1&x_2\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix}x_1\\x_2\end{bmatrix} \\ &= \begin{bmatrix}x_1&x_2\end{bmatrix} \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix} \\ &= x_1(a_{11}x_1 + a_{12}x_2) + x_2(a_{21}x_1 + a_{22}x_2) \\ &= a_{11}x_1^2 + a_{12}x_1x_2 + a_{21}x_1x_2 + a_{22}x_2^2 \\ &= a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 && a_{12} = a_{21} \, \text{for a symmetric matrix} \\ \frac{\partial f}{\partial\mathbf{x}} &= \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \end{bmatrix} \\ &= \begin{bmatrix} 2a_{11}x_1 + 2a_{12}x_2 & 2a_{12}x_1 + 2a_{22}x_2 \end{bmatrix} \\ &= 2 \begin{bmatrix} a_{11}x_1 + a_{12}x_2 & a_{21}x_1 + a_{22}x_2 \end{bmatrix} \\ &= 2(\mathbf{Ax})^\top \\ &= 2\mathbf{x^\top A^\top} \end{aligned} \)
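The derivation relies on \(a_{12} = a_{21}\). Without symmetry, the same steps give \(\mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top)\), which reduces to \(2\mathbf{x^\top A^\top}\) only when \(\mathbf{A}\) is symmetric. The sketch below checks both cases numerically; the matrices `S` and `N`, the point `x`, and the helper names are invented for illustration.

```python
def quadratic_form(A, x):
    """Compute x^T A x for a square matrix A (list of rows)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Return [df/dx_1, ..., df/dx_n] at x, using central differences."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def general_formula(A, x):
    """Entries of x^T (A + A^T): entry j is sum_i x_i (a_ij + a_ji)."""
    n = len(x)
    return [sum(x[i] * (A[i][j] + A[j][i]) for i in range(n)) for j in range(n)]


def symmetric_formula(A, x):
    """Entries of 2 (A x)^T, which is correct only when A is symmetric."""
    return [2 * sum(a * x_j for a, x_j in zip(row, x)) for row in A]


def close(u, v, tol=1e-5):
    return all(abs(a - b) < tol for a, b in zip(u, v))


S = [[2.0, 1.0], [1.0, 3.0]]  # symmetric (hypothetical values)
N = [[1.0, 2.0], [3.0, 4.0]]  # not symmetric
x = [1.0, -2.0]

grad_S = numerical_gradient(lambda v: quadratic_form(S, v), x)
grad_N = numerical_gradient(lambda v: quadratic_form(N, v), x)

print(close(grad_S, symmetric_formula(S, x)))  # symmetric case: formulas agree
print(close(grad_N, general_formula(N, x)))    # general formula still holds
print(close(grad_N, symmetric_formula(N, x)))  # 2(Ax)^T fails without symmetry
```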

matrix-by-scalar#

derivative of a matrix (function) wrt a scalar

this is called the tangent matrix

\( \begin{aligned} \frac{\partial\mathbf{Y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \dots & \frac{\partial y_{1n}}{\partial x} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \dots & \frac{\partial y_{mn}}{\partial x} \\ \end{bmatrix} \end{aligned} \)
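
[example]

A small worked case, with the entries of \(\mathbf{Y}\) chosen here purely for illustration; each entry is differentiated independently:

\( \begin{aligned} \mathbf{Y}(x) &= \begin{bmatrix} x & x^2 \\ 1 & \sin x \end{bmatrix} \\ \frac{\partial\mathbf{Y}}{\partial x} &= \begin{bmatrix} 1 & 2x \\ 0 & \cos x \end{bmatrix} \end{aligned} \)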

scalar-by-matrix#

derivative of a scalar (function) wrt a matrix

this is called the gradient matrix

important examples of scalar functions of matrices include the trace of a matrix and the determinant

\( \begin{aligned} \frac{\partial y}{\partial\mathbf{X}} &= \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \dots & \frac{\partial y}{\partial x_{p1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \dots & \frac{\partial y}{\partial x_{pq}} \\ \end{bmatrix} \end{aligned} \)
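
[example]

A sketch of the two functions mentioned above for a \(2 \times 2\) matrix, using the layout just shown, where the entry in row \(i\), column \(j\) is \(\frac{\partial y}{\partial x_{ji}}\). The last equality for the determinant assumes \(\mathbf{X}\) is invertible.

\( \begin{aligned} \mathbf{X} &= \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \\ \frac{\partial}{\partial\mathbf{X}} \operatorname{tr}(\mathbf{X}) &= \frac{\partial}{\partial\mathbf{X}} (x_{11} + x_{22}) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I} \\ \frac{\partial}{\partial\mathbf{X}} \det(\mathbf{X}) &= \frac{\partial}{\partial\mathbf{X}} (x_{11}x_{22} - x_{12}x_{21}) = \begin{bmatrix} x_{22} & -x_{12} \\ -x_{21} & x_{11} \end{bmatrix} = \det(\mathbf{X})\,\mathbf{X}^{-1} \end{aligned} \)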


Resources#

  • [Y] ritvikmath. (09 Sep 2019). “Derivative of a Matrix : Data Science Basics”. YouTube.


Terms#

  • [W] Jacobian

  • [W] Matrix Calculus

  • [W] Tangent Vector