Parsiad Azimzadeh

A pedagogical introduction to the determinant

Motivation

The determinant of a matrix is typically introduced in an undergraduate linear algebra course via either the Leibniz Formula or a recurrence relation arising from the Leibniz Formula. Pedagogically, it is better to introduce the determinant as a mapping which satisfies some desirable properties and only then show that it is equivalent to the Leibniz Formula. This short expository post attempts to do just that.

Determinant

A determinant function is a mapping from the square complex matrices to complex numbers satisfying the following properties:

  1. If the matrix is obtained by swapping two rows of , then .
  2. If the matrix is obtained by multiplying a single row of by a constant , then .
  3. If the matrix is obtained by adding a multiple of a row of to another (not the same) row, then .
  4. .

The first three points above correspond to the three elementary row operations.

Proposition. Let be a determinant function and be a square complex matrix whose rows are linearly dependent. Then, .

Proof. In this case, we can perform a sequence of elementary row operations (excluding multiplying a row by ) that result in a row consisting of only zeros. The result then follows by property (2).

Indeed, by performing elimination to reduce the matrix into either the identity or a matrix with at least one row of zeros, we can unambiguously define a determinant function (note that we have not yet proven that such a function is unique). The code below does just that, proving the existence of a determinant function. For now, we refer to this as the canonical determinant.

Remark. The code below operates on floating point numbers. The definition of the canonical determinant should be understood to be the “algebraic” version of this code that runs without deference to floating point error.

View code
def det(mat: np.ndarray) -> float:
    """Computes a determinant.

    This algorithm works by eliminating the strict lower triangular part of the
    matrix and then eliminating the strict upper triangular part of the matrix.
    This elimination is done using row operations, while keeping track of any
    swaps that may change the sign parity of the determinant.

    If you are already familiar with the determinant, you will note that
    eliminating the strict upper triangular part is not necessary. Even if this
    algorithm was optimized to remove that step, this is still not a performant
    way to compute determinants!

    Parameters
    ----------
    mat
        A matrix

    Returns
    -------
    Determinant
    """
    m, n = mat.shape
    assert m == n

    mat = mat.copy()

    sign = 1
    for _ in range(2):
        for j in range(n):
            # Find pivot element
            p = -1
            for i in range(j, n):
                if not np.isclose(mat[i, j], 0.0):
                    p = i
                    break 
            if p < 0:
                continue
            
            # Swap
            if j != p:
                r = mat[p].copy()
                mat[p] = mat[j]
                mat[j] = r
                sign *= -1

            # Eliminate
            for i in range(j + 1, n):
                if not np.isclose(mat[i, j], 0.0):
                    mat[i] -= mat[p] * mat[i, j] / mat[p, j]

        mat = mat.T

    return float(sign) * np.diag(mat).prod().item()

Alternating multilinear maps

Notation. For a set , we write to denote an element of A𝑛.

Definition (Alternating multilinear map). Let A and B be vector spaces. An alternating multilinear map is a multilinear map 𝑓 :A𝑛 B that satisfies 𝑓(𝐴) =0 whenever 𝑎𝑖 =𝑎𝑖+1 for some 𝑖 <𝑛.

Notation. Let 𝜎 be a permutation of {1, …, n}. Since 𝐴 in A𝑛 can be thought of as a function from {1, …, n} to A, we write 𝐴 𝜎 (𝑎𝜎(1),,𝑎𝜎(𝑛)) to denote a permutation of the elements of 𝐴.

Proposition (Transposition parity). Let 𝑓 be an alternating multilinear map. Let 𝜎 be a transposition (a permutation which swaps two elements). Then, 𝑓(𝐴) =𝑓(𝐴 𝜎).

Proof. Let 𝑖 <𝑗 denote the swapped indices in the transposition. Fix 𝐴 and let

𝑔(𝑥,𝑦)𝑓(𝑎1,,𝑎𝑖1,𝑥,𝑎𝑖+1,,𝑎𝑗1,𝑦,𝑎𝑗+1,,𝑎𝑛).

It follows that

𝑔(𝑥,𝑦)+𝑔(𝑦,𝑥)=𝑔(𝑥,𝑦)+𝑔(𝑦,𝑦)+𝑔(𝑦,𝑥)+𝑔(𝑥,𝑥)=𝑔(𝑥+𝑦,𝑦)+𝑔(𝑥+𝑦,𝑥)=𝑔(𝑥+𝑦,𝑥+𝑦)=0

and hence 𝑔(𝑥,𝑦) =𝑔(𝑦,𝑥), as desired.

Corollary. Let 𝑓 be an alternating multilinear map. Then 𝑓(𝐴) =0 whenever 𝑎𝑖 =𝑎𝑗 for some (𝑖,𝑗) with 𝑖 <𝑗.

Proof. Let 𝜎 be the transposition which swaps indices 𝑖 +1 and 𝑗. Then, 𝑓(𝐴) =𝑓(𝐴 𝜎) =0.

Corollary. Let 𝑓 be an alternating multilinear map and 𝜎 be a permutation. Then, 𝑓(𝐴) =sgn(𝜎)𝑓(𝐴 𝜎) where sgn(𝜎) is the parity of the permutation.

Proof. The result follows from the fact that a permutation can be written as a composition of transpositions.

Proposition. A multilinear map 𝑓 :A𝑛 B is alternating multilinear if and only if 𝑓(𝐴) =0 whenever 𝑎1,,𝑎𝑛 are linearly dependent.

Proof. Suppose the map is alternating multilinear. Let 𝑎1,,𝑎𝑛 be linearly dependent so that, without loss of generality, 𝑎1 =𝑖>1𝛼𝑖𝑎𝑖. By linearity,

𝑓(𝐴)=𝑖>1𝛼𝑖𝑓(𝑎𝑖,𝑎2,,𝑎𝑛)=0.

The converse is trivial.

The Leibniz formula

Notation. If A =𝑛, then A𝑛 is isomorphic to the set of 𝑛 ×𝑛 complex matrices. In light of this, an element in A𝑛 can be considered as a matrix 𝐴 (𝑎𝑖𝑗) or as a tuple 𝐴 (𝑎1,,𝑎𝑛) consisting of the rows of said matrix.

Proposition (Uniqueness). Let 𝑓 :(𝑛)𝑛 be an alternating multilinear map such that 𝑓(𝐼) =1. Then,

𝑓(𝐴)=𝜎𝑆𝑛sgn(𝜎)𝑎1𝜎(1)𝑎𝑛𝜎(𝑛).(Leibniz Formula)

where 𝑆𝑛 is the set of all permutations on {1, …, n}.

Proof. First, note that

𝑓(𝐴)=𝑓(𝑗𝑎1𝑗𝑒𝑗,,𝑗𝑎𝑛𝑗𝑒𝑗)=1𝑗1,,𝑗𝑛𝑛𝑎1𝑗1𝑎𝑛𝑗𝑛𝑓(𝑒𝑗1,,𝑒𝑗𝑛).

Since 𝑓 is alternating multilinear and hence equal to zero whenever any of its two inputs are equal, we can restrict our attention to the permutations:

𝑓(𝐴)=𝜎𝑆𝑛𝑎1𝜎(1)𝑎𝑛𝜎(𝑛)𝑓(𝑒𝜎(1),,𝑒𝜎(𝑛)).

Since 𝑓 is alternating multilinear, we can change the order of its inputs so long as we count the number of transpositions and use that to account for a possible sign-change:

𝑓(𝐴)=𝜎𝑆𝑛sgn(𝜎)𝑎1𝜎(1)𝑎𝑛𝜎(𝑛)𝑓(𝐼).

Using the assumption 𝑓(𝐼) =1, the desired result follows.

Remark. sgn(𝜎) is sometimes represented as 𝜖𝑖1𝑖𝑛 where 𝑖𝑗 =𝜎(𝑗). This is called the Levi-Civita symbol. Using this symbol and Einstein notation, the Leibniz Formula becomes

𝜖𝑖1𝑖𝑛𝑎1𝑖1𝑎𝑛𝑖𝑛.

Proposition. A determinant function is multilinear.

Proof. Let 𝐴 be a square complex matrix and be a vector. It is sufficient to show that

det𝐴+det(,𝑎2,,𝑎𝑛)=det(𝑎1+,𝑎2,,𝑎𝑛).

Suppose the rows of 𝐴 are linearly dependent. Without loss of generality, write 𝑎1 =𝑖>1𝛼𝑖𝑎𝑖 and =𝑏 +𝑖>1𝛽𝑖𝑎𝑖 where 𝑏 is orthogonal to the 𝑎𝑖. Then, det𝐴 =0. Moreover,

det(,𝑎2,,𝑎𝑛)=det(𝑏+𝑖>1𝛽𝑖𝑎𝑖,𝑎2,,𝑎𝑛)=det(𝑏,𝑎2,,𝑎𝑛)

and

det(𝑎1+,𝑎2,,𝑎𝑛)=det(𝑏+𝑖>1(𝛼𝑖+𝛽𝑖)𝑎𝑖,𝑎2,,𝑎𝑛)=det(𝑏,𝑎2,,𝑎𝑛),

as desired.

Suppose the rows of 𝐴 are linearly independent. It follows that we can write =𝑖𝛽𝑖𝑎𝑖. Then,

det𝐴+det(,𝑎2,,𝑎𝑛)=det𝐴+det(𝑖𝛽𝑖𝑎𝑖,𝑎2,,𝑎𝑛)=det𝐴+det(𝛽1𝑎1,𝑎2,,𝑎𝑛)=det((1+𝛽1)𝑎1,𝑎2,,𝑎𝑛)=det(𝑎1+𝑖𝛽𝑖𝑎𝑖,𝑎2,,𝑎𝑛)=det(𝑎1+,𝑎2,,𝑎𝑛).

Corollary. A determinant function is an alternating multilinear map.

Corollary. There is only one determinant function and it is given by the Leibniz Formula.

Determinant properties

We can now use the Leibniz Formula to derive various properties of the determinant. The following results are concerned with complex matrices 𝐴 (𝑎𝑖𝑗) and 𝐵 (𝑏𝑖𝑗).

Proposition. det𝐴 =det𝐴.

Proof.

det𝐴=𝜎sgn(𝜎)𝑖𝑎𝑖𝜎(𝑖)=𝜎sgn(𝜎)𝑖𝑎𝜎1(𝑖)𝜎(𝜎1(𝑖))=𝜎sgn(𝜎1)𝑖𝑎𝜎1(𝑖)𝑖=det𝐴.

Notation. For a matrix 𝐴, let 𝐴(𝑖,𝑗) be the same matrix after the simultaneous removal of its 𝑖-th row and 𝑗-th column.

Lemma.

det𝐴=𝑗(1)𝑗1𝑎1𝑗det𝐴(1,𝑗)

Proof. We demonstrate the idea for a 3 ×3 matrix; the generalization is straight-forward.

Using multilinearity,

det𝑎11𝑎12𝑎13𝑎21𝑎22𝑎23𝑎31𝑎32𝑎33=𝑎11det100𝑎21𝑎22𝑎23𝑎31𝑎32𝑎33+𝑎12det010𝑎21𝑎22𝑎23𝑎31𝑎32𝑎33+𝑎13det001𝑎21𝑎22𝑎23𝑎31𝑎32𝑎33=𝑎11det100𝑎21𝑎22𝑎23𝑎31𝑎32𝑎33𝑎12det100𝑎22𝑎21𝑎23𝑎32𝑎31𝑎33+𝑎13det100𝑎23𝑎21𝑎22𝑎33𝑎31𝑎32

Moreover, by the Leibniz Formula,

det100𝑎21𝑎22𝑎23𝑎31𝑎32𝑎33=𝜎sgn(𝜎)𝑎1𝜎(1)𝑎2𝜎(2)𝑎3𝜎(3)=𝜎:𝜎(1)=1sgn(𝜎)𝑎2𝜎(2)𝑎3𝜎(3)=det(𝑎22𝑎23𝑎32𝑎33).

The remaining terms are handled similarly.

Proposition (Cofactor expansion). For any 𝑖 between 1 and 𝑛 (inclusive),

det𝐴=𝑗(1)𝑖+𝑗𝑎𝑖𝑗det𝐴(𝑖,𝑗)

Proof. Recalling that the determinant flips signs when any two rows are swapped, we can perform a sequence of 𝑖 1 transpositions to move 𝑎𝑖, the 𝑖-th row of the matrix, to the “top” and apply the previous lemma:

(1)𝑖1det𝐴=det𝑎𝑖𝑎1𝑎2𝑎𝑖1𝑎𝑖+1𝑎𝑛.

Corollary. If 𝐴 is either lower or upper triangular, det𝐴 =𝑖𝑎𝑖𝑖.

Proof. First, note that it is sufficient to consider the lower triangular case since the transpose of an upper triangular matrix is lower triangular. The result then follows from performing a cofactor expansion along the first row inductively.

Proposition.

det(𝐴𝐵)=det𝐴det𝐵

Proof. If either 𝐴 or 𝐵 are singular, the claim is trivial since both sides are zero. Therefore, proceed assuming 𝐴 and 𝐵 are nonsingular.

As with the construction of the canonical determinant, we can write

𝐼=𝐸𝑘𝐸1𝐴

where 𝐸1,,𝐸𝑘 are a sequence of elementary row operations. It is easy to see that elementary row operations are nonsingular and their inverses are themselves elementary row operations. Therefore, 𝐴 can be written as a product of elementary row operations. To arrive at the desired result, it is sufficient to show that for any sequence of row operations 𝐸1,,𝐸𝑘 there exists a constant 𝛼 such that for any matrix 𝑀

det(𝐸1𝐸𝑘𝑀)=𝛼det𝑀.

Corollary. The determinant of an 𝑛 ×𝑛 complex matrix is the product of its 𝑛 (possibly non-unique) eigenvalues.

Proof. Let 𝐴 be an 𝑛 ×𝑛 complex matrix and denote by 𝐴 =𝑃1𝐽𝑃 its Jordan normal form. Since the matrix 𝐽 has the eigenvalues 𝜆1,,𝜆𝑛 of 𝐴 on its diagonal and is upper triangular,

det𝐴=det𝑃1det𝐽det𝑃=det𝐽=𝑛𝑖=1𝜆𝑖.