Direct Sum of Matrices
Definition

The direct sum of two matrices $B \in M_{m\times n}$, $C \in M_{p\times q}$ is the $(m+p) \times (n+q)$ matrix $A$ defined below, and is denoted by $B \oplus C$.
$$
A = B \oplus C := \begin{bmatrix}
b_{11} & \cdots & b_{1n} & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
b_{m1} & \cdots & b_{mn} & 0 & \cdots & 0 \\
0 & \cdots & 0 & c_{11} & \cdots & c_{1q} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & c_{p1} & \cdots & c_{pq}
\end{bmatrix}
$$
$$
A_{ij} := \begin{cases}
[B]_{ij} & \text{for } 1\le i \le m,\ 1\le j \le n \\
[C]_{(i-m),(j-n)} & \text{for } m+1\le i \le p+m,\ n+1\le j \le q+n \\
0 & \text{otherwise}
\end{cases}
$$
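As a sketch, the entrywise definition translates directly into NumPy; the function name `direct_sum_entrywise` is ours, and note the shift from the definition's one-based indices to zero-based array indices.

```python
import numpy as np

def direct_sum_entrywise(B, C):
    """Compute A = B ⊕ C from the entrywise definition (zero-based indices)."""
    m, n = B.shape
    p, q = C.shape
    A = np.zeros((m + p, n + q), dtype=B.dtype)
    for i in range(m + p):
        for j in range(n + q):
            if i < m and j < n:
                A[i, j] = B[i, j]            # top-left block: [B]_ij
            elif i >= m and j >= n:
                A[i, j] = C[i - m, j - n]    # bottom-right block: [C]_(i-m),(j-n)
            # otherwise A[i, j] stays 0
    return A

# Sanity check: I_2 ⊕ I_3 = I_5
print(direct_sum_entrywise(np.eye(2), np.eye(3)))
```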
If expressed in block matrix form,
$$
A = \begin{bmatrix}
B & O_{mq} \\ O_{pn} & C
\end{bmatrix}
$$
Here, $O$ denotes the zero matrix.
Generalization

The direct sum of matrices $B_{1}, B_{2}, \dots, B_{k}$ is defined recursively as follows.
$$
B_{1} \oplus B_{2} \oplus \cdots \oplus B_{k} := (B_{1} \oplus B_{2} \oplus \cdots \oplus B_{k-1}) \oplus B_{k}
$$
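The recursion is exactly a left fold over the list of matrices. A minimal NumPy sketch (the helper names `direct_sum2` and `direct_sum` are ours):

```python
from functools import reduce
import numpy as np

def direct_sum2(B, C):
    """Two-matrix direct sum in block form: [[B, O], [O, C]]."""
    m, n = B.shape
    p, q = C.shape
    return np.block([[B, np.zeros((m, q))],
                     [np.zeros((p, n)), C]])

def direct_sum(*Bs):
    """k-fold direct sum via the recursion (B1 ⊕ ... ⊕ B(k-1)) ⊕ Bk."""
    return reduce(direct_sum2, Bs)
```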
If $A = B_{1} \oplus B_{2} \oplus \cdots \oplus B_{k}$,
$$
A = \begin{bmatrix}
B_{1} & O & \cdots & O \\
O & B_{2} & \cdots & O \\
\vdots & \vdots & \ddots & \vdots \\
O & O & \cdots & B_{k}
\end{bmatrix}
$$
Explanation

Simply put, the direct sum makes a block diagonal matrix out of the given matrices.
$$
B_{1} \oplus B_{2} \oplus \cdots \oplus B_{k} = \href{../2048}{\diag} \begin{bmatrix}
B_{1} \\ B_{2} \\ \vdots \\ B_{k}
\end{bmatrix}
$$
For a concrete example, if $B_{1} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, $B_{2} = \begin{bmatrix} 2 \end{bmatrix}$, and $B_{3} = \begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix}$, then
$$
B_{1} \oplus B_{2} \oplus B_{3} =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 3 & 3 \\
0 & 0 & 0 & 0 & 3 & 3 & 3 \\
0 & 0 & 0 & 0 & 3 & 3 & 3
\end{bmatrix}
$$
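This example can be checked by assembling the block matrix explicitly; in practice SciPy ships this operation as `scipy.linalg.block_diag`. A NumPy-only sketch (the zero-block helper `Z` is ours):

```python
import numpy as np

B1 = np.ones((2, 3), dtype=int)          # 2x3 matrix of ones
B2 = np.array([[2]])                     # 1x1 matrix [2]
B3 = 3 * np.ones((3, 3), dtype=int)      # 3x3 matrix of threes

Z = lambda m, n: np.zeros((m, n), dtype=int)   # zero block of shape (m, n)
A = np.block([
    [B1,      Z(2, 1), Z(2, 3)],
    [Z(1, 3), B2,      Z(1, 3)],
    [Z(3, 3), Z(3, 1), B3],
])
print(A.shape)  # (6, 7)
```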
In many cases, one encounters the direct sum of subspaces before the direct sum of matrices, but the theorem below is sufficient to understand why this definition is also called a direct sum. When a linear transformation $T : V \to V$ is given and $V = W_{1} \oplus \cdots \oplus W_{k}$, the matrix representation of $T$ appears as the direct sum of the matrix representations of the restrictions $T|_{W_{i}}$, so there is no reason not to call this operation a direct sum.
Theorem

Let $T : V \to V$ be a linear transformation on a finite-dimensional vector space $V$. Let $W_{1}, \dots, W_{k}$ be $T$-invariant subspaces, and let $V$ be the direct sum of the $W_{i}$.
$$
V = W_{1} \oplus \cdots \oplus W_{k}
$$
Let $\beta_{i}$ be an ordered basis of $W_{i}$, and let $\beta = \beta_{1} \cup \cdots \cup \beta_{k}$ (then $\beta$ is a basis for $V$). If $A = \begin{bmatrix} T \end{bmatrix}_{\beta}$ and $B_{i} = \begin{bmatrix} T|_{W_{i}}\end{bmatrix}_{\beta_{i}}$, then the following holds.
$$
A = B_{1} \oplus B_{2} \oplus \cdots \oplus B_{k} =
\begin{bmatrix}
B_{1} & O & \cdots & O \\
O & B_{2} & \cdots & O \\
\vdots & \vdots & \ddots & \vdots \\
O & O & \cdots & B_{k}
\end{bmatrix}
$$
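As a small numerical illustration of the theorem (our own example, not from the original): for a rotation about the $z$-axis on $\mathbb{R}^{3}$, the $xy$-plane $W_{1} = \operatorname{span}\{e_{1}, e_{2}\}$ and the $z$-axis $W_{2} = \operatorname{span}\{e_{3}\}$ are $T$-invariant, the standard basis splits into bases of each, and $[T]_{\beta}$ is the direct sum of the corresponding blocks.

```python
import numpy as np

theta = 0.3
c, s = np.cos(theta), np.sin(theta)
# Rotation about the z-axis. W1 = span{e1, e2} and W2 = span{e3} are
# T-invariant, and beta = {e1, e2, e3} restricts to bases beta1, beta2.
T = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])

B1 = T[:2, :2]   # [T|_W1]_beta1: the 2x2 rotation block
B2 = T[2:, 2:]   # [T|_W2]_beta2: the 1x1 block [1]

# The off-diagonal blocks vanish, so [T]_beta = B1 ⊕ B2.
print(np.allclose(T[:2, 2:], 0) and np.allclose(T[2:, :2], 0))  # True
```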
Proof

The proof is by mathematical induction on $k$.
It holds when $k = 2$.
Let $\mathbf{v} \in \beta_{1}$. Since $\beta$ is a basis for $V$, $T \mathbf{v} \in V$ is expressed as a linear combination of $\beta$. But since $W_{1}$ is an invariant subspace, $T \mathbf{v} \in W_{1}$. Therefore, in the linear combination for $T \mathbf{v}$, the coefficients of the elements of $\beta_{2}$ are all $0$. This means that, with $n = \dim(W_{1})$, the components of the coordinate vector $\begin{bmatrix} T \mathbf{v} \end{bmatrix}_{\beta}$ are all $0$ from the $(n+1)$-th position onward. Therefore,
$$
\begin{bmatrix} T|_{W_{1}}\mathbf{v}\end{bmatrix}_{\beta_{1}} = \begin{bmatrix} b_{1} \\ \vdots \\ b_{n} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} T \mathbf{v} \end{bmatrix}_{\beta} = \begin{bmatrix} b_{1} \\ \vdots \\ b_{n} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$
Similarly, if $\mathbf{v} \in \beta_{2}$ and $m = \dim(W_{2})$, then $T \mathbf{v} \in W_{2}$, and the coordinate vectors are as follows.
$$
\begin{bmatrix} T|_{W_{2}}\mathbf{v}\end{bmatrix}_{\beta_{2}} = \begin{bmatrix} b_{n+1} \\ \vdots \\ b_{n+m} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} T \mathbf{v} \end{bmatrix}_{\beta} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ b_{n+1} \\ \vdots \\ b_{n+m} \end{bmatrix}
$$
Therefore,
$$
\begin{bmatrix} T \end{bmatrix}_{\beta} = \begin{bmatrix}
\begin{bmatrix} T|_{W_{1}}\end{bmatrix}_{\beta_{1}} & O \\ O & \begin{bmatrix} T|_{W_{2}}\end{bmatrix}_{\beta_{2}}
\end{bmatrix}
$$
If it holds for $k-1$, it also holds for $k$.
Let $W = W_{1} \oplus \cdots \oplus W_{k-1}$ and $\beta_{W} = \beta_{1} \cup \cdots \cup \beta_{k-1}$. Assuming the claim holds for $k-1$,
$$
\begin{bmatrix} T|_{W} \end{bmatrix}_{\beta_{W}} =
\begin{bmatrix}
\begin{bmatrix} T|_{W_{1}}\end{bmatrix}_{\beta_{1}} & \cdots & O \\ \vdots & \ddots & \vdots \\ O & \cdots & \begin{bmatrix} T|_{W_{k-1}}\end{bmatrix}_{\beta_{k-1}}
\end{bmatrix}
$$
But since $V = W \oplus W_{k}$ and $\beta = \beta_{W} \cup \beta_{k}$, the case $k = 2$ applies, and
$$
\begin{bmatrix} T \end{bmatrix}_{\beta} = \begin{bmatrix}
\begin{bmatrix} T|_{W}\end{bmatrix}_{\beta_{W}} & O \\ O & \begin{bmatrix} T|_{W_{k}}\end{bmatrix}_{\beta_{k}}
\end{bmatrix} =
\begin{bmatrix}
\begin{bmatrix} T|_{W_{1}}\end{bmatrix}_{\beta_{1}} & \cdots & O & O \\
\vdots & \ddots & \vdots & \vdots \\
O & \cdots & \begin{bmatrix} T|_{W_{k-1}}\end{bmatrix}_{\beta_{k-1}} & O \\
O & \cdots & O & \begin{bmatrix} T|_{W_{k}}\end{bmatrix}_{\beta_{k}}
\end{bmatrix}
$$
■