What is STLSQ?
Algorithm
STLSQ
Given a full-rank matrix $X \in \mathbb{R}^{m \times p}$ and $Y \in \mathbb{R}^{m \times n}$ in the matrix equation $$ Y = X B , $$ consider the sparse regression problem of finding as sparse a $B \in \mathbb{R}^{p \times n}$ as possible. STLSQ, short for Sequentially Thresholded Least Squares, is the method used in the paper by Brunton that proposed the SINDy algorithm, and, as the name suggests, it consists of repeatedly applying the least squares method, setting every component whose absolute value falls below a given threshold $\lambda$ to $0$ and removing the corresponding column (independent variable).
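To see concretely why the thresholding is needed, here is a minimal sketch in Julia with made-up data (the sizes, noise level, and threshold are arbitrary assumptions, not part of STLSQ): even when the true $B$ is sparse, the plain least-squares solution $X^{\dagger} Y$ is dense in general, littered with small nonzero entries that the threshold $\lambda$ is meant to wipe out.

using LinearAlgebra, Random
Random.seed!(0)

m, p, n = 200, 10, 2
X = randn(m, p)                                  # full-rank data matrix
B = zeros(p, n); B[1, 1] = 2.0; B[3, 2] = -1.5   # sparse ground truth
Y = X * B + 0.01 * randn(m, n)                   # observations with a little noise

B̂ = X \ Y                  # least-squares solution: dense in general
count(abs.(B̂) .> 1e-12)    # essentially all p * n = 20 entries are nonzero
count(abs.(B̂) .> 0.1)      # thresholding at λ = 0.1 leaves only the true support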
Pseudocode
As is common, the parenthesized superscript $\cdot ^{(k)}$ denotes the $k$th iterate, and $[p] = \left\{ 1 , \cdots , p \right\} \subset \mathbb{N}$ denotes the set of natural numbers from $1$ to $p$. $X^{\dagger}$ is the pseudoinverse of the matrix $X$, which gives the least-squares solution. For an index set $S$, $\mathbf{x}_{S}$ is the vector keeping only the corresponding entries of the vector $\mathbf{x}$, and $X_{S}$ is the matrix keeping only the corresponding columns of the matrix $X$.
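For concreteness, this notation maps to Julia as follows (a small sketch; the matrices and the index set here are placeholders):

using LinearAlgebra

X = randn(6, 4)
Y = randn(6, 2)

X \ Y          # the least-squares solution X†Y
pinv(X) * Y    # the same thing via the explicit pseudoinverse, for full-rank X

S = [1, 3]     # an index set S ⊂ [p]
X[:, S]        # X_S: the matrix keeping only the columns indexed by S
Y[:, 2]        # a single column of Y, as used in the pseudocode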
Algorithm: STLSQ | ||
---|---|---|
In | A threshold $\lambda \ge 0$ is given. | |
1. | Initialize the number of runs $k := 0$, $B^{(0)} := X^{\dagger} Y$, and $S^{(-1)} = \emptyset$. | |
2. | $S^{(0)} := \left\{ S_{j}^{(0)} = \left\{ i \in [p] : \left| \left( B^{(0)} \right)_{ij} \right| > \lambda \right\} : j \in [n] \right\}$ | # $i \in S_{j}$ means the same as $(B)_{ij} \ne 0$. |
3. | while $S^{(k)} \ne S^{(k-1)}$ do | |
4. | for $j \in [n]$ do | |
5. | $\widehat{X} = X_{S_{j}^{(k)}}$ | # Remove the columns where $B_{ij} = 0$ |
6. | $\left( B^{(k+1)} \right)_{S_{j}^{(k)}, \left\{ j \right\}} = \widehat{X}^{\dagger} Y_{\left\{ j \right\}}$ | # Update one column at a time |
7. | $S_{j}^{(k+1)} = \left\{ i \in [p] : \left| \left( B^{(k+1)} \right)_{ij} \right| > \lambda \right\}$ | |
8. | end for | |
9. | $k = k+1$ | |
10. | end while | |
11. | For all $(i,j)$ with $i \notin S_{j}^{(k)}$, set $B_{ij}^{(k)} = 0$ | |
Out | Obtain the matrix $B^{(k)}$, the least-squares solution $\argmin_{B} \left\| XB - Y \right\|$ restricted to the surviving support, whose nonzero components' absolute values are all greater than $\lambda$. | |
Explanation
When STLSQ is performed specifically on the matrix $Y = \dot{X}$ of derivatives of a dynamical system $\dot{\mathbf{x}} = f \left( \mathbf{x} \right)$, with a library matrix $\Theta (X)$ taking the place of $X$, we call it SINDy1. Ironically, although STLSQ was created for SINDy, from the algorithm's perspective it operates entirely on its own, unrelated to dynamical systems, so the name 'SINDy' is sometimes used to refer to STLSQ itself.
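For example, with a $2$-dimensional state, the library might collect all polynomial terms up to second order. The following is a minimal sketch of such a $\Theta$ in Julia (the choice of a polynomial library is only an assumption for illustration; SINDy accepts any set of candidate functions), after which STLSQ is run with $\Theta(X)$ in place of $X$ and $\dot{X}$ in place of $Y$.

# A hypothetical second-order polynomial library Θ(X) for a 2-dimensional state.
# Each row of X is a snapshot (x, y); each column of Θ(X) is a candidate term.
function Θ(X::AbstractMatrix)
    x, y = X[:, 1], X[:, 2]
    return hcat(ones(size(X, 1)), x, y, x .^ 2, x .* y, y .^ 2)
end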
As you can read from the pseudocode, STLSQ is a kind of greedy algorithm. Although the convergence of the algorithm has been proven2, from the statistical standpoint of regression analysis it is, frankly, meaningless; still, it is undeniably a useful method, since it is easy to implement, converges very quickly, and does perform sparse regression, after all.
Code
Matlab
The following is the actual Matlab code used by Professor Brunton in his research3.
%% compute Sparse regression: sequential least squares
Xi = Theta\dXdt;  % initial guess: Least-squares
% lambda is our sparsification knob.
for k=1:10
    smallinds = (abs(Xi)<lambda);  % find small coefficients
    Xi(smallinds)=0;               % and threshold
    for ind = 1:n                  % n is state dimension
        biginds = ~smallinds(:,ind);
        % Regress dynamics onto remaining terms to find sparse Xi
        Xi(biginds,ind) = Theta(:,biginds)\dXdt(:,ind);
    end
end
Python
This is the Python code found in Professor Brunton’s book4.
import numpy as np

def sparsifyDynamics(Theta,dXdt,lamb,n):
    # Initial guess: Least-squares
    Xi = np.linalg.lstsq(Theta,dXdt,rcond=None)[0]
    for k in range(10):
        smallinds = np.abs(Xi) < lamb   # Find small coeffs.
        Xi[smallinds] = 0               # and threshold
        for ind in range(n):            # n is state dimension
            biginds = smallinds[:,ind] == 0
            # Regress onto remaining terms to find sparse Xi
            Xi[biginds,ind] = np.linalg.lstsq(Theta[:,biginds],
                                              dXdt[:,ind],rcond=None)[0]
    return Xi
Julia
This is my Julia code.
using SparseArrays

function STLSQ(Θ, dXdt, λ = 1e-6)
    if Θ isa AbstractSparseMatrix
        Θ = Matrix(Θ)
    end
    Ξ = Θ \ dXdt                 # initial guess: least squares
    dim = size(Ξ, 2)
    __🚫 = 0
    while true
        print(".")
        🚫 = abs.(Ξ) .< λ        # find small coefficients
        Ξ[🚫] .= 0               # and threshold
        for j in 1:dim
            i_ = .!🚫[:, j]
            Ξ[i_, j] = Θ[:, i_] \ dXdt[:, j]   # regress onto remaining terms
        end
        if __🚫 == 🚫             # early stopping: support unchanged
            println("Stopped!")
            break
        end
        __🚫 = deepcopy(🚫)
    end
    println("MSE: ", sum(abs2, Θ * Ξ - dXdt) / length(dXdt))
    return Ξ
end
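As a quick sanity check, the function above can be run on synthetic data; the sizes, coefficients, noise level, and threshold below are made up for illustration.

Θ = randn(500, 8)                    # a stand-in library matrix
Ξ_true = zeros(8, 2); Ξ_true[2, 1] = 1.3; Ξ_true[5, 2] = -0.7
dXdt = Θ * Ξ_true + 0.001 * randn(500, 2)

Ξ = STLSQ(Θ, dXdt, 0.1)              # the two nonzero coefficients survive the threshold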
Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932-3937. https://doi.org/10.1073/pnas.1517384113 ↩︎
Zhang, L., & Schaeffer, H. (2019). On the convergence of the SINDy algorithm. Multiscale Modeling & Simulation, 17(3), 948-972. https://doi.org/10.1137/18M1189828 ↩︎
https://www.pnas.org/doi/suppl/10.1073/pnas.1517384113/suppl_file/pnas.1517384113.sapp.pdf ↩︎
Brunton, S. L., & Kutz, J. N. (2022). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (2nd Edition): p. 276. ↩︎