What is STLSQ?
Algorithm
STLSQ
Given a full-rank matrix $X \in \mathbb{R}^{m \times p}$ and $Y \in \mathbb{R}^{m \times n}$ in the matrix equation $$ Y = X B , $$ consider the sparse regression problem of finding as sparse a $B \in \mathbb{R}^{p \times n}$ as possible. STLSQ, short for Sequentially Thresholded Least Squares, is the method used in the paper by Brunton that proposed the SINDy algorithm, and, as the name suggests, it consists of repeatedly applying the least squares method, setting every component whose absolute value falls below a given threshold $\lambda$ to $0$ and removing the corresponding column (independent variable).
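To see concretely why the thresholding is needed, here is a minimal sketch in Julia with made-up data (the sizes, noise level, and threshold are arbitrary assumptions, not part of STLSQ): even when the true $B$ is sparse, the plain least-squares solution $X^{\dagger} Y$ is dense in general, littered with small nonzero entries that the threshold $\lambda$ is meant to wipe out.

using LinearAlgebra, Random
Random.seed!(0)

m, p, n = 200, 10, 2
X = randn(m, p)                                  # full-rank data matrix
B = zeros(p, n); B[1, 1] = 2.0; B[3, 2] = -1.5   # sparse ground truth
Y = X * B + 0.01 * randn(m, n)                   # observations with a little noise

B̂ = X \ Y                  # least-squares solution: dense in general
count(abs.(B̂) .> 1e-12)    # essentially all p * n = 20 entries are nonzero
count(abs.(B̂) .> 0.1)      # thresholding at λ = 0.1 leaves only the true support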
Pseudocode
As is common, the parenthesized superscript $\cdot ^{(k)}$ denotes the $k$th iterate, and $[p] = \left\{ 1 , \cdots , p \right\} \subset \mathbb{N}$ denotes the set of natural numbers from $1$ to $p$. $X^{\dagger}$ is the pseudoinverse of the matrix $X$, which gives the least-squares solution. For an index set $S$, $\mathbf{x}_{S}$ is the vector keeping only the corresponding entries of the vector $\mathbf{x}$, and $X_{S}$ is the matrix keeping only the corresponding columns of the matrix $X$.
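For concreteness, this notation maps to Julia as follows (a small sketch; the matrices and the index set here are placeholders):

using LinearAlgebra

X = randn(6, 4)
Y = randn(6, 2)

X \ Y          # the least-squares solution X†Y
pinv(X) * Y    # the same thing via the explicit pseudoinverse, for full-rank X

S = [1, 3]     # an index set S ⊂ [p]
X[:, S]        # X_S: the matrix keeping only the columns indexed by S
Y[:, 2]        # a single column of Y, as used in the pseudocode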
Algorithm: STLSQ | ||
---|---|---|
In | A threshold $\lambda \ge 0$ is given. | |
1. | Initialize the number of runs $k := 0$, $B^{(0)} := X^{\dagger} Y$, and $S^{(-1)} = \emptyset$. | |
2. | $S^{(0)} := \left\{ S_{j}^{(0)} = \left\{ i \in [p] : \left| \left( B^{(0)} \right)_{ij} \right| > \lambda \right\} : j \in [n] \right\}$ | # $i \in S_{j}$ means the same as $(B)_{ij} \ne 0$. |
3. | while $S^{(k)} \ne S^{(k-1)}$ do | |
4. | for $j \in [n]$ do | |
5. | $\widehat{X} = X_{S_{j}^{(k)}}$ | # Remove the columns where $B_{ij} = 0$ |
6. | $\left( B^{(k+1)} \right)_{S_{j}^{(k)}, \left\{ j \right\}} = \widehat{X}^{\dagger} Y_{\left\{ j \right\}}$ | # Update one column at a time |
7. | $S_{j}^{(k+1)} = \left\{ i \in [p] : \left| \left( B^{(k+1)} \right)_{ij} \right| > \lambda \right\}$ | |
8. | end for | |
9. | $k = k+1$ | |
10. | end while | |
11. | For all $(i,j)$ with $i \notin S_{j}^{(k)}$, set $B_{ij}^{(k)} = 0$ | |
Out | Obtain the matrix $B^{(k)}$, the least-squares solution $\argmin_{B} \left\| XB - Y \right\|$ restricted to the surviving support, whose nonzero components' absolute values are all greater than $\lambda$. | |
Explanation
When STLSQ is performed specifically on the matrix $Y = \dot{X}$ of derivatives of a dynamical system $\dot{\mathbf{x}} = f \left( \mathbf{x} \right)$, with a library matrix $\Theta (X)$ taking the place of $X$, we call it SINDy1. Ironically, although STLSQ was created for SINDy, from the algorithm's perspective it operates entirely on its own, unrelated to dynamical systems, so the name 'SINDy' is sometimes used to refer to STLSQ itself.
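For example, with a $2$-dimensional state, the library might collect all polynomial terms up to second order. The following is a minimal sketch of such a $\Theta$ in Julia (the choice of a polynomial library is only an assumption for illustration; SINDy accepts any set of candidate functions), after which STLSQ is run with $\Theta(X)$ in place of $X$ and $\dot{X}$ in place of $Y$.

# A hypothetical second-order polynomial library Θ(X) for a 2-dimensional state.
# Each row of X is a snapshot (x, y); each column of Θ(X) is a candidate term.
function Θ(X::AbstractMatrix)
    x, y = X[:, 1], X[:, 2]
    return hcat(ones(size(X, 1)), x, y, x .^ 2, x .* y, y .^ 2)
end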
As you can read from the pseudocode, STLSQ is a kind of greedy algorithm. Although the convergence of the algorithm has been proven2, from the statistical standpoint of regression analysis it is, frankly, meaningless; still, it is undeniably a useful method, since it is easy to implement, converges very quickly, and does perform sparse regression, after all.
Code
Matlab
The following is the actual Matlab code used by Professor Brunton in his research3.
%% compute Sparse regression: sequential least squares
Xi = Theta\dXdt;  % initial guess: Least-squares
% lambda is our sparsification knob.
for k=1:10
    smallinds = (abs(Xi)<lambda);  % find small coefficients
    Xi(smallinds)=0;               % and threshold
    for ind = 1:n                  % n is state dimension
        biginds = ~smallinds(:,ind);
        % Regress dynamics onto remaining terms to find sparse Xi
        Xi(biginds,ind) = Theta(:,biginds)\dXdt(:,ind);
    end
end
Python
This is the Python code found in Professor Brunton’s book4.
import numpy as np

def sparsifyDynamics(Theta,dXdt,lamb,n):
    # Initial guess: Least-squares
    Xi = np.linalg.lstsq(Theta,dXdt,rcond=None)[0]
    for k in range(10):
        smallinds = np.abs(Xi) < lamb   # Find small coeffs.
        Xi[smallinds] = 0               # and threshold
        for ind in range(n):            # n is state dimension
            biginds = smallinds[:,ind] == 0
            # Regress onto remaining terms to find sparse Xi
            Xi[biginds,ind] = np.linalg.lstsq(Theta[:,biginds],
                                              dXdt[:,ind],rcond=None)[0]
    return Xi
Julia
This is my Julia code.
using SparseArrays

function STLSQ(Θ, dXdt, λ = 1e-6)
    if Θ isa AbstractSparseMatrix
        Θ = Matrix(Θ)
    end
    Ξ = Θ \ dXdt                 # initial guess: least squares
    dim = size(Ξ, 2)
    __🚫 = 0
    while true
        print(".")
        🚫 = abs.(Ξ) .< λ        # find small coefficients
        Ξ[🚫] .= 0               # and threshold
        for j in 1:dim
            i_ = .!🚫[:, j]
            Ξ[i_, j] = Θ[:, i_] \ dXdt[:, j]   # regress onto remaining terms
        end
        if __🚫 == 🚫             # early stopping: support unchanged
            println("Stopped!")
            break
        end
        __🚫 = deepcopy(🚫)
    end
    println("MSE: ", sum(abs2, Θ * Ξ - dXdt) / length(dXdt))
    return Ξ
end
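As a quick sanity check, the function above can be run on synthetic data; the sizes, coefficients, noise level, and threshold below are made up for illustration.

Θ = randn(500, 8)                    # a stand-in library matrix
Ξ_true = zeros(8, 2); Ξ_true[2, 1] = 1.3; Ξ_true[5, 2] = -0.7
dXdt = Θ * Ξ_true + 0.001 * randn(500, 2)

Ξ = STLSQ(Θ, dXdt, 0.1)              # the two nonzero coefficients survive the threshold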
Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932-3937. https://doi.org/10.1073/pnas.1517384113 ↩︎
Zhang, L., & Schaeffer, H. (2019). On the convergence of the SINDy algorithm. Multiscale Modeling & Simulation, 17(3), 948-972. https://doi.org/10.1137/18M1189828 ↩︎
https://www.pnas.org/doi/suppl/10.1073/pnas.1517384113/suppl_file/pnas.1517384113.sapp.pdf ↩︎
Brunton, S. L., & Kutz, J. N. (2022). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (2nd Edition): p. 276. ↩︎