How sparsity affects convergence

The convergence rate in sparse problems is usually expressed in terms of the number of iterations T and the dimensionality d.

For smooth convex functions, standard gradient descent achieves a rate of O(1/T). However, if the optimal solution is s-sparse (has only s non-zero elements), specialized algorithms (e.g., proximal methods with L1 regularization) can achieve rates whose dependence on the dimension d is only logarithmic, on the order of O(s·log(d)/T). This is significantly faster when s ≪ d.
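As a concrete illustration of such a proximal method, here is a minimal sketch of ISTA (proximal gradient descent for the L1-regularized least-squares objective). All names and parameter values here are illustrative, not taken from the text: a random sensing matrix `A`, an s-sparse target `x_true`, and a small regularization weight `lam`.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of lam*||x||_1: shrinks each coordinate toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(A, b, lam=0.01, iters=500):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz const of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Sparse recovery demo: d = 50 dimensions, s = 3 non-zeros, 25 measurements.
rng = np.random.default_rng(0)
d, n, s = 50, 25, 3
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[:s] = [1.0, -2.0, 1.5]
b = A @ x_true
x_hat = ista(A, b)
```

The soft-thresholding step is what exploits sparsity: coordinates whose gradient signal is below the threshold are set exactly to zero, so the iterate only ever "pays" for the s active coordinates rather than all d of them.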

Recall that in Active inference convergence estimation we have already shown that E is convex and smooth. This means that not only is the dimension dependence of our convergence rate logarithmic, but it can be improved further with L1 regularization, since the functions arising in RL are typically sparse.

In overparameterized deep linear networks, gradient descent has an implicit bias toward low-rank solutions. Strictly speaking, this phenomenon is not fully understood, but it suggests that even without explicit L1 regularization we may still obtain accelerated convergence under overparameterization.

Results:

  1. We know how to increase the speed of the agent's convergence.
  2. It is possible that overparameterization leads to the same speed-up.