How sparsity affects convergence
The convergence rate in sparse problems is usually expressed in terms of the number of iterations needed to reach a given accuracy.
For smooth convex functions with L-Lipschitz gradient, standard gradient descent with step size 1/L has a rate of O(1/k): after k iterations the suboptimality f(x_k) - f* is bounded by L * ||x_0 - x*||^2 / (2k).
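As a quick numerical illustration (my own sketch, not from the note), the O(1/k) bound for gradient descent with step size 1/L can be checked on a smooth convex quadratic, where the minimizer is x* = 0 and f* = 0:

```python
import numpy as np

# Sketch: gradient descent on f(x) = 0.5 * x^T A x with step size 1/L,
# checking the classical bound f(x_k) - f* <= L * ||x_0 - x*||^2 / (2k).
A = np.diag([1.0, 0.1, 0.01])   # PSD Hessian; smoothness constant L = 1.0
L = 1.0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([1.0, 1.0, 1.0])   # starting point x_0
dist2 = x @ x                   # ||x_0 - x*||^2
for k in range(1, 101):
    x = x - grad(x) / L
    assert f(x) <= L * dist2 / (2 * k)   # O(1/k) suboptimality bound
```

The ill-conditioned diagonal Hessian is chosen so that progress along the small-eigenvalue directions is slow, which is the regime where the sublinear O(1/k) bound is informative.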
Recall the convergence estimate already derived in Active inference convergence estimation.
In overparameterized deep linear networks, gradient descent has an implicit bias toward low-rank solutions. Strictly speaking, this effect is not fully understood, but it suggests that even without an explicit sparsity penalty, the optimization dynamics themselves can drive the solution toward a sparse (low-rank) structure.
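This implicit bias can be demonstrated with a small experiment (a sketch of my own, with an assumed rank-1 target and small initialization, not a setup taken from the note): fitting a two-layer linear factorization W2 @ W1 to a subset of the entries of a rank-1 matrix, with no rank penalty anywhere in the loss.

```python
import numpy as np

# Sketch: two-layer linear factorization fitted to partially observed
# entries of a rank-1 matrix. With small initialization, gradient descent
# tends toward a near-rank-1 product even though no rank penalty is used.
rng = np.random.default_rng(0)
M = np.outer([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])  # rank-1 target
mask = rng.random(M.shape) < 0.7          # observe ~70% of the entries

d = M.shape[0]
W1 = 0.1 * np.eye(d)                      # small initialization
W2 = 0.1 * np.eye(d)

lr = 0.05
for _ in range(20000):
    R = mask * (W2 @ W1 - M)              # residual on observed entries only
    W1, W2 = W1 - lr * (W2.T @ R), W2 - lr * (R @ W1.T)

s = np.linalg.svd(W2 @ W1, compute_uv=False)
# s[1] / s[0] should be small if the learned product is close to rank 1
```

The small initialization matters: starting the factors near zero is what steers the trajectory toward low-rank interpolants of the observed entries, which is consistent with the point above that overparameterized dynamics can favor sparse structure on their own.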
Results:
- We know how to increase the speed of agency convergence.
- Overparameterization may also contribute to faster convergence.