Convergence convexity relations
Continuing the topic of Joar's questions, I would like to draw attention to the limitations of neural networks in terms of the complexity of the tasks they perform.
https://math.bu.edu/people/mkon/hhh12.pdf
The complexity of constructing neural networks for approximating functions with a given accuracy can be divided into neuron complexity and information complexity.
For linear combinations and a single layer:
- Neuron complexity grows polynomially.
- Information complexity grows similarly, but more information (samples) than neurons is required to achieve a given accuracy.
Moreover, the presence of noise immediately increases these requirements.
These estimates do not account for time, but they look reasonable regardless. If we believe in agency through active inference, this gives a worst-case estimate.
Nevertheless, for our research this is of little use, since multilayer networks with nonlinear activations achieve much better rates on smooth functions (see below).
https://arxiv.org/pdf/1610.01145
This work gives estimates for deep ReLU networks over Sobolev spaces. For any function in the unit ball of the Sobolev space $W^{n,\infty}([0,1]^d)$, a ReLU network with $O(\epsilon^{-d/n}\log(1/\epsilon))$ weights and depth $O(\log(1/\epsilon))$ achieves uniform accuracy $\epsilon$.
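To get a feel for this bound's scaling, one can drop the constants and evaluate $\epsilon^{-d/n}\log(1/\epsilon)$ numerically; the snippet below is an illustrative sketch (the function name and the choice to drop constants are mine, not the paper's):

```python
import math

# Illustrative scaling of Yarotsky-style weight counts, constants dropped:
# a ReLU net reaching uniform error eps on the unit ball of W^{n,inf}([0,1]^d)
# needs on the order of eps^(-d/n) * log(1/eps) weights.
def weight_scale(eps: float, d: int, n: int) -> float:
    return eps ** (-d / n) * math.log(1 / eps)

print(weight_scale(1e-2, d=1, n=2))  # ~46
print(weight_scale(1e-2, d=8, n=2))  # ~4.6e8: the curse of dimensionality
```

The same target accuracy becomes exponentially more expensive as the dimension-to-smoothness ratio d/n grows.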
The work also establishes a lower bound: if the network weights are required to depend continuously on the approximated function, at least $c\,\epsilon^{-d/n}$ weights are needed.
In a scenario where the network architecture can adapt to the function being approximated (rather than being fixed), even better upper bounds can be obtained. For example, for 1D Lipschitz functions, a depth of 6 and complexity $O\big(1/(\epsilon\ln(1/\epsilon))\big)$ suffice.
For a fixed ReLU architecture capable of approximating any function from the unit ball of $W^{n,\infty}([0,1]^d)$ with error $\epsilon$, at least $c\,\epsilon^{-d/(2n)}$ weights are required.
From this, we can make estimates by varying the number of weights.
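The role of depth here is concrete: Yarotsky's proof builds $x^2$ from composed "sawtooth" (hat) blocks, each realizable by a few ReLU neurons, with error $2^{-2(m+1)}$ after $m$ compositions. A self-contained sketch of that construction:

```python
# Yarotsky's x^2 construction: the hat function g is a 3-neuron ReLU layer,
# composing it s times yields a sawtooth with 2^s teeth, and
# f_m(x) = x - sum_{s=1}^{m} g^{(s)}(x) / 4^s  interpolates x^2 on a grid
# of step 2^-m, so the error is 2^(-2(m+1)) -- exponential decay in depth.

def relu(x):
    return max(x, 0.0)

def hat(x):
    # hat(x) = 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1)
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def f(x, m):
    saw, total = x, 0.0
    for s in range(1, m + 1):
        saw = hat(saw)           # g composed s times
        total += saw / 4 ** s
    return x - total

grid = [k / 1024 for k in range(1025)]
for m in (2, 4, 6):
    err = max(abs(f(x, m) - x * x) for x in grid)
    print(m, err)                # err = 2^(-2(m+1))
```

Multiplication, and from it polynomials, then follows via $xy = ((x+y)^2 - x^2 - y^2)/2$, which is how the Sobolev upper bound is assembled.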
The article https://arxiv.org/pdf/1709.05289 proposes precise estimates for piecewise linear functions.
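Whatever the precise constants in that article, the qualitative point is easy to verify: a one-hidden-layer ReLU network represents a continuous piecewise-linear function exactly, one shifted ReLU per breakpoint. A minimal sketch (the example function is mine):

```python
# Exact representation of a continuous piecewise-linear function by ReLUs:
# slopes 0 / 1 / 3 with breakpoints at x = 0 and x = 1.
def relu(x):
    return max(x, 0.0)

def f(x):
    # slope jumps: +1 at x = 0, +2 at x = 1
    return relu(x) + 2.0 * relu(x - 1.0)

print(f(-1.0), f(0.5), f(2.0))  # 0.0 0.5 4.0
```

So for piecewise-linear targets the question is not whether ReLU nets can represent them, but how the required size grows with the number of pieces.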
In any ambiguous situation, such as the Weierstrass function, one can use catalogue networks to obtain combinations of functions and, correspondingly, logarithmic complexity. Of course, this does not cover wide function classes, but it works for individual ones and indirectly corroborates https://arxiv.org/pdf/1610.01145.
https://www.mins.ee.ethz.ch/pubs/files/deep-approx-18.pdf
According to this article, neural networks also approximate affine systems (such as wavelet frames) at polynomial rates.
https://arxiv.org/pdf/2502.02679v1
The VC dimension (Vapnik-Chervonenkis dimension) measures the complexity, or expressive power, of the class of functions a network can implement. Although the article itself is of limited use, it is worth a look.
The VC dimension of a ReLU network lies in the range from $c\,WL\log(W/L)$ to $C\,WL\log W$ (Bartlett et al.). Here $W$ is the number of weights and $L$ the number of layers.
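Dropping the unknown constants, the gap between the two bounds is only a log factor; a quick illustrative computation (the constants and the specific $W$, $L$ values are assumptions for illustration):

```python
import math

# Scaling of VC-dimension bounds of the form
#   c * W*L*log(W/L)  <=  VCdim  <=  C * W*L*log(W),
# with W weights and L layers; constants dropped.
def vc_lower_scale(W, L):
    return W * L * math.log(W / L)

def vc_upper_scale(W, L):
    return W * L * math.log(W)

W, L = 10_000, 10
print(vc_lower_scale(W, L) / 1e6)  # ~0.69 (millions)
print(vc_upper_scale(W, L) / 1e6)  # ~0.92 (millions)
```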
https://arxiv.org/pdf/1905.01208 - introduces the concept of approximation spaces.
Deep neural networks can efficiently approximate piecewise polynomial functions, and functions of low smoothness can often be represented as such piecewise polynomials. Note: not always, which is important to remember.
The complexity of a neural network is measured by the number of connections (connectivity) or the number of neurons; connectivity is related to the number of operations needed to apply the model. This is unrelated to graph connectivity, merely a similar term.
If a function belongs to a Besov space, its achievable approximation rate is governed by the smoothness parameters of that space.
The article focuses on polynomial rates of error decay. A network can approximate a function with a rate exponent determined by the function's smoothness; for multivariate functions, the approximation rate degrades with the input dimension.
The theorems below establish upper bounds for the rate exponent:
- for weight complexity (the number of connections);
- for neuron complexity (the number of neurons).
These inverse estimates show that the approximation rate exponent cannot exceed a value determined by the function class and the network's architectural parameters.
Summing up: upper bounds on the error decay rate follow from the function's smoothness and the network's architectural parameters (depth, activation type, number of connections); the resulting rates are at best polynomial.
In summary, this review provides the following insights:
- According to https://arxiv.org/pdf/1905.01208, we can estimate the convergence of a Besov function with respect to model parameters and function smoothness.
- According to https://arxiv.org/pdf/2502.02679v1, there are limitations on exact binary classification with respect to model parameters.
- According to https://arxiv.org/pdf/1610.01145, we can estimate convergence for Sobolev-smooth functions with respect to model parameters and error.
- According to https://arxiv.org/pdf/1901.02220, we have estimates for smooth functions.