Murata, N., Yoshizawa, S., Amari, S. IEEE Transactions on Neural Networks, Volume 5, Issue 6, November 1994, Pages 865-872. https://doi.org/10.1109/72.329683

The problem of model selection, that is, determining the number of hidden units, can be approached statistically by generalizing Akaike’s information criterion (AIC) so that it applies to unfaithful (i.e., unrealizable) models with general loss criteria, including regularization terms. The relation between the training error and the generalization error is studied in terms of the number of training examples and the complexity of the network, a quantity that reduces to the number of parameters in the ordinary statistical theory of the AIC. This relation leads to a new Network Information Criterion (NIC), which is useful for selecting the optimal network model based on a given training set.
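
As a rough illustration of how a criterion of this kind can be evaluated, the sketch below computes the average training loss plus a complexity penalty tr(G Q^{-1})/t, where t is the number of training examples, G is the second-moment matrix of the per-example loss gradients, and Q is the average Hessian of the loss. The abstract does not state this formula; the form used here is the one usually attributed to the paper and should be checked against the full text. The function name `nic`, the variable names, and the toy linear-regression data are all assumptions made for the example. For a faithful model with a log-likelihood loss, G and Q coincide and the trace is close to the number of parameters m, which is the sense in which the complexity term reduces to the parameter count of the AIC.

```python
import numpy as np


def nic(losses, grads, hessian):
    """Average training loss plus a tr(G Q^{-1}) / t penalty (assumed form).

    losses:  shape (t,)    per-example loss at the fitted parameters
    grads:   shape (t, m)  per-example loss gradients at the fitted parameters
    hessian: shape (m, m)  average per-example Hessian, used as an estimate of Q
    """
    t = losses.shape[0]
    # At a minimizer the mean gradient is zero, so the second moment
    # of the gradients serves as the estimate of G.
    G = grads.T @ grads / t
    # tr(Q^{-1} G) equals tr(G Q^{-1}); solve avoids forming the inverse.
    effective_params = np.trace(np.linalg.solve(hessian, G))
    return losses.mean() + effective_params / t, effective_params


# Toy data (hypothetical): a well-specified linear model with unit-variance noise.
rng = np.random.default_rng(0)
t, m = 20000, 3
X = rng.normal(size=(t, m))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=t)

# Least-squares fit, then per-example quantities for the loss 0.5 * residual**2,
# which is the Gaussian negative log-likelihood up to an additive constant.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta
losses = 0.5 * resid**2
grads = -resid[:, None] * X
hessian = X.T @ X / t

score, effective_params = nic(losses, grads, hessian)
print(score, effective_params)
```

To compare candidate networks under this sketch, one would compute `score` for each model on the same training set and prefer the smallest value; for an unfaithful model, or a loss with a regularization term, the trace generally differs from the raw parameter count.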