In machine learning (aka automated statistical reasoning), there is a concept called overfitting. It describes a situation in which your model did not generalize well from the data it observed. It’s something akin to a student who memorizes material for the test but does not figure out how to apply it outside of the classroom. What is perhaps counterintuitive is that this typically occurs when the model is too complex relative to the dataset used to train it. In human terms, this is like when someone misses the forest for the trees or pays too much attention to irrelevant details. The model being too complex means that in some sense the student was not being appropriately challenged, that the material was too processed, too artificially constructed, not raw enough for the organism to grow stronger by digesting it. A good education should be extremely useful outside of the classroom. Some might conclude that overfitting means the students need to be made dumber so that they match the complexity of the material. Others might take a more pro-human approach and make the material more challenging.
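For a concrete, toy picture of that memorizing student, here is a minimal sketch (assuming Python and NumPy; none of this is from the original, it is just one way to make the idea visible). A wiggly high-degree polynomial has enough parameters to pass through every noisy training point, and typically does worse on fresh points than a simpler fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy observations of a simple underlying curve: the "classroom material".
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.shape)

# Fresh, noise-free points stand in for life outside the classroom.
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):
    # More polynomial coefficients = a more complex model.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-9 fit passes through every training point ("memorizes the test")
    # but typically swings wildly between them, so its error on unseen points is worse.
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```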
In machine learning, a common method for discouraging overfitting is called regularization, which consists of explicitly rewarding the model for being simpler. This simplicity is usually expressed in terms of the model having fewer parameters, so if something could be expressed as a combination of either three variables or two, regularization nudges the model toward using two. This is akin to Occam’s Razor: the simplest model that still explains the data is best. Of course, in practice every model is imperfect, so we must specify a tradeoff: how much simplicity is worth relative to inaccuracy. This exchange rate between accuracy and simplicity is what generalization is all about.
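In code, that tradeoff is often written as a single loss with a price attached to complexity. A rough sketch of my own (not from the original), using an L1 penalty, which is one common choice that tends to push unneeded weights all the way to zero, i.e. toward two variables instead of three:

```python
import numpy as np

def regularized_loss(w, X, y, exchange_rate):
    """Data-fit error plus a price paid for complexity.

    `exchange_rate` (usually written lambda) is how much accuracy we are
    willing to trade away for each unit of simplicity gained.
    """
    prediction_error = np.mean((X @ w - y) ** 2)  # the accuracy term
    complexity = np.sum(np.abs(w))                # L1 penalty: unneeded weights get pushed toward zero
    return prediction_error + exchange_rate * complexity
```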
So instead of picking the one-true-exchange-rate like some sort of omniscient central planner, let the exchange rate vary, and observe the effects as we sample from a range of such exchange rates. As we turn this knob one way, simplicity is favored more; turning it the other way, accuracy is favored more.
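To actually turn the knob, here is a small sketch using scikit-learn's Lasso (my choice of tool, not the original's). As the penalty `alpha` grows, the fitted model typically uses fewer variables and tolerates more error:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# The target truly depends on only 2 of the 10 available variables.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Sample a range of exchange rates: small alpha favors accuracy,
# large alpha favors simplicity.
for alpha in (0.001, 0.1, 1.0, 10.0):
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    variables_used = np.count_nonzero(model.coef_)
    train_mse = np.mean((model.predict(X) - y) ** 2)
    print(f"alpha={alpha:>6}: {variables_used} variables used, train MSE {train_mse:.2f}")
```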
In the accuracy direction, turning the knob introduces new parameters, new concepts, new names for things that help us navigate the data. It is as if our map is bigger, more detailed; we have zoomed in and now the substructure is visible, smoothness became texture, the One broke into the Many. Turning the knob as far as we can (is there a limit?), we get a kind of holographic reconstruction of what is, a map as big as the thing it describes, exquisite and excruciatingly impractical.
As we turn the knob back in the simplicity direction, we zoom out; our map describes more things but with less detail. The names glom together and disappear underneath a blanket of hand-waving gestures. Precision degenerates into vague generalizations and our sampling rate goes down as we climb the pyramid of abstractions, climbing until we reach the top, the all-seeing-I, the only thing that exists, God.
Sat Chit Ananda