Statisticians sometimes use fancy words for commonplace phenomena. Indeed, a statistic is simply a way of representing a larger dataset with a smaller one. For example, “the average” replaces a potentially gigantic dataset with a much smaller one: a single number. It’s a kind of numeric summary. Since the output is simpler than the input, it is easier to reason about.
Words, too, are simpler than the things they refer to. The word “orange” requires only six holy letters, but a single orange tree’s genome would take millions of symbols to express, and a photograph of an orange might take several megabytes of disk space to store. Indeed, this is a large part of why we have so many words: they are light and flexible, easy to design, manufacture, and distribute.
Naturally occurring datasets, those encountered in scientific research, necessarily involve some randomness, some uncertainty about the totality of influences upon the dataset, so statistics has evolved a rich language for describing uncertainty. This language gives chaos a home, gives names to the unknowable. It acknowledges ignorance, and in doing so, structures it, so that it may be connected to the things that we do know.
And since the output is simpler than the input, not every input is equally represented in the output. We must pick and choose what to pay attention to, what significance to ascribe to which feature of our beloved dataset. And if we have not one but many datasets derived from many experiments, then we can weight the experiments by how interested we are in their results, how desirous we are of attending to them.
This is where p-values come from. We would like to quantify how surprising the results of an experiment are, relative to our current belief system. The more surprising an experiment is, the more valuable it is. We quantify this surprise by expressing our current belief system in terms of structured randomness (statisticians call this the null hypothesis), so that we may measure the distance from what-we-expect to what-we-actually-see. The probability of seeing results at least as extreme as these, given our current belief system, is called a p-value. The lower the p-value, the more surprising the results and the more valuable the experiment. If the p-value falls below some threshold, the experiment is said to be “statistically significant”.
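A minimal sketch of this idea in Python, using a hypothetical coin-flipping experiment: the “belief system” is that the coin is fair, and surprise is measured by replaying that belief many times and counting how often it produces results at least as extreme as what we saw. The function name and the numbers are illustrative, not from any particular library.

```python
import random

def simulated_p_value(observed_heads, n_flips, n_simulations=100_000):
    """Estimate the probability of seeing `observed_heads` or more heads
    in `n_flips` flips of a fair coin (our "current belief system")."""
    at_least_as_extreme = 0
    for _ in range(n_simulations):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_simulations

# Suppose we observe 16 heads in 20 flips. How surprising is that,
# if we believe the coin is fair?
p = simulated_p_value(observed_heads=16, n_flips=20)
print(f"p = {p:.4f}")  # about 0.006, well below the conventional 0.05 threshold
```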
Our brains are scientists, constantly doing experiments upon our lives, categorizing memories according to their topic and significance. As sensory data pours into our eyes, ears, nose, mouth, and skin, it gets routed through our thalamus, homogenized, and sent to the neocortex for sense-making. Indeed, the question of statistical significance is addressed even at the level of a single neuron. As the input signals to a neuron accumulate, they are blocked from passing through to the outputs until a certain threshold is reached: the “threshold potential”. Input data that does not exceed this threshold is not significant; it fails the neuron’s standards of journalistic integrity, and the neuron will not repeat it.
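To make the analogy concrete, here is a toy sketch of such a threshold unit. It is a caricature, not biophysics: real neurons leak, spike, and reset, and the numbers here are invented for illustration.

```python
def neuron_output(input_signals, threshold_potential=1.0):
    """A toy threshold unit: sum the incoming signals and fire (output 1)
    only if the accumulated potential reaches the threshold.
    Sub-threshold input is deemed not significant and is not passed on."""
    potential = sum(input_signals)
    return 1 if potential >= threshold_potential else 0

print(neuron_output([0.2, 0.3, 0.1]))       # 0: sub-threshold, the neuron stays silent
print(neuron_output([0.2, 0.3, 0.1, 0.5]))  # 1: threshold reached, the signal is repeated
```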
But what is this threshold of significance? Who decides it? And what happens if we change it? As we loosen this threshold of significance, our network of meaningful associations grows thicker, coincidences become infrequent phenomena, and what was dismissed as irrelevant returns with alienated majesty.
Reversing our trend towards gullibility, we tighten our definition of significance: we become more skeptical, we reject the merely interesting for the essential, and we focus only on that which is quite certain. The fog clears and illusions evaporate in the bright sun as our mind is relieved of the burden of abundance.
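A small simulation makes this dial visible. Suppose (hypothetically) we run ten thousand experiments in a world where nothing real is going on, so that any “discovery” is a coincidence admitted through the gate; loosening the threshold lets more of them in, tightening it shuts them out.

```python
import random

def count_significant(alpha, n_experiments=10_000):
    """Count how many null experiments clear the significance threshold.
    When nothing real is going on, p-values are uniform on [0, 1],
    so each experiment comes out "significant" with probability alpha."""
    return sum(random.random() < alpha for _ in range(n_experiments))

for alpha in (0.20, 0.05, 0.001):
    hits = count_significant(alpha)
    print(f"threshold {alpha}: {hits} 'significant' results out of 10,000")
```

At a loose threshold of 0.20, roughly a fifth of pure noise registers as meaningful; at a strict 0.001, almost none of it does. The threshold is a dial, and where we set it decides how thick our web of associations grows.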
And what of the extremes, those demons at the corners of the mandala? Here we see that our mandala is actually a torus: the opposites are identified with each other, as extreme skepticism finally leads to knowledge only of existence itself, and extreme gullibility results in identification with everything, the only Thing that exists.
For what are we if not Truth?
Sat Chit Ananda