The exception is the rule

One pretty interesting feature of the VM is that virtually no rule applies to it universally. (To put it slightly more wittily: with the VM, the rule is that every rule has an exception.)

One of the most obvious examples is the VM character set, which can be reduced to roughly 17 frequent characters, but there is also a whole host of less frequent to rare letters (up to 100, depending on how you count), simply too many to be ignored. The same holds true for the VM words: while the bulk of the text is made up of a small number of very frequent words, the distribution also has a long “tail” of rare words. Again, these rare words are too numerous to be ignored or to be attributed to transcription or copyist errors.
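To make the "frequent core plus long tail" idea concrete, here is a minimal Python sketch of how one might measure it. The function name `frequency_profile`, the `core_size` parameter and the toy sample are all made up for illustration; the sample is not VM transcription data, and the numbers it produces say nothing about the real manuscript.

```python
from collections import Counter


def frequency_profile(tokens, core_size):
    """Split a token stream into a frequent 'core' and a rare 'tail'.

    tokens    -- any iterable of hashable items (characters or words)
    core_size -- how many of the most frequent types count as the core
                 (an arbitrary cut-off chosen by the analyst)
    """
    counts = Counter(tokens)
    ranked = counts.most_common()  # (token, count) pairs, most frequent first
    total = sum(counts.values())

    core = ranked[:core_size]
    tail = ranked[core_size:]

    return {
        "types": len(ranked),  # number of distinct tokens
        "core_share": sum(c for _, c in core) / total if total else 0.0,
        "tail_types": len(tail),  # distinct tokens outside the core
        "tail_occurrences": sum(c for _, c in tail),
    }


# Hypothetical toy sample, NOT real transcription data.
sample = ["a"] * 8 + ["b"] * 6 + ["c"] * 4 + ["d"] * 2 + ["e", "x", "y", "z"]
profile = frequency_profile(sample, core_size=3)
print(profile)
```

Run on an actual transcription, with characters or words as tokens, the interesting figures would be `tail_types` and `tail_occurrences`: many tail types that together still account for a noticeable number of occurrences is exactly the situation described above.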

There are many more examples one discovers when examining the VM.

But what do we make of it? IMHO this “exception rule” speaks against the VM having been generated in a purely automated way. (Rugg’s Cardan grille scheme would be one such way.) Of course, it doesn’t rule out such a mechanical generation with a bit of arbitrary operator intervention thrown in, just for kicks. And it doesn’t rule out glossolalia.

The jury’s still out. As always.


One thought on “The exception is the rule”

  1. This reminds me of the edit statistics of Wikipedia. The vast majority of edits have been made by a group of about 1400 hardcore Wikipedia users (and those were mostly small edits). The remaining minority of edits, however, comes from all the rest, millions of people. While those edits are small in number, they obviously cannot be ignored. When plotting this as a graph, with the number of editors on the X axis and the number of edits on the Y axis, we get pretty much the typical logarithmic graph. Do we also have such a logarithmic scale for the letter frequency in the VM?

