It may be Fake, but it’s not a Hoax

Some time ago, I wrote a note to myself on my smartphone*), saying essentially “The Voynich can’t be a hoax, because there’s wordwrap.” As usual, for the longest time afterwards I didn’t have a clear idea what I had meant on that fateful day.

Now it has dawned on me again, especially in light of the observations I recently made in a little paper about the word-length distribution (which still needs to be amended for several of my oversights that others have kindly pointed out to me).

What I apparently had wanted to say was: “The VM may be a fake (i.e., not the genuine 15th-century article it pretends to be), but it’s not a hoax (i.e., a meaningless sequence of letters), because it exhibits all behaviour consistent with word-wrap.”

As the abovementioned article and the work of several others have shown, the word-length distribution of the VM text exhibits all the features one observes in a “regular” (natural-language) text which is subject to word-wrapping (i.e., long words which would otherwise run over the right margin of the page body are moved as a whole to the beginning of the next line). Most importantly,

  • the average word length decreases as the line runs on from left to right, and
  • the first word of a line is significantly longer than average.

Both effects are due to the fact that longer words near line ends run a higher “risk” of running over the margin, and thus of being word-wrapped.
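
The mechanism described above can be tried out on a toy text. The following is only a minimal sketch under assumptions of my own, not the analysis from the paper: word lengths are drawn independently and uniformly from 1 to 10 characters, the line width is 40 characters, and words are separated by a single space. The function names `wrap` and `first_vs_rest` are hypothetical helpers made up for this illustration. The script compares the mean length of line-initial words with that of all other words.

```python
import random
from statistics import mean


def wrap(word_lengths, width):
    """Greedy word-wrap: a word that would cross the margin moves whole to the next line."""
    lines, current, used = [], [], 0
    for n in word_lengths:
        # Space needed if this word stays on the current line (one blank between words).
        needed = n if not current else used + 1 + n
        if current and needed > width:
            lines.append(current)      # close the full line
            current, needed = [], n    # the word starts a new line instead
        current.append(n)
        used = needed
    if current:
        lines.append(current)
    return lines


def first_vs_rest(lines):
    """Mean length of line-initial words versus all other words."""
    first = [line[0] for line in lines]
    rest = [n for line in lines for n in line[1:]]
    return mean(first), mean(rest)


# Assumed toy parameters: 20000 words, lengths 1..10, line width 40.
rng = random.Random(0)
lengths = [rng.randint(1, 10) for _ in range(20000)]
lines = wrap(lengths, width=40)
first_mean, rest_mean = first_vs_rest(lines)
print(f"overall {mean(lengths):.2f}, line-initial {first_mean:.2f}, other {rest_mean:.2f}")
```

Even though the toy words carry no meaning at all, the wrapped word is preferentially a long one, so line-initial words come out longer than the rest. The point is merely that this signature appears when somebody bothers to keep words whole; breaking lines wherever the space runs out would have to split words instead of moving them.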

But this means that the author of the VM didn’t introduce line breaks wherever it pleased him; rather, he carefully word-wrapped the VM text to keep the individual (ciphertext) words whole, instead of allowing them to be split across line boundaries. A slightly stronger presumption would be that the author even had to word-wrap his text because the words are “information units” (either in the sense of plaintext words, or as “blocks” on which the enciphering algorithm worked). In either case, if the VM were gibberish and not designed to be deciphered (that is, if it didn’t matter whether the contents were still readable, because there is no content), then why bother to wrap the words at line ends, rather than introducing a line break whenever space limits dictated it?

The VM text is anything but random; it shows a high degree of structure, and a lot of work went into creating it. This could have been done by a brainless Rugg automaton, or through some sensible enciphering algorithm. But work and thought also went into writing down the ciphertext and keeping the cipher words intact, and to me this seems to make sense only if the information was supposed to be retrieved again as well, which would suggest that information was enciphered therein in the first place.

