It may be Fake, but it’s not a Hoax

Some time ago, I wrote a note to myself on my smartphone*), saying essentially “The Voynich can’t be a hoax, because there’s wordwrap.” As usual, for the longest time afterwards I didn’t have a clear idea what I had meant on that fateful day.

Now it has dawned on me again, especially in light of the observations I recently made in a little paper about the word-length distribution (which still needs to be amended for several oversights of mine that others have kindly pointed out to me).

What I apparently had wanted to say was: “The VM may be a fake (i.e., not the genuine 15th-century article it pretends to be), but it’s not a hoax (i.e., a meaningless sequence of letters), because it exhibits all the behaviour consistent with word-wrap.”

As the abovementioned article and the work of several others have shown, the word-length distribution of the text which makes up the VM exhibits all the features one observes in a “regular” (natural language) text which is subject to word-wrapping (i.e., long words which would otherwise run over the right margin of the page body are moved as a whole to the beginning of the next line). Most importantly,

  • the average word length decreases as the line runs on from left to right, and
  • the first word of a line is significantly longer than average.

Both effects arise because longer words near line ends run a higher “risk” of running over the margin and being subject to word-wrap.
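To make the two effects concrete, here is a minimal Python sketch. It is only a toy: the "words" are random-length strings of x's (lengths 1 to 10) and the line width of 40 characters is an arbitrary choice of mine, so nothing here is VM data. The helper names `word_wrap` and `mean_length_by_position` are made up for this illustration.

```python
import random
from statistics import mean


def word_wrap(words, width):
    """Greedy word-wrap: a word that would cross the margin moves whole to the next line."""
    lines, current, used = [], [], 0
    for w in words:
        # Characters needed on the current line if this word is appended.
        needed = len(w) if not current else used + 1 + len(w)
        if current and needed > width:
            lines.append(current)          # close the full line
            current, needed = [], len(w)   # the word starts a new line
        current.append(w)
        used = needed
    if current:
        lines.append(current)
    return lines


def mean_length_by_position(lines, min_count=100):
    """Average word length per position in the line (0 = first word)."""
    by_pos = {}
    for line in lines:
        for i, w in enumerate(line):
            by_pos.setdefault(i, []).append(len(w))
    # Skip positions that occur too rarely to give a stable average.
    return {i: mean(v) for i, v in sorted(by_pos.items()) if len(v) >= min_count}


rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["x" * rng.randint(1, 10) for _ in range(20000)]  # toy data, not VM text
lines = word_wrap(words, width=40)

overall = mean(len(w) for w in words)
first = mean(len(line[0]) for line in lines)
rest = mean(len(w) for line in lines for w in line[1:])

print(f"overall mean length:    {overall:.2f}")
print(f"line-initial words:     {first:.2f}")
print(f"all other words:        {rest:.2f}")
for pos, avg in mean_length_by_position(lines).items():
    print(f"position {pos + 1}: {avg:.2f}")
```

In this toy, the words that get pushed to the next line are preferentially the long ones, so the line-initial average comes out above the overall average and the remaining positions fall below it. The same per-position table could be computed for a transcription of the VM, given a list of lines split into words.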

But this means that the author of the VM didn’t introduce line breaks wherever it pleased him; he carefully word-wrapped the VM text to keep the individual (ciphertext) words whole, rather than allowing them to be spread across line boundaries. A slightly stronger presumption would be that the author even had to word-wrap his text because the words are “information units” (either in the sense of plaintext words, or as “blocks” on which the enciphering algorithm worked). In either case, if the VM was gibberish and not designed to be deciphered (that is, if it didn’t matter whether the contents were still readable, because there is no content), then why bother to wrap the words at line ends, rather than introducing a line break whenever space limits dictated it?
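For contrast, here is a rough Python sketch of the alternative, where the line simply ends whenever space runs out and words are allowed to be split. Again this is a toy under my own assumptions (random-length strings of x's, a fixed line width of 40 characters, a made-up helper `hard_break`), not a model of how any scribe actually worked.

```python
import random
from statistics import mean


def hard_break(words, width):
    """Break the running text every `width` characters, splitting words if needed."""
    text = " ".join(words)
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    # Each line becomes its space-separated tokens (whole words or fragments).
    return [chunk.split() for chunk in chunks]


rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["x" * rng.randint(1, 10) for _ in range(20000)]  # toy data, not VM text
lines = [line for line in hard_break(words, width=40) if line]

tokens = [t for line in lines for t in line]
overall = mean(len(t) for t in tokens)
first = mean(len(line[0]) for line in lines)

print(f"words in: {len(words)}, tokens out: {len(tokens)}")
print(f"overall mean token length: {overall:.2f}")
print(f"line-initial tokens:       {first:.2f}")
```

With breaks of this kind, the tokens at line starts are often mere tails of split words, so in this toy they come out shorter than average rather than longer. That is the opposite of the long line-initial words described above.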

The VM text is all but random; it shows a high degree of structure, and a lot of work went into creating it. This could have been done by a brainless Rugg automaton, or through some sensible enciphering algorithm. But work and thought also went into writing down the ciphertext and keeping the integrity of the cipher words, and to me this seems to make sense only if information was supposed to be retrieved again as well, which would suggest that information was enciphered therein in the first place.

*) Said “smartphone” is actually a stone-age Palm, back from the time when Palm was still cool.


6 thoughts on “It may be Fake, but it’s not a Hoax”

  1. I certainly hope you are correct. I think that proving the Voynich either real or fake will be very difficult if the content cannot be read. That applies if it had meaning and cannot be deciphered; or if it has no meaning and so, decipherment is impossible. Our only hope is, real or fake, that we can somehow read it.

    If not readable for either reason, the only proofs after that are circumstantial. As such it would be very difficult to make a case either way. I don’t think it would be impossible, but very difficult. Reading it would free us all very quickly.

  2. It appears to me that the lines break as they would in ordinary non-hyphenated* handwriting. That is something we need to look at. In wordwrapped text, words in the middle of a line, after the second word, have as much chance to increase in length as they do to decrease. In the texts I have checked, there is a rough alternation in length. In Quire 20, mid-line words only decrease in average length. That’s like getting six Heads in succession beginning with the first toss. The “coin” could be fair. If it isn’t, the cause is significant.

    When paragraph-initial words in Quire 20 are removed from column 1, average token length in that column is less than what is expected in wordwrapped text. If the non-wrapped paragraph-initial words are included, the average length (and the entire text) fits the wordwrap model. It might be that the longest line-initial word in an already wordwrapped paragraph was moved to paragraph-initial position. I don’t think that is the explanation. We need to see if such transposition would give the same effect in other texts. If not, there is a major bit of manipulation to be discovered. That some first words have anomalous prefixes may be a separate issue.

    * What is a word for words like “non-hyphenated”?

    PS: I like the heroine’s name in “Der Fall Zita S.”.

  3. Elmar – interesting observation. One question: when you say “The VM text is all but random” do you mean it’s near-enough to random, or definitely not random? I got the sense that you meant the second.

  4. Elmar, a few days ago, on another of your posts, I asked whether you chose the section for your header at random and, if not, what it meant to you. I wanted to ask in case you were working on an idea which had just occurred to me when I saw it – that is, that the habit of most people is to number around a circle using a known sequence. If they happened to use an alphanumeric sequence, then we could have here the first six glyphs in their order of natural correspondence.

    Naturally, there are many systems – not just the alpha-numeric – to which the six might be referring, but it should be a possibility, don’t you think?

    • I mostly chose the header image since this is one of the illustrations where Rich found a close match to a modern microscopic picture of a diatom, i.e., it is one of the indications that the VM might actually be a “late fake”, and since I’ve changed paradigms to that effect, I thought it would be proper to acknowledge that with the header.
      Besides, it fits in better with the new WordPress theme.

