The “Face Value” Fallacy

With close to absolute certainty, one word in the VM ciphertext does not correspond to one plaintext word, and one ciphertext letter is not equivalent to one plaintext letter.

I’ll call the insistence on this one-to-one mapping of words and letters the “Face Value” Fallacy, and I will describe it once more, because we’ve just had another instance of someone wasting three years of their life on it.

Of course, a first look at the VM shows that the text is broken up into paragraphs and short chunks, and a slightly closer look shows that most of the text is composed of 20 to 30 different glyphs, roughly the number of Latin or Cyrillic letters. At “face value,” it’s obvious that these are “letters” composing “words”, and the natural assumption is that they are equivalent to plaintext letters and words.

And of course this turns out to be wrong as soon as one engages with the VM quantitatively and begins to do statistics:

  1. VM words are, on average, shorter than words in typical Indo-Germanic languages.
  2. VM words follow a complex but fairly rigid grammar in their composition. I’ve tried my hand at formalizing this grammar myself; others have produced the more exhaustive “core-mantle-crust” paradigm or identified a set of prefixes and stems. While superficially similar composition rules exist in natural languages (e.g. the word endings “-ing” or “-er” in English), none applies its rules with this strictness and consistency.
  3. Individual VM letters follow similar rules: <q> is followed by <o>, <d> by <y>; <q> is always word-initial*), <y> word-terminal. Again, natural languages have some rules like that (cf. “qu” in English), but none are nearly as exhaustive as those governing VM word composition.
  4. The VM text is full of sequences of nearly or completely identical words. While it is possible to compose similar strings in natural languages (“Es war das das Dasein, dass da dasaß”), such strings are always extremely contrived, do not form a coherent narrative, and never occur naturally to this degree. There have been attempts to explain this as “chants” or poems, but that seems far-fetched to me.
  5. As a consequence of the properties above, the average information content of a VM word is lower than that of a natural-language word; put differently, the VM vocabulary is considerably smaller than a natural-language vocabulary. (There are fewer options for choosing your next letter or word when composing a VM sentence than in a natural language.) René Zandbergen covers this in some depth on his comprehensive Voynich site.
  6. Some words behave rather erratically, e.g. appearing only as the first word on a page.
  7. The gallows glyphs (the tall EVA <k>, <t>, <p>, <f>) and their idiosyncratic behaviour.
  8. Some time ago I was at the center of a discussion about whether word length in the VM depends on the position within the text line. The debate ended somewhat inconclusively as to whether this is simply a natural effect of word wrap at line breaks. If the effect were “systemic”, the line would somehow be an enciphering unit, unlike in natural languages.**)

(This list of features is not exhaustive and I may add to it over time.)
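Item 8 is one of the easier features to test for yourself. Below is a minimal sketch, assuming the transliteration is available as one string per manuscript line with words separated by spaces (EVA-style). The function names `word_length_by_position` and `control_from_prose` are my own, hypothetical choices and not part of any existing Voynich toolkit; the toy lines are made up for illustration and are not a real transcription.

```python
import textwrap
from collections import defaultdict
from statistics import mean


def word_length_by_position(lines):
    """Mean word length of the first, interior and last word of each line.

    `lines` holds one string per manuscript line, words separated by
    whitespace (an EVA-style transliteration is assumed). Lines with fewer
    than three words are skipped, so every counted line contributes to all
    three positions.
    """
    buckets = defaultdict(list)
    for line in lines:
        words = line.split()
        if len(words) < 3:
            continue
        buckets["first"].append(len(words[0]))
        buckets["last"].append(len(words[-1]))
        buckets["interior"].extend(len(w) for w in words[1:-1])
    return {position: mean(lengths) for position, lengths in buckets.items()}


def control_from_prose(text, width):
    """Re-wrap ordinary prose to at most `width` characters per line.

    Any line-position effect in the result comes from word wrap alone,
    which gives a baseline to compare the VM figures against.
    """
    return textwrap.wrap(text, width=width,
                         break_long_words=False, break_on_hyphens=False)


if __name__ == "__main__":
    # Toy lines in EVA style, invented for illustration only.
    toy = ["qokeedy shedy qokain dal", "daiin chedy ol"]
    print(word_length_by_position(toy))
```

A difference between the three positions does not by itself settle the word-wrap question; the comparison that matters is against a natural-language text run through `control_from_prose` at a similar line width and then through `word_length_by_position`.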

None of these features is merely incidental; they are defining characteristics of the VM text, and ignoring them means two things:

  • You waste about the only potential clues we have to deciphering the VM.
  • You also waste your time with your decipherment effort, because it will be wrong.

Before you start on the VM, I strongly, strongly advise you to delve into the currents and eddies of statistics, however uncomfortable you may feel with them at first. They are the maelstrom you have to cross to arrive at a solution.
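As a first step into those statistics, here is a minimal sketch of the character-level entropy measures behind item 5: h0 (log2 of the alphabet size), h1 (single-glyph entropy) and h2 (the conditional entropy of a glyph given its predecessor). It assumes the text is a plain Python string, e.g. an EVA transliteration, and treats the space as an ordinary symbol; that convention and the function names are my own choices, not necessarily those used on Zandbergen’s site.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def char_entropies(text):
    """Return (h0, h1, h2) in bits per symbol for a non-empty string.

    h0: log2 of the number of distinct symbols,
    h1: entropy of the single-symbol frequencies,
    h2: conditional entropy of a symbol given the preceding one,
        computed as H(pair) - H(first symbol of the pair).
    Whitespace counts as an ordinary symbol; strip or normalise it
    beforehand if you prefer another convention.
    """
    unigrams = Counter(text)
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # first symbol of every adjacent pair
    h0 = log2(len(unigrams))
    h1 = entropy(unigrams)
    h2 = entropy(pairs) - entropy(firsts)
    return h0, h1, h2


if __name__ == "__main__":
    # Toy string; run the same function on a real transliteration and on
    # a natural-language text of comparable length to compare h2.
    print(char_entropies("qokeedy qokedy daiin"))
```

A lower h2 for the VM than for a natural-language control would be the quantitative form of item 5’s “fewer options for your next letter”. Keep in mind that such plug-in estimates are biased low for short texts, so compare texts of similar length.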


*) I use the term “word” in this context to denote a “block of glyphs separated by whitespace from the rest of the text,” and of course not as a stand-in for a plaintext word. Maybe it would be better to call these blocks “chunks” or some such.

**) Personally, I don’t subscribe to this; I rather suspect that a ciphertext paragraph may be equivalent to a plaintext sentence. But that’s just a gut feeling.