Not a Rose by Any Other Name

The thing with the VM is that it offers so preciously little “hard” information to help us decipher it that we tend to cling to whatever shred of fact we can find to guide us along. First and foremost among these are the statistics on the VM, but with using them comes the danger of relying on the results too heavily.

For example, to a high degree of certainty, we will have gotten our transcription wrong somewhere; not in the sense of writing down some individual error, but of consistently misidentifying letters as being different or separate where they are truly the same or compound, or vice versa. But as long as we don’t know whether “iiin” is one letter or four, whether “ch” is one letter, two different letters or just “cc” in disguise, and as long as we can’t even know for sure whether there’s one, two, or four different gallows, all our estimates about character frequencies and word lengths are on very shaky ground. And hence, all our results.

So, while I’m all for statistical tests (after all, it’s all we’ve got, right?), and while I’m wishing for a “universal statisticator” which would spit out the essential statistical parameters for various ciphertext candidates, I recommend taking such results with a grain of salt.

Take for example something as simple as a monoalphabetic substitution cipher.*) If someone employed that cipher and used the VM alphabet for the ciphertext, we’d probably despair of even something as simple as that, because our prime tool in that case, viz. statistical frequency analysis, would be bound to fail as long as we fed it a transcription which misidentified the ciphertext character set.
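To make that a bit more tangible, here is a minimal Python sketch of how the glyph counts shift with the transcription. The sample string is a made-up, EVA-like line (not a quotation from any real transcription), and the list of "compound" glyphs is purely an assumption for illustration; nobody knows whether it is the right one.

```python
from collections import Counter

# Hypothetical EVA-like sample, invented for this illustration.
SAMPLE = "daiin chedy daiiin chol qokaiin"


def tokenize(word, compounds=()):
    """Split a word into glyphs, greedily matching the longest compound first."""
    ordered = sorted(compounds, key=len, reverse=True)
    glyphs = []
    i = 0
    while i < len(word):
        for compound in ordered:
            if word.startswith(compound, i):
                glyphs.append(compound)
                i += len(compound)
                break
        else:
            # No compound matches here: treat the single stroke as one glyph.
            glyphs.append(word[i])
            i += 1
    return glyphs


def glyph_frequencies(text, compounds=()):
    """Count glyphs over all words under a given transcription assumption."""
    counts = Counter()
    for word in text.split():
        counts.update(tokenize(word, compounds))
    return counts


# Reading 1: every stroke is its own letter.
split_reading = glyph_frequencies(SAMPLE)
# Reading 2 (assumed): "iiin", "iin" and "ch" are single letters.
merged_reading = glyph_frequencies(SAMPLE, compounds=("iiin", "iin", "ch"))

print(split_reading.most_common(3))
print(merged_reading.most_common(3))
```

Under the first reading "i" dominates the sample; under the second it does not occur as a letter at all. A frequency attack tuned to one reading would be chasing a letter that, under the other, isn't even there.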

Likewise, word-length distributions etc. are all prone to be distorted by systematic transcription errors. So, while I agree that statistics can give us valuable hints, I’d regard them as circumstantial evidence, but not as hard facts, and if one test gives results which put a theory at odds with the VM, I wouldn’t dismiss the theory immediately if the rest of the story looked good.
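The same goes for word lengths. The following Python sketch measures one made-up, EVA-like sample twice, once counting every stroke as a letter and once with an assumed set of compound glyphs. Both the sample and the glyph patterns are hypothetical and only serve to show the size of the effect.

```python
import re
from collections import Counter

# Hypothetical EVA-like sample, invented for this illustration.
SAMPLE = "daiin chedy daiiin chol qokaiin"


def word_lengths(text, glyph_pattern):
    """Histogram of word lengths, where glyph_pattern defines what one glyph is."""
    pattern = re.compile(glyph_pattern)
    return Counter(len(pattern.findall(word)) for word in text.split())


def mean_length(histogram):
    """Average word length, in glyphs, from a length histogram."""
    total_glyphs = sum(length * n for length, n in histogram.items())
    return total_glyphs / sum(histogram.values())


# Reading 1: every stroke is one glyph.
by_stroke = word_lengths(SAMPLE, r".")
# Reading 2 (assumed): "iiin", "iin" and "ch" each count as one glyph.
by_compound = word_lengths(SAMPLE, r"iiin|iin|ch|.")

print(sorted(by_stroke.items()), mean_length(by_stroke))
print(sorted(by_compound.items()), mean_length(by_compound))
```

On this toy sample the mean word length drops from 5.4 to 3.6 glyphs, and the whole histogram moves with it, without a single stroke of the "manuscript" having changed.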

As for Zipf’s law, which is so often quoted in the context of the VM: I’ve got my hesitations about that in particular. IIUC, we don’t really know what it means if a distribution does or doesn’t obey Zipf’s law, and since random texts and even the sizes of cities can follow Zipf’s law, I wouldn’t assign too much importance to this test.
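For what it's worth, the random-text case is easy to try at home. The Python sketch below "types" random characters from a tiny alphabet that includes the space, splits the result into words, and fits a straight line to the log-log rank-frequency curve. The alphabet, text length, and seed are arbitrary choices of mine, and a least-squares slope is only a crude check, not a proper test of Zipf's law.

```python
import math
import random
from collections import Counter


def random_text(n_chars, alphabet="abcd ", seed=0):
    """'Monkey typing': independent random characters, space included."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(n_chars))


def rank_frequency(text):
    """Word frequencies sorted from most to least common."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


freqs = rank_frequency(random_text(200_000))
slope = loglog_slope(freqs)
print(len(freqs), "distinct words, log-log slope:", round(slope, 2))
```

The text carries no meaning whatsoever, yet its rank-frequency curve still falls off roughly like a power law. Whatever a Zipf-like curve tells us about the VM, it can't be "this is language".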

So, my advice is, stay tuned, but don’t stand with bated breath.

*) I know, the VM is not a monoalphabetic substitution.

Slightly edited from a post to the Voynich Mailing List.

An Eye for an Eye, and a… Letter for a Letter…?

When one follows the developments around the VM, research into it seems to be like a stream, where new ideas and theories appear somewhere upstream in the distance, come closer, and let themselves be examined more fully while they drift by you, before they follow the water downstream and finally vanish in the dusk of forgettableness. Rarely will one of them leave so much as a beacon behind.

One pattern that seems to crop up time after time, and goes mostly unnoticed by even the old-time and seasoned Voynicheros, is the more or less explicit assumption that one VM letter is the equivalent of one plaintext letter, and that one VM word corresponds to one plaintext word.

Now, I can see where this comes from: of course it’s natural to assume. It’s the simplest and most straightforward way to do it (which, in itself, ought to be a warning sign: if the VM had been enciphered “simply and straightforwardly”, it would have been solved long ago…), it’s the way ciphers were done in period, and it lends itself readily to easy analysis.

Unfortunately, in all probability this is not the way the VM was cooked up. Let’s look at a few of the arguments against this case: