Crossing the Line from Amateur to Dilettante

Although I generally recommend the Voynich mailing list to everybody interested in the subject, the current discussion about using distributed computing to crack the VM with a statistical brute-force attack very much reminds me of people planning open-heart surgery with medical knowledge gained from watching a few episodes of House.

“If at first you don’t succeed, use a bigger hammer.” The sad thing is that while one supposed side effect of this would be to make the project and VM research more public (in the vein of the SETI project), and hence to attract more people to it, I’m afraid a bungling approach like the one currently under consideration would have the opposite effect of exposing us to ridicule.

(Not to mention that people seem to think a distributed approach would invalidate all previously gathered statistical information about the VM, such as the fact that it certainly is not a simple substitution, and that, with a high degree of confidence, one VM word is not equivalent to one plaintext word.)

Voynich.net is up again

Rich SantaColoma has taken over the helm at voynich.net. The site features some information about the VM, but most importantly it serves as the subscription point for the Voynich mailing list (which had temporarily been mirrored at Rich’s own site).

So, everybody feel free to subscribe to the list (if you haven’t already done so). The list still is the central hub for information and “research” regarding the VM.

Thanks for the good job, Rich!

What did Huxley mean?

I just noticed that apparently somebody came across this site by searching for: “the great tragedy of science — the slaying of a beautiful hypothesis by an ugly fact what did he mean”

To avoid leaving the poor soul in the dark: the way I understood this quote is that in the course of scientific research, many a “beautiful” (elegant, simple, powerful) hypothesis is developed as one digs into a topic. Unfortunately, it happens quite often that later experimental findings “rear their ugly head” by proving that the beautiful theory is beautiful, but false. And since we are not in the art department, truthfulness takes precedence over beauty.

In the case of the VM (admittedly not exactly a “science”), beautiful theories about its encryption are constantly slain by statistical or systematic evidence to the contrary. Unfortunately, people tend to twist the facts (or outright ignore them) rather than give up the perceived beauty of their approach.

Chasing your own Tail

As I mentioned the other day, I had made a serious omission in my assessment of the Stroke Theory.

I had written a little tool to analyse the VM ciphertext and decompose it into the hypothetical “syllables” of the Stroke Theory, where each ciphertext “syllable” would represent one plaintext letter. The tool worked by constantly modifying the hypothetical syllable set and retaining those modifications which led to an overall increase in “coverable text” (i.e., ciphertext words that could be composed from the syllable set). The tool became “saturated” (further changes would no longer increase the overall coverage) when the syllable set could compose 66% and 74% of the ciphertext by volume, for Currier A and B respectively. This was interesting, but by no means convincing.
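The original tool isn’t published, but the mechanism described above (mutate a candidate syllable set, keep only mutations that raise coverage) can be sketched as a simple hill climb. The function names and the segmentation logic here are my own illustration, not the original code:

```python
import random

def coverage(words, syllables):
    """Fraction of ciphertext volume composable from the syllable set."""
    def composable(word):
        # Dynamic programming: ok[i] is True iff word[:i] splits into syllables.
        ok = [True] + [False] * len(word)
        for i in range(1, len(word) + 1):
            for s in syllables:
                if len(s) <= i and ok[i - len(s)] and word[i - len(s):i] == s:
                    ok[i] = True
                    break
        return ok[-1]
    total = sum(len(w) for w in words)
    return sum(len(w) for w in words if composable(w)) / total if total else 0.0

def hill_climb(words, alphabet, seed, steps=500, rng=None):
    """Randomly mutate the syllable set; retain mutations that raise coverage."""
    rng = rng or random.Random(0)
    best, best_cov = set(seed), coverage(words, set(seed))
    for _ in range(steps):
        trial = set(best)
        if rng.random() < 0.5 and len(trial) > 1:
            trial.discard(rng.choice(sorted(trial)))   # try dropping a syllable
        else:
            k = rng.choice([1, 2, 3])                  # try adding a new one
            trial.add(''.join(rng.choice(alphabet) for _ in range(k)))
        cov = coverage(words, trial)
        if cov > best_cov:                             # keep only improvements
            best, best_cov = trial, cov
    return best, best_cov
```

“Saturation” in this picture is simply the point where further mutations leave the best coverage unchanged.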

A different approach used a second tool which would synthesize ciphertext from innocuous plaintext, according to the rules of the Stroke Theory. This test was purely qualitative: the point was to see whether ciphertext rendered this way would look anything like the VM, and in my humble opinion it actually did.
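The synthesis direction is even simpler to sketch. The stroke table below is a toy stand-in of my own invention; the actual stroke assignments of the Stroke Theory are not reproduced here:

```python
# Toy stroke table (hypothetical assignments, not the actual Stroke Theory):
# each plaintext letter becomes one ciphertext "syllable" of elementary strokes.
STROKES = {
    'a': 'c|',
    'e': 'c',
    'i': '|',
    'n': '||',
    't': '-|',
}

def synthesize(plaintext):
    """Render plaintext as stroke ciphertext, one syllable per letter.

    Letters without a stroke assignment are silently dropped, mirroring the
    untranscribable characters (digits, umlauts) in the real experiment."""
    out = []
    for word in plaintext.lower().split():
        syllables = [STROKES[ch] for ch in word if ch in STROKES]
        if syllables:
            out.append(''.join(syllables))
    return ' '.join(out)

print(synthesize("in a tin"))  # -> "||| c| -||||"
```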

It took me about a year to figure out that one could combine the two approaches, namely by letting the analytical tool work on the output of the synthesis tool. In a perfect world, namely if the analytical tool worked correctly, this chase of one’s own tail should result in 100% coverage of the plaintext.

Of course it didn’t. For instance, the plaintext used — the German 15th-century “Weinbuch” — contained a few characters (like Arabic digits or umlauts) which couldn’t be properly transcribed in the synthesis and thus could not be recovered.

Still, the result was that only about 68% of the plaintext words by volume could be recovered before the analysis program showed signs of stalling. This teaches us two things:

  1. It’s — pardon my French — fucking close to the results for the VM, and
  2. This can’t be due to special unencrypted characters alone.

Upon closer inspection, I indeed found the culprit: I had set the minimum syllable length in the analyzer to 2 strokes, which is of course stupid. Letters like “I”, “l” or “o” will hardly require more than one stroke (and indeed required only one in my synthesis). Thus, these letters were effectively indecipherable, and this may well account for a good deal of the lost coverage.
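The effect of that minimum-length setting is easy to reproduce. With a toy stroke alphabet (again my own illustration, not the real stroke table), any word containing a single-stroke letter becomes unsegmentable as soon as the analyzer is forbidden to propose one-stroke syllables:

```python
def composable(word, syllables):
    """Can `word` be segmented entirely into members of `syllables`? (DP)"""
    ok = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        for s in syllables:
            if len(s) <= i and ok[i - len(s)] and word[i - len(s):i] == s:
                ok[i] = True
                break
    return ok[-1]

# Toy true syllable set: 'e' -> 'c', 'i' -> '|', 'n' -> '||'
true_set = {'c', '|', '||'}
# What the buggy analyzer could ever find: only syllables of >= 2 strokes.
buggy_set = {s for s in true_set if len(s) >= 2}

word = 'c|||'  # the synthesis of plaintext "ein" under the toy table
print(composable(word, true_set))   # True  -- segments as c + | + ||
print(composable(word, buggy_set))  # False -- the one-stroke syllables are gone
```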

This error would of course also hold true for the VM. (There is no reason why the VM author should have chosen to use a minimum of two strokes per plaintext letter.) And the fact that this programming error led to almost the same amount of lost coverage for the VM as for the synthesized text could be a hint that the same effects are at play, and that I’m thus on the right track in chasing my own tail…

Holds True for VM Decipherment Attempts as well

“If a piece of calculation leads into an ever-denser thicket, nature probably did not intend you to go that way. Try a different approach.”
David P. Stern: “All I really need to know”

Most VM decipherment attempts can claim initial success, but then begin to stall and get bogged down. To make any progress at all, ad hoc rules are then invented which add further stages, exceptions, or modifications to the originally proposed scheme. As we all know, this can result in ludicrously complex fabrications.

At this point I usually ask myself: If the algorithm you suggest is really that complex, how could you ever claim initial success with your simple startup version? And if you couldn’t, how could you deduce the ever more complicated steps from a simple start which would never have worked?