Smart Force Required

Fellow Voynichero Rich SantaColoma just asked in a different post why I wouldn’t get my lazy butt up and do a little brute-force statistics on my Stroke theory. Namely, if each plaintext letter is always represented by the same group of ciphertext letters (which I’ll call a “syllable”), why not simply count the ciphertext syllables and then do a reasonable frequency match?*)
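For what it's worth, the naive version of that frequency match might look roughly like the sketch below. It assumes the two things we don't actually have: a ciphertext already cut into syllables, and a frequency ranking of the plaintext letters. The sample stream, the ranking and the name `rank_match` are all made up for illustration.

```python
from collections import Counter

def rank_match(syllables, plaintext_ranking):
    """Pair ciphertext syllables with plaintext letters by frequency rank."""
    counts = Counter(syllables)
    # Most frequent syllable first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    # The n-th most frequent syllable gets the n-th most frequent letter.
    return dict(zip(ranked, plaintext_ranking))

# Hypothetical, already segmented syllable stream (not real VM data).
sample = ["qo", "dy", "qo", "chee", "dy", "qo"]
# Hypothetical ranking, most frequent letter first (not a real table).
mapping = rank_match(sample, ["e", "t", "a"])
print(mapping)
```

With only a handful of tokens per syllable, a rank match like this is very fragile, which is part of the point of the list that follows.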

Actually, I had a similar idea some time ago, sat down at my computer, fired up my trusty interpreter, and stalled. It dawned on me that a few problems stand in the way of a brute-force statistical attack:

  1. We don’t know the plaintext language, hence we don’t know the frequency distribution of its letters,
  2. We don’t know the plaintext character set used, i.e. whether it was cursive (batarde or modern?), block writing, print letters, etc. This has grave consequences for the number of strokes required to compose each letter, and hence for the length of the corresponding syllable, not to mention the relationships between different syllables.
  3. We can’t even count on the plaintext being written in a 26-letter Latin alphabet. Letters like “j”, “y” or “x” may well be missing,
  4. Special characters (digits, astrological symbols) complicate matters,
  5. We don’t know the ciphertext alphabet, i.e. we don’t know if daiin and daiir are really two distinct words or not; we can’t even be sure c, h, and e are different letters,
  6. Most annoyingly, we also don’t know exactly what the syllable repertoire is. VM words apparently are mostly composed of more than one syllable, but where the syllable “boundaries” run is unclear: Is qocheedy supposed to be split qo-cheedy, qoch-eedy, or perhaps even qochee-dy?**)
  7. We only have limited statistical material, namely some 70,000 chars at the most.***)
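To get a feeling for how bad the boundary problem in point 6 is, here is a small sketch that simply enumerates the candidate splits of a word. It assumes that each transcription letter is one indivisible unit, which point 5 already calls into question; the function names are mine.

```python
def two_part_splits(word):
    """All ways to cut a word into exactly two non-empty syllables."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]

def all_segmentations(word):
    """All ways to cut a word into one or more non-empty syllables."""
    if not word:
        return [[]]
    result = []
    for i in range(1, len(word) + 1):
        # Take the first i letters as a syllable, then segment the rest.
        for rest in all_segmentations(word[i:]):
            result.append([word[:i]] + rest)
    return result

splits = two_part_splits("qocheedy")
segmentations = all_segmentations("qocheedy")
print(len(splits), len(segmentations))
```

A word of n letters has n - 1 possible cut points, so the number of segmentations doubles with every added letter. Without extra constraints, counting syllables means choosing among all of these for every single word.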

It seems what is required is not a brute force, but a Smart Force(tm) attack.

*) He actually used a much more friendly wording.

**) Robert Firth had an idea, but apparently was not able to find a solution that was completely satisfying to him. As always, there are a number of solutions which yield varying degrees of success, but none with a 100% match. I plan to do some analysis on the labels, which should help at least insofar as the word boundaries of the labels seem to be more clear-cut than those of words in the continuous text.

***) Out of a total of roughly 120,000 chars. But with the different Currier hands, it’s reasonable to assume that different enciphering schemes were used between Currier A and B, hence only either A or B should be used for any statistical test.

A Constant Case of Lower Case

One very interesting fact about the marginalia, to me, is that although there are a lot of ambiguities and uncertainties about which letter many of the shapes are supposed to represent, all the reasonable options for even the most dubious cases appear to be lower-case letters.

Where have the capitals gone? How come, no matter how crappily the author scribbled across the tortured vellum, nothing looks like upper case?

26 + 26 + 10 < VM

We see that many words in the VM comply with a comparatively simple and straightforward grammar, but we also see that lots of words break those rules. Hence, the underlying rules are either more complex than we think, or not all words must obey them.

To paraphrase this in terms of the Stroke theory, it means that a character set of 26 capital and 26 lower-case letters plus 10 digits apparently was not enough to transcribe the VM plaintext, otherwise we’d only see 62 different “syllables” making up the VM.

Now, idly browsing the web, I came across a German astronomical manuscript from around 1500. If you take a look at f28r, you’ll notice that while the top half of the page consists almost entirely of the Latin alphabet, the bottom section is riddled with astrological symbols.

If such were the case with the VM itself, these “special characters” would require special enciphering: Some graphical elements in those symbols aren’t present in the Latin character set, which would give rise to the use of rare ciphertext letters, and their combinations would differ from the grammar of the body of the text. Hence, we’d see occurrences of unusual letters and breaches of the grammar rules.

Under this assumption, the hypothesis would be that the VM word “grammar” is not so much a grammar per se, but rather an artifact of the existence of only a limited set of syllables to begin with.

Safety in Numbers (Roman and Arabic)

As we all know, there is no shortage of oddities regarding the VM. One of those is that throughout the whole manuscript, no numbers can be found.*) While it has been suggested that the structure of VM words seems to be governed by rules similar to those for the composition of Roman numerals, as of now no consistent system has been proposed which would allow one to generate VM words.**)

But what if we look at it from the Stroke theory’s point of view? Remember, this theory says that each VM ciphertext letter represents one plaintext penstroke, and hence that each VM word is the equivalent of one or two plaintext letters. In the same manner as the letters were enciphered, it would also be possible to encipher digits, if Arabic numerals were used. If Roman numerals were used, they would be enciphered just like their letter equivalents.

What would be the consequences?

I have suggested in the wake of Robert Firth’s observations that in the Stroke theory the plaintext was written in capital and small letters alternating for the most part, so that a KiNd Of CaMeLcAsE writing was produced. With around 25 capital and small letter shapes each, this means that the better part of the VM’s vocabulary should be composed of around 50 ciphertext “syllables”. If Arabic numerals were used, this number should increase by the 10 or so shapes required for the individual digits.***)

Furthermore, since it’s impossible to write “capital” and “small” numbers, a string of digits should show up as a deviation from the pattern of alternating “prefixes” and “suffixes” (which are identified in the Stroke theory as capital and small letters). Note also that many of the Arabic numerals are fairly similar to each other (6-8-9-0, for example)****), which should lead to a string of quite similar words or syllables in the ciphertext. (As is observed.)

If Roman numerals were used, they could have been written in camelcase (like MmIx rather than MMIX). Again, strings of similar or identical words should be observed, as the letters denoting the Roman numerals often resembled each other. (For example, “D” was supposed to be “one half of (the letter) ‘M’”, i.e. 500 was the half of 1000. Same with “V” and “X”.) The total number of different syllables should not go up, though.
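If one wanted to go looking for such strings of similar words, a crude first pass could look like the sketch below. It uses the similarity ratio from Python's standard `difflib` module as the yardstick. Both the threshold of 0.75 and the toy word list are arbitrary assumptions of mine, and a run of look-alike words found this way proves nothing about numbers as such.

```python
from difflib import SequenceMatcher

def similar_runs(words, threshold=0.75):
    """Collect runs of consecutive words where each resembles its predecessor."""
    runs, current = [], []
    for word in words:
        if current and SequenceMatcher(None, current[-1], word).ratio() >= threshold:
            # Similar enough to the previous word: extend the current run.
            current.append(word)
        else:
            # Run is broken: keep it only if it holds at least two words.
            if len(current) > 1:
                runs.append(current)
            current = [word]
    if len(current) > 1:
        runs.append(current)
    return runs

# Toy word sequence for illustration, not a quotation from the VM.
runs = similar_runs(["daiin", "daiir", "daiin", "qocheedy"])
print(runs)
```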

Thus, I think it’s quite conceivable that the VM text is interspersed with numbers. They just don’t show up, because they’re enciphered as the rest of the letters are.

If the VM turns out to be the very earliest book on cocktail recipes, the first round is on me.

*) With the exception of the pagination, which appears to be a later addition, though.

**) If this were possible, i.e. if all VM words could be explained as Roman numerals, the question is: what would that mean for the deciphering of the VM? That it was one of the earliest phone books in existence, predating the phone by some 400 years, and that unfortunately all the people’s names were dropped from it…?

***) Or fewer, perhaps. It’s difficult to see how the Stroke theory should discriminate between “o” and “0”.

****) Bear in mind though that in period the shapes of Arabic numerals could vary significantly with the exact time and place in which they were used.

Moat so deep, donjon so high…

I haven’t gotten around to doing much regarding the Voynich lately, much less posting anything about it.

To avoid the impression that I’ve fallen off the edge of the world, rather than presenting you with something original, let me draw your attention today to Richard SantaColoma’s latest work: He has taken the mysterious “rosettes foldout” (aka “f86v”) from the VM, recreated the depicted landscape in 3D, and fed his results into a CAD program.

The original rosettes foldout with its mysterious castles, towers and cities

The result of Rich’s work is this amazing animation which includes a “flyby” of the rosettes landscape. Of course, much of this is speculation, and everything will depend on Rich’s interpretation of the VM painter’s original ideas. I have no idea how useful this will turn out in the end, but in any case it’s a novel and well crafted approach.

Enjoy it on Youtube!