You have a Theory of your Own about the Voynich Manuscript?

That’s cool, but, frankly, I don’t want to hear about it anymore.

I used to give you the option –

use the contact form provided on this blog. If you outline your ideas there, I will turn them into a new post and publish them here.

but lately so much crud has come in that I feel it’s simply a waste of my time to edit that stuff and respond to it, and a waste of my readers’ time to have it posted for them.

So, please go somewhere else. If you decide to use an existing thread to publish your theories, I’ll mock you and block you. (The contact form will remain open, but I won’t feel obliged to answer anymore if your stuff is too far out.)

Thank you for your understanding.

Scandinavia Hailing the Arabs

And another incoming message, this time from Peter Ole Kvint:

I would guess that the text is Berber. Berbers have had several writing systems, but none of them is suitable for writing with quill and ink.

When you consider how cookbooks are copied today, herbal books from the Middle Ages must have been copies of copies as well. If you have a list of herbs from a book, then most of them could be found in the illustrations, or at least the options would be limited.
Since the same patterns are seen in other books, it must be possible to retrieve the same recipes, and thus rediscover the plant names.
Note that on some artwork, the plants are cut and put back together on the thicker root. These plants may be large, perhaps trees and shrubs.

sincerely,

Peter Ole Kvint

Period wordlists

Dan wrote to me some time ago, and again I must apologize: I’m currently fairly busy with other projects, and hence can’t devote as much time to the VM as I should. Nevertheless, I should finally give him the floor:

Yeah yeah, here’s another theory. Actually I’m not going into the theory, but simply asking if you can provide any assistance in resources I am seeking. Let me back up a bit – I’m a full time software developer of over 20 years, and I have had some insights regarding the manuscript. I’ve written software to generate various statistics about the document and have found some surprising and very obvious (once distilled down to hard numbers) patterns that further validate the insights. These are not “hunches”, or “gut feelings” or any mystical, nutty stuff. It’s simply what it is, and the analysis doesn’t lie.

I am currently running brute force deciphering attempts using additional software I have developed, based on my theory of how the document is ciphered. The main resource I am lacking at this time are simply word lists of the candidate languages the manuscript may have been written in (in its decoded form of course), and specifically, the vernacular and spelling of those languages when the manuscript was written in the 1400s.

I have always assumed the Voynich manuscript was a hoax, but when it was positively dated a few years ago I took a harder look, again with the expectation that it was a hoax but at least a hoax contemporary to the 15th century. My attempt was actually to prove (just to myself) that very thing – that it is just a contrived hoax. Unfortunately the insights and analysis I have done over the last few years have left no other option but to follow the logical progression until it peters out and comes to a dead end. I have not yet reached that point.

Thanks for your time, and again, if you know of simple word lists (or who can provide them or assist in that) of good candidate languages from the 15th century, that would be quite helpful.

This question isn’t so easy to answer. First of all, even when taking the C14 dating of the vellum as a given, we still have about a century of leeway regarding the actual production date of the manuscript. A century is a long time, during which languages can change.

Secondly, languages weren’t “codified” as strictly as they are today. Pretty much everyone would write their MSs in their local dialect, and strict orthography wasn’t enforced yet either. This means that even two people from the same region writing at the same time wouldn’t necessarily employ the same spelling. (An extreme example of this is the Bayeux Tapestry, which admittedly predates the VM by some 400 years, where the name of William the Conqueror is written, IIRC, in no fewer than seven different ways.) Hence, to make a long story short, any word list should be taken with a grain of salt.

I did some statistics in the past myself, and to get decent wordlists I simply went to Gutenberg.org, downloaded a few works I considered representative of the era, and ran my own little wordcount scripts on these files.
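For illustration, such a wordcount script can be very small. The following is only a minimal sketch in Python, not the script I actually used; the function name `word_frequencies` and the sample line are assumptions made for the example, and in practice you would read the text from a downloaded file instead.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens in a plain-text string."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscores),
    # so punctuation and numbers are dropped from the word list.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)

# Stand-in for a downloaded file; replace with open(path, encoding="utf-8").read()
sample = ("Whan that Aprille with his shoures soote "
          "the droghte of March hath perced to the roote")

freq = word_frequencies(sample)

# Print the word list, most frequent words first
for word, count in freq.most_common():
    print(count, word)
```

Note that a script like this counts every spelling variant as a separate word, which is exactly the caveat about period orthography mentioned above.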

IMHO, prime candidates for the plaintext languages are Latin, English, French, German (including the various dialects like Swiss), and perhaps Spanish. But though I wouldn’t bet on it, more exotic options like Hungarian, Finnish or maybe the Lingua Franca can’t be ruled out either.

Sorry, but this is probably a less simple answer than you asked for.

The Cutest Critter

James recently asked me:

Just wondering what type of animal you think it is eating what is believed to be a Woad Plant on f25v

referring to this cute little critter.

I don’t think it’s supposed to be a real animal. My guess is it’s a little dragon; the scaly back, the comparatively short legs, the ears and comb on the neck, and the fact that it may only have two legs seem to be a good match to me. Compare here for a 15th-century painter’s (Uccello’s) idea of what a dragon is supposed to look like.

“Censorship”

Lately, I received a brief message –

Censorship should be consistent

which I feel deserves a bit of comment because it represents a widespread, but false notion from the web. (I presume the missive didn’t allude to the general state of international politics, but referred to my decision to block some user comments on my blog.)

First of all, “censorship” means the suppression of information or opinion, usually through a public body. This is definitely not the same thing as deciding to ignore a contribution.

But, secondly and more importantly, you seem to feel you are entitled to use my blog for your messages. This is simply wrong: contributing here is a privilege I grant (or withhold), as is the case with any private web page. My blog is not a public place to which anyone should have access, but a private BBQ I hold in my backyard. You are invited to drop by and share the party, but if you act inappropriately, I’ll kick you out, and, as the digital landlord, I’m the sole arbiter here of what constitutes appropriate behaviour. Simple as that. Play somewhere else.

Considering that I do the work of maintaining this site, that I’m legally responsible for its contents, and that in the end I’ll also be judged on the merits of the contributions here, I feel this is only fair. You’re free to go to any other place, party and voice your opinion there, and you will find I do nothing to hinder your free speech there. (That would be censorship.)

So while you’d be able to publish on your own, you prefer to freeload on the infrastructure provided by me, insulting me with claims of “censorship” when I refuse to comply. This in itself should justify blocking your access.

“This conversation can serve no further purpose.”

A Plea to all Voynicheros

If you pursue a theory, please keep your website up-to-date.

As happened several times on the Voynich list during the last few weeks, readers were encouraged to test others’ deciphering schemes based on publications on certain websites, but ran into dead ends or couldn’t arrive at the same results as the original poster. Only later, or after complaining about this, were they told that the information on the website was outdated.

This is impolite, because it wastes your readers’ time, it will irritate them and discourage them from getting seriously engaged with your theory, it will make them miss the point (of testing your theory), and it will do your reputation in general no good. So, it’s a win-win if you first update your website and then publicize it.

Thanks!

We put the “Brute” in the “Force”

There is no end to the string of new theories and commenters on this blog. (Keep ’em coming, boys and girls!) Today it’s Zach, who sent me more of a general question than a theory:

Please forgive me if this has already been covered in your site and I’m just not seeing it, but hasn’t anyone tried feeding the VM into a computer and brute-forcing it? Computers are really good at trying every possible combination and weeding out the ones that don’t make sense, so it seems like they ought to be perfect for grinding away at the VM by one means or another.

Any idea whether computers have been tried?

For good measure, he immediately followed up with some more detail:

Me again. I’m elaborating a bit, in case I really am
a) the first person to think of this and
b) there’s really no reason it wouldn’t work

I did see one article here on computers and frequency analysis, but the author made a good point about not knowing the language and so not knowing the frequency rules. I envision a brute-force that attacks the words rather than the letters. It would work like this:
First, your computer is fed lots and lots and lots of example texts, the more the better, so it can build a probability map that stores the likelihood of any given word being followed by (or just found near) a second word, and given those two words, what third word is likely to come next, and so on. You will never get firm answers, but this is fuzzy logic – as long as we can grade potential sentences as more or less likely to be linguistically correct and sensible, we are good to go.
Once the computer has this probability map, the rest is just a matter of brute force:
Start with one of the repeating ‘words’ in the VM.
Assign it an arbitrary English meaning.
Using that meaning, and the probability map, assign English words to each word preceding and following your starting word. Branch outward from there until you have ‘translated’ all words. Use your probability map to judge how likely it is that the ‘translation’ is actual English sentences (best if you can ignore word order and focus on word proximity, because word order falsely assumes they had the same grammar as we do). Store that probability and start over with a new guess for your first repeating word. Repeat over and over until you’ve found a high-probability match or, more likely, you’ve run out of choices.
Assuming you’ve not found a 100% match, present the 10 most probable ‘translations’ for human inspection.
Compiling tables of synonyms would also improve things; that way the computer could also consider how likely CONCEPTS are to be grouped together, since English and VM-speak are very unlikely to have a 1:1 mapping.

The core assumption is that, even if there are no direct word matches, there’s a 1:1 map of concepts between VM-speak and your target language (English, in my case). I think that’s a safe assumption, because if there’s no such mapping, then it seems translation would be impossible, like trying to translate a 1-time-pad cipher without the key.

Anyhoo, I hope you enjoy my ramblings. Thanks for listening.

Cheers.

Thanks, Zach, for your input.
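To make the idea concrete: the “probability map” Zach describes is, in its simplest form, a bigram model. Here is a minimal Python sketch of just the scoring step, assuming a toy English corpus and add-one smoothing. Both are my choices for illustration and not part of Zach’s message, and the function names are made up for this example.

```python
import math
from collections import Counter

def build_bigram_model(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_score(candidate, unigrams, bigrams):
    """Sum of log P(next word | previous word), with add-one smoothing."""
    vocab_size = len(unigrams)
    total = 0.0
    for prev, nxt in zip(candidate, candidate[1:]):
        # Unseen pairs get a small but non-zero probability
        p = (bigrams[(prev, nxt)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(p)
    return total

# Toy training text; a real attempt would need a large period corpus
corpus = "take the root and boil the root in water and drink the water".split()
unigrams, bigrams = build_bigram_model(corpus)

plausible = log_score("boil the root".split(), unigrams, bigrams)
scrambled = log_score("root boil the".split(), unigrams, bigrams)
print(plausible > scrambled)
```

Grading a candidate “translation” is the cheap part. The search over all possible word assignments is where the trouble starts.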

For a start, there is actually a wealth of computer power which has been pumped (more or less in vain) into the black hole of information we call the Voynich. I myself tried a little with my Stroke theory, for example, but these efforts are dwarfed by people like Jorge Stolfi or Julian Bunn, among others. But these mostly focus on analysis of the VM, not directly on a translation. Why is this?

Well, a brute force attack is hampered by several constraints:

  1. Our statistical material is limited. The VM comprises some 130,000 characters, which appears to be a lot. But when you look at it, that’s only some 30,000 words. If you further take into account the different encoding schemes (aka “Currier A” and “Currier B”, resp.), which differ subtly but do differ, you’re left with only a sample of some 15,000 words, which isn’t that much.
  2. We don’t know the plaintext language underlying the VM. English is possible, yet some of the marginalia point to French or Spanish, the images provide hints to Italy, and some clues point to Germany, not to mention that Latin would have been the lingua franca of the era.
  3. We know next to nothing about the subject matter, and accordingly little about the vocabulary used.
  4. We’re unclear about the ciphertext alphabet. We have really no idea whether the sequence of two connected “c”s really means “two ‘c’s in a row”, or is a completely different letter. (Compare this to the case of Latin letters, where “nn” is something completely different from “m”.) We don’t know if the “drops” above some “cc” groups only modify the underlying letter(s) (compare “O” -> “Ö”), or if they make it a completely different letter (compare “O” -> “Q”).
  5. Some characters, like the notorious “gallows”, tend to appear only paragraph-initially or in the first line of a page. They may be embellishments of “regular” characters (as was often done in manuscripts of the era), but we don’t know which “regulars” they’d replace.

But there is one obstacle even greater than these, and much more fundamental: Any “brute force” attack would presume that the ciphertext words of the VM are mapped 1:1 from the plaintext words. And this is extremely unlikely for a number of reasons:

  1. The ciphertext alphabet seems to consist of around 17 frequent letters, plus a large number of rare “weirdos”. That maps poorly to a Latin alphabet.
  2. Some frequent letter groups show up almost exclusively word-initial (“qo”) or word-terminal (“dy”). That’s unknown for any Central European language.
  3. Word-length distribution is odd: There is a shortage of both very short and very long words, so the words have a comparatively uniform length. Again, this is unusual for Central European languages.
  4. Overall, the words exhibit a very regular structure; check out Stolfi’s “Core-Mantle-Crust” paradigm. (Yes, it’s a tough read, but worth working through if you want to understand the VM.) They are composed according to a fairly rigid “grammar”, the like of which is unknown in European languages.
  5. Nobody has been able to identify particles and articles (“a”, “and”, “with”…) in the VM.
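Statistics like those in points 2 and 3 are easy to reproduce on any transcription. The following Python sketch shows one way to do it, under stated assumptions: the word list is a small hand-made sample in EVA-style spelling, not an extract from an actual transcription file, and the function names are invented for this example.

```python
from collections import Counter

def word_length_distribution(words):
    """Relative frequency of each word length."""
    counts = Counter(len(w) for w in words)
    return {length: counts[length] / len(words) for length in sorted(counts)}

def positional_share(words, group):
    """Share of a letter group's occurrences that are word-initial / word-final."""
    total = initial = final = 0
    for w in words:
        start = w.find(group)
        while start != -1:
            total += 1
            initial += (start == 0)
            final += (start + len(group) == len(w))
            start = w.find(group, start + 1)
    if total == 0:
        return 0.0, 0.0
    return initial / total, final / total

# Hand-made toy sample in EVA-style spelling (an assumption for illustration)
words = ["qokeedy", "qokedy", "daiin", "chedy", "shedy", "qokaiin", "okeedy"]

lengths = word_length_distribution(words)
qo_initial, qo_final = positional_share(words, "qo")
dy_initial, dy_final = positional_share(words, "dy")
print(lengths)
print("qo initial share:", qo_initial, "dy final share:", dy_final)
```

Running the same functions over a word list from a natural language gives you the comparison figures.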

All of these differences between natural languages and the VM make it highly unlikely that the enciphering mechanism simply always turned plaintext word “A” into ciphertext word “X”, and “B” into “Y”.*) I’m convinced that one VM word is not equivalent to a plaintext word, but rather that it only represents a few letters.

There are other assumptions: Don of Tallahassee assumes it’s a list of highly abbreviated recipes, David Suter presumes it could be encoded geographical coordinates. Theoretically, all these schemes could be attacked by brute computer force, but this would only make sense once the enciphering method was sufficiently clear. And exactly this is not the case: to my knowledge, no theory has been put forth which would sufficiently explain all the peculiarities we observe in the ciphertext, and hence there’s simply no starting point from which a computer programme could launch.

*) There are actually two scenarios where it would be just conceivable that there is a 1:1 correspondence between plaintext words and cipher words.

One is that the VM was written with the aid of a dictionary, where all plaintext words were numbered, and in the VM their numbers were written down not in Arabic numerals, but in something like the Roman numbering system, i.e. word “259” in the dictionary would have been written “CCLIX”. While this is conceivable, up to now nobody has been able to provide a coherent numbering system which would result in the “word grammar features” mentioned above.
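As a sketch of how such a dictionary scheme would work mechanically, here is a small Python example. The codebook entries and their numbers are invented for illustration (only the 259 -> CCLIX pairing comes from the text above), and plain Roman numerals stand in for whatever numbering system a real encipherer might have used.

```python
# Value/symbol pairs for standard Roman numerals, largest first
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """Convert an integer between 1 and 3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("number out of range for standard Roman numerals")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def encode(plaintext, codebook):
    """Replace each plaintext word by the numeral of its dictionary number."""
    return " ".join(to_roman(codebook[word]) for word in plaintext.split())

# Hypothetical codebook: words and numbers are made up for this example
codebook = {"radix": 259, "aqua": 12, "herba": 1004}

cipher = encode("herba radix aqua", codebook)
print(cipher)
```

Each plaintext word maps to exactly one cipher word here, which is the 1:1 correspondence in question. The open problem is finding a numbering system whose “numerals” look like Voynich words.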

The second idea would be that the VM was written in an artificial language, in particular in one of the “logical” or “a priori” languages. (Check out Solresol for an example.) These artificial languages construct their words from “blocks” which do resemble the “core-mantle-crust” syllables found by Stolfi. But the first comparable logical languages date from at least two centuries after the VM was written, so their use is fairly unlikely.