We put the “Brute” in the “Force”

There is no end to the string of new theories and commenters on this blog. (Keep ’em coming, boys and girls!) Today it’s Zach, who sent me more of a general question than a theory:

Please forgive me if this has already been covered in your site and I’m just not seeing it, but hasn’t anyone tried feeding the VM into a computer and brute-forcing it? Computers are really good at trying every possible combination and weeding out the ones that don’t make sense, so it seems like they ought to be perfect for grinding away at the VM by one means or another.

Any idea whether computers have been tried?

And he immediately followed it up, for good measure, with some more detail:

Me again. I’m elaborating a bit, in case I really am
a) the first person to think of this and
b) there’s really no reason it wouldn’t work

I did see one article here on computers and frequency analysis, but the author made a good point about not knowing the language and so not knowing the frequency rules. I envision a brute-force that attacks the words rather than the letters. It would work like this:
First, your computer is fed lots and lots and lots of example texts, the more the better, so it can build a probability map that stores the likelihood of any given word being followed by (or just found near) a second word, and given those two words, what third word is likely to come next, and so on. You will never get firm answers, but this is fuzzy logic – as long as we can grade potential sentences as more or less likely to be linguistically correct and sensible, we are good to go.
Once the computer has this probability map, the rest is just a matter of brute force:
Start with one of the repeating ‘words’ in the VM.
Assign it an arbitrary English meaning.
Using that meaning, and the probability map, assign English words to each word preceding and following your starting word. Branch outward from there until you have ‘translated’ all words. Use your probability map to judge how likely it is that the ‘translation’ is actual English sentences (best if you can ignore word order and focus on word proximity, because word order falsely assumes they had the same grammar as we do). Store that probability and start over with a new guess for your first repeating word. Repeat over and over until you’ve found a high-probability match or, more likely, you’ve run out of choices.
Assuming you’ve not found a 100% match, present the 10 most probable ‘translations’ for human inspection.
Compiling tables of synonyms would also improve things; that way the computer could also consider how likely CONCEPTS are to be grouped together, since English and VM-speak are very unlikely to have a 1:1 mapping.

The core assumption is that, even if there are no direct word matches, there’s a 1:1 map of concepts between VM-speak and your target language (English, in my case). I think that’s a safe assumption, because if there’s no such mapping, then it seems translation would be impossible, like trying to translate a 1-time-pad cipher without the key.

Anyhoo, I hope you enjoy my ramblings. Thanks for listening.

Cheers.

Thanks, Zach, for your input.
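
To make Zach's proposal concrete, here is a minimal sketch of his "probability map" and the brute-force loop. Everything in it is an assumption for illustration: the map is reduced to a simple bigram count with add-one smoothing, the function names are made up, and the tiny corpus and the VM "words" in the usage note are toy data, not a real transcription.

```python
from collections import Counter
from itertools import permutations
import math

def build_bigram_model(corpus_sentences):
    """Count single words and adjacent word pairs in the example texts."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(candidate_words, unigrams, bigrams):
    """Log-probability of a word sequence under the bigram counts (add-one smoothing)."""
    vocab_size = len(unigrams) + 1  # +1 leaves room for unseen words
    total = 0.0
    for prev, nxt in zip(candidate_words, candidate_words[1:]):
        total += math.log((bigrams[(prev, nxt)] + 1) / (unigrams[prev] + vocab_size))
    return total

def brute_force(vm_words, english_words, unigrams, bigrams, top_n=10):
    """Try every 1:1 assignment of English words to VM word types; keep the best few."""
    vm_types = sorted(set(vm_words))
    results = []
    for assignment in permutations(english_words, len(vm_types)):
        mapping = dict(zip(vm_types, assignment))
        translation = [mapping[w] for w in vm_words]
        results.append((score(translation, unigrams, bigrams), mapping))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_n]
```

With a three-sentence toy corpus and a four-word toy "VM text", `brute_force(["daiin", "chol", "daiin", "shol"], ["the", "root", "plant"], *build_bigram_model(corpus))` returns the six possible assignments ranked by score. Note that the loop runs over all permutations, so it is only feasible for toy vocabularies.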

For a start, a wealth of computer power has already been pumped (more or less in vain) into the black hole of information we call the Voynich. I myself tried a little with my Stroke theory, for example, but my efforts are dwarfed by those of people like Jorge Stolfi or Julian Bunn, among others. These mostly focus on analysis of the VM, though, not directly on a translation. Why is this?

Well, a brute force attack is hampered by several constraints:

  1. Our statistical material is limited. The VM comprises some 130,000 characters, which appears to be a lot. But when you look at it, that’s only some 30,000 words. If you further take into account the different encoding schemes (aka “Currier A” and “Currier B”, resp.), which differ subtly but do differ, you’re left with only a sample of some 15,000 words, which isn’t that much.
  2. We don’t know the plaintext language underlying the VM. English is possible, yet some of the marginalia point to French or Spanish, the images provide hints to Italy, and some clues point to Germany, not to mention that Latin would have been the lingua franca of the era.
  3. We know next to nothing about the subject matter, and accordingly little about the vocabulary used.
  4. We’re unclear about the ciphertext alphabet. We really have no idea whether the sequence of two connected “c”s really means “two ‘c’s in a row”, or is a completely different letter. (Compare this to the case of Latin letters, where “nn” is something completely different from “m”.) We don’t know if the “drops” above some “cc” groups only modify the underlying letter(s) (compare “O” -> “Ö”), or if they make it a completely different letter (compare “O” -> “Q”).
  5. Some characters, like the notorious “gallows”, tend to appear only paragraph-initially or in the first line of a page. They may be embellishments of “regular” characters (as was often done in manuscripts of the era), but we don’t know which “regulars” they’d replace.
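
To illustrate point 4: the same stretch of text yields different letter counts depending on whether "cc" is treated as two letters or as one. The following is only a sketch under assumptions: the function name, the greedy longest-match rule and the sample string "occo" are all made up for illustration and are not taken from any actual transcription scheme.

```python
from collections import Counter

def tokenize(word, glyphs):
    """Greedy longest-match split of a transcribed word into the given glyph inventory."""
    ordered = sorted(glyphs, key=len, reverse=True)  # try longer glyphs first
    tokens, i = [], 0
    while i < len(word):
        for glyph in ordered:
            if word.startswith(glyph, i):
                tokens.append(glyph)
                i += len(glyph)
                break
        else:
            raise ValueError(f"no glyph matches at position {i} in {word!r}")
    return tokens

# Hypothesis A: "cc" is simply two "c"s in a row.
counts_a = Counter(tokenize("occo", {"o", "c"}))
# Hypothesis B: "cc" is a letter of its own.
counts_b = Counter(tokenize("occo", {"o", "c", "cc"}))
```

Under hypothesis A the sample contains two "c"s; under hypothesis B it contains none at all, only one "cc". Any frequency analysis inherits this ambiguity.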

But there is one obstacle even greater than these, and much more fundamental: any “brute force” attack would presume that the ciphertext words of the VM are mapped 1:1 from the plaintext words. And this is extremely unlikely for a number of reasons:

  1. The ciphertext alphabet seems to consist of around 17 frequent letters, plus a large number of rare “weirdos”. That maps poorly to a Latin alphabet.
  2. Some frequent letter groups show up almost exclusively word-initial (“qo”) or word-terminal (“dy”). That’s unknown for any Central European language.
  3. Word-length distribution is odd: there is a shortage of both very short and very long words, so words have a comparatively uniform length. Again, this is unusual for Central European languages.
  4. Overall, the words exhibit a very regular structure (check out Stolfi’s “Core-Mantle-Crust” paradigm; yes, it’s a tough read, but worth working through if you want to understand the VM). They are composed according to a fairly rigid “grammar”, the like of which is unknown in European languages.
  5. Nobody has been able to identify particles and articles (“a”, “and”, “with”…) in the VM.
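
The kind of statistics behind points 2 and 3 is easy to compute once a transcription is available. Here is a minimal sketch, assuming the text is given as a list of transcribed words; the function names are hypothetical, and the sample words are a toy list in EVA-style spelling, not an excerpt from the manuscript.

```python
from collections import Counter

def position_profile(words, group):
    """Count how often `group` occurs word-initially, word-finally, or in between.

    A word consisting of the group alone is counted as "initial".
    """
    profile = Counter()
    for word in words:
        start = word.find(group)
        while start != -1:
            if start == 0:
                profile["initial"] += 1
            elif start + len(group) == len(word):
                profile["final"] += 1
            else:
                profile["medial"] += 1
            start = word.find(group, start + 1)
    return profile

def length_histogram(words):
    """Map each word length to the number of words having that length."""
    return Counter(len(word) for word in words)

sample = ["qokedy", "qokeedy", "chedy", "daiin", "qokain"]  # toy data
qo_profile = position_profile(sample, "qo")
dy_profile = position_profile(sample, "dy")
lengths = length_histogram(sample)
```

Run against a full transcription and against a comparison text in, say, Latin or German, the two profiles could then be set side by side.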

All of these differences between natural languages and the VM make it highly unlikely that the enciphering mechanism simply always turned plaintext word “A” into ciphertext word “X”, and “B” into “Y”.*) I’m convinced that one VM word is not equivalent to a plaintext word, but rather that it only represents a few letters.

There are other assumptions: Don of Tallahassee assumes it’s a list of highly abbreviated recipes, and David Suter presumes it could be encoded geographical coordinates. Theoretically, all these schemes could be attacked by brute computer force, but this would only make sense once the enciphering method was sufficiently clear. And exactly this is not the case: to my knowledge, no theory has been put forth which would sufficiently explain all the peculiarities we observe in the ciphertext, and hence there’s simply no starting point from which a computer programme could launch.

*) There are actually two scenarios where it would be just conceivable that there is a 1:1 correspondence between plaintext words and cipher words.

One is that the VM was written with the aid of a dictionary in which all plaintext words were numbered, and in the VM their numbers were written down not in Arabic numerals, but in something like the Roman numbering system, i.e. word “259” in the dictionary would have been written “CCLIX”. While this is conceivable, up to now nobody has been able to provide a coherent numbering system which would result in the “word grammar features” mentioned above.
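
As a sketch of how such a dictionary scheme would work in principle, the following uses plain Roman numerals. This is purely illustrative: the three-word dictionary and the function names are made up, and nothing here is claimed to be the numbering system actually used (none has been found).

```python
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n):
    """Convert an integer from 1 to 3999 into a Roman numeral string."""
    if not 0 < n < 4000:
        raise ValueError("n must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def encode(plaintext, dictionary):
    """Replace each plaintext word by the Roman numeral of its dictionary position."""
    index = {word: i for i, word in enumerate(dictionary, start=1)}
    return [to_roman(index[word]) for word in plaintext.split()]

dictionary = ["aqua", "herba", "radix"]  # hypothetical numbered word list
cipher_words = encode("radix aqua radix", dictionary)
```

Here `to_roman(259)` gives "CCLIX", as in the example above. Such a scheme does produce "words" built from a small set of recurring blocks, which is why the idea is tempting.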

The second idea would be that the VM was written in an artificial language, in particular in one of the “logical” or “a priori” languages. (Check out Solresol for an example.) These artificial languages construct their words from “blocks” which do resemble the “core-mantle-crust” syllables found by Stolfi. But the first comparable logical languages date from at least two centuries after the VM was written, so their use is fairly unlikely.


Die Antwort der Teutonen

After all the suggestions for the VM which arrived from Russia and France over the last few weeks, with Michael Hadlich it’s now another German VM aficionado’s turn to throw his intellectual hat into the ring, so to speak:

I did a graphical analysis of the words on page f76r (http://www.voynich.nu/q13/index.html#f76). I found it strange that some words are written in different angles even when they stand very close. So I drew a rectangle around each word to see if there are words with same angle:
http://img513.imageshack.us/img513/2362/8r8l.png

It seems there are correlations between single words with exact the same angle. I’ll continue with the analysis to find some more relations. My first thought was about a cardan grille. But the letters often have very long ascenders.

Another point is that the words are written not as a whole sentence but letter by letter and word by word. It looks like the author stopped after each word, sometimes after each letter. Only a few letters are connected with ligatures. This is not typical for a natural language and a very inefficient way to write. When you look at the technique the letters are written, you can find that some letters are darker than others. This is due to the fact that one can’t write a lot of letters with this little amount of ink on the feather. BUT: Sometimes you can see phrases in same brightness with just one dark letter in the middle. That’s also very strange. Try to write with a feather and you see what I mean.

I have two possible solutions for the points above:
1) The text is not the original book but a copy from a person who did not understand the content.
2) The text is constructed using a mechanical device, let’s say a cardan grille or a wheel as shown on page f57v (http://www.voynich.nu/q08/index.html#f57). What looks like a word is just a symbol for a word in reality. Single “letters” are also symbols for words or numbers.

My guess is the use of a wheel as shown on page f57v. Maybe this wheel is not the one that has been used. It’s possible that the “real” wheel doesn’t exist anymore. But we can try to re-construct it.

Let’s have a look at the wheel on page f57v: Very interesting is the second circle / band (seen from outside). You can see 17 single symbols written once in each quadrant (N/E/S/W). This is the only circle (band) on the whole disc with a double line (key) at NW position. Obviously the disc is a device to set symbols in relation to words. One important question is: what mask (or cardan grille) is used to see the selected word / symbol and how is the wheel turned to point from a small symbol to a word symbol. I guess the mask looks like a disk too but with cutouts at some positions to see “words” and “symbols” thru these holes.

These are just my thoughts at the moment about the VMS, and I’m far away from a real solution. But I’ll keep on trying and tell you my findings.

I wanted to reply to Michael’s ideas, but due to my tardiness he has meanwhile apparently assumed that he had to take matters into his own hands, and has subscribed to the Voynich Mailing List, where there is now a lively discussion going on.

Still, I’d like to publish his ideas here, too. Thus, if you don’t feel like subscribing to the list (though I highly recommend it if you have any interest in the VM, or simply in a bunch of quirky individuals), please do discuss Michael’s ideas here!
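
As a possible starting point for that discussion, Michael's rectangle comparison could be automated roughly as follows. This is a sketch under assumptions only: it presumes that each word's baseline has already been measured as two end points in image coordinates, the names and the bucketing by a fixed tolerance are made up, and the coordinates in the example are invented, not measured from f76r.

```python
import math
from collections import defaultdict

def baseline_angle(x1, y1, x2, y2):
    """Angle of the line from (x1, y1) to (x2, y2), in degrees."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def group_by_angle(baselines, tolerance=1.0):
    """Bucket word labels by baseline angle, rounded to multiples of `tolerance` degrees."""
    groups = defaultdict(list)
    for label, (x1, y1, x2, y2) in baselines.items():
        bucket = round(baseline_angle(x1, y1, x2, y2) / tolerance)
        groups[bucket * tolerance].append(label)
    return dict(groups)

# Invented measurements: two level words and one slanted word.
baselines = {"w1": (0, 0, 10, 0), "w2": (5, 5, 15, 5), "w3": (0, 0, 10, 10)}
groups = group_by_angle(baselines)
```

Fixed buckets can split two nearly equal angles that happen to fall on either side of a boundary, so a real analysis would probably want a proper clustering step instead.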

Letter from France Encore

And here is the next missive from France, this time from Stephanie Levavasseur. Stephanie has much confidence in either my French faculties or in the abilities of Google Translate, and sent me her message in French, a language in which I consider myself a dilettante at best. Anyway, alors, mes enfants:

[Translated from the French:] I found these fragments in the book: Pseudo-Apulée, De medicaminibus herbarum liber, SIUE Herbarius (Pseudo-Apulée, Herbal) http://gallica.bnf.fr/ark:/12148/btv1b84262821

According to the BNF analysis that accompanies the book, more than 5 people annotated the book after the author.

Here are the pages of the manuscript on which the fragments are found:

What do you think? Similarities in certain letters? An attempt at encryption?

Now, if I do properly make sense of Stephanie’s mail, she has come across the herbal by Pseudo-Apulée and noted that, according to BNF’s analysis (whoever that is…), the various marginalia found therein were written by more than five different hands. This being somehow similar to the way the VM marginalia were written, she wonders whether this could be a crib to the VM. (Forgive me, Stephanie, if I mangled your message too badly!)

Now, this looks interesting, especially f135, with the top line which, to me, looks vaguely like Sanskrit, and the characters below looking like regular Latin letters which have been blown to pieces. But I guess it would be necessary to know a little more about the history of this book in particular.

The others from the Pseudo-Apulée look like “innocent” (ha!) writing exercises to me, with people writing down the alphabet, with the exception of the second part on f37v, which, at first glance, makes no sense to me, much like the VM marginalia.

Your opinions, ladies and gentlemen?

Thoughts from France

For some odd reason, my blog seems to attract mostly people not from the US, and here’s what Michel Travers from France sent me lately regarding the VM. It’s not really a “theory” but a few notes which deserve attention.

I’m a newcomer in the field, but eventually it occurred to me that too few attempts have seriously been made at identifying numbers. EVA alphabet has assigned all of the Voynich characters to letters and not even to one single number! Some researchers have tentatively suggested a 2, an 8, a 9, eventually a 4, but that’s all.
In that respect the choice of your main page with the drawing from 69r is probably not a coincidence: between the arms of the starfish we have 6 different characters with apparent counter characteristics. Then we have 22 radius-like, clearly identified sentences/sequences and another 16 on the brim the wheel, mostly all of them ending with one of those characters… The radius at “2 o’clock” seems to be a starting index. Interesting… Have you worked on this page ?

Michel, you seem to be the victim of a misunderstanding regarding EVA. EVA is not a suggested decipherment; it is simply a transcription, that is, a rendering of the Voynich characters in a digital, computer-readable format. The idea is no more and no less than to always assign the same transcription character to identical VM characters, and different transcription characters to different VM characters. (Of course, this is a difficult enough task in itself.) The purpose is to allow communication about the text through email, and to be able to run computerized statistics on it, so that two researchers on the VM list can exchange observations such as, “I’ve noticed that ‘qo’ seems to be always word-initial, while ‘dy’ is word-final.”

But which transcription symbol is used for any VM character is completely arbitrary, provided your system is consistent. Some researchers have chosen to include numbers in their transcription, because, well, it’s a fairly obvious thing to do to transcribe a character which looks like “8” or “9” with, you guessed it, “8” and “9”, respectively. In the case of EVA, visual similarity took second place to a representation which allows researchers to pronounce the VM transcription, i.e., most VM words can be spoken when transcribed into EVA,*) so two researchers can not only exchange mails about it, but even literally talk about EVA words.

But in no way was the EVA transcription meant to be a translation. EVA doesn’t assume any special meaning behind the VM letters; the EVA letters were simply codes chosen to represent the VM letters, much like ZIP codes are used to represent cities, but of course aren’t the cities.
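
The point that the choice of symbols is arbitrary can be shown in a few lines. This is a sketch with made-up data: the numeric glyph IDs and both transcription tables are hypothetical and are not the actual EVA assignments. As long as each table is consistent (one symbol per glyph, different symbols for different glyphs), the statistics come out the same.

```python
from collections import Counter

def transcribe(glyph_ids, table):
    """Render a sequence of glyph IDs using one transcription table."""
    return "".join(table[glyph] for glyph in glyph_ids)

def frequency_shape(text):
    """Symbol counts sorted from most to least frequent, ignoring which symbol is which."""
    return sorted(Counter(text).values(), reverse=True)

glyphs = [1, 2, 3, 2, 3, 1, 2]               # hypothetical glyph sequence
letters_table = {1: "o", 2: "d", 3: "y"}     # a letters-only table
digits_table = {1: "0", 2: "8", 3: "9"}      # a look-alike-digits table

as_letters = transcribe(glyphs, letters_table)
as_digits = transcribe(glyphs, digits_table)
same_statistics = frequency_shape(as_letters) == frequency_shape(as_digits)
```

The two renderings look nothing alike, yet `same_statistics` is true: the transcription carries the structure of the text, not its meaning.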

Does that clear up the misunderstanding?

(As for 69r, I haven’t done particular work on it, but Rich SantaColoma has, and I like his ideas…)

*) I think I (and others) have mentioned before that the fact that this is possible at all is most remarkable, but nobody really knows what to make of it.

From Russia, with Voynich

I’m finally getting around to processing the mails which have arrived over the last few weeks regarding readers’ ideas about the VM. (I apologize again to the posters for the delay!)

First of all in the queue is “Y.M.” (Yurij? Yewgenij?), hailing from Russia. He’s a self-declared newcomer to the VM (as opposed to us “old hands” who have been so successful in making heads or tails of it over the last years, apparently ;-), and he has a suggested reading for the notorious marginalia of the VM’s last page, f116v. (Click on the image to get a full-res display.)

M.V.last page

I’m not so sure what to make of it, because to me it looks like just another possible reading of the letter sequences, and one which doesn’t make any more or less sense than the scores which have been presented previously.

One distinguishing feature “Y.M.” failed to address explicitly, though, is the idea that the last “word” on the second line might actually be the astrological symbol for Capricorn, rather than the letter “m” as is usually believed. While inserting Capricorn here doesn’t make the short paragraph much more readable, in general it might be a good idea to include astrological/alchemical symbols into the “character repertoire” against which we try to match the marginalia.

Your ideas? Comments welcome, as always.