More about lines and curves

Brian Cham, who in April this year surprised the Voynich community with a razor-sharp deduction that none other than René Zandbergen is the actual author of the VM (probably nobody was more surprised to hear that than René himself ;-)), is at it again, but this time on a more serious note:

In a long blog post that is well worth reading, he presents the “Curve-Line System” which he has detected in the VM.*) He has attempted to poke through the undergrowth of the well-known Voynich word grammar rules: In the past, a number of people have tried to explain the obviously regular VM word structure with a bundle of more or less complex rules, and with more or less success. Brian now goes one step further and divides the better part of the VM characters based on the shape of their basic stroke, judging whether it’s “curvy” or “linear” (i.e. straight). Starting from this assumption, he arrives at a surprisingly simple set of rules which allow one to re-create the better part of the VM corpus.
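
To make the basic idea tangible, here is a minimal sketch in Python that reduces a transcribed word to its sequence of stroke classes. Please note the assumptions: the words are taken to be in a one-letter-per-glyph transcription, and the two glyph sets `CURVE` and `LINE` are placeholders I made up for illustration. They are not Brian’s actual table, and the function name `shape_pattern` is hypothetical as well.

```python
# Hypothetical glyph classes, for illustration only.
# These are NOT the classes from Brian's post; substitute his table.
CURVE = set("eoyds")
LINE = set("inrlm")

def shape_pattern(word, curve=CURVE, line=LINE):
    """Replace each glyph by 'C' (curve), 'L' (line) or '?' (unclassified)."""
    pattern = []
    for glyph in word:
        if glyph in curve:
            pattern.append("C")
        elif glyph in line:
            pattern.append("L")
        else:
            pattern.append("?")
    return "".join(pattern)

print(shape_pattern("daiin"))
```

With real classes plugged in, one could then tabulate which patterns actually occur in a transcription and which never do.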

While the statistics seem sound, though of course always open to attack, the beauty of this discovery lies in the fact that it doesn’t need to arbitrarily divide letters into different classes, but the division is done based on the shape of the letter, i.e. it’s “obvious”.

One thing which puzzles me, though, is that Brian apparently doesn’t discriminate between Currier A and B (except in his test in section 3.7.2). It would come as a surprise to me if the grammar rules actually held for both “languages” without modification.

Overall, I think Brian may be well on his way to finding the method by which the VM text was created. As yet, he hasn’t suggested how the Curve-Line System may be connected with the encipherment of a plaintext, or with the generation of meaningless pseudo-ciphertext.


*) I only learned about his post recently, but it seems it was already posted in late 2014.

The Last Word Hasn’t Been Spoken

In 2009 the McCrone institute did what all Voynicheros had been longing for for the longest time: They performed a scientific analysis on the VM.

As had been the case with the high resolution pictures of the VM, it was hoped that this new enterprise would yield more insight into the VM — and like in that case, it served to generate some confusion.

(You can read an abstract of the McCrone analysis).

Part of the McCrone analysis was the carbon dating which had established that the sheep which donated their skin for the vellum bleated for the last time around the 1450s, a result in line with previous assumptions based on the Sagittarius archer’s dress and crossbow, and on assessments of the writing style of letters and numbers.

The part which concerns us here right now (and which has caused a considerable stir on the VM mailing list lately, only five years after its original publication ;-) is their survey of the ink composition. Having taken some twenty samples from various corners of the VM (regular text, drawings, quire numbers and marginalia), they come to these conclusions:

  • (While the ink isn’t of uniform composition) … “We found no significant differences between the writing inks (for the main body — ev) and the drawing inks used throughout the document and tentatively conclude that the text and drawings were most likely created contemporaneously”
  • For the page numbers, for the quire numbers, and for the Latin alphabet on f1r, three different inks were used, which are also different from the main body inks.

So far, this is in line with what had been assumed all along: The writing was done at the same stage as the drawings (possibly with the colouring coming at a later time). Over the course of time, the VM had been disassembled and rebound (the discussion about this process can be found on the web), at which time the current page and quire numbers were added. The marginalia were also written after the main body of text, probably by a later owner of the VM.

The question of whether the marginalia were written at the same time as the rest of the document has a large bearing on the “fake” discussion which is currently going on:

a) If both were written at the same time (with equivalent inks), it would strongly point to a fake, because it would be fairly unusual for the author to write marginalia in his own book, especially if, while the body was written in Voynichese, the marginalia are in Latin letters.

b) If OTOH the inks were different, this would indicate a genuine book which went through various hands and had been annotated at various points in time.

At first glance, McCrone seems to support b), were it not for a small detail: Among the samples for the “main” body, there was also the notorious sample #16, which was taken from f116v, the very last page of the VM, with the “anchiton oladabas” marginalia.

So this seems to paint the following picture:

In a first phase, the main body of the VM with its Voynichese and the illustrations was created. This includes the marginalia on f116v. Only at a later stage were the “pure Latin letter” marginalia and the page/quire numbers added.

So, interestingly enough, we end once more in a peculiar situation: While some parts of the marginalia point to a “genuine” MS, the biggest and most prominent piece of marginalia, the one on f116v, seems to have been applied together with the rest of the writing, and would hint at a fake.

Scandinavia Hailing the Arabs

And another incoming message, this time from Peter Ole Kvint:

I would guess that the text is Berber. Berbers have had several writing systems but none of them to be suitable for writing with quill and ink.

When you consider how cookbooks copied today, then herbal books from the Middle Ages be copied copy. If you have a list of herbs from a book then most could be found in the illustrations. Or the options limited.
Since the patterns seen in other books, so it must be possible to retrieve the same recipes. And thus rediscover plant names.
Note that on some artwork, the plants are cut and put back together on the thicker root. These plants may be large, perhaps trees and shrubs.

sincerely,

Peter Ole Kvint

Period wordlists

Dan wrote already some time ago, and again I must apologize that I’m currently fairly busy with other projects, and hence can’t devote as much time to the VM as I should. Nevertheless, I finally should give him the floor:

Yeah yeah, here’s another theory. Actually I’m not going into the theory, but simply asking if you can provide any assistance in resources I am seeking. Let me back up a bit – I’m a full time software developer of over 20 years, and I have had some insights regarding the manuscript. I’ve written software to generate various statistics about the document and have found some surprising and very obvious (once distilled down to hard numbers) patterns that further validate the insights. These are not “hunches”, or “gut feelings” or any mystical, nutty stuff. It’s simply what it is, and the analysis doesn’t lie.

I am currently running brute force deciphering attempts using additional software I have developed, based on my theory of how the document is ciphered. The main resource I am lacking at this time are simply word lists of the candidate languages the manuscript may have been written in (in its decoded form of course), and specifically, the vernacular and spelling of those languages when the manuscript was written in the 1400s.

I have always assumed the Voynich manuscript was a hoax, but when it was positively dated a few years ago I took a harder look, again with the expectation that it was a hoax but at least a hoax contemporary to the 15th century. My attempt was actually to prove (just to myself) that very thing – that it is just a contrived hoax. Unfortunately the insights and analysis I have done over the last few years have left no other option but to follow the logical progression until it peters out and comes to a dead end. I have not yet reached that point.

Thanks for your time, and again, if you know of simple word lists (or who can provide them or assist in that) of good candidate languages from the 15th century, that would be quite helpful.

This question isn’t so easy to answer. First of all, even when taking the C14 dating of the vellum as a given, we still have about a century of leeway regarding the actual production date of the manuscript. A century is a long time in which languages can change.

Secondly, languages weren’t “codified” as strictly as they are today, and pretty much everyone would write down their MSs in their local dialect, not to mention the fact that strict orthography wasn’t enforced yet either. Which means that even two people from the same region writing at the same time wouldn’t necessarily employ the same spelling. (An extreme example of this is the Bayeux Tapestry (admittedly predating the VM by some 400 years), where the name of William the Conqueror is written IIRC in no fewer than seven different manners.) Hence, to make a long story short, any word list should be taken with a grain of salt.

I did some statistics in the past myself, and to get decent wordlists I simply went to Gutenberg.org, downloaded a few works I considered representative of the era, and ran my own little wordcount scripts on these files.
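
For what it’s worth, such a word count script needs only a few lines. The following is a minimal sketch in Python, not my original script; the function name `word_frequencies` and the Latin sample line are stand-ins for illustration, and in practice you’d read the text from the downloaded file instead.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercased word tokens in a plain-text string."""
    # [^\W\d_]+ matches runs of letters only (accented letters included),
    # so digits, punctuation and underscores act as separators.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)

# Stand-in sample; replace with the contents of a downloaded text file.
sample = "In principio erat Verbum, et Verbum erat apud Deum"
freq = word_frequencies(sample)

# Print the most frequent words first.
for word, count in freq.most_common(3):
    print(word, count)
```

Keep in mind that such a list inherits all the spelling quirks of the edition it was made from, which is exactly the grain of salt mentioned above.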

IMHO, prime candidates for the plaintext languages are Latin, English, French, German (including the various dialects like Swiss), and perhaps Spanish. But though I wouldn’t bet on it, more exotic options like Hungarian, Finnish or maybe the Lingua Franca can’t be ruled out either.

Sorry, but this is probably a less simple answer than you asked for.

The Cutest Critter

James recently asked me:

Just wondering what type of animal you think it is eating what is believed to be a Woad Plant on f25v

referring to this cute little critter.

I don’t think it’s supposed to be a real animal. My guess is it’s a little dragon; the scaly back, the comparatively short legs, the ears and comb on the neck, and the fact that it may only have two legs seem to be a good match to me. Compare here for a 15th century painter’s (Uccello’s) idea of what a dragon is supposed to look like.

“Censorship”

Lately, I received a brief message —

Censorship should be consistent

which I feel deserves a bit of comment because it represents a widespread, but false notion from the web. (I presume the missive didn’t allude to the general state of international politics, but referred to my decision to block some user comments on my blog.)

First of all, “censorship” means the suppression of information or opinion, usually through a public body. This is definitely not the same thing as deciding to ignore a contribution.

But, secondly and more importantly, you seem to feel you are entitled to use my blog for your messages. This is simply wrong: contribution here is a privilege I grant (or withhold), as is the case with any private web page. My blog is not a public place to which anyone should have access, but a private BBQ I hold in my backyard. You are invited to drop by and share the party, but if you act inappropriately, I’ll kick you out, and, as the digital landlord, here I’m the sole arbiter of what constitutes appropriate behaviour. Simple as that. Play somewhere else.

Considering that I work for the maintenance of this site, that I’m legally responsible for the contents and that finally I’ll also be judged on the merits of the contributions here, I feel this is only fair. You’re free to go any other place, and party and voice your opinion there, and you will find I do nothing to hinder your free speech there. (That would be censorship.)

So while you’d be able to publicize on your own, you prefer to leech off the infrastructure provided by me, insulting me with claims of “censorship” when I refuse to comply. This in itself should justify blocking your access.

“This conversation can serve no further purpose.”

A Plea to all Voynicheros

If you pursue a theory, please keep your website up-to-date.

As happened several times on the Voynich list during the last weeks, readers were encouraged to test others’ deciphering schemes based on publications on certain websites, but ran into dead ends or couldn’t arrive at the same results as the original poster. Only later or after complaining about this were they told that the information on the website was outdated.

This is impolite, because it wastes your readers’ time, it will irritate them and discourage them from getting seriously engaged with your theory, it will make them miss the point (of testing your theory), and it will do your reputation in general no good. So, it’s a win-win if you first update your website and then publicize it.

Thanks!

We put the “Brute” in the “Force”

There is no end to the string of new theories and commenters on this blog. (Keep ’em coming, boys and girls!) Today it’s Zach, who sent me more of a general question than a theory:

Please forgive me if this has already been covered in your site and I’m just not seeing it, but hasn’t anyone tried feeding the VM into a computer and brute-forcing it? Computers are really good at trying every possible combination and weeding out the ones that don’t make sense, so it seems like they ought to be perfect for grinding away at the VM by one means or another.

Any idea whether computers have been tried?

And he immediately followed it up, for good measure, with some more detail:

Me again. I’m elaborating a bit, in case I really am
a) the first person to think of this and
b) there’s really no reason it wouldn’t work

I did see one article here on computers and frequency analysis, but the author made a good point about not knowing the language and so not knowing the frequency rules. I envision a brute-force that attacks the words rather than the letters. It would work like this:
First, your computer is fed lots and lots and lots of example texts, the more the better, so it can build a probability map that stores the likelihood of any given word being followed by (or just found near) a second word, and given those two words, what third word is likely to come next, and so on. You will never get firm answers, but this is fuzzy logic – as long as we can grade potential sentences as more or less likely to be linguistically correct and sensible, we are good to go.
Once the computer has this probability map, the rest is just a matter of brute force:
Start with one of the repeating ‘words’ in the VM.
Assign it an arbitrary English meaning.
Using that meaning, and the probability map, assign English words to each word preceding and following your starting word. Branch outward from there until you have ‘translated’ all words. Use your probability map to judge how likely it is that the ‘translation’ is actual English sentences (best if you can ignore word order and focus on word proximity, because word order falsely assumes they had the same grammar as we do). Store that probability and start over with a new guess for your first repeating word. Repeat over and over until you’ve found a high-probability match or, more likely, you’ve run out of choices.
Assuming you’ve not found a 100% match, present the 10 most probable ‘translations’ for human inspection.
Compiling tables of synonyms would also improve things; that way the computer could also consider how likely CONCEPTS are to be grouped together, since English and VM-speak are very unlikely to have a 1:1 mapping.

The core assumption is that, even if there are no direct word matches, there’s a 1:1 map of concepts between VM-speak and your target language (English, in my case). I think that’s a safe assumption, because if there’s no such mapping, then it seems translation would be impossible, like trying to translate a 1-time-pad cipher without the key.

Anyhoo, I hope you enjoy my ramblings. Thanks for listening.

Cheers.

Thanks, Zach, for your input.
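
For the record, the “probability map” Zach describes is essentially what is known as a word bigram model. A minimal sketch of it in Python might look as follows. It covers only the scoring step, not the search over word assignments, and several things in it are my assumptions rather than part of Zach’s proposal: the tiny made-up English corpus, the add-one smoothing for unseen word pairs, and the function names.

```python
import math
from collections import Counter, defaultdict

def build_bigram_map(tokens):
    """Map each word to a Counter of the words seen directly after it."""
    following = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        following[first][second] += 1
    return following

def log_score(candidate, following, vocab_size, alpha=1.0):
    """Sum the smoothed log-probabilities of each adjacent word pair.

    Add-alpha smoothing (an assumption of this sketch) keeps unseen
    pairs from getting probability zero.
    """
    total = 0.0
    for first, second in zip(candidate, candidate[1:]):
        seen = following.get(first, Counter())
        numerator = seen[second] + alpha
        denominator = sum(seen.values()) + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total

# Toy training text, invented for this example.
corpus = "the root of the plant is boiled and the root is dried".split()
model = build_bigram_map(corpus)
vocab = len(set(corpus))

plausible = log_score("the root is boiled".split(), model, vocab)
scrambled = log_score("boiled the is root".split(), model, vocab)
print(plausible > scrambled)
```

A higher (less negative) score means the candidate “translation” looks more like the training text. The expensive part, of course, is the number of candidates one would have to score.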

For a start, there is actually a wealth of computer power which has been pumped (more or less in vain) into the black hole of information we call the Voynich. I myself tried a little with my Stroke theory, for example, but these efforts are dwarfed by people like Jorge Stolfi or Julian Bunn, among others. But these mostly focus on analysis of the VM, not directly on a translation. Why is this?

Well, a brute force attack is hampered by several constraints:

  1. Our statistical material is limited. The VM comprises some 130,000 characters, which appears to be a lot. But when you look at it, that’s only some 30,000 words. If you further take into account the different encoding schemes (aka “Currier A” and “Currier B”, resp.), which differ subtly but do differ, you’re left with only a sample of some 15,000 words, which isn’t that much.
  2. We don’t know the plaintext language underlying the VM. English is possible, yet some of the marginalia point to French or Spanish, the images provide hints to Italy, and some clues point to Germany, not to mention that Latin would have been the lingua franca of the era.
  3. We know next to nothing about the subject matter, and accordingly little about the vocabulary used.
  4. We’re unclear about the ciphertext alphabet. We have really no idea whether the sequence of two connected “c”s really means “two ‘c’s in a row”, or is a completely different letter. (Compare this to the case of Latin letters where “nn” is something completely different than “m”.) We don’t know if the “drops” above some “cc” groups only modify the underlying letter(s) (compare “O” -> “Ö”), or if they make it a completely different letter (compare “O” -> “Q”).
  5. Some characters like the notorious “gallows” tend to appear only paragraph-initially or in the first line of a page. They may be embellishments of “regular” characters (as was often done in manuscripts of the era), but we don’t know which “regulars” they’d replace.

But there is one obstacle even greater than these, and much more fundamental: Any “brute force” attack would presume that the ciphertext words of the VM are mapped 1:1 from the plaintext words. And this is extremely unlikely for a number of reasons:

  1. The ciphertext alphabet seems to consist of around 17 frequent letters, plus a large number of rare “weirdos”. That maps poorly to a Latin alphabet.
  2. Some frequent letter groups show up almost exclusively word-initial (“qo”) or word-terminal (“dy”). That’s unknown for any Central European language.
  3. Word-length distribution is odd: There is a shortage of both very short and very long words; words have a comparatively uniform length. Again, this is unusual for Central European languages.
  4. Overall, the words exhibit a very regular structure; check out Stolfi’s “Core-Mantle-Crust” paradigm. (Yes, it’s a tough read, but worth working through if you want to understand the VM.) The words are composed according to a fairly rigid “grammar”, the like of which is unknown in European languages.
  5. Nobody has been able to identify particles and articles (“a”, “and”, “with”…) in the VM.
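
Points 2 and 3 in particular are easy to check for yourself against any transcription. The following Python sketch shows one way to do it, under a few assumptions: the words are already split at spaces, the transcription uses one letter per glyph, and the six-word sample is a toy list I put together for illustration, not an actual line from the VM. The function names are mine.

```python
from collections import Counter

def length_histogram(words):
    """Count how many words there are of each length."""
    return Counter(len(w) for w in words)

def positional_share(words, group):
    """Return (initial, final): the fractions of all occurrences of
    `group` that start a word and that end a word, respectively."""
    total = initial = final = 0
    for w in words:
        start = w.find(group)
        while start != -1:
            total += 1
            if start == 0:
                initial += 1
            if start + len(group) == len(w):
                final += 1
            start = w.find(group, start + 1)
    if total == 0:
        return 0.0, 0.0
    return initial / total, final / total

# Toy sample for illustration only, not a real VM line.
words = "qokeedy shedy qokain daiin chedy qokal".split()

print(sorted(length_histogram(words).items()))
print(positional_share(words, "qo"))
print(positional_share(words, "dy"))
```

Run over a full transcription, and over a period text of your choice for comparison, this should make the difference in the distributions visible at a glance.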

All of these differences between natural languages and the VM make it highly unlikely that the enciphering mechanism simply always turned plaintext word “A” into ciphertext word “X”, and “B” into “Y”.*) I’m convinced that one VM word is not equivalent to a plaintext word, but rather that it only represents a few letters.

There are other assumptions: Don of Tallahassee assumes it’s a list of highly abbreviated recipes, David Suter presumes it could be encoded geographical coordinates. Theoretically, all these schemes could be attacked by brute computer force, but this would only make sense once the enciphering method was sufficiently clear. And exactly this is not the case: to my knowledge, no theory has been put forth which would sufficiently explain all the peculiarities we observe in the ciphertext, and hence there’s simply no starting point from which a computer programme could launch.

*) There are actually two scenarios where it would be just conceivable that there is a 1:1 correspondence between plaintext words and cipher words.

One is that the VM was written with the aid of a dictionary, where all plaintext words were numbered, and in the VM their numbers were written down not in Arabic numerals, but in something like the Roman numbering system, i.e. word “259” in the dictionary would have been written “CCLIX”. While this is conceivable, up to now nobody has been able to provide a coherent numbering system which would result in the “word grammar features” mentioned above.
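
To illustrate the mechanism (and only the mechanism), here is a small Python sketch of such a dictionary cipher using plain Roman numerals. The two-word codebook and the function names are invented for the example; the real scheme, if it ever existed, would have needed a different numbering system to produce the observed word structure.

```python
# Value/symbol pairs for standard Roman numerals, largest first.
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n):
    """Convert an integer from 1 to 3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("n must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def encipher(plaintext, codebook):
    """Replace each plaintext word by the numeral of its codebook number."""
    return " ".join(to_roman(codebook[word]) for word in plaintext.split())

# Hypothetical codebook: plaintext word -> position in the dictionary.
codebook = {"herba": 259, "aqua": 14}

print(to_roman(259))
print(encipher("aqua herba", codebook))
```

Note that every cipher “word” here corresponds to exactly one plaintext word, which is the 1:1 property a word-level brute force attack would need.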

The second idea would be that the VM was written in an artificial language, in particular in one of the “logical” or “a priori languages”. (Check out Solresol for an example.) These artificial languages construct their words from “blocks” which do resemble the “core-mantle-crust” syllables found by Stolfi. But the first comparable logical languages date from at least two centuries after the VM was written, so their use is fairly unlikely.

Die Antwort der Teutonen

After all the suggestions for the VM which arrived from Russia and France over the last few weeks, with Michael Hadlich it’s now another German VM aficionado’s turn to throw his intellectual hat into the ring, so to speak:

I did a graphical analysis of the words on page f76r (http://www.voynich.nu/q13/index.html#f76). I found it strange that some words are written in different angles even when they stand very close. So I drew a rectangle around each word to see if there are words with same angle:
http://img513.imageshack.us/img513/2362/8r8l.png

It seems there are correlations between single words with exactly the same angle. I’ll continue with the analysis to find some more relations. My first thought was about a cardan grille. But the letters often have very long ascenders.

Another point is that the words are written not as a whole sentence but letter by letter and word by word. It looks like the author stopped after each word, sometimes after each letter. Only a few letters are connected with ligatures. This is not typical for a natural language and a very inefficient way to write. When you look at the technique the letters are written, you can find that some letters are darker than others. This is due to the fact that one can’t write a lot of letters with this little amount of ink on the feather. BUT: Sometimes you can see phrases in same brightness with just one dark letter in the middle. That’s also very strange. Try to write with a feather and you see what I mean.

I have two possible solutions for the points above:
1) The text is not the original book but a copy from a person who did not understand the content.
2) The text is constructed using a mechanical device, let’s say a cardan grille or a wheel as shown on page f57v (http://www.voynich.nu/q08/index.html#f57). What looks like a word is just a symbol for a word in reality. Single “letters” are also symbols for words or numbers.

My guess is the use of a wheel as shown on page f57v. Maybe this wheel is not the one that has been used. It’s possible that the “real” wheel doesn’t exist anymore. But we can try to re-construct it.

Let’s have a look at the wheel on page f57v: Very interesting is the second circle / band (seen from outside). You can see 17 single symbols written once in each quadrant (N/E/S/W). This is the only circle (band) on the whole disc with a double line (key) at NW position. Obviously the disc is a device to set symbols in relation to words. One important question is: what mask (or cardan grille) is used to see the selected word / symbol and how is the wheel turned to point from a small symbol to a word symbol. I guess the mask looks like a disk too but with cutouts at some positions to see “words” and “symbols” thru these holes.

These are just my thoughts at the moment about the VMS, and I’m far away from a real solution. But I’ll keep on trying and tell you my findings.

I wanted to reply to Michael’s ideas, but due to my tardiness he has meanwhile apparently assumed that he had to take matters into his own hands, and has subscribed to the Voynich Mailing List, where there now is a lively discussion going on.

Still, I’d like to publish his ideas here, too. Thus, if you don’t feel like subscribing to the list (though I highly recommend it if you have any interest in the VM, or simply in a bunch of quirky individuals), please do discuss Michael’s ideas here!