Julian Bunn’s Introduction Booklet

In the course of my renewed interest in the Voynich, I also browsed through my old notes and literature. On my bookshelf, I was reminded of Julian Bunn’s small booklet, Puzzles of the Voynich Manuscript.

It’s a most readable introduction to the Voynich from the point of view of a new acolyte who wants to tackle the VM. Julian covers most of the relevant features of the VM in a concise and easy-to-grasp format, so you will know what you’re up against if you attempt any decipherment. Checking off this list of features will also help you with a first “sanity check” once you have come up with an idea: if your theory doesn’t accommodate the observed effects, it probably won’t hold water.

At the same time, Julian doesn’t champion any particular idea, but presents ideas and arguments from other people in a most unbiased manner and weighs their pros and cons. If I had to find fault with anything, I’d say the layout could have done with a little more TLC. The font is a little too big, the pictures are a little too small, and the pagination is sometimes awkward. At A4 the book is a bit unwieldy; a more compact pocket-format “vademecum” might have served better, but YMMV. Quite obviously Julian prioritized content over presentation, which need not be a bad thing.

Overall, get the book. You won’t regret it: it’s a short read of highly concentrated insight to kickstart your VM research career.

You may obtain it from Amazon through the link above, for free as a digital version, or for a small obol in printed form.

Putting the Cart Before the Horse

Lately I complained that all the “old guard” of the VM had apparently quit, but I found out that I had thoroughly misunderstood the situation. The venerable Julian Bunn, for example, is crunching numbers as busily as ever; the last time I checked his blog, I simply had the bad luck of visiting during a period of inactivity. Likewise, the highly esteemed Rene Zandbergen has published a new paper about the VM, and both seem to be working on a method of recreating the VM content semi-automatically using a number of wheels containing “syllables” which are constantly rearranged to form VM words: a “Voynich slot machine”, if you pardon the expression. So their work goes in the same direction as mine, though Rene’s approach appears to focus more on a three-component model of word composition, rather than the two-component model of Robert Firth and me.

Regardless, both approaches try to compose the VM vocabulary from a limited number of building blocks. When perusing Rene’s essay, it occurred to me that it might be helpful “to put the cart before the horse”.

What if we used these composition methods to see which of the words they can compose do not actually occur in the VM? Would we gain insight from that? Robert’s method, taken naively, will only compose some 500 different words (compared to some 3000 different words in total in the VM, many of them very rare). Are all 500 of these frequent in the VM? Or is it only 250, while the other 250 are never used in the VM?

If there were two sets of 25 building blocks, they could compose between them 25 × 25 = 625 different words. Are 500 of them frequent in the VM, and the other 125 nonexistent? If we also measure the frequency of two-letter combinations in a natural language (say, Italian), and find that 125 are “forbidden” and don’t occur,1 do we have a strong indication that each VM word represents two letters from an Italian plaintext…?

It hadn’t occurred to me before that checking for exclusion rather than inclusion might be useful.
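To make the exclusion idea concrete, here is a minimal Python sketch, assuming a simple two-component model (every word is one prefix block plus one suffix block) and a table of word counts. The block sets, the toy corpus and the name absent_compositions are all made up for illustration; they are neither Robert’s actual blocks nor real VM counts. With two sets of 25 blocks, the same loop would simply run over all 625 compositions.

```python
from collections import Counter
from itertools import product


def absent_compositions(prefixes, suffixes, word_counts, min_count=1):
    """Return every prefix+suffix composition seen fewer than min_count times."""
    absent = []
    for prefix, suffix in product(prefixes, suffixes):
        word = prefix + suffix
        if word_counts.get(word, 0) < min_count:
            absent.append(word)
    return absent


# Hypothetical toy data: NOT Robert Firth's block set and NOT real VM counts.
prefixes = ["qok", "qot", "ch"]
suffixes = ["aiin", "eedy", "ol"]
corpus = "qokaiin qokeedy chol qokeedy chedy qotaiin chol qokaiin".split()
word_counts = Counter(corpus)

missing = absent_compositions(prefixes, suffixes, word_counts)
total = len(prefixes) * len(suffixes)
print(f"{total} compositions, {len(missing)} never occur: {missing}")
# 9 compositions, 5 never occur: ['qokol', 'qoteedy', 'qotol', 'chaiin', 'cheedy']
```

Raising min_count turns “never occurs” into “is rare”, which is closer to the question of which compositions are merely infrequent in the VM.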


  1. E.g., in German, “c” is almost always followed by “h” or “k”, but not by any other letter. ↩︎

Would Elias Schwerdtfeger come to the courtesy phone, please?

Elias, if you read this (or if anybody else knows his whereabouts): I recently tried to access your wonderful and convenient Voynich transcription extractor (http://vib.tamagothi.de/index.php), but while it still seems to be up, it appears to have issues with separating the Currier languages. (My guess is that the corresponding PHP script is no longer supported.) Can you fix this?

Or does anybody know of a comparable tool which lets you filter for certain VM transcriptions? Any help welcome.

Time for an Inventory

It’s that time of the year again — Christmas season. I’ve been very quiet on this blog for some time, but I’ve lately picked up some interest in the Voynich again, and I thought I’d use the opportunity to make an inventory of sorts — where do we stand, where do we want to go, and how will we get there? Other things aside, I opened this blog in January 2009, so it’s been around close to 15 years, and perhaps it’s time to reflect.

Oddly enough, since the hi-res scans from the Beinecke became available, and in the wake of the McCrone analysis of the VM’s physical properties, very little progress appears to have been made. IIRC, when I took an interest in the VM, the mailing list was brimming with ideas and competent analysis, but AFAICT this has ebbed over the last years. The voices of Rene Zandbergen, Julian Bunn, Jorge Stolfi, Marke Fincher and many others from the “early days”, when computer power first became available for number crunching, seem mostly quiet these days. Rich SantaColoma still holds to his theory that the VM is a fake by Voynich himself, but AIUI he has also stalled in finding definite proof for this assumption. (Or did I miss breaking news, Rich?) Other than that, the same old ideas are rehashed endlessly, namely that the VM is in one way or another the phonetic transcription of some kind of spoken language, usually Proto-Etruscan with a hint of Rongorongo, and that it contains a selection of cocktail recipes for the Antipope’s personal barkeeper. Or some such thing.

This is even more frustrating since the very first thing you notice about the VM writing is that it is obviously different from all known natural languages. (At the same time, it can be transcribed so that it is almost pronounceable, which is what Rene and Gabriel Landini did with the EVA transcription, and this is a most remarkable thing, but nobody seems to have taken up the hint.) So the people running down that alley simply didn’t invest the time and effort to get their basics right, and I’m under the impression that all the serious researchers have given up by now (or at least continue their work in seclusion), and the field is left to the clueless.

Personally, I think it was as early as 2010 that I had the idea that the VM’s encipherment may be based on the graphic dissection of the source letters. Namely, the letters of the plaintext would be decomposed into their original graphic elements (lines, circles, arcs), or correspondingly into their pen strokes. Thus, the letter “A” would be disassembled into three elements, namely a slash “/”, a horizontal hyphen “-”, and a backslash “\”. These three elements could be reassembled into the original plaintext letter: “/-\”. Now all the VM author had to do was assign one ciphertext letter to each element and write that down: “/” might be represented by <q>, “-” by <o>, and “\” by <c>, so the plaintext letter “A” would turn out as <qoc> in the VM ciphertext. And of course the graphical elements would turn up in different letters of the plaintext alphabet. “V” might be decomposed into “\/” and “W” into “\/\/”, so the ciphertext for “V” would be <qc>, and that for “W” <qcqc>. (Read the whole lengthy story here.) This scheme would have accounted for a large number of features of Voynichese, though not all of them.
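A minimal Python sketch of this stroke encipherment, assuming the toy decompositions and the stroke-to-glyph assignment from the example above (STROKES, GLYPH and encipher are illustrative names, not part of any existing tool). Note that reading “\/” strictly left to right yields <cq> for “V” rather than <qc>, which shows how much depends on the stroke order one assumes; either way, “VV” and “W” end up identical.

```python
# Hypothetical decompositions of plaintext letters into strokes, read left to right.
STROKES = {
    "A": ["/", "-", "\\"],        # "/-\"
    "V": ["\\", "/"],             # "\/"
    "W": ["\\", "/", "\\", "/"],  # "\/\/"
}

# Hypothetical stroke -> ciphertext glyph assignment taken from the example.
GLYPH = {"/": "q", "-": "o", "\\": "c"}


def encipher_letter(letter: str) -> str:
    """Replace each stroke of one plaintext letter by its ciphertext glyph."""
    return "".join(GLYPH[stroke] for stroke in STROKES[letter])


def encipher(plaintext: str) -> list[str]:
    """Encipher letter by letter; each letter becomes one glyph group."""
    return [encipher_letter(ch) for ch in plaintext.upper()]


print(encipher("A"))            # ['qoc']
print("".join(encipher("VV")))  # cqcq
print("".join(encipher("W")))   # cqcq
```

Here each plaintext letter yields one glyph group; how such groups would be merged into VM “words” is deliberately left open.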

In any case, having built upon Robert Firth’s observation, I ran into a dead end. The scheme as suggested leaves open a huge number of degrees of freedom and ambiguities.1 Most obviously, we don’t know the source language of the VM, hence we can’t directly apply anything like frequency statistics. There are many different ways to decompose letters, and there are many different letter forms to choose from to start with (compare printed letters and cursive). So the apparently easy task of finding the 50 or so building blocks which make up the majority of the ciphertext leads to a virtually unlimited number of possible combinations one would have to try. So I wasn’t even able to verify whether the Stroke theory was the right idea or whether I was barking up the wrong tree. And even if a Voynich fairy had told me, “Elmar, you’re on the right track,” I wouldn’t have known how to go on from there. This is where I stalled.

Anyway, maybe with a fresh look at things, I’ll finally have the grand idea which allows me to write a few lines of code, let the computer crunch the numbers for a few hours while I recline in my chair, and then reap success, fame and fortune. So what needs to be done to reach that stage? Here are a few steps which I might tackle in the coming year, to see where this leads me:

  1. Check the VM character set. Of course, this lies at the heart of everything. Though the Stroke theory should not be too sensitive to errors in the character set (thanks to Rene’s and Gabriel’s foresight, and because such errors could be worked around), it would of course be best to set out with the correct set of characters from the start.
  2. Define a vocabulary for working with the Stroke theory. I feel that in the past I have confused people tremendously by assuming that everybody uses the same definitions for terms like “word”, “token” or “syllable”. I need to clarify the vocabulary to be able to have meaningful discussions with people.
  3. Check Robert Firth’s assumptions about the building blocks. Hitherto I have accepted the “set” of building blocks Robert claimed would constitute the majority of the VM text. But it certainly wouldn’t hurt to check whether this really is the optimal building-block set. Likewise, it would be reasonable to extend Robert’s work, which to my knowledge was done on Currier language “A”, to Currier “B” as well.
  4. Check two-letter statistics. Under the original Stroke theory2, VM words can represent any number of plaintext letters. Robert claimed that most VM words represent one or two plaintext letters.3 So it would be reasonable to compare plaintext language statistics of two-letter groups with the statistics for VM words. If, for example, “st” and “rt” turned out to be the most frequent letter pairs in a certain language, and if the most frequent VM words were <qocheedy> and <qokeedy>, it would be reasonable to assume that “s”, “r” and “t” were represented by the Firth blocks <qoch>, <qok>, and <eedy>, respectively.4 (Of course, not knowing the VM plaintext language complicates matters here once more.)
  5. Do more menial labor. Yes, this is what I like doing least. But perhaps I should finally get around to delving into the chores of finding out what the Stroke theory means for the VM language. If there is something to it, what can we deduce from that fact? For example, if we assume that a vertical line “|”, a horizontal hyphen “-”, and a circle “o” were used in the “(de-)construction set”, it would immediately follow that sequences around “|” should abound. “I”, “P”, “B”, and “d” could be disassembled into “|”, “|o”, “|oo”, and “o|”, while “L”, “F”, and “E” would turn into “|-”, “|--”, and “|---”. The latter three groups could be equivalent to <in>, <iin>, and <iiin> (just in reverse order), but what about the others? Can they be found as well?
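Steps 3 and 4 could start out with something as simple as the following Python sketch. It assumes a plaintext sample, a table of VM word counts and a hand-picked list of building blocks; the sample text, the counts and the helper names (letter_bigrams, rank_pairing, split_blocks) are hypothetical, and the greedy longest-match split is just one convenient way of cutting a word into blocks, not Robert’s method. Pairing purely by frequency rank is, of course, only a very crude first comparison.

```python
from collections import Counter


def letter_bigrams(text):
    """Count adjacent letter pairs inside each word of a plaintext sample."""
    counts = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        for a, b in zip(letters, letters[1:]):
            counts[a + b] += 1
    return counts


def rank_pairing(plain_counts, vm_counts, n):
    """Naively pair the n most frequent bigrams with the n most frequent VM words."""
    top_plain = [pair for pair, _ in plain_counts.most_common(n)]
    top_vm = [word for word, _ in vm_counts.most_common(n)]
    return list(zip(top_plain, top_vm))


def split_blocks(word, blocks):
    """Greedy longest-match split of a VM word into blocks; None if it fails."""
    ordered = sorted(blocks, key=len, reverse=True)
    parts, i = [], 0
    while i < len(word):
        for block in ordered:
            if word.startswith(block, i):
                parts.append(block)
                i += len(block)
                break
        else:
            return None  # no block fits at position i
    return parts


# Hypothetical inputs: a made-up plaintext snippet and made-up VM word counts.
plain = letter_bigrams("stato stare arte")
vm = Counter({"qocheedy": 5, "qokeedy": 3, "daiin": 1})
print(rank_pairing(plain, vm, 2))
print(split_blocks("qokeedy", ["qoch", "qok", "eedy"]))  # ['qok', 'eedy']
```

A greedy split can fail on words that some other segmentation would cover, so a serious test of a block set would need to try all segmentations.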

So this might actually be my homework for the next few weeks and months. Let’s see where it takes me.


  1. E.g., as you see above, both “VV” and “W” in plaintext would lead to <qcqc> in the ciphertext, and it would be impossible to resolve this when deciphering, though there would be ways around it. ↩︎
  2. It would probably be better to call it the “Stroke hypothesis”. ↩︎
  3. This would also fit in with the observation that most Currier B words can be split into one mandatory and one optional part. See Grammar. ↩︎
  4. Please note that this and all other examples here are completely arbitrary, and were just the first things that came to my mind. I don’t expect any of this to hit on the actual truth. ↩︎

Strangest Things

Having recently watched the Science Channel docu series Strangest Things, I noticed that they covered the Voynich Manuscript as well. (It was episode 7, IIRC.)

Overall I like the series very much for its down-to-earth examination of riddles, preferring the plausible over the spectacular and keeping a (let’s use the dirty word:) scientific mindset rather than an esoteric one. I found the coverage of the VM fairly balanced and thorough enough overall. (Maybe they didn’t put enough emphasis on the cryptological aspects, but this may be my personal bias. ;-))

I found it interesting that they examined the plausibility of the VM being a “historical” or modern forgery (Hello, Rich!). I especially liked the new angle they gave it by suggesting the VM may not have been a hoax perpetrated by Voynich himself, but by the Villa Frascati team, meaning they duped Voynich because they themselves were in dire need of money.

It’s a nice twist, though IMHO a fanciful one. One crucial aspect is that the VM is missing almost all the Christian iconography present in medieval art. I doubt the Jesuits would have violated their work ethos so blatantly when they could just as well have used the opportunity to insert spectacular new theological content.

Notes from the Past

Lately I’ve been browsing through an old notebook in which I kept notes about my Voynich studies. In June 2004 (boy, have we made progress since…!) I jotted down some ideas I had completely forgotten about by now:

“Apparently there are medieval ciphers where one vowel and the following letter are encoded with the same character: “a”/“b” -> <q>, “e”/“f” -> <r>, etc.

Of course, this would explain the occurrence of triple glyphs.”

By “following” I meant “next in the alphabet”. Thus, the cipher would basically be a monoalphabetic substitution cipher, but with “a” and “b” from the plaintext mapping to the same ciphertext character, “e” and “f” mapping to the same, etc.
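A minimal Python sketch of such a key, assuming a hypothetical 21-letter Italian-style alphabet without “j”, “k”, “w”, “x” and “y” (which is how “i” and “l” end up sharing a symbol, as in the tentative mapping below) and arbitrary capital letters as cipher symbols. The names build_key and encipher are illustrative only, and letters outside the assumed alphabet are simply dropped.

```python
import string

# Hypothetical 21-letter Italian-style alphabet without j, k, w, x, y,
# so that "l" directly follows "i".
ALPHABET = "abcdefghilmnopqrstuvz"
VOWELS = "aeiou"


def build_key(alphabet, vowels, symbols):
    """Give each plaintext letter a cipher symbol; a vowel shares its symbol
    with the letter that follows it in the alphabet."""
    groups, i = [], 0
    while i < len(alphabet):
        if alphabet[i] in vowels and i + 1 < len(alphabet):
            groups.append(alphabet[i:i + 2])  # e.g. "ab", "ef", "il"
            i += 2
        else:
            groups.append(alphabet[i])
            i += 1
    if len(symbols) < len(groups):
        raise ValueError("not enough cipher symbols")
    return {letter: symbol
            for group, symbol in zip(groups, symbols)
            for letter in group}


def encipher(text, key):
    """Monoalphabetic substitution; letters missing from the key are dropped."""
    return "".join(key[c] for c in text.lower() if c in key)


key = build_key(ALPHABET, VOWELS, string.ascii_uppercase)
print(encipher("babbo", key))  # AAAAJ
```

Since “a” and “b” collapse into one symbol, the Italian “babbo” comes out as four identical symbols in a row, the kind of run of identical glyphs the note was trying to explain.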

Back then I devoted a bit of time and statistics to the issue, but aside from a suggestion of Italian as the plaintext language and a tentative mapping of EVA <e> to the letters “i”/“l” (both “j” and “k” being uncommon at the time the VM was created, “l” would be the letter following “i” in the alphabet), I didn’t get far.

Since 2004, I’ve moved away from simple substitution ciphers, because while the above scheme would indeed explain the occurrence of three or more identical letters in a row, it fails to account for the VM word grammar, for why certain letter combinations are only ever found word-initially or word-finally, and for other oddities.

Nevertheless, does any of my readers know where I might have come across this scheme?

“Guest Column”: Randall’s Thoughts

Starting the New Year by receiving new comments on your blog is always a wonderful thing. In 2021 it is Randall Galera, who stumbled upon my recent post about Cistercian Numbers and used the opportunity to throw in a few ideas of his own. Rather than discussing this in private, I suggested “coming out in the open” and presenting Randall’s thoughts for discussion in a larger forum. (Getting answers only from a single bloke like me probably wouldn’t do him justice.)

So, with Randall’s permission, here goes:

Origins.
Facts: The book was found on this planet. Its pages have been dated and are consistent. (Someone grabbed a stack off the printer and set to work.) The ink and paper coincide with materials available during the time period. So physically we can say the book is real and is aged correctly.

One must then make a couple of assumptions. It was written by a human or it was not.
On one hand, the precision and consistency are impressive for a mortal scribe. Since to this day we cannot work out the letter-word-cipher model, we are dealing with either an extremely complex code that today’s quantum computers might have a crack at, divine inspiration, demonic whisperings, multidimensional influence, or an extraterrestrial copy. The mortal writer/artist either possessed this knowledge first-hand, learned it, copied it, or had it dictated.

That being the case, let us look at the artwork.
For my taste, it is too close to what we call familiar while simultaneously being unnervingly alien. First, what appear to be women are depicted throughout the manuscript. ONLY women. They appear to have hairstyles, fair features, breasts, and pubic hair. Some are naked and others clothed; one has a crossbow. Most, if not all, have enlarged abdomens (customary for ‘plump’ women in medieval art). Feet are rarely depicted. They all appear to have a natural feminine appearance (https://brbl-zoom.library.yale.edu/viewer/1006205). There appears to be a queen depicted at the top. Nothing that could be described as ‘man’ could be found. If we are to subscribe to the “alien” theory, we are looking at a society that 1) is devoid of the male sex, 2) depicts humans combining both sexes in one, or one in which only one sex is needed for reproduction, or 3) has males but views them as too insignificant or irrelevant to be included in the manuscript. (girl scouts handbook)

In all cases, I asked: but why the medieval dresses? It’s too similar, yet distinctly strange. I’m leaning towards a ‘multiverse/dimensional/back to the future’ theory.

Plants.
Obviously, the examples of flora are detailed, and we assume descriptions are included. We see a similarity that is either the fault of the artist or by design. The root systems follow rules similar to those of the text itself. Some are snake-like, some bulbous, others square, most short, and almost all unlike anything we have growing out of the ground. If there is a soil system (barring hydroponics or another undiscovered agricultural system), it is not earth.

This means to me that the source material of the VM has not been discovered on this planet in current or fossilized form. It either originates from someone’s very vivid and detailed imagination, or from one of the aforementioned external influences.

Numbers:
As stated in your earlier entry, there are no recognizable numbers in the manuscript. If there are astrological charts, it defies our reality to assume one could chart stars, dates, and time without numerical reference. So if numbers are in the manuscript, we have not identified them. Please correct me if I am wrong, but what if the entire manuscript IS numbers? Is the number 12 always written in its long form, twelve? Are there letters that are numbers, or words, or both?

It is said math is the universal language. We’ve sent out probes with equations written on them in hopes of contact with an alien race. Now, we can’t assume everyone in the universe understands roman numerals, but the core concept of math should be universal.

The issue of language is that it is fluid and ever-changing. Words die out, change meaning, and shift over time. But a language based on math would make sense in an alternate/advanced civilization. If the text in the manuscript depicts equations instead of words, perhaps that could answer many of the anomalies we encounter when trying to decipher the VM, and still adhere to a strict grammatical construct. It would eliminate dialects, and different languages altogether. No more French vs English vs Russian.

If 1+1=2, and two equals love, we run into other issues with the definition of love, but as a written text we can modify it to 1-1=0 (death). Now of course this is an oversimplified view of the concept, but my point is: has anyone looked at the universal language of math? I haven’t found any examples. What if the language is an actual mathematical equation that then translates into something we understand as language?

I’m just throwing out wild ideas here, and if you have already explored these concepts I’m quite happy to read about them.

To sum up: I think the VM, although found on this planet, depicts things we understand in a way we haven’t seen, raising several existential questions. Little green men from Mars? Intelligent design? Simulation code? The magical world of fairies and elves? Thanks for taking the time; let me know your thoughts.

So much for Randall’s ideas. A few comments of mine:

  1. Of course the suggestion of an extraterrestrial source for the VM material is a bit outlandish, raising many more questions than it answers. I’ll simply ignore this idea for the moment.
  2. Overall, with the C14 dating of the vellum, the handwriting and the illustrations all pointing to a 15th century origin, I think the most workable idea is one or more scribes in Renaissance Italy with a few unconventional ideas about enciphering and scientific ideas.
  3. It has been suggested by a number of people (last but not least by myself: see The “Face Value”-Fallacy) that, rather than being “words” in the conventional sense, the ciphertext character “groups” in the VM might be “codes” of some kind. Ideas suggested have included, but are not limited to —
    • an upgraded system of Roman numerals
    • something akin to the Dewey Decimal system (where the plaintext “value” of a VM word might be a word from the title of a book found in a universal library)
    • coordinates pointing… somewhere?
    • a constructed a priori language, where words aren’t derived from an existing language, but the vocabulary is synthetically constructed, with e.g. a particular prefix for all things living and a different prefix for all things dead, a second syllable denoting the size etc., so that hopefully in the end the string of attributes will allow one to identify the object in question.

The last item probably comes closest to Randall’s idea of a “mathematical” language, though I might add a few observations of my own: First, while I think such an enciphering system is conceivable, I’d hardly rate it probable. Secondly, “a priori languages” are virtually unknown before the 17th century, so one would have to admit that the dating of the VM is substantially flawed, or that our authors were far ahead of their time. (To be fair, whatever their actual enciphering method was, it is still unknown today, so they were in any case ahead of their time.) Thirdly, and that’s more of a personal opinion, the whole concept of the VM, with its wildly imaginative illuminations, seems to me to point more to a “stream-of-consciousness” approach than to a strictly scientific/logical concept. (In this context I still think that Churchill and Kennedy’s glossolalia hypothesis is worth a thought.)

But, dear readership, don’t let this discourage you: Fire away!


Cistercian Numbers

Probably this isn’t particularly relevant for the VM as such, but I found it interesting nonetheless when Wikipedia today drew my attention to Cistercian numerals.

[Image: Cistercian numbers in Turin manuscripts]

From what I gather from the article, these numbers or numerals were in limited use through the latter part of the European Middle Ages, and are particularly interesting since they were, for the first time in Europe and independently of Hindu-Arabic numerals, a “digit” system where the numeric value of a character would be indicated by its position within a number. (In other words, as opposed to Roman numerals, where “M” always indicates “1000”, in the Hindu-Arabic system the character “2” represents a different value depending on whether it stands at the end of a number, where its value is always “two”, or anywhere else: in the last-but-one position, its value will be “twenty”, etc.)

The Cistercian system takes a little while to wrap one’s head around, but once you get the idea, it’s not that difficult: there is a basic “stave”, vertical or horizontal, plus nine different shapes representing the digits “1” through “9”. The digit shapes are attached to the four “corners” of the stave, with each position representing the ones, tens, hundreds and thousands, respectively. (Top left could be the tens, top right the hundreds, etc.; details vary according to the particular use.) This means that one character consisting of a stave and up to four digit symbols attached to its corners was enough to represent any integer between “1” and “9999”, so it was fairly powerful compared to, e.g., Roman numerals.
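The place-value logic can be sketched in a few lines of Python: split a number from 1 to 9999 into the four digits that would be attached to the stave. Since which corner carries which place varied between sources, this sketch (with the made-up names PLACES, cistercian_parts and from_parts) records only the place value; a digit of 0 simply means nothing is drawn at that corner, which is why no zero symbol is needed.

```python
# Place values in order; which stave corner carries which place varied by source.
PLACES = ("ones", "tens", "hundreds", "thousands")


def cistercian_parts(n):
    """Split 1..9999 into the digit drawn for each place (0 = nothing drawn)."""
    if not 1 <= n <= 9999:
        raise ValueError("a single Cistercian numeral covers 1 to 9999")
    parts = {}
    for place in PLACES:
        n, digit = divmod(n, 10)
        parts[place] = digit
    return parts


def from_parts(parts):
    """Reassemble the integer from its four place digits."""
    return sum(parts[place] * 10 ** i for i, place in enumerate(PLACES))


print(cistercian_parts(1993))
# {'ones': 3, 'tens': 9, 'hundreds': 9, 'thousands': 1}
```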

You will notice that there is no need for a zero in this system, and probably this was also the reason why it never saw widespread application: with zero lacking, the arithmetic power of the system was limited (doing divisions would still have been a nightmare, for example), and the Cistercian numerals stayed limited to page foliation and simple numbering tasks, as opposed to calculations. So near, and yet so far from having invented a real rival to Hindu-Arabic numerals…

So what impact does this have on the VM? Little in particular, since none of the VM characters much resembles the Cistercian numerals, and it also doesn’t look like the characters or words of the VM were composed in a directly comparable manner. Yet the Cistercian numerals show a surprisingly sophisticated encoding system for numbers, so it’s not implausible that the VM author employed a similarly complex system for enciphering his text.

The general view is that at the time the VM was created (the 15th century, according to mainstream opinion) (quiet, Rich!), not much more than simple monoalphabetic substitution ciphers were in use, hence the VM must be considerably younger, or contain nonsense.

[Image: Cistercian numeral concordance]

On the other hand, the above picture shows a concordance from a 13th century (sic) MS, two hundred years before the creation of the VM. The Cistercian numbers in the center column refer to occurrences of the word “aqua” in the corresponding manuscript, giving the page or column numbers where the word occurs. Now it would only be a small leap for any would-be encipherer to create his ciphertext by replacing his plaintext words with the numbers corresponding to these words in his concordance; in this example, he would have plenty of Cistercian numbers to choose from to represent the word “aqua”. Provided the recipient of the message has the same MS available, they can reconstruct the plaintext by simply looking up the numbers written in the ciphertext.*) This system**) would have two advantages:

  • The message is safe as long as anyone trying to intercept it doesn’t know which MS was used for reference.
  • It’s a homophonic cipher, meaning one and the same plaintext word can be enciphered in different ways, making any codebreaking attempt so much more difficult.
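Such a concordance-based scheme can be sketched in Python as follows, assuming, as the footnote (*) below suggests, that words are numbered by their position in the running text rather than by page. The reference text is a made-up Latin fragment, and build_concordance, encipher_words and decipher_numbers are hypothetical names; a real encipherer would of course work from a long manuscript.

```python
import random
from collections import defaultdict


def build_concordance(reference_text):
    """Map each word to every 1-based position at which it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(reference_text.lower().split(), start=1):
        index[word].append(pos)
    return index


def encipher_words(message, concordance, rng):
    """Replace each word by a randomly chosen position of that word (homophones)."""
    numbers = []
    for word in message.lower().split():
        if word not in concordance:
            raise KeyError(f"{word!r} does not occur in the reference text")
        numbers.append(rng.choice(concordance[word]))
    return numbers


def decipher_numbers(numbers, reference_text):
    """Look each number up in the shared reference text."""
    words = reference_text.lower().split()
    return " ".join(words[n - 1] for n in numbers)


# Made-up reference text; sender and recipient both need the same copy.
REFERENCE = "aqua vitae et aqua rosae cum aqua fontis"
concordance = build_concordance(REFERENCE)
rng = random.Random(1)
cipher = encipher_words("aqua et aqua", concordance, rng)
print(cipher)
print(decipher_numbers(cipher, REFERENCE))  # aqua et aqua
```

Every occurrence of “aqua” may be enciphered as 1, 4 or 7, so repeated plaintext words need not produce repeated ciphertext numbers.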

And, as the Cistercian numerals show, such a system would have been within the intellectual grasp of any educated person in central Europe during the second half of the Middle Ages. So maybe it really is time to scour the VM for clues to some more complex enciphering systems beyond simple substitution.


*) Of course this isn’t practical in this example, because many words would be found on any one page of the reference MS. But rather than referring to the page numbers of the word’s occurrences, one could use the position of the word in the stream of the text (numbering all the words in the MS in order of occurrence).

**) This is generally known as a book cipher (or book code).

(Images taken from Wikipedia.)

Theory of the Month: Rainer Hannig

Who is the candidate for the “Voynich Theory of the Month” in June 2020?

It is Prof. Dr. Rainer Hannig, who has even managed to briefly be featured in the VM’s Wikipedia entry before being excised again. Hannig has followed the usual spiel of VTotM to the letter:

  1. He is an “outsider”, namely an Egyptologist with no direct links to cryptography, or medieval manuscripts, or…
  2. His solution doesn’t build on previous work, but is the result of a maverick approach.
  3. He assumes an initially simple substitution cipher where one letter represents one sound, and one ciphertext word is equivalent to one plaintext word. He is original insofar as he assumes the underlying plaintext language to be Hebrew, which is exotic enough to have been rarely considered by other researchers, and at the same time not completely implausible.*)
  4. After some initial success in manufacturing Hebrew words this way, the enciphering rules become increasingly complex the further he progresses. Things turn into a labyrinthine set of rules with multi-valued letters, the omission and reintroduction of vowels, etc.
  5. At the same time, little thought is given to problems of the transcription, which may well hold surprises for the would-be decipherer — have we correctly identified different and identical characters? Is <ch> really <cc> or a different character? Is <r> the same as <s>?
  6. The multi-faceted structure of VM words, with the complex rules governing their composition, is ignored. Which is odd, considering how many alternative ways of enciphering his text the author would have had: if he had so many different options for composing his ciphertext, why do all the words adhere so strictly to only a narrow selection of rules?
  7. While it is possible to create a string of words in this way, the creation of meaningful sentences remains elusive, even when one discards most of Hebrew grammar from the game (as Hannig apparently does). A coherent narrative spanning paragraphs is nowhere in sight. And this is where I consider the case closed and lose interest.

As an example, let me give you Hannig’s translation from f17r:

I am a bull ready which facilitates and renews house and ruins.
You are a piece of lamb which opens the mouth and is discouraged
when eye-in-eye.

Or f2v, the nymphaea page:**)

Surely, Nymphaea is the twin. Enough juice in the tip.
Drink carefully, this is like something which provides spirit.
Will come juice with repetition. Juice facilitates prophecies...
like rebellion in presence of philosophers.
All which is in Greek about is silence without talking. [sic]
When not speaking about juice, spoke: Do dig... spoken in Arabic.

We’ve had a number of those theories, and I do not present this scientifically and methodologically rather unsound piece of work merely out of malice (though I wonder about the quality of Hannig’s other work, if his VM paper is representative of it) or to ridicule it. Rather, it is exemplary of a mistake made so often in VM approaches that it cannot be pointed out often enough.

No, really.


*) Interestingly enough it seems that Hannig never bothered with the question whether the text is supposed to be read left-to-right as in Western languages, or right-to-left as in Hebrew, but opted for left-to-right from the start. Which is strange, considering his background in hieroglyphics.

**) Notice the highly repetitive text with a very limited vocabulary.