Period wordlists

Dan wrote to me some time ago, and once again I must apologize: I’m currently fairly busy with other projects, and hence can’t devote as much time to the VM as I should. Nevertheless, it’s high time I gave him the floor:

Yeah yeah, here’s another theory. Actually I’m not going into the theory, but simply asking if you can provide any assistance in resources I am seeking. Let me back up a bit – I’m a full time software developer of over 20 years, and I have had some insights regarding the manuscript. I’ve written software to generate various statistics about the document and have found some surprising and very obvious (once distilled down to hard numbers) patterns that further validate the insights. These are not “hunches”, or “gut feelings” or any mystical, nutty stuff. It’s simply what it is, and the analysis doesn’t lie.

I am currently running brute force deciphering attempts using additional software I have developed, based on my theory of how the document is ciphered. The main resource I am lacking at this time is simply word lists of the candidate languages the manuscript may have been written in (in its decoded form of course), and specifically, the vernacular and spelling of those languages when the manuscript was written in the 1400s.

I have always assumed the Voynich manuscript was a hoax, but when it was positively dated a few years ago I took a harder look, again with the expectation that it was a hoax but at least a hoax contemporary to the 15th century. My attempt was actually to prove (just to myself) that very thing – that it is just a contrived hoax. Unfortunately the insights and analysis I have done over the last few years have left no other option but to follow the logical progression until it peters out and comes to a dead end. I have not yet reached that point.

Thanks for your time, and again, if you know of simple word lists (or who can provide them or assist in that) of good candidate languages from the 15th century, that would be quite helpful.

This question isn’t so easy to answer. First of all, even when taking the C14 dating of the vellum as a given, we still have about a century of leeway regarding the actual production date of the manuscript. A century is a long time in which languages can change.

Secondly, languages weren’t “codified” as strictly as they are today, and pretty much everyone would write down their MSs in their local dialect, not to mention the fact that strict orthography wasn’t enforced yet either. Which means that even two people from the same region writing at the same time wouldn’t necessarily employ the same spelling. (An extreme example of this is the Bayeux Tapestry, admittedly predating the VM by some 400 years, where the name of William the Conqueror is written IIRC in no fewer than seven different ways.) Hence, to make a long story short, any word list should be taken with a grain of salt.

I did some statistics in the past myself, and to get decent wordlists I simply went to, downloaded a few works I considered representative of the era, and ran my own little wordcount scripts on these files.
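For illustration, a wordcount script of this kind might look like the following Python sketch. The original scripts aren’t shown here, so the function names, the tokenizing rule, and the sample text are all assumptions, and period spelling variants are deliberately left unmerged.

```python
import re
from collections import Counter

# Runs of Unicode letters only (no digits, no underscore). This is an
# assumed tokenizing rule, not necessarily the one the original scripts used.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(text):
    """Return a Counter of lower-cased word tokens found in text."""
    return Counter(WORD_RE.findall(text.lower()))


def build_wordlist(paths, min_count=1):
    """Merge word counts over several plain-text files.

    paths is a hypothetical list of UTF-8 text files, e.g. period works
    saved locally. Returns (word, count) pairs, most frequent first,
    keeping only words seen at least min_count times.
    """
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            totals.update(word_frequencies(handle.read()))
    return [(word, n) for word, n in totals.most_common() if n >= min_count]


if __name__ == "__main__":
    # In-memory sample so the sketch runs without any downloaded files.
    sample = "Arma virumque cano, arma cano."
    for word, n in word_frequencies(sample).most_common():
        print(word, n)
```

Since spelling wasn’t standardized, variants of the same word will show up as separate entries in such a list; whether to merge them is a judgment call the sketch leaves open.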

IMHO, prime candidates for the plaintext languages are Latin, English, French, German (including the various dialects like Swiss), and perhaps Spanish. But though I wouldn’t bet on it, more exotic options like Hungarian, Finnish or maybe the Lingua Franca can’t be ruled out either.

Sorry, but this is probably a less simple answer than you asked for.


4 thoughts on “Period wordlists”

  1. Sorry if I sound a bit ignorant, but why isn’t the Italian language among the prime candidates for the plaintext? If the Voynich MS was supposed to be written in Northern Italy, the most reasonable language would be Italian. Or any vernacular dialect from that area. If it is a (meaningful) personal document, the owner may have used his/her own mother tongue.
    It is just an idea, not a theory :D

  2. I guess it has to do with “If all you’ve got is a hammer, everything looks like a nail”, ie, people will tend to investigate languages they themselves master.
    My own take is, Latin would be a prime candidate for any book from the period, the drawings point to Italy, the marginalia to Germany and France or Spain, resp. I also entertained the thought that the idiosyncrasies of the text might be explained with more exotic languages like Finnish, and I’ve heard Hungarian being mentioned.
    So, as far as I’m concerned, it’s really pretty much “anything goes.”

    • Thanks for your answer.
      I see what you mean. Using your metaphor, the hammers used so far have been hitting walls, not the nail.
      Thanks again. :)

    • Not “anything.” The Voynich manuscript appears to contain some sort of chanting in the tradition of the joik and the Karelian charm rune. Certain characteristics point strongly to this conclusion. First, the text appears to be largely trochaic, which is the meter of choice for such purposes. Second, like joiks and other north European chant songs, the Voynich text is alliterative and repetitious, conspicuously playing with sound. Finally, several pages depict nothing but women involved in some sort of ritual, dancing and shaking torcs and possibly beating the water, so contextually-speaking the potential for chant songs to be included is very high.

      The three major bases of Voynichese are an unidentified Finno-Ugric tongue, Old Norse, and, to a lesser extent, Slavic. Perhaps the closest extant language to Voynichese is Meänkieli or one of the Kven dialects: Lyngen, Nordreisa, Kvænangen, Alta, Porsanger, Tana, Nord-Varanger, or Sør-Varanger.

      The following words appear fairly frequently.



      Finnish (Finno-Ugric rooted words abound.)

      Examples of words in the Voynich and their possible meanings
      Perheit – medicine to boost fertility (to have a family) f102v2
      Alkeisa – fixative, chemical base f99r
      Taikuus – magic
      Team – time of giving birth
      Isogaisa/esaikaisa – superb f23r
      Teit – to do
      Eparlasai – a fixative for liver of sulfur
      Kela – coil f99r
      Apai – aunt
      Eiere – source
      Ikke and ei for not
      Apara – upper body
      Epais – chubby
      Kepkei – capped
      Kepka – cap
      Ekepker – caps
      Leiks – dock/sorrel
      Sareiaia – wound treatment
      Eluksa elusa samalla elusai – life will come as life is now as life was in the past. f82v

