Testing the Stroke theory, Round 1 (Voynich leads by points)

Okay, I finally got around to doing a little bit of number crunching on my beloved Stroke theory.

In a nutshell, it’s not exactly a landslide victory I achieved, but the results are not nearly discouraging enough for me to give up…

My approach was a fairly simple one. As you probably know from extensively studying my page on the subject, the idea is that at least the words of the Currier A corpus of the VM can be synthesized by combining one of the 23 “odd” Firth groups with one of the 21 “even” ones, which ought to lead to a vocabulary of 23 × 21 = 483 VM words.
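To make the mechanics concrete, here is a minimal sketch in Python; the two group lists below are placeholders of my own, not Firth’s actual groups (his note has the full 23 and 21):

odd_groups  = ["qok", "ok", "ch", "sh", "d"]     # placeholders; Firth's scheme has 23 odd groups
even_groups = ["aiin", "ol", "or", "y", "ey"]    # placeholders; Firth's scheme has 21 even groups

# Every synthesizable word is one odd group followed by one even group,
# so the full scheme yields at most 23 * 21 = 483 distinct candidate words.
firth_words = {odd + even for odd in odd_groups for even in even_groups}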

Trying to synthesize Currier A

In his note, Robert Firth had been somewhat vague and only alluded to the “majority” of VM vocabulary he was able to recreate in this manner, but he hadn’t given exact numbers. So, I wanted to find out how great his success actually was. I took a bunch of VM folios written in Currier A (just what came in handy) and ran a little program over them.
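The counting involved is trivial; it might look roughly like this (a sketch only, assuming a plain one-word-per-token transcription file — the file name is made up — and the firth_words set from the snippet above):

from collections import Counter

def coverage(path, firth_words):
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts.update(line.split())          # one word per whitespace-separated token
    tokens  = sum(counts.values())               # number of source words
    types   = len(counts)                        # number of different source words
    matched = sum(n for w, n in counts.items() if w in firth_words)
    return tokens, types, matched

tokens, types, matched = coverage("currier_a_sample.txt", firth_words)
print(tokens, types, matched, matched / tokens)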

This is the result:

Number of source words: 7576
Number of different source words: 2846
Source words matched by Firth words (synthesized from the odd/even groups): 3065

Which means that this scheme was able to reproduce about 40% of the whole volume of the sample. This ain’t so bad, considering that the average word frequency in Currier A was only about 3 (7576 / 2846 ≈ 2.7), i.e. on average any word showed up only about three times.

Besides, a look at the list of VM words my little hack examined revealed the following —

...
doleodaiin 1
qokshdy 1
keodal 1
lshody 1
qokeoloteeody 1
orsheoldy 1
chopychofol 1
qoorar 1
pshodaiin 1
ekey 1
oraroekeol 1
orain 2
lolain 1
ctheoly 1
...

(The number after each word is its total number of occurrences.)

Clearly, there is a large number of words much longer than the sum of just two Firth groups. It is conceivable that these should actually be split up into several words, either because they were transcribed wrongly or because the author ignored the group breaks. In both cases, words which should really be separated might be running together, and splitting them correctly would lead to a higher success ratio in the synthesis.
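One way to check that suspicion would be a simple segmentation test: does a long token at least decompose into a sequence of Firth words? A quick sketch, again using the firth_words set from the snippet above:

def splits_into_firth_words(word, firth_words):
    # True if the word is a concatenation of one or more Firth words.
    if word == "":
        return True
    return any(word.startswith(fw) and splits_into_firth_words(word[len(fw):], firth_words)
               for fw in firth_words)

Counting such decomposable tokens as additional hits would show how much of the missing 60% can be blamed on words running together.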


3 thoughts on “Testing the Stroke theory, Round 1 (Voynich leads by points)”

  1. Hi Elmar,

    I enjoyed reading your articles about the Stroke theory. The slightly disappointing results of this first test motivated me to try some similar tests of my own.

    I tried to synthesize better sets of odd and even groups in an automated way. The basic approach was to build every possible bipartition of every word to form an initial set of groups and then to eliminate the least promising ones until a sweet spot of relatively few groups with relatively many matches was reached (see the sketch at the end of this comment).

    Using Currier A, I obtain a set of groups that is somewhat similar to Robert’s.

    24 odd groups, each listed with its number of tokens and the number of words using it:
    ych: 64, 14
    tch: 84, 16
    qoke: 151, 16
    otch: 113, 15
    chok: 86, 17
    oke: 157, 18
    kch: 109, 17
    dch: 109, 19
    she: 238, 17
    ckh: 128, 17
    t: 112, 20
    yt: 104, 19
    yk: 127, 16
    s: 189, 16
    qot: 180, 18
    k: 171, 19
    che: 366, 18
    ot: 255, 18
    qok: 329, 20
    ok: 359, 20
    cth: 355, 20
    sh: 613, 19
    d: 1138, 18
    ch: 1086, 20

    20 even groups:
    od: 49, 20
    oiin: 76, 15 -> Maybe Robert’s “oii”
    om: 70, 19
    os: 84, 19
    eody: 120, 20
    ain: 150, 20
    am: 109, 22
    eor: 180, 22
    chy: 258, 16
    ody: 206, 24
    eol: 283, 23
    eey: 276, 23
    al: 242, 22
    o: 283, 21
    ar: 280, 23
    ey: 471, 23
    aiin: 808, 23
    or: 687, 24
    y: 944, 24
    ol: 1047, 24

    Rather disappointing is the fact that this is not a sweet spot pointing to an underlying alphabet, nor was I able to find one anywhere. With the above set I get 5562 out of 11415 tokens (49%) and 403 out of 3446 words (12%) matched. Adding or removing one group (anywhere in the range of 10 to 40 groups per set) changes the number of matched tokens by only 50-100, with few exceptions. So the size of Robert’s set appears to me to have been chosen rather arbitrarily.

    Applying the same method to the Currier B parts yielded similar results, yet with very different sets of odd and even groups.

    What remains remarkable, in my opinion, is that 403 of the 480 (84%) possible words that can be formed from the above set of groups (24 × 20) actually exist in the text.
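    For what it’s worth, the elimination procedure could look roughly like the following (a simplified sketch, not the actual code I ran; “least likely” is read here as “the group whose removal costs the fewest matched tokens”, and this naive form is far too slow for the full vocabulary):

    def matched_tokens(counts, odds, evens):
        # counts: a dict (e.g. collections.Counter) mapping each word to its token frequency.
        # A word counts as matched if some split of it is an odd group followed by an even group.
        return sum(n for w, n in counts.items()
                   if any(w[:i] in odds and w[i:] in evens for i in range(1, len(w))))

    def greedy_groups(counts, n_odd, n_even):
        # Initial sets: every possible bipartition of every word.
        odds  = {w[:i] for w in counts for i in range(1, len(w))}
        evens = {w[i:] for w in counts for i in range(1, len(w))}
        # Drop one group at a time, always the one whose removal loses the
        # fewest matched tokens, until the target set sizes are reached.
        while len(odds) > n_odd or len(evens) > n_even:
            candidates = [("odd", g) for g in odds if len(odds) > n_odd] + \
                         [("even", g) for g in evens if len(evens) > n_even]
            def remaining_matches(cand):
                side, g = cand
                return matched_tokens(counts,
                                      odds - {g} if side == "odd" else odds,
                                      evens - {g} if side == "even" else evens)
            side, g = max(candidates, key=remaining_matches)
            (odds if side == "odd" else evens).remove(g)
        return odds, evens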

  2. Hi Tobias,

    See this post for a possible reason why the results were so poor: Currier would probably make a better transcription candidate than EVA.

    I’ll try this one of these days, but of course if you wanted to give Currier a shot as well, I’d be more than happy!

  3. I see you are doing something I wanted to do, more or less: counting words (and, more importantly, bases of words), but I am not sure whether I have to program everything myself: are there tools on the market for word analysis?

    PS
    I do not agree that Currier is better than EVA. For “manual/visual” word analysis Frogguy seems handy, because it shows when ligatures are possible.
    However, the Japanese transcription by Takahashi presents by far the most consistent text, whereas Currier has a lot of missing and wrong letters.
    If you feel that one Takahashi word could really be two words, a good “partial” analysis would have no problem with that. Example: abcdefghi -> the base, say, is cdefg. If it were really the two words abcd and efghi, the bases would (in my opinion) be bc and fgh; during analysis you would detect that cdefg is unique (and therefore very interesting to look at), while abcd and efghi would show up as very common (indicating that those are good bases), as sketched below. Chances are nobody follows my text, but never mind then.
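    As a rough sketch of what such a “partial” analysis could look like (hypothetical code; the minimum base length of two characters is an arbitrary choice):

    from collections import Counter

    def common_bases(words, min_len=2, top=20):
        # Count each distinct substring once per word; substrings shared by
        # many words are candidate "bases".
        bases = Counter()
        for w in words:
            seen = {w[i:j] for i in range(len(w)) for j in range(i + min_len, len(w) + 1)}
            bases.update(seen)
        return bases.most_common(top)

    A run-together word would then still contribute to the counts of the bases of both of its parts, which is exactly the robustness described above.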
