Okay, I finally got around to doing a little bit of number crunching on my beloved Stroke theory.
In a nutshell, it’s not exactly a landslide victory I achieved, but the results are not nearly discouraging enough for me to give up…
My approach was a fairly simple one. As you probably know from extensively studying my page on the subject, the idea is that at least the words of the Currier A corpus of the VM can be synthesized by combining one of the 23 “odd” “Firth groups” with one of the 21 “even” ones, which ought to lead to a vocabulary of 23 × 21 = 483 VM words.
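A minimal sketch of that combination step, in Python. The group lists here are numbered placeholders, not Firth's actual groups, and the sketch assumes a word is simply one odd group followed by one even group. With the real groups the count could come out below 483 if two different pairs happen to spell the same word.

```python
from itertools import product


def synthesize(odd_groups, even_groups):
    """Return every word formed by one odd group followed by one even group."""
    return {odd + even for odd, even in product(odd_groups, even_groups)}


# Placeholder groups for illustration only; these are NOT Firth's lists.
odd_groups = [f"O{i}" for i in range(23)]
even_groups = [f"E{j}" for j in range(21)]

vocabulary = synthesize(odd_groups, even_groups)
print(len(vocabulary))  # 23 * 21 combinations
```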
Trying to synthesize Currier A
In his note, Robert Firth had been somewhat vague and alluded to the “majority” of VM vocabulary he was able to recreate in this manner, but he hadn’t given exact numbers. So, I wanted to find out how great his success actually was. I took a bunch of VM folios written in Currier A (just whatever came to hand) and ran a little program over them.
This is the result:
Number of source words: 7576
Number of different source words: 2846
Firth words (synthesized from the odd/even groups): 3065
Which means that this scheme was able to reproduce about 40% of the whole volume of the sample (3065 of 7576 words). This ain’t so bad, considering that the average word frequency in Currier A was only about 3, i.e. on average any word only showed up three times (in this sample, 7576 / 2846 ≈ 2.7).
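For anyone who wants to redo the arithmetic, here is a rough sketch of how such a count could be taken. It is not the program I actually ran: the function name `coverage` and the toy input are made up for illustration, and it assumes the transcription has already been split into a list of words.

```python
from collections import Counter


def coverage(tokens, vocabulary):
    """Count how much of a word list is covered by a synthesized vocabulary."""
    counts = Counter(tokens)
    # Every occurrence of a word counts, not just each distinct word once.
    matched = sum(n for word, n in counts.items() if word in vocabulary)
    return {
        "source_words": len(tokens),
        "different_words": len(counts),
        "matched_words": matched,
        "share": matched / len(tokens) if tokens else 0.0,
    }


# Toy input, not VM data: 5 words, 3 different, 3 occurrences matched.
toy = coverage(["aa", "aa", "bb", "cc", "aa"], {"aa"})
print(toy)

# The figures reported above, checked directly.
share = 3065 / 7576           # roughly 0.40
mean_frequency = 7576 / 2846  # roughly 2.7
print(round(share, 2), round(mean_frequency, 1))
```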
Besides, looking at the list of VM words my little hack examined, revealed —
(The numbers behind the words are the total occurrences of this word.)
Clearly, there is a large number of words much longer than the sum of two Firth groups. It is conceivable that these should actually be split up into several words, either because they were transcribed wrongly or because the author ignored the group breaks. In both cases, words that should really be separated would be running together, and splitting them would lead to a higher success ratio in the synthesis.
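One way to test that idea would be to check whether a long word can be cut into pieces that are each a synthesized word. The sketch below is only an assumption about how one might do it (a plain dynamic-programming split, with a made-up toy vocabulary rather than real Firth words); it returns the first split it finds, not necessarily the most plausible one.

```python
def split_into_words(text, vocabulary):
    """Return one way to cut text into vocabulary words, or None if impossible."""
    # best[i] holds a list of words that exactly covers text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[len(text)]


# Toy vocabulary for illustration only; not actual VM words.
toy_vocabulary = {"ab", "cd", "abc"}
print(split_into_words("abcd", toy_vocabulary))
print(split_into_words("abce", toy_vocabulary))
```

Words for which this returns a split could then be counted as several Firth words instead of one failure.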