Okay, I’ve made a mistake, so my attacks lost their punch.
Dennis Stallings and other acute readers have pointed out to me that the hit ratio I achieved — around 40% by token*) — was far lower than what even their superficial attempts achieved (around 80%).
At first, I attributed this to the Takahashi transcription I had used, which features a number of words running together (like “cthaiinydaiin” or “cheoeesykeor”) that in all probability should each be split into two words. But I doubted whether those run-togethers would really be numerous enough to account for half of the possible hits I had obviously missed.
Turns out, I had made a mistake at one point: Robert Firth had worked from the Currier transcriptions, while I was using EVA, assuming that the two could be converted back and forth unambiguously. I was wrong there. The translation between the two systems is “lossy”, hence an unsophisticated (i.e. “dumb”) matching system like the one I used will of course give different results in the two domains.
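To see why a lossy transliteration defeats dumb matching, here is a minimal sketch. The substitution table below is invented purely for illustration — it is *not* the real EVA/Currier glyph correspondence — but it shows the essential problem: when two different EVA spellings collapse to the same Currier form, the conversion cannot be reversed, and exact-match hit counts diverge between the two domains.

```python
def eva_to_currier(word, table):
    """Greedy left-to-right transliteration using a substitution table."""
    out, i = "", 0
    while i < len(word):
        for length in (3, 2, 1):  # try the longest substitutions first
            chunk = word[i:i + length]
            if chunk in table:
                out += table[chunk]
                i += length
                break
        else:
            out += word[i]  # pass unknown characters through unchanged
            i += 1
    return out

# Hypothetical table, for illustration only -- NOT the actual glyph map.
# Two distinct EVA sequences ("ch" and "cth") map to the same symbol.
TABLE = {"cth": "S", "ch": "S", "sh": "Z"}

print(eva_to_currier("chor", TABLE))   # -> Sor
print(eva_to_currier("cthor", TABLE))  # -> Sor (same output: information lost)
```

Once “chor” and “cthor” have both become “Sor”, no program working in the Currier domain can tell them apart — which is exactly the kind of mismatch that skews a naive hit ratio.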
Thus, either I adapt my programs to use Currier, or I find a true EVA equivalent of Robert’s odd and even groups.
Time for some infighting, Mr. Voynich!
*) I’m also indebted to Dennis for pointing out to me the difference between “the number of words” (which is usually understood to mean the number of different words) and “the number of tokens” (the total number of word occurrences). Thus, “I was very, very ignorant” amounts to 4 (different) words, but 5 tokens in the above count. “By token” would mean something like “by volume”.
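The word/token distinction is easy to pin down in a few lines of Python. This is my own throwaway sketch, not any code from the analysis above; it simply counts distinct word forms versus total occurrences for the example sentence from the footnote.

```python
import re

def words_and_tokens(text):
    """Return (number of distinct words, number of tokens) for a text."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: letter runs
    return len(set(tokens)), len(tokens)

words, tokens = words_and_tokens("I was very, very ignorant")
print(words, tokens)  # -> 4 5  ("very" occurs twice, so one fewer word than tokens)
```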