Chasing your own Tail

As I mentioned the other day, I had made a serious omission in my assessment of the Stroke Theory.

I had written a little tool to analyse the VM ciphertext and decompose it into the hypothetical “syllables” of the Stroke Theory, where each ciphertext “syllable” would represent one plaintext letter. This tool worked by repeatedly modifying the hypothetical syllable set and retaining those modifications which led to an overall increase in “coverable text” (i.e. ciphertext words that could be composed from the syllable set). The tool became “saturated” (further changes would increase the overall coverage no more) when the syllable set could compose 66% and 74% of the ciphertext by volume, for Currier A and B, respectively. This was interesting, but by no means convincing.
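The analyzer itself isn't published here, but the retain-what-improves-coverage loop it describes can be sketched as a simple hill climb. Everything below (function names, the mutation scheme, the toy data) is my own assumption, not the author's actual code:

```python
import random

def coverage(words, syllables):
    """Fraction of ciphertext volume composable from the syllable set."""
    def composable(word):
        # DP over prefixes: ok[i] is True iff word[:i] splits into syllables.
        ok = [True] + [False] * len(word)
        for i in range(1, len(word) + 1):
            ok[i] = any(ok[i - len(s)] for s in syllables
                        if i >= len(s) and word.endswith(s, 0, i))
        return ok[-1]
    total = sum(len(w) for w in words)
    covered = sum(len(w) for w in words if composable(w))
    return covered / total if total else 0.0

def hill_climb(words, alphabet, start, steps=1000, min_len=2, max_len=4):
    """Randomly mutate the syllable set; keep mutations that raise coverage."""
    best, best_cov = set(start), coverage(words, set(start))
    for _ in range(steps):
        cand = set(best)
        if random.random() < 0.5 and len(cand) > 1:
            cand.discard(random.choice(sorted(cand)))      # drop a syllable
        else:
            n = random.randint(min_len, max_len)           # or add a random one
            cand.add("".join(random.choice(alphabet) for _ in range(n)))
        cov = coverage(words, cand)
        if cov > best_cov:                                 # retain improvements only
            best, best_cov = cand, cov
    return best, best_cov
```

Since the loop only ever keeps improvements, coverage is monotonically non-decreasing, and “saturation” is simply the point where no tried mutation helps any more.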

A different approach used a second tool which would synthesize ciphertext from innocuous plaintext, according to the rules of the Stroke Theory. This test was purely qualitative: would the ciphertext rendered this way look anything like the VM? In my humble opinion, it actually did.
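The synthesis step can be sketched in a few lines. The stroke table below is entirely invented for illustration — it is not the actual key used by the author:

```python
# Toy stroke table: each plaintext letter maps to the glyph sequence its
# pen strokes would yield. Purely illustrative, NOT the author's real key.
STROKES = {
    "a": "ol", "d": "dy", "e": "o", "i": "i", "l": "l",
    "n": "in", "r": "ar", "s": "sh", "t": "qo", "u": "cc",
}

def synthesize(plaintext):
    """Render plaintext as ciphertext, one stroke syllable per letter.
    Characters missing from the table (digits, umlauts, ...) are dropped,
    mirroring the untranscribable characters mentioned below."""
    words = []
    for word in plaintext.lower().split():
        words.append("".join(STROKES.get(ch, "") for ch in word))
    return " ".join(w for w in words if w)
```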

It took me about a year to figure out that one could combine the two approaches, namely by letting the analytical tool work on the output of the synthesis tool. In a perfect world, i.e. if the analytical tool worked correctly, this chase of one’s own tail should result in 100% coverage of the plaintext.
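The round trip is easy to express in miniature (again with an invented stroke table): if the analyzer is handed the exact syllable inventory used for synthesis, every synthesized word must segment completely, giving 100% coverage by construction — which is what makes any shortfall diagnostic:

```python
# Invented stroke table, for illustration only.
STROKES = {"a": "ol", "n": "in", "i": "i", "l": "l", "e": "o", "d": "dy"}

def synthesize(word):
    # One stroke syllable per plaintext letter, per the Stroke Theory.
    return "".join(STROKES[ch] for ch in word)

def segmentable(word, syllables):
    # DP over prefixes: ok[i] is True iff word[:i] splits into syllables.
    ok = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        ok[i] = any(ok[i - len(s)] for s in syllables
                    if i >= len(s) and word.endswith(s, 0, i))
    return ok[-1]

true_set = set(STROKES.values())
cipher = [synthesize(w) for w in ["an", "ill", "die"]]
# With the true syllable set, coverage is 100% by construction:
assert all(segmentable(w, true_set) for w in cipher)
```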

Of course it didn’t. For instance, the plaintext used (the German 15th-century “Weinbuch”) contained a few characters (like Arabic digits or umlauts) which couldn’t be properly transcribed in the synthesis and which could thus not be recovered.

Still, the result was that only about 68% of the plaintext words by volume could be recovered before the analysis program showed signs of stalling. This teaches us two things:

  1. It’s (pardon my French) fucking close to the results for the VM, and
  2. This can’t be due to special unencrypted characters alone.

Upon closer inspection, I found the culprit indeed: I had set the minimum syllable length in the analyzer to two strokes, and this is of course stupid. Letters like “i”, “l”, or “o” will hardly require more than one stroke (and indeed required only one in my synthesis). Thus, these letters were effectively indecipherable, and this may well account for a good deal of the lost coverage.
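The effect of that parameter is easy to demonstrate with the same prefix-DP segmentation idea (syllable inventory invented for the example):

```python
def segmentable(word, syllables, min_len=1):
    """DP segmentation that ignores syllables shorter than min_len."""
    usable = [s for s in syllables if len(s) >= min_len]
    ok = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        ok[i] = any(ok[i - len(s)] for s in usable
                    if i >= len(s) and word.endswith(s, 0, i))
    return ok[-1]

syllables = {"i", "l", "ch", "ol"}
# "ill" is only composable from the one-stroke syllables "i" and "l":
assert segmentable("ill", syllables, min_len=1)
assert not segmentable("ill", syllables, min_len=2)  # the buggy setting
```

With `min_len=2`, every word containing a single-stroke letter becomes unsegmentable, no matter how good the rest of the syllable set is.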

This error of course would also hold true for the VM. (There is no reason why the VM author should have chosen to use a minimum of two strokes per plaintext letter.) And the fact that this programming error led to almost the same amount of lost coverage in the VM as in the synthesized text could be a hint that the same effects are in play, and that I’m thus on the right track by chasing my own tail…


3 thoughts on “Chasing your own Tail”

  1. I am listening. I thought I knew what you are doing two or three times. Then I lost it in the hyperlinked pages. When able, I’ll try to get the concept and findings into a linear format and try again. Just a little faster and you’ll be closing the gap to your tail. Faster — faster.

  2. Sorry to have to ask the question, but upon what kind of view of palaeography are you basing your decomposition into strokes? People wrote letters in different ways in different centuries (even decades) and in different countries, so I don’t know what your comparative statistics are based upon. For example, you seem to be relying on computer transcriptions when a stroke-based analysis of a comparable document should be based on, well, strokes in a manuscript hand. Aren’t you comparing 15th century manuscript stats with a 20th century take on printed stats?

    PS: also… I know what you’re aiming at, but don’t you think that calling it a “stroke theory” makes you sound like some kind of academic specialising in S&M? :-)

  3. Knox — Yes, I guess I’ll have to cut down the organic growth of the pages again and replace it with something more concise…

    Nick — I started with various bâtarde hands but discovered quite quickly that they are fairly low on capital letters, while blackletter would be too complicated, so I simply switched to antiqua print letters for starters.

    What I did was —

    * manually decompose the antiqua alphabet into strokes,

    * encipher a bunch of plaintext (in this case, the Weinbuch) according to the Stroke Theory, using the antiqua alphabet as the “key”,

    * compare the statistics of the ciphertext rendered this way with the statistics of the VM, by trying to retrieve the syllable set which constitutes the “key” from the ciphertext.
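As one example of the kind of comparison the last step calls for — my own choice of statistic, not necessarily the one actually used — one can compare normalized word-length histograms of the synthesized ciphertext and of a VM transcription:

```python
from collections import Counter

def word_length_profile(text):
    """Normalized word-length histogram of a whitespace-separated text."""
    lengths = Counter(len(w) for w in text.split())
    total = sum(lengths.values())
    return {k: v / total for k, v in sorted(lengths.items())}

def profile_distance(a, b):
    """Total variation distance between two profiles (0 = identical)."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
```

Other obvious candidates would be glyph frequencies or bigram tables; the point is only that “shares characteristics with the VM” can be given a number.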

    Mind you, currently all I’m doing is a qualitative (read: superficial) survey: To what degree would the resulting ciphertext share characteristics with the VM?

    If (and only if) there is a reasonable match, then one can surmise that the enciphering algorithm (the “Strokes”) is correct. And only then is there a point in trying to retrieve the “key”, i.e. the actual decomposition of the plaintext font into strokes.

    The last step is of course the most complex, because not only are there a number of different fonts to choose from, but each font could also be decomposed into different stroke sets with equal validity. OTOH, I’m hoping to analytically retrieve a complete set of “prefix” and “suffix” syllables directly from the ciphertext, where each prefix would represent one lower-case plaintext letter and each suffix one upper-case plaintext letter (or vice versa), from whence one could proceed with the VM as a simple substitution cipher.
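That endgame can be sketched as follows. The prefix/suffix tables here are entirely hypothetical, and the greedy longest-match splitting is my own assumption about how the decoding would proceed:

```python
# Hypothetical syllable tables -- invented for illustration only.
PREFIX = {"qo": "t", "ch": "b", "ol": "a"}   # -> lower-case plaintext letters
SUFFIX = {"dy": "Y", "in": "N"}              # -> upper-case plaintext letters
TABLE = {**PREFIX, **SUFFIX}

def decode(word):
    """Greedy left-to-right split into known syllables; None if it fails.
    Once the tables are known, this is just a simple substitution cipher."""
    out, i = [], 0
    while i < len(word):
        for s in sorted(TABLE, key=len, reverse=True):  # longest match first
            if word.startswith(s, i):
                out.append(TABLE[s])
                i += len(s)
                break
        else:
            return None
    return "".join(out)
```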

    Right now I’m pretty excited about the fact that a faulty algorithm produced the same results for my fake ciphertext as for the real VM (namely, the notorious 68%, which may or may not be coincidental). And I’m still pondering a good design for a program which would be able to handle single-letter syllables.
