Fail Again, Fail Better!

Okay, since the fat lady hasn’t sung, the Stroke theory (as outlined here for your elucidation) isn’t dead yet, and recently I had what I thought was a revelation.

As you probably recall (because I’ve been harping on about it endlessly), for the Stroke theory it is essential to discern the set of “syllables” that make up the words of the ciphertext, since each of these “syllables”, or fragments, represents one of the plaintext letters. It’s a bit akin to the knapsack problem, where you are given a number of blocks of given sizes and have to find out how to optimally fill a space of a certain shape. In the case of the VM, you have scores of differently shaped spaces and must find the minimum set of building blocks that fills them all.
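To make the building-block picture concrete, here is a minimal Python sketch (not the program used for this post) that splits a single word into pieces from a candidate syllable set, preferring the split with the fewest pieces. The function name `segment` and the example syllables are hypothetical; the words are merely written in EVA-style letters.

```python
from functools import lru_cache


def segment(word, syllables):
    """Split `word` into pieces taken from `syllables`, using as few pieces
    as possible. Returns a list of pieces, or None if no split exists."""
    syllables = frozenset(syllables)
    max_len = max((len(s) for s in syllables), default=0)

    @lru_cache(maxsize=None)
    def best(i):
        # Fewest-piece split of word[i:] as a tuple, or None if impossible.
        if i == len(word):
            return ()
        result = None
        for j in range(i + 1, min(len(word), i + max_len) + 1):
            piece = word[i:j]
            if piece not in syllables:
                continue
            rest = best(j)
            if rest is not None and (result is None or len(rest) + 1 < len(result)):
                result = (piece,) + rest
        return result

    found = best(0)
    return None if found is None else list(found)


# Hypothetical syllable sets, chosen only for illustration:
print(segment("daiin", {"d", "a", "i", "n", "aiin"}))  # ['d', 'aiin']
print(segment("daiin", {"da", "in"}))                  # None
```

Choosing the fewest pieces matches the idea that longer building blocks are preferable; any other tie-breaking rule would be an equally valid assumption.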

Now, of course one can write a program to do this by brute-force number crunching, trying ever-different syllable sets to see which one renders the best match. Alas, such a naive approach would quickly hit a dead end, because the best fit to the ciphertext would be a set of syllables consisting of only one letter each. That would actually guarantee a 100% perfect match, but of course it isn’t helpful under the assumptions of the Stroke theory. This is what held me up for quite some time.
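The trap is easy to reproduce. This sketch, with hypothetical names `can_compose` and `coverage` and made-up word counts (not real VM statistics), measures coverage as the share of word tokens a syllable set can compose. A set of single letters scores a perfect 1.0 on any text, which is exactly why coverage alone is a useless yardstick here.

```python
def can_compose(word, syllables):
    """True if `word` can be written as a sequence of syllables from the set."""
    # ok[i] is True when the prefix word[:i] can be composed.
    ok = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        ok[i] = any(ok[j] and word[j:i] in syllables for j in range(i))
    return ok[-1]


def coverage(word_counts, syllables):
    """Share of the text, counted in word tokens, that the set can compose."""
    total = sum(word_counts.values())
    covered = sum(n for w, n in word_counts.items() if can_compose(w, syllables))
    return covered / total


# Made-up counts for illustration only.
toy_counts = {"daiin": 5, "chol": 3, "qokeedy": 2}
single_letters = set("".join(toy_counts))
print(coverage(toy_counts, single_letters))             # 1.0
print(coverage(toy_counts, {"d", "aiin", "ch", "ol"}))  # 0.8
```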

But the epiphany I had was that the weighting system for my program should not be the percentage of covered text. Rather, whether a solution was better or worse than any other would be determined by the number of syllables required to synthesize the better part of the VM’s volume. In other words, the longer the syllables, the better.
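One way to express this weighting, assuming “number of syllables required” means the average number of pieces per composable word token, is sketched below. The function names and toy counts are hypothetical. Lower values mean longer syllables, so the single-letter set from the trivial case now scores poorly rather than perfectly.

```python
def min_pieces(word, syllables):
    """Fewest syllables needed to compose `word`, or None if impossible."""
    INF = float("inf")
    best = [0] + [INF] * len(word)  # best[i]: fewest pieces for word[:i]
    for i in range(1, len(word) + 1):
        for j in range(i):
            if best[j] + 1 < best[i] and word[j:i] in syllables:
                best[i] = best[j] + 1
    return None if best[-1] == INF else best[-1]


def pieces_per_token(word_counts, syllables):
    """Average syllables per composable token; lower means longer syllables."""
    pieces = tokens = 0
    for word, n in word_counts.items():
        k = min_pieces(word, syllables)
        if k is not None:
            pieces += k * n
            tokens += n
    return pieces / tokens if tokens else float("inf")


toy_counts = {"daiin": 5, "chol": 3}  # made-up counts
print(pieces_per_token(toy_counts, set("daiinchol")))           # 4.625
print(pieces_per_token(toy_counts, {"d", "aiin", "ch", "ol"}))  # 2.0
```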

In the end, the resulting syllable set could be used as a starting point for finding out how the original plaintext letters were segmented. With frequency analysis, this would effectively have reduced the solution of the VM to a monoalphabetic substitution cipher.
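Assuming the words have already been segmented, the frequency-analysis step could look roughly like the sketch below: count how often each syllable occurs by volume, then pair that ranking with a reference ordering of plaintext letters as a first guess at the substitution key. The segmented toy counts and the reference ordering `"etao"` are illustrative assumptions; the VM’s plaintext language, and hence its letter frequencies, is of course unknown.

```python
from collections import Counter


def syllable_frequencies(segmented_counts):
    """Count syllable occurrences by volume.

    segmented_counts maps a word's segmentation (a tuple of syllables)
    to how often that word occurs in the text.
    """
    freq = Counter()
    for pieces, n in segmented_counts.items():
        for p in pieces:
            freq[p] += n
    return freq


def initial_key(freq, reference_order):
    """First-guess substitution key: i-th most frequent syllable -> i-th letter."""
    ranked = [syllable for syllable, _ in freq.most_common()]
    return dict(zip(ranked, reference_order))


# Made-up segmented counts, not real VM data.
toy = {("d", "aiin"): 5, ("ch", "ol"): 3, ("d", "ol"): 2}
freq = syllable_frequencies(toy)
print(freq["d"], freq["aiin"], freq["ol"], freq["ch"])  # 7 5 5 3
print(initial_key(freq, "etao")["d"])                   # e
```

Such a rank-based key is only a starting point; ties and small samples make it unreliable without further refinement.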

I gave this a run of about 24 hours, and for one thing noticed that this allowed for testing only about 15,000 different syllable sets. Checking a syllable set is computationally expensive, because a set of 50 syllables can be combined in an almost endless number of ways to compose words, plus we have to test at least the 150 most frequent VM words to arrive at any useful statistics anyway.
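How quickly the alternatives multiply can be shown with a short dynamic-programming sketch (hypothetical name `count_segmentations`) that counts the distinct ways a word can be built from a syllable set. With just “a”, “aa” and “aaa” as syllables, a ten-letter run of “a” already has 274 readings. The post does not say how the original program enumerated compositions; this only illustrates the growth.

```python
def count_segmentations(word, syllables):
    """Number of distinct ways to write `word` as a sequence of syllables."""
    # ways[i] = number of ways to compose the prefix word[:i]
    ways = [1] + [0] * len(word)
    for i in range(1, len(word) + 1):
        for j in range(i):
            if word[j:i] in syllables:
                ways[i] += ways[j]
    return ways[-1]


print(count_segmentations("daiin", {"d", "a", "i", "n", "ai", "in", "aiin"}))  # 5
print(count_segmentations("a" * 10, {"a", "aa", "aaa"}))                      # 274
```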

I started from a regular syllable set, with minimal single-letter syllables like “a”, “b”, etc., and allowed for random mutations of individual characters in that set, adding, removing or changing one character at a time. While this should render a pretty good result in the long run, it also meant that many variations would have little or no effect on the result at all, reducing the number of “helpful” variations even more.
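Here is one possible reading of that mutation step as a Python sketch: pick a random syllable and insert, delete or replace a single character in it. The names `mutate` and `alphabet` are hypothetical. Replacing a character with itself, or turning one syllable into another that is already in the set, changes little or nothing, which illustrates how “unhelpful” variations come about; note also that this particular operator never makes the set larger.

```python
import random


def mutate(syllables, alphabet, rng):
    """Return a new syllable set in which one randomly chosen syllable has
    had a single character inserted, deleted or replaced."""
    target = rng.choice(sorted(syllables))
    op = rng.choice(("add", "remove", "change"))
    if op == "add":
        pos = rng.randrange(len(target) + 1)
        new = target[:pos] + rng.choice(alphabet) + target[pos:]
    elif op == "remove" and len(target) > 1:
        pos = rng.randrange(len(target))
        new = target[:pos] + target[pos + 1:]
    else:  # "change", or "remove" on a one-letter syllable
        pos = rng.randrange(len(target))
        new = target[:pos] + rng.choice(alphabet) + target[pos + 1:]
    # If `new` equals `target` or another member, the set is unchanged or shrinks.
    return (set(syllables) - {target}) | {new}


# Start from single letters and apply a few random mutations.
alphabet = "abcdefghijklmnopqrstuvwxyz"
rng = random.Random(0)
syllables = set(alphabet)
for _ in range(5):
    syllables = mutate(syllables, alphabet, rng)
print(sorted(syllables))
```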

But, anyway, it became apparent that I probably wasn’t going anywhere. After some initial progress, the program stalled: neither did the average syllable length grow much beyond two letters or so, nor did the coverage of the VM (by volume) exceed 80%, which was even more disappointing, since I had set 80% as the minimum a syllable set required to still be considered valid. (Of course it’s trivial to create syllable sets with syllables of maximum length if there is no obligation to actually cover the better part of the ciphertext.)
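The stalling pattern can be illustrated with a bare hill-climbing loop that rejects any candidate below the 80% threshold and otherwise keeps it if it needs no more syllables per token. This is a hypothetical sketch: `climb`, `toy_mutate`, `toy_evaluate` and the integer stand-in for a syllable set are all illustrative. In the toy, every gain in syllable length costs coverage, so the climb parks exactly at the 80% boundary. That is one conceivable mechanism behind a search stalling at its validity threshold, not a diagnosis of the actual run.

```python
import random

MIN_COVERAGE = 0.80  # sets below this share of the text count as invalid


def climb(start, mutate, evaluate, steps, rng):
    """Hill-climb from `start`.

    evaluate(state) -> (coverage, pieces_per_token). A candidate replaces the
    current state only if it is valid and needs no more pieces per token.
    """
    current = start
    _, current_pieces = evaluate(current)
    for _ in range(steps):
        candidate = mutate(current, rng)
        cov, pieces = evaluate(candidate)
        if cov >= MIN_COVERAGE and pieces <= current_pieces:
            current, current_pieces = candidate, pieces
    return current, current_pieces


# Toy stand-in: a larger x means longer syllables (fewer pieces per token)
# but lower coverage.
def toy_mutate(x, rng):
    return x + rng.choice((-1, 1))


def toy_evaluate(x):
    return (20 - x) / 20, 10 - x


print(climb(0, toy_mutate, toy_evaluate, steps=200, rng=random.Random(1)))  # (4, 6)
```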

So, once more I hit a wall, and once more the results are tantalisingly ambiguous: a 95% “saturation value” would have given me lots of confidence in my approach. 40% would have clearly shown that I was, once and for all, on the wrong track. But 80%? It’s surprisingly close to the results of previous attempts regarding the Stroke theory. One of the problems is certainly the limited number of mutations of the test syllable set that could be tried. It may also have been wiser to choose a different transcription than Takahashi’s, of which I think less and less the more I look at it. And finally, the whole approach of composing words may be wrong; perhaps it would be wiser to dissect existing ciphertext.

Someone has given me an idea…


4 thoughts on “Fail Again, Fail Better!”

  1. Elmar, I expect that most of the known writing systems have been explored, but I wonder whether you know who has looked at one or more of the mixed character-and-phonetic sort? As examples – I mean structures like those of Old Coptic and Japanese, and I suppose even hieroglyphic Egyptian. Don’t know how many others there might be.
    Is such a system unlikely because of the number of characters exploding the number of possible glyphs?

  2. Hi Diane,
    Yes, I would say the biggest obstacle to a syllabic or logographic script (or a mix of both) is the comparatively small character set of fewer than two dozen letters in the regular cast. If you assume the VM is “regularly enciphered” (i.e., one ciphertext letter stands for one “piece” of plaintext information), that leaves little room for anything but one plaintext character or “sound” per ciphertext letter.
    Of course, one could imagine more exotic schemes. For example, each VM word could correspond to one word in Chinese script, with the VM letters actually being the constituents of the Chinese characters. I know that the Chinese are able to “sort” their words in a dictionary, so there must be something like a “description” of how the characters are composed, but I don’t know enough to say whether a VM word could encode the strokes required for one Chinese character.
    (Please note that this is different from the Stroke theory touted by me, which assumes that each plaintext letter is segmented into its constituents, each of which is represented by one ciphertext letter. The Stroke theory suggests each ciphertext *word* enciphers several plaintext *letters*, while the “Chinese theory” would (applied naively) have a 1:1 relationship between plaintext words and ciphertext words.)

  3. I hadn’t been back to see your response. As it happens, I know a little about how characters work – spent a couple of years living in Japan. I should imagine that the system used in the Vms was an approximation only. As you say, Chinese has too many different characters; it also has tonal variations that ideally should be indicated.
    Are you by any chance the same Elmar Vogt who is listed in connection with a Buddhist centre of studies?
    And finally – your note about ‘Fits and seizures’ which you shared with everyone through the mailing list. Which post is it in?
