Entropy Wonder

I wonder how far the choice of a correct or incorrect transcription system affects the observed entropy of the VM text, namely its observed “low information content”.

Obviously, there are two major ways in which a transcription can be wrong: either ciphertext character strings are broken up or joined at the wrong position (Is qo really one letter or two? What about dain, daiin and daiiin?), or characters which are identical are treated as different, or vice versa. (C/e/cc/ch come to mind. How many different gallows are there really?)

What would the effect on entropy be? Perhaps I should look up the old statistics books and see what difference a larger/smaller word length and/or character set would make.
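
One way to explore the question without the statistics books is to measure it directly. The following is a minimal sketch, not a reference implementation: it takes h0 as log2 of the number of distinct symbols, h1 as the single-symbol entropy and h2 as the entropy of a symbol given its predecessor, which is one common reading of those labels. The helper names (`shannon`, `char_entropies`, `tokenize`) and the sample line are made up for illustration; the sample is not real transcription data.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def char_entropies(symbols):
    """Return (h0, h1, h2) for a sequence of symbols.

    h0: log2 of the number of distinct symbols
    h1: entropy of single symbols
    h2: entropy of a symbol given the one before it,
        computed as H(pairs) - H(first symbol of each pair)
    """
    symbols = list(symbols)
    h0 = math.log2(len(set(symbols)))
    h1 = shannon(Counter(symbols))
    pairs = list(zip(symbols, symbols[1:]))
    h2 = shannon(Counter(pairs)) - shannon(Counter(a for a, _ in pairs))
    return h0, h1, h2

def tokenize(text, units):
    """Split text into symbols, treating each string in `units` as one symbol.

    Longer units are tried first; anything else becomes a one-character symbol.
    """
    units = sorted(units, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for u in units:
            if text.startswith(u, i):
                out.append(u)
                i += len(u)
                break
        else:
            out.append(text[i])
            i += 1
    return out

# Made-up toy line, only to exercise the functions.
sample = "qokaiin daiin qokedy dain chedy qokain daiiin"

split = char_entropies(tokenize(sample, []))
joined = char_entropies(tokenize(sample, ["qo", "iii", "ii"]))
print("every character separate:  h0=%.3f h1=%.3f h2=%.3f" % split)
print("qo, ii, iii as one symbol: h0=%.3f h1=%.3f h2=%.3f" % joined)
```

Run on a full transcription instead of the toy line, the two rows would show how much of the low h2 depends on where the symbol boundaries are drawn.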


3 thoughts on “Entropy Wonder”

  1. …or just try a few sets of replacements and calculate the entropy…
    The suggested replacements would make a good start,
    e.g. replace all gallows by one character.

    Make a test-run!

    (Actually, I have wondered about exactly the same question in idle minutes.)
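
Such a test run could look roughly like the sketch below. It assumes an EVA-style transcription in which the four gallows are written t, k, p and f; that choice, the helper names (`entropy`, `h1`, `h2`, `merge`) and the sample line are illustrative assumptions, not taken from the post. Merging symbols can never raise h1, but its effect on h2 depends on the text, which is the point of measuring it.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy in bits of the items in an iterable."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def h1(text):
    """Single-character entropy."""
    return entropy(text)

def h2(text):
    """Entropy of a character given the character before it."""
    pairs = list(zip(text, text[1:]))
    return entropy(pairs) - entropy(a for a, _ in pairs)

def merge(text, group, target):
    """Replace every character of `group` by the single character `target`."""
    return text.translate(str.maketrans({c: target for c in group}))

# Assumption: EVA-style gallows letters; adjust for other transcriptions.
GALLOWS = "tkpf"

# Made-up toy line, not real transcription data.
sample = "qokaiin otedy qopchedy ofaiin qotedy okaiin"
merged = merge(sample, GALLOWS, "K")

for label, text in (("original", sample), ("gallows merged", merged)):
    print(f"{label:15s} h1={h1(text):.3f} h2={h2(text):.3f}")
```

Further replacement sets can be tried by calling `merge` again with a different group, one set at a time, so that the effect of each substitution stays visible.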

  2. Character Entropy with some letters equated* per Monkey
    Herbal-A only (quires 1-8 only)
    32000 total characters tested
    Spaces ON
    h0 4.08746 (17 different characters)
    h1 3.50017
    h2 1.97895
    h1-h2 1.52122
    Spaces OFF
    h0 4.00000 (16 different characters/letters)
    h1 3.50275
    h2 2.27693
    h1-h2 1.22582

    *Evita6 major modifications are …
    = –> a single letter
    geminates to single letters (iterated)
    isolated asterisks removed

    To the degree and have similar adjacencies to other letters and the different gallows do, predictability increases. Removing from and and truncating i-series and e-series decreases predictability. Spaces should be relatively more predictable in the modified text.

    I understand there is not enough text to get good results for higher order word entropy.
    However, once in the virtual machine to run 16-bit apps, I kept pushing buttons.
    Word Entropy
    (1722 different, 7964 total)
    h0 10.74987
    h1 8.51277
    h2 3.91671
    h3 0.50701
    h4 0.02468
    h5 0.00201
    h6 0.00126
    h7 0.00126
    h8 0.00126
    h9 0.00126
    h10 0.00101
    h11 0.00126
    and 0.00126 through h23 when I quit pushing buttons.
    What’s the significance of 0.00126?
    How to explain h10 with 0.00101?
    If the other higher order scores were only a little variable, it wouldn’t seem odd.
    But they are not.
    Again, maybe the results cannot be interpreted because the text is not long enough.
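
The "not long enough" concern can be illustrated with synthetic data. The sketch below draws 7964 random words from a 1722-word vocabulary (sizes borrowed from the comment above; the uniform random draw is purely an assumption and has nothing to do with the real text) and estimates the entropy of a word given the n-1 words before it. It is not how Monkey computes its figures, which the comment does not describe. Once almost every (n-1)-word context occurs only once in the sample, the next word is trivially "predictable" from it and the estimate collapses toward zero, whatever the source.

```python
import math
import random
from collections import Counter

def shannon(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(tokens, n):
    """Estimate h_n: entropy of a token given the n-1 tokens before it.

    Computed over all n-token windows as H(windows) - H(window prefixes);
    for n = 1 this is the plain single-token entropy.
    """
    windows = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    joint = shannon(Counter(windows))
    context = shannon(Counter(w[:-1] for w in windows))
    return joint - context

# Synthetic stand-in text: uniform random words, fixed seed for repeatability.
rng = random.Random(0)
vocab = [f"w{i}" for i in range(1722)]
tokens = [rng.choice(vocab) for _ in range(7964)]

for n in range(1, 9):
    print(f"h{n} = {conditional_entropy(tokens, n):.5f}")
```

In this estimate, a context that occurs exactly twice with two different continuations contributes 2/M bits, where M is the number of windows, so a few repeated word sequences are enough to leave a small constant residue at high orders. Whether that accounts for the 0.00126 plateau is not something the figures above can settle.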
