# Entropy Wonder

I wonder to what extent the choice of a correct or incorrect transcription system affects the observed entropy of the VM text, namely its observed “low information content”.

Obviously, there are two major ways in which the transcription can be wrong: either ciphertext character strings are broken up or joined at the wrong position (Is qo really one letter or two? What about dain, daiin and daiiin?), or characters which are identical are treated as different, or vice versa. (C/e/cc/ch come to mind. How many different gallows are there really?)

What would the effect on entropy be? Perhaps I should look up the old statistics books and see what difference a larger/smaller word length and/or character set would make.
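
One way to explore the question empirically is to compute the entropies directly for each candidate transcription. The following is a minimal sketch, not any particular tool's method: it assumes h0 is the base-2 logarithm of the number of distinct characters, h1 is the single-character Shannon entropy, and h2 is the conditional entropy of a character given its predecessor. The function names `shannon` and `char_entropies` are made up for illustration.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon entropy in bits of a Counter of observed items."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def char_entropies(text):
    """Return (h0, h1, h2) for a string, under the assumptions stated above."""
    h0 = math.log2(len(set(text)))           # alphabet size only
    h1 = shannon(Counter(text))              # single-character frequencies
    pairs = Counter(zip(text, text[1:]))     # adjacent character pairs
    firsts = Counter(text[:-1])              # first character of each pair
    h2 = shannon(pairs) - shannon(firsts)    # H(next char | previous char)
    return h0, h1, h2

if __name__ == "__main__":
    # Words taken from the examples in the post; far too short for real use.
    print(char_entropies("dain daiin daiiin"))
```

Merging or splitting characters changes all three numbers at once: the alphabet size moves h0, the frequency profile moves h1, and the adjacency structure moves h2, so the comparison has to be rerun per transcription variant.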

## 3 thoughts on “Entropy Wonder”

1. a script enthusiast and casual reader of Voynich research

…or just try a few sets of replacements and calculate the entropy…
Some suggested replacements, e.g. replace all gallows by one character, and
s/iii/M/
s/ii/N/
s/i/I/
s/cc/A/
would make a good start.

Make a test-run!

(Actually, I have already wondered about exactly the same question in idle minutes.)
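
The replacements suggested above could be tried out roughly as follows. This is only a sketch under stated assumptions: the gallows are taken to be the EVA letters f, k, p and t, the substitutions are applied globally (unlike a bare `s/…/…/`, which replaces only the first match per line), and the replacement symbols G, M, N, I and A are assumed not to occur in the transcription already. The names `RULES`, `recode` and `h1` are hypothetical.

```python
import math
import re
from collections import Counter

# Ordered rules: longest i-runs first, so "iii" is not eaten as "ii" + "i".
RULES = [
    (r"[fkpt]", "G"),  # assumption: EVA gallows f, k, p, t become one symbol
    (r"iii", "M"),
    (r"ii", "N"),
    (r"i", "I"),
    (r"cc", "A"),
]

def recode(text, rules=RULES):
    """Apply each substitution in order to the whole text."""
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text

def h1(text):
    """First-order character entropy in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

if __name__ == "__main__":
    sample = "dain daiin daiiin"  # toy input, not a real transcription file
    print(recode(sample))
    print(h1(sample), h1(recode(sample)))
```

Running the same entropy measure before and after recoding, on a full transcription rather than this toy string, gives the comparison the test run is after.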

2. Knox

Character Entropy with some letters equated* per Monkey
Herbal-A only (quires 1-8 only)
32000 total characters tested
Spaces ON
h0 4.08746 (17 different characters)
h1 3.50017
h2 1.97895
h1-h2 1.52122
Spaces OFF
h0 4.00000 (16 different characters/letters)
h1 3.50275
h2 2.27693
h1-h2 1.22582
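
As a quick consistency check on the figures above: the h0 values match the base-2 logarithm of the stated alphabet sizes, and the h1-h2 rows are plain differences. Reading h0 as the logarithm of the number of distinct characters is an assumption about how Monkey defines it, though the numbers fit.

```latex
\begin{align*}
h_0^{\text{spaces on}}  &= \log_2 17 \approx 4.08746, &
h_1 - h_2 &= 3.50017 - 1.97895 = 1.52122,\\
h_0^{\text{spaces off}} &= \log_2 16 = 4.00000, &
h_1 - h_2 &= 3.50275 - 2.27693 = 1.22582.
\end{align*}
```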

*Evita6 major modifications are …
f=k=p=t
–>
= –> a single letter
–>
geminates to single letters (iterated)
isolated asterisks removed

To the degree and have similar adjacencies to other letters and the different gallows do, predictability increases. Removing from and and truncating i-series and e-series decreases predictability. Spaces should be relatively more predictable in the modified text.

I understand there is not enough text to get good results for higher-order word entropy.
However, once in the virtual machine to run 16-bit apps, I kept pushing buttons.
Word Entropy
(1722 different, 7964 total)
h0 10.74987
h1 8.51277
h2 3.91671
h3 0.50701
h4 0.02468
h5 0.00201
h6 0.00126
h7 0.00126
h8 0.00126
h9 0.00126
h10 0.00101
h11 0.00126
and 0.00126 through h23 when I quit pushing buttons.
What’s the significance of 0.00126?
How to explain h10 with 0.00101?
If the other higher order scores were only a little variable, it wouldn’t seem odd.
But they are not.
Again, maybe the results cannot be interpreted because the text is not long enough?
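
The collapse of the higher-order values toward zero is what a simple count-based estimate does on a short text, and a small sketch can make that visible. It assumes h_n is estimated as the entropy of observed n-grams minus the entropy of their (n-1)-word contexts; whether Monkey computes it this way is not stated here, and the function names are made up for illustration.

```python
import math
from collections import Counter

def block_entropy(items):
    """Shannon entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_word_entropy(words, n):
    """Count-based estimate of H(word n | previous n-1 words), in bits."""
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if n == 1:
        return block_entropy(ngrams)
    contexts = [g[:-1] for g in ngrams]  # same windows, last word dropped
    return block_entropy(ngrams) - block_entropy(contexts)

if __name__ == "__main__":
    # Toy text: the context "a" occurs twice with different followers.
    toy = ["a", "b", "a", "c"]
    for n in (1, 2, 3):
        print(n, round(conditional_word_entropy(toy, n), 5))
```

Under this estimator, once every context is unique the estimate is exactly zero, and a context seen twice with two different continuations contributes 2/M bits, where M is the number of n-grams. With M near 7964 that is about 0.00025 bits per such context, so values around 0.001 would correspond to only a handful of repeated contexts, which says more about the sample size than about the text.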