# Safety in Numbers (Roman and Arab)

As we all know, there is no shortage of oddities regarding the VM. One of those is that throughout the whole manuscript, no numbers can be found.*) While it has been suggested that the structure of VM words seems to be governed by rules similar to those for the composition of Roman numerals, as of now no consistent system has been proposed which would allow one to generate VM words.**)

But what if we look at it from the Stroke theory’s point of view? Remember, this theory says that each VM ciphertext letter represents one plaintext penstroke, and hence that each VM word is the equivalent of one or two plaintext letters. But in the same manner as the letters were enciphered, it would also be possible to encipher digits, if Arabic numerals were used. If Roman numerals were used, they would be enciphered just like their letter equivalents.
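To make the idea concrete, here is a minimal Python sketch of stroke-wise encipherment. The stroke table (`STROKES`) and the assignment of ciphertext letters to strokes (`CIPHER`) are entirely hypothetical and invented for illustration only; they are not a proposed reading of the VM. The point is merely that a digit goes through the same machinery as a letter.

```python
# Hypothetical decomposition of plaintext glyphs into pen strokes.
# None of these assignments is claimed to match the real VM.
STROKES = {
    "T": ["long_vertical", "top_dash"],
    "L": ["long_vertical", "bottom_dash"],
    "1": ["long_vertical"],
    "7": ["top_dash", "diagonal"],
}

# Hypothetical mapping from one pen stroke to one ciphertext letter.
CIPHER = {
    "long_vertical": "q",
    "top_dash": "o",
    "bottom_dash": "r",
    "diagonal": "l",
}


def encipher_glyph(glyph):
    """Return the ciphertext 'syllable' for a single plaintext glyph."""
    return "".join(CIPHER[stroke] for stroke in STROKES[glyph])


def encipher(text):
    """Encipher each plaintext glyph separately, letters and digits alike."""
    return [encipher_glyph(glyph) for glyph in text]
```

With these made-up tables, `encipher("T17")` yields `["qo", "q", "ol"]`: the digits are indistinguishable in kind from the letters once enciphered.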

What would be the consequences?

I have suggested in the wake of Robert Firth’s observations that in the Stroke theory the plaintext was written in capital and small letters alternating for the most part, so that a KiNd Of CaMeLcAsE writing was produced. With around 25 capital and small letter shapes each, this means that the better part of the VM’s vocabulary should be composed of around 50 ciphertext “syllables”. If Arabic numerals were used, this number should increase by the 10 or so shapes required for the individual digits.***)

Furthermore, since it’s impossible to write “capital” and “small” numbers, a string of digits should show up as a deviation from the pattern of alternating “prefixes” and “suffixes” (which are identified in the Stroke theory as capital and small letters). Note also that many of the Arabic numerals are fairly similar to each other (6-8-9-0, for example)****), which should lead to a string of quite similar words or syllables in the ciphertext. (As is observed.)
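One way such a deviation could be looked for is sketched below. It assumes the ciphertext has already been split into syllables and that each syllable has been tagged `"C"` (capital-type prefix), `"s"` (small-type suffix) or anything else (unclassified); both the tagging and the function name `off_pattern_runs` are hypothetical. The function simply reports stretches where the strict C/s alternation breaks down.

```python
def off_pattern_runs(tags):
    """Return (start, end) index pairs of maximal runs that break the
    alternation of 'C' and 's' tags. `end` is exclusive."""
    runs = []
    start = None   # start index of the current off-pattern run, if any
    prev = None    # tag of the previous syllable
    for i, tag in enumerate(tags):
        # A syllable fits if it is a known class and differs from its predecessor.
        fits = tag in ("C", "s") and tag != prev
        if not fits and start is None:
            start = i
        elif fits and start is not None:
            runs.append((start, i))
            start = None
        prev = tag
    if start is not None:
        runs.append((start, len(tags)))
    return runs
```

For the tag sequence `["C", "s", "C", "s", "d", "d", "d", "C", "s"]` this returns `[(4, 7)]`, i.e. the three unclassified syllables in the middle. Whether such runs in the real text are digits is, of course, exactly the open question.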

If Roman numerals were used, they could have been written in camelcase (like MmIx rather than MMIX). Again, strings of similar or identical words should be observed, as the letters denoting the Roman numerals often resembled each other. (For example, “D” was supposed to be “one half of (the letter) ‘M’”, i.e. 500 was the half of 1000. Same with “V” and “X”.) The total number of different syllables should not go up though.
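As a small illustration of the MmIx idea, the following Python sketch converts a number to a Roman numeral and then alternates capital and small letters. It assumes the modern subtractive notation (IX rather than VIIII), which a 15th century scribe need not have followed, and it assumes the alternation simply restarts with a capital at the beginning of each numeral.

```python
# Value/numeral pairs in descending order (modern subtractive notation; an assumption).
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    parts = []
    for value, numeral in ROMAN:
        while n >= value:
            parts.append(numeral)
            n -= value
    return "".join(parts)


def camelcase(text):
    """Alternate capital and small letters, starting with a capital."""
    return "".join(
        ch.upper() if i % 2 == 0 else ch.lower()
        for i, ch in enumerate(text)
    )
```

Here `camelcase(to_roman(2009))` gives `"MmIx"`, and `camelcase(to_roman(3000))` gives `"MmM"`, which shows how a numeral can turn into a run of near-identical shapes.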

Thus, I think it’s quite conceivable that the VM text is interspersed with numbers. They just don’t show up, because they’re enciphered as the rest of the letters are.

If the VM turns out to be the very earliest book on cocktail recipes, the first round is on me.

*) With the exception of the pagination, which appears to be a later addition though.

**) If this was possible, ie if all VM words could be explained as roman numbers, the question is — what would that mean for the deciphering of the VM? That it was one of the earliest phone books in existence, predating the phone by some 400 years, and that unfortunately all the people’s names were dropped from it…?

***) Or less, perhaps. It’s difficult to see how the Stroke theory should discriminate between “o” and “0”.

****) Bear in mind though that in the period the shapes of Arabic numerals could vary significantly with the exact time and place in which they were used.

## 13 thoughts on “Safety in Numbers (Roman and Arab)”

1. The stroke theory seems to reduce to a kind of simple-minded verbose cipher – but because this would be too trivial, it has been simultaneously complexified by the use of cameltoes, sorry CaMeLcAsE, which itself is just a two-cipherbet variant.

But even that wouldn’t be enough to create Voynichese as we see it (with all its complicated rules and structures), so you’d then need to add a cunning camelcase DeMoN, who is smart enough to add arbitrarily tricky rules to flip between the two alphabets, and to add cheeky misspellings etc.

…it’s a bit thin, wouldn’t you say? This devious camelcase demon is more than a bit like Gordon Rugg’s randomising demon… and that’s probably not a good thing. :-(

2. Oh Nick, what have I done to deserve the evil R-word? Nowhere did I mention a hoax, nor did I say anything about “arbitrarily tricky rules” to explain the results. Au contraire, I think one could randomly switch between cases and still arrive at a legible result, so the point is that the author might have generally stuck to the capital-minor sequences in a word, but at his leisure (or as necessity arose, eg when enciphering numbers), he may have deviated from the scheme.

“Thin”? Perhaps. At least this scheme would explain the comparatively low information content in the ciphertext, how some 70 odd syllables composed of less than 20 different characters suffice to produce the better part of the VM, would give an approach to the origin of word grammar observed, would be able to encipher meaningful text without too many ambiguities, and would have been within the reach of a 15th century author.

I’m not saying the Stroke theory is the Answer To All Questions(tm), I simply think it’s a theory (hypothesis, if you so desire) worth pursuing. Do you have any better ideas?

3. proto57

Strokes, numbers, or other code… I agree that this is plausible (for the reasons you give)… that it is a good way to explain much of what is observed. I think the VMs is a code, not a cipher… and this would account for its difficulty, while being dirt simple in (now lost) concept for the original users.

I think seeing complexity in Voynichese is subjective, a result of being frustrated. Like, “If I cannot figure it out, it must be layer on layer, complex beyond compare…”. On the contrary, the system can be very simple, and yet indecipherable, if a numerical or other, code. Lose your code book, and you will not figure the system; and without the system, not figure out the list of codes. Simple, but Hellish.

4. For me, the evil “R” word is not “Rugg” but “random” (which reappears in your comment)… any conception of randomness is a back-projection on something this old!

5. Christopher Hagedorn

The stroke theory seems to be easily testable by a brute force attack, which would of course require the correct distinction between individual glyphs for a computer to handle it.
Let me see if I am understanding this correctly… The whole basis of the concept is that each Voynichese letter corresponds to a pen stroke instruction, such that, as an example, EVA ‘4’ could mean “draw a vertical line”, EVA ‘o’ could mean “draw a horizontal line”, and Voynichese 4o- would thus translate into T or L?
I can’t see how the person deciphering would be able to deduce the “borders” between letters, just as a continuous string of Morse code is undecipherable, since you don’t know when a letter stops and the next begins (i.e. “...” could be “S”, “EEE”, “IE” or “EI”).

But surely, this “letter border” could be represented by an individual character, just like ‘/’ is often used in Morse code?

You all know I’m just rambling, but at least I’m making more sense than the VMS.

6. @Nick: Didn’t the Italian codebooks include several synonymous codes for often repeated words from which the encoder could choose… randomly? (Or if you loathe the word, “arbitrarily”? Either way, I can’t see on what your connection of the Strokes to Rugg is based.)

@Christopher: Yeah, that’s essentially the idea. Of course, you could use EVA ‘c’ for a horizontal dash on the top of the letter, ‘h’ for one in the middle and ‘s’ for one at the bottom to remove ambiguities between “T” and “L”, for example.

My original idea was that every ciphertext word enciphered two plaintext letters, one in capital and one in minor letters, so that would make it feasible to detect letter boundaries. (Especially with a bit of experience — after all, you’d only need to know 26 VM-word-initial “syllables”.) Unfortunately, this doesn’t seem to be warranted by the statistics.

I did some brute force on the subject. See https://voynichthoughts.wordpress.com/stroke-theory/results/ for example. Thing is, it ain’t that easy as long as (as you mentioned) we don’t know the exact character set. Also, things get a bit complicated if special characters are also introduced in the plaintext (like numbers, punctuation etc.)

Furthermore, apparently far from all words follow the two-syllable grammar, so there is something else going on. (Relaxation of the rules as soon as the author got more “fluent” and confident…?)

Also, note that it’s possible to encipher the same plaintext letter differently depending on whether you choose block letters, print letters or cursive as your “source”!

I’ll invest a little more work into it as soon as I find the time. Stay tuned!

I’m pretty sure the VM author had no idea how difficult to crack his system would turn out in the end… bastard!

7. Hi Elmar,

I suspect that actual usage patterns of multi-sign substitution ciphers were somewhat simpler than you might imagine. From the blocked way in which most of the ciphers in the various Quattrocento cipher ledgers were laid out, I’d predict that only one row was in use at any time (perhaps with a ruler or other straightedge laid over the block).

Further, I can’t recall any books about or references to cryptography pre-1550 (possibly even pre-1600) that talk about consciously selecting from the multiple per-letter symbols at random. I believe the whole notion of randomness is wobbly ground on which to build pre-1600 (let alone pre-1500) Voynich cipher theories.

Incidentally, another argument against the stroke alphabet in the VMs is the presence of the letter “4” (EVA ‘q’). Isn’t this the only letter we ever see that has a N to W sharp diagonal stroke?

Cheers, ….Nick….

8. Re EVA ‘q’ — Yes, and…?

Mind you, according to “The Strokes”, each *ciphertext* letter represents one *plaintext* letter *stroke*. I.e., ‘q’ might mean “long vertical line in the plaintext”. If EVA ‘o’ was “horizontal dash” (which it probably is not; just to give an example), then ciphertext “qo” would mean the plaintext stroke combination “I-”, i.e. plaintext “T” or perhaps “L”. (‘qoo’ would then be “F”, ‘qooo’ then “E”; alas, we see neither ‘qoo’ nor ‘qooo’ in the ciphertext.)*)

But in any case, the shape of the *ciphertext* letters is independent of the shape of the *plaintext* letters described by them.

*) To preempt the argument of ambiguity, it might well be that ‘o’ is “top horizontal dash”, making ‘qo’ unambiguously “T”. Eg ‘r’ could be “bottom horizontal dash”, ie ‘qr’ were “L”. Etc. etc.

9. Ooops, I woke up with Newbold’s stroke theory (where each component stroke corresponds to a different Latin shorthand letter) on the brain, sorry about that Elmar!

I’ll try not to do it again. =:-o

10. First Rugg, now Newbold… oh, the humanity… ;-)

11. Oh, so you want me to mention Ursula Papke’s stroke theory too? :-)

12. jonah fowl

About Arabic numerals: ‘4’ (q) is for 5, l is for 4, ^ for 7. 9 is probably for 6.

13. Jonah, you seem to be on a tangent here…