2009.7.5 Yet another English spelling reform scheme

Perhaps as long as five years ago, I started thinking up a spelling reform system for the English language. This has been done plenty of times before, of course. (For instance, the bRitic scheme has a number of similarities to various versions of mine, I now see.) My intent was not, and is not, to actually try to implement my system; rather, it's just one of the many linguistic thought experiments I've entertained myself with over the past twenty years. (It's hard to believe it was that long ago that I discovered the Arabic, Hebrew, Greek, and Russian alphabets, as well as the International Phonetic Alphabet, from a set of pages in our house copy of the Standard College Dictionary at, I think, the age of 5, or maybe 6. That was the single moment that sparked my abiding interest in linguistics.) Apparently a couple of years after I devised the initial version of the spelling reform scheme, I blogged about it, and that post is now a useful record of part of its development history. The objectives of the scheme were, and are,

A few days ago, here in Barcelona, my interest in the system was rekindled, for some reason I don't remember, and I took a new look at it. There were a few problems with the existing scheme that I decided needed fixing. The biggest issue that I acted on was the previous use of "i" and "u" as both semivowels and vowels, which made for unnecessarily complex workarounds. I also instituted an explicit denotation of stress, which I now think is a crucial feature for a logical system of English spelling, given that the stress in English words is unpredictable. Now the appearance of text written in the scheme has become rather more Polish/Dutch-looking than it did before, and it's not necessarily as aesthetically satisfying as the aforeblogged version, but it has gained a greater measure of coherence and therefore elegance, I think.


Following are the spelling rules for the scheme in its current revision. Phonemes are listed in a phonetically relevant order rather than in the order of the Roman alphabet.

Consonants
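| Manner | Phoneme | Reform spelling | Notes below |
|---|---|---|---|
| Nasals | m as in mare | m | |
| | n as in night | n | |
| | ng as in song | q | |
| Stops | p as in pear, spare | p | |
| | b as in bear | b | |
| | t as in tar, star | t | |
| | d as in dare | d | |
| | c/k as in care, scare, kite | k | |
| | g as in goat | g | |
| Fricatives | f as in fair | f | |
| | v as in vote | w | 1 |
| | th as in thin | x | 2 |
| | th as in there | x | 2 |
| | s as in sight | s | |
| | z as in zone | z | |
| | sh as in share | c | |
| | s as in measure | y | 1 |
| | h as in hair | h | |
| Affricates | ch as in chair | tc | |
| | j as in jeer | dy | |
| Liquids/semivowels | w as in wear | v | 1 |
| | l as in lair | l | |
| | r as in rare | r | |
| | y as in year | j | 1 |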

Consonant notes:

  1. A possibly more natural alternative to these would be to use w and y as the semivowels and v and j as the fricatives, but I decided to use v and j as the semivowels because they occur so much in vowel diphthongs, as seen below, and are more compact than w and y, respectively. This therefore improves the conciseness of text written in the scheme.
  2. This fails to distinguish between voiceless and voiced "th," but because those two sounds have very few (I've been having trouble thinking of any at all) minimal pairs, they can unambiguously be conflated, with the only issue being that one then has to learn for each word whether the "th" is voiced or not. (Edit, July 18th: I thought of a few minimal pairs: "thigh" and "thy," "teeth" and "teethe," "loath" and "loathe." But the distinction between the members of each of these pairs is pretty easily deduced from context.)
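
As a rough sketch, the consonant table can be written down as a simple lookup in Python. The ASCII phoneme labels used as keys here (such as "dh" for the voiced "th" and "zh" for the "s" of "measure") and the function name `spell_consonants` are assumptions made for this example; they are not part of the scheme itself.

```python
# Consonant spellings taken from the table above.
# The keys are hypothetical ASCII labels chosen for this sketch only.
CONSONANTS = {
    # Nasals
    "m": "m", "n": "n", "ng": "q",
    # Stops
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    # Fricatives
    "f": "f", "v": "w",
    "th": "x",  # voiceless, as in "thin"
    "dh": "x",  # voiced, as in "there" (conflated with "th", see note 2)
    "s": "s", "z": "z", "sh": "c", "zh": "y", "h": "h",
    # Affricates
    "ch": "tc", "j": "dy",
    # Liquids/semivowels
    "w": "v", "l": "l", "r": "r", "y": "j",
}


def spell_consonants(phonemes):
    """Join the reform spellings of a sequence of consonant labels."""
    return "".join(CONSONANTS[p] for p in phonemes)
```

Because "th" and "dh" both map to x, the lookup is not reversible for those two sounds, which is exactly the trade-off described in note 2.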
Vowels
| Phoneme | before r | before r, marked stress | before l | before l, marked stress | before other consonant (or if ij or uv, before i or u resp.) | before other consonant (or if ij or uv, before i or u resp.), marked stress | final or before vowel | final or before vowel, marked stress |
|---|---|---|---|---|---|---|---|---|
| i as in sit, e as in here | i | ii | i | ii | i | ii | | |
| e as in set | e (following r doubled) | ee (following r doubled) | e (following l doubled in multisyllable word) | ee | e (following consonant doubled in multisyllable word) | ee | | |
| ea as in bear, ai as in bail | ea | eea | ea | eea | | | | |
| a as in cat | a (following r doubled) | aa (following r doubled) | a (following l doubled) | aa (following l doubled) | a | aa | | |
| o as in cot, a as in car/call | a | aa | a | aa | a (following consonant doubled) | aa (following consonant doubled) | aa | |
| u as in cut/curry/cull | o (following r doubled) | oo (following r doubled) | o (following l doubled) | oo (following l doubled) | o | oo | | |
| aw as in hawk/law, o as in sore/sole | o | oo | o | oo | o (following consonant doubled) | oo (following consonant doubled) | oa | ooa |
| oo as in soot/door | u | uu | u | uu | u | uu | | |
| e as in trumpet/later/germ, a as in sofa | e | ee | e | | e | | a | |
| ea as in seat/seal, ie as in skier | ij | iij | ij | iij | ij | iij | i | ii |
| a as in sate, aye as in layer, aya as in betrayal | ej | eej | ej | eej | ej | eej | e | ee |
| oa as in coat, owe as in lower/Lowell | ov | oov | ov | oov | ov | oov | o | oo |
| oo as in boot/cool, ewe as in brewer | uv | uuv | uv | uuv | uv | uuv | u | uu |
| i as in site/fire/file | aj | aaj | aj | aaj | aj | aaj | aj | aaj |
| ou as in bout/sour/foul | av | aav | av | aav | av | aav | av | aav |
| oi as in coin/foil, oye as in foyer | oj | ooj | oj | ooj | oj | ooj | oj | ooj |

Stress is marked when the stress in a word falls after the first non-schwa syllable. If stress is unmarked, then the stress falls on the first non-schwa syllable. (The "er" sound is counted here as non-schwa.) The reason for not always marking stress is conciseness. So, for instance, "situation" transliterates to sitcueejcen, while "believe" transliterates to belijw — in the latter case, the stressed i doesn't need to be doubled because the first syllable has a schwa. Emphasized single-syllable words can also be marked as stressed.
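
The marking rule above can be sketched as a small Python check. The function name `needs_stress_mark` and the way a word is represented (one flag per syllable saying whether it is a schwa, plus the index of the stressed syllable) are assumptions made for this illustration, as is the syllable breakdown of the two example words. Optional marking of emphasized single-syllable words is not modeled.

```python
def needs_stress_mark(is_schwa, stressed):
    """Return True if the stressed syllable must be marked.

    is_schwa: list of booleans, one per syllable (True = schwa).
    stressed: index of the stressed syllable.
    Stress is marked only when it falls after the first non-schwa syllable.
    """
    first_full = next((i for i, s in enumerate(is_schwa) if not s), None)
    if first_full is None:
        raise ValueError("a word needs at least one non-schwa syllable")
    return stressed > first_full


# "situation" taken as si-tcu-eej-cen: stress on the third syllable,
# which comes after the first non-schwa syllable, so it is marked (eej).
situation = needs_stress_mark([False, False, False, True], 2)

# "believe" taken as be-lijw: the first syllable is a schwa, so the
# stressed second syllable is the first non-schwa one and stays unmarked.
believe = needs_stress_mark([True, False], 1)
```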

Note also that the vowel sounds of "cat" and "cot," and "cut" and "caught," which are represented respectively by a and o, switch their consonant-doubling behavior between following l/r and other consonants. It would of course be more consistent to have one sound from each pair cause all following consonants to double, but it's the way it is because in each case, where the following consonant is not doubled, the vowel is more common than its counterpart in that environment, or at least seems to me to be. That is, "a" of "car"/"call" seems more common than "a" of "carry"/"rally" (particularly because "arr" of "carry" cannot appear in a single-syllable word) while "a" of "cat" seems more common than "o" of "cot"; and "o/oa" of "core/coal" seems more common than "u" of "curry/cull" (and similar to "arr," "urr" of "curry" cannot appear in a single-syllable word), while "u" of "cut" seems more common than "ough" of "caught." This therefore, like the lazy stress marking, aids in conciseness (if my intuition about phoneme frequency is correct), and it also recalls normal English spelling a bit more closely than the alternatives.
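
To make the switching behavior concrete, here is a minimal Python sketch covering only the four vowels discussed in this paragraph. The table name `SHORT_VOWELS`, the function name `spell_vowel_and_consonant`, and the use of the keywords "cat", "cot", "cut", and "caught" as vowel labels are assumptions made for this example. It handles only single-letter consonant spellings, since the rules above don't say how a digraph such as tc would be doubled.

```python
# vowel label -> (letter, doubles following l/r, doubles other consonants)
SHORT_VOWELS = {
    "cat":    ("a", True,  False),
    "cot":    ("a", False, True),   # also the "a" of "car"/"call"
    "cut":    ("o", True,  False),
    "caught": ("o", False, True),   # also the "o" of "sore"/"sole"
}


def spell_vowel_and_consonant(vowel, consonant, marked_stress=False):
    """Spell one of the four vowels plus a following single-letter consonant."""
    if len(consonant) != 1:
        raise ValueError("this sketch handles single-letter consonants only")
    letter, before_lr, before_other = SHORT_VOWELS[vowel]
    doubles = before_lr if consonant in ("l", "r") else before_other
    nucleus = letter * 2 if marked_stress else letter  # marked stress doubles the vowel
    return nucleus + (consonant * 2 if doubles else consonant)
```

With these rules, `spell_vowel_and_consonant("cot", "k")` gives akk, as in dakkjument, and `spell_vowel_and_consonant("cat", "r")` gives arr, as in karrekteriistiks, while `spell_vowel_and_consonant("cot", "r")` gives ar, as in lardy.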

The various uses of "e" seem complex, but they really come down to, once again, using the most economical spelling possible in the context. So, for instance, "embezzle" becomes embeezel rather than embeezzel because the impossibility of a stressed schwa makes it clear that the ee sounds like the "e" in "set" without having to double the z.

Here is an example of a paragraph of text that I transliterated into the reform scheme, first as transliterated (see how much you can read of it) and then in the original spelling. It's from Wikipedia's article on the phenomenon of document dump.

A dakkjument domp iz xi akt ow respanndiq tu an adverserri'z rekvesst for infermeejcen baj prezenntiq xi adverserri vix a lardy kvantiti ow dejta xat iz transferd in a maner xat indikejts onfreendlijnes, hasstiiliti, or a lijgel kannflikt betvijn xa tranzmiter and xa resijwer ow xi infermeejcen. Xa cipment ow xijz dompt dakkjuments is onsoorted, or kentejnz a lardy kvantiti ow infermeejcen xat is ekstreejnies tu xi icju onder inkveri, or iz prezennted in a nann-tajmli maner, or som kammbineejcen ow xijz xri karrekteriistiks. Xa frejz iz affen juvzd baj lojers, bot iz in iqkriijsiq juvs in xa blaggesfir. It iz affen sijn az part ow xa karrekteriistik behejwjer ow an enntiti xat iz eqgejdyiq in an anngoiq patern ow aktiiwitijz inteended tu kower op oneexikel or kriminel kanndokt.

A document dump is the act of responding to an adversary's request for information by presenting the adversary with a large quantity of data that is transferred in a manner that indicates unfriendliness, hostility, or a legal conflict between the transmitter and the receiver of the information. The shipment of dumped documents is unsorted, or contains a large quantity of information that is extraneous to the issue under inquiry, or is presented in a non-timely manner, or some combination of these three characteristics. The phrase is often used by lawyers, but is in increasing use in the blogosphere. It is often seen as part of the characteristic behavior of an entity that is engaging in an ongoing pattern of activities intended to cover up unethical or criminal conduct.