2009.7.5 Yet another English spelling reform scheme
As long as maybe five years ago by now, I started thinking up a spelling reform system for the English language. This has been done before plenty of times, of course. (For instance, the bRitic scheme has a number of similarities to various versions of mine, I now see.) My intent was not and is not by any means to actually try to implement my system; rather, it's just one of the many linguistic thought experiments I've engaged in over the past twenty years that I entertain myself with. (It's hard to believe it's that long ago that I discovered the Arabic, Hebrew, Greek, and Russian alphabets, as well as the International Phonetic Alphabet, from a set of pages in our house copy of the Standard College Dictionary at, I think, the age of 5, or maybe 6. That is the singular moment that sparked my abiding interest in linguistics.) A couple of years, apparently, after I devised the initial version of the spelling reform scheme, I blogged about it, which now is a useful record for reviewing part of its development history. The objectives of the scheme were, and are,
- to be fully predictable based on the sounds of words;
- to be close to a one-to-one correspondence between phonemes and graphemes;
- conciseness (although my latest revision has chipped away at this);
- avoidance of new letters or diacritics, for ease of writing.
A few days ago, here in Barcelona, my interest in the system was rekindled, for some reason I don't remember, and I took a new look at it. There were a few problems with the existing scheme that I decided needed fixing. The biggest issue that I acted on was the previous use of "i" and "u" as both semivowels and vowels, which made for unnecessarily complex workarounds. I also instituted an explicit denotation of stress, which I now think is a crucial feature for a logical system of English spelling, given that the stress in English words is unpredictable. Now the appearance of text written in the scheme has become rather more Polish/Dutch-looking than it did before, and it's not necessarily as aesthetically satisfying as the aforeblogged version, but it has gained a greater measure of coherence and therefore elegance, I think.
(Here is this introduction in the new spelling, if you want to see an example of it right away.)
Az laqq az mejbi fajw jirz ego baj nav, aj started xiqkiq op a spelliq reform sistem for xi Iqglic laqgvedy. Xis haz bin don befor plennti ow tajmz, ow kors. (For instens, xa bRitic skijm haz a nomber ow similaarritijz tu wearies weryenz ow majn, aj nav si.) Maj inteent voz natt and iz natt baj enni mijnz tu akceli traj tu implemennt maj sistem; raxer, it's dyost von ow xa menni liqgviistik xatt eksperriments aj'w eqgejdyd in ovwer xa past tvennti jirz xat aj ennterteejn majseelf vix. (It's hard tu belijw it's xat laqq ego xat aj diskoowerd xi Arrebik, Hijbru, Grijk, and Rocen allfebetts, az vel az xi Internaacenel Fenettik Allfebett, from a set ow pejdyez in avr havs kappi ow xa Standerd Kaledy Dikcenerri at, aj xiqk, xi ejdy ow fajw, or mejbi siks. Xat iz xa siqgjuler movment xat sparkt maj ebajdiq intrest in liqgviistiks.) A kopel ow jirz, eparrentli, after aj dewajzd xi iniicel veryen ow xa spelliq reform skijm, aj blaggd ebavt it, vitc nav iz a juvsful rekkerd for rewjuiq xis part ow its dewellepment histeri. Xi ebdyekktiwz ow xa skijm ver, and ar,
- tu bi fuli prediktebel bejst ann xa savndz ow verdz;
- tu bi klovs tu a von-tu-von korespaanndens betvijn fovnijmz and grafijmz;
- kensajsnes (alxoo maj lejtest rewiyen haz tcipt ewe at xis);
- ewojdens ow nu letterz or dajekriitiks, for ijz ow rajtiq.
A fju dejz ego, hir in Barcelona, maj intrest in xa sistem voz rijkiindeld, for som rijzen aj dovn't rememmber, and aj tuk a nu luk at it. Xear ver a fju prabblemz vix xi egzistiq skijm xat aj desajded nijded fiksiq. Xa bigest icju xat aj akted ann voz xa prijwies juvs ow "i" and "u" az bovx semmijwavlz and wavlz, vitc mejd for onesseseerrili kammplekks verkeravndz. Aj also instituvted an eksplisit dijnovteejcen ow stres, vitc aj nav xiqk iz a kruvcel fijtcer for a laddyikel sistem ow Iqglic spelliq, giwen xat xa stres in Iqglic verdz iz onprediiktebel. Nav xi epirens ow tekst riten in xa skijm haz bekom raxer mor Polic/Dotc-lukiq xan it did befor, and it's natt nesseseerrili az essxeetikli satisfajiq az xi aferblaaggd veryen, bot it haaz gejnd a grejter meyyer ow kovhirens and xearfor ellegens, aj xiqk.
Following are the spelling rules for the scheme in its current revision. Phonemes are listed in a phonetically relevant order rather than in the order of the Roman alphabet.
| Manner | Phoneme | Reform spelling | Notes below |
|---|---|---|---|
| Nasals | m as in mare | m | |
| | n as in night | n | |
| | ng as in song | q | |
| Stops | p as in pear, spare | p | |
| | b as in bear | b | |
| | t as in tar, star | t | |
| | d as in dare | d | |
| | c/k as in care, scare, kite | k | |
| | g as in goat | g | |
| Fricatives | f as in fair | f | |
| | v as in vote | w | 1 |
| | th as in thin | x | 2 |
| | th as in there | x | 2 |
| | s as in sight | s | |
| | z as in zone | z | |
| | sh as in share | c | |
| | s as in measure | y | 1 |
| | h as in hair | h | |
| Affricates | ch as in chair | tc | |
| | j as in jeer | dy | |
| Liquids/semivowels | w as in wear | v | 1 |
| | l as in lair | l | |
| | r as in rare | r | |
| | y as in year | j | 1 |
1. A possibly more natural alternative would be to use w and y as the semivowels and v and j as the fricatives, but I decided to use v and j as the semivowels because they occur so often in vowel diphthongs, as seen below, and are more compact than w and y, respectively. This improves the conciseness of text written in the scheme.
2. This fails to distinguish between voiceless and voiced "th," but because those two sounds have very few minimal pairs (I've been having trouble thinking of any at all), they can be conflated without real ambiguity; the only cost is that one has to learn for each word whether the "th" is voiced or not. (Edit, July 18th: I thought of a few minimal pairs: "thigh" and "thy," "teeth" and "teethe," "loath" and "loathe." But the distinction between the members of each of these pairs is pretty easily deduced from context.)
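As a rough illustration, the consonant table can be written down as a lookup. This is only a sketch: keying the table by IPA symbols is my assumption (the table itself names each phoneme by example word), and `CONSONANTS`, `spell_consonants`, and `conflated` are made-up names.

```python
# Keys are IPA symbols (an assumption of this sketch); values are the
# reform spellings copied from the consonant table.
CONSONANTS = {
    "m": "m", "n": "n", "ŋ": "q",
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "v": "w", "θ": "x", "ð": "x",
    "s": "s", "z": "z", "ʃ": "c", "ʒ": "y", "h": "h",
    "tʃ": "tc", "dʒ": "dy",
    "w": "v", "l": "l", "r": "r", "j": "j",
}


def spell_consonants(phonemes):
    """Spell a sequence of consonant phonemes in the reform scheme."""
    return "".join(CONSONANTS[p] for p in phonemes)


def conflated():
    """Return the spellings that stand for more than one phoneme."""
    by_spelling = {}
    for phoneme, spelling in CONSONANTS.items():
        by_spelling.setdefault(spelling, []).append(phoneme)
    return {s: ps for s, ps in by_spelling.items() if len(ps) > 1}


print(spell_consonants(["tʃ"]))  # tc
print(conflated())               # {'x': ['θ', 'ð']}
```

The second call shows that the two "th" sounds are the only place where the consonant mapping departs from one-to-one.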
| Phoneme | before r | before r, marked stress | before l | before l, marked stress | before other consonant (or if ij or uv, before i or u resp.) | before other consonant (or if ij or uv, before i or u resp.), marked stress | final or before vowel | final or before vowel, marked stress |
|---|---|---|---|---|---|---|---|---|
| i as in sit, e as in here | i | ii | i | ii | i | ii | | |
| e as in set | e (following r doubled) | ee (following r doubled) | e (following l doubled in multisyllable word) | ee | e (following consonant doubled in multisyllable word) | ee | | |
| ea as in bear, ai as in bail | ea | eea | ea | eea | | | | |
| a as in cat | a (following r doubled) | aa (following r doubled) | a (following l doubled) | aa (following l doubled) | a | aa | | |
| o as in cot, a as in car/call | a | aa | a | aa | a (following consonant doubled) | aa (following consonant doubled) | aa | |
| u as in cut/curry/cull | o (following r doubled) | oo (following r doubled) | o (following l doubled) | oo (following l doubled) | o | oo | | |
| aw as in hawk/law, o as in sore/sole | o | oo | o | oo | o (following consonant doubled) | oo (following consonant doubled) | oa | ooa |
| oo as in soot/door | u | uu | u | uu | u | uu | | |
| e as in trumpet/later/germ, a as in sofa | e | ee | e | | e | | a | |
| ea as in seat/seal, ie as in skier | ij | iij | ij | iij | ij | iij | i | ii |
| a as in sate, aye as in layer, aya as in betrayal | ej | eej | ej | eej | ej | eej | e | ee |
| oa as in coat, owe as in lower/Lowell | ov | oov | ov | oov | ov | oov | o | oo |
| oo as in boot/cool, ewe as in brewer | uv | uuv | uv | uuv | uv | uuv | u | uu |
| i as in site/fire/file | aj | aaj | aj | aaj | aj | aaj | aj | aaj |
| ou as in bout/sour/foul | av | aav | av | aav | av | aav | av | aav |
| oi as in coin/foil, oye as in foyer | oj | ooj | oj | ooj | oj | ooj | oj | ooj |
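To make the context columns concrete, here is a small sketch covering only the four long vowels that lose their semivowel when final or before a vowel. The IPA keys, the function name `spell_long_vowel`, and its parameters are my own assumptions for illustration; deciding what counts as "before r" in a word like "skier" is left to the caller.

```python
# Keys are IPA symbols (an assumption of this sketch). Each value is
# (spelling before a consonant, spelling when final or before a vowel),
# copied from the vowel table.
LONG_VOWELS = {
    "iː": ("ij", "i"),  # seat, seal
    "eɪ": ("ej", "e"),  # sate
    "oʊ": ("ov", "o"),  # coat
    "uː": ("uv", "u"),  # boot, cool
}

# In the scheme, j and v are semivowels, so only these letters are vowels.
VOWEL_LETTERS = set("aeiou")


def spell_long_vowel(phoneme, next_letter=None, marked_stress=False):
    """Pick the spelling of a long vowel from the letter that follows it.

    next_letter is the first reform-spelling letter after the vowel,
    or None when the vowel is word-final.
    """
    full, short = LONG_VOWELS[phoneme]
    if next_letter is None:
        form = short
    elif next_letter not in VOWEL_LETTERS:
        form = full
    elif phoneme in ("iː", "uː") and next_letter == short:
        # Header note: ij stays before i, and uv stays before u.
        form = full
    else:
        form = short
    if marked_stress:
        # Every marked-stress cell in the table doubles the first letter.
        form = form[0] + form
    return form


print(spell_long_vowel("iː", "m"))                      # ij, as in skijm
print(spell_long_vowel("iː"))                           # i, as in bi
print(spell_long_vowel("uː", "i"))                      # u, as in rewjuiq
print(spell_long_vowel("eɪ", "c", marked_stress=True))  # eej
```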
Stress is marked when the stress in a word falls after the first non-schwa syllable. If stress is unmarked, then the stress falls on the first non-schwa syllable. (The "er" sound is counted here as non-schwa.) The reason for not always marking stress is conciseness. So, for instance, "situation" transliterates to sitcueejcen, while "believe" transliterates to belijw — in the latter case, the stressed i doesn't need to be doubled because the first syllable has a schwa. Emphasized single-syllable words can also be marked as stressed.
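The lazy stress-marking rule can be sketched in a few lines. The `Syllable` class and the function names are hypothetical, the syllable breakdowns of the two example words are my own, and consonant doubling is left out; the caller is assumed to flag which syllables are schwas (with "er" counted as non-schwa).

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    onset: str   # reform spelling of the consonants before the vowel
    vowel: str   # unmarked reform spelling of the vowel
    coda: str    # reform spelling of the consonants after the vowel
    is_schwa: bool = False


def needs_stress_mark(syllables, stressed):
    """True when the stress falls after the first non-schwa syllable."""
    first_full = next(
        (i for i, s in enumerate(syllables) if not s.is_schwa), None
    )
    return first_full is not None and stressed > first_full


def spell(syllables, stressed):
    """Join the syllables, doubling the stressed vowel's first letter if needed."""
    mark = needs_stress_mark(syllables, stressed)
    parts = []
    for i, s in enumerate(syllables):
        vowel = s.vowel
        if mark and i == stressed:
            vowel = vowel[0] + vowel
        parts.append(s.onset + vowel + s.coda)
    return "".join(parts)


situation = [
    Syllable("s", "i", ""),
    Syllable("tc", "u", ""),
    Syllable("", "ej", ""),
    Syllable("c", "e", "n", is_schwa=True),
]
believe = [Syllable("b", "e", "", is_schwa=True), Syllable("l", "ij", "w")]

print(spell(situation, stressed=2))  # sitcueejcen
print(spell(believe, stressed=1))    # belijw
```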
Note also that the vowel sounds of "cat" and "cot," and "cut" and "caught," which are represented respectively by a and o, switch their consonant-doubling behavior between following l/r and other consonants. It would of course be more consistent to have one sound from each pair cause all following consonants to double, but it's the way it is because in each case, where the following consonant is not doubled, the vowel is more common than its counterpart in that environment, or at least seems to me to be. That is, "a" of "car"/"call" seems more common than "a" of "carry"/"rally" (particularly because "arr" of "carry" cannot appear in a single-syllable word) while "a" of "cat" seems more common than "o" of "cot"; and "o/oa" of "core/coal" seems more common than "u" of "curry/cull" (and similar to "arr," "urr" of "curry" cannot appear in a single-syllable word), while "u" of "cut" seems more common than "ough" of "caught." This therefore, like the lazy stress marking, aids in conciseness (if my intuition about phoneme frequency is correct), and it also recalls normal English spelling a bit more closely than the alternatives.
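The switch in doubling behavior is easier to see spelled out. The sketch below covers only the unmarked-stress spellings of these four vowels when a consonant follows; the IPA keys and the name `spell_short_vowel` are assumptions, and doubling only the first letter of a digraph such as dy is inferred from the spelling laddyikel used in the introduction rather than stated as a rule.

```python
# Keys are IPA symbols (an assumption of this sketch).
# These vowels double a following r or l, but not other consonants.
DOUBLES_BEFORE_LIQUID = {"æ": "a", "ʌ": "o"}  # cat, cut
# These vowels double other following consonants, but not r or l.
DOUBLES_BEFORE_OTHER = {"ɑ": "a", "ɔ": "o"}   # cot/car/call, hawk/sore/sole


def spell_short_vowel(phoneme, following):
    """Spell a vowel plus the consonant after it (unmarked stress only).

    following is the reform spelling of the next consonant, e.g. "t" or "dy".
    """
    is_liquid = following in ("r", "l")
    if phoneme in DOUBLES_BEFORE_LIQUID:
        letter = DOUBLES_BEFORE_LIQUID[phoneme]
        double = is_liquid
    else:
        letter = DOUBLES_BEFORE_OTHER[phoneme]
        double = not is_liquid
    if double:
        # Assumed: a digraph doubles only its first letter (dy -> ddy).
        following = following[0] + following
    return letter + following


print("k" + spell_short_vowel("æ", "t"))  # kat  (cat)
print("k" + spell_short_vowel("ɑ", "t"))  # katt (cot)
print("k" + spell_short_vowel("ɑ", "r"))  # kar  (car)
print("k" + spell_short_vowel("æ", "r"))  # karr (as in carry)
```

Both pairs share a letter, so the doubling is what keeps "cat" and "cot" apart, and the more common member of each pair in a given environment gets the shorter spelling.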
The various uses of "e" seem complex, but they really come down to, once again, using the most economical spelling possible in the context. So, for instance, "embezzle" becomes embeezel rather than embeezzel because the impossibility of a stressed schwa makes it clear that the ee sounds like the "e" in "set" without having to double the z.
Here is an example of a paragraph of text that I transliterated into the reform scheme, first as transliterated (see how much you can read of it) and then in the original spelling. It's from Wikipedia's article on the phenomenon of document dump.
A dakkjument domp iz xi akt ow respanndiq tu an adverserri'z rekvesst for infermeejcen baj prezenntiq xi adverserri vix a lardy kvantiti ow dejta xat iz transferd in a maner xat indikejts onfreendlijnes, hasstiiliti, or a lijgel kannflikt betvijn xa tranzmiter and xa resijwer ow xi infermeejcen. Xa cipment ow xijz dompt dakkjuments iz onsoorted, or kentejnz a lardy kvantiti ow infermeejcen xat iz ekstreejnies tu xi icju onder inkveri, or iz prezennted in a nann-tajmli maner, or som kammbineejcen ow xijz xri karrekteriistiks. Xa frejz iz affen juvzd baj lojers, bot iz in iqkriijsiq juvs in xa blaggesfir. It iz affen sijn az part ow xa karrekteriistik behejwjer ow an enntiti xat iz eqgejdyiq in an anngoiq patern ow aktiiwitijz inteended tu kower op oneexikel or kriminel kanndokt.
A document dump is the act of responding to an adversary's request for information by presenting the adversary with a large quantity of data that is transferred in a manner that indicates unfriendliness, hostility, or a legal conflict between the transmitter and the receiver of the information. The shipment of dumped documents is unsorted, or contains a large quantity of information that is extraneous to the issue under inquiry, or is presented in a non-timely manner, or some combination of these three characteristics. The phrase is often used by lawyers, but is in increasing use in the blogosphere. It is often seen as part of the characteristic behavior of an entity that is engaging in an ongoing pattern of activities intended to cover up unethical or criminal conduct.