Date: Thu, 6 Jul 2006 17:49:07 -0700 (PDT)
From: John E Clifford <clifford-j@sbcglobal.net>
Subject: [lojban] Lojban Alphabet Starter B
To: lojban-list@lojban.org

What would be an ideal alphabet for Lojban?  We have fallen into an alphabet and, because
“everybody” knows it, it works very well for our present purposes: getting as many people
as possible learning the language.  But suppose that Lojban were established and secure and wanted
an alphabet that was thoroughly for Lojban, not just a borrowed form that had served countless
other languages already.  Thinking about this, I have come up with the following (very
tentative) thoughts.  Comments sought.

We can assume that, like the present alphabet, this new alphabet would be phonemic, that is, have
a unique symbol for each distinctive sound of the language.  We might also assume that, unlike the
present alphabet, the letters of the Lojban alphabet would show on their face the useful
phonological facts about them in Lojban.

As recent discussion has shown, the direct way to do this is with a representational phonetic
alphabet, trimmed of features irrelevant to Lojban.  A representational phonetic alphabet would be
based on a set of features thought to be involved in speech production: consonant vs. vowel, point
of articulation, voiced or not, nasal or not, and so on.  Each of these features would be assigned
a mark.  Letters in the alphabet, representing actually made sounds, would then consist of a
systematic concatenation of the marks which stand for the features of the sound indicated. 
Typically, the more similar two sounds are, the more similar will be their letters, differing only
in the features that differ.

On the assumption that such an alphabet is capable of recording a wide variety of sounds, it will
turn out that some sounds that it distinguishes with different letters will not be treated as
distinct in a given language, but will rather all be treated as the same or as variants
determined by the phonological environment: in short, as allophones of a single phoneme.  Now, to
describe a language to the outside, a detailed description which includes the irrelevancies is
important, for it is only with such a system that we can describe just how the internal structure
of a phoneme works.  On the other hand, for the people using the language, such a system is
useless, since, by definition, these people do not normally distinguish between the various phones
and so cannot handily write the words correctly.  What is needed is an alphabet that parallels the
phonemes and the natural way to get that is to take the letters that stand for all the phones of a
phoneme and drop all the parts that are different, leaving only the common core (it is axiomatic
that the phones of a phoneme are similar to a fairly great extent). Such a letter will (generally)
be both different from the letter for any other phoneme in the language and indicate what are the
basic features of the sound represented.  
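
As a rough illustration of "dropping the parts that are different", the common core of a phoneme can be computed as the intersection of the feature sets of its phones.  This is only a sketch: the feature names and the two sample phones (aspirated and plain English t) are assumptions chosen for illustration, not part of any Lojban analysis.

```python
def common_core(phones):
    """Return the features shared by every phone of one phoneme."""
    phones = [frozenset(p) for p in phones]
    # Intersecting all the bundles keeps only the marks every phone carries.
    return frozenset.intersection(*phones)


# Hypothetical feature bundles for two allophones of English /t/.
aspirated_t = {"consonant", "voiceless", "alveolar", "stop", "aspirated"}
plain_t = {"consonant", "voiceless", "alveolar", "stop"}

core = common_core([aspirated_t, plain_t])
print(sorted(core))  # the differing mark, "aspirated", has dropped out
```

The resulting bundle is what the phonemic letter would be built from; it needs at least one phone as input.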

But this is really only a first approximation to an alphabet that is for a particular language,
for among the features common to all the allophones of one phoneme may be some that, in this
language, play no real role: no two phonemes are distinguished by one having and one lacking
this feature (or at least not by that alone or even primarily).  So we can trim the symbols even
further in many cases.  The parenthesis about “primarily” means that although two sounds may be
distinguished by this feature, there is another feature which also distinguishes them and appears
more important within the language.  Thus, while nasals are in fact voiced in Lojban, as in
English, their voicing is not just less important; noting it actually interferes with seeing
some significant phonological facts about Lojban (that a voiceless consonant can occur next to a
nasal, though not generally next to a voiced consonant, for example), so we do not mark nasals for
voicing.  All of this is an advantage in constructing an alphabet to be used, since the fewer
details required for the finished symbol, the easier it will be to use and recognize (ceteris
paribus). 
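
One part of this trimming, the test that "no two phonemes are distinguished by one having and one lacking this feature", can be sketched mechanically.  The sketch below is an assumption-laden toy: the three letters, the feature names, and the helper names minimal_pairs and trim are all hypothetical, and it captures only the "not by that alone" criterion, not the judgment about which feature is "primarily" distinctive.

```python
def minimal_pairs(feature, letters):
    """Pairs of letters that differ in `feature` and in nothing else."""
    names = sorted(letters)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            fa, fb = letters[a], letters[b]
            rest_a = {k: v for k, v in fa.items() if k != feature}
            rest_b = {k: v for k, v in fb.items() if k != feature}
            if rest_a == rest_b and fa.get(feature) != fb.get(feature):
                pairs.append((a, b))
    return pairs


def trim(feature, letters):
    """Drop `feature` from every letter that is in no minimal pair for it."""
    keep = {x for pair in minimal_pairs(feature, letters) for x in pair}
    return {
        name: {k: v for k, v in feats.items() if k != feature or name in keep}
        for name, feats in letters.items()
    }


# Hypothetical starting point: the nasal still carries its voicing mark.
letters = {
    "p": {"manner": "stop", "voice": "-"},
    "b": {"manner": "stop", "voice": "+"},
    "m": {"manner": "nasal", "voice": "+"},
}
trimmed = trim("voice", letters)
print(trimmed["m"])  # voicing is gone from the nasal, kept on p and b
```

Since no voiceless nasal contrasts with m here, its voicing mark distinguishes nothing and is dropped, while p and b keep theirs.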

The usual objection to using such a representational alphabet for a language is that, since
similar sounds are represented by similar letters, the possibility of confusing two similar
sounds is carried over into writing as the possibility of confusing similar letters.  Since writing
is often used to remove spoken confusion, this seems counterproductive.  But it is a minor
problem, assuming the letters are not too complex.  Somewhat more to the point is the complaint
that a representational alphabet is loaded down with what may well be irrelevancies from the point
of view of the language.  We need to know that p is different from b and from k, perhaps, but we
may not need to know that that difference is voiceless-voiced in one case and labial-velar in the
other.  That is, the particulars of the underlying articulation may be significant only for
displaying similarities and differences, not absolutely.  And this fact may allow dropping even
more features: we may need to know that stops are dental or velar for some purpose but not that
they are labial (that being what they must be if not dental or velar). 

What beyond the collapse of phones into phonemes and the overview of what is not used (or not
used in a particular category) can be used to drop features and simplify letters?  Different
factors may turn up in different languages.  In Lojban it seems clear that at least one factor is
phonotactics: some sounds can go together in clusters of one sort or another, others cannot.  So,
within the already winnowed-down set of features (but, in fact, already part of the winnowing) we
can look at the phonotactic behavior of phonemes.

Applying all this to Lojban, it is clear first that we have to distinguish vowels from consonants
(even though phonetically, and even phonotactically, some vowels perform consonantal functions
and some consonants vocalic ones).  The basic morphological structure of Lojban is defined in terms of
patterns of C and V.  Within consonants, we clearly have to distinguish voiced and voiceless, the
second phonotactic rule being that voiced and voiceless consonants cannot cluster (the first is
that identical sounds cannot cluster).  But this turns out, as noted earlier, not to be a
have/have-not situation, for nasals and liquids, while strictly voiced, do cluster with voiceless
sounds.  So here we need a plus/minus/unmarked division.  

If we look at the phonotactic data (or rules) for Lojban, we find that we seem to need for
consonants, in addition to voicing, the following categories: stops, sibilants, nasals, dental,
back, and hyphen. We have a couple of cases that are left over but there is no good name for them,
since they are diverse. (To make symbols with these we need something to hang their markers on to
indicate the thing that is none of the above).  Stops are either voiced or voiceless and either
dental, back or neither (labial).  Sibilants are either voiced or voiceless and either dental or
back.  Hyphens are either nasal (N) or not.  The unclassified are either back or not, and these
latter are voiced, voiceless, or neither.  So seven features for 17 characters, no character needing
more than four.

Some of the gain there is lost with the vowels, which need six features for six characters:
high, mid, front, back, low, and central.  Here it is easy to give the phonotactic analogs: high:
initial in diphthongs with any legal vowel; mid: initial before I; low: initial before high; central:
not in diphthongs.  The front-back distinction is not phonotactic and applies only to high and mid;
central and low involve no further subdivisions.

The phonotactic base for the consonants is harder to state.  The markerless consonant (h) occurs
only intervocalically.  The rest cluster under at least the three rules (the third prohibits pairs
of sibilants).  The otherwise unmarked back consonant, X, does not cluster initially but otherwise
clusters with any legal (under the three rules; in this case, voiceless) nonback consonant.  The
hyphens do not lead initially but medially combine with anything.  The sibilants come first in
initial clusters with anything legal (but X), except that the voiced ones do not lead the hyphens.
The rest of the types lead only R and L (the non-nasal hyphens) initially, except the dentals,
which don’t lead L but do lead legal sibilants.  We are left with three special cases: M, L, and
Z.  L, as noted, doesn’t occur after dental stops.  Z doesn’t occur after M.  M is now the
bare nasal; L is not yet distinguished from R, but could, I suppose, be called the dental one
(although they are all dentals to some degree).  
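
The three general rules mentioned above (no identical pair, no voiced-voiceless pair, no pair of sibilants) can be sketched as a simple pair check.  This is a minimal sketch under stated assumptions: letters are written in lowercase, x is treated as voiceless for the second rule (following the parenthetical remark about X), l, m, n and r are left unmarked for voicing, and the function name may_cluster is hypothetical.  It deliberately ignores h, the restrictions on initial clusters, and the special cases for M, L, Z and X.

```python
VOICED = set("bdgjvz")
VOICELESS = set("cfkpstx")  # assumption: x counts as voiceless for rule 2
SIBILANTS = set("cjsz")
# l, m, n, r appear in neither voicing set: they are the "unmarked" case.


def voicing(c):
    """Return '+', '-' or None (unmarked) for a consonant."""
    if c in VOICED:
        return "+"
    if c in VOICELESS:
        return "-"
    return None


def may_cluster(a, b):
    """Check a consonant pair against the three general rules only."""
    if a == b:                                # rule 1: no identical pair
        return False
    va, vb = voicing(a), voicing(b)
    if va and vb and va != vb:                # rule 2: no voiced + voiceless
        return False
    if a in SIBILANTS and b in SIBILANTS:     # rule 3: no pair of sibilants
        return False
    return True


for pair in ["tt", "pb", "sc", "mp", "st"]:
    print(pair, may_cluster(pair[0], pair[1]))
```

Because the unmarked consonants return None for voicing, a pair such as m + p passes the second rule without any special case, which is the point of the plus/minus/unmarked division.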

So as a tentative list we have:

A: low vowel
B: vd stop consonant
C: vl back sibilant consonant
D: vd dental stop consonant
E: mid front vowel
F: vl consonant
G: vd back stop consonant
h: consonant
I: high front vowel
J: vd back sibilant consonant
K: vl back stop consonant
L: dental hyphen consonant
M: nasal consonant
N: nasal hyphen consonant
O: mid back vowel
P: vl stop consonant
R: hyphen consonant
S: vl sibilant consonant
T: vl dental stop consonant
U: high back vowel
V: vd consonant
X: back consonant
Y: (central) vowel
Z: vd sibilant consonant
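
One way to sanity-check a list like this is to encode it as sets of marks and confirm that no two letters end up with the same bundle.  The sketch below transcribes the tentative list as given (vd and vl as abbreviated there, with "consonant" or "vowel" counted as a mark); the names LETTERS, all_distinct and difference are hypothetical helpers, not part of the proposal.

```python
LETTERS = {
    "A": {"low", "vowel"},
    "B": {"vd", "stop", "consonant"},
    "C": {"vl", "back", "sibilant", "consonant"},
    "D": {"vd", "dental", "stop", "consonant"},
    "E": {"mid", "front", "vowel"},
    "F": {"vl", "consonant"},
    "G": {"vd", "back", "stop", "consonant"},
    "h": {"consonant"},
    "I": {"high", "front", "vowel"},
    "J": {"vd", "back", "sibilant", "consonant"},
    "K": {"vl", "back", "stop", "consonant"},
    "L": {"dental", "hyphen", "consonant"},
    "M": {"nasal", "consonant"},
    "N": {"nasal", "hyphen", "consonant"},
    "O": {"mid", "back", "vowel"},
    "P": {"vl", "stop", "consonant"},
    "R": {"hyphen", "consonant"},
    "S": {"vl", "sibilant", "consonant"},
    "T": {"vl", "dental", "stop", "consonant"},
    "U": {"high", "back", "vowel"},
    "V": {"vd", "consonant"},
    "X": {"back", "consonant"},
    "Y": {"central", "vowel"},
    "Z": {"vd", "sibilant", "consonant"},
}


def all_distinct(letters):
    """True if no two letters share exactly the same set of marks."""
    bundles = [frozenset(f) for f in letters.values()]
    return len(set(bundles)) == len(bundles)


def difference(a, b, letters=LETTERS):
    """Marks that one letter has and the other lacks."""
    return letters[a] ^ letters[b]


print(all_distinct(LETTERS))
print(max(len(f) for f in LETTERS.values()))  # most marks on any one letter
print(sorted(difference("P", "B")))           # the voicing marks only
print(sorted(difference("P", "K")))           # the "back" mark only
```

The difference helper shows the property a featural alphabet is after: P and B differ only in their voicing marks, and P and K only in the mark for back.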

There are surely other ways of doing this and probably much more efficient ones.  If you are
interested in this question (which will remain theoretical in your lifetime, pretty certainly),
put your suggestions out in this forum.  This is a fair (I think) sample of the sort of thing
that should work.

Given some such pattern of features, I invite the visually inclined among us to offer up systems of
representation as well.  Some of the earlier suggested alphabets are a start, though generally not
featural.

