From: John E Clifford
Date: Thu, 6 Jul 2006 17:49:07 -0700 (PDT)
Subject: [lojban] Lojban Alphabet Starter B
To: lojban-list@lojban.org

What would be an ideal alphabet for Lojban? We have fallen into an alphabet and, because “everybody” knows it, it works very well for our present purposes: getting as many people as possible learning the language.
But suppose that Lojban were established and secure and wanted an alphabet that was thoroughly for Lojban, not just a borrowed form that had served countless other languages already. Thinking about this, I have come up with the following (very tentative) thoughts. Comments sought.

We can assume that, like the present alphabet, this new alphabet would be phonemic, that is, have a unique symbol for each distinctive sound of the language. We might also assume that, unlike the present alphabet, the letters of the Lojban alphabet would show on their face the useful phonological facts about them in Lojban. As recent discussion has shown, the direct way to do this is with a representational phonetic alphabet, trimmed of features irrelevant to Lojban.

A representational phonetic alphabet would be based on a set of features thought to be involved in speech production: consonant vs. vowel, point of articulation, voiced or not, nasal or not, and so on. Each of these features would be assigned a mark. Letters in the alphabet, representing actually made sounds, would then consist of a systematic concatenation of the marks which stand for the features of the sound indicated. Typically, the more similar two sounds are, the more similar their letters will be, differing only in the features that differ.

On the assumption that such an alphabet is capable of recording a wide variety of sounds, it will turn out that some sounds that it distinguishes with different letters will not be treated as distinct in a given language, but will rather all be treated as the same or as variants determined by the phonological environment: as allophones of a single phoneme, in short. Now, to describe a language to the outside, a detailed description which includes the irrelevancies is important, for it is only with such a system that we can describe just how the internal structure of a phoneme works.
On the other hand, for the people using the language, such a system is useless, since, by definition, these people do not normally distinguish between the various phones and so cannot handily write the words correctly. What is needed is an alphabet that parallels the phonemes, and the natural way to get that is to take the letters that stand for all the phones of a phoneme and drop all the parts that are different, leaving only the common core (it is axiomatic that the phones of a phoneme are similar to a fairly great extent). Such a letter will (generally) both differ from the letter for any other phoneme in the language and indicate the basic features of the sound represented.

But this is really only a first approximation to an alphabet that is for a particular language, for among the features common to all the allophones of one phoneme may be some that, in this language, play no real role: no two phonemes are distinguished by one having and one lacking this feature (or at least not by that alone or even primarily). So we can trim the symbols even further in many cases. The parenthesis about “primarily” means that although two sounds may be distinguished by this feature, there is another feature which also distinguishes them and appears more important within the language. Thus, while nasals are in fact voiced in Lojban, as in English, their voicing is not just less important; noting it actually interferes with seeing some significant phonological facts about Lojban (that a voiceless consonant can occur next to a nasal, though not generally next to a voiced consonant, for example), so we do not mark nasals for voicing. All of this is an advantage for constructing an alphabet to be used, since the fewer details required for the finished symbol, the easier it will be to use and recognize (ceteris paribus).
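The trimming just described, keeping only what all the phones of a phoneme share and then dropping marks that do no distinguishing work, amounts to a set intersection followed by a set difference. Here is a minimal Python sketch of that idea. The phone inventory and the feature names in it are invented purely for illustration; they are not a claim about the actual allophones of any Lojban phoneme.

```python
from functools import reduce


def common_core(phones):
    """Return the features shared by every phone of one phoneme."""
    return reduce(frozenset.intersection, (frozenset(p) for p in phones))


def trim(core, nondistinctive):
    """Drop features that never separate two phonemes in this language."""
    return core - frozenset(nondistinctive)


# Hypothetical allophones of a single nasal phoneme (illustrative data only).
PHONES = [
    {"consonant", "nasal", "voiced", "dental"},
    {"consonant", "nasal", "voiced", "velar"},
    {"consonant", "nasal", "voiced", "palatal"},
]

core = common_core(PHONES)        # the differing places of articulation drop out
letter = trim(core, {"voiced"})   # voicing of nasals is then left unmarked
print(sorted(core), sorted(letter))
```

The second step takes the list of non-distinctive features as an argument because deciding which features are idle is a judgment about the whole language, not something that can be read off one phoneme.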
The usual objection to using such a representational alphabet for a language is that, since similar sounds are represented by similar letters, the possibility of confusing two similar sounds is carried over into writing as the possibility of confusing similar letters. Since writing is often used to remove spoken confusion, this seems counterproductive. But it is a minor problem, assuming the letters are not too complex.

Somewhat more to the point is the complaint that a representational alphabet is loaded down with what may well be irrelevancies from the point of view of the language. We need to know that p is different from b and from k, perhaps, but we may not need to know that that difference is voiceless-voiced in one case and labial-velar in the other. That is, the particulars of the underlying articulation may be significant only for displaying similarities and differences, not absolutely. And this fact may allow dropping even more features: we may need to know that stops are dental or velar for some purpose but not that they are labial (that being what they must be if not dental or velar).

What, beyond the collapse of phones into phonemes and the overview of what is not used (or not used in a particular category), can be used to drop features and simplify letters? Different factors may turn up in different languages. In Lojban it seems clear that at least one factor is phonotactics: some sounds can go together in clusters of one sort or another, others cannot. So, within the already winnowed-down set of features (but, in fact, already as part of the winnowing) we can look at the phonotactic behavior of phonemes.
Applying all this to Lojban, it is clear first that we have to distinguish vowels from consonants (even though phonetically, and even phonotactically, some vowels perform consonantal functions and some consonants vocalic ones). The basic morphological structure of Lojban is defined in terms of patterns of C and V. Within consonants, we clearly have to distinguish voiced and voiceless, the second phonotactic rule being that voiced and voiceless consonants cannot cluster (the first is that identical sounds cannot cluster). But this turns out, as noted earlier, not to be a have/have-not situation, for nasals and liquids, while strictly voiced, do cluster with voiceless sounds. So here we need a plus/minus/unmarked division.

If we look at the phonotactic data (or rules) for Lojban, we find that we seem to need for consonants, in addition to voicing, the following categories: stops, sibilants, nasals, dental, back, and hyphen. We have a couple of cases that are left over, but there is no good name for them, since they are diverse. (To make symbols with these we need something to hang their markers on, to indicate the thing that is none of the above.) Stops are either voiced or voiceless and either dental, back or neither (labial). Sibilants are either voiced or voiceless and either dental or back. Hyphens are either nasal (N) or not. The unclassified are either back or not, and these latter are voiced, voiceless, or neither. So seven features for 17 characters, no character needing more than four.

Some of the gain there is lost with the vowels, which need six features for six characters: high and mid, front and back, low, and central. Here it is easy to give the phonotactic analogs: high: initial in diphthongs with any legal partner; mid: initial before I; low: initial before high; central: not in diphthongs. The front-back distinction is not phonotactic and applies only to high and mid; central and low involve no further subdivisions. The phonotactic base for the consonants is harder to state.
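The vowel rules above are compact enough to write down directly. The following Python sketch is one reading of them, not a definitive statement of Lojban's diphthong rules. In particular, it assumes that "any legal" partner for a high vowel means any non-central vowel, since central vowels are said not to occur in diphthongs. The function name is made up for this sketch.

```python
# Feature marks for the six vowels, as described in the text.
VOWELS = {
    "A": {"low"},
    "E": {"mid", "front"},
    "I": {"high", "front"},
    "O": {"mid", "back"},
    "U": {"high", "back"},
    "Y": {"central"},
}


def can_form_diphthong(first, second):
    """Apply the height-based rules to an ordered pair of vowels."""
    f, s = VOWELS[first], VOWELS[second]
    if "central" in f or "central" in s:
        return False              # central: not in diphthongs
    if "high" in f:
        return True               # high: leads any partner (assumed: any non-central vowel)
    if "mid" in f:
        return second == "I"      # mid: initial only before I
    return "high" in s            # low: initial only before a high vowel


for pair in ("AI", "AU", "EI", "EU", "IA", "YI"):
    print(pair, can_form_diphthong(pair[0], pair[1]))
```

Note that only height and centrality are consulted; front and back never enter the rules, which matches the remark that the front-back distinction is not phonotactic.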
The markerless consonant (h) occurs only intervocalically. The rest cluster under at least the three rules (the third prohibits pairs of sibilants). The otherwise unmarked back consonant, X, does not cluster initially but otherwise clusters with any legal (under the three rules; in this case, voiceless) nonback consonant. The hyphens do not lead initially but medially combine with anything. The sibilants come first in initial clusters with anything legal (but X), except that the voiced ones do not lead the hyphens. The rest of the types lead only R and L (the non-nasal hyphens) initially, except the dentals, which don’t lead L but do lead legal sibilants.

We are left with three special cases: M, L, and Z. L, as noted, doesn’t occur after dental stops. Z doesn’t occur after M. M is now the bare nasal. L is not yet distinguished from R, but could, I suppose, be called the dental one (although they are all dentals to some degree).

So as a tentative list (vd = voiced, vl = voiceless) we have:

A: low vowel
B: vd stop consonant
C: vl back sibilant consonant
D: vd dental stop consonant
E: mid front vowel
F: vl consonant
G: vd back stop consonant
h: consonant
I: high front vowel
J: vd back sibilant consonant
K: vl back stop consonant
L: dental hyphen consonant
M: nasal consonant
N: nasal hyphen consonant
O: mid back vowel
P: vl stop consonant
R: hyphen consonant
S: vl sibilant consonant
T: vl dental stop consonant
U: high back vowel
V: vd consonant
X: back consonant
Y: (central) vowel
Z: vd sibilant consonant

There are surely other ways of doing this, and probably much more efficient ones. If you are interested in this question (which will remain theoretical in your lifetime, pretty certainly), put your suggestions out in this forum. This is a fair (I think) sample of the sort of thing that should work. Given some such pattern of features, I invite the visually inclined among us to offer up systems of representation as well. Some of the earlier suggested alphabets are a start, though generally not featural.
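As a quick sanity check on the tentative list, the feature assignments can be typed in and tested mechanically: every letter should get a distinct bundle of marks, and no letter should need more than four (counting the consonant/vowel mark itself). The Python below is only a sketch of such a check. The table is transcribed from the list as given, with Y's parenthesized "central" treated as an ordinary mark, and the helper name is made up for this example.

```python
# Feature marks per letter, transcribed from the tentative list.
# vd = voiced, vl = voiceless; no voicing mark means "unmarked".
LISTING = """
A: low vowel
B: vd stop consonant
C: vl back sibilant consonant
D: vd dental stop consonant
E: mid front vowel
F: vl consonant
G: vd back stop consonant
h: consonant
I: high front vowel
J: vd back sibilant consonant
K: vl back stop consonant
L: dental hyphen consonant
M: nasal consonant
N: nasal hyphen consonant
O: mid back vowel
P: vl stop consonant
R: hyphen consonant
S: vl sibilant consonant
T: vl dental stop consonant
U: high back vowel
V: vd consonant
X: back consonant
Y: central vowel
Z: vd sibilant consonant
"""

TABLE = {}
for line in LISTING.strip().splitlines():
    letter, marks = line.split(": ", 1)
    TABLE[letter] = frozenset(marks.split())


def differing_marks(a, b):
    """Marks carried by exactly one of the two letters."""
    return TABLE[a] ^ TABLE[b]


all_distinct = len(set(TABLE.values())) == len(TABLE)
most_marks = max(len(marks) for marks in TABLE.values())
print(all_distinct, most_marks)
print(sorted(differing_marks("P", "B")), sorted(differing_marks("P", "K")))
```

The last line shows the "similar sounds, similar letters" property in miniature: P and B differ only in their voicing marks, and P and K only in the back mark, with labial never needing a mark of its own.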